Monday, January 30, 2012

Behind the Scenes: The Technology of the New Spoke

As you know, Spoke recently re-launched as a moderated wiki of business information, and it is quite different from our previous service. We still have many of the same company and person profiles, but how you interact with the information is brand new. We now allow you, the community, to update and edit the information, either as a registered user or anonymously. When we began this project, a little over one year ago, we knew that the end product would be different enough that we would have to build this project from scratch. This posed an important question to be answered: on what technologies do we build the new Spoke web application?

The infrastructure for the previous version of Spoke can be summed up quickly as Java, Struts 1.x, MySQL, Microsoft SQL Server, and a home-grown service-oriented architecture (SOA) infrastructure. We also had a codebase that was almost 9 years old. Since the new application had to be built from scratch, we had the opportunity to choose a new platform for development.

Our first decision was to choose a new web framework. We explored a few frameworks such as Scala on Lift, Python on Django, Java on Spring MVC, and Ruby on Rails. Ruby on Rails, in the end, was the clear choice for our team. Despite not having experience with Ruby development, we felt that the framework had the focus on rapid development that we desired. Furthermore, both Ruby and Rails have great online documentation, several books, numerous blogs, and a large, active community.

Coming from a Java background, it took us a little while to get used to developing with the Ruby programming language and the Rails framework. However, after a few weeks we started to reap the benefits of Ruby on Rails and appreciated the ease of development over using Java frameworks like Struts and Hibernate.

There is no arguing that the Java development community is expansive and has plenty of books, good documentation, and several web frameworks. Although we could have chosen Java again, Rails has proven to be excellent for rapid web development, while Java has lagged behind in this regard. Other things are simply easier with Ruby as well. Java has tools such as Maven to help manage dependencies and pull in new libraries, but they do not compare to the ease of using the Ruby gems system. Furthermore, there seems to be a gem (i.e., a Ruby library) for just about everything. Need authentication and login for your website? There's a gem for that: devise. Authorization? There's a gem for that: cancan. Without such a plethora of Ruby gems, we would not have been able to complete this project as quickly as we did.
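For readers who have not used gems before, here is roughly what pulling in the two libraries named above looks like in a Rails project's Gemfile. This is a minimal sketch and not Spoke's actual Gemfile: version constraints and every other dependency are left out, and the source URL is simply the public RubyGems host.

```ruby
# Gemfile (illustrative sketch; versions and other gems omitted)
source 'https://rubygems.org'

gem 'rails'   # the web framework itself
gem 'devise'  # authentication and login
gem 'cancan'  # authorization
```

Running `bundle install` in the project directory then downloads these gems along with their own dependencies.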

After we decided upon a web framework, the next key decision our team had to make was what database to use. We needed a database that scales for the web, is easy to maintain, and is suited to our read-heavy access pattern. For the previous version of Spoke, we were primarily using MS SQL Server and MySQL. We did, however, deploy one feature on the previous Spoke site using MongoDB in order to get a feel for using it in production. The test was a success and drove our decision to deploy the new Spoke site using MongoDB. Other factors in the decision included the built-in ability to shard the database (as needed), replication, schema-free flexibility, and support for Ruby on Rails 3. Features aside, the fact that many high-traffic internet startups, such as IGN and Foursquare, are using MongoDB eased any concerns we had about deploying this NoSQL database in production.

Our next decision was where to host the new site. We're a small team, and the previous incarnation of Spoke is still hosted in a co-location facility on hardware that we own. When a server goes down, with travel time, it can take several hours to get it back up and running. Scaling up required a lot of lead time to order new servers, provision them, and deploy them into production. To alleviate these issues and to supplement our colo servers, we were already using Amazon Web Services EC2 and S3 to launch new infrastructure and features for the old Spoke site. When it came time to decide where and how to host the new Spoke web service, the decision was easy: launch the new site 100% on Amazon Web Services.

Our final decision was how to deal with searching the profiles in our system. The old Spoke search was built upon a very old version of the Lucene library as well as home-grown code to manage indexing, searching, swapping indexes, and more. We looked at open source projects such as Solr and Elasticsearch, but as a small team we would rather not have to manage yet another large piece of infrastructure. That is when we found IndexTank, which provides search infrastructure as a service. All we had to do was write code to use their APIs for indexing documents and executing searches, and they handled the infrastructure and scaling. The end result is fast, high-quality search in the new Spoke system.
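To give a feel for how small the application-side surface is with this approach, here is a rough sketch of the two operations described above, indexing a document and executing a search. It is emphatically not IndexTank's real API: the class name `InMemorySearchService`, its methods, and the sample records are all hypothetical, and a tiny in-memory inverted index stands in for the hosted service so the example runs on its own.

```ruby
# Hypothetical stand-in for a hosted search service (NOT IndexTank's API).
# The application only ever calls index_document and search.
class InMemorySearchService
  def initialize
    @documents = {}                                        # doc id => fields
    @postings = Hash.new { |hash, term| hash[term] = [] }  # term => doc ids
  end

  # Add or replace a document; every field value is tokenized and indexed.
  def index_document(id, fields)
    @postings.each_value { |ids| ids.delete(id) } # drop stale entries on re-index
    @documents[id] = fields
    tokenize(fields.values.join(' ')).uniq.each { |term| @postings[term] << id }
  end

  # Return the ids of documents that contain every term in the query.
  def search(query)
    terms = tokenize(query)
    return [] if terms.empty?
    terms.map { |term| @postings.fetch(term, []) }.reduce(:&).dup
  end

  private

  # Lowercase the text and split it into alphanumeric terms.
  def tokenize(text)
    text.downcase.scan(/[a-z0-9]+/)
  end
end

# Made-up sample records, for illustration only.
service = InMemorySearchService.new
service.index_document('acme', { name: 'Acme Corp', industry: 'Manufacturing' })
service.index_document('jane', { name: 'Jane Doe', title: 'Engineer at Acme' })

service.search('acme')           # matches both documents
service.search('acme engineer')  # matches only 'jane'
```

With a hosted service, the body of each method would instead be a call to the provider, while the calling code in the application would look much the same.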

As a result of carefully choosing our technology platform, the launch of the new Spoke web application has been successful. The infrastructure and frameworks that we have chosen have been able to scale and handle the same levels of traffic as the previous Spoke application. We fully expect that the combination of Ruby on Rails and MongoDB hosted on Amazon EC2 will further scale to handle our growth throughout the year and beyond.

I hope you have enjoyed this small peek into our new technology infrastructure. Please feel free to ask any questions you may have about it in the comments section below.
