Here’s something you and I already know: Spoke is far from perfect. Sometimes you get a search result that has bad or outdated info. Maintaining data quality at a very large scale is a problem that bedevils all “people-search” sites.
Recently, Seth Grimes wrote a very interesting article taking us, in particular, to task for this. He did a search for a friend of his, Neil Raden, and got results that incorrectly identified the firm he worked for (Hired Brains Magazine is actually Hired Brains Consultancy) and the dates he worked for another company. Seth was totally right to point out flaws in our product. However, it’s worth noting the results from a couple of other sites he compared us to:
- One lists 19 different people named Neil Raden, none of whom is connected to anything called Hired Brains.
- The other lists just one person, and if you click on the related search marked “works for Hired Brains Inc.” it tells you nothing. If you want further contact information, you have to sign up with another service.
Our results (assuming you are a Spoke member and not just using the public search results) actually provide the information needed to find and contact the Neil Raden in question. Further, by clicking on our web results tab, a Spoke member gets links to a large number of articles by him. To be clear, we are not defending having data that is even slightly incorrect.
Data quality is our most important issue and an area in which we have improved. Even so, neither Seth nor we are satisfied with where we are right now.
When Spoke opened its network less than two years ago, we didn't expect the success we are currently experiencing in terms of community involvement. Our community is adding or updating about 350,000 profiles and contacts every day. This is more than ten times the data we were processing daily at the beginning of last year. As the number of participants in the Spoke community increases (it’s already at 55 million profiles), our data continues to grow. Our infrastructure for data processing has had trouble keeping pace with our growth, hence the data quality issue. We want to solve this issue once and for all. To that end, in addition to implementing the most advanced de-duplication technologies, we are currently migrating our data processing technology to the cloud. This lets us scale our capacity dynamically to handle increased traffic and data.
What do you think? Any complaints, comments or successes? Please don't hesitate to let us know.