Monday, February 9, 2009

Our data quality is good and needs to get better

Here’s something you and I already know: Spoke is far from perfect. Sometimes you get a search result that has bad or outdated info. Maintaining data quality at a very large scale is a problem that bedevils all “people-search” sites. Recently Seth Grimes wrote a very interesting article taking us – in particular – to task for this. He did a search for a friend of his, Neil Raden, and got results that incorrectly identified the firm he worked for (Hired Brains Magazine is actually Hired Brains Consultancy) and the dates he worked for another company. Seth was totally right to point out flaws in our product. However, it’s worth noting the results from a couple of other sites he compared us to.

  • One lists 19 different people named Neil Raden – none of whom is connected to anything called Hired Brains.

  • The other listed just one person and if you click on the related search marked “works for Hired Brains Inc.” it tells you nothing. If you want further contact information you have to sign up with another service.

Our results – assuming you are a Spoke member and not just using the public search results – actually provide the information to find and contact the Neil Raden in question. Further, by clicking on our web results tab, a Spoke member gets links to a large number of articles by him. To be clear, we are not defending having data that is even slightly incorrect.

Data quality is our most important issue and an area in which we have improved. Still, neither Seth nor we are satisfied with where we are right now.

When Spoke opened its network less than two years ago, we didn't expect the success we are currently experiencing in terms of community involvement. Our community is updating or adding about 350,000 profiles and contacts every day. This is more than ten times the data we were processing daily at the beginning of last year. As the number of participants in the Spoke community increases (it’s already at 55 million profiles), our data continues to grow. Our infrastructure for data processing has had trouble keeping pace with our growth, hence the data quality issue. We want to solve this issue once and for all. To that end, in addition to implementing the most advanced de-duplication technologies, we are currently migrating our data processing technology to the cloud. This lets us scale our capacity dynamically to handle increased traffic, data and more.

What do you think? Any complaints, comments or successes? Please don't hesitate to let us know.


  1. Thanks for the post. I would agree that your data quality needs improvement. This, after I've noticed interesting (or is it suspect) information in search results.

    But I've just recently had an opportunity to encounter your support. If my experiences over the last week are any indication of your general support quality, then you might have bigger concerns to worry about than data quality. Support is terrible.

    I'm hoping it's either an extremely isolated incident or April Fool's just rolled around about two months earlier than usual. But it's been a week and I'm yet to get anyone to resolve my issue.

    I'll say this though- I've received two responses to two support emails- the subsequent one seemingly more explanatory than the first email sent. And both responses are exactly the same- word for word.

  2. I have a complaint. Why is it so difficult to get a hold of a human being at Spoke? I have tried using the support request page and have gotten no response. The phone number just directs folks back to the website. Is this company going out of business or is the customer service just that bad?


  3. First, thanks for your response. I can understand your frustration and we are genuinely sorry about it. Because we are a small (but thriving!) company, we don't have the resources to immediately address all support requests. We have millions of visitors every month and we try to get back to everyone as quickly as possible. In cases where it seems you can resolve the issue yourself, we send an automated response outlining what to do. Even in cases where you can't resolve the issue, the fastest way to get the problem addressed is by entering a request at

  4. I noticed that you have now reduced the # of viewable names in the free account from 25 to 5. You should probably update your membership levels page to reflect this.

    My 2 cents - I think this makes people much less likely to try your free service. Given the data integrity issues you admit to in this post - I think users are much less likely to pay for the service without seeing what they will get. Any $ gains you get from this strategy I think will probably be short term from existing free users on the fence about your service.