Playing With Technorati and PubSub
By Adrian Sutton
I previously mentioned that I don’t get Technorati. That post drew a bunch of good responses, so I thought I should play with it more in light of those comments. It seems that Technorati is best used for tracking news as it happens, so the release of the Mac Mini was the perfect test case. To add to the fun I also ran the same search on PubSub to compare the results.
The search itself was something along the lines of “mac mini” OR “mini mac” OR “minimac” OR “macmini”, so it pretty comprehensively covered any reference to the Mac Mini. Both engines seemed to understand the exact same query string, which was nice considering search engines tend to have slightly varying formats for query logic.
Overall, both services were somewhat useful – I got a ton of comments about the Mac Mini and a few of them were worth reading. I’d say I got a reasonably good understanding of what bloggers thought about the Mac Mini. Unfortunately, that understanding won’t translate at all to what matters for the success of the Mac Mini since the target audience for the Mac Mini typically don’t write blogs. Still interesting to view the reactions though.
Improvement 1: Support Language Filtering
The big problem with both engines was that because my search was on a product name I got a lot of results in foreign languages. There really needs to be a way to specify that I only want languages X, Y and Z.
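Until the engines offer this, the filtering could in principle be done on the reading side. The following is only a minimal sketch, assuming each result carries a declared language tag; the `Result` type and its `lang` field are hypothetical and not something either service is known to expose.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass
class Result:
    """A search match; `lang` is an assumed, possibly missing, language tag."""
    title: str
    lang: Optional[str]


def filter_by_language(results: Iterable[Result], wanted: Set[str]) -> List[Result]:
    """Keep results whose primary language subtag is in `wanted`.

    "en-US" and "en_GB" both count as "en". Untagged results are dropped.
    """
    kept = []
    for r in results:
        if not r.lang:
            continue
        primary = r.lang.replace("_", "-").split("-")[0].lower()
        if primary in wanted:
            kept.append(r)
    return kept


# Made-up results, purely for illustration.
results = [
    Result("Mac Mini first impressions", "en-US"),
    Result("Der neue Mac Mini", "de"),
    Result("Mac Mini", None),
]
english_only = filter_by_language(results, {"en"})
```

Dropping untagged results is a design choice for the sketch: many feeds omit or mislabel the tag, so a real filter would probably need to guess the language from the text itself.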
Improvement 2: Fix Character Set Support
The second problem with both engines was that they made a total mess of any character set that wasn’t ASCII. This is probably caused more by the fact that bloggers themselves tend to tag the character set incorrectly, and it is made worse by the fact that planet-style aggregators are notorious for corrupting character sets. So it’s not going to be easy to fix, but there needs to be a fix – it’s very annoying to receive a bunch of question marks as a search response. I think this happened more with PubSub but it occurs with both engines.
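For anyone wondering where the question marks come from, here is a small Python illustration of two common failure modes. It is not a claim about what either engine actually does internally, just the general mechanism: forcing text into ASCII, and decoding bytes with the wrong charset label.

```python
text = "café"

# Forcing non-ASCII text into ASCII replaces each unknown character with "?".
as_ascii = text.encode("ascii", errors="replace").decode("ascii")

# UTF-8 bytes read with the wrong label (here Latin-1) turn into mojibake.
utf8_bytes = text.encode("utf-8")
mislabelled = utf8_bytes.decode("latin-1")

# Decoding with the charset the bytes were really written in recovers the text.
recovered = utf8_bytes.decode("utf-8")

print(as_ascii)     # caf?
print(mislabelled)  # cafÃ©
print(recovered)    # café
```

Once the original bytes have been replaced with question marks the information is gone, which is why a mistake early in the chain (the blog or an aggregator) cannot be repaired by the search engine later.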
Improvement 3 (Technorati Only): Give Meaningful Summaries
There seem to be two key differences in the approaches taken by Technorati and PubSub: what they search and what they return. Technorati searches the entire page, whereas PubSub only searches the RSS feed. This means that Technorati can find matches that occur outside of the short snippet in the RSS feed, but PubSub can provide the actual RSS feed entry as the result match (and know that it contains the search term) whereas Technorati try to create their own summary from the page. Technorati’s algorithms for this are really woeful – I don’t recall ever getting a match from Technorati that I understood without opening it in a browser. With PubSub, on the other hand, most matches could be read right from my aggregator, which makes life manageable when you have that many entries flying past.
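The trade-off between the two approaches can be shown with a toy model. This is a sketch under assumptions, not how either service is implemented: the `Post` class, its fields and the sample text are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Post:
    """Hypothetical model of one blog post as each engine might see it."""
    feed_entry: str  # text of the RSS item
    sidebar: str     # blogroll, recent links, etc. on the HTML page

    @property
    def full_page(self) -> str:
        return self.feed_entry + "\n" + self.sidebar


def matches_feed(post: Post, term: str) -> bool:
    """Feed-only search: the entry itself must contain the term."""
    return term.lower() in post.feed_entry.lower()


def matches_page(post: Post, term: str) -> bool:
    """Whole-page search: a hit anywhere on the page counts."""
    return term.lower() in post.full_page.lower()


post = Post(
    feed_entry="Spent the weekend repainting the fence.",
    sidebar="Recent links: Apple announces the Mac Mini",
)

print(matches_feed(post, "mac mini"))  # False
print(matches_page(post, "mac mini"))  # True
```

The whole-page search reports a match even though the post itself never mentions the term, and it has no obvious passage to show as a summary. The feed-only search misses nothing here, but it would also miss a real mention that fell outside a truncated feed entry.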
Improvement 4: Provide Magic Pixies To Summarize Results
Basically, I was absolutely flooded with matches. I’ve had to unsubscribe from the feeds so I could get some work done. My aggregator updates feeds every 30 minutes or so, and every time it updated for 2 or 3 days there would be another hundred updates to read about the Mac Mini. Users could help mitigate this by subscribing to the search feeds in a second aggregator that didn’t grab their attention when new matches came through. That still wouldn’t work well though, because when you scanned it at the end of the day you’d have a few thousand results to go through. Maybe Robert Scoble has the time and interest to read that much but I certainly don’t. I have no idea how this feature would work but I want it.
I suppose the ideal approach would be to parse all the matches using (as yet unheard of) natural language processing, work out the key points in the discussion and combine them. Then you provide a report like:
I want one: 10000 entries
Not so cheap as you have to buy a monitor (side note: aka people who missed the point): 1000 entries
I hate Apple fanboys: 20000 entries
Each key point could come with a few choice examples and a link to go view the entire list of entries that mentioned that point. I should get an update in my RSS feed for the report either whenever a new entry is discovered or when a new key point is discovered (the user chooses which when they conduct the search).
Of course if I knew how to actually achieve this I’d write it and become an instant millionaire. Maybe you just need to use collaborative categorization instead of a magical language processor to work out what the key points are. Then again, the mainstream news has been happily providing these summaries anyway.
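The reporting half of the idea is the easy part. Here is a minimal Python sketch, assuming the hard part is already done and every entry arrives labelled with a key point (say, by people tagging entries by hand). The `build_report` function, the tags and the example.com URLs are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def build_report(
    tagged: Iterable[Tuple[str, str]], examples_per_point: int = 2
) -> List[Tuple[str, int, List[str]]]:
    """Group (key_point, entry_url) pairs into report rows.

    Returns (key_point, entry_count, sample_urls), largest count first.
    """
    by_point: Dict[str, List[str]] = defaultdict(list)
    for key_point, url in tagged:
        by_point[key_point].append(url)
    rows = [
        (point, len(urls), urls[:examples_per_point])
        for point, urls in by_point.items()
    ]
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows


# Hypothetical tags and URLs, purely for illustration.
tagged = [
    ("I hate Apple fanboys", "http://example.com/a"),
    ("I want one", "http://example.com/b"),
    ("I hate Apple fanboys", "http://example.com/c"),
    ("Not so cheap as you have to buy a monitor", "http://example.com/d"),
    ("I hate Apple fanboys", "http://example.com/e"),
    ("I want one", "http://example.com/f"),
]

for point, count, samples in build_report(tagged):
    print(f"{point}: {count} entries")
```

Each row also carries a couple of sample URLs, which would serve as the "few choice examples" for that point. Working out the key points in the first place is still the unsolved bit.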
Conclusion
These systems would be very useful if it were my product that was being released, or if the news item weren’t the biggest event happening that absolutely everyone just had to comment on. Both systems became unusable purely because they found too much stuff. The best engine, however, was clearly PubSub because of its meaningful RSS feeds. Technorati really need to fix their summaries, as the current ones make the entire system worthless. Even with the low-volume searches I tried, Technorati drove me crazy with its useless matches (made worse by the fact that it returns a lot of false positives due to picking up links in sidebars etc., despite the fact that the post had nothing to do with the search terms).
On the other hand, Technorati at least provides instant results instead of PubSub’s really weird approach of returning nothing up front and then getting back to you on your query an hour or so later (the length of time taken is exacerbated by the fact that aggregators only check for updates every so often). Technorati then would be my choice of the systems if I wanted to know what people had already said about a topic but didn’t want to follow it over time (this would have been a more effective approach to the Mac Mini conversation), but in reality I’d probably just use Google. PubSub is definitely better for watching conversations over time though.
Now, do I have to submit this to Technorati’s feedback email address (and the equivalent that I assume PubSub have) that was mentioned in my last post, or is their system good enough to find this and tell them about it for me?