The Short Head Is Not Because of Economics, and a PHP Ngram Tokenizer

Posted on Aug 7th, 2006

In a previous post I talked about the short head and long tail of keyword traffic. The 80/20 rule doesn’t quite apply, but a looser 20-40/80 rule does. Even though longer and more targeted keyword phrases don’t cost the searcher anything, keyword traffic still roughly follows a Pareto curve.
In his book, Chris Anderson asserts that the 80/20 rule is enforced by the forces of economics. In the music industry, the risks and costs to music companies define what becomes a “hit”. In keywords, the only forces driving toward a Pareto distribution are similarities in the way people express themselves. It’s only a small slice of expression, a few words used to describe something we want, but the head and tail distribution suggests we all communicate in strikingly similar ways. Or at least a lot of us do.
Building on my new PHP POS tagger and my PHP ngram tokenizer, I plan to study Wikipedia as a corpus to derive some data about the distribution curves of the most popular ngrams and how they relate to keyword selection.
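
To make the idea concrete, here is a rough sketch of the kind of measurement described above: split text into word ngrams, count them, and see how much of the total the most frequent ones account for. It is written in Python for brevity and is not the PHP tokenizer mentioned in this post. The function names (`ngrams`, `head_share`), the simple regex word splitting, and the default 20% cutoff are all assumptions made for illustration.

```python
from collections import Counter
import re


def ngrams(text, n):
    """Return the word n-grams in text as a list of tuples.

    Tokenization is a deliberately naive assumption: lowercase the text
    and keep runs of letters, digits, and apostrophes as words.
    """
    words = re.findall(r"[a-z0-9']+", text.lower())
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def head_share(counts, head_fraction=0.2):
    """Share of all occurrences covered by the top head_fraction of distinct n-grams."""
    ranked = sorted(counts.values(), reverse=True)
    if not ranked:
        return 0.0
    # Always include at least one n-gram in the "head".
    head_size = max(1, round(len(ranked) * head_fraction))
    return sum(ranked[:head_size]) / sum(ranked)


if __name__ == "__main__":
    # Tiny stand-in corpus; a real study would feed in Wikipedia text.
    sample = "the cat sat on the mat and the cat ate the fish"
    counts = Counter(ngrams(sample, 2))
    print(counts.most_common(3))
    print(head_share(counts))
```

On a large corpus, a `head_share` value close to 0.8 for the top 20% of ngrams would point to a classic 80/20 split, while a lower value would suggest a flatter curve with a heavier tail.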


Copyright © 2010 by Dan Zarrella, social media marketing and viral marketing consultant. All rights reserved. site map

DanZarrella.com, Social & Viral Marketing Scientist