In a previous post I talked about the short head and long tail of keyword traffic. The 80/20 rule doesn’t quite apply, but a looser 20-40/80 rule does: roughly 20 to 40 percent of keywords bring in 80 percent of the traffic. Even though using more, and more targeted, keyword phrases costs the searcher nothing, keyword traffic still roughly follows a Pareto curve.
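To make the 20-40/80 idea concrete, here is a minimal Python sketch that sorts keywords by traffic and reports what fraction of distinct keywords is needed to reach 80 percent of visits. The function name `keyword_share_for_traffic` and the sample counts are hypothetical illustrations, not real traffic data or the author's tooling.

```python
def keyword_share_for_traffic(counts, target=0.8):
    """Return the fraction of distinct keywords (most popular first)
    needed to account for `target` of total traffic.

    `counts` is a hypothetical list of visit counts, one per keyword.
    """
    ordered = sorted(counts, reverse=True)  # head first, tail last
    total = sum(ordered)
    if total == 0:
        raise ValueError("no traffic to measure")
    running = 0
    for i, c in enumerate(ordered, start=1):
        running += c
        # Stop at the first keyword where the cumulative share reaches the target.
        if running / total >= target:
            return i / len(ordered)
    return 1.0


if __name__ == "__main__":
    # Made-up visit counts for ten keywords (sum = 100).
    hypothetical = [50, 20, 10, 5, 5, 4, 3, 1, 1, 1]
    print(keyword_share_for_traffic(hypothetical))  # 0.3
```

With these made-up numbers the top 3 of 10 keywords (30 percent) cover 80 percent of the visits, which happens to fall inside the 20-40 percent range; real keyword logs would give their own answer.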
In his book, Chris Anderson asserts that the 80/20 rule is enforced by economic forces. In the music industry, the risks and costs borne by music companies define what becomes a “hit”. For keywords, the only force pushing toward a Pareto distribution is the similarity in the way people express themselves. A search query is only a small slice of expression, a few words used to describe something we want, but the head-and-tail distribution suggests we all communicate in strikingly similar ways. Or at least a lot of us do.
Building on my new PHP POS tagger and my PHP n-gram tokenizer, I plan to study Wikipedia as a corpus to derive data about the distribution curves of the most popular n-grams and how they relate to keyword selection.
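The PHP tokenizer and POS tagger mentioned above are not reproduced here. As a rough illustration of the counting step only, here is a minimal Python sketch that counts n-grams in a piece of plain text and lists them by rank, which is the raw material for a head-and-tail curve. It assumes a naive regex tokenizer with no POS tagging, and the function names `tokenize`, `ngrams` and `ngram_rank_frequency` are hypothetical; parsing an actual Wikipedia dump is out of scope.

```python
import re
from collections import Counter


def tokenize(text):
    # Simplifying assumption: lowercase runs of letters, digits and
    # apostrophes count as words; no POS tagging or stemming.
    return re.findall(r"[a-z0-9']+", text.lower())


def ngrams(tokens, n):
    # Slide a window of n tokens across the list; too few tokens yields [].
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_rank_frequency(text, n=2):
    # Return (ngram, count) pairs, most frequent first.
    return Counter(ngrams(tokenize(text), n)).most_common()


if __name__ == "__main__":
    sample = "The long tail of the long tail"
    for rank, (gram, count) in enumerate(ngram_rank_frequency(sample), start=1):
        print(rank, gram, count)
```

Plotting count against rank for a large corpus is one way to see whether the popular n-grams form the same short head and long tail as keyword traffic.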