The Short Head is not because of Economics and a PHP ngram tokenizer


In a previous post I talked about the short head and long tail of keyword traffic. The 80/20 rule doesn’t quite apply, but a looser 20-40/80 rule does. Even though using more, and more targeted, keyword phrases doesn’t cost the searcher anything, keyword traffic still roughly follows a Pareto curve.
In his book, Chris Anderson asserts that the 80/20 rule is enforced by economic forces. In the music industry, the risks and costs borne by music companies define what becomes a “hit”. In keywords, the only forces driving toward a Pareto distribution are similarities in the way people express themselves. It’s only a small slice of expression, a few words used to describe something we want, but the head-and-tail distribution suggests we all communicate in strikingly similar ways. Or at least a lot of us do.
Building on my new PHP POS Tagger and my PHP ngram tokenizer, I plan to study Wikipedia as a corpus to derive some data about the distribution curves of the most popular ngrams and how they relate to keyword selection.
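
To make the idea concrete, here is a minimal sketch of the kind of measurement described above: count word ngrams in a text, then ask what share of all occurrences falls in the "head" of the most frequent ones. It is written in Python and is not the PHP tokenizer mentioned in this post. The function names `ngrams` and `head_share`, the simple regex word splitter, and the toy sentence standing in for a Wikipedia corpus are all assumptions made for illustration.

```python
from collections import Counter
import re


def ngrams(text, n):
    """Return the word n-grams of `text` as tuples, lowercased.

    Assumption: words are runs of letters, digits, or apostrophes.
    A real tokenizer would handle punctuation and markup more carefully.
    """
    words = re.findall(r"[a-z0-9']+", text.lower())
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def head_share(counts, head_fraction=0.2):
    """Share of all occurrences covered by the top `head_fraction` of distinct n-grams."""
    ranked = sorted(counts.values(), reverse=True)
    if not ranked:
        return 0.0
    # Always include at least one n-gram in the head.
    head_size = max(1, round(len(ranked) * head_fraction))
    return sum(ranked[:head_size]) / sum(ranked)


# Toy text standing in for a real corpus (hypothetical sample, not real data).
sample = "the cat sat on the mat and the cat ate the fish"
bigram_counts = Counter(ngrams(sample, 2))

print(bigram_counts.most_common(3))
print(head_share(bigram_counts))
```

On a real corpus, comparing `head_share` at `head_fraction` values between 0.2 and 0.4 would show how close the ngram distribution comes to the 20-40/80 pattern seen in keyword traffic.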
