The Short Head is not because of Economics and a PHP ngram tokenizer

In a previous post I talked about the short head and long tail of keyword traffic. The 80/20 rule doesn’t quite apply, but some general 20-40/80 rule does. Even though multiple and more targeted keyword phrases don’t cost the searcher anything, keyword traffic still roughly follows a Pareto curve.
In his book Chris Anderson asserts the 80/20 rule is enforced by the forces of economics. In the music industry the risks and costs to music companies define what becomes a “hit”. In keywords the only forces driving towards a Pareto distribution are similarities in the way people express themselves. It’s only a small slice of expression, a few words used to describe something we want, but the head and tail distribution suggests we all communicate in strikingly similar ways. Or at least a lot of us do.
Building on my new PHP POS Tagger and my PHP ngram tokenizer, I plan to study Wikipedia as a corpus to derive some data about the distribution curves of the most popular ngrams and how they can relate to keyword selection.
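For readers who haven’t met ngrams before, here is a minimal sketch of word-level ngram counting. It is written in Python for brevity and is not the PHP tokenizer mentioned above; the function names, the simple word-splitting rule and the sample sentence are all assumptions made for illustration.

```python
import re
from collections import Counter


def ngrams(text, n):
    """Return the word n-grams of text as a list of tuples of lowercase words."""
    # Assumed tokenization rule: runs of letters, digits and apostrophes are words.
    words = re.findall(r"[a-z0-9']+", text.lower())
    # Slide a window of n words across the list; shorter texts yield no n-grams.
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def ngram_counts(text, n):
    """Count how often each n-gram occurs in text."""
    return Counter(ngrams(text, n))


# Made-up sample text, not real corpus data.
sample = "the long tail and the short head of the long tail"
bigram_counts = ngram_counts(sample, 2)

# Most frequent bigrams first, the same ordering a head-and-tail graph uses.
for gram, count in bigram_counts.most_common(3):
    print(" ".join(gram), count)
```

Run over a large corpus, counts like these, sorted from most to least frequent, would give the kind of distribution curve the post proposes to study.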


SEO Don’t Need No Stinkin’ Branding

OK, maybe that’s a bit hyperbolic. In fact it definitely is, and it’s not totally true. Establishing an effective brand can do wonders for search traffic, but Brian Clark had this to say about SEO:

There’s no doubt that optimization for better search engine rankings will always be a huge part of the online marketing equation. However, it may be that the top SEO players are finding that pigeonholing themselves with that narrow acronym is not in their best interest.

The SEO skill set emerges organically from more general web development. I know I didn’t start off wanting to be an SEO; rather, I became one by learning a little design, a little coding, a little writing, a little market research and a whole lot of NLP. So perhaps “pigeonholing” isn’t the best word; maybe “specializing” is better.
The web is the nascent information economy, right? The key to that would necessarily be information retrieval engines. Search is simply the most important way people find what they’re looking for, and when they’re looking for something you can offer, it’s the most important way to connect with potential customers.
Search doesn’t stalk or interrupt its customers; it lets the user beat a path to its door. It’s not the “push” of traditional advertising, with banner ads shoving messages down your gullet; it’s the “pull” of answering your customer’s needs. So I’m not sure I want search marketing lumped in with advertising. In fact I prefer to see it as the user-centric antithesis of advertising.
And yeah, conversion analysis and monetization often come hand in hand with search-based traffic generation, but so do design and programming. It’s all just part of the larger skill set of web development.


Cumulative Percentage Curves of Keyword Niches

As promised, here are five cumulative percentage curves generated from real live keyword data.


The top 24.8% of “lyrics” based keywords account for 80% of the total traffic (337154).


The top 29.5% of “dog” based keywords account for 80% of the total traffic (418969).


The top 34.8% of “personals” based keywords account for 80% of the total traffic (30106).


The top 37.4% of “blog” based keywords account for 80% of the total traffic (48092).


The top 49.3% of “buy” based keywords account for 80% of the total traffic (119031).

One important thing to note is that this data is generated only by sampling the top 1000 keywords that contain the base keyword. If all the keywords were sampled the tail would be longer, but the curve’s shape would be about the same.


Does the 80/20 Rule Apply to Keyword Traffic?

That’s “The Long Tail” to my left. It’s called a Pareto distribution. Chris Anderson, and everyone who lusts for long tail economics (of which I’m one), likes to talk about the never-ending nature of the tail, about how even way out to the right there is always going to be at least one person. And yeah, for the most part that’s true, but at least in search traffic distribution a power-law curve like the Pareto presents a calculable point of diminishing returns.
If you took a set of keywords, each with the number of searches for it per day, and graphed them, with the keywords on the x axis (most popular on the left) and the traffic on the y axis, you’d get the top graph. The head and the tail.
That second graph on the bottom is the cumulative percentage of the traffic. The key point on it is where the line crosses 80% of traffic on the y axis; in the keyword niches I studied that happened somewhere between 20% and 40% of the keywords on the x axis. Essentially, it answers the question: what percentage of the total number of keywords accounts for 80% of the total traffic of the set? My research indicates that for most niches the top 20% to 40% of keywords account for 80% of the total traffic.
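The 80% crossing point described above can be computed directly from a list of per-keyword search counts. The following is a rough Python sketch under stated assumptions: the function names and the sample counts are invented for illustration and are not the data or code behind the graphs.

```python
def cumulative_curve(searches):
    """Return the cumulative share of total traffic, most popular keyword first."""
    ranked = sorted(searches, reverse=True)
    total = sum(ranked)
    if total == 0:
        raise ValueError("total traffic must be greater than zero")
    curve = []
    running = 0
    for count in ranked:
        running += count
        curve.append(running / total)
    return curve


def keyword_share_for_traffic(searches, target=0.80):
    """Return the fraction of keywords needed to reach the target share of traffic."""
    curve = cumulative_curve(searches)
    for rank, share in enumerate(curve, start=1):
        if share >= target:
            return rank / len(curve)
    return 1.0


# Hypothetical daily search counts for ten keywords in one niche.
searches = [500, 200, 150, 50, 40, 30, 15, 10, 3, 2]
print(keyword_share_for_traffic(searches))
```

With these made-up numbers the first three keywords carry 850 of 1000 searches, so 30% of the keywords cover 80% of the traffic, which falls inside the 20% to 40% range reported above.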

Update: the images weren’t working. I’ll show you some real examples derived from specific niches…
