Introducing the ReTweetability Index

If we want to create contagious Tweets, we have to know what contagious Tweets look like. And I’ve created a new site that allows you to do just that.

The site has a list of the most ReTweetable users, as well as a search feature that allows you to find the most contagious users Tweeting about various topics.

There are 3 major areas where Twitter users can affect the number of ReTweets they get:

  • Followers
  • Tweeting Volume
  • Contagiousness of Content

We know what “more followers” and “more Tweets” look like, which gives us well-defined targets in those areas, but until now there has been no standard measurement of contagiousness.

I’ve looked into the effect that a user’s number of followers and the content of their Tweets have on how often they are ReTweeted. Predictably, having more followers gets you more ReTweets, but the correlation isn’t as strong as expected. Certain patterns of common words and phrases do emerge.

Lists that rank users by the raw number of times they are ReTweeted are not showing the users with the most ReTweetable content. If a user has a large number of followers, or posts a huge amount of content, naturally they’re going to get more ReTweets; however, it is important to note that this is not due to how contagious their Tweets are.

What I’m trying to do with the ReTweetability metric is begin to develop a simple formula by which the infectiousness of a user’s content can be measured. This algorithm would eliminate the effect of the user’s follower count and Tweeting rate.

The ReTweetability metric I’m using for the index right now is based on the natural logarithm of both the follower count and the Tweets-per-day number. This is done to compress the range of variation in both numbers, while accounting for the power-law shape of the distributions of the two variables.
Prior to using the logarithm, the formula over-penalized users with higher-than-average followers (around 100) and Tweets per day (around 5), which turns out to be most users.

I’ve also explored the possibility of using the square root of the two values; this produces a range that is smaller than with the raw numbers, but larger than with the natural logarithm. I would love to have a discussion about the correct method for this, and I expect some variation in the formula here.
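
To make the range comparison concrete, here is a small Python sketch. The two follower counts are made-up illustrative values, not data from the index; the sketch only compares how far apart a small and a large account sit under each transform.

```python
import math

# Hypothetical follower counts for a small and a large account
# (illustrative values only, not data from the index).
small, large = 100, 1_000_000

def spread(transform):
    """Ratio between the transformed large and small values."""
    return transform(large) / transform(small)

raw_spread = spread(lambda x: x)   # no transform
sqrt_spread = spread(math.sqrt)    # square root
log_spread = spread(math.log)      # natural logarithm

print(f"raw:  {raw_spread:,.0f}x")   # raw:  10,000x
print(f"sqrt: {sqrt_spread:,.0f}x")  # sqrt: 100x
print(f"ln:   {log_spread:,.1f}x")   # ln:   3.0x
```

With these example numbers the square root sits between the raw values and the logarithm, which matches the ordering described above.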

Because the formula produces extremely small results, I’ve had to multiply it by 10,000,000 to make it more readable. I would also love feedback on this.
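
The exact formula is not spelled out here, so the following Python sketch shows only one plausible shape for it, not the index's actual implementation. It assumes the score is ReTweets per day divided by the natural logarithms of the follower count and Tweets per day, then multiplied by the 10,000,000 constant. The function name, its parameters, the sample users, and the use of `log1p` (so that a value of 0 or 1 never produces a zero divisor) are all assumptions made for illustration.

```python
import math

SCALE = 10_000_000  # readability multiplier mentioned in the post

def retweetability(retweets_per_day, followers, tweets_per_day, scale=SCALE):
    """Hypothetical score: ReTweets per day, damped by log-compressed
    follower count and Tweeting rate. Not the index's real formula."""
    # log1p(x) = ln(1 + x); an assumption, chosen so small inputs stay safe
    damping = math.log1p(followers) * math.log1p(tweets_per_day)
    if damping == 0:
        return 0.0  # no followers or no Tweets: nothing to score
    return scale * retweets_per_day / damping

# Two made-up users who earn the same number of ReTweets per day
niche = retweetability(retweets_per_day=2, followers=500, tweets_per_day=3)
celebrity = retweetability(retweets_per_day=2, followers=500_000, tweets_per_day=30)
print(niche > celebrity)  # True
```

Under this assumed form, the smaller account scores higher because it earned the same ReTweets with far fewer followers and Tweets, which is the kind of normalization the metric is aiming for.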