How ReTweets Spread: The Epidemiology of Viral Messaging on Twitter

Now that my ReTweet mapping system is functioning, I’m able to start compiling more granular data on how ReTweet streams actually spread.

First, I’ll start with some simple averages. For the first three numbers (depth, users and Tweets), I’m looking at entire ReTweet streams: the whole tree, starting with the original Tweet and including all of the subsequent ReTweets. What we see here is that the large majority of ReTweet streams contain only 2 levels of depth; that is, ReTweets of the initial Tweet do not themselves produce further ReTweets. Streams also tend to include only two participating users (the original Tweeter and the ReTweeter) and two individual Tweets. From this, we begin to understand that most RT streams are merely one user ReTweeting another, and never go any further.


Average values per ReTweet stream
Depth (levels): 2.09
Users: 2.41
Tweets: 2.44
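To make these three metrics concrete, here is a minimal sketch of how they could be computed. It assumes (hypothetically) that each indexed stream is stored as a list of (tweet_id, username, parent_id) tuples, with parent_id set to None for the seed Tweet; the function names are illustrative and this is not the code behind the numbers above.

```python
from statistics import mean


def stream_metrics(stream):
    """Return (depth, users, tweets) for one stream.

    Hypothetical record format: a list of (tweet_id, username, parent_id)
    tuples, where parent_id is None for the seed Tweet.
    """
    parents = {tweet_id: parent_id for tweet_id, _user, parent_id in stream}

    def level(tweet_id):
        # The seed Tweet is level 1; each ReTweet sits one level below its parent.
        n = 1
        while parents[tweet_id] is not None:
            tweet_id = parents[tweet_id]
            n += 1
        return n

    depth = max(level(tweet_id) for tweet_id in parents)
    users = len({user for _tid, user, _pid in stream})  # distinct participants
    return depth, users, len(stream)


def average_metrics(streams):
    """Average depth, users and Tweets across many streams."""
    metrics = [stream_metrics(s) for s in streams]
    return {
        "depth": mean(m[0] for m in metrics),
        "users": mean(m[1] for m in metrics),
        "tweets": mean(m[2] for m in metrics),
    }
```

Counting distinct users separately from Tweets matters because the same user can appear more than once in a stream, which is one way the average Tweet count can exceed the average user count.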

The next data point I’m looking at is the ReTweets per Follower (RTpF) ratio for the users involved in the streams I’ve indexed (just under 20,000 users). The graph below shows the distribution of RTpF among the 9,000 most-followed users in my database. I’ve graphed the actual distribution line in blue, with a 30-point moving average over it in black.
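A minimal sketch of the two calculations behind such a graph follows. It assumes RTpF means the number of times a user was ReTweeted divided by their follower count, expressed as a percentage, and that users are plotted in descending order of followers; both are assumptions, as are the record format and the names rtpf, rtpf_series and moving_average.

```python
def rtpf(retweet_count, follower_count):
    """ReTweets per Follower as a percentage (assumed definition)."""
    if follower_count == 0:
        return 0.0
    return 100.0 * retweet_count / follower_count


def rtpf_series(users, top_n=9000):
    """RTpF for the top_n most-followed users, most-followed first.

    Hypothetical record format: (username, follower_count, retweet_count).
    """
    ranked = sorted(users, key=lambda u: u[1], reverse=True)[:top_n]
    return [rtpf(rts, followers) for _name, followers, rts in ranked]


def moving_average(values, window=30):
    """Trailing simple moving average; yields len(values) - window + 1 points."""
    if window <= 0 or window > len(values):
        return []
    running = sum(values[:window])
    out = [running / window]
    for i in range(window, len(values)):
        # Slide the window: add the newest value, drop the oldest one.
        running += values[i] - values[i - window]
        out.append(running / window)
    return out
```

A trailing window is only one choice; a centered window would shift the smoothed line by half the window width but leave its shape unchanged.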

Here we see that while most users in my dataset had an RTpF under 1%, some showed much larger ratios, possibly indicating that there is a class of users who are more “ReTweetable” than others. In the future, as I index more data, I plan to release a list of the users with the highest RTpF ratios.

To explore the relationship between followers and ReTweets a bit more, I then analyzed correlations between follower counts and both stream depth and total Tweets. I used two follower numbers. The first is the number of followers of the “root” level of the stream, that is, how many people were potentially exposed to the “seed” Tweet. The second is the combined follower total of every user who participated in the stream.

While we must be careful not to assume a causal relationship between these numbers, it does become clear that there is no significant correlation between either follower number and the depth of a stream. On the other hand, there is a significant, though weak, positive correlation between the number of users exposed to a Tweet and the number of times it was ReTweeted. What this (and the distribution graph above) tells me is that while users who have more followers get ReTweeted more often, the number of followers plays a smaller-than-expected role in predicting how widely something is ReTweeted. I expect to find that a Tweet’s actual content explains more of its “ReTweetability”.


Correlation coefficients
Seed followers vs. total Tweets: .226
Seed followers vs. depth: .029
Total followers vs. total Tweets: .383
Total followers vs. depth: .132
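The post does not say which correlation coefficient was used. The sketch below assumes Pearson’s r, computed from scratch, and a hypothetical per-stream record with the keys seed_followers, total_followers, tweets and depth. It produces only the coefficients; deciding whether a value is statistically significant would need a separate test.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


def follower_correlations(streams):
    """The four follower correlations, from hypothetical per-stream dicts."""
    def column(key):
        return [s[key] for s in streams]

    return {
        "seed followers vs. total Tweets": pearson(column("seed_followers"), column("tweets")),
        "seed followers vs. depth": pearson(column("seed_followers"), column("depth")),
        "total followers vs. total Tweets": pearson(column("total_followers"), column("tweets")),
        "total followers vs. depth": pearson(column("total_followers"), column("depth")),
    }
```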

The last data point I looked at in this stage of research is the average reproduction rate, that is, how often ReTweets in turn trigger further ReTweeting. This is loosely comparable to the basic reproduction number (R0) in epidemiology, which represents the average number of additional infections that a single infection results in. Here it is measured as the share of streams at a given depth that go on to gain another level.

Of the streams with 2 or more levels, only 7.57% eventually gain an additional level; yet of the streams with 3 or more levels, nearly 11.5% grow another level. This trend continues out to the 5th level (I did index some streams with more than 5 levels, but not enough to generate any significant data). The more levels a ReTweet stream has, the more likely it is to gain further levels.

What this may indicate is that social proof (or, more specifically, imitation) plays a role in a user’s decision to ReTweet. The more users a Twitterer sees ReTweet something, the more likely they are to ReTweet it themselves. This data point, along with the previously noted higher occurrence rate of the “please” call-to-action, may also highlight another factor in the decision to ReTweet: when the act of ReTweeting is called to a user’s attention, they may be more likely to ReTweet.

Reproduction Rates by Depth (share of streams at each depth that gain another level)
Depth 2: 7.57%
Depth 3: 11.47%
Depth 4: 22.31%
Depth 5: 48.44%
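Measured this way, each rate is a conditional share. For depth d, it is the number of streams reaching at least d + 1 levels divided by the number reaching at least d. It is not the mean number of secondary cases that R0 counts in epidemiology. A minimal sketch, assuming only a list of each stream’s maximum depth; the function name is hypothetical:

```python
def reproduction_rates(depths, max_depth=5):
    """For each depth d from 2 to max_depth, the share of streams that reach
    at least d levels and go on to reach at least d + 1 levels."""
    rates = {}
    for d in range(2, max_depth + 1):
        reached = sum(1 for x in depths if x >= d)
        grew = sum(1 for x in depths if x >= d + 1)
        if reached:  # skip depths no stream reached, to avoid dividing by zero
            rates[d] = grew / reached
    return rates
```

For example, eight 2-level streams, one 3-level stream and one 4-level stream give {2: 0.2, 3: 0.5, 4: 0.0}. Formatting a value with f"{rate:.2%}" prints it in the percentage style used in the list above.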

Look for more posts on this subject, as I’ll be adding more functionality to my ReTweet tools and will also start investigating content-based correlations, that is, which features of a Tweet’s content make it more or less likely to be ReTweeted.


A Very Beta ReTweet Mapper

I’ve been working on a ReTweet mapping system for a while; in fact, I’ve already published some data I accumulated while building it. The idea is to index all ReTweets and map them to each other so that the structure of viral messaging on Twitter can be displayed visually and analyzed programmatically.
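The post does not describe how ReTweets are matched to each other. One simple approach is sketched below, assuming ReTweets follow the conventional "RT @username: text" prefix. Stripping the first prefix recovers the ReTweeted user and the text they posted, which can then be looked up among previously indexed Tweets. The record format and the names find_parent and map_retweets are hypothetical.

```python
import re

# "RT @user: text" (colon optional); captures the ReTweeted user and their text.
RT_PREFIX = re.compile(r"^RT @(\w+):?\s+(.*)$", re.IGNORECASE | re.DOTALL)


def find_parent(text, index):
    """Return the id of the Tweet being ReTweeted, or None.

    index maps (lowercased username, exact text) -> tweet_id for Tweets
    that have already been indexed.
    """
    m = RT_PREFIX.match(text)
    if not m:
        return None  # an original Tweet, not a ReTweet
    return index.get((m.group(1).lower(), m.group(2)))


def map_retweets(tweets):
    """Link each Tweet to its parent.

    tweets: chronological list of (tweet_id, username, text) tuples.
    Returns {tweet_id: parent_id}; seed Tweets and unmatched ReTweets map to None.
    """
    index, parents = {}, {}
    for tweet_id, user, text in tweets:
        parents[tweet_id] = find_parent(text, index)
        index[(user.lower(), text)] = tweet_id
    return parents
```

Exact text matching is a simplification: real ReTweets are often truncated or edited to fit, so a production mapper would likely need fuzzier matching.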

Now I can finally publish a rough beta version of the mapping system. Please keep in mind that it’s all still very beta and rough around the edges.


The ReTweet mapper is the core of the system. It indexes ReTweet streams into hierarchical structures that can be displayed visually, as they are here, and it also allows for further analysis, as seen below.

The search feature allows you to search for ReTweet streams that match keywords, phrases, usernames or links. The search results page shows the original Tweet that started the stream that matched your query. Clicking on the Tweet brings you to the ReTweet map of that stream.
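A search like this can be sketched as a scan over indexed streams that returns each matching stream’s seed Tweet. The sketch below assumes a hypothetical format in which each stream is a list of (tweet_id, username, text) tuples whose first entry is the seed. It uses a case-insensitive substring match for keywords, phrases and links and an exact match for usernames; a real system would more likely use a full-text index.

```python
def search_streams(streams, query):
    """Return the seed Tweet of each stream in which any Tweet matches the query."""
    q = query.strip().lower()
    if not q:
        return []
    handle = q.lstrip("@")  # allow "@user" or "user" for username searches
    results = []
    for stream in streams:
        for _tweet_id, user, text in stream:
            # Keywords, phrases and links match inside the text; usernames match exactly.
            if q in text.lower() or handle == user.lower():
                results.append(stream[0])  # report the Tweet that started the stream
                break
    return results
```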

The Most ReTweeted page shows a leaderboard of the users who were ReTweeted most often in the last hour, day or week. Clicking on a user’s name leads to a search for that user’s name.
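A leaderboard like this amounts to counting ReTweets per original author within a time window. A minimal sketch, assuming each indexed ReTweet is available as a hypothetical (timestamp, retweeted_username) pair; the window names mirror the page’s hour, day and week options:

```python
from collections import Counter
from datetime import datetime, timedelta

WINDOWS = {"hour": timedelta(hours=1), "day": timedelta(days=1), "week": timedelta(weeks=1)}


def most_retweeted(retweets, window="day", now=None, top=10):
    """Top users by number of times ReTweeted within the given window.

    retweets: iterable of (timestamp, retweeted_username) pairs (hypothetical format).
    """
    if now is None:
        now = datetime.now()
    cutoff = now - WINDOWS[window]
    counts = Counter(user for ts, user in retweets if cutoff <= ts <= now)
    return counts.most_common(top)
```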


New Design

It’s been a while since this site got a full-scale, drastic redesign. I think it was about time, and I’m really happy with this new look. If you’re reading this in a feedreader, you should click through and check out the site.

I’d also love to hear what you guys think.


What’s in a Retweet? The Data Behind Viral Messaging on Twitter

I started collecting ReTweets a few weeks ago and now have just over 84,000. I’m working on a system that will allow for mapping and analysis of ReTweet streams (sneak peek below), but in building it, I’ve already uncovered some interesting data.

Contrary to what I initially thought, “RT” is used more than four times as often as the full word “retweet”.
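A comparison like this can be made by counting how many collected Tweets contain each marker. The sketch assumes "RT" counts only as a standalone uppercase token and "retweet" only as a whole word in any case, so "ART" and "retweeted" do not match; the original counting rules are not stated, and the function name is hypothetical.

```python
import re

RT_TOKEN = re.compile(r"\bRT\b")                          # the abbreviation, uppercase, standalone
RETWEET_WORD = re.compile(r"\bretweet\b", re.IGNORECASE)  # the full word only


def marker_counts(tweets):
    """Return (Tweets containing "RT", Tweets containing "retweet")."""
    rt = sum(1 for t in tweets if RT_TOKEN.search(t))
    word = sum(1 for t in tweets if RETWEET_WORD.search(t))
    return rt, word
```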

ReTweets occur at an average rate of around 258 per hour, and show a distinct increase during the work day and early evening.
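Both figures in that sentence can be derived from ReTweet timestamps: an average rate over the collection period and a count per hour of day. A minimal sketch, assuming ISO 8601 timestamp strings that are already in a single time zone (the hour-of-day pattern depends on that choice). The function name is hypothetical.

```python
from collections import Counter
from datetime import datetime


def hourly_stats(iso_timestamps):
    """Return (average ReTweets per hour, Counter of ReTweets by hour of day 0-23)."""
    times = [datetime.fromisoformat(s) for s in iso_timestamps]
    if not times:
        return 0.0, Counter()
    span_hours = (max(times) - min(times)).total_seconds() / 3600
    span_hours = max(span_hours, 1.0)  # avoid dividing by ~0 for very short samples
    return len(times) / span_hours, Counter(t.hour for t in times)
```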

ReTweets contain the word “please” more than five times as often as Tweets in general.

ReTweets are generally longer than other Tweets.
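Both of these comparisons rest on a sample of ReTweets and a sample of other Tweets. A minimal sketch, assuming "please" is counted once per Tweet as a whole word in any case, and length is measured in characters. The function names are hypothetical.

```python
import re

PLEASE = re.compile(r"\bplease\b", re.IGNORECASE)


def please_rate(tweets):
    """Fraction of Tweets containing the word "please"."""
    if not tweets:
        return 0.0
    return sum(1 for t in tweets if PLEASE.search(t)) / len(tweets)


def average_length(tweets):
    """Mean Tweet length in characters."""
    return sum(len(t) for t in tweets) / len(tweets) if tweets else 0.0


def compare_retweets(retweets, other_tweets):
    """How much more often ReTweets say "please", and average lengths of both groups."""
    other_rate = please_rate(other_tweets)
    return {
        "please_ratio": please_rate(retweets) / other_rate if other_rate else float("inf"),
        "avg_len_retweets": average_length(retweets),
        "avg_len_other": average_length(other_tweets),
    }
```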

Almost 70% of ReTweets contain a link.

TinyURL is overwhelmingly the preferred URL shortener in ReTweets.
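The link share and the shortener preference can both come from one pass over the ReTweets. This sketch assumes links appear as http:// or https:// URLs in the text. It tallies every link domain rather than a fixed list of shorteners, so services like TinyURL can be compared directly. The function name is hypothetical.

```python
import re
from collections import Counter
from urllib.parse import urlparse

URL = re.compile(r"https?://\S+", re.IGNORECASE)


def link_stats(tweets):
    """Return (share of Tweets with at least one link, Counter of link domains)."""
    with_link = 0
    domains = Counter()
    for text in tweets:
        urls = URL.findall(text)
        if urls:
            with_link += 1
        for url in urls:
            host = urlparse(url).netloc.lower()
            if host.startswith("www."):
                host = host[4:]  # treat www.example.com and example.com alike
            domains[host] += 1
    share = with_link / len(tweets) if tweets else 0.0
    return share, domains
```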

Let me know what other data points you’d like to see and I’ll see what I can do.

And here’s a very simple, very rough preview of the mapping tool:
