How ReTweets Spread: The Epidemiology of Viral Messaging on Twitter

Now that my ReTweet mapping system is functioning, I can start compiling more granular data on how ReTweet streams actually spread.

First, I’ll start with some simple averages. For the first three numbers (depth, users and Tweets), I’m looking at entire ReTweet streams: the whole tree, starting with the original Tweet and including all of the subsequent ReTweets. What we see here is that the large majority of ReTweet streams contain only 2 levels of depth; that is, ReTweets of the initial Tweet do not themselves produce further ReTweets. They also tend to include only two participating users (the original Tweeter and the ReTweeter) and two individual Tweets. From this, we begin to understand that most RT streams are merely one user ReTweeting another, and never go any further.


Metric    Average Value
Depth     2.09
Users     2.41
Tweets    2.44
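
As a rough illustration (not the actual code behind the mapping system), the three per-stream numbers could be computed along these lines. The input layout, a list of hypothetical (tweet_id, user, parent_id) tuples in which the original Tweet has no parent, is an assumption made for this sketch.

```python
from collections import defaultdict


def stream_metrics(tweets):
    """Return depth, user count and Tweet count for one ReTweet stream.

    `tweets` is an assumed layout: (tweet_id, user, parent_id) tuples,
    where parent_id is None for the original Tweet.
    """
    children = defaultdict(list)
    root = None
    for tweet_id, user, parent_id in tweets:
        if parent_id is None:
            root = tweet_id
        else:
            children[parent_id].append(tweet_id)

    # Walk the tree one level at a time; the original Tweet is level 1.
    depth = 0
    level = [root] if root is not None else []
    while level:
        depth += 1
        level = [child for t in level for child in children.get(t, [])]

    users = len({user for _, user, _ in tweets})
    return {"depth": depth, "users": users, "tweets": len(tweets)}


# The most common case described above: one user ReTweeting another.
simple_stream = [(1, "alice", None), (2, "bob", 1)]
print(stream_metrics(simple_stream))
```

Averaging each of these values over every indexed stream would give figures of the kind shown in the table.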

The next data point I’m looking at is the ReTweets per Follower (RTpF) ratio for the users involved in the streams I’ve indexed (just under 20,000 users). The graph below shows the distribution of RTpF among the 9000 most-followed users in my database. I’ve graphed the actual distribution line in blue, with a 30-point moving average over it in black.

Here we see that while most users in my dataset had an RTpF of under 1%, some users showed much larger ratios, possibly indicating that there is a class of users who are more “ReTweetable” than others. In the future, as I have more data indexed, I plan to release a list of the users with the highest RTpF ratios.
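
For readers who want to reproduce this kind of curve, here is a minimal sketch. It assumes RTpF is simply the number of ReTweets a user received divided by that user’s follower count, and it uses a trailing moving average; whether the black line in the graph is trailing or centered is not stated, so treat both choices as assumptions.

```python
def rtpf(retweets, followers):
    """ReTweets per Follower, assumed to be a plain ratio (0.01 == 1%)."""
    return retweets / followers if followers else 0.0


def moving_average(values, window=30):
    """Trailing moving average; emits one value per full window."""
    smoothed = []
    running = 0.0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            # Drop the value that just slid out of the window.
            running -= values[i - window]
        if i >= window - 1:
            smoothed.append(running / window)
    return smoothed


# Hypothetical (retweets, followers) pairs, ordered by follower count.
users = [(12, 4000), (3, 2500), (40, 1000), (1, 900)]
ratios = [rtpf(r, f) for r, f in users]
print(moving_average(ratios, window=2))
```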

To explore the relationship between followers and ReTweets a bit more, I then analyzed correlations between follower counts and both stream depth and total Tweets. I used two follower numbers. The first is the number of followers the “root” level of the stream had, that is, how many people were potentially exposed to the “seed” Tweet. The second is the combined total of followers of every user who participated in each stream.

While we must remain cautious not to assume a causal relationship between these numbers, it does become clear that there is no significant correlation between either follower number and the depth of a stream. There is, on the other hand, a significant, though weak, positive correlation between the number of users exposed to a Tweet and the number of times it was ReTweeted. What this (and the distribution graph above) tells me is that while users who have more followers get ReTweeted more often, the number of followers plays a smaller role than expected in predicting how widely something is ReTweeted. I expect to find that the actual content of Tweets explains more of their “ReTweetability”.


Values                             Correlation
Seed Followers to Total Tweets     .226
Seed Followers to Depth            .029
Total Followers to Total Tweets    .383
Total Followers to Depth           .132
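
The post does not say which correlation coefficient was used. Assuming it is the ordinary Pearson coefficient, a value like those in the table could be computed as in the sketch below, where the two input lists (one follower count and one Tweet count per stream) are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(var_x * var_y)


# Hypothetical per-stream data: seed followers and total Tweets.
seed_followers = [120, 5400, 800, 15000, 300]
total_tweets = [2, 3, 2, 4, 2]
print(round(pearson(seed_followers, total_tweets), 3))
```

A coefficient near 0 (as with depth) means follower count says almost nothing about the outcome, while values such as .226 or .383 indicate a weak positive relationship.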

The last data point I looked at in this stage of research is the average reproduction rate, that is, how often ReTweets in turn triggered further ReTweeting. This is comparable to the basic reproduction number (R0) concept in epidemiology, which represents the average number of additional infections a single case of infection results in.

Of those streams with 2 or more levels, only 7.57% eventually gain an additional level; yet of those streams with 3 or more levels, nearly 11.5% grow another level. This trend continues out to the 5th level (I did index some streams with more than 5 levels, but not enough to generate any significant data). The more levels a ReTweet stream has, the more likely it is to gain another.

What this may indicate is that social proof (or, more specifically, imitation) plays a role in a user’s decision to ReTweet. The more users a Twitterer sees ReTweet something, the more likely they are to ReTweet it themselves. Another factor in the decision to ReTweet that this data point (as well as the previously noted higher occurrence rate of the “please” call-to-action) may be highlighting is that when the act of ReTweeting is called to a user’s attention, they may be more likely to ReTweet.

Reproduction Rates by Depth

Depth    Reproduction Rate
2        7.57%
3        11.47%
4        22.31%
5        48.44%
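
Read this way, each rate is a conditional proportion: of the streams that reached at least a given depth, the share that went at least one level deeper. A minimal sketch of that calculation, assuming a hypothetical list holding the final depth of each indexed stream, might look like this:

```python
def reproduction_rates(depths, max_depth=5):
    """Share of streams with at least d levels that reach d + 1 or more."""
    rates = {}
    for d in range(2, max_depth + 1):
        at_least_d = sum(1 for x in depths if x >= d)
        deeper = sum(1 for x in depths if x >= d + 1)
        if at_least_d:  # skip depths no stream ever reached
            rates[d] = deeper / at_least_d
    return rates


# Hypothetical final depths for a handful of streams.
stream_depths = [2, 2, 2, 3, 3, 4]
for depth, rate in reproduction_rates(stream_depths).items():
    print(depth, f"{rate:.2%}")
```

With very few streams at the deeper levels, the later rates rest on small samples, which is the same caveat noted above for streams beyond 5 levels.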

Look for more posts on this subject, as I’ll be building more functionality into my ReTweet tools. I’ll also start investigating content-based correlations, that is, which features of a Tweet’s content make it more or less likely to be ReTweeted.
