How ReTweets Spread: The Epidemiology of Viral Messaging on Twitter





Now that my ReTweet mapping system is functioning, I’m able to start compiling more granular data on the actual dynamics of the spread of ReTweet streams.

First, I’ll start with some simple averages. For the first three numbers (depth, users and Tweets), I’m looking at entire ReTweet streams, that is, the whole tree: the original Tweet plus all of the subsequent ReTweets. What we see here is that the large majority of ReTweet streams contain only 2 levels of depth; that is, ReTweets of the initial Tweet do not themselves produce further ReTweets. They also tend to include only two participating users (the original Tweeter and the ReTweeter) and two individual Tweets. From this, we begin to understand that most RT streams are merely one user ReTweeting another, and never go any further.

Averages

Metric    Average
Depth     2.09
Users     2.41
Tweets    2.44
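
As a rough illustration of how these three numbers could be measured for a single stream, here is a minimal Python sketch. The `Tweet` class and `stream_stats` function are hypothetical names made up for this example, not part of the actual mapping system, and the sketch assumes the original Tweet counts as level 1, so a stream with a single ReTweet has a depth of 2.

```python
from dataclasses import dataclass, field


@dataclass
class Tweet:
    """One node in a hypothetical ReTweet tree (illustrative only)."""
    user: str
    retweets: list["Tweet"] = field(default_factory=list)


def stream_stats(root: Tweet) -> dict[str, int]:
    """Return depth, distinct users, and Tweet count for a whole stream."""
    depth, users, tweets = 0, set(), 0
    stack = [(root, 1)]  # assumption: the original Tweet is level 1
    while stack:
        node, level = stack.pop()
        depth = max(depth, level)
        users.add(node.user)
        tweets += 1
        stack.extend((child, level + 1) for child in node.retweets)
    return {"depth": depth, "users": len(users), "tweets": tweets}


# The typical stream: one user ReTweeting another, and nothing further.
typical = Tweet("alice", [Tweet("bob")])
print(stream_stats(typical))  # {'depth': 2, 'users': 2, 'tweets': 2}
```

Averaging these per-stream values over every indexed stream would give figures of the kind shown in the table above.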




The next data point I’m looking at is the ReTweets per Follower (RTpF) ratio for the users involved in the streams I’ve indexed (just under 20,000 users). The graph below shows the distribution of RTpF among the 9,000 most-followed users in my database. I’ve graphed the actual distribution line in blue, with a 30-point moving average over it in black.

Here we see that while most users in my dataset had an RTpF of under 1%, some users showed much larger ratios, possibly indicating that there is a class of users who are more “ReTweetable” than others. In the future, as I have more data indexed, I plan to release a list of those users with the highest RTpF ratios.
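
For readers who want to try something similar, here is a minimal Python sketch of the ratio and the smoothing step. It makes two assumptions that are not spelled out above: that RTpF is simply a user's ReTweet count divided by their follower count, and that the moving average is a simple trailing average. The function names and the sample numbers are made up for illustration.

```python
def rtpf(retweets: int, followers: int) -> float:
    """ReTweets per Follower; 0.0 when a user has no followers."""
    return retweets / followers if followers else 0.0


def moving_average(values: list[float], window: int = 30) -> list[float]:
    """Simple trailing moving average; yields len(values) - window + 1 points."""
    if window <= 0 or window > len(values):
        return []
    total = sum(values[:window])
    out = [total / window]
    for i in range(window, len(values)):
        # Slide the window: add the newest value, drop the oldest.
        total += values[i] - values[i - window]
        out.append(total / window)
    return out


# Made-up (retweets, followers) pairs, sorted by follower count.
sample = [(40, 10000), (3, 5000), (12, 1200), (1, 800)]
ratios = [rtpf(r, f) for r, f in sample]
smoothed = moving_average(ratios, window=2)  # tiny window for the tiny sample
```

With real data, `window=30` would correspond to the 30-point average described above.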

To explore the followers-to-ReTweets issue a bit more, I then analyzed correlations between followers and both stream depth and total Tweets. I used two follower numbers. The first is the number of followers the “root” level of the stream had, that is, how many people were potentially exposed to the “seed” Tweet. The second is the combined total of followers of every user who participated in each stream.

While we must remain cautious not to assume a causal relationship between these numbers, it does become clear that there is no significant correlation between either follower number and the depth of a stream. There is, on the other hand, a significant, though weak, positive correlation between the number of users exposed to a Tweet and the number of times it was ReTweeted. What this (and the distribution graph above) tells me is that while users who have more followers get ReTweeted more often, the number of followers plays a less-than-expected role in predicting how widely something is ReTweeted. I expect to find that the actual content of a Tweet explains more of its “ReTweetability”.

Correlations

Values                             Correlation
Seed Followers to Total Tweets     .226
Seed Followers to Depth            .029
Total Followers to Total Tweets    .383
Total Followers to Depth           .132
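
The kind of coefficient used for the table above is not stated, so the following Python sketch assumes a plain Pearson correlation. The `pearson` function name and the sample numbers are invented for illustration and are not drawn from the real dataset.

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length, at least 2 points")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Made-up example: seed followers and total Tweets for five streams.
seed_followers = [120, 4500, 800, 15000, 60]
total_tweets = [2, 3, 2, 5, 2]
r = pearson(seed_followers, total_tweets)
```

A value near 0 means follower counts say little about the outcome, while values near 1 or -1 indicate a strong linear relationship.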




The last data point I looked at in this stage of research is the average reproduction rate, that is, how often ReTweets in turn triggered further ReTweeting. This is comparable to the biological basic reproduction number (R0), which represents the average number of additional infections a single case of infection results in.

Of those streams with 2 or more levels, only 7.57% eventually gain an additional level. Yet of those streams with 3 or more levels, nearly 11.5% grow another level. This trend continues out to the 5th level (I did index some streams with more than 5 levels, but not enough to generate any significant data). The more levels a ReTweet stream has, the more likely it is to accumulate another.

What this may indicate is that social proof (or imitation more specifically) plays a role in a user’s decision to ReTweet. The more users a Twitterer sees ReTweet something, the more likely they are themselves to ReTweet it. Another factor in the decision to ReTweet that this data point (as well as the previously noted higher occurrence rate of the “please” call-to-action) may be highlighting is that when the act of ReTweeting is called to a user’s attention, they may be more likely to ReTweet.

Reproduction Rates by Depth

Depth    Reproduction Rate
2        7.57%
3        11.47%
4        22.31%
5        48.44%
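
One way to compute rates like these from a list of stream depths is sketched below in Python. It assumes the rate at depth d is the share of streams reaching at least d levels that go on to reach at least d + 1. The `reproduction_rates` name and the sample depths are made up for illustration.

```python
def reproduction_rates(depths: list[int], max_level: int = 5) -> dict[int, float]:
    """Share of streams with >= d levels that also reach d + 1 levels."""
    rates = {}
    for level in range(2, max_level + 1):
        reached = sum(1 for d in depths if d >= level)
        grew = sum(1 for d in depths if d >= level + 1)
        if reached:  # skip levels that no stream reached
            rates[level] = grew / reached
    return rates


# Made-up sample: eight 2-level streams, one 3-level, one 4-level.
depths = [2] * 8 + [3, 4]
print(reproduction_rates(depths))  # {2: 0.2, 3: 0.5, 4: 0.0}
```

Note that each rate is conditional on a shrinking pool of streams, so the deeper levels rest on far fewer observations than the shallow ones.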


Look for more posts on this subject, as I’ll be building more functionality into my ReTweet tools. I’ll also start investigating content-based correlations, that is, which factors in the content of a Tweet make it more or less likely to be ReTweeted.


{ 19 comments }

Ezra Butler December 29, 2008 at 9:28 am

Very interesting. How does the algorithm deal with the erasing of previous people’s names due to the space constraint?
I especially like your proof of the social proof.

Jessica Routier, IACEZ December 29, 2008 at 1:28 pm

Interesting! Just more proof that tweeting is worth the time! (Who cares that it’s fun?!?! :) )

Troy Peterson December 29, 2008 at 1:31 pm

LOVE this line of thinking. If you need help with content analysis, I’d be willing to participate. There is going to be a very real value in having data to determine what TYPES of tweets are most successfully retweeted. I’d also like to see how requests for R/T’s fail.

Very interesting, very cool!

Troy

Chris Lockwood December 29, 2008 at 1:45 pm

I don’t think most people realize that a retweet is not seen by many new people, mostly just those who are already following the person you are quoting.

In other words if @bob says this: “RT @joe something cool” most of bob’s followers will NOT see that, just those who are already following both bob and joe. This is the Twitter default that each user has to change in their account if they want to see more of these.

Dan Zarrella December 29, 2008 at 1:49 pm

@chris that seems to be a common misconception today. Only tweets that begin with @ are hidden, not tweets that begin with rt or retweet and then the @.

Peopleshark December 29, 2008 at 1:56 pm

Very curious about job tweets…wondering if job opportunities are shared more widely among networks than other kinds of tweets. Your analysis is very helpful. Thank you!

Nathan Ketsdever December 29, 2008 at 5:59 pm

Chris’ comment may rest on flawed assumptions. The repeat in the same friend group (social graph, tribe, whatever) may actually boost the value, because those receiving the tweet know who Dan Z is or whomever the original tweeter is.

Jeremy Mandle December 29, 2008 at 6:35 pm

Great work Dan. I’m wondering how much your queries take into account the different RT, ReTweet formats used?

For example:
tweet (via @twittername)
I tend to use this most often as it is a bit more “readable”, and is the default retweet format in Tweetie.

Applications like Twhirl allow you to completely customize your retweet format.

It has been awhile since I’ve studied the XML anatomy of a tweet BTW.

Sean Carmody December 29, 2008 at 6:57 pm

@Dan: I wasn’t aware that any @ messages were hidden (I see plenty in my stream). When/where are they hidden?

P.S. Love the RT analysis.

Jim Tobin at Ignite Social Media December 31, 2008 at 3:47 pm

Good stuff Dan. Love seeing some sociology and some hard data applied to this stuff. Keep it coming.

Ironically, I found this by Tweet and I’m going to retweet it.

~Jim

Erica DeWolf December 31, 2008 at 10:46 pm

Thanks for this research. It will be very effective in proving to people the power of twitter

Brian Carter January 1, 2009 at 5:17 pm

Two thoughts about your retweet mapper findings:

1. What makes retweets more likely to be retweeted? (beyond that first retweet)

2. Does it make sense to separate retweets that contain links vs those that don’t? I suspect sharing a link makes retweeting more likely BUT retweets of just ideas without links are also valuable in a different way- what makes them retweet-worthy?

André T January 2, 2009 at 8:16 am

Super!

People must understand how powerful Twitter is.

Thanks for this info!

Regards,
André T

Jesse Luna February 9, 2009 at 11:11 am

Another factor is how some Retweets are solicited. Many times people use DM’s to request Retweets, either directly or indirectly. This would be tough to factor into the retweet equation. I’ve received some very aggressive requests and they often end up being retweeted by others quite a bit, even if half-heartedly.

@toddlucier February 9, 2009 at 11:12 am

Interesting research Dan.
I see a problem emerging with Tweetbacks that I think you need to be aware of in your research.

By definition, Tweetbacks that are merely RT’s of previously posted content or links – as featured in the comment section of a blog are clearly Spam.

http://www.tourismkeys.ca/blog/2009/02/is-twitter-spamming-your-blog/

Tweetbacks inherently give those who retweet posts significant profile on blogs that feature tweetbacks.

This benefit of visibility for simply pressing a button and associating oneself with a successful blog is the sort of viral positive feedback loop we can do without.

For the blogger, the RT’s get their blog more exposure, for the Retweeter the RT gets them exposure for their twitter brand. . . in the process blogs like yours get filled with useless links that do not contribute to the discussion and in fact are nothing more than spam.

It’s a lazy person’s way of a Twit pulling on the bloggers coattails.

What makes RT comments on a blog any different from Spam comments like – “Hey, nice post.”

I look forward to your response. The use of tweetbacks clearly has consequences for your statistical analysis and use of such tools clearly needs sober second thought.

After all, what’s to slow someone from following peeps whose blogs feature tweetbacks and constantly RTing their content to get followers? Seems to me the reward of featuring a tweet on a blog post for a tweet isn’t desirable.

Taa March 17, 2009 at 10:10 pm

wow – you’re one really smart guy! You truly are “a social & viral marketing scientist” and I look forward to following you on twitter shortly!

http://twitter.com/720media

Eric Hellman July 16, 2009 at 7:23 pm

It seems to me there are other explanations for the increasing RT reproduction rate with RT depth.

1. Filtering- only the most retweetable messages propagate deeply.

2. Network Topology- at higher RT depth, the message accesses a more globally connected social network.

Health_Campus August 30, 2009 at 12:46 am

This is quite impressive, I am pleased to read this post, keep posts like this coming, you totally rock!
Cheers
sain-web.com

Social Media Agencies March 26, 2010 at 1:06 pm

Nice Post!
You Just Solve how important ReTweets are?
and
well that article just shows that ReTweets does not lose their importance they are as important as the tweets.
