How ReTweets Spread: The Epidemiology of Viral Messaging on Twitter

Posted on Dec 29th, 2008

Now that my ReTweet mapping system is functioning, I’m able to start compiling more granular data on the actual dynamics of the spread of ReTweet streams.

First, I’ll start with some simple averages. For the first three numbers (depth, users and Tweets), I’m looking at entire ReTweet streams, that is, the whole tree, starting with the original Tweet and including all of the subsequent ReTweets. What we see here is that the large majority of ReTweet streams contain only 2 levels of depth; that is, ReTweets of the initial Tweet do not themselves produce further ReTweets. They also tend to include only two participating users (the original Tweeter and the ReTweeter) and two individual Tweets. From this, we begin to understand that most RT streams are merely one user ReTweeting another, and never go any further.

Averages

Average   Value
Depth     2.09
Users     2.41
Tweets    2.44
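
As a rough illustration of how these three numbers can be computed for a single stream, here is a minimal Python sketch. It is not the mapping system's actual code: the function name `stream_metrics` and the `(tweet_id, user, parent_id)` record layout are assumptions made for the example, and the stream is assumed to be a proper tree with exactly one original Tweet.

```python
from collections import defaultdict


def stream_metrics(tweets):
    """Summarize one ReTweet stream.

    `tweets` is a list of (tweet_id, user, parent_id) tuples, where
    parent_id is None for the original Tweet. This record layout is a
    hypothetical one chosen for illustration.
    """
    children = defaultdict(list)
    root = None
    for tweet_id, _user, parent_id in tweets:
        if parent_id is None:
            root = tweet_id
        else:
            children[parent_id].append(tweet_id)
    if root is None:
        raise ValueError("stream has no original Tweet")

    # Walk the tree level by level; the original Tweet alone is depth 1.
    depth = 0
    level = [root]
    while level:
        depth += 1
        level = [child for node in level for child in children.get(node, [])]

    return {
        "depth": depth,
        "users": len({user for _id, user, _parent in tweets}),
        "tweets": len(tweets),
    }


# A typical stream: one Tweet, ReTweeted once.
simple = [(1, "alice", None), (2, "bob", 1)]
# A deeper stream: bob's ReTweet is itself ReTweeted by carol.
deeper = simple + [(3, "carol", 2), (4, "dave", 1)]

print(stream_metrics(simple))
print(stream_metrics(deeper))
```

Averaging each of the three values over every indexed stream would then give figures of the kind shown in the table.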




The next data point I’m looking at is the ReTweets per Follower (RTpF) ratio for the users involved in streams I’ve indexed (just under 20,000 users). The graph below shows the distribution of RTpF among the 9,000 most-followed users in my database. I’ve graphed the actual distribution line in blue, with a 30-point moving average over it in black.

Here we see that while most users had an RTpF of under 1% in my dataset, some users showed much larger ratios, possibly indicating that there is a class of users who are more “ReTweetable” than others. In the future, as I have more data indexed, I plan to release a list of those users with the highest RTpF ratios.
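
For readers who want to reproduce this kind of graph, the sketch below shows one way to compute the ratio and the smoothing line. It assumes RTpF means the number of times a user was ReTweeted divided by that user's follower count; the post does not spell out the exact formula, so treat that definition, and the function names `rtpf` and `moving_average`, as assumptions.

```python
def rtpf(retweets, followers):
    """ReTweets per Follower, assumed here to be retweets / followers."""
    return retweets / followers if followers else 0.0


def moving_average(values, window=30):
    """Simple trailing moving average; yields len(values) - window + 1 points."""
    averages = []
    running = 0.0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            # Drop the value that just slid out of the window.
            running -= values[i - window]
        if i >= window - 1:
            averages.append(running / window)
    return averages


# Hypothetical (retweets, followers) pairs, sorted by follower count.
users = [(12, 40000), (3, 9000), (5, 1000), (1, 800), (4, 200)]
ratios = [rtpf(r, f) for r, f in users]
print(ratios)
print(moving_average(ratios, window=3))
```

A window of 30 matches the black line described above; the example uses 3 only because the sample list is short.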

To explore the relationship between followers and ReTweets a bit more, I then analyzed correlations between follower counts and both stream depth and total Tweets. I used two follower numbers. The first is the number of followers the “root” level of the stream had, that is, how many people were potentially exposed to the “seed” Tweet. The second is the combined total of followers of every user who participated in each stream.

While we must remain cautious not to assume a causal relationship between these numbers, it does become clear that there is no significant correlation between either follower number and the depth of a stream. There is, on the other hand, a significant, though weak, positive correlation between the number of users exposed to a Tweet and the number of times it was ReTweeted. What this (and the distribution graph above) tells me is that while users who have more followers get ReTweeted more often, the number of followers plays a less-than-expected role in predicting how widely something is ReTweeted. I expect to find that the actual content of a Tweet explains more of its “ReTweetability”.

Correlations

Values                             Correlation
Seed Followers to Total Tweets     .226
Seed Followers to Depth            .029
Total Followers to Total Tweets    .383
Total Followers to Depth           .132
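
The post does not say which correlation coefficient was used. Assuming it is the ordinary Pearson coefficient, a value like those in the table could be computed along these lines; the function name `pearson` and the sample numbers are purely illustrative and are not taken from the dataset.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariance / math.sqrt(var_x * var_y)


# Made-up example: seed follower counts and total Tweets for five streams.
seed_followers = [120, 950, 3000, 15000, 400]
total_tweets = [2, 2, 3, 4, 2]
print(round(pearson(seed_followers, total_tweets), 3))
```

A coefficient near 0 (such as .029 for seed followers to depth) means the two values barely move together, while 1 or -1 would mean a perfect linear relationship.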




The last data point I looked at in this stage of research is the average reproduction rate, that is, how many ReTweets in turn triggered further ReTweeting. This is comparable to the epidemiological reproduction rate (R0) concept in that it represents the average number of additional infections a single case of infection results in.

Of those streams with 2 or more levels, only 7.57% eventually gain an additional level, yet of those streams with 3 or more levels, nearly 11.5% grow another level. This trend continues out to the 5th level (I did index some streams with more than 5 levels, but not enough to generate any significant data). The more levels a ReTweet stream has, the more likely it is to accumulate further levels.

What this may indicate is that social proof (or, more specifically, imitation) plays a role in a user’s decision to ReTweet. The more users a Twitterer sees ReTweet something, the more likely they are to ReTweet it themselves. Another factor that this data point (as well as the previously noted higher occurrence rate of the “please” call-to-action) may be highlighting is that when the act of ReTweeting is called to a user’s attention, they may be more likely to ReTweet.

Reproduction Rates by Depth

Depth   Reproduction Rate
2       7.57%
3       11.47%
4       22.31%
5       48.44%
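
One way to read the table: the rate at depth d is the share of streams with at least d levels that went on to reach at least d + 1 levels. The Python sketch below implements that reading. The function name `reproduction_rates` and the sample depths are assumptions for illustration, not the code or data behind the figures above.

```python
def reproduction_rates(depths, max_level=5):
    """Share of streams at each level that grow at least one more level.

    `depths` holds the final depth of every indexed stream. Levels that no
    stream reaches are left out rather than reported as 0.
    """
    rates = {}
    for level in range(2, max_level + 1):
        reached = sum(1 for d in depths if d >= level)
        grew = sum(1 for d in depths if d >= level + 1)
        if reached:
            rates[level] = grew / reached
    return rates


# Made-up depths for six streams.
sample_depths = [2, 2, 2, 3, 3, 4]
for level, rate in reproduction_rates(sample_depths).items():
    print(f"{level}  {rate:.2%}")
```

Because each rate is conditional on the stream having already reached that level, the percentages can rise with depth even though deep streams are rare overall.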


Look for more posts on this subject, as I’ll be building more functionality into my ReTweet tools. I’ll also start investigating content-based correlations, that is, which features of a Tweet’s content make it more or less likely to be ReTweeted.

  • Very interesting. How does the algorithm deal with the erasing of previous people's names due to the space constraint?
    I especially like your proof of the social proof.
  • Interesting! Just more proof that tweeting is worth the time! (Who cares that it's fun?!?! :) )
  • LOVE this line of thinking. If you need help with content analysis, I'd be willing to participate. There is going to be a very real value in having data to determine what TYPES of tweets are most successfully retweeted. I'd also like to see how requests for R/T's fail.

    Very interesting, very cool!

    Troy
  • I don't think most people realize that a retweet is not seen by many new people, mostly just those who are already following the person you are quoting.

    In other words if @bob says this: "RT @joe something cool" most of bob's followers will NOT see that, just those who are already following both bob and joe. This is the Twitter default that each user has to change in their account if they want to see more of these.
  • @chris that seems to be a common misconception today. Only tweets that begin with @ are hidden, not tweets that begin with rt or retweet and then the @.
  • Very curious about job tweets...wondering if job opportunities are shared more widely among networks than other kinds of tweets. Your analysis is very helpful. Thank you!
  • Chris' comment may rest on flawed assumptions. The repeat in the same friend group (social graph, tribe, whatever) may actually boost the value, because those receiving the tweet know who Dan Z is or whomever the original tweeter is.
  • Great work Dan. I'm wondering how much your queries take into account the different RT, ReTweet formats used?

    For example:
    tweet (via @twittername)
    I tend to use this most often as it is a bit more "readable", and is the default retweet format in Tweetie.

    Applications like Twhirl allow you to completely customize your retweet format.

    It has been awhile since I've studied the XML anatomy of a tweet BTW.
  • @Dan: I wasn't aware that any @ messages were hidden (I see plenty in my stream). When/where are they hidden?

    P.S. Love the RT analysis.
  • Good stuff Dan. Love seeing some sociology and some hard data applied to this stuff. Keep it coming.

    Ironically, I found this by Tweet and I'm going to retweet it.

    ~Jim
  • Thanks for this research. It will be very effective in proving to people the power of twitter
  • Two thoughts about your retweet mapper findings:

    1. What makes retweets more likely to be retweeted? (beyond that first retweet)

    2. Does it make sense to separate retweets that contain links vs those that don't? I suspect sharing a link makes retweeting more likely BUT retweets of just ideas without links are also valuable in a different way- what makes them retweet-worthy?
  • Super!

    People must understand how powerful Twitter is.

    Thanks for this info!


    Regards,
    André T
  • Another factor is how some Retweets are solicited. Many times people use DM's to request Retweets, either directly or indirectly. This would be tough to factor into the retweet equation. I've received some very aggressive requests and they often end up being retweeted by others quite a bit, even if half-heartedly.
  • Interesting research Dan.
    I see a problem emerging with Tweetbacks that I think you need to be aware of in your research.

    By definition, Tweetbacks that are merely RT's of previously posted content or links - as featured in the comment section of a blog are clearly Spam.

    http://www.tourismkeys.ca/blog/2009/02/is-twitt...

    Tweetbacks inherently give those who retweet posts significant profile on blogs that feature tweetbacks.

    This benefit of visibility for simply pressing a button and associating oneself with a successful blog is the sort of viral positive feedback loop we can do without.

    For the blogger, the RT's get their blog more exposure, for the Retweeter the RT gets them exposure for their twitter brand. . . in the process blogs like yours get filled with useless links that do not contribute to the discussion and in fact are nothing more than spam.

    It's a lazy person's way of a Twit pulling on the blogger's coattails.

    What makes RT comments on a blog any different from Spam comments like - "Hey, nice post."

    I look forward to your response. The use of tweetbacks clearly has consequences for your statistical analysis and use of such tools clearly needs sober second thought.

    After all, what's to slow someone from following peeps whose blogs feature tweetbacks and constantly RTing their content to get followers? Seems to me the reward of featuring a tweet on a blog post for a tweet isn't desirable.
  • taa
    wow - you're one really smart guy! You truly are "a social & viral marketing scientist" and I look forward to following you on twitter shortly!

    http://twitter.com/720media
  • It seems to me there are other explanations for the increasing RT reproduction rate with RT depth.

    1. Filtering- only the most retweetable messages propagate deeply.

    2. Network Topology- at higher RT depth, the message accesses a more globally connected social network.