The Punctuation of ReTweets

140 characters doesn’t leave much room for extraneous letters, numbers or symbols, so you might think that punctuation would be sparse in Tweets. But when I compared a random sample of over 1 million “normal” Tweets to a sample of over 10 million ReTweets, I found that 85.86% of normal Tweets contain some form of punctuation, and an overwhelming 97.55% of ReTweets do as well.

Of course, the prevailing ReTweet format includes a colon to set off the original Tweet, but even when colons are ignored, ReTweets are still more likely to contain punctuation than non-ReTweets (93.42% versus 83.78%).
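
For readers who want to try a similar comparison on their own data, here is a minimal Python sketch of the "share of messages containing any punctuation" measure, with an option to ignore the colon. The function name, the set of marks counted, and the toy messages are all assumptions made for illustration; the post does not say which characters were counted or how the samples were processed.

```python
# Hypothetical mark set: the post does not list which characters it counted.
PUNCTUATION = set(".,:;!?-'\"()")


def share_with_punctuation(texts, ignore=""):
    """Return the percentage of texts that contain at least one counted mark.

    Characters listed in `ignore` are left out of the mark set, e.g. ":" to
    discount the colon used by the common "RT @user: ..." format.
    """
    marks = PUNCTUATION - set(ignore)
    texts = list(texts)
    if not texts:
        return 0.0
    hits = sum(1 for text in texts if any(ch in marks for ch in text))
    return 100.0 * hits / len(texts)


# Toy data for illustration only, not the samples described in the post.
tweets = ["good morning", "lunch time!", "so tired", "what a game: wow"]
retweets = ["RT @user: great read", "RT @user: so true!", "RT @user hello"]

print(share_with_punctuation(tweets), share_with_punctuation(retweets))
print(share_with_punctuation(tweets, ignore=":"),
      share_with_punctuation(retweets, ignore=":"))
```

Note that "@" is deliberately left out of the assumed mark set, so a ReTweet with no colon and no other mark counts as unpunctuated here. A different choice of marks would change the percentages.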

I then analyzed the frequency of specific types of punctuation. Hyphens, periods and colons are the most ReTweetable marks, occurring far more commonly in ReTweets than in regular Tweets. The semicolon, the rarest mark, is the only unReTweetable punctuation mark.
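
One way to put a number on how "ReTweetable" a mark is would be to divide the share of ReTweets containing it by the share of regular Tweets containing it. The Python sketch below does that under stated assumptions: the ratio itself, the function names, and the sample messages are hypothetical, and the post does not describe the exact measure it used.

```python
def share_containing(texts, mark):
    """Return the fraction of texts (0.0 to 1.0) that contain `mark`."""
    texts = list(texts)
    if not texts:
        return 0.0
    return sum(1 for text in texts if mark in text) / len(texts)


def retweetability(tweets, retweets, marks):
    """Map each mark to (share of retweets with it) / (share of tweets with it).

    A ratio above 1 means the mark shows up more often in retweets; below 1,
    less often. The value is None when the mark never appears in `tweets`,
    because the ratio is undefined in that case.
    """
    ratios = {}
    for mark in marks:
        base = share_containing(tweets, mark)
        ratios[mark] = share_containing(retweets, mark) / base if base else None
    return ratios


# Toy data for illustration only, not the samples described in the post.
sample_tweets = ["a; b", "c. d", "e", "f"]
sample_retweets = ["RT x: y.", "RT z: w."]

print(retweetability(sample_tweets, sample_retweets, [".", ";", ":"]))
```

Under this assumed measure, a mark would count as unReTweetable when its ratio falls below 1.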