Statistics on Retweet

July 1, 2009

I’m a little late picking up on this, but danah boyd, Scott Golder and Gilad Lotan of Microsoft Research recently posted a draft version of their study on the retweet function in Twitter. They collected several independent samples of tweets and produced some interesting statistics.

Statistics

One sample consisted of 720,000 random tweets from 437,708 unique Twitter accounts and produced the following statistics:

  • 22% of tweets contained a URL.
  • 36% contained an @reply.
    • 86% of these started with the @reply and were therefore presumed to be direct replies to other users.
  • 5% of tweets contained a #hashtag denoting some topic, event or group.
    • 41% of tweets with a hashtag also contained a URL (which was almost double the overall percentage of tweets with URLs).
  • 3% of tweets were retweets.
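    • Of these, 88% used the ‘RT’ syntax, 11% used ‘retweet’ and 5% used ‘via’ (the figures sum to more than 100%, presumably because some retweets used more than one convention).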
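
For anyone who wants to produce similar counts on their own sample, here is a minimal sketch in Python. The regular expressions and the names `tweet_features` and `percentage` are my own assumptions for illustration; they are rough heuristics, not the matching rules the authors used, and they will misclassify some tweets (an email address, for example, would count as an @reply).

```python
import re

# Assumed heuristic patterns; the study's exact matching rules are not given here.
URL_RE = re.compile(r"https?://\S+", re.IGNORECASE)
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")

# One pattern per retweet convention: the marker word followed by an @username.
RETWEET_PATTERNS = {
    "RT": re.compile(r"\bRT\b\s*:?\s*@\w+", re.IGNORECASE),
    "retweet": re.compile(r"\bretweet(?:ing)?\b\s*:?\s*@\w+", re.IGNORECASE),
    "via": re.compile(r"\bvia\b\s*:?\s*@\w+", re.IGNORECASE),
}


def tweet_features(text):
    """Return a dict of boolean features (plus the conventions found) for one tweet."""
    conventions = {
        name for name, pattern in RETWEET_PATTERNS.items() if pattern.search(text)
    }
    return {
        "has_url": bool(URL_RE.search(text)),
        "has_mention": bool(MENTION_RE.search(text)),
        # A tweet that starts with @username is presumed to be a direct reply.
        "starts_with_mention": bool(MENTION_RE.match(text.lstrip())),
        "has_hashtag": bool(HASHTAG_RE.search(text)),
        # A set, because one tweet can use more than one convention.
        "retweet_conventions": conventions,
        "is_retweet": bool(conventions),
    }


def percentage(tweets, key):
    """Percentage of tweets for which the given feature is truthy."""
    features = [tweet_features(t) for t in tweets]
    if not features:
        return 0.0
    return 100.0 * sum(1 for f in features if f[key]) / len(features)
```

Calling `percentage(tweets, "has_url")` on a list of tweet strings gives the share that contain a link, and the other keys work the same way. Keeping the conventions as a set means a single retweet can be counted under both ‘RT’ and ‘via’.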

Retweet is Messy

Initially, I didn’t think tweets containing ‘via’ should be classified as retweets, particularly if they contained links. Original tweets that acknowledge the source of a link are different from retweets that echo the message of another user (whether they contain links or not).

However, the authors describe the use cases they encountered for each retweet convention (more than the three listed above) and state convincingly that retweet is a very messy function precisely because so many conventions are used in so many different ways.

Measurement is further complicated by retweets that are shortened, paraphrased or otherwise altered, whether because users want to add commentary, want to express something differently, or are constrained by the 140-character limit.

These difficulties are one reason users might ultimately benefit if gestures like retweet became part of the metadata rather than part of the content of each tweet, but that’s a separate topic.

More Statistics

Another sample (independent of the first) consisted of 203,371 tweets from 107,116 unique Twitter accounts and revealed the following:

  • 51% of retweets contained a URL.
  • 11% of retweets contained an encapsulated retweet (i.e. it was a retweet of a retweet).
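
One rough way to spot an encapsulated retweet is to count how many retweet markers a tweet contains. The sketch below assumes that a marker is ‘RT’, ‘retweet’ or ‘via’ followed by an @username; the pattern and the function names are hypothetical, not taken from the paper, and paraphrased or shortened retweets will slip past it.

```python
import re

# Assumed pattern: a retweet marker word followed by an @username.
RT_MARKER_RE = re.compile(
    r"\b(?:RT|retweet(?:ing)?|via)\b\s*:?\s*@\w+", re.IGNORECASE
)


def retweet_depth(text):
    """Count the retweet markers in a tweet (0 means it does not look like a retweet)."""
    return len(RT_MARKER_RE.findall(text))


def is_encapsulated_retweet(text):
    """Treat a tweet with two or more markers as a retweet of a retweet."""
    return retweet_depth(text) >= 2
```

For example, "RT @alice: RT @bob: hello" has two markers and would be flagged, while "RT @alice: hello" has one and would not.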

The paper also includes more data and an excellent discussion of what people retweet and why. The authors are accepting feedback before publishing a final version. If you like to analyse social media, I recommend you read the whole thing.

The Allure of Retweet

Retweet is an interesting social gesture for several reasons:

  • The gesture and syntax of retweet were invented by users. They have since been adopted into popular Twitter clients, but there is still no retweet functionality built into Twitter.com.
  • Users’ ability and desire to retweet and be retweeted in their entirety turns the idea of copyright on its head.
  • Retweet is exciting to marketers, advertisers and anyone else wanting to spread messages. The top-level statistics produced by this research suggest that users hoping for retweets might increase their chances by including a link, given that 51% of retweets (versus 22% of tweets overall) contained links, but a causal relationship wasn’t established.

There will likely be further research into all of these topics (and more) because the motivation to understand them is too strong for there not to be.

