Twitter has been having all kinds of scaling challenges. There have been hundreds, if not thousands, of posts on the subject. Dave Winer pushed an idea for a decentralized Twitter (and has since admitted the power of Twitter is in its centrality). There is a single, simple reason for Twitter’s challenges: math is against them.
The facility of communication on the Twitter service is absolutely outstanding. I’ve written extensively about using it to receive an amazing amount of quality information in my series on flow.
I originally questioned the scaling ability of the service prior to SXSW, but when the service held up I went back to the drawing board to make sure my numbers were correct.
Before continuing, let’s establish the basics about the service so the math will make sense…
- Each Twitter account can follow any other Twitter account (bear with me and forget those accounts with private updates).
- Messages travel in one direction, from the updater to the follower.
- Each account has updates from other accounts it follows placed in its timeline.
- A Twitter account can selectively receive pushed updates immediately via instant messenger and SMS in addition to having an update added to its timeline.
- An update added to an account’s timeline may or may not be push based (let’s assume it’s demand driven, or pull based).
- An update sent to a follower who has asked for SMS or IM notification is push based (there is no other way to deliver it; the server must actively push it).
- The mere possibility of an update needing to be pushed requires the system to check each follower’s settings, so every follower has to be examined for every update.
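The rules above can be sketched in a few lines of Python. This is only an illustration of the fan-out idea, not Twitter's actual implementation: the `Account` and `Delivery` types and the `fan_out` function are hypothetical names made up for this example.

```python
from dataclasses import dataclass, field
from enum import Enum


class Delivery(Enum):
    TIMELINE_ONLY = "timeline"  # pull based: read on demand
    SMS = "sms"                 # push based
    IM = "im"                   # push based


@dataclass
class Account:
    name: str
    # follower name -> how that follower wants this account's updates delivered
    followers: dict[str, Delivery] = field(default_factory=dict)


def fan_out(author: Account, text: str) -> tuple[int, list[tuple[str, Delivery, str]]]:
    """Return (followers examined, push messages to send) for one update."""
    examined = 0
    pushes = []
    for follower, setting in author.followers.items():
        examined += 1  # every follower is checked, even timeline-only ones
        if setting is not Delivery.TIMELINE_ONLY:
            pushes.append((follower, setting, text))
    return examined, pushes


alice = Account(
    "alice",
    {"bob": Delivery.SMS, "carol": Delivery.TIMELINE_ONLY, "dave": Delivery.IM},
)
examined, pushes = fan_out(alice, "hello")
print(examined, len(pushes))  # 3 2
```

Note that the work per update is proportional to the number of followers even when only some of them want a push: all three followers are examined, but only two messages go out.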
A warm-up equation
If there are one hundred (100) users and each user follows ten (10) fellow users, and each user sends ten (10) updates per day, assuming all updates are push-based, how many updates are sent?
The answer is 10,000: the 1,000 updates sent (100 users x 10 updates) are each forked out to 10 followers who have requested push updates. This is a very large number of updates to send out via SMS or IM compared to the base of users.
A very important fact: It doesn’t matter if a user follows with the intention of receiving an SMS or IM update. The possibility of an update needing to be pushed requires Twitter to examine every follower when an update is received.
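The warm-up arithmetic is just a three-way multiplication. A minimal sketch, where the function name `deliveries_per_day` is made up for this example and every user is assumed to have the same number of followers and updates:

```python
def deliveries_per_day(users: int, updates_per_user: int, followers_per_user: int) -> int:
    """Each update is delivered once to every follower of its author."""
    return users * updates_per_user * followers_per_user


warm_up = deliveries_per_day(users=100, updates_per_user=10, followers_per_user=10)
print(warm_up)         # 10000
print(warm_up // 100)  # 100 deliveries per user per day
```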
From 1999 to 2004 I worked as a software engineer at Mplayer (which then changed its name to HearMe, which then sold its video technology to LIvVE, which was then bought by GameSpy).
Chat rooms scale much like Twitter does. We had to restrict rooms to 500 users (and had insanely reduced reliability as we approached 500). As shown with the warm-up, each message is forked out to every user. We capped at 500 in a chat room because each user in a room contributes some amount of messages and also receives everyone else’s, so as users join a room the traffic grows roughly with the square of the room size.
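To put rough numbers on how fast a room's traffic grows as users join, here is a small sketch. It assumes every user sends the same number of messages and that each message is delivered to every other user in the room. Both are simplifications, and `room_deliveries` is a name invented for this example.

```python
def room_deliveries(users: int, messages_per_user: int = 1) -> int:
    """Deliveries when every user's messages go to every other user in the room."""
    return users * messages_per_user * (users - 1)


for size in (50, 100, 250, 500):
    print(size, room_deliveries(size))
# 50 2450
# 100 9900
# 250 62250
# 500 249500
```

Doubling the room from 250 to 500 users roughly quadruples the deliveries, which is why a hard cap on room size was the practical fix.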
Official and unofficial numbers
According to Twitter’s blog post with stats, 50% of the Twitter population has 10 followers. 10% of users have 80 or more followers. According to TechCrunch’s research there were 200,000 active users posting 3,000,000 updates per day (as of the end of April 2008). The average Twitter user posts 15 updates per day (3,000,000 divided by 200,000 = 15).
We’ll use Twitter’s percentages in their blog post and combine them with TechCrunch’s numbers. From this, we know there are 100,000 daily users with 10 followers and there are 20,000 people with 80 or more followers. To keep things simple, we’ll leave the other 80,000 daily users out of the equation for now.
- 100,000 users x 15 updates per day x 10 followers = 15,000,000
- 20,000 users x 15 updates per day x 80 followers = 24,000,000
For laughs, let’s put in the top 10 Twitter accounts with the most followers (beware, there were fights over this).
- Kevin Rose x 15 updates per day x 46,646 followers = 699,690
- Leo Laporte x 15 updates per day x 44,948 followers = 674,220
- Barack Obama x 15 updates per day x 42,201 followers = 633,015
- Alex Albrecht x 15 updates per day x 30,348 followers = 455,220
- Jason Calacanis x 15 updates per day x 28,773 followers = 431,595
- Robert Scoble x 15 updates per day x 28,037 followers = 420,555
- Mars Phoenix (lander) x 15 updates per day x 26,828 followers = 402,420
- Veronica x 15 updates per day x 26,199 followers = 392,985
- John C. Dvorak x 15 updates per day x 24,102 followers = 361,530
- MacRumors x 15 updates per day x 23,846 followers = 357,690
Total of average users + top 10: 43,828,920 updates delivered per day.
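For anyone who wants to check the arithmetic, the total can be reproduced with a short script. The figures are the ones quoted above; the variable names are made up for this example, and the 15 updates per day is the same across-the-board average used in the post.

```python
UPDATES_PER_DAY = 15  # 3,000,000 updates / 200,000 active users

# (number of users, followers per user) for the two groups used above
segments = [(100_000, 10), (20_000, 80)]

top_10_followers = {
    "Kevin Rose": 46_646,
    "Leo Laporte": 44_948,
    "Barack Obama": 42_201,
    "Alex Albrecht": 30_348,
    "Jason Calacanis": 28_773,
    "Robert Scoble": 28_037,
    "Mars Phoenix": 26_828,
    "Veronica": 26_199,
    "John C. Dvorak": 24_102,
    "MacRumors": 23_846,
}

segment_total = sum(users * UPDATES_PER_DAY * followers for users, followers in segments)
top_10_total = sum(UPDATES_PER_DAY * followers for followers in top_10_followers.values())
total = segment_total + top_10_total

print(f"{segment_total:,}")  # 39,000,000
print(f"{top_10_total:,}")   # 4,828,920
print(f"{total:,}")          # 43,828,920
```

Ten accounts out of 200,000 add more than 12% on top of what 120,000 ordinary users generate.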
And that’s half of the Twitter user base, mixed with a tiny fraction of the users who have large numbers of followers. Realistically, my estimate above is less than 10% of actual traffic because I’ve left out the remaining 40% of users and have not included the thousands of highly popular users with more than 80 followers. Additionally, the number of followers for the people in the top 10 has grown between 50 and 100 percent since the end of April! (Twitterholic)
This puts Twitter’s actual message analysis and possible delivery between 100,000,000 and 1,000,000,000 per day.
This also does not include a single page view or web service call to their servers. Those alone account for a huge amount of Twitter’s traffic.
Compared to IM traffic
Back in 2005 (ZDNet) there were 13.9 billion instant messages sent per day, with traffic estimated to more than triple by 2009 (46.5 billion). Instant messaging is divided up among a few primary services and IMs are one-to-one. According to Wikipedia, AOL AIM has 53 million users. If Twitter became as widely used as AIM, it would grow 265 times (53,000,000 divided by 200,000).
Take our findings for the number of delivered (or analyzed) updates on Twitter and multiply by this growth and you find Twitter has to be capable of delivering between 26.5 billion and 265 billion updates per day (probably much closer to the latter).
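The projection is a straight multiplication, sketched below. It assumes load scales linearly with the number of daily users, meaning followers and updates per user stay the same as today. The `project` function and constant names are invented for this example.

```python
CURRENT_DAILY_USERS = 200_000
LOW_ESTIMATE = 100_000_000     # updates analyzed/delivered per day today, low end
HIGH_ESTIMATE = 1_000_000_000  # high end


def project(daily_users: int) -> tuple[int, int]:
    """Scale today's low/high estimates linearly to a new daily user count."""
    low = LOW_ESTIMATE * daily_users // CURRENT_DAILY_USERS
    high = HIGH_ESTIMATE * daily_users // CURRENT_DAILY_USERS
    return low, high


aim_low, aim_high = project(53_000_000)  # AIM-sized user base, 265x today
print(f"{aim_low:,} {aim_high:,}")       # 26,500,000,000 265,000,000,000
```

If per-user follower counts also rise as the network grows, which seems likely, the linear assumption understates the load.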
How can Twitter scale?
Decentralized XMPP is probably the answer, but I don’t really know. I can see the problem though. If they grow to one million daily users, they have between 500 million and 5 billion messages to deliver per day. If they grow as popular and as relied upon as AIM, they’re staring straight into their own exponential order of magnitude.
Very interesting. Wish I were more tech-savvy to understand the hardware implications of all of this. At the very least, though — and even if your numbers are back-of-the-envelope style — it explains much better for this tech neophyte why “Just add more servers” isn't necessarily the solution.
Can they get around these architecture issues, do you think?
I don't think so. They are, or will be, hitting numbers that today's hardware can't keep up with. An exponential increase in load is way beyond the growth of Moore's law.
In order to keep their service stable they will have to make changes to the base functionality or impose limitations on maximum followers or following.
Facebook capped at 5,000. Since they offer push services, I wonder if this was a calculated decision based on scale.