Somebody Should Have Seen This Coming

Although I’m not a Twitter user, I found this story interesting…

http://www.macworld.com/article/141146/2009/06/twitpocalypse_twitter.html

Apparently several Twitter applications are melting down because tweet ID numbers, which Twitter hands out sequentially and which therefore track the running count of tweets, have exceeded the maximum value a 32-bit integer can hold. In this case that maximum is 2,147,483,647, because the developers of these applications were using signed integers.
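To make the failure a bit more concrete, here is a small Python sketch of what happens when a number one past that limit is forced into a signed 32-bit, two's-complement slot. The function `store_in_int32` and the `tweet_id` values are hypothetical illustrations, not code from any of the affected apps; the exact symptom in a real app depends on its language and how it parses the ID.

```python
def store_in_int32(value: int) -> int:
    """Mimic storing `value` in a signed 32-bit two's-complement integer."""
    # Keep only the low 32 bits, as a 32-bit variable would.
    value &= 0xFFFFFFFF
    # If the top (sign) bit is set, the stored pattern reads back as negative.
    return value - 0x1_0000_0000 if value & 0x8000_0000 else value


# Hypothetical IDs straddling the signed 32-bit maximum of 2,147,483,647.
for tweet_id in (2_147_483_646, 2_147_483_647, 2_147_483_648):
    print(f"{tweet_id:,} -> {store_in_int32(tweet_id):,}")
```

The first two IDs survive unchanged, but the third silently wraps around to −2,147,483,648, which is the kind of nonsense value that can make an application misbehave or crash.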

For those of you who don’t know how computer programming works, I’ll give a quick rundown. An integer is the standard data type for storing whole numbers. On most platforms an integer is 32 bits in size, which means an unsigned integer can hold values from 0 to 4,294,967,295. To store negative numbers, one of those bits has to be used to indicate whether the value is positive or negative. Since that bit is no longer available for the value itself, the range of storable numbers becomes −2,147,483,648 to 2,147,483,647.
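Those ranges follow directly from the bit width: an unsigned n-bit integer holds 0 to 2^n − 1, and a signed one holds −2^(n−1) to 2^(n−1) − 1. Here is a minimal Python sketch that computes them, assuming two's-complement representation for signed values (what essentially all modern hardware uses); `int_range` is just an illustrative helper name.

```python
def int_range(bits: int, signed: bool) -> tuple[int, int]:
    """Return the (min, max) values a fixed-width integer can hold."""
    if signed:
        # Two's complement: the top bit carries the sign, leaving bits - 1 for magnitude.
        return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    # Unsigned: every bit contributes to the magnitude.
    return 0, 2 ** bits - 1


for bits in (32, 64):
    for signed in (False, True):
        lo, hi = int_range(bits, signed)
        kind = "signed" if signed else "unsigned"
        print(f"{bits}-bit {kind}: {lo:,} to {hi:,}")
```

The 32-bit lines reproduce the numbers above, and the 64-bit unsigned line gives the 18,446,744,073,709,551,615 figure mentioned below.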

There were two mistakes made here that I can see. The first was using a signed integer. Since you never need to store a negative number of tweets, there is no reason to waste a bit recording whether the number is negative or positive; it’ll always be positive. The second issue, although understandable, was using a 32-bit integer. Given Twitter’s popularity and the number of tweets each person makes every day, it’s easy to see that more than 4,294,967,295 tweets will eventually be made. It would have been much smarter to use a 64-bit integer, which, unsigned, gives a range of 0 to 18,446,744,073,709,551,615. Although not impossible, it’s very unlikely there will ever be that many tweets before Twitter falls out of existence.
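A rough way to compare the options is to ask how many days of headroom each type leaves once IDs reach the signed 32-bit limit. The Python sketch below does that, assuming IDs are handed out at a constant rate. The rate of 10,000,000 IDs per day is a made-up number for illustration only, not a real Twitter figure, and `days_until_overflow` is a hypothetical helper.

```python
# Maximum value of each candidate storage type.
LIMITS = {
    "signed 32-bit": 2**31 - 1,
    "unsigned 32-bit": 2**32 - 1,
    "unsigned 64-bit": 2**64 - 1,
}


def days_until_overflow(max_value: int, current_id: int, ids_per_day: int) -> float:
    """Days left before a sequentially assigned ID passes max_value,
    assuming IDs are handed out at a constant daily rate."""
    if ids_per_day <= 0:
        raise ValueError("ids_per_day must be positive")
    return max(max_value - current_id, 0) / ids_per_day


current_id = 2**31 - 1      # the point where the signed 32-bit limit is reached
ids_per_day = 10_000_000    # HYPOTHETICAL rate, chosen only for illustration

for name, limit in LIMITS.items():
    days = days_until_overflow(limit, current_id, ids_per_day)
    print(f"{name}: about {days:,.0f} days ({days / 365.25:,.1f} years) of headroom")
```

Whatever rate you plug in, switching to unsigned 32-bit only doubles the limit, while 64-bit pushes it out by a factor of roughly four billion.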

The first screw-up was just poor planning, probably by an inexperienced programmer. The second mistake is understandable, since most of the time programmers simply reach for the basic integer type to store whole numbers.

But this story is a good example of what goes wrong when something isn’t fully planned out. I imagine that had more people been working on these applications, somebody would have pointed out this potential issue. Always have an understanding of the maximum values your data may contain.
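One way to act on that advice is to check a value against the limits of the type it will be stored in, and fail loudly instead of silently wrapping around. A minimal Python sketch follows; `ensure_fits` and `IdOverflowError` are hypothetical names for illustration, not part of any Twitter library.

```python
class IdOverflowError(ValueError):
    """Raised when a value no longer fits in its intended storage type."""


def ensure_fits(value: int, bits: int = 32, signed: bool = True) -> int:
    """Return `value` unchanged if it fits in the given integer type, else raise."""
    lo = -(2 ** (bits - 1)) if signed else 0
    hi = 2 ** (bits - 1) - 1 if signed else 2 ** bits - 1
    if not lo <= value <= hi:
        kind = "signed" if signed else "unsigned"
        raise IdOverflowError(f"{value:,} does not fit in a {bits}-bit {kind} integer")
    return value


print(ensure_fits(2_147_483_647))            # still fits in signed 32-bit
print(ensure_fits(2_147_483_648, bits=64))   # fits once the type is widened
```

Calling `ensure_fits(2_147_483_648)` with the default signed 32-bit type raises `IdOverflowError`, so the problem shows up as a clear error the first time it happens rather than as a corrupted, negative ID.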