Tuesday, November 27, 2007

That's a Lot of Data

Ever wonder how the amount of information available on the Internet compares to, say, the Library of Congress? Wonder no more:
The Library of Congress, which is larger than the New York Public Library, contains about 11 terabytes of information. That’s a huge amount of information. Yet it is dwarfed by the amount of information already accessible online through search engines, about 167 terabytes. This is about fifteen times as much as the Library of Congress, a figure which even Grafton admits is impressive. But the information available through search engines like Google in turn shrinks to a literal dot compared to the material for which no ready directory exists: the so-called Deep Web. Deep Web is that part of the Internet for which there is no street map. The University of California in Berkeley estimates the Deep Web to be 91,000 terabytes in size — 545 times larger than all the material indexed by search engines and 8,150 times larger than the holdings of the Library of Congress. The difference between paper and online holdings is the difference between a small chicken and a fully grown Tyrannosaurus rex. And if Google, Microsoft and others ever finish their plan to migrate books online it will simply mean that the T-Rex has eaten the chicken.
I still wonder, though, how much of those 91,000 terabytes is actually unique. For example, the bytes in every printed copy of the Harry Potter books add up to something like 100 terabytes, and that's for just one book series! Nonetheless, I suspect the amount of unique information in the "Deep Web" T-Rex still dwarfs the printed chicken.
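
For the curious, here is a quick sanity check of the ratios in the quoted passage. It is only a rough sketch: the three sizes are taken straight from the quote, while the constant names and the `ratio` helper are my own and purely illustrative.

```python
# Sizes quoted above, in terabytes (taken as given, not independently verified).
LIBRARY_OF_CONGRESS_TB = 11
SEARCHABLE_WEB_TB = 167
DEEP_WEB_TB = 91_000


def ratio(larger_tb, smaller_tb):
    """Return how many times larger the first size is than the second."""
    return larger_tb / smaller_tb


web_vs_library = ratio(SEARCHABLE_WEB_TB, LIBRARY_OF_CONGRESS_TB)
deep_vs_web = ratio(DEEP_WEB_TB, SEARCHABLE_WEB_TB)
deep_vs_library = ratio(DEEP_WEB_TB, LIBRARY_OF_CONGRESS_TB)

print(f"Searchable web vs. Library of Congress: {web_vs_library:.1f}x")
print(f"Deep Web vs. searchable web: {deep_vs_web:.1f}x")
print(f"Deep Web vs. Library of Congress: {deep_vs_library:.1f}x")
```

The first two results (about 15.2 and 544.9) line up with the quoted "about fifteen times" and "545 times". The third comes out closer to 8,270 than the quoted 8,150, so the quote probably rounded somewhere or started from slightly different figures. Either way, the T-Rex is still enormous.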