DailyDirt: The Ever-Growing Growth Of Data...
from the urls-we-dig-up dept
There are a lot of reasons to be optimistic about the future. Some folks will always predict doom and gloom, but we say, "The Sky Is Rising!" (loud and proud -- and again with the sequel The Sky Is Rising 2). The advent of digital information has created an enormous wealth of data, and the amount of this digital awesomeness seems to be growing all the time. Here are just a few more examples of the amazing abundance of media that surrounds us.
- The Internet Archive has updated its Wayback Machine, indexing 5 petabytes of internet goodness covering the web from 1996 to December 2012. That data comes from over 240,000,000,000 URLs, and this virtual backup of the web doesn't even touch sites that require a login or that block the Wayback Machine with a robots.txt file. (A small sketch for querying the Wayback Machine from code follows this list.) [url]
- Sandvine's global internet phenomena report predicts that US internet traffic may rise to over 700,000 exabytes per year by 2019. And if Netflix continues to do well (accounting for roughly twice as much traffic as YouTube and crushing Amazon Video, Hulu and HBO Go), a lot of that traffic will be people watching streaming movies and TV shows (legitimately, too, not just via BitTorrent). [url]
- Every minute of the day, more and more data is generated: roughly 571 new websites and 100,000+ tweets per minute, for example... plus gazillions of infographics and bazillions of random factoids. [url]
- From the beginning of recorded history until 2003, humans generated about 5 billion gigabytes of data... and now we generate that much every 2 days, and the rate keeps accelerating (though humans aren't the only ones generating all that data anymore). Some quick conversions of these figures into more familiar rates follow this list. [url]
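For readers who want to poke at the Wayback Machine themselves, here's a minimal sketch (Python 3, standard library only) that asks the Archive's public availability endpoint for the closest archived snapshot of a URL. The endpoint (archive.org/wayback/available) is the Archive's documented one, but treat the exact JSON field names here as assumptions rather than gospel.

    import json
    import urllib.parse
    import urllib.request

    def closest_snapshot(url):
        # Ask the Wayback Machine's availability API for the closest
        # archived snapshot of the given URL.
        query = urllib.parse.urlencode({"url": url})
        with urllib.request.urlopen("https://archive.org/wayback/available?" + query) as resp:
            data = json.load(resp)
        # The best match is assumed to sit under archived_snapshots -> closest.
        return data.get("archived_snapshots", {}).get("closest")

    snap = closest_snapshot("techdirt.com")
    if snap and snap.get("available"):
        print(snap["url"], snap["timestamp"])
    else:
        print("no snapshot found")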
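And because figures like these are hard to picture, here's a quick back-of-the-envelope conversion (plain Python, using only the numbers quoted above) that restates them as per-day and per-year rates.

    MINUTES_PER_DAY = 24 * 60  # 1,440

    # Per-minute figures from above, scaled to a full day.
    print(f"{571 * MINUTES_PER_DAY:,} new websites per day")
    print(f"{100_000 * MINUTES_PER_DAY:,} tweets per day (at 100K+ per minute)")

    # "5 billion gigabytes every 2 days", restated per year.
    bytes_per_two_days = 5e9 * 1e9  # 5 billion GB is 5 exabytes
    exabytes_per_year = bytes_per_two_days * (365 / 2) / 1e18
    print(f"about {exabytes_per_year:.0f} exabytes per year at that pace")

That works out to roughly 822,000 new websites and 144 million tweets per day, and on the order of 900 exabytes per year at the "every 2 days" pace.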
Filed Under: data, factoids, internet archive, media, sky is rising, wayback machine
Companies: sandvine
Reader Comments
I would find the prospect enlightening if I didn't suspect that most of that new information is being generated by something other than humans.
The Internet Archive...
In other words: person puts up site. Internet Archive makes copy of site. Site goes bust. New owner puts up generic site with a robots.txt file. IA sees the robots.txt file and disables access to the existing backup of the old site.
When I asked if they couldn't manually override this for sites that are obviously not the same anymore, I was told that it was impossible.
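For what it's worth, checking whether a site's current robots.txt would block a particular crawler is easy to do from code. Here's a minimal sketch using Python's standard urllib.robotparser; the "ia_archiver" user-agent is only an assumption about what the Archive's crawler identifies as, so substitute whichever agent you actually care about.

    import urllib.robotparser

    def blocked_for(site, user_agent):
        # Fetch the site's robots.txt and report whether the given
        # user-agent is disallowed from crawling the site root.
        rp = urllib.robotparser.RobotFileParser()
        rp.set_url(f"https://{site}/robots.txt")
        rp.read()
        return not rp.can_fetch(user_agent, f"https://{site}/")

    # "ia_archiver" is assumed here to be the Archive's crawler name.
    print(blocked_for("example.com", "ia_archiver"))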
Data Generation
Now how about telling us how much of that data we actually understand, or even utilize today? I seem to recall that number was extremely small (like a single-digit percentage).
The problem is that we have all this data, but most of it is locked away so it can't be used by the masses, it's so poorly organized that even those with access can't find the data they need, and much of it is inaccurate or incomplete.
So we are nothing more than a bunch of pack rats!
Way to go!
Nice. Now people who didn't know you could do that will start doing it.
Don't give it away, people!
generated data incorrect, maybe
But how much of that is redundant data?
I see the same posts on multiple sites, the same news churning through thousands of news sites and millions of blogs...
retweets by the giant bucketful, the same movie 50 times per torrent site, the same song in a billion drop boxes, etc.
So, removing the echoes, how much data is actually generated?