Will Digital Archiving Difficulties Wipe Out Important Elements Of Our History?

from the it's-a-challenge dept

Over the years, we've had quite a few posts about the risks of data extinction. That is, as more and more of our important data goes digital, there's a bigger risk that it could disappear. At the very least, it's easy for digitally stored data to become corrupted. Even if there are backups, it's possible that multiple copies could become corrupted. A bigger concern, though, is the applications necessary to read the data. Even if you can store the data perfectly forever, without the right applications, it's meaningless. Matt Sullivan writes in with yet another article on the topic, this time from Popular Mechanics, that suggests we could be facing a "digital ice age" as plenty of data from this era of history is lost to bad archiving capabilities.
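
As a rough illustration of the corrupted-copies problem, here is a minimal Python sketch of how an archive might compare checksums across redundant copies of the same file and flag the ones that disagree with the majority. The function names (sha256_of, find_suspect_copies) and the majority-vote rule are assumptions made up for this example, not anything described in the Popular Mechanics article or used by any particular archive.

```python
import hashlib
from collections import Counter
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_suspect_copies(copies: list[Path]) -> list[Path]:
    """Return the copies whose digest differs from the majority digest.

    Assumption (hypothetical policy): a copy is trusted only if its digest
    is shared by a strict majority of the copies. If no digest has a strict
    majority, every copy is returned as suspect.
    """
    if not copies:
        return []
    digests = {path: sha256_of(path) for path in copies}
    most_common, count = Counter(digests.values()).most_common(1)[0]
    if count * 2 <= len(copies):
        return list(copies)
    return [path for path, digest in digests.items() if digest != most_common]
```

Note that this only detects disagreement between copies. It does nothing about the bigger concern above: a perfectly preserved file is still useless without an application that can read it.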

Of course, there are some people working on solutions. A few years ago, we wrote about Dan Bricklin's idea that we need "social infrastructure software" that is designed to last for many years to deal with exactly this issue. Of course, that only works if such software exists and people use it. The Popular Mechanics article notes that the National Archives is working on a big system to deal with just this issue -- though, when we last wrote about the system, it sounded full of potential problems, and the latest details are not that reassuring. Basically, they're spending over $300 million to have Lockheed Martin build a system that will translate more than 4500 different document types into flexible formats, like XML. However, it seems quite likely that important data or metadata will get lost in the process. Others are suggesting that such a plan is dangerous, and that the Archives would be much better off focusing on emulation techniques -- but again, that seems to get awfully cumbersome awfully fast, and that doesn't even touch on the copyright issues associated with such a project. In the meantime, some are arguing that the entire problem of data extinction is overblown -- saying that important data gets updated as systems change, and there will always be some way to go back and get other data if necessary.


Reader Comments

  • Anonymous Bum, 21 Nov 2006 @ 5:01am

    Bosses

    Great, now my boss is going to walk into my office, interrupt my BF2142 game, and tell me our data will be corrupted and to do something.

  • Jezsik, 21 Nov 2006 @ 5:06am

    "...there will always be some way..."?

    I seem to recall a story about how some NASA researchers are concerned that the data collected in some older probes are already lost because the machines that can read the tapes no longer exist. We can't use new analytical techniques to re-examine the old data - much to our loss.

    I tell ya, the Babylonians were on to something with those clay tablets.

  • Chronno S. Trigger, 21 Nov 2006 @ 5:24am

    easy way

    Won't it be easier to put all the data into one format with all the metadata, and then make the program that reads it something that can be recreated in the future?

    Like make everything PDF and then leave detailed documentation on how Acrobat works.

    • Did you read?, 21 Nov 2006 @ 5:29am

      Re: easy way

      Oh very nice Chronno. Looks like we have some thinkers in the room. Wait a second! How is that going to stop the PDF from becoming corrupt?

  • Someon, 21 Nov 2006 @ 5:32am

    PDF

    ODT then, the man has a point

  • Anonymous of Course, 21 Nov 2006 @ 5:33am

    It's only a matter of desire.

    As long as the media hasn't degraded too much a
    machine could be built to read any format. Only
    the desire to spend the necessary funds is needed.

    And there's always eBAY...

  • Rico J. Halo, 21 Nov 2006 @ 5:36am

    another potential problem

    I wonder if important data might be "lost" but not due to corruption or loss of the hardware to read it but just because there’s such a constant avalanche of new data it gets buried. I think the bigger problem is going to be keeping track of the huge amounts of new data. There’s so much that no indexing system can possibly keep up with it all. I am already seeing clients with the problem not of "lost" data per se but misplaced data because they don’t have a suitable indexing system.

    www.thatpoliticalblog.com

  • Adam, 21 Nov 2006 @ 6:12am

    The answer is CDs

    Why not just put everything into ASCII text and store it on CDs! CDs are supposed to last like 200 years, right? Much better than floppy disks! (And yes, I'm being cynical here!)

    There are really two questions of concern here, though, not one: 1. What format to store the data in. 2. What medium to store the data on. I think the second question is far more pressing than the first. Watching the evolution of computing over the years (especially the last 20), it is reasonable to say that we can and likely will continue to have the ability to read data in old formats. If worst comes to worst, someone just needs to write a software/application bridge to import or convert the data.

    The real problem is what to store it on. There are lots of real-world examples. Data backed up on tape, where the readers are hard to come by. Floppy disks that are now unreadable because the magnetic strips have become depolarized. CDs & DVDs... I hate them. I have NEVER liked the format, and thanks to HD-DVD & Blu-Ray, it looks like this crappy medium is with us for a bit longer. The medium is fragile: a few scratches or too much sun and you can turn the disc into a coaster, etc. There is no perfect medium yet.

    I much prefer solid state flash memory, but they're just getting under way now, so the formats are rapidly changing (Compact Flash, SD, SD-mini, etc.). And their prices are still too high and their storage capacity is still too limited. In a few years, though, these devices should come down in price and increase in storage so that they can match or exceed the storage of a new DVD (Blu-ray / HD), and the price will hopefully be less than $1 per gig. I don't know what the life cycle of these is, but in about 3 years we should start seeing notebook manufacturers moving away from IDE-based drives and using these as secondary or alternative drives.

    Adam

  • slide23, 21 Nov 2006 @ 6:20am

    Much ado about NOTHING

    I architected a digital library that focuses on historical documents. Archivists and librarians have forgotten more about how to keep historical documents safe than all of the Chicken Littles put together. The issue is completely overblown.

    For starters, CD-ROM is barely accepted as an archival format by archivists. Second, the same plans that go into three-9 business continuity are often the same kinds of plans that are put into place for archiving. Rotate formats, archival hard copies, redundant digital copies stored far apart, etc, etc.

    But really, this all boils down to doing a good job. The same person that would do a bad job archiving their digital documents would probably do an equally bad job archiving their "rlspc" documents. Either way, documents are just not safe in their hands.

    And how many of us have ever experienced irretrievable digital documents because the application no longer existed? Data corruption I can see for not being able to open a particular file, but you did your job badly if the information in that file was lost because you did not back it up properly.

  • jd, 21 Nov 2006 @ 6:43am

    Another license monopoly

    Oh yeah, let's put it in Adobe format and let them bend us over like Microsoft is doing. Microsoft software is already more than hardware, and now we want to give Adobe more bargaining power to force us to pay them and to manage "licensing" issues like Microsoft where it is ok to copy and use so long as you buy their hardware?? And talking about updates that hose a system... Adobe has done a good job to make sure I can only use their $500 software on ONE of my computers and it has a freshly formatted HD thanks to an Adobe update that I finally said OK to get it off my screen.

    It is only a matter of time until we move to a subscription service for software and then we can revert to older versions as part of our monthly payment. The same probably holds true for storing your data... I would pay a nominal fee to guarantee my data is backed up frequently. The problem is still how to back up all your data, since the Microsoft tools to search and copy all data on a computer are still useless in my experience. It probably will not be long until Microsoft tells me that I don't own the copyright to my data because my certificates are not valid or some other BS they are trying to force down our throats to get vendors to pay them to be their protectors. It is like hiring the wolf to watch over the sheep.

    • slide23, 21 Nov 2006 @ 7:24am

      Re: Another license monopoly

      Oh, shut up. Big bad Adobe is going to lock down the PDF standard and suddenly we won't be able to read our PDFs? Please stop engaging in gluteal dialectic. When you are going on an anti-corporate rant, it helps to base your viewpoint somewhere near reality.

      In case you were not aware, PDF has become an ISO standard. There is even an archival-specific version of PDF (PDF/A, based on PDF 1.4, ISO 19005-1) that specifically disables particular features that just MIGHT not work in future applications.

      And if you felt that Adobe had you grabbing ankles, perhaps you should run a search for other PDF utilities.

  • chris, 21 Nov 2006 @ 8:09am

    look at previous civilizations

    if a civilization (digital or otherwise) was able to preserve its history, there would be no need for archaeologists. the presence of the field of archaeology proves that all civilizations eventually fall and all but the tiniest of bits of their histories fall with them.

    the moore's law style of the advance of storage technology *should* mean that we can cheaply store multiple copies of everything... and yet artificial software death, closed formats, copyrights, DRM, and the like pretty much guarantee that our digital history will be lost to the next century, and possibly even to the next generation.

    even if you could maintain all that was written, it is impossible to archive the semantics of what was written. you cannot archive the context of a work, its true meaning. look at the constitution: even though it was written in english, the language used is vastly different from what we use today... and it is interpreted differently than it was interpreted when it was written... and the document is merely two centuries old. compare that to the bible, which is far older. look at how it was interpreted by the romans, then by the puritans, and compare that to the way it is interpreted today.

  • spoon!?!?!, 21 Nov 2006 @ 8:23am

    Too much ado about Adobe

    It's by far not the only PDF reader around, and OpenOffice supports one-click PDF creation. There IS competition, here. And still, I doubt Adobe would (or even could) lock the format down if the masses turned to them and shouted "f*ck you!"...

    And I gotta agree with Rico @ 7. An exponential growth in data only compounds the problem. One aspect we should be focusing on is what data isn't important enough to keep. We can't just keep it all forever.

    My solution: RAID 0+1's for all!

  • Anonymous Coward, 21 Nov 2006 @ 9:35am

    This topic is very much a part of Isaac Asimov's Foundation series. They can't trace back the history of society past 30,000 years because no one uses the information from that long ago enough, so it is erased. They even lost track of which settled planet was Earth... I should read the series again.

  • Fred Munster, 21 Nov 2006 @ 9:46am

    It could happen

    I wondered about this scenario myself. Luckily I just signed up for this new service by a company called Tilana. They have a off-site data protection service with a pretty cool "versioning" feature. I highly recommend it!
    www.tilana.com

  • PhysicsGuy, 21 Nov 2006 @ 11:41am

    Is it possible to store data and not have it become corrupted in some way? The most complex form of data storage, the human brain, has an absurd amount of problems maintaining proper information. The reasons why (regarding the emotional input of the data) are obvious. However, even in our old system, writing everything down, the data still gets "corrupted" by our interpretation of the written material. I side with the skeptics who say this problem is overblown. Let the important data be updated with the advance of technology. If any important data remains in an old format, I'll guarantee you can find an old reader of said format and then engineer it to work within whatever current system we possess.

  • PhysicsGuy, 21 Nov 2006 @ 11:43am

    Re: Too much ado about Adobe

    RAID 0 sucks... I shouldn't even have to get into the problem with relying on that...

  • yarrdape6, 21 Nov 2006 @ 12:15pm

    SSDD

    The problems faced by digitally encoding data are not new problems. They are simply an extension of the problems faced by traditional means of historical data storage. Nothing lasts forever. I do think, though, that data has a better chance digitally than otherwise, for the same reason that people recommend that you do not sell old HDs: data may become corrupted, but you can bring back a lot of data that has been partially deleted. Plus, digital storage offers an almost unlimited backup capacity. Redundant systems are cheap enough that if the data is important enough, we can save it indefinitely.

    Thank god for Firefox spell check.

  • Anonymous Coward, 22 Nov 2006 @ 6:57am

    Nobody is asking the real question here- exactly what IS important? We're currently being crushed under a mountain of useless information. Are my personal pictures, notes, blogs, etc really important for future generations?

    Someone mentioned it above. The stuff that's actually important will survive. A lot of the little stuff will die. This is not a worse situation than a thousand years ago. The important stuff always survives.
