As a documentarian, I have mostly been drawn to working on historical films. My first experience in this genre, a dipping of my toe in the proverbial water, was a documentary short about one of the remaining Dentzel carousels, owned and operated by the City of Burlington, NC. That prompted me to work on something vaster, and so my first feature-length documentary was about the Regulator Movement in pre-Revolutionary North Carolina. I am currently working on another documentary piece about Burlington, set in the 1950s. All of these works have one obvious thing in common: they all depend upon the historic record, whether it is written, painted, photographed, or filmed.
It occurred to me that, while this day and age is very convenient in terms of digital content, that very convenience will become the bane of decent historic research decades from now. Historians and documentarians will face challenges 20, 50, or 100 years down the road that can only be prevented by addressing them now.
As we begin this discussion, consider your own digital materials. More than likely, you have a treasure trove of photographs you snapped on various digital cameras and on your cell phones. You probably have several wonderful threads of communication in some Email system, be it Gmail, Yahoo, Outlook, or some other provider. Maybe you have some precious videos shot on a digital video camera or on your cell phone. If you are a writer, you might have an online diary or an active blog that you regularly update.
The question which you may well be anticipating is this: Where are these precious collections of digital data stored and do you implement a valid backup plan to prevent their loss if a hard drive fails or if you lose your cell phone? You see, once those files disappear into the big bit-bucket in the sky, they are gone forever. Rephrasing that last statement a bit we can say that a piece of your personal history, your family history, your community’s history has disappeared forever.
Of course, traditional sources of historical research can also be destroyed by fire, flood, rot, war, and so on. True, but how likely is a fire or a flood to wipe out your home versus the statistical probability of a computer hard drive crashing? I would dare to say that the fire or flood is a number of orders of magnitude less likely. Our little mental exercise serves to touch on one small aspect of the many problems that will face historians in the future, that of storage. Add to this the challenge of searching for digital material and the challenge of being able to use today’s digital material at some time in the distant future.
So, let’s get into the challenge of storage itself. If you are a very astute computer user, you may well have dealt with the above scenario by saying to yourself, “This won’t be a problem for me because I back my data up.” OK, that sounds really good, but what exactly is your backup strategy? Do you back up to an online service (e.g. Mozy)? Do you make physical backups on DVD-ROM or BD-ROM? Do you back up to other drives? Or, if you are a very computer-savvy person, you may do proper backups to several generations of drives housed in different safe locations. Perhaps your precious memories are work-related memories housed within some governmental or corporate shared drives which are backed up every night and stored in data centers scattered around the country.
Each of these strategies has some flaw, and other strategies exist that overcome it. If you back up to disks that you keep in the same location as the original drive, then you run the risk of some disaster (fire, flood, tornado) destroying them both at the same time. If you back up to an external service, then you have overcome that flaw, but now you have another dependency which we will discuss shortly: what is the longevity of the company that provides that service? If you back up to other drives and keep generations of backups, are you making sure that you exercise those drives every month or so to prevent them from seizing up? If you back up to DVD-ROM or BD-ROM disks, do you diligently check that the backup is perfect and there are no flaws? Do you protect the surfaces of those disks from any sort of scratches or dings? Should something happen to you and you are no longer around, are there other people around you who know where these treasure troves of data are stored and how to ensure that they are preserved?
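For the software-minded, the "diligently check that the backup is perfect" step can be automated. What follows is only a minimal sketch of one common approach, not a complete backup tool: it computes a SHA-256 checksum for every file in an original folder and in its backup copy, then reports which files are missing or differ. The function names (sha256_of, build_manifest, compare_backup) are my own inventions for illustration, and the sketch assumes both folders can be read from the same computer.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its checksum."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_backup(original_root, backup_root):
    """Return (missing, corrupted) lists of relative paths in the backup."""
    original = build_manifest(original_root)
    backup = build_manifest(backup_root)
    missing = [name for name in original if name not in backup]
    corrupted = [
        name for name in original
        if name in backup and backup[name] != original[name]
    ]
    return missing, corrupted
```

Calling compare_backup("Photos", "E:/Backup/Photos") (both paths hypothetical) returns two lists; if both are empty, every original file has a byte-identical copy in the backup. Saving the manifest alongside the backup would also let a future custodian confirm that nothing has silently decayed.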
I know that there are some who are literally shouting at the screen right now that they keep their memorable photos and videos on online services such as Facebook, Flickr, YouTube, Picasa, and Vimeo. You are saying, “Should I lose the photos here, they are always available there and I can always download and print one from there!” Sounds reasonable until you realize that these services probably won’t last forever. Yes, Facebook, Google (YouTube and Picasa), and Verizon (Flickr) are solid companies today, but what will happen to them when the next big thing arrives in a few years? Where is MySpace today after Facebook stomped them? Will these services be around another 25 years? How about 50 years or 100 years from now? Sites like the Internet Archive may help preserve snapshots of their pages but even this is a service that may go belly-up one day.
The majority of corporate and government entities do have archival policies in place but even they can lose critical information that would be important to historians at some later date. Emails, photos, videos, and other recordings may be handled by backup strategy policies but many fall prey to traps set by lawyers. Yes, there are many entities that have legal retention policies attached to their archival policies. Items are run through the digital shredder in a year, 3 years, 5 years, 7 years, 10 years, and so on depending upon the legal nature of their content and thus are not preserved based upon their historical value! Daily, we lose terabyte after terabyte of material that may be significant to some later generations.
Another challenge facing the historian of tomorrow, assuming that precious digital assets are properly saved and transmitted from generation to generation, is the challenge of finding them. Consider that traditional photographic and written assets may be stored in drawers or even in boxes in an attic. Whenever they are located, it is clear what they are, and a historian can quickly browse through them and determine whether or not they are relevant to the research at hand. Even home movies on 8mm film can be unspooled a bit and held up to the light, and one can gain a quick understanding of what kind of material may be on the spool. How does the same process occur in the digital realm?
How often can someone find a hard drive or a stack of backup disks and immediately view their contents? Disks may be formatted differently depending upon the source operating system and the preferences of the original operator (FAT, FAT32, NTFS, XFS, EXT2, EXT3, EXT4, HFS, etc.). Is the media compatible with today’s equipment? Will it be compatible with tomorrow’s? What if the contents of the disk are the result of some proprietary backup system and not plain files? Would the drive even be pluggable into anything available in the future? For example, what would you do today if someone produced a stack of 5 and 1/4 inch diskettes and said that they contained the backup of critical images archived with an early Apple II computer? Where would you even begin to physically read the data and get it into some form that you could use? The point of this is that as technology changes, storage formats change also.
Even if the documents, images, or video are online somewhere in the giant web soup, how would you be able to find them and derive a proper provenance for the material? A large number of family-history images are uploaded by people with the generic names assigned by their camera (IMG0720.JPG instead of JoeAndSuzieQHoldingNewborns.JPG) and so there is not a good tag for locating them. If images have been posted to a private page on a site such as Facebook or Flickr, or a diary is posted privately on LiveJournal, all bets are off in doing any sort of generic search for them. They are, essentially, lost to history forever.
Yet another challenge faces tomorrow’s historians, assuming they are able to overcome the previous two: the challenge of actually being able to open the material. In most cases this is more of a challenge with video or correspondence file formats than with photographic material. Standard photo formats such as JPEG and PNG are well documented and very mainstream. PNG may have a bit of an edge because it is an open standard whose specification is freely published. For more serious images, there is also DNG (Digital Negative), a RAW format created by Adobe whose specification is openly published. The TIFF specification is publicly available too, but TIFF files can contain runs of proprietary data, so the format is not necessarily completely safe. Most RAW formats (used by serious photographers), which are essentially digital negatives, are proprietary to camera manufacturers such as Nikon, Canon, and Sony. As long as these manufacturers continue to produce cameras, these formats will probably continue to be supported.
Video is a completely different animal. In order to support moving images in an efficient manner, video is encoded using specialized algorithms that compress the data and permit it to be streamed easily to the playing device. The software components that accomplish these tasks, known as CODECs (Coder/Decoder), are very specialized and often proprietary or patent-encumbered. Note that the file extension of a video file does not tell you which CODEC was used: a .MOV or .MP4 file is only a container, and the video inside it may be encoded with any of several CODECs. CODECs include H.264 (several flavors), ProRes (Apple), Cineform (GoPro), DNxHD, and so on. Unlike a home movie’s film, which can easily be seen by holding it up to the light, a video file is a closed box which may never be viewable. I say “never” because some CODECs are tied to a specific computer’s operating system in such a way that they won’t work on others. There are older proprietary CODECs which run only on 32-bit Windows 95/98 and which are unavailable for any other system.
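To make the point about file extensions concrete, here is a small sketch, offered as an illustration rather than a proper format-identification tool. It ignores the file name entirely and guesses the format from the first few bytes of the file, which for many common formats carry a fixed signature. The signature table is deliberately tiny, and the function name sniff_format is my own. Note what it cannot do: it can tell you that a file is an MP4 or QuickTime container, but not which CODEC was used for the video inside, and many camera RAW files will simply show up as TIFF because they are built on the TIFF structure.

```python
# A few well-known file signatures ("magic numbers"); far from exhaustive.
SIGNATURES = [
    (b"\xff\xd8\xff", "JPEG image"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"II*\x00", "TIFF image (little-endian)"),
    (b"MM\x00*", "TIFF image (big-endian)"),
    (b"RIFF", "RIFF container (e.g. AVI or WAV)"),
    (b"%PDF-", "PDF document"),
]


def sniff_format(path):
    """Guess a file's format from its first bytes, ignoring its extension."""
    with open(path, "rb") as handle:
        header = handle.read(16)
    for magic, label in SIGNATURES:
        if header.startswith(magic):
            return label
    # MP4 and most QuickTime files carry an "ftyp" marker at bytes 4-7.
    if header[4:8] == b"ftyp":
        return "ISO base media / QuickTime container (e.g. MP4 or MOV)"
    return "unknown"
```

A historian handed a folder of mystery files could run something like this over every file as a first triage step, before worrying about which program might open each one.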
A word of warning about CODECs: there are a lot of sites which offer “CODEC Packs” that seem to be treasure troves of support for lots of these old CODECs, but downloading their product may merely introduce some nasty virus into the target computer. Mainstream CODECs are generally readily available in QuickTime, Windows Media Player, or the excellent VLC media player. Top-end editing packages such as Final Cut, Adobe Premiere Pro, Avid, Magix Vegas Pro, and so on also tend to provide as many CODECs as they can legally license.
Correspondence also falls into this same boat. We have already discussed the longevity problems of online Email services, but consider that they are also virtual black boxes when it comes to searching. How long will they keep their data after the death of the user? How can their contents be archived in any meaningful fashion? Local Email systems such as Outlook do provide the ability to archive their data files with a proper backup process, but their contents are contained in proprietary data formats which can generally only be viewed by their host program. What will happen to these treasure troves of correspondence 50 or 100 years down the road, when Microsoft Outlook is some distant memory like the Commodore VIC-20 is to us today? The sad thing is, how could someone properly preserve their correspondence short of printing every Email to paper or to an open format such as PDF? Even the most diligent of us would tire of this process very quickly!
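There is at least one partial escape hatch worth sketching, with the caveat that it is an assumption about your setup and not a cure. Some mail systems can export a mailbox in the long-established plain-text "mbox" format (Google's Takeout export of Gmail and the Thunderbird mail program both use it). Given such a file, the sketch below uses Python's standard mailbox module to write every message out as its own plain-text .eml file and to build a simple index of subjects. The function name export_mbox and the output file naming are my own choices; it does nothing for Outlook's own data files, which would first need converting by some other tool.

```python
import mailbox
from pathlib import Path


def export_mbox(mbox_path, out_dir):
    """Write each message in an mbox file to its own .eml file.

    Returns a list of (file name, subject) pairs as a simple index.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    index = []
    # create=False: fail loudly instead of creating an empty mailbox.
    box = mailbox.mbox(str(mbox_path), create=False)
    try:
        for number, message in enumerate(box, start=1):
            name = f"message-{number:05d}.eml"
            (out_dir / name).write_bytes(message.as_bytes())
            index.append((name, str(message.get("Subject", ""))))
    finally:
        box.close()
    return index
```

Each resulting .eml file is ordinary text with its headers (sender, date, subject) intact, so it can be read in any text editor and backed up like any other document.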
Electronic diaries and journals, either online or proprietary software versions, fall into the same boat as Email. The contents of these caches of historical information are just as unsearchable and un-archivable as their correspondence peers.
So, in closing, the purpose of this article is to sound a warning about a problem which we documentarians and historians will encounter soon. I dare say that some have already run into some of these issues, depending upon the timeframe of history that they are researching. I, sadly, have no real answers, but it would behoove historians to pair up with software folks and attempt to find ways to tackle these problems now. Whatever technological improvements can be discovered to fix these problems need to work their way into the mainstream soon, so that John Q. Public’s contributions to our history will be available to, and usable by, future researchers.