Digital Archives: Choosing Sustainable File Formats

The sustainability of digital materials depends on standard file formats that will last for the long term. As technology changes rapidly, archivists and other information professionals need to use a narrow set of sustainable file formats to retain information between systems and programs. As new formats develop, sustainability must become part of the design process from the beginning to make the efforts of digital preservation successful.

Cultural heritage institutions have led efforts to identify formats that are promising for long-term sustainability, and to develop strategies for sustaining these formats—including recommendations about the tools and documentation needed for their management. 

Sustainability Factors

The Library of Congress has developed criteria for predicting which file formats will prove sustainable in digital archives. The first is adoption: the format should be in widespread use. File formats should also be transparent, so that files can be identified and their contents checked. Disclosure, meaning whether the file format specifications are publicly available, is also analyzed. Is the format an open standard, fully documented, partially documented, or barely documented? Documentation of the standard is essential, and so is self-documentation: the format should support metadata stored within the file itself. File formats should also be interoperable, functioning within a variety of services, and should be independent of specific external hardware or software. Lastly, the format should be “open,” because digital rights protections, licensing, patents, and intellectual property issues complicate preservation.
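Transparency, in practice, often starts with identifying a file from its opening bytes rather than trusting its extension. The following is a minimal sketch in Python, not a substitute for a dedicated format identification tool. The names `identify_format` and `identify_file` and the small signature table are illustrative assumptions, and the table covers only formats mentioned in this post.

```python
from typing import Optional

# Leading byte signatures for a few formats (illustrative subset, not exhaustive).
SIGNATURES = [
    (b"II*\x00", "TIFF (little-endian)"),
    (b"MM\x00*", "TIFF (big-endian)"),
    (b"%PDF-", "PDF"),
    (b"\x1a\x45\xdf\xa3", "EBML container (Matroska/WebM)"),
]


def identify_format(header: bytes) -> Optional[str]:
    """Return a format label for the first bytes of a file, or None if unknown."""
    for magic, label in SIGNATURES:
        if header.startswith(magic):
            return label
    # Many (not all) QuickTime/MP4-family files carry an 'ftyp' box at offset 4.
    if header[4:8] == b"ftyp":
        return "ISO base media / QuickTime (MOV, MP4)"
    return None


def identify_file(path: str) -> Optional[str]:
    """Read the first 16 bytes of a file and identify it by signature."""
    with open(path, "rb") as handle:
        return identify_format(handle.read(16))
```

A result of None here means only that the file did not match this short list, not that the format is unsustainable.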

File Format Assessments

The British Library, the Library of Congress, Harvard Library, the National Archives and Records Administration (NARA), and the Digital Preservation Coalition (DPC) analyzed commonly used file formats. The organizations wished to document gaps in current best practice, understanding, and capability in working with specific file formats. Their file format assessment is available on the DPC website. These organizations have identified ways in which format creators can produce more evergreen formats.

Some Definitive Conclusions

Some file formats have become preferred, such as the use of TIFF (Tagged Image File Format) to create preservation master images for many digitization programs. TIFFs are flexible, stable, and widely used. More than likely, this file format will continue to be the standard in the long term.

In a similar vein, PDF/A, a variant of PDF, is a standard file format for documents. PDF was originally a proprietary file format but became an open standard. A PDF reproduces the visual appearance of the source material, such as a Word document. The PDF/A format adds a layer of preservation by being self-contained: all the information needed to display a document, including fonts, must be embedded in the file, which cannot rely on external sources. The PDF/A standard also mandates metadata that follows specified standards.
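Because PDF/A requires embedded metadata, a file normally declares which part of the standard it claims to follow in its XMP packet. The sketch below, in Python, scans raw bytes for that declaration. It assumes the conventional `pdfaid` namespace prefix and an XMP packet stored as uncompressed text; the function name `declared_pdfa_part` is hypothetical. It reads a claim only and does not check fonts, color, or any other conformance rule, which requires a dedicated PDF/A validator.

```python
import re
from typing import Optional

# XMP can express pdfaid:part either as an element or as an attribute.
_PART_PATTERN = re.compile(
    rb"<pdfaid:part>\s*(\d)\s*</pdfaid:part>"
    rb"|pdfaid:part\s*=\s*[\"'](\d)[\"']"
)


def declared_pdfa_part(data: bytes) -> Optional[int]:
    """Return the PDF/A part number a file claims in its XMP metadata, if any.

    This only reads the claim; it does not validate conformance.
    """
    match = _PART_PATTERN.search(data)
    if match is None:
        return None
    # Exactly one of the two capture groups is set, depending on the syntax.
    digit = match.group(1) or match.group(2)
    return int(digit)
```

A file that returns None may still be a perfectly usable PDF; it simply makes no PDF/A claim that this scan can see.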

Others Still in Flux

Other file format standards are still being agreed upon, especially for complex digital objects like video. The formats previously used for video were lossless Motion JPEG 2000 and uncompressed video.

Currently, for video files, some archivists prefer uncompressed V210 Video Picture Encoding with linear pulse-code modulation (LPCM) audio in a MOV wrapper. V210 is a digital, color-difference component video picture format, and LPCM is a standard format used for digital audio applications. Others prefer open-source FFV1/LPCM in a Matroska Multimedia Container (MKV) wrapper. FF video codec 1, or FFV1, is a lossless intra-frame video codec. MKV is an open-standard format that can hold an unlimited number of video, audio, picture, or subtitle tracks in one file. To date, no consensus exists in the archival community as to which file formats or codecs should be used to preserve digital video. As an archivist with limited experience in preserving digital video, I look forward to more definitive agreements in this area.
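For readers curious what the FFV1/LPCM-in-Matroska option looks like in practice, here is a hedged sketch in Python that builds a command line for the ffmpeg tool. ffmpeg is not mentioned in the assessments above and is my assumption here, as are the helper name `ffv1_mkv_command` and the choice of 24-bit audio. The function only assembles the arguments and does not run anything.

```python
from typing import List


def ffv1_mkv_command(source: str, target: str) -> List[str]:
    """Build an ffmpeg argument list for FFV1 video and LPCM audio in Matroska."""
    if not target.lower().endswith(".mkv"):
        raise ValueError("target should use the .mkv extension")
    return [
        "ffmpeg",
        "-i", source,          # input file
        "-c:v", "ffv1",        # lossless FFV1 video codec
        "-level", "3",         # FFV1 version 3
        "-g", "1",             # every frame is a keyframe (intra-frame only)
        "-slicecrc", "1",      # per-slice checksums to help detect corruption
        "-c:a", "pcm_s24le",   # 24-bit little-endian LPCM audio (an assumption)
        target,
    ]
```

The resulting list can be passed to `subprocess.run(..., check=True)` on a machine where ffmpeg is installed. Settings should follow your institution's own preservation policy rather than this example.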

Evolving Digital Sustainable File Formats

No matter your preservation actions (migration to new formats, emulation of current software on future computers, or a hybrid approach), sustainable file formats are crucial. Articulating our needs, analyzing our options, and agreeing upon formats help preserve digital files as authentic, reliable resources for future generations. Lasting file formats influence the feasibility of protecting content in the face of changes to the technological environment in which users and repositories operate.

This post was originally published on Lucidea's blog.