NARA digital preservation file format risk analysis and preservation plans

U.S. National Archives Digital Preservation Framework

The National Archives and Records Administration is seeking public comment and discussion on our Digital Preservation Framework, which consists of a Risk and Prioritization Matrix and 15 File Format Preservation Action Plans. The public is encouraged to join the discussion, September 16 through November 1, 2019, here on GitHub.

Background

The National Archives 2018–2022 Strategic Plan embraces the primacy of electronic records. Our vision is to ensure cutting-edge access to extraordinary volumes of government information and unprecedented engagement to bring greater meaning to the American experience. To do so, we must collaborate with other Federal agencies, the private sector, and the public to ensure records and archives thrive in a digital world.

Digital preservation is critical to this work. It becomes even more important because of the June 2019 direction (M-19-21, Transition to Electronic Records) to Federal agencies to transition business processes and recordkeeping to a fully electronic environment and to end the National Archives’ acceptance of paper records by December 31, 2022.

Our digital preservation subject matter experts, led by Director of Digital Preservation Leslie Johnston, have been hard at work to prepare the National Archives for this change. They have formalized a set of documents that describe how we identify risks to digital files and prioritize them for action, and they have created specific plans for preserving the many file formats in our holdings.

To date, NARA holds about 1.5 billion files representing more than 350 file formats. These files can be categorized into 15 general record types. The vast majority are email, followed by JPEG and TIFF images and plain ASCII text.

We are posting these documents because we want to share what we are doing, and because we need your help.

The NARA Risk and Prioritization Matrix

We use the Risk and Prioritization Matrix to measure the preservation risk of digital file formats in our holdings and to assess formats we’ll get in the future. By answering questions related to the ability to preserve and sustain a file format, we generate a numeric risk score.

The sustainability factors each have a different level of impact (positive or negative) on a format’s risk level, with several high-impact factors affecting the calculations the most.

High Impact Factors:

  • Positive:
    • A high level of adoption, the availability of format documentation, the ability for a file to document itself, a lack of software dependencies, and no requirement for technical protections (such as encryption) provide the most positive impact.
  • Negative:
    • Format age and required hardware and/or software dependencies have the most negative impact.

The answers to all the questions have numeric values, which are used to calculate an overall Risk Rating and a general risk level: Low Risk, Moderate Risk, or High Risk.
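As a rough illustration of this kind of scoring, the sketch below adds up per-question values and maps the total to a risk level. The factor names, the numeric values, and the thresholds are all assumptions made for illustration; they are not taken from the NARA matrix. The sketch follows the matrix's convention that a lower number means greater risk.

```python
# Hypothetical values per factor: positive values reduce risk, negative values add risk.
# Names and numbers are illustrative assumptions, not NARA's actual matrix values.
FACTOR_VALUES = {
    "widely_adopted": 3,
    "documentation_available": 3,
    "self_documenting": 3,
    "no_software_dependencies": 3,
    "no_technical_protections": 3,
    "old_format": -3,
    "required_dependencies": -3,
}


def risk_rating(answers):
    """Sum the values of every factor that was answered True."""
    return sum(FACTOR_VALUES[name] for name, yes in answers.items() if yes)


def risk_level(rating, high_below=0, low_from=6):
    """Map a rating to a level; lower ratings mean greater risk (thresholds are assumed)."""
    if rating < high_below:
        return "High Risk"
    if rating < low_from:
        return "Moderate Risk"
    return "Low Risk"


# A format that is old and depends on specific hardware or software.
legacy = {"old_format": True, "required_dependencies": True}
print(risk_rating(legacy), risk_level(risk_rating(legacy)))  # -6 High Risk
```

Keeping the values in a single table means the weights can be tuned without touching the scoring logic, which is one plausible way to support a matrix that is revised over time.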

We also prioritize the formats in our holdings for preservation actions. The Prioritization Matrix is modeled on the traditional preservation model of Value/Use/Need. For our purposes, we use Need/Use/Readiness to determine our preservation priorities. The Risk Rating goes into the “Need” column, representing the Need for a preservation action. “Use” is represented by how common the format is in our holdings, approximating the level of use of the format in the permanent records of the Federal Government. There is no way to map the “Value” of the holdings to individual file formats because record sets/series typically contain multiple file formats. Instead, we have replaced Value with Readiness, or the capacity for NARA to process and convert formats. We assess Readiness based on the general availability of tools for format migration that do not alter the content in unacceptable ways, as well as our capacity to perform acceptable migrations.

For both the Risk Rating and the Prioritization Rating, the lower the number, the greater the risk or need.
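The Need/Use/Readiness ordering can be sketched in a few lines. The example below is a minimal illustration only: the format names are placeholders, the Use and Readiness scales are assumed to run in the same direction as the Risk Rating (lower is more urgent), and combining the three columns with a plain sum is an assumption, not the rule the matrix uses.

```python
from dataclasses import dataclass


@dataclass
class FormatEntry:
    name: str
    need: int       # the Risk Rating; lower means greater risk
    use: int        # assumed scale: lower means more common in the holdings
    readiness: int  # assumed scale: lower means tools and capacity are available

    @property
    def priority(self):
        # Assumed combination rule: a plain sum, so lower totals rank first.
        return self.need + self.use + self.readiness


def prioritize(entries):
    """Return entries ordered from most to least urgent (lowest total first)."""
    return sorted(entries, key=lambda entry: entry.priority)


# Placeholder formats with made-up scores.
holdings = [
    FormatEntry("Format A", need=8, use=2, readiness=1),
    FormatEntry("Format B", need=-4, use=3, readiness=2),
    FormatEntry("Format C", need=1, use=1, readiness=4),
]

for entry in prioritize(holdings):
    print(entry.name, entry.priority)
# Format B 1
# Format C 6
# Format A 11
```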

We are sharing our current completed matrix as a template for use by any interested organization.

For a more technical discussion of the development and use of the Risk and Prioritization Matrix, a conference paper presented at the 2018 iPRES International Digital Preservation meeting is available at: https://osf.io/ctw3g/.

File Format Preservation Action Plans

We are also sharing our recently completed draft File Format Preservation Action Plans.

These plans correspond to the record types in the NARA Transfer Guidance, which outlines Preferred and Acceptable file formats for the transfer of electronic records from agencies: https://www.archives.gov/records-mgmt/policy/transfer-guidance-tables.html.

The 15 record types are:

  • CAD/Digital Design/Vector
  • Digital Audio
  • Digital Still Image
  • Email
  • GIS
  • Moving Image: Digital Cinema
  • Moving Image: Digital Video
  • Multimedia
  • Presentation/Publishing
  • Software Code
  • Structured Data: Databases
  • Structured Data: Generic
  • Structured Data: Spreadsheets
  • Textual/Word Processing
  • Web Records

Each plan contains a list of “Essential Characteristics,” also known as “Significant Properties,” which identify the characteristics of a record (its Appearance, Behavior, Context, and Structure) that should, if possible, be retained in any format migration. These characteristics are important for ensuring that record format migrations have the highest possible fidelity.

Each plan also contains a list of related file formats currently identified in our holdings and identifies:

  • Current Risk Rating and Prioritization Rating
  • Links to specifications and documentation
  • Recommended preservation migration actions, including no action when appropriate
  • Recommended tools for processing and preservation migrations
  • NARA Transfer Guidance for this record type
  • Formats often provided to researchers through reference requests and through the National Archives Catalog

How Can You Help?

We are sharing these documents to be transparent about our approach to digital preservation and to solicit input from Federal agencies, records managers, archivists, digital professionals, researchers, private industry, other stakeholders and allied professionals, and members of the public to help us identify ways we can improve them. In particular, we are hoping to get feedback on the following topics:

  • What revisions can you suggest to the proposed processing and preservation actions for the formats?
    • Are the Essential Characteristics for each record type comprehensive enough for digital preservation?
    • Are the proposed preservation actions for the formats technically appropriate?
  • Are there appropriate tools for processing and preservation of specific formats that we do not have listed?
  • What can you suggest in terms of appropriate public access versions of the formats?
  • Are there other formats we haven’t identified that need plans?

Please use the issues feature on this site to leave a specific comment or question or to just start a discussion. You can read more about how to contribute here. NARA staff will respond as quickly as they can.

The Digital Preservation Framework documents will be open for comment until November 1, 2019. After that date, we will review all the feedback and update the matrix and plans to incorporate what you've told us. We will then release final versions publicly.

We expect to update the matrix and plans on an ongoing basis in response to changing risks and new technologies and formats.

Thank you for your assistance.
