Term: File format

Set of structural conventions that define a wrapper, formatted data, and embedded metadata, and that can be followed to represent images, audiovisual waveforms, texts, etc., in a digital object. The wrapper component on its own is often colloquially called a file format. The formatted data may consist of one or more encoded binary bitstreams for such entities as images or waveforms, and/or textually-encoded data, often marked up with XML or HTML, for texts. The embedded metadata may be skeletal or extensive.
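The three components named above can be pictured as a simple data structure. The following is a minimal illustrative sketch, not part of the FADGI definition: the class names `Bitstream` and `DigitalObject`, their fields, and the helper `has_all_three_elements` are hypothetical and chosen only to mirror the wording of the definition.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Bitstream:
    """One unit of formatted data, such as an encoded image or waveform.

    Hypothetical illustration only; textually encoded data (e.g. XML)
    can be carried as bytes in the same way.
    """
    encoding: str   # label for the encoding, e.g. "LPCM" or "XML"
    payload: bytes  # the encoded data itself


@dataclass
class DigitalObject:
    """A wrapper holding formatted data plus embedded metadata."""
    wrapper: str                                           # e.g. "WAVE" or "TIFF"
    bitstreams: List[Bitstream] = field(default_factory=list)
    metadata: Dict[str, str] = field(default_factory=dict)  # skeletal or extensive

    def has_all_three_elements(self) -> bool:
        # True only when wrapper, formatted data, and metadata are all present.
        return bool(self.wrapper) and bool(self.bitstreams) and bool(self.metadata)


# Usage: an audio object with one waveform bitstream and skeletal metadata.
audio = DigitalObject(
    wrapper="WAVE",
    bitstreams=[Bitstream(encoding="LPCM", payload=b"\x00\x01\x02\x03")],
    metadata={"title": "Example recording"},
)
```

In this sketch, an object with an empty `metadata` dictionary or no bitstreams would still be a valid `DigitalObject`, but `has_all_three_elements()` would report that it does not carry all three components.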

This definition has been tailored to fit the planning activities carried out by the FADGI File Format Subgroup. In the wider digital library community, the broad concepts underlying the FADGI definition are often subsumed under the generic term "format," although this usage does not generally require that all three elements (wrapper, bitstream, and metadata) be present at the same time. Here are two definitions of "format" from authoritative bodies in the field:

  • A set of syntactic and semantic rules for mapping between an information model and a serialized bit stream. Many formats can be grouped into loose categories, or families, sharing a general set of encoding rules that are further restricted or extended for the specific format or profile. A format version is considered a profile. (Combined definition from the Unified Digital Formats Registry (UDFR), slide 7 in the Unified Digital Formats Registry Stakeholder Meeting PowerPoint Slides; and JHOVE2, JHOVE2 glossary.)
  • The internal structure and encoding of a digital object, which allows it to be processed, or to be rendered in human-accessible form. A digital object may be a file, or a bitstream embedded within a file. (From the U.K. National Archives Digital Preservation Technical Paper Automatic Format Identification Using PRONOM and DROID.)

Additional definitions of "format" have been offered by the InterPARES 2 Project and the Library of Congress Sustainability of Digital Formats Planning Web site.

