Matroska and FFV1:
One File Format for Film and Video Archiving?

Summary

Having a single file format suitable for the preservation of both film and video has been a dream of many archives, especially smaller ones, since the beginning of digital moving images. Today, on the horizon, we can see a solution that promises exactly this capability, in a free and open-source environment.

The following article is an updated and written version of three presentations I made last year on this topic at:

the No Time to Wait: Standardising FFV1 and Matroska for Preservation symposium on 18–20 July 2016 in Berlin, together with Kieran O’Leary from the IFI Irish Film Archive;
The Reel Thing technical symposium on 18–20 August 2016 in Hollywood;
a meeting of Memoriav’s video specialists on 22 November 2016 in Bern (PDF of the German slides).

It presents the situation as at the beginning of 2017. My goal is to discuss the evident potential of this new solution, as well as to openly address the still-unresolved aspects of the Matroska container and the FFV1 video codec.

Definitions

I describe film as having single-image-based content, mainly represented in the RGB or R′G′B′ colour space at 4:4:4 chroma sampling,¹ which is currently often stored as TIFF files in a folder, as DPX files in an MXF container or as JPEG 2000 files in an AXF container, etc.

I call video stream-based content, mainly in the Y′C_BC_R colour space at 4:2:2 chroma subsampling,² which is currently often stored uncompressed in a MOV (QuickTime), an AVI or an MP4 container. In practice, the choice of container does not matter, because only the file header (and possibly the file footer) differs between containers, while the stream carrying the full image content is bit-for-bit identical. The file can be trans-muxed (i.e. de-muxed and then re-muxed) very quickly, because transcoding (i.e. the extremely time-consuming decoding and re-encoding) of the file’s content is not required. Re-wrapping can therefore easily be done when needed, e.g. during a data migration, without any significant additional cost. The passionate discussions about the best container choice – MP4, AVI or MOV – should now be relegated to the past. The important factor for an archive is that Y′C_BC_R 4:2:2 content is often used by the video and broadcast community to achieve the best quality in high-level professional production and post-production. An archive should therefore be able to provide historic content in a format that commercial clients can use, ideally without any transcoding.
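The difference between re-wrapping and transcoding can be sketched with FFmpeg, whose `-c copy` option copies the streams into a new container without decoding them. This is only a minimal illustration: the file names are hypothetical, the command is built but not executed, and whether a given container accepts a given stream still depends on the codec.

```python
import shlex

def remux_command(source, target):
    """Build an FFmpeg call that re-wraps the streams without re-encoding.

    The container is chosen by FFmpeg from the target file extension.
    """
    return ["ffmpeg", "-i", source, "-c", "copy", target]

# Hypothetical file names, for illustration only
cmd = remux_command("video_0001.mov", "video_0001.avi")
print(shlex.join(cmd))
```

With FFmpeg's default stream selection, only one video and one audio stream are carried over; adding `-map 0` would ask for all streams of the source.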

Standardisation

Standardisation is of course fundamental to every technical field. Different bodies have recently standardised, or are currently standardising, file formats that closely relate to the audio-visual preservation field:

The Society of Motion Picture and Television Engineers (SMPTE) has standardised the CineForm (or VC-5) and the ProRes video codecs. ProRes has been a highly relevant de facto standard in post-production, but Apple will soon stop supporting QuickTime on Windows – and probably on macOS in the not-too-distant future. While the popularity of GoPro’s CineForm/VC-5 seems to be increasing at present, sadly, the published standard does not contain all the relevant information needed to implement the codec.
A group of scholars, led by the University of Basel in Switzerland, is preparing a proposal for an archival version of the popular TIFF file format, which they plan to submit to the International Organization for Standardization (ISO) for approval. The format was initially called TIFF/A, like PDF/A, but Adobe, which claims some rights in TIFF, would not agree to this; the new format is therefore called TI/A, for Tagged Image for Archival.
The standardisation of EBML, Matroska (.mkv), FFV1 and FLAC is currently being undertaken by the IETF’s CELLAR working group. This is the main topic of my paper.

What do all these acronyms mean?

IETF: The Internet Engineering Task Force (IETF) is the body that governs the internet from the technical point of view, in particular the TCP/IP internet protocol suite. It develops and promotes voluntary internet standards, which are published as so-called Requests for Comments (RFCs). It is an open standards organisation, with no formal membership or membership requirements. All participants and managers are volunteers, though their work is usually funded by their employers or by sponsors.
CELLAR: One of its numerous working groups is called Codec Encoding for LossLess Archiving and Realtime transmission (CELLAR). This group is attempting to standardise a coherent set of open, transparent, self-descriptive and lossless formats. This is indeed an important mission for the open-source community to undertake for the archival world. CELLAR is standardising four different elements.
EBML: The first element is the Extensible Binary Meta-Language (EBML). You may think of it as a binary equivalent of XML: where XML encodes text, i.e. Unicode characters, EBML allows the encoding of binary data.
Matroska:³ The second element of CELLAR’s standardisation work is Matroska, a container or wrapper with the file extension “.mkv”. It can contain, among many other elements and possible formats, an image stream encoded by the FF Video Codec 1 (FFV1) and one or more audio streams encoded by the Free Lossless Audio Codec (FLAC). Matroska is actually a fork of an unfinished and abandoned container called Multimedia Container Format (MCF). Google’s WebM container is technically a fork – mathematically a subset – of Matroska.
FFV1: The third element is FFV1, a simple and efficient lossless, intra-frame-only video codec. With FFV1, content can be compressed losslessly to roughly 40% of its uncompressed storage size. This is a compression ratio similar to that achieved by the JPEG 2000 video codec, but FFV1 compresses faster than JPEG 2000, because its compression algorithm is much simpler. This is true for both the stream-based Y′C_BC_R 4:2:2 content, as used in the video and broadcast world, and the single-image-based R′G′B′ or RGB 4:4:4 linear or logarithmic content, as used by the cinema industry.⁴
FLAC: The fourth element is FLAC, an audio codec. While the Broadcast WAVE Format (BWF) is a good archival choice for sound, FLAC provides lossless compression as well, though this is less relevant for sound than for image, because of the very different file sizes. During CELLAR’s first year of activity, nothing was done on FLAC standardisation, but, as Google announced in January 2017 that its Chrome browser will support FLAC natively, I imagine this will become a priority during the year.
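To make the idea of EBML as a binary counterpart to XML a little more concrete, here is a small sketch of two low-level details: the four-byte ID that opens the EBML header at the start of a Matroska file, and the variable-length integers that EBML uses for element data sizes. The function names are my own, and the sketch ignores special cases such as the reserved "unknown size" values.

```python
EBML_HEADER_ID = bytes.fromhex("1A45DFA3")  # first four bytes of an EBML document

def looks_like_ebml(first_bytes):
    """True if the data starts with the EBML header element ID."""
    return first_bytes[:4] == EBML_HEADER_ID

def decode_size(data):
    """Decode one EBML variable-length size; return (value, bytes_used).

    The number of leading zero bits in the first byte, plus one, gives the
    total length; the marker bit is dropped and the rest is the value.
    """
    first = data[0]
    if first == 0:
        raise ValueError("sizes longer than 8 bytes are not handled here")
    length = 9 - first.bit_length()
    if len(data) < length:
        raise ValueError("truncated size field")
    value = first & (0xFF >> length)
    for byte in data[1:length]:
        value = (value << 8) | byte
    return value, length

print(looks_like_ebml(b"\x1a\x45\xdf\xa3\x9f"))
print(decode_size(b"\x9f"))
print(decode_size(b"\x40\x7f"))
```

Every element in an EBML file is such an ID followed by a size and the payload, which is what lets a parser skip elements it does not understand, much as an XML parser can skip unknown tags.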

When standardised by the IETF, this suite of objects provides the key to a non-proprietary, trans-generational, functional and stable deep-storage schema for data that can exist as fixed media (tape, HDD or SSD), or on servers, or in a complex and multi-level environment such as that known as “cloud storage”. It allows for the deployment of archive data across many storage environments and through generations of migration with a high degree of confidence and interoperability.

What is inside my DPX?

One of the current, so-called “raw” formats for scanner outputs is DPX (Digital Picture Exchange). Kieran O’Leary from the IFI Irish Film Archive offers an in-depth discussion of many aspects of the current situation in an outstanding blog post: Introduction to FFV1 and Matroska for Film Scans. I would mention here only one real advantage that FFV1 provides over DPX or TIFF, which do not contain any embedded fixity information: it stores a CRC-32 checksum for every slice of every frame. This is one key factor that allows institutions with only a small infrastructure to achieve professional preservation of audio-visual files.
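The benefit of per-slice checksums can be illustrated with a toy example. The sketch below is not FFV1's bitstream logic: it uses Python's `zlib.crc32` as a stand-in checksum and simply cuts a dummy buffer into equal parts, whereas FFV1 defines its own slice layout and CRC parameters in its specification. The point is only that damage can be located to one slice instead of invalidating the whole file.

```python
import zlib

def slice_checksums(frame, slices):
    """Cut a frame buffer into equal parts and checksum each part."""
    size = -(-len(frame) // slices)  # ceiling division
    return [zlib.crc32(frame[i:i + size]) for i in range(0, len(frame), size)]

def damaged_slices(frame, stored):
    """Return the indices of the slices whose checksum no longer matches."""
    current = slice_checksums(frame, len(stored))
    return [i for i, (now, then) in enumerate(zip(current, stored)) if now != then]

frame = bytes(range(256)) * 16        # 4096-byte stand-in for one frame
stored = slice_checksums(frame, 4)    # checksums written at encoding time

corrupted = bytearray(frame)
corrupted[3000] ^= 0x01               # flip a single bit, as bit rot might

print(damaged_slices(frame, stored))             # []
print(damaged_slices(bytes(corrupted), stored))  # [2]
```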

Kieran O’Leary also notes that FFV1 does not store (encode) and retrieve (decode) all metadata correctly at present. This is partly related to the fact that DPX can code the RGB information it holds in many different ways – which means the archive really must know what is inside the DPX files.

DPX is a strange construct, an umbrella that groups together many different encodings. The origin is the Cineon format developed by Kodak for the digital intermediate workflow in the early 1990s. At that time, films were shot on analogue film reels and screened in the same format. Cineon was designed for an interim step, i.e. for post-production purposes, not for conservation. Therefore a .dpx file may contain different encodings of RGB-based information:

log neg encoding: Examples: Cineon Printing Density (CPD/DPX), ARRI log C
log RGB encoding or quasi-log encoding: Examples: FilmStream (log₆₀), SI-log (Silicon Imaging, log₉₀), ARRI log F, Panalog (Panavision), S-log (Sony), REDlogFilm
gamma encoding or power function encoding: Examples: sRGB, CineGamma, Film Rec (Panasonic), hyper-gamma
scene-linear encoding: Example: ACES

As Kieran O’Leary says, at present, it is very hard – maybe even impossible – for archivists to know exactly what is inside the different DPX files, from different sources, held by their archive. Production and post-production processes don’t give high priority to technical metadata, perhaps because it is not particularly relevant if the colourist has to tweak the controls a little during the creative process. It is entirely the opposite for the archivist, of course: it is crucial to preserve the document as it is, without any additional creative work.
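A first step towards knowing what a DPX file claims to contain is to read its header. The sketch below looks at the magic number and at a few fields of the first image element. The byte offsets and code values are quoted from my reading of SMPTE 268M and should be verified against the standard before being relied upon; the function name is hypothetical. A header field can of course be wrong or left at "user-defined", which is exactly the problem described above.

```python
import struct

# Assumed subset of the transfer characteristic codes (verify against SMPTE 268M)
TRANSFER = {0: "user-defined", 1: "printing density", 2: "linear",
            3: "logarithmic", 6: "ITU-R BT.709"}

def peek_dpx(header):
    """Read a few descriptive fields from the first bytes of a DPX file."""
    magic = header[:4]
    if magic == b"SDPX":
        endian = ">"   # big-endian file
    elif magic == b"XPDS":
        endian = "<"   # little-endian file
    else:
        raise ValueError("not a DPX file")
    # Assumed offsets: 772 pixels per line, 776 lines per image element
    width, height = struct.unpack_from(endian + "II", header, 772)
    # Assumed offsets: 800 descriptor, 801 transfer, 802 colorimetric, 803 bit depth
    descriptor, transfer, _colorimetric, bit_depth = struct.unpack_from("4B", header, 800)
    return {
        "width": width,
        "height": height,
        "descriptor": descriptor,
        "transfer": TRANSFER.get(transfer, f"code {transfer}"),
        "bit_depth": bit_depth,
    }

# Synthetic header for demonstration; a real one would be read from a file
demo = bytearray(2048)
demo[0:4] = b"SDPX"
struct.pack_into(">II", demo, 772, 2048, 1556)
demo[800:804] = bytes([50, 1, 1, 10])
info = peek_dpx(bytes(demo))
print(info)
```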

File Formats

Archive Master and Mezzanine

The Matroska container and the FFV1 video codec are good choices for single-image-based content when making archive masters. A resolution of 2K, or sometimes 4K, an RGB colour space, 4:4:4 chroma sampling and a bit depth of 16 bit per colour channel are the usual choices.

For stream-based content, the Matroska container and the FFV1 video codec are also good choices for the archive master. An HD resolution (with pillar-boxing or letter-boxing if required), the Y′C_BC_R colour space, 4:2:2 chroma subsampling and a bit depth of 10 bit are generally considered best practice.
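The storage consequences of these parameters can be estimated with a little arithmetic. The sketch below is a rough calculation only: the 2048 × 1556 frame size, the frame rates and the tight packing of 10-bit samples are my assumptions, file headers are ignored, and the 40% figure is the approximate FFV1 ratio quoted earlier in this article, not a measurement.

```python
def bytes_per_frame(width, height, bits_per_sample, samples_per_pixel):
    """Uncompressed size of one frame, ignoring headers and padding."""
    return width * height * samples_per_pixel * bits_per_sample // 8

def terabytes_per_hour(frame_bytes, fps, ratio=1.0):
    """Storage for one hour of picture, in decimal terabytes."""
    return frame_bytes * fps * 3600 * ratio / 1e12

# 4:4:4 carries 3 samples per pixel; 4:2:2 carries 2 on average
film = bytes_per_frame(2048, 1556, 16, 3)    # assumed 2K scan, 16 bit RGB
video = bytes_per_frame(1920, 1080, 10, 2)   # HD, 10 bit, 4:2:2

for label, size, fps in (("2K film scan", film, 24), ("HD video", video, 25)):
    raw = terabytes_per_hour(size, fps)
    ffv1 = terabytes_per_hour(size, fps, ratio=0.4)
    print(f"{label}: {raw:.2f} TB/h uncompressed, about {ffv1:.2f} TB/h at 40%")
```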

The Matroska container can also be used for sound, with FLAC as the audio codec. Good parameters are a sample rate of 96 kHz for preservation and mezzanine, and 48 kHz for access,⁵ with quantisation of 24 bit for preservation and 16 bit for access.
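As an illustration of how such an archive master might be produced, the sketch below builds an FFmpeg command that encodes the picture to FFV1 version 3 and the sound to FLAC in a Matroska container. The options shown exist in FFmpeg's FFV1 encoder, but the particular values (16 slices, the pixel format) are assumptions that would need to be adapted to the source material, and the file names are hypothetical. The command is built but not executed.

```python
import shlex

def archive_master_command(source, target, pix_fmt="yuv422p10le"):
    """Build an FFmpeg call for an FFV1/FLAC archive master in Matroska."""
    return [
        "ffmpeg", "-i", source,
        "-c:v", "ffv1",
        "-level", "3",        # FFV1 version 3
        "-g", "1",            # intra-frame only: every frame is a key frame
        "-slices", "16",      # assumed slice count
        "-slicecrc", "1",     # store a CRC for every slice
        "-pix_fmt", pix_fmt,  # 10-bit 4:2:2 by default; adapt for RGB film scans
        "-c:a", "flac",
        target,
    ]

# Hypothetical file names, for illustration only
cmd = archive_master_command("capture_0001.mov", "master_0001.mkv")
print(shlex.join(cmd))
```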

The advantage is having one container format for both single-image-based and stream-based content. Unfortunately, it is too early to recommend the same format for both the archive master and the mezzanine, because, though this may change in the near future, FFV1 is at present natively supported by only a few applications.

Access

The Matroska container is currently not popular enough for it to be recommended for access. Matroska’s subset WebM is being implemented more and more in modern browsers, but it needs the VP9 video codec. In practice, MP4 is therefore currently the better choice. An HD resolution (with pillar-boxing or letter-boxing if necessary) can be used for screening on a current television or computer monitor. The “natural” video codec would be H.264, encoding Y′C_BC_R with 4:2:0 chroma subsampling for the image.⁶ Unfortunately, AAC (Advanced Audio Coding) is in practice the only audio codec that can be relied upon in the MP4 container. We recommend a sample rate of 48 kHz and a quantisation of 16 bit.
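An access copy along these lines could be produced with FFmpeg as sketched below. The quality setting (`-crf 18`) is my assumption of a light compression, not a recommendation from this article; scaling to HD is left out, and the file names are hypothetical. The command is built but not executed.

```python
import shlex

def access_copy_command(source, target, crf=18):
    """Build an FFmpeg call for an H.264/AAC access copy in MP4."""
    return [
        "ffmpeg", "-i", source,
        "-c:v", "libx264",
        "-pix_fmt", "yuv420p",   # 8-bit 4:2:0, widely playable
        "-crf", str(crf),        # assumed quality level; lower means less compression
        "-c:a", "aac",
        "-ar", "48000",          # 48 kHz audio
        target,
    ]

# Hypothetical file names, for illustration only
cmd = access_copy_command("master_0001.mkv", "access_0001.mp4")
print(shlex.join(cmd))
```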

Outlook

Though some issues remain unresolved, Matroska with FFV1 (and FLAC) is on the way to becoming a solid alternative – especially for small archives or archives with extremely limited resources – for preservation masters and mezzanine files. It is too early to recommend a change for access.

Both SMPTE and the Library of Congress are evaluating data implementations to accomplish the same goals. It is important for the entire community of archives, from the largest state institutions and media companies to the most modest and local repositories, to understand the economic and technical value that collective, open-source solutions can offer.

We are designing and implementing systems that will retain data over timespans substantially longer than the life of motion picture film.

Reto Kromer

Notes

1: The prime (′) indicates that the value is gamma-corrected, i.e. adapted to human vision and not to physical reality. It allows the same number of steps on the dark side as on the light side of the so-called medium grey. Note that this is not an apostrophe.
2: Y′C_BC_R is sometimes written YCbCr and often incorrectly YUV, which is actually the colour space used for analogue PAL video and not for digital video.
3: I’m sorry, Matroska is not an acronym.
4: For R′G′B′ or RGB 4:4:4 at 16 bit per colour channel the compression rate could be improved a little. Currently the implementation of Bayer-filter-based formats is just an idea; nobody is actively working on it.
5: I don’t believe the so-called “CD quality” at 44.1 kHz to be a good choice. Its storage economy is minimal, while its sound quality is significantly diminished.
6: While the H.264 codec’s definition allows uncompressed coding, as far as we know, these files can only be handled by FFmpeg-based players. We therefore suggest compressing a wee bit.

I wish to acknowledge the help provided by Kieran O’Leary, Adrian Wood, Grover Crisp and Michael Friend.

Having graduated in mathematics and computer science, Reto Kromer became involved in audio-visual conservation and restoration more than thirty years ago. He was head of preservation at the Swiss National Film Archive (in charge of cataloguing, conservation and restoration of the film collection), and lecturer at the University of Lausanne and the Academy of Fine Arts Vienna.

He has been running his own preservation company, AV Preservation by reto.ch, and lecturing at the Bern University of Applied Sciences. His current research includes colour spaces, look-up tables and codec programming and emulation.

This article was published in the Journal of Film Preservation, n. 96 (April 2017), FIAF, Brussels, Belgium, p. 41–45.

2018-04-26
