The Guardian - UK
Mariot Chauvin

Discover new compression innovations Brotli and Zstandard

Mathematician Claude E. Shannon, inventor of information theory. Photograph: Alfred Eisenstaedt/Time Life Pictures/Getty Images

In 1948, Claude Shannon published an extraordinary article that defined, for the first time, a mathematical model of information. It determined the maximum quantity of information that can be transferred over a channel, now called the Shannon limit, as well as the limits of lossless data compression.
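
To make the compression limit concrete, here is a minimal sketch that computes the order-0 Shannon entropy of a byte string, that is, the average number of bits per byte needed by an ideal coder that treats every byte as independent. The function names and the sample input are illustrative assumptions, not something taken from Shannon's article. Real compressors also exploit repeated sequences, so they can do better than this particular bound on data with long-range structure.

```python
import math
from collections import Counter


def shannon_entropy_bits_per_byte(data: bytes) -> float:
    """Order-0 Shannon entropy of `data`, in bits per byte."""
    if not data:
        return 0.0
    counts = Counter(data)
    total = len(data)
    # H = -sum(p * log2(p)) over the observed byte frequencies.
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def min_size_bytes(data: bytes) -> float:
    """Smallest output size for a coder that models bytes as independent."""
    return shannon_entropy_bits_per_byte(data) * len(data) / 8


sample = b"aaaaaaab"  # hypothetical input: seven 'a' bytes and one 'b'
print(f"{shannon_entropy_bits_per_byte(sample):.3f} bits per byte")
print(f"{min_size_bytes(sample):.2f} bytes at best under this model")
```

A string made of a single repeated byte has an entropy of 0 bits per byte, while a string in which all 256 byte values are equally frequent reaches the maximum of 8.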

Since then, engineers have been trying to approach those limits while dealing with two other practical factors: the speed of compression and the speed of decompression.

This article presents two fairly recent algorithms and shows how you can already benefit from using them.

Zstandard


Zstandard is both a new compression algorithm and a reference implementation, designed to perform extremely well on modern hardware. It is a general-purpose compressor suitable for a variety of data types.

While an algorithm usually trades off compression ratio, compression speed, or decompression speed against the others, Zstandard is designed to be good at all three.

Compared to zlib (the wrapper and de facto standard implementation of the deflate algorithm), which tries to balance compression ratio and speed, Zstandard performs as follows:

  • At the same compression ratio, it compresses ~3-5x faster
  • At the same compression ratio, it decompresses ~2-3x faster
  • At the same compression speed, it compresses to 10-15 percent smaller files
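
The ratio-versus-speed trade-off that these figures refer to can be observed with zlib alone. The sketch below uses only Python's standard `zlib` module and a made-up payload (an assumption standing in for an article body) to print the compression ratio and time at three zlib levels. It does not benchmark Zstandard and is not meant to reproduce the numbers above; it only shows the kind of curve against which they are measured.

```python
import time
import zlib

# Hypothetical payload standing in for an article body.
payload = b"<p>The quick brown fox jumps over the lazy dog.</p>\n" * 2000


def measure(level: int, data: bytes) -> tuple[int, float]:
    """Return (compressed size in bytes, seconds taken) for one zlib level."""
    start = time.perf_counter()
    compressed = zlib.compress(data, level)
    elapsed = time.perf_counter() - start
    # Lossless compression must restore the input exactly.
    assert zlib.decompress(compressed) == data
    return len(compressed), elapsed


for level in (1, 6, 9):
    size, seconds = measure(level, payload)
    ratio = len(payload) / size
    print(f"level={level} ratio={ratio:.1f} time={seconds * 1000:.2f} ms")
```

Timings depend on the machine and on the data, so the output is best read as a relative comparison between levels.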

Zstandard achieves this performance thanks to several design decisions.

At the Guardian we now use Zstandard instead of zlib (through the Java JNI binding) to compress articles in our most critical component, the publication pipeline.

Brotli


Brotli is a general-purpose lossless compression algorithm that has recently been standardised as an HTTP content encoding. It was developed by Google and has the following characteristics:

  • a sliding window of between 1 KB and 16 MB
  • a static dictionary with around 13,500 words or syllables in 6 languages and common phrases in HTML and JavaScript
  • 121 transforms to combine entries in the dictionary
  • a Huffman-based entropy encoder
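
To see why a dictionary shared in advance helps with small web payloads, here is a sketch that uses zlib's preset dictionary feature from Python's standard library. This is an analogy only: it is deflate with a dictionary chosen for the example, not Brotli's built-in dictionary or its transforms. `SHARED_DICT`, `document` and the two helper functions are hypothetical names introduced for illustration.

```python
import zlib

# Hypothetical dictionary that both sides agree on ahead of time.
SHARED_DICT = (
    b'<!DOCTYPE html><html><head><meta charset="utf-8">'
    b"<title></title></head><body></body></html>"
)

# Hypothetical small page that reuses most of the dictionary's markup.
document = (
    b'<!DOCTYPE html><html><head><meta charset="utf-8">'
    b"<title>Hi</title></head><body>Hello</body></html>"
)


def compress_with_dict(data: bytes, zdict: bytes) -> bytes:
    """Deflate `data`, letting matches point into the preset dictionary."""
    compressor = zlib.compressobj(level=9, zdict=zdict)
    return compressor.compress(data) + compressor.flush()


def decompress_with_dict(blob: bytes, zdict: bytes) -> bytes:
    """Inflate `blob`; the same dictionary must be supplied again."""
    decompressor = zlib.decompressobj(zdict=zdict)
    return decompressor.decompress(blob) + decompressor.flush()


plain = zlib.compress(document, 9)
primed = compress_with_dict(document, SHARED_DICT)
assert decompress_with_dict(primed, SHARED_DICT) == document
print(f"original={len(document)} plain={len(plain)} with_dict={len(primed)}")
```

The dictionary itself is never transmitted, which is what makes the approach pay off on short documents, where the compressor has little history of its own to draw on.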

Brotli trades off compression speed in exchange for faster decompression and a slightly better compression ratio.

Compared to gzip (a thin wrapper around zlib; if you are confused, that is to be expected), it decompresses about 20% faster at the same compression ratio.
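
The relationship between gzip and zlib is easier to see in code. In the sketch below, which uses only Python's standard library and a made-up input string, the same data is wrapped once in the zlib container and once in the gzip container. The zlib library can then read the gzip container directly when told to expect gzip framing.

```python
import gzip
import zlib

# Made-up input for the demonstration.
data = b"Discover new compression innovations Brotli and Zstandard. " * 50

zlib_framed = zlib.compress(data)  # zlib header + deflate stream + Adler-32
gzip_framed = gzip.compress(data)  # gzip header + deflate stream + CRC-32 + size

# wbits = 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
assert zlib.decompress(gzip_framed, wbits=16 + zlib.MAX_WBITS) == data
assert zlib.decompress(zlib_framed) == data

print(gzip_framed[:2].hex())  # → 1f8b, the gzip magic number
```

Both containers carry a deflate stream; they differ in the header and checksum placed around it.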

Although Brotli uses a less efficient entropy encoder than Zstandard, it is already implemented and available in Google Chrome, Mozilla Firefox and Opera, and support is in development in Microsoft Edge.

Support of Brotli by web browsers on 28-11-2016. Illustration: caniuse.com

Support has also started to be added to web clients and servers.

At the Guardian we use the Play framework, which provides a built-in gzip filter but not yet a Brotli one, so I decided to write one.
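
At its core, a compression filter of this kind has to decide which encoding to apply based on the request's `Accept-Encoding` header, in which Brotli is advertised with the token `br`. The sketch below illustrates that decision in Python. It is not the Play filter mentioned above: the function names and the server preference order are assumptions made for the example, and it deliberately ignores the client's relative ranking of encodings beyond checking whether each one is allowed.

```python
def parse_accept_encoding(header: str) -> dict[str, float]:
    """Map each coding in an Accept-Encoding header to its q-value."""
    weights: dict[str, float] = {}
    for item in header.split(","):
        parts = [part.strip() for part in item.split(";")]
        coding = parts[0].lower()
        if not coding:
            continue
        q = 1.0  # a coding listed without a q parameter is fully acceptable
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed weight as "not acceptable"
        weights[coding] = q
    return weights


# Server-side preference order (an assumption for this sketch).
SERVER_PREFERENCE = ("br", "gzip")


def choose_encoding(header: str) -> str:
    """Pick the first server-preferred coding the client accepts."""
    weights = parse_accept_encoding(header)
    for coding in SERVER_PREFERENCE:
        # Fall back to the wildcard entry when the coding is not listed.
        if weights.get(coding, weights.get("*", 0.0)) > 0:
            return coding
    return "identity"  # send the response uncompressed


print(choose_encoding("gzip, deflate, br"))   # → br
print(choose_encoding("gzip, deflate"))       # → gzip
print(choose_encoding("br;q=0, gzip;q=0.8"))  # → gzip
```

Falling back to `identity` keeps older clients working, since they simply receive the uncompressed response.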

Google’s Brotli repository doesn’t yet provide a reference Java implementation; however, you can use jbrotli, a JNI binding.

CDNs have recently improved their support as well.

At the Guardian we have been successfully using the Play framework Brotli filter on an internal tool, and we plan to apply it to our main frontend soon.
