Nikita Vasilyev

Gzip

Gzip reduces file size by compressing repeated sequences of bytes. The longer the repeated sequences are, the better the compression ratio. When a file contains no repeated sequences at all, the compressed file ends up larger than the original.
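
To illustrate the first half of that claim, here is a minimal sketch that compresses a highly repetitive input. The sample text and the helper name repetitive_input are my own arbitrary choices for illustration, not part of any measurement in this post.

```shell
# Print 200 identical lines (11 bytes each, 2200 bytes in total).
# The sample text is arbitrary; any repeated line would do.
repetitive_input() {
  awk 'BEGIN { for (i = 0; i < 200; i++) print "hello gzip" }'
}

# Measure the input before and after compression.
# tr strips the padding that some wc implementations add.
original_size=$(repetitive_input | wc -c | tr -d ' ')
compressed_size=$(repetitive_input | gzip -9 --no-name -c | wc -c | tr -d ' ')

echo "original: $original_size bytes, gzipped: $compressed_size bytes"
```

The gzipped size should come out as a small fraction of the original, because almost the whole input is a repeat of its first line.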

“abcdefghijklmnopqrstuvwxyz” contains 26 characters, or 26 bytes.

echo "abcdefghijklmnopqrstuvwxyz" > alphabet.txt
wc -c < alphabet.txt
  27

The wc -c command shows the file size in bytes; it works on Mac OS X, Linux and FreeBSD. The 27 bytes are the 26 letters of the alphabet plus a newline character at the end.

gzip -9 --no-name alphabet.txt

-9 indicates the maximum compression level.
--no-name tells gzip not to store the original filename or timestamp.

wc -c < alphabet.txt.gz
  47

The compressed file is 20 bytes larger; Gzip metadata accounts for the additional bytes. Dissecting the GZIP format goes into greater detail on the compression algorithm.
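
To see where that metadata lives, the compressed bytes can be dumped directly. This is a minimal sketch; the variable names are mine, and the field layout described in the comments follows the gzip format specification (RFC 1952): a 10-byte header that starts with the magic bytes 1f 8b, and an 8-byte trailer that ends with the uncompressed size.

```shell
# Compress the same 27-byte input into a temporary file.
gz_file=$(mktemp)
printf 'abcdefghijklmnopqrstuvwxyz\n' | gzip -9 --no-name -c > "$gz_file"

# First 10 bytes: magic number (1f 8b), compression method, flags,
# 4-byte timestamp, extra flags, and operating system.
header_hex=$(od -An -tx1 -N 10 "$gz_file" | tr -d '[:space:]')

# Last 4 bytes: size of the uncompressed data, stored little-endian.
size_hex=$(tail -c 4 "$gz_file" | od -An -tx1 | tr -d '[:space:]')

echo "header: $header_hex"
echo "uncompressed size field: $size_hex"
rm "$gz_file"
```

With no optional fields stored, the header and trailer add up to 18 bytes of framing; whatever overhead remains comes from the Deflate stream itself.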

When to use Gzip?

The deploy script for this website gzips every file and compares the resulting file’s size to the original’s. When the result is larger, it keeps the original.
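
The comparison step could look roughly like the sketch below. It is not the actual deploy script: the function name gzip_if_smaller is hypothetical, and writing the result next to the original as file.gz is an assumption on my part.

```shell
# Hypothetical helper: gzip a file and keep the .gz only if it is smaller.
gzip_if_smaller() {
  file=$1

  # Write the compressed copy next to the original (assumed layout).
  gzip -9 --no-name -c "$file" > "$file.gz" || { rm -f "$file.gz"; return 1; }

  # tr strips the padding that some wc implementations add.
  original_size=$(wc -c < "$file" | tr -d ' ')
  compressed_size=$(wc -c < "$file.gz" | tr -d ' ')

  if [ "$compressed_size" -lt "$original_size" ]; then
    echo "$file: $original_size -> $compressed_size bytes, keeping $file.gz"
  else
    # Compression did not help, so discard the result and keep the original.
    rm "$file.gz"
    echo "$file: $original_size -> $compressed_size bytes, keeping the original"
  fi
}
```

It would be called once per file, for example gzip_if_smaller index.html.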

Gzip improved file size for almost all text files (JS, CSS, HTML):

  • index.html went from 5.8 KB to 2.5 KB (43% of the original)
  • min.js went from 11.4 KB to 4.8 KB (42% of the original)
  • main.css went from 4.4 KB to 1.7 KB (38% of the original)
  • debug.js is a tiny file that became larger after compression, from 46 to 63 bytes. Gzip metadata accounted for this overhead.

The file size of all PNG and JPG images increased or stayed about the same. That’s expected because these formats already support compression.
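
The same effect can be reproduced without any images. A minimal sketch, assuming random bytes from /dev/urandom are a fair stand-in for already-compressed image data (both have very few repeated sequences left to exploit); the 10000-byte sample size is arbitrary.

```shell
# Create a 10000-byte sample of random data as a stand-in for compressed image data.
sample=$(mktemp)
head -c 10000 /dev/urandom > "$sample"

# Compare the size before and after gzip.
sample_size=$(wc -c < "$sample" | tr -d ' ')
gzipped_size=$(gzip -9 --no-name -c "$sample" | wc -c | tr -d ' ')

echo "original: $sample_size bytes, gzipped: $gzipped_size bytes"
rm "$sample"
```

The gzipped size should come out slightly larger than the original, since there is nothing left to compress and the Gzip metadata still has to be added.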

favicon.ico went from 5.4 KB to 2.4 KB. ICO is a container format for BMP images, which are not compressed, hence the result.

Zopfli

Zopfli is a better gzip compressor from Google:

Zopfli Compression Algorithm is a new zlib (gzip, deflate) compatible compressor. This compressor takes more time (~100x slower), but compresses around 5% better than zlib and better than any other zlib-compatible compressor we have found.

The zopfli command-line tool lets you specify the number of iterations to perform:

zopfli -i20 -v index.html
Saving to: index.html.gz
block split points: (hex:)
Iteration 0: 17755 bit
Iteration 2: 17741 bit
Iteration 3: 17739 bit
Iteration 4: 17738 bit
Iteration 5: 17737 bit
treesize: 75
compressed block size: 2141 (2k) (unc: 6569)
Original Size: 6569, Deflate: 2227, Compression: 66.098341% Removed
Original Size: 6569, Gzip: 2235, Compression: 65.976557% Removed

It didn’t yield any improvement in size after the 5th iteration.

“≈100× slower” might look scary, but in my case it didn’t make any noticeable difference:

time zopfli -i20 index.html
        0.03 real         0.02 user         0.00 sys
time gzip -9 --no-name index.html
        0.00 real         0.00 user         0.00 sys

On Mac OS X, zopfli can be installed via Homebrew:

brew install zopfli