IT folks: I’ve read with interest about the DjVu format – often used for distributing books, scores, manuscripts. It turns out that the compression mechanism used there is 20-100 times better than the familiar JPEG and GIF – up to 15Kb per page. I know how these two formats work internally, but I became curious about how to squeeze almost two orders of magnitude more out of them.
It turns out, the whole trick is that the image is divided into three layers, which are compressed differently – foreground, background, and a black-and-white (one-bit) mask. The mask is saved at the resolution of the original file; it contains the image of the text and other clear details. The resolution of the background, which retains illustrations and page texture, is typically lowered to save space. The foreground contains the color information about the mask; its resolution is usually reduced even more. Then, the background and foreground are compressed using wavelet transformation, while the mask uses the JB2 algorithm.
Wavelet transformation, in a nutshell, involves dividing the image into high-frequency and low-frequency areas, which can be compressed better individually since they have lower entropy. High frequencies are contrast-rich patches where brightness changes abruptly, and low frequencies are smooth areas where brightness changes gradually. To put it very simply, it’s like taking the derivative of the image multiple times, which expresses smooth structures with fewer bits. JPEG works in a similar way, only it uses a different smoothing algorithm (DCT) and operates with 8×8 blocks, which is absent in wavelets.
A distinctive feature of the JB2 algorithm is that it looks for repeating symbols on the page and saves their image just once – i.e., it clusters the picture into similar areas. For example, all the letters ‘a’ printed in the same font of the same size could serve as examples of such clusters. Slightly different letters ‘a’, for example with distortions from scanning or printed in another font, would fall into different clusters. As a result, a dictionary is created, where frequently occurring identical letters are combined. Then the location of each letter is saved, resulting in a very compact format. Overall, it is similar to JBIG2, of which there is much more information available online. In multi-page documents, every few consecutive pages use a common “dictionary” of images.
Quite an interesting area, computer graphics. I was once passionate about it.