As developers compete for quality content on the web and mobile apps, image compression has become an important issue to master. The larger the content, the longer it takes to load, and the less time users are willing to spend with your app. Often this means fewer clicks and therefore less money coming to either your client or yourself. By reducing the size of the images in your app, you give your users a faster experience on the network.
Of course, you can find general-purpose compression programs to do the work for you, but the results are not optimal. To get the best results, developers should handle image compression themselves: integrate an existing codec into the app and choose the parameters that best fit its needs. In this article, we take you through the steps needed to make your image files smaller and more network-friendly.
What is Image Compression?
Image compression refers to a set of algorithms and procedures that aim to eliminate the redundancy present in an image in order to minimize the memory or storage space it requires. Nowadays, almost every stored image is compressed in one way or another, because uncompressed images are simply too big to store and transmit efficiently. Keeping images uncompressed would be a huge waste of energy and resources.
Mobile users don’t want to pay to download your data. If your app takes up a lot of space or takes too long to download, you risk running up your users’ phone bills. Mobile operators count every bit of data used and charge for anything beyond the user’s mobile plan. For anyone looking to get their content seen in emerging markets, for example, this should be a priority, because users in emerging countries will be much more limited in terms of data usage and will certainly be sensitive to the amount of space your app uses. Image size should not be overlooked, as it can really be the “make or break” element for your app.
To give you an example, the BMP format typically stores images uncompressed, so a small image of 256×256 pixels weighs 192 KB. A losslessly compressed version (PNG) of the same image, without any degradation, would weigh 128 KB, and a lossy compressed version with no (or very little) perceptible degradation would weigh 23 KB. In other words, the image weight is divided by more than 8 without any perceptible degradation. That’s why image compression is, and will remain, everywhere digital data is found. To make your apps stand out, you need to start compressing!
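The arithmetic behind these figures can be checked in a few lines. This is a minimal sketch that assumes 24-bit RGB pixels (3 bytes each) and ignores file headers; the 128 KB and 23 KB values are simply the ones quoted above.

```python
# Raw size of an uncompressed 24-bit RGB image (file headers ignored).
WIDTH, HEIGHT = 256, 256
BYTES_PER_PIXEL = 3  # assumption: one byte each for red, green and blue

raw_bytes = WIDTH * HEIGHT * BYTES_PER_PIXEL
raw_kb = raw_bytes / 1024

# Sizes quoted in the text, in KB.
lossless_kb = 128
lossy_kb = 23

lossless_ratio = raw_kb / lossless_kb
lossy_ratio = raw_kb / lossy_kb

print(f"raw size: {raw_kb:.0f} KB")                # raw size: 192 KB
print(f"lossless ratio: {lossless_ratio:.2f}x")    # lossless ratio: 1.50x
print(f"lossy ratio: {lossy_ratio:.2f}x")          # lossy ratio: 8.35x
```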
Before delving into the step-by-step process for image compression, let’s talk about data compression. Data compression consists of transforming a sequence of A bits into a shorter sequence of B bits that, once “decompressed”, contains either exactly the same information (in which case we’re talking about lossless compression) or information that is similar, or equivalent from a semantic point of view (in which case we’re talking about lossy compression).
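The difference between the two families can be illustrated with a small sketch. It uses Python’s standard zlib module for the lossless case; the lossy step (dropping the 4 least significant bits of each byte) is only a toy chosen for illustration, not a real codec, and the sample data is made up.

```python
import zlib

# Made-up sample data: 256 bytes with a lot of repetition.
original = bytes([200, 201, 202, 203] * 64)

# Lossless: the decompressed bytes are exactly the original bytes.
compressed = zlib.compress(original)
restored = zlib.decompress(compressed)
print(len(original), "->", len(compressed), "bytes")
print("lossless round trip exact:", restored == original)

# Toy lossy step: keep only the 4 most significant bits of each byte.
# The result stays close to the original, but the dropped bits are gone for good.
approximated = bytes((value >> 4) << 4 for value in original)
largest_error = max(abs(a - b) for a, b in zip(original, approximated))
print("lossy largest error:", largest_error)
```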
Lossless compression is often used for files or text. As a simple example, take the word “aaaaaaaaaaabc”. It is composed of 3 different letters: one run of 11 letters “a” followed by one letter “b” and one letter “c”. Here, space can be saved by replacing “aaaaaaaaaaa” with “11a”. Our word, once compressed, becomes “11abc” and is now only 5 characters long, as opposed to the original 13. This technique is called “Run-Length Encoding” (RLE) and can be combined with other techniques such as “Huffman coding”. Nowadays, modern codecs mostly use arithmetic coding, such as the popular “Context-Adaptive Binary Arithmetic Coding” (CABAC).
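The run-length idea can be sketched in a few lines of Python. The function names are hypothetical, and the sketch assumes the input text contains no digits, since digits are used to store the run lengths.

```python
from itertools import groupby

def rle_encode(text: str) -> str:
    """Replace each run of 2 or more identical characters with '<count><char>'."""
    parts = []
    for char, run in groupby(text):
        count = len(list(run))
        parts.append(f"{count}{char}" if count > 1 else char)
    return "".join(parts)

def rle_decode(encoded: str) -> str:
    """Inverse of rle_encode (assumes the original text had no digits)."""
    out = []
    count = ""
    for char in encoded:
        if char.isdigit():
            count += char          # accumulate a multi-digit run length
        else:
            out.append(char * (int(count) if count else 1))
            count = ""
    return "".join(out)

word = "a" * 11 + "bc"
encoded = rle_encode(word)
print(encoded)                       # 11abc
print(len(word), "->", len(encoded)) # 13 -> 5
print(rle_decode(encoded) == word)   # True
```

Because decoding restores the word exactly, this is a lossless technique.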
In the case of lossy compression, the idea is to delete or attenuate the information that is least perceptible to the human senses. Take for example the human ear, which on average only hears frequencies from 20 Hz to 20 kHz and is most sensitive between 2 kHz and 5 kHz. For these reasons, lossy audio compression attenuates, or even deletes, the frequencies outside these ranges to reduce the size of the data. Image compression works on the same principle: it attenuates or deletes the visual elements that matter little to human visual perception.
To better illustrate image compression, let’s look at one of the first and most popular compressed image formats, the well-known JPEG (Joint Photographic Experts Group). JPEG was standardized and adopted in 1992, after work on the concept began in 1978.
Step 1: Change the Color Space
As we know, a color image is composed of 3 color channels corresponding to the primary colors: red, green, and blue (RGB). Each pixel in these channels is represented by an integer value between 0 and 255. What happens when we separate the image based on these channels?
The problem with this representation is that the pixels of the 3 channels are very strongly correlated with one another. They need to be decorrelated as much as possible to facilitate compression. The solution is to convert the image from the RGB color space to the YCrCb color space by applying a reversible mathematical operation to each pixel (Y is the luma component, and Cr and Cb are the red-difference and blue-difference chroma channels respectively).
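The article does not give the conversion formula itself. As a minimal sketch, the code below assumes the full-range coefficients used by the JFIF file format for JPEG; the function names are hypothetical. It works on floats so that the reversibility is visible.

```python
def rgb_to_ycrcb(r, g, b):
    """Convert one RGB pixel (0..255 per channel) to (Y, Cr, Cb) as floats."""
    y = 0.299 * r + 0.587 * g + 0.114 * b                 # luma
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b      # red-difference chroma
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b      # blue-difference chroma
    return y, cr, cb

def ycrcb_to_rgb(y, cr, cb):
    """Inverse conversion, back to (R, G, B) as floats."""
    r = y + 1.402 * (cr - 128)
    g = y - 0.714136 * (cr - 128) - 0.344136 * (cb - 128)
    b = y + 1.772 * (cb - 128)
    return r, g, b

# A gray pixel carries no color: both chroma values sit at the neutral 128.
gray = rgb_to_ycrcb(128, 128, 128)

# Converting a pure red pixel and back returns the original values.
red = rgb_to_ycrcb(255, 0, 0)
restored = tuple(round(c) for c in ycrcb_to_rgb(*red))
print(restored)  # (255, 0, 0)
```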
In the following, each image channel is treated independently.
This representation already strongly reduces the correlation between the color channels. In addition, we notice that the image is easier to make out in the Y (luma) channel than in the 2 others. This is because the luma channel carries much more information, from the human point of view, than the 2 chroma channels.
Based on this observation, an operation is often performed at this stage: the chroma channels Cr and Cb are sub-sampled by a factor of two in each direction, keeping only a portion of the information they contain. The result is the YCrCb 4:2:0 format.
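Here is a minimal sketch of this sub-sampling on a tiny chroma channel. Averaging each 2×2 group of samples is one common choice and is an assumption here (an encoder could also simply keep one sample out of four); the function name and the sample values are made up, and the sketch assumes even dimensions.

```python
def subsample_420(channel):
    """Halve a chroma channel in both directions by averaging each 2x2 group."""
    height, width = len(channel), len(channel[0])
    return [
        [
            (channel[y][x] + channel[y][x + 1]
             + channel[y + 1][x] + channel[y + 1][x + 1]) / 4
            for x in range(0, width, 2)
        ]
        for y in range(0, height, 2)
    ]

# Made-up 4x4 chroma channel.
cb = [
    [100, 102, 50, 54],
    [104, 106, 52, 56],
    [200, 200, 10, 20],
    [200, 200, 30, 40],
]
cb_small = subsample_420(cb)
print(cb_small)  # [[103.0, 53.0], [200.0, 25.0]]

# Sample count for a 4x4 image: 3 full channels versus luma plus 2 reduced chroma channels.
samples_444 = 3 * 16
samples_420 = 16 + 2 * 4
print(samples_444, "->", samples_420)  # 48 -> 24
```

In this sketch the number of stored samples is halved before any further compression step.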
(For the rest of this article, we’ll only be focusing on the Luma channel).
Step 2: Divide The Image Into Blocks
The image is then divided into blocks of a fixed size (8×8 pixels for JPEG) or a variable size (4×4 to 32×32 pixels for more recent formats).
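The fixed-size case can be sketched as follows. The function name is hypothetical, the channel is a made-up list of rows, and the sketch assumes that the width and height are multiples of the block size (a real encoder has to pad the image otherwise).

```python
def split_into_blocks(channel, size=8):
    """Cut a channel (list of rows) into size x size blocks, row by row."""
    height, width = len(channel), len(channel[0])
    blocks = []
    for top in range(0, height, size):
        for left in range(0, width, size):
            block = [row[left:left + size] for row in channel[top:top + size]]
            blocks.append(block)
    return blocks

# Made-up luma channel: 24 rows of 16 pixels.
luma = [[(x + y) % 256 for x in range(16)] for y in range(24)]
blocks = split_into_blocks(luma)

print(len(blocks))                         # 6
print(len(blocks[0]), len(blocks[0][0]))   # 8 8
```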
Step 3: Decorrelate the Pixels
A discrete cosine transform (DCT) is then applied to each of these blocks. This transform is a mathematical operation that is fully reversible and provides excellent decorrelation. Like the Fourier transform, the DCT moves the pixels from the “spatial domain” to the “frequency domain”, which contains the same information but represented and “ordered” differently.
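To make the transform concrete, here is a direct and deliberately slow sketch of the 2D DCT on one 8×8 block, together with its inverse. It assumes the orthonormal scaling of the DCT-II, and it skips details that real encoders add, such as shifting pixel values by 128 and using fast algorithms. The function names are hypothetical.

```python
import math

N = 8  # JPEG block size

def _scale(k):
    # Normalisation that makes the transform orthonormal (an assumption of this sketch).
    return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

# BASIS[k][n]: cosine of frequency index k sampled at pixel position n.
BASIS = [
    [_scale(k) * math.cos((2 * n + 1) * k * math.pi / (2 * N)) for n in range(N)]
    for k in range(N)
]

def dct_2d(block):
    """Forward 2D DCT: pixels (spatial domain) to coefficients (frequency domain)."""
    return [
        [
            sum(BASIS[u][x] * BASIS[v][y] * block[x][y]
                for x in range(N) for y in range(N))
            for v in range(N)
        ]
        for u in range(N)
    ]

def idct_2d(coeffs):
    """Inverse 2D DCT: coefficients back to pixels."""
    return [
        [
            sum(BASIS[u][x] * BASIS[v][y] * coeffs[u][v]
                for u in range(N) for v in range(N))
            for y in range(N)
        ]
        for x in range(N)
    ]

# A perfectly flat block has all its energy in the top-left (lowest frequency) coefficient.
flat = [[100] * N for _ in range(N)]
flat_coeffs = dct_2d(flat)
print(round(flat_coeffs[0][0], 6))  # 800.0

# Reversibility: transforming a block and transforming back restores the pixels.
ramp = [[8 * x + y for y in range(N)] for x in range(N)]
restored = idct_2d(dct_2d(ramp))
max_error = max(abs(restored[x][y] - ramp[x][y]) for x in range(N) for y in range(N))
print(max_error < 1e-9)  # True
```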
To illustrate, we apply the DCT to the entire image and not to each block, so that we can correctly visualize the result. This is what we get:
This is obviously not a very striking image. But the transform has separated what we call the low frequencies, at the top left of the image, from the high frequencies, at the bottom right. In practice, the low frequencies are the elements that the human eye is most sensitive to, while the high frequencies carry the fine details.
Step 4: Quantization
This is the step that causes an irreversible loss of information and therefore degrades the image. This is why we speak of lossy compression. Quantization consists of representing each value in the transformed domain with a reduced number of bits. To do this, we apply to each transformed block a filter that takes the spatial frequencies into account. In practice, low frequencies are quantized more finely (with a larger number of bits), whereas high frequencies are quantized more coarsely (by allocating them fewer bits). This procedure leaves the low frequencies as intact as possible while only roughly representing the high frequencies. This completes the “lossy” part of image compression.
Step 5: Encode the Quantized / Transformed Coefficients
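In JPEG, this frequency-dependent filter takes the form of a table of step sizes: each coefficient is divided by its step and rounded, so a larger step leaves fewer possible values and therefore fewer bits. The sketch below uses the example luminance table given in the JPEG specification (real encoders scale such a table according to the chosen quality). The coefficient values are made up, and the function names are hypothetical.

```python
# Example luminance quantization table from the JPEG specification.
# Steps are small at the top left (low frequencies) and large at the bottom right.
JPEG_LUMA_TABLE = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]

def quantize(coeffs, table):
    """Divide each coefficient by its step and round: this is where information is lost."""
    return [[round(c / q) for c, q in zip(c_row, q_row)]
            for c_row, q_row in zip(coeffs, table)]

def dequantize(levels, table):
    """Multiply back by the step: an approximation of the original coefficients."""
    return [[level * q for level, q in zip(l_row, q_row)]
            for l_row, q_row in zip(levels, table)]

# Made-up DCT coefficients: strong low frequencies, one weak high frequency.
coeffs = [[0.0] * 8 for _ in range(8)]
coeffs[0][0] = 415.4
coeffs[0][1] = -30.2
coeffs[1][0] = 12.1
coeffs[7][7] = 20.0

levels = quantize(coeffs, JPEG_LUMA_TABLE)
approx = dequantize(levels, JPEG_LUMA_TABLE)

print(levels[0][0], levels[0][1], levels[1][0], levels[7][7])  # 26 -3 1 0
print(approx[0][0], approx[0][1], approx[1][0], approx[7][7])  # 416 -33 12 0
```

The high-frequency value 20.0 falls below half of its step (99) and is rounded to zero, while the low-frequency values survive with a small error.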
The last step consists of encoding the group of quantized and transformed coefficients with one (or several) of the lossless compression techniques cited above (for this JPEG example, RLE+Huffman is used). This gives us an image file that can be decompressed by applying the inverse (mathematically speaking) of all of these operations in the reverse order. Once decompressed, the image is ready to be displayed on a screen.
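The run-length part of this step can be sketched as follows. Two assumptions go beyond the text above: the coefficients are read in the zigzag order that baseline JPEG uses, so that the many zeros left by quantization end up grouped together, and the symbol format (pairs of zero-run length and value, closed by an "EOB" end-of-block marker) is a simplification of what JPEG really stores. The Huffman stage, which would then give the most frequent symbols the shortest codes, is left out. All names are hypothetical and the quantized values are made up.

```python
def zigzag_order(n=8):
    """Block positions (row, col) from low to high frequency, diagonal by diagonal."""
    return sorted(
        ((i, j) for i in range(n) for j in range(n)),
        key=lambda p: (p[0] + p[1], p[0] if (p[0] + p[1]) % 2 else -p[0]),
    )

def encode_block(levels):
    """Turn a quantized block into (zeros_before, value) pairs plus an end marker."""
    n = len(levels)
    pairs, zeros = [], 0
    for i, j in zigzag_order(n):
        if levels[i][j] == 0:
            zeros += 1
        else:
            pairs.append((zeros, levels[i][j]))
            zeros = 0
    pairs.append("EOB")  # every remaining coefficient is zero
    return pairs

def decode_block(pairs, n=8):
    """Rebuild the quantized block exactly from the pairs."""
    sequence = []
    for item in pairs:
        if item == "EOB":
            break
        zeros, value = item
        sequence.extend([0] * zeros + [value])
    sequence.extend([0] * (n * n - len(sequence)))
    block = [[0] * n for _ in range(n)]
    for (i, j), value in zip(zigzag_order(n), sequence):
        block[i][j] = value
    return block

# Made-up quantized block: a few non-zero low frequencies, zeros everywhere else.
levels = [[0] * 8 for _ in range(8)]
levels[0][0], levels[0][1], levels[1][0], levels[1][1] = 26, -3, 1, 2

pairs = encode_block(levels)
print(pairs)                          # [(0, 26), (0, -3), (0, 1), (1, 2), 'EOB']
print(decode_block(pairs) == levels)  # True
```

The 64 values of the block shrink to five symbols, and decoding restores them exactly, which is why this stage is lossless.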
The Science of Image Compression
Image compression techniques are still being improved. For example, a predictive step has been added before the transform is applied to the blocks (as in the BPG [Better Portable Graphics] format). This step predicts the values of the pixels in a block from the values of pixels in the neighboring blocks. The predicted values are subtracted from the real values, and it is the resulting difference that is transformed and quantized. Of course, the predicted values are added back at the decompression stage.
This predictive step makes it possible to greatly reduce the range of values that will be transformed, quantized, and then encoded. In fact, with a precise prediction, we go from values ranging from 0 to 255 to values ranging from -10 to +10, for example.
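A very simplified sketch of this idea is shown below. It assumes a single horizontal predictor, where every pixel of a row is predicted from the pixel just to the left of the block; real codecs choose among several prediction modes. The function names and pixel values are made up for illustration.

```python
def predict_horizontal(left_column, size):
    """Predict each row of the block as a copy of the neighboring pixel on its left."""
    return [[left_column[y]] * size for y in range(size)]

def subtract(block, prediction):
    """Residual: real values minus predicted values (what gets transformed and quantized)."""
    return [[b - p for b, p in zip(b_row, p_row)]
            for b_row, p_row in zip(block, prediction)]

def add(residual, prediction):
    """Decoder side: add the prediction back to recover the real values."""
    return [[r + p for r, p in zip(r_row, p_row)]
            for r_row, p_row in zip(residual, prediction)]

# Last column of the already decoded block on the left (made-up values).
left_column = [120, 122, 125, 127]

# Current 4x4 block (made-up values, close to its neighbors as in a smooth image area).
block = [
    [121, 123, 124, 126],
    [122, 124, 125, 127],
    [126, 127, 129, 130],
    [128, 129, 131, 133],
]

prediction = predict_horizontal(left_column, 4)
residual = subtract(block, prediction)

print(residual)  # [[1, 3, 4, 6], [0, 2, 3, 5], [1, 2, 4, 5], [1, 2, 4, 6]]
print(add(residual, prediction) == block)  # True
```

The pixel values sit between 121 and 133, but the residuals to encode only range from 0 to 6.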
Transitioning to Image Reconstruction
Because of JPEG’s omnipresence on the internet, image compression was for many years a side note, but today it’s making a real comeback. This renewed interest in image compression is thanks to advances in machine learning. These advances let us conceive of image formats that are entirely managed or optimized by machine processes, most often exploiting deep neural networks. In fact, today’s conversations revolve more around “image reconstruction” than image decompression. Many advances are foreseen in this domain, and the possible implementations of the “image compression of tomorrow” seem as rich as they are varied.
-Article written by Sébastien Hamis