How do video compression algorithms work?

This article has used H.264 as a prototype compression standard. Although it is no longer the latest video compression format, it is still a sufficiently detailed example to explain the main concepts of video compression.

Modern video compression algorithms are not like the image compression algorithms that you are used to before. Adding size and time means different mathematical and logic techniques are applied to video files to reduce the size but still maintain the video quality.

This article has used H.264 as a prototype compression standard. Although it is no longer the latest video compression format, it is still a sufficiently detailed example to explain the main concepts of video compression.

Learn about the operation of video compression algorithms

  1. What is video compression?
  2. I-frame, P-frame and B-frame
  3. Intraframe encryption (I-frame)
  4. Interframe Prediction (P-frame and B-frame)

What is video compression?

Video compression algorithms search for time and excess space. By encrypting redundant data a minimum number of times, the file size can be reduced. For example, you need to compress a scene emphasizing the change of expression slowly on the face of the character, lasting for 1 minute. Compression doesn't mean to encode images on every frame, but instead, you can encode a frame, then refer to that encoded frame until the video changes. This interpersonal prediction coding is responsible for strange, unreliable components that appear during video compression: parts of the old image move incorrectly because there is some element in the encoding process. broken.

I-frame, P-frame and B-frame

How do video compression algorithms work? Picture 1How do video compression algorithms work? Picture 1

I-frame is a fully encoded image. Each I-frame contains all the data needed to represent an image. P-frame is predicted based on how to change the image from the last I-frame. B-frame is predicted in two dimensions, using data from both the last P-frame and the next I-frame. P-frame only needs to store unique image information for itself. In the above example, P-frame needs to keep track of how the dots move on the frame, and Pac-Man can retain its position.

The B-frame looks at the next P-frame, I-frame and average movement on those frames. The algorithm knows where the image starts (first I-frame) and ends (I-frame second) and it uses partial data for predictive coding, eliminating all static pixels that are not needed to create image

Intraframe encryption (I-frame)

How do video compression algorithms work? Picture 2How do video compression algorithms work? Picture 2

The I-frame is independently compressed, just like how static images are saved. Since the I-frame does not use predictive data, the compressed image contains all the data used to display the I-frame. I-frames are still compressed by an image compression algorithm like JPEG. This encoding usually takes place in the YCbCr color space, separates brightness data from color data, allows movement and changes in color coded separately.

For the non-predictive codec like DV and Motion JPEG, everything will stop there. Because there is no predictive frame, the only thing that can be done is to compress the image in a single frame. The method is less efficient but produces a higher quality raw (raw) file.

In codecs that use predictive frame as H.264, predictive frame is periodically displayed to refresh the data stream, by setting a new reference frame. The farther I-frames are, the smaller the video file. However, if I-frames are too far apart, the accuracy of the predictive frames in the video will gradually decrease to an incomprehensible level. A bandwidth-optimized application will insert as few I-frames as possible without breaking the video stream. The frequency of I-frames is usually determined indirectly by the 'quality' setting in the encryption software. Professional video compression software like ffmpeg allows clear control of this problem.

Interframe Prediction (P-frame and B-frame)

Video encoders trying to 'predict' change from one frame to the next. The more accurate the prediction of the encoder, the more efficient the compression algorithm. This is what creates P-frames and B-frames. The number, frequency and exact order of the predictive frame, as well as the specific algorithm used to encode and reproduce them, is determined by the specific algorithm you use.

How do video compression algorithms work? Picture 3How do video compression algorithms work? Picture 3

Consider how H.264 works. The frame is divided into sections called macroblocks, usually consisting of 16 x 16 samples. The algorithm does not encrypt raw pixel values ​​for each block (block). Instead, the encoder looks for a similar block in an old frame, called a reference frame. If a valid reference frame is found, the block will be encoded with a mathematical expression called a motion vector, describing the exact nature of the change from the reference block to the current block. When the video is played, the video player will interpret those motion vectors correctly to 'translate' the video. If the block doesn't change, no vector is needed.

How do video compression algorithms work? Picture 4How do video compression algorithms work? Picture 4

After the data is sorted into frames, it is encoded into a mathematical expression with a transform encoder. H.264 uses a DCT (discrete-cosine transform) to change the visual data into mathematical expressions (namely the sum of the cosine functions oscillating at different frequencies). The selected compression algorithm determines the variable encoder. Then the data is 'rounded' by the quantitative set. Finally, the bits are run through a lossless compression algorithm to shrink the file size again. This does not change the data. It only organizes data in the most compact form possible. After that, the video is compressed, smaller in size than before and ready to watch.

4.4 ★ | 5 Vote