DCT-based Image/Video Compression: New Design Perspectives
To push the envelope of DCT-based lossy image/video compression, this thesis revisits the design of several fundamental building blocks in image/video coding: source modelling, quantization table design, quantizers, and entropy coding. First, to better handle the heavy-tail phenomenon commonly seen in DCT coefficients, a new model dubbed the transparent composite model (TCM) is developed and justified. Given a sequence of DCT coefficients, the TCM first separates the tail from the main body of the sequence, and then models the coefficients in the heavy tail with a uniform distribution and those in the main body with a parametric distribution. The separation boundary and the other distribution parameters are estimated online via maximum likelihood (ML) estimation. Efficient online algorithms are proposed for parameter estimation, and their convergence is proved. When the parametric distribution is a truncated Laplacian, the resulting TCM, dubbed the Laplacian TCM (LPTCM), not only achieves superior modeling accuracy with low estimation complexity, but also provides nonlinear data reduction by identifying and separating DCT coefficients in the heavy tail (referred to as outliers) from DCT coefficients in the main body (referred to as inliers). This in turn opens up opportunities for its use in DCT-based image compression. Second, quantization table design is revisited for image/video coding in which soft decision quantization (SDQ) is considered. Unlike conventional approaches, where quantization table design is bundled with a specific encoding method, we assume optimal SDQ encoding and design the quantization table for the purpose of reconstruction. Under this assumption, we model transform coefficients at different frequencies as independently distributed random sources and apply the Shannon lower bound to approximate the rate-distortion function of each source. 
We then show that a quantization table can be optimized so that the resulting distortion complies with a certain profile, yielding the so-called optimal distortion profile scheme (OptD). Guided by this new theoretical result, we present an efficient statistical-model-based algorithm that uses the Laplacian model to design quantization tables for DCT-based image compression. When applied to standard JPEG encoding, it provides a performance gain of more than 1.5 dB (in PSNR) with almost no extra complexity. Compared with the state-of-the-art JPEG quantization table optimizer, the proposed algorithm offers an average gain of 0.5 dB with computational complexity reduced by a factor of more than 2000 when SDQ is off, and a gain of 0.1 dB or more with complexity reduced by 85% when SDQ is on. Third, based on the LPTCM and OptD, we further propose an efficient non-predictive DCT-based image compression system in which the quantizers and entropy coding are completely redesigned and the corresponding SDQ algorithm is also developed. In terms of rate versus visual quality, the proposed system achieves overall coding results that are among the best and similar to those of H.264 or HEVC intra (predictive) coding. In terms of rate versus objective quality, it outperforms baseline JPEG by more than 4.3 dB on average, with a moderate increase in complexity. It also outperforms ECEB, the state-of-the-art non-predictive image coding method, by 0.75 dB when SDQ is off, at the same level of computational complexity, and by 1 dB when SDQ is on, at the cost of extra complexity. In comparison with H.264 intra coding, our system provides an overall gain of about 0.4 dB with dramatically reduced computational complexity. It offers comparable or even better coding performance than HEVC intra coding in the high-rate region or for complicated images, at less than 5% of the latter's encoding complexity. 
In addition, our proposed DCT-based image compression system offers a multiresolution capability, which, together with its comparatively high coding efficiency and low complexity, makes it a good alternative for real-time image processing applications.
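
To make the composite-model idea from the first contribution concrete, the following is a minimal sketch of fitting a Laplacian TCM by maximum likelihood. It is an illustration under stated assumptions, not the online algorithm developed in the thesis: the density is assumed to be a mixture with weight b on a Laplacian truncated to [-y_c, y_c] and weight 1 - b on a uniform distribution over y_c < |y| <= a, where a is the largest observed magnitude; the boundary y_c is found by brute-force search over observed magnitudes; and the names fit_lptcm, _trunc_laplace_scale, and min_inlier_frac are hypothetical.

```python
import math
import numpy as np


def _trunc_laplace_scale(mean_abs, y_c, iters=60):
    """ML scale of a Laplacian truncated to [-y_c, y_c], found by bisection.

    Solves lam - y_c / (exp(y_c / lam) - 1) = mean_abs. The left side grows
    with lam from 0 towards y_c / 2.
    """
    def mean_of(lam):
        t = y_c / lam
        # Stable form of lam - y_c / (exp(t) - 1); cannot overflow for large t.
        return lam + y_c * math.exp(-t) / math.expm1(-t)

    lo, hi = 1e-9 * y_c, 1e6 * y_c
    if mean_abs >= mean_of(hi):
        return hi  # inliers look flat: effectively a uniform main body
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mean_of(mid) < mean_abs:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def fit_lptcm(samples, min_inlier_frac=0.5):
    """Brute-force ML fit of the assumed composite density.

    min_inlier_frac is a guard added for this sketch (an assumption): it
    keeps the search away from degenerate, very small boundaries.
    Returns a dict with boundary y_c, Laplacian scale lam, inlier weight b.
    """
    mags = np.sort(np.abs(np.asarray(samples, dtype=float)))
    n = mags.size
    a = mags[-1]                  # largest magnitude bounds the uniform tail
    cum = np.cumsum(mags)         # running sums of |y| for fast inlier means
    start = max(math.ceil(min_inlier_frac * n) - 1, 0)
    best = None
    for k in range(start, n):
        y_c = mags[k]
        # Skip zero boundaries and ties that would split equal magnitudes.
        if y_c <= 0 or (k + 1 < n and mags[k + 1] == y_c):
            continue
        n1, n2 = k + 1, n - k - 1  # inlier and outlier counts
        lam = _trunc_laplace_scale(cum[k] / n1, y_c)
        t = y_c / lam
        # Inliers: weight b = n1 / n times the truncated Laplacian density.
        ll = n1 * (math.log(n1 / n) - math.log(2 * lam)
                   - math.log(-math.expm1(-t))) - cum[k] / lam
        # Outliers: weight 1 - b spread uniformly over y_c < |y| <= a.
        if n2 > 0:
            ll += n2 * (math.log(n2 / n) - math.log(2 * (a - y_c)))
        if best is None or ll > best[0]:
            best = (ll, float(y_c), float(lam), n1 / n)
    if best is None:
        raise ValueError("no valid boundary candidate")
    return {"y_c": best[1], "lam": best[2], "b": best[3]}
```

Calling fit_lptcm on the coefficients of one DCT frequency returns the estimated boundary, scale, and inlier weight; coefficients whose magnitude exceeds y_c would then be treated as outliers. The exhaustive search is chosen here for clarity only and costs one bisection per candidate boundary.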