TRANSFORM DOMAIN RESIDUAL CODING TECHNIQUE FOR DISTRIBUTED VIDEO CODING

June 19, 2017 | Author: Hemantha Arachchi | Category: Computational Complexity, Video Coding, Reference Frame, Distributed Video Coding
M.B. Badem, H. Kodikara Arachchi, S.T. Worrall, A.M. Kondoz
Centre for Communication Systems Research, University of Surrey, Guildford GU2 7XH, UK
{M.Badem, H.Kodikaraarachchi, S.Worrall, A.Kondoz}@surrey.ac.uk

ABSTRACT

Due to its lightweight encoder architecture, the Distributed Video Coding (DVC) concept has been seen as an attractive alternative to conventional codecs for a number of applications. Exceptionally low computational complexity is achieved by moving redundancy exploitation to the decoder. However, going against this norm, redundancy exploitation techniques such as the DCT transform and frame differencing have been utilized at the encoder, at the expense of a slight increase in encoder computational cost. This paper proposes a novel residual quantization technique for DCT transform based inter-frame error coding for DVC. The proposed technique reduces the entropy of a given video frame by taking the pixel-wise difference between the current frame and a reference frame before DCT transformation. Subsequently, an improved quantization technique is proposed to take advantage of the small transform coefficients. Experimental results show that the proposed technique significantly improves objective quality.

Index Terms— DVC, DCT, Video Coding, quantization

1. INTRODUCTION

Distributed Video Coding (DVC) has been predicted to be an attractive video compression technique for distributed video communication scenarios such as wireless sensor networks [1][2]. The basis of the DVC concept is the information-theoretic bounds established by Slepian and Wolf for distributed lossless coding [3] and by Wyner and Ziv for lossy coding with decoder side information [4].
The major advantages of DVC are the extreme simplicity of the encoder and the ability to jointly exploit redundancies spread across many nodes of the system without the sensors communicating with each other. The encoded outputs of all sensor nodes are sent to a central point, where they are decoded jointly, exploiting intra-node and cross-node redundancies [1]. Thus the overall system cost can be reduced drastically. In this paper, we discuss a novel quantization technique for DCT transform based coding of the inter-frame error signal [5]. The rest of the paper is arranged as follows: in Section 2, the state of the art in DVC is discussed; in Section 3, the proposed technique is

described; and in Section 4, experimental results are presented.

2. STATE-OF-THE-ART

In the 1970s, Slepian and Wolf provided the first study on distributed source coding [3]. Consider two statistically dependent discrete random sequences X and Y. The Slepian-Wolf theorem states that, with separate encoding and joint decoding, the achievable rate region for an arbitrarily small error probability is:

RX + RY ≥ H(X,Y), RX ≥ H(X|Y), RY ≥ H(Y|X)    (1)

Wyner-Ziv coding, the complement of this work for lossy compression, deals with source coding with side information [4]. If X and Y are jointly Gaussian, Wyner and Ziv showed that the conditional rate-mean squared error distortion function is the same whether the statistical dependency between X and Y is exploited at the decoder only or at both encoder and decoder. At a glance, one expects correlation exploitation to be performed entirely at the decoder side in order to make the encoder as simple as possible. However, not all practical implementations follow this rule. Therefore, present implementations of DVC codecs can be classified into two categories: conventional DVC techniques and hybrid DVC techniques.

2.1. Conventional DVC Techniques

In these techniques, correlation exploitation is performed entirely at the decoder, and therefore these solutions offer the most lightweight encoder implementations. Based on the aforementioned Wyner-Ziv theorem, Aaron et al. developed an asymmetric video compression scheme in which the encoding is intra-frame while the decoding is inter-frame [6][7]. In this scheme, odd frames are known as key frames, and it is assumed that they are available at the decoder. Even frames, known as Wyner-Ziv frames, are quantized and turbo coded. At the decoder, simple frame interpolation or extrapolation is used to predict the side information. In [8], motion compensated temporal filtering is used by Tagliasacchi et al. In [9], Natario proposed a motion field smoothing algorithm to generate side information.
Ascenso et al. used forward and bidirectional motion estimation [10], and proposed a spatial motion smoothing algorithm using a weighted vector median filter [11].
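To make the Slepian-Wolf bounds in (1) concrete, the short sketch below evaluates them for a toy joint distribution of two correlated binary sources; the probability values are purely illustrative and not taken from the paper.

```python
import math

# Toy joint pmf of two correlated binary sources X and Y (illustrative values only)
p = {(0, 0): 0.45, (0, 1): 0.05, (1, 0): 0.05, (1, 1): 0.45}

def H(dist):
    """Shannon entropy in bits of a probability distribution (iterable of probs)."""
    return -sum(q * math.log2(q) for q in dist if q > 0)

# Marginal distributions of X and Y
px = [sum(v for (x, _), v in p.items() if x == b) for b in (0, 1)]
py = [sum(v for (_, y), v in p.items() if y == b) for b in (0, 1)]

H_xy = H(p.values())            # joint entropy H(X,Y)
H_x_given_y = H_xy - H(py)      # chain rule: H(X|Y) = H(X,Y) - H(Y)

# Independent coding needs H(X) + H(Y) bits; Slepian-Wolf joint decoding
# only needs the (smaller) joint entropy H(X,Y) as the total rate.
print(f"H(X)+H(Y) = {H(px) + H(py):.3f} bits")
print(f"H(X,Y)    = {H_xy:.3f} bits (Slepian-Wolf sum-rate bound)")
print(f"H(X|Y)    = {H_x_given_y:.3f} bits (min rate for X given side info Y)")
```

The gap between H(X)+H(Y) and H(X,Y) is exactly the rate saving that joint decoding can capture without the encoders ever communicating.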

2.2. Hybrid DVC Techniques

In parallel to decoder-based correlation exploitation, some lightweight correlation exploitation is performed at the encoder as well. Almost all codecs belonging to this class exploit spatial redundancies at the encoder through transform coding. Therefore, the compression efficiency offered by this class of coders is higher than that of conventional DVC techniques [14][15]. Spatial redundancies have been exploited in the Wyner-Ziv video coding techniques presented in [14] and [15]. A 4x4 block-wise DCT is applied to the Wyner-Ziv frame at the encoder, and to the predicted frame at the decoder. After the reconstruction block, an IDCT is applied to obtain the decoded Wyner-Ziv frame. These codecs have slightly higher encoder complexity than other intra-frame video encoders, but their coding efficiency approaches the H.263+ inter-frame performance. Aaron et al. proposed a residual coding scheme using LDPC coding to exploit temporal correlations in the pixel domain [5]. They also proposed a hash-based side information generation technique to help the decoder estimate the motion accurately.

While most of the research effort has centred on inter-frame prediction, Adikari et al. pay attention to intra-frame prediction [12]. They propose an intra-frame coding technique in which the pixels of a frame are reorganized into two sub-frames, a key sub-frame and a Wyner-Ziv sub-frame, consisting of the odd or even vertical pixel lines. The key sub-frame is conventionally encoded, and it is used at the decoder as the reference for generating side information. In [13], Tagliasacchi et al. propose another sub-frame coding scheme: key frames are coded as usual, but Wyner-Ziv frames are split into two sub-frames. The first Wyner-Ziv sub-frame is decoded with the help of temporal side information only. The side information for the second Wyner-Ziv sub-frame is generated by spatio-temporal prediction using the key frames and the previously decoded Wyner-Ziv sub-frame.

3. PROPOSED ALGORITHM

The architecture of the proposed transform domain Wyner-Ziv residual codec is depicted in Figure 1. At the encoder, the difference between the current Wyner-Ziv frame (X2i) and the previous key frame (X2i-1) is taken. For this purpose, communication between the key frame encoder and the Wyner-Ziv frame encoder is necessary. The resulting residual frame is divided into 4x4 blocks, which serve as the basic coding units. Subsequently, each block is DCT transformed. The DCT coefficients are grouped together according to their coefficient band and quantized. At the decoder, the side information is obtained by taking the difference between the interpolated frame (Y2i) and the previous key frame (X2i-1); it is transformed and fed into the turbo decoder and the reconstruction block. After the IDCT, the previous key frame (X2i-1) is added and the reconstructed frame X'2i is obtained.

Figure 1. Proposed transform domain Wyner-Ziv codec (encoder: frame difference X2i - X2i-1, DCT, quantizer, Slepian-Wolf coder with turbo encoder and buffer; decoder: side information Y2i - X2i-1, DCT, turbo decoder with feedback channel, reconstruction, IDCT)

In DCT based DVC applications, an adaptive quantizer step size is used. For adaptation purposes, the dynamic range of each coefficient band is used. The quantization step size Wk for the DCT band bk is determined according to (2) below:

Wk = 2|Vk|max / (2^Mk − 1)    (2)

where |Vk|max is the highest absolute value within bk and Mk is the quantization level value for the DCT band bk. The decoder receives the dynamic range of each DCT coefficient band [15] in order to achieve encoder-decoder synchronism.

Figure 2 illustrates the distribution of the DC transform coefficient values (b0) for the original frame and the residual frame. The average distribution over the first 100 frames of the test sequence is shown. DC coefficients in the original frame are distributed over a large range of values. In contrast, most of the DC coefficients in the residual frame are very small for a low motion activity sequence, such as those found in security scenarios. It is observed that the AC transform coefficients are also smaller in the residual frame for low motion frames.

Since the dynamic range of the residual DCT coefficients is small, the use of the conventional quantization step size calculation technique results in smaller quantization step sizes. As a result, no bit rate gain can be achieved from obtaining the residual. Moreover, the correlation between the transform coefficients of the original residual and the predicted residual is also weaker. Therefore, the turbo decoder needs even more parity bits to decode these transform coefficients, particularly the least significant bits. As a result, the actual bit rate is even higher than that generated by coding the original frame. Thus, it is clear that the conventional quantization scheme is not capable of exploiting the reduced energy in the DCT coefficients of the residual frame to maximize compression efficiency.

The technique presented in this section addresses the aforementioned problem. As explained above, the dynamic range of the DCT coefficients may be very small. Moreover, the DC coefficients can also be negative, unlike in the case of the original frame. Therefore, all DCT coefficient bands bk are quantized using a uniform scalar quantizer with a symmetric quantization interval around zero.
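A minimal sketch of such a symmetric uniform scalar quantizer is given below; the step size and coefficient values are illustrative only, not taken from the paper.

```python
def quantize_symmetric(coeffs, step):
    """Uniform scalar quantizer with a quantization interval symmetric around zero.

    Each coefficient maps to a signed bin index; the zero bin straddles zero,
    so small positive and negative residual coefficients both collapse to 0.
    """
    return [round(c / step) for c in coeffs]

def dequantize(indices, step):
    """Reconstruct each coefficient at the centre of its quantization bin."""
    return [i * step for i in indices]

# Residual DCT coefficients cluster around zero and may be negative
# (illustrative values only)
band = [-3.0, -1.2, 0.4, 2.5, 7.9]
indices = quantize_symmetric(band, step=4.0)
print(indices)                   # [-1, 0, 0, 1, 2]
print(dequantize(indices, 4.0))  # [-4.0, 0.0, 0.0, 4.0, 8.0]
```

Because the interval is symmetric, negative DC residuals are handled without any offset, unlike a quantizer designed for the non-negative DC coefficients of an original frame.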
The proposed step size calculation algorithm is shown in (3) below:

Wk = max( 2|Vk|max / (2^Mk − 1), Ck )    (3)

where Ck is a preset threshold. The value of Ck is DCT band dependent, as shown in Figure 3. These values are obtained empirically so that the objective quality is maximized. The quantizer step size thresholds of the low frequency coefficients are smaller, recognizing the importance of these coefficients. The threshold value of the DC band, C0, ranges from 6 (for the quantizer matrix Q8 shown in Figure 4) to 16 (for the quantizer matrix Q1).

Figure 2. Probability density function of DC values for the Mobile QCIF sequence, for the original and the difference (residual) frames (first 100 frames are considered)
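To see why the threshold in (3) matters, the sketch below compares the two step-size rules for a residual band with a small dynamic range; the band maximum, level count and threshold are illustrative values, not the empirically tuned ones from Figure 3.

```python
def step_conventional(v_max, m_k):
    """Equation (2): Wk = 2|Vk|max / (2^Mk - 1)."""
    return 2.0 * v_max / (2 ** m_k - 1)

def step_proposed(v_max, m_k, c_k):
    """Equation (3): clamp the conventional step size from below by Ck."""
    return max(step_conventional(v_max, m_k), c_k)

# A residual band has a small dynamic range, so (2) yields a tiny step size,
# spending parity bits on near-zero coefficients; (3) enforces a floor.
v_max, m_k, c_k = 12.0, 3, 16.0   # 2^3 - 1 = 7 quantization intervals
print(step_conventional(v_max, m_k))   # ~3.43: step too fine for a residual
print(step_proposed(v_max, m_k, c_k))  # 16.0: threshold Ck takes over
```

With the floor active, most small residual coefficients fall into the zero bin, which is where the bit rate gain of residual coding comes from.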

C0      C0+2    C0+4    C0+6
C0+2    C0+4    C0+6    C0+8
C0+4    C0+6    C0+8    C0+10
C0+6    C0+8    C0+10   C0+12

Figure 3. Minimum step size values for DCT coefficients

It is also found that the quantization matrices proposed in [15] are not suitable for the transform domain Wyner-Ziv residual codec. Therefore, based on extensive experimental results, the quantization matrices illustrated in Figure 4 are proposed.

(Q1)          (Q2)          (Q3)          (Q4)
4  4  0  0    8  4  0  0    8  4  4  0    8  8  4  0
4  0  0  0    4  0  0  0    4  4  0  0    8  4  0  0
0  0  0  0    0  0  0  0    4  0  0  0    4  0  0  0
0  0  0  0    0  0  0  0    0  0  0  0    0  0  0  0

(Q5)          (Q6)          (Q7)          (Q8)
16  8  8  0   16  8  8  4   16  8  8  4   32 16  8  4
 8  8  0  0    8  8  4  0    8  8  4  4   16  8  4  4
 8  0  0  0    8  4  0  0    8  4  4  0    8  4  4  0
 0  0  0  0    4  0  0  0    4  4  0  0    4  4  0  0

Figure 4. Quantization matrices

4. EXPERIMENTAL RESULTS

In the simulations, a number of test video sequences have been considered. Figure 5 illustrates the rate-distortion performance for the first 101 frames of the Salesman, Mother and Daughter, and Mobile QCIF video sequences at 30 fps. In the experiments, only the luminance data is considered, as in [15]. The original key frames are assumed to be available at the decoder. The rate-distortion plots contain the rate and PSNR values of the Wyner-Ziv frames. The rate-distortion performance of the proposed technique has been compared against the technique presented in [15] and against H.264/AVC frame difference coding. The performance of the latter is obtained by restricting the motion vectors to zero in predictive coded pictures in the JM12.4 reference software. In order to make the coding structure similar to that of the DVC structure, an IPIPIP... structure is used with H.264/AVC. Figure 5 shows that there is a significant PSNR gain of up to 0.6 dB, or up to a 50% reduction in the bit rate, compared to the technique presented in [15]. The proposed technique has also been tested on several other test sequences, and similar results were observed. The results presented in Figure 5 also suggest that the proposed technique outperforms H.264/AVC residual coding.

Figure 6 shows the performance comparison of the proposed technique against H.264/AVC biprediction in an IBIB... structure in terms of overall rate-distortion performance. In this experiment, we used intra-coding for encoding the DVC key frames. The quantization settings for the DVC Wyner-Ziv frames and the H.264/AVC bipredictive frames were selected in order to achieve similar bit rates. These results demonstrate that the rate-distortion performance of the proposed technique is almost similar to that of H.264/AVC biprediction.
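The objective quality reported in the experiments is the PSNR of the luminance channel. A minimal sketch of how PSNR is computed from original and reconstructed pixel values follows; the four sample pixel values are illustrative only.

```python
import math

def psnr(orig, recon, peak=255.0):
    """PSNR in dB between two equal-length luminance pixel sequences."""
    mse = sum((a - b) ** 2 for a, b in zip(orig, recon)) / len(orig)
    if mse == 0:
        return float("inf")   # identical frames
    return 10.0 * math.log10(peak ** 2 / mse)

# Toy 4-pixel example (illustrative): reconstruction off by +/-1 or +/-2
orig_px  = [100, 120, 140, 160]
recon_px = [101, 119, 142, 158]
print(round(psnr(orig_px, recon_px), 2))   # 44.15
```

A 0.6 dB PSNR gain, as reported against [15], corresponds to roughly a 13% reduction in mean squared error at the same bit rate.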

Figure 5. Rate-distortion performance for the proposed technique, the reference [15] and H.264/AVC zero-motion residual coding (PSNR of even frames vs. rate of even frames, for the first 100 frames of the Salesman, Mother-Daughter and Mobile QCIF sequences at 30 fps)

Figure 6. Rate-distortion performance for the proposed technique and H.264/AVC biprediction (PSNR vs. rate, first 100 frames of the Salesman QCIF sequence at 30 fps)

5. CONCLUSIONS

In this paper, we proposed a new technique for transform domain Wyner-Ziv video codecs. In the proposed technique, residual frames are coded as Wyner-Ziv frames. In this new scheme, the encoder exploits both spatial and temporal redundancies through DCT based coding and by taking the pixel-wise difference of two consecutive frames. In addition, a novel quantization technique is proposed to optimize the rate-distortion performance. Simulation results show an improvement in PSNR of up to 0.6 dB compared to other transform domain Wyner-Ziv codecs, at the expense of a minor increase in computational complexity.

6. ACKNOWLEDGMENTS

The authors would like to extend their gratitude to Fernando Pereira for kindly granting permission to use the VISNET2-WZ-IST software in this research. The work presented was developed within VISNET II, a European Network of Excellence (http://www.visnetnoe.org), funded under the European Commission IST FP6 programme.

7. REFERENCES

[1] Z. Xiong, A. D. Liveris, and S. Cheng, "Distributed Source Coding for Sensor Networks," IEEE Signal Processing Magazine, vol. 21, no. 5, pp. 80-94, Sep. 2004.

[2] X. Zhu, A. Aaron, and B. Girod, "Distributed compression for large camera arrays," Proceedings of IEEE Workshop on Statistical Signal Processing, Saint-Louis, Missouri, pp. 30-33, Oct. 2003.

[3] J. Slepian and J. Wolf, "Noiseless Coding of Correlated Information Sources," IEEE Trans. on Information Theory, vol. 19, no. 4, July 1973.

[4] A. Wyner and J. Ziv, "The Rate-Distortion Function for Source Coding with Side Information at the Decoder," IEEE Trans. on Information Theory, vol. 22, no. 1, Jan. 1976.

[5] A. Aaron, D. Varodayan and B. Girod, "Wyner-Ziv residual coding of video," Proc. Picture Coding Symposium, PCS-2006, Beijing, China, April 2006.

[6] A. Aaron, R. Zhang and B. Girod, "Wyner-Ziv Coding for Motion Video," Asilomar Conference on Signals, Systems and Computers, Pacific Grove, USA, November 2002.

[7] B. Girod, A. Aaron, S. Rane and D. Rebollo-Monedero, "Distributed Video Coding," Proc. of the IEEE, vol. 93, no. 1, pp. 71-83, Jan. 2005.

[8] M. Tagliasacchi and S. Tubaro, "A MCTF Video Coding Scheme Based on Distributed Source Coding Principles," Visual Communication and Image Processing, July 2005.

[9] L. Natario, C. Brites, J. Ascenso and F. Pereira, "Extrapolating Side Information for Low-Delay Pixel Domain Distributed Video Coding," Int. Workshop on Very Low Bitrate Video Coding, Sept. 2005.

[10] J. Ascenso, C. Brites and F. Pereira, "Motion Compensated Refinement for Low Complexity Pixel Based Distributed Video Coding," IEEE International Conference on Advanced Video and Signal Based Surveillance, Como, Italy, July 2005.

[11] J. Ascenso, C. Brites and F. Pereira, "Improving Frame Interpolation With Spatial Motion Smoothing for Pixel Domain Distributed Video Coding," 5th EURASIP Conference on Speech and Image Processing, July 2005.

[12] A. Adikari, W. Fernando, H. K. Arachchi and W. Weerakkody, "Wyner-Ziv Coding with Temporal and Spatial Correlations for Motion Video," IEEE Canadian Conference on Electrical and Computer Engineering, Ottawa, May 2006.

[13] M. Tagliasacchi, A. Trapanese, S. Tubaro, J. Ascenso, C. Brites and F. Pereira, "Exploiting Spatial Redundancy in Pixel Domain Wyner-Ziv Video Coding," IEEE International Conference on Image Processing, Oct. 2006.

[14] A. Aaron, S. Rane, E. Setton and B. Girod, "Transform-Domain Wyner-Ziv Codec for Video," VCIP, San Jose, USA, January 2004.

[15] C. Brites, J. Ascenso and F. Pereira, "Improving Transform Domain Wyner-Ziv Video Coding Performance," IEEE International Conf. on Acoustics, Speech and Signal Processing, Toulouse, France, May 2006.
