Compressed domain content based retrieval using H.264 DC-pictures


Mahdi Mehrabi • Farzad Zargari • Mohammad Ghanbari

Abstract A fast and simple method for content based retrieval using the DC-pictures of H.264 coded video, without full decompression, is presented. Compressed domain retrieval is very desirable for content analysis and retrieval of compressed images and video. Although DC-pictures are among the most widely used compressed domain indexing and retrieval methods for pre-H.264 coded videos, they are not generally used for H.264 coded video. This is due to two main facts: first, the I-frames in the H.264 standard are spatially predictively coded, and second, the H.264 standard employs an integer discrete cosine transform. In this paper we apply a color histogram indexing method to the DC-pictures derived from H.264 coded I-frames. Since the method is based on independently coded I-frame pictures, it can be used either for video analysis of H.264 coded videos or for image retrieval of I-frame based coded images, such as advanced image coding. The retrieval performance of the proposed algorithm is compared with that of fully decoded images. Simulation results indicate that the performance of the proposed method is very close to that of fully decompressed image systems, while the proposed method has a much lower computational load.

Keywords Compressed domain image indexing and retrieval • DC-picture • H.264 video coding standard • Color histogram

1 INTRODUCTION

Visual information has been expanding rapidly in recent years, and effective retrieval of visual data according to its visual content is a challenging research issue. Since manipulating visual information requires large amounts of storage capacity and processing power, there is a need to efficiently index and retrieve visual information in multimedia applications.


M. Mehrabi is a PhD candidate in the Department of Computer Engineering, Science and Research Branch, Islamic Azad University, Tehran, Iran. e-mail: [email protected], [email protected]

F. Zargari is with the Information Technology Research Institute of Iran Telecom Research Center (ITRC), Tehran, Iran. phone: +98-21-84977272; fax: +98-21-88630036; e-mail: [email protected]

M. Ghanbari is with the School of Computer Science and Electronic Engineering, University of Essex, Colchester, CO4 3SQ UK. e-mail: [email protected]


Content based retrieval (CBR) was introduced for managing image and video libraries. In content based image retrieval (CBIR), various image features such as color, texture and shape are used for retrieving images from image libraries. Video retrieval uses the same features as image retrieval, along with temporally related features in video sequences. Digital image and video libraries usually store visual information in compressed form. Retrieval normally needs uncompressed data, so in these libraries an unwanted decompression stage increases the search time and complexity. On the other hand, retrieval techniques that apply directly to compressed data are faster and are preferable in terms of computational cost and retrieval time, particularly for real-time applications. A survey of features used in compressed domain image and video retrieval is presented in [1].

The most common video coding standards are those of the MPEG and H.26x families. These video coding standards employ hybrid coding of block-based DCT transform and motion compensation. These standards also employ intra- and inter-frame coded pictures for refreshment and easy access. The intra coded pictures (I-frames) are coded independently, without reference to other frames. The inter-frame coded pictures (P and B frames) are coded with reference to other frames using motion compensated prediction. Although in previous versions of these video coding standards I-frames are coded similarly to the JPEG image coding standard, in the recently introduced H.264/AVC standard they are coded with spatial prediction. Since all these video coding standards, as well as the JPEG image coding standard, are DCT block based, the DCT coefficients of the coded pictures are widely used to generate compressed domain feature vectors from the compressed visual data.

An important technique for fast and easy access and manipulation of compressed visual data is to construct a lower quality picture, i.e. a DC-picture, instead of performing the inverse DCT transform and full decompression. The DC coefficients of the DCT transform of the blocks are used to produce the DC-pictures. Even though DC-pictures can also be constructed by averaging over the pixels, this requires full decoding of the pictures. Constructing DC-pictures directly from the compressed video is therefore more desirable, because the inverse DCT transform constitutes a large portion of the decoding process of H.26x coded video.

The average of a block is an approximation of its pixels; hence, by replacing the blocks with their average values, an approximated picture can be constructed for fast access to the content of the original picture. This picture is a lower quality resemblance of the original picture. DC-pictures are used in various image and video analysis and retrieval applications [2-10]. A DC-picture is used in [6] to extract the color histogram, which is one of the most important feature vectors in image retrieval, and it is shown that the DC value of 2×2 blocks has the best performance in building color histograms for retrieving color images [6]. The authors in [6] and [11] have proposed a method for extracting the DC coefficients of small sub-blocks from the DCT coefficients of larger blocks, to access the DC-pictures directly from DCT based coded videos or images. The performance of this method for extracting the DC values of 2×2 sub-blocks from the DCT coefficients of an 8×8 block is given in [6]. Since the method in [6] and [11] is designed for the non-integer DCT codecs prior to the H.264 coding standard, it is inappropriate for extracting the DC coefficient of a small sub-block in the H.264 standard. This is because in the H.264 standard the DCT transform and its inverse are carried out with an integer transform. Moreover, the H.264 standard employs spatial prediction for coding of I-frames, a technique which is one of the innovations introduced in the H.264 standard. This paper introduces a compressed domain image retrieval and indexing method based on the color histogram of the DC-pictures derived from the I-frames of H.264 coded video. The proposed method can be used for I-frames in the H.264 video coding standard. In addition to H.264 coded video, the proposed method can also be used for I-frame based image coding, such as advanced image coding (AIC) or modified advanced image coding (M-AIC) [12]. The proposed DC-picture based method for indexing and retrieval of I-frames in H.264 coded video has two important features. First, processing an I-frame is independent of the other frames in a group of pictures (GOP); hence the proposed descriptor can be extracted easily and rapidly from the coded video. Second, in video coding methods that use I-frames to code the key frames [13], or in compressed domain video indexing and retrieval methods that use I-frames as the best candidates for key frames [14], the proposed method can be used for analysis of key frames in video analysis applications.
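For reference, the following sketch (in Python with NumPy; names are illustrative) shows the pixel-domain baseline mentioned above, i.e. building a DC-picture by averaging 2×2 blocks of an already decoded frame. It is the alternative that requires full decoding, not the compressed domain method proposed in this paper.

```python
import numpy as np

def dc_picture_by_averaging(frame: np.ndarray, block: int = 2) -> np.ndarray:
    """Replace each block x block region of a fully decoded frame with its mean
    value (pixel-domain baseline; it needs the fully decoded picture)."""
    h, w = frame.shape
    h_c, w_c = h - h % block, w - w % block        # crop to a multiple of the block size
    cropped = frame[:h_c, :w_c].astype(np.float64)
    # Reshape so each block becomes one cell, then average inside each block
    blocks = cropped.reshape(h_c // block, block, w_c // block, block)
    return blocks.mean(axis=(1, 3))

# Example: the DC-picture of an 8-bit luma frame is half the size in each dimension
luma = np.random.randint(0, 256, size=(288, 352), dtype=np.uint8)
dc_pic = dc_picture_by_averaging(luma)
print(dc_pic.shape)   # (144, 176)
```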


The rest of the paper is organized as follows. In Section 2 the proposed method is introduced. The performance and computation time of the proposed method are evaluated in Section 3, and the paper ends with concluding remarks in Section 4.

2 The Proposed Method

This section provides a short overview of the inverse integer DCT transform and dequantizer in the H.264 standard, to the extent required to follow the discussion in the rest of the paper. More detailed explanations of the inverse integer transform and dequantizer can be found in [15]. In the Baseline, Main and Extended profiles of the H.264 video coding standard the integer DCT transform is performed on 4×4 blocks. The inverse 4×4 DCT transform in H.264 is defined as:

$$Z = C_f^T (Y \otimes P) C_f \qquad (1)$$

where Y is the dequantized coefficients matrix, ⊗ indicates element-by-element matrix multiplication, C_f is the inverse integer DCT transform matrix, P is the dequantization scaling matrix, C_f^T is the transpose of C_f, and Z is the decoded block. The matrix C_f in the H.264 standard is defined as:

$$C_f = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1/2 & -1/2 & -1 \\ 1 & -1 & -1 & 1 \\ 1/2 & -1 & 1 & -1/2 \end{bmatrix} \qquad (2)$$
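The following NumPy sketch illustrates the inverse transform of (1)-(2). The scaling matrix P is set to all ones purely for illustration; in H.264 it is derived from the quantization parameter, so the flat P here is an assumption, not the standard's actual dequantizer.

```python
import numpy as np

# Inverse integer transform matrix C_f of eq. (2)
Cf = np.array([[1.0,  1.0,  1.0,  1.0],
               [1.0,  0.5, -0.5, -1.0],
               [1.0, -1.0, -1.0,  1.0],
               [0.5, -1.0,  1.0, -0.5]])

def inverse_transform_4x4(Y: np.ndarray, P: np.ndarray) -> np.ndarray:
    """Eq. (1): Z = Cf^T (Y ⊗ P) Cf, where ⊗ is element-by-element multiplication."""
    return Cf.T @ (Y * P) @ Cf

P = np.ones((4, 4))                                       # placeholder scaling matrix
Y = np.random.randint(-32, 32, size=(4, 4)).astype(float) # dequantized coefficients
Z = inverse_transform_4x4(Y, P)                           # decoded 4x4 block
```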

Although the way the DCT transform is implemented in the H.264 standard reduces its computational cost compared with the non-integer DCT transform of pre-H.264 standards, the DCT transform in the H.264 standard is still among the most computationally demanding stages of the coding and decoding process, because it has to be applied to a large number of blocks. In this paper we aim to extract the averages of 2×2 sub-blocks from the DCT coefficients of H.264 coded 4×4 blocks and then apply the retrieval methods to the resulting approximated images. Consider Z in (3), which represents a 4×4 block decoded from an H.264 coded block as given by (1). The matrix M in (4) extracts the averages of the 2×2 sub-blocks of Z through the operation given in (5).


x  00 x 10 Z=  x  20 x  30

M

T

DC 2=1/4  (M Z M

x01 x11

x02 x12

x03   x13 

x21

x22 x32

x23  x33 

x31

1 0 = 0  0



(3)

1 0 0 0 0 0 0 1 1  0 0 0

(x00 +x01+x10 +x11 )/4 0 ) = (x20 +x21+x30 +x31 )/4 0 

(4)

  0 0

0 (x02 +x03 +x12 +x13 )/4 0 0

0

0

0 (x22 +x23 +x32 +x33 )/4 0

0

(5)
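A short NumPy check of (3)-(5): for any 4×4 block Z, the non-zero entries of ¼·MZMᵀ are exactly the four 2×2 sub-block averages.

```python
import numpy as np

# Averaging matrix M of eq. (4)
M = np.array([[1, 1, 0, 0],
              [0, 0, 0, 0],
              [0, 0, 1, 1],
              [0, 0, 0, 0]], dtype=float)

Z = np.arange(16, dtype=float).reshape(4, 4)   # any decoded 4x4 block, eq. (3)
DC2 = 0.25 * (M @ Z @ M.T)                     # eq. (5)

# Entries (0,0), (0,2), (2,0) and (2,2) hold the four 2x2 sub-block averages
assert np.isclose(DC2[0, 0], Z[0:2, 0:2].mean())
assert np.isclose(DC2[0, 2], Z[0:2, 2:4].mean())
assert np.isclose(DC2[2, 0], Z[2:4, 0:2].mean())
assert np.isclose(DC2[2, 2], Z[2:4, 2:4].mean())
```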

Since Z is also equal to the left-hand side of the equality in (1), we can use the right-hand side of (1) instead of Z in (5):

$$DC_2 = \tfrac{1}{4} \otimes \big(M\,C_f^T (Y \otimes P)\, C_f\, M^T\big) = \tfrac{1}{4} \otimes N' (Y \otimes P) N'^T \qquad (6)$$

where N' = M C_f^T, and the matrix N' is calculated as:

$$N' = M C_f^T = \begin{bmatrix} 2 & 3/2 & 0 & -1/2 \\ 0 & 0 & 0 & 0 \\ 2 & -3/2 & 0 & 1/2 \\ 0 & 0 & 0 & 0 \end{bmatrix} \qquad (7)$$

Since the elements of DC_2 in (6) are the averages of the 2×2 sub-blocks, given the coefficients Y of a 4×4 H.264 coded block, the averages of its 2×2 decoded sub-blocks can be calculated using equation (6). The operation in (6) can be further simplified through matrix factorization. The matrix N' can be decomposed into N and K as:

$$N' = \begin{bmatrix} 2 & 3/2 & 0 & -1/2 \\ 0 & 0 & 0 & 0 \\ 2 & -3/2 & 0 & 1/2 \\ 0 & 0 & 0 & 0 \end{bmatrix} = \begin{bmatrix} 4 & 1 & 0 & -1 \\ 0 & 0 & 0 & 0 \\ 4 & -1 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} 1/2 & 0 & 0 & 0 \\ 0 & 3/2 & 0 & 0 \\ 0 & 0 & 1/2 & 0 \\ 0 & 0 & 0 & 1/2 \end{bmatrix} = N K \qquad (8)$$

Now, replacing N' in (6) with its equivalent product N K, and using the associative and commutative properties of matrix multiplication, the operations in (6) can be rewritten as:

$$DC_2 = \tfrac{1}{4} \otimes \big(N K (Y \otimes P) K^T N^T\big) = N \big(\tfrac{1}{4} \otimes (K (Y \otimes P) K^T)\big) N^T = N \big(\tfrac{1}{4} \otimes (K Y K^T) \otimes P\big) N^T \qquad (9)$$

Since K is a diagonal matrix, multiplication of Y by K from the left and the right can be written as element-by-element multiplication of Y by K':

$$DC_2 = N \big(\tfrac{1}{4} \otimes (Y \otimes K') \otimes P\big) N^T = N \big(Y \otimes (\tfrac{1}{4} \otimes K' \otimes P)\big) N^T \qquad (10)$$

where K' is:

$$K' = \tfrac{1}{4}\begin{bmatrix} 1 & 3 & 1 & 1 \\ 3 & 9 & 3 & 3 \\ 1 & 3 & 1 & 1 \\ 1 & 3 & 1 & 1 \end{bmatrix} \qquad (11)$$

If we define P' as:

$$P' = \tfrac{1}{4} \otimes K' \otimes P \qquad (12)$$

then:

$$DC_2 = N (Y \otimes P') N^T \qquad (13)$$

The right-hand side of equation (13) represents the final method of obtaining the averages of the 2×2 sub-blocks from the coefficients of a 4×4 integer DCT transformed block. Since (13) represents the inverse DCT transform in the same form as (1), the matrix P' can be combined into the dequantizer, resulting in a new dequantization table similar to the dequantization table for P in (1). This means that the proposed method is compatible with the H.264 quantizer. The difference between (13) and (1) lies in the matrices N and C_f. Since most of the elements of N are zero, the proposed method has a much lower computational load than the inverse transform in (1).

In the proposed compressed domain retrieval method for H.264 coded I-frames, the DCT coefficients are extracted from the compressed video file. Then, using (13), the DC value of each 2×2 sub-block of a 4×4 block is calculated. These DC values are up-sampled (Fig. 1) to produce an approximation of the 4×4 blocks, which are the residues of the intra predicted blocks in the original image. Hence, we add the resulting approximation of the up-sampled 4×4 residue blocks to the spatially predicted values to obtain an approximated resemblance of each coded block in an I-frame. In this way we generate a lower quality DC-picture without the inverse DCT and full decompression of the coded video. We use the color components of the resulting DC-picture to extract the color histogram feature vector of the coded I-frame without full decompression.
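A NumPy sketch of the per-block computation is given below: DC_2 is obtained directly from the coefficients via (13), cross-checked against averaging the fully inverse-transformed block, and then up-sampled and added to the intra prediction as in Fig. 1. The flat scaling matrix P and the zero prediction block are placeholders for values a real decoder would supply.

```python
import numpy as np

# Matrices from eqs. (2), (8) and (11); P is a placeholder scaling matrix
Cf = np.array([[1, 1, 1, 1], [1, .5, -.5, -1], [1, -1, -1, 1], [.5, -1, 1, -.5]])
N  = np.array([[4, 1, 0, -1], [0, 0, 0, 0], [4, -1, 0, 1], [0, 0, 0, 0]], dtype=float)
Kp = 0.25 * np.array([[1, 3, 1, 1], [3, 9, 3, 3], [1, 3, 1, 1], [1, 3, 1, 1]])  # K' of (11)
P  = np.ones((4, 4))               # assumed flat dequantizer scaling, for illustration only
Pp = 0.25 * Kp * P                 # P' of (12); can be folded into the dequantization table

Y = np.random.randint(-32, 32, (4, 4)).astype(float)   # dequantized 4x4 coefficients

# Eq. (13): 2x2 sub-block averages obtained straight from the coefficients
DC2 = N @ (Y * Pp) @ N.T

# Cross-check against the full inverse transform (1) followed by pixel averaging
Z = Cf.T @ (Y * P) @ Cf
ref = 0.25 * np.array([[Z[0:2, 0:2].sum(), 0, Z[0:2, 2:4].sum(), 0],
                       [0, 0, 0, 0],
                       [Z[2:4, 0:2].sum(), 0, Z[2:4, 2:4].sum(), 0],
                       [0, 0, 0, 0]])
assert np.allclose(DC2, ref)

# Fig. 1: up-sample the four averages back to a 4x4 residue approximation,
# then add the intra prediction to obtain the DC-picture samples for this block
avgs = DC2[np.ix_([0, 2], [0, 2])]                 # the four 2x2 averages
residue_approx = np.repeat(np.repeat(avgs, 2, axis=0), 2, axis=1)
prediction = np.zeros((4, 4))                      # spatial intra prediction (assumed given)
dc_block = prediction + residue_approx
```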


Fig. 1: Up-sampling DC values for 4×4 blocks

In this retrieval method we use the color histogram feature vector introduced in [16]. The color histogram feature vector for a color image consisting of three eight-bit color components A, B and C for each pixel P_ij is calculated as:

$$H(a_k, b_m, c_n) = \sum_{i=1}^{h} \sum_{j=1}^{w} f(P_{ij}) \qquad (14)$$

where P_ij is the pixel at the i-th row and j-th column of the image, and h and w are the picture height and width, respectively. The function f(P_ij) is defined as follows:

$$f(P_{ij}) = \begin{cases} 1 & \text{if } a_k \le A(P_{ij}) < a_{k+1} \text{ and } b_m \le B(P_{ij}) < b_{m+1} \text{ and } c_n \le C(P_{ij}) < c_{n+1} \\ 0 & \text{otherwise} \end{cases} \qquad (15)$$

where a_k, b_m and c_n are the decision boundaries of the A, B and C color components, and A(P_ij), B(P_ij) and C(P_ij) are the Y, Cb and Cr color components of pixel P_ij, respectively. The number of decision boundaries should differ between the three color components, because uniform quantization of perceptually non-uniform color spaces may be problematic [17]. Since H.264 uses the YCbCr color space, and the chromatic components, which vary slowly within a picture, are used along with the achromatic component for retrieval, we choose five bins for A (the Y component) and 16 bins each for B (the Cb component) and C (the Cr component):

$$a_k = k \times (255/5), \qquad k = 0, 1, \ldots, 4 \qquad (16)$$

$$b_m = m \times (255/16), \quad c_n = n \times (255/16), \qquad m, n = 0, 1, \ldots, 15 \qquad (17)$$
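A sketch of the histogram of (14)-(17) in NumPy is shown below. It assumes the Y, Cb and Cr planes are supplied at the same resolution (for example after up-sampling the chroma of the DC-picture), which is an implementation choice rather than something fixed by the paper.

```python
import numpy as np

Y_BINS, C_BINS = 5, 16    # five bins for Y, sixteen for Cb and Cr, eqs. (16)-(17)

def color_histogram(y: np.ndarray, cb: np.ndarray, cr: np.ndarray) -> np.ndarray:
    """3-D color histogram of eq. (14) over a YCbCr picture (e.g. a DC-picture)."""
    yi  = np.clip((y  / (255.0 / Y_BINS)).astype(int), 0, Y_BINS - 1)
    cbi = np.clip((cb / (255.0 / C_BINS)).astype(int), 0, C_BINS - 1)
    cri = np.clip((cr / (255.0 / C_BINS)).astype(int), 0, C_BINS - 1)
    hist = np.zeros((Y_BINS, C_BINS, C_BINS))
    np.add.at(hist, (yi.ravel(), cbi.ravel(), cri.ravel()), 1)  # count pixels per bin
    return hist   # 5 x 16 x 16 = 1280 bins

# Example with random planes standing in for the DC-picture components
y  = np.random.randint(0, 256, (72, 88))
cb = np.random.randint(0, 256, (72, 88))
cr = np.random.randint(0, 256, (72, 88))
hist = color_histogram(y, cb, cr)
print(hist.shape, hist.sum())   # (5, 16, 16) and 72*88 pixels counted
```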

The intersection of two histograms H1 and H2 is calculated as:


$$S = \sum_{k=0}^{4} \sum_{m=0}^{15} \sum_{n=0}^{15} \min\big(H_1(a_k, b_m, c_n),\, H_2(a_k, b_m, c_n)\big) / (w \times h) \qquad (18)$$

S is a number in the range [0, 1] and serves as the measure of similarity between the color histograms of two images.
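In NumPy, the intersection of (18) reduces to a bin-wise minimum followed by a sum; the sketch below assumes both histograms were built over pictures of the same size w×h, as in the experiments of this paper.

```python
import numpy as np

def histogram_intersection(h1: np.ndarray, h2: np.ndarray, w: int, h: int) -> float:
    """Similarity S of eq. (18): sum of bin-wise minima, normalised by picture size."""
    return float(np.minimum(h1, h2).sum() / (w * h))

# Example usage (hist_query and hist_db are 5x16x16 histograms of w x h pictures):
# s = histogram_intersection(hist_query, hist_db, w=88, h=72)   # 1.0 for identical images
```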

Since there are five bins for the Y component and 16 bins each for the Cb and Cr color components, the proposed color histogram is a three-dimensional histogram comprising 5×16×16 = 1280 bins. In the next section we present the performance evaluation of the proposed DC-picture based color histogram retrieval method.

3 Performance evaluation

The proposed indexing method is used in a query-by-example scheme for retrieving images from the Washington University image database. The image database includes 1330 color images. Each image in the database was coded as an I-frame using the joint video team (JVT) H.264 encoder, with conventional encoder settings: a quantization parameter equal to 26, dispersed macroblock order, and prediction mode optimization selecting, for each block, the mode that minimizes the residual prediction error. We extracted the color histograms of the coded images by two different methods. In the first method, called Method 1 hereafter, we used the standard H.264 decoder to decode each coded image and computed its color histogram according to (14). In the second method, called the proposed method hereafter, we extracted DC-pictures without full decompression of the coded images and used the DC-pictures to generate the color histograms using (14). In this way, there are two histograms for each coded image in the data set.

To evaluate the performance of the proposed retrieval method, a sample query image set was used. This set consists of 30 images selected from the Washington data set. We used the color histograms produced in the previous stage and the histogram similarity metric (18) to find the first ten retrieved images for each query image in the sample query set. We produced two ranked retrieved lists for each query image, one using Method 1 (full decompression) and the other using the proposed method (DC-picture). Fig. 2 shows the first 4 retrieved images using the two retrieval methods. Naturally, the first retrieved image in each list is the query image itself. In order to compare the performance of the proposed method with Method 1, we calculated the percentage of relevant retrieved images relative to the total number of retrieved images in the rank-ordered retrieved lists of all query images, for each rank (Table 1). The results in Table 1 indicate that the retrieval performance of the proposed method is very close to that of Method 1 at each rank.

Fig. 2: The left column contains the query pictures; (a) shows the ranked retrieved pictures for the query picture using Method 1, and (b) shows those retrieved by the proposed method.


Table 1: Percentage of relevant pictures in the rank-ordered retrieved lists of the two methods

Rank    Method 1    The proposed method
1       100%        100%
2       90%         82%
3       73%         73%
4       73%         73%
5       45%         36%
6       18%         27%

Furthermore, to evaluate the overall performance, we used the well-known Mean Average Precision (MAP) metric, which is used by TRECVID. MAP provides a single-figure measure of retrieval quality over a query set [18]. MAP is calculated as:

$$MAP(Q) = \frac{1}{|Q|} \sum_{j=1}^{|Q|} \frac{1}{m_j} \sum_{k=1}^{m_j} \mathrm{Precision}(R_{jk}) \qquad (19)$$

where Q is the query set, m_j is the number of relevant images in the database for the j-th query, R_jk is the rank of the k-th retrieved relevant image in the ranked retrieved list of the j-th query image, and Precision(x) is the precision at the x-th retrieved image, as defined in (20). It is worth noting that MAP ranges from 0 to 1, and higher MAP values indicate better performance of the retrieval experiment.

$$\mathrm{Precision}(x) = \frac{\#\ \text{relevant retrieved images up to rank } x}{\#\ \text{retrieved images}} \qquad (20)$$
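The following sketch computes MAP as in (19)-(20) from the ranks at which the relevant images were retrieved. It assumes the ranks of the retrieved relevant images are known for each query; relevant images that were never retrieved contribute zero precision.

```python
def average_precision(relevant_ranks, m_j=None):
    """Inner sum of eq. (19) for one query: Precision(R_jk), eq. (20),
    averaged over the m_j relevant images of that query."""
    ranks = sorted(relevant_ranks)
    m_j = m_j if m_j is not None else len(ranks)
    # For the (k+1)-th relevant image, found at rank r, Precision(r) = (k + 1) / r
    return sum((k + 1) / r for k, r in enumerate(ranks)) / m_j

def mean_average_precision(per_query_ranks, per_query_m=None):
    """MAP of eq. (19): average precision averaged over the query set Q."""
    if per_query_m is None:
        per_query_m = [None] * len(per_query_ranks)
    aps = [average_precision(r, m) for r, m in zip(per_query_ranks, per_query_m)]
    return sum(aps) / len(aps)

# Example: two queries; the first finds its 3 relevant images at ranks 1, 3 and 4,
# the second finds 2 of its 3 relevant images at ranks 2 and 5
print(mean_average_precision([[1, 3, 4], [2, 5]], per_query_m=[3, 3]))
```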

The MAP values for the two tested retrieval methods are listed in Table 2.

Table 2: MAP for the retrieval experiments of the two methods

        Method 1    The proposed method
MAP     0.84        0.81

As Table 2 indicates, the overall performance of the proposed method is very close to the results obtained with full decompression in Method 1. Moreover, we measured the processing time for extracting the feature vectors from the DCT coefficients of the coded images on a 2.26 GHz PC with 1 GB of RAM. The average processing times of the tested methods are tabulated in Table 3. Table 3 indicates that, on average, the proposed method requires less than 39% of the processing time of Method 1 to calculate the feature vector.

Table 3: Average processing time required for extracting the color histogram from the DCT coefficients

                                    Method 1    The proposed method
Average processing time (ms)        562         219

4 Conclusion

In this paper we introduced a novel compressed domain image retrieval method for I-frame coded images of the H.264 standard. This method can be applied either to I-frames of H.264 coded videos or to images coded by techniques such as advanced image coding (AIC) and modified advanced image coding (M-AIC) [12], which use intra-frame block prediction. The proposed method uses the color histogram of DC-pictures for visual information retrieval in compressed domain image retrieval and video analysis applications. Simulation results indicate that the proposed method reduces the computation time to less than 39% of that of the full decompression method, while its retrieval performance is very close to that of the method using color histograms obtained from fully decoded images. The low complexity and computation time of the proposed method, together with the small reduction in retrieval performance compared with full decompression, indicate that the proposed method can be used as a fast and simple way to extract content-preserving DC-pictures of H.264 coded videos without full decompression. Hence, the proposed compressed domain indexing method is an effective image retrieval and indexing method, and can be used for the retrieval of AIC coded images and the analysis of H.264/AVC coded videos in various applications.

REFERENCES

1. H. Wang, A. Divakaran, A. Vetro, S.-F. Chang, H. Sun, "Survey of compressed-domain features used in audio-visual indexing and analysis", Journal of Visual Communication and Image Representation, vol. 14, pp. 150-183, 2003.
2. Y. Feng, H. Fang, J. Jiang, "Region Growing with Automatic Seeding for Semantic Video Object Segmentation", Lecture Notes in Computer Science, vol. 3687, pp. 542-549, 2005.
3. W. Tavanapong, J. Zhou, "Shot Clustering Techniques for Story Browsing", IEEE Transactions on Multimedia, vol. 6, no. 4, August 2004.
4. R. A. Joyce, B. Liu, "Temporal Segmentation of Video Using Frame and Histogram Space", Proceedings of the 2000 International Conference on Image Processing, pp. 941-944, 2000.
5. X.-W. Li, M.-X. Zhang, Y.-L. Zhu, J.-H. Xin, "A novel RS-based key frame representation for video mining in the compressed domain", Second International Workshop on Knowledge Discovery and Data Mining, IEEE Computer Society, 2009.
6. J. Jiang, A. Armstrong, G. C. Feng, "Direct content access and extraction from JPEG compressed images", Pattern Recognition, vol. 35, pp. 2511-2519, 2002.


7. K.-D. Seo, S. Park, S.-H. Jung, "Wipe scene-change detector based on visual rhythm spectrum", IEEE Transactions on Consumer Electronics, vol. 55, no. 2, pp. 831-838, May 2009.
8. A. Divakaran, A. Vetro, K. Asai, Nishikawa, "Video browsing system based on compressed domain feature extraction", IEEE Transactions on Consumer Electronics, vol. 46, no. 3, pp. 637-644, August 2000.
9. X. Qian, G. Liu, R. Su, "Effective Fades and Flashlight Detection Based on Accumulating Histogram Difference", IEEE Transactions on Circuits and Systems for Video Technology, vol. 16, no. 10, pp. 1245-1258, October 2006.
10. J. Jiang, Y. Weng, P. J. Li, "Dominant colour extraction in DCT domain", Image and Vision Computing, vol. 24, pp. 1269-1277, 2006.
11. J. Jiang, G. Feng, "The Spatial Relationship of DCT Coefficients between a Block and Its Sub-blocks", IEEE Transactions on Signal Processing, vol. 50, no. 5, pp. 1160-1169, May 2002.
12. Z. Zhang, R. Veerla, K. R. Rao, "Modified Advanced Image Coding", in Proc. International Conference on Complexity and Intelligence of the Artificial and Natural Complex Systems, Medical Applications of the Complex Systems, Biomedical Computing, pp. 110-116, 2008.
13. Z. Shu-long, Y. Zhi-sheng, L. Shi-yong, Z. Xin, "An Improved Video Compression Algorithm for Lane Surveillance", Fourth International Conference on Image and Graphics (ICIG), pp. 224-229, 2007.
14. V. Kobla, D. Doermann, K.-I. Lin, "Archiving, indexing, and retrieval of video in the compressed domain", in Proc. SPIE Conference on Multimedia Storage and Archiving Systems, SPIE vol. 2916, pp. 78-89, 1996.
15. H. S. Malvar, A. Hallapuro, M. Karczewicz, L. Kerofsky, "Low-Complexity Transform and Quantization in H.264/AVC", IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, no. 7, July 2003.
16. M. Swain, D. Ballard, "Color indexing", International Journal of Computer Vision, vol. 7, pp. 11-32, 1991.
17. S. M. Lee, J. H. Xin, S. Westland, "Evaluation of Image Similarity by Histogram Intersection", Color Research & Application, vol. 30, no. 4, pp. 265-274, August 2005.
18. C. D. Manning, P. Raghavan, H. Schütze, Introduction to Information Retrieval, Cambridge University Press, 2008.

