A New Object-Based System for Fractal Video Sequences Compression


JOURNAL OF MULTIMEDIA, VOL. 2, NO. 3, JUNE 2007


A New Object-Based System for Fractal Video Sequences Compression

Kamel Belloulata, Département d'Électronique, Faculté des Sciences de l'Ingénieur, Université Djillali Liabès de Sidi Bel Abbès, Sidi Bel Abbès, Algeria. Email: [email protected]

Shiping Zhu, Department of Measurement Control & Information Technology, School of Instrumentation Science & Optoelectronics Engineering, Beihang University, 100083, Beijing, P.R. China. Email: [email protected]

Abstract: A novel object-based fractal monocular and stereo video compression scheme with quadtree-based motion and disparity compensation is proposed in this paper. Fractal coding is adopted, and each object is encoded independently using a prior image segmentation alpha plane, which is defined exactly as in MPEG-4. The first n frames of the right video sequence are encoded using Circular Prediction Mapping (CPM), and the remaining frames are encoded using Non-Contractive Interframe Mapping (NCIM). The CPM and NCIM methods accomplish the motion estimation/compensation of the right video sequence. According to the coding or user requirements, the spatial correlations between the left and right frames can be exploited by partial or full affine-transformation, quadtree-based disparity estimation/compensation, or simply by applying CPM/NCIM to the left video sequence. Test results on natural monocular and stereo video sequences show promising performance at low bit rates. We believe it will be a powerful and efficient technique for object-based monocular and stereo video sequence coding.

Index Terms: Monocular and stereo video coding, fractal coding, object-based coding, low bit rate coding.

I. INTRODUCTION

The next generation of visual communications must address the capture, transmission, and display of 3D visual information, and thereby realize one of the most desired features of high-quality telecommunication services: the sensation of 3D reality. Although holographic and volumetric 3D displays may provide full 3D perception of the scene, the vast amount of optical information prevents their practical use for the time being, and their state-of-the-art presentation abilities are limited to still images. Alternatively, 3D stereoscopic displays can supply the 3D representation by relying on the human brain to fuse the left and right


This work is an extension of the paper titled "Object-Based Fractal Stereo Codec with Quadtree-Based Disparity or Motion Compensation" by K. Belloulata and S. Zhu, which appeared in the Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing 2006, Toulouse, France, May 2006.

© 2007 ACADEMY PUBLISHER

views of the same scene, which are captured from slightly different viewing angles. Undoubtedly, this will be a very attractive and effective direction for realizing 3D visual communication in the near future [1]. Consequently, monocular video sequences [2], and especially stereo video sequences [3], or more generally multiview video sequences [4], which can provide vivid and plentiful information about the 3D scene, have recently been studied to generate virtual synthesized intermediate views [5] and thus supply the 3D feeling of the scene on a flat 2D screen. They have wide applications in many fields, such as 3D visual communication and telemedicine [6]. However, the challenge of delivering a huge amount of video sequence data must first be overcome for such sequences to be transmitted economically and efficiently. Meanwhile, efficient coding techniques should be developed to reduce the bit rate in the transmission of monocular [2], stereo [7, 8] and multiview [4] video sequences. Two major approaches are used for video sequence coding: block-based and object-based. The currently dominant block-based approach has the advantages of simplicity, efficiency and robustness. It has achieved great success and is commonly adopted in many video compression standards, but the subjective quality of reconstructed images can be poor at low bit rates. Object-based video sequence coding has been intensely investigated in the last few years and is also supported by the new MPEG-4 standard [9], [10]. It has an important advantage over block-based coding: it allows manipulation of image objects without complete decoding of the stream, which improves the coding quality and reduces the bit rate. In such a scheme, a prior segmentation map (alpha plane) [11] of the image, which segments the image into objects, is known in advance [12]. The object-based approach has been considered a very promising alternative to the block-based approach.
Compared to the block-based approach at low bit rates, it alleviates annoying coding effects such as blocking artifacts and mosquito effects, especially when blocks coincide with the boundaries of different objects. The object-based approach can also provide more natural representations of the scene and has the further potential benefit of acquiring the depth information of semantically meaningful objects [8].


In such a scheme, the task of automatically extracting and modeling objects directly from the image intensities requires sophisticated image analysis techniques to segment the image into homogeneous regions [13, 14] or semantically meaningful objects [12]; user interaction may even be needed to segment the image into regions corresponding to the real objects before the automatic segmentation algorithm is applied [11]. Several methods for coding the binary alpha plane were considered during the development of MPEG-4, such as chain coding of the object contours, quadtree coding [2], modified modified READ (MMR) and context-based arithmetic encoding (CAE) [15]. However, little work has been reported on fractal video coding techniques [16], [17]. A scheme based on quadtree partitioning [19] has been proposed for an object-based coding system [18], but it is not truly object-based. We have proposed a fractal image codec with region-based functionality [20]. It permits new functionalities at the decoder, such as independent transmission/decoding of each object in the video, object/background replacement, object-based video retrieval, and especially better visual quality than block-based coding, since object boundaries usually coincide with the intensity edges that are difficult to encode [21]. We also benefit from the inherent advantages of fractal coding, such as a good compression ratio and size-scalable output yielding a rate-scalable decoded video sequence. Because the decoding process is formulated in terms of geometric partitions rather than pixels or fixed-size blocks, it can be performed at any resolution [19]. All these potential applications motivate the development of a new object-based fractal video codec. The paper is organized as follows. Fractal coding with object-based functionality is summarized in Section II.
In Section III, the detailed design of a new object-based fractal compression scheme for monocular video sequences is presented. This scheme is extended to stereo video sequence compression in Section IV. Experimental results are presented in Section V, and conclusions are drawn in Section VI.

II. FRACTAL CODING WITH OBJECT-BASED FUNCTIONALITY

Fractal coding is a popular method for still image compression, offering a very fast decompression process as well as potentially good compression ratios. It is well known that a fractal image codec performs better with variable-size blocks than with fixed-size blocks, using a quadtree-based image partition in which the image is progressively subdivided by thresholding the range-domain comparison errors [19]. Another approach allows irregularly shaped partitions, which can outperform the quadtree scheme but significantly increases computational complexity [22]. It creates the image partitioning during the encoding process and does not support a semantically meaningful prior image segmentation; in this sense, the technique in [22] is not truly object-based. We proposed a truly object-based fractal coding scheme in which the objects are defined by a prior segmentation map (alpha plane) and encoded independently of each other [21]. Such a method allows object-by-object organization of the bit stream and consequently allows recovering and manipulating an individual object at



the receiver without complete bit stream decoding. In the proposed method, the range and domain blocks remain rectangular. Obviously, some of them are located at object boundaries and thus contain pixels from two or more objects (e.g., the object and the background). Therefore, in order not to mix pixels from different objects within one transformation, we associate, via the alpha plane, a label with each pixel: pixels with the same label belong to the same object, as shown in Fig. 1.

Fig. 1. Illustration of using the alpha plane for the boundary blocks; pixels with the same label belong to one object.

We modify the dissimilarity measure to account for pixels belonging to one object only, and accordingly we restrict the search for matching domain blocks to the object of interest. Let $I(x,y)$ be the image intensity of a pixel at position $(x,y)$. Let $\{r_1,\dots,r_N\}$ be the set of $N$ non-overlapping range blocks partitioning the image; similarly, let $\{d_1,\dots,d_M\}$ be the set of $M$, possibly overlapping, domain blocks covering the entire image. So $I_{r_i} = \{I(x,y) : (x,y)\in r_i\}$ and $I_{d_j} = \{I(x,y) : (x,y)\in d_j\}$. In order to encode a range block $r_i$, a search for the index $j$ of a domain block $d_j$ and an isometry $\phi_i^p$ must be executed, jointly with the computation of the photometric parameters $s_i$ and $o_i$. In order to assure object-by-object encoding/decoding, both range and domain blocks must be located within the same object. Therefore there are four cases, regarding the locations of blocks $r_i$ and $d_j$ with respect to the object $R_0$, that may arise:

1) $r_i$ and $d_j$ are both interior blocks (int/int);
2) $r_i$ and $d_j$ are both boundary blocks (bnd/bnd);
3) $r_i$ is an interior block whereas $d_j$ is a boundary block (int/bnd);
4) $r_i$ is a boundary block whereas $d_j$ is an interior block (bnd/int).

In the first case, a standard full block search is executed among interior blocks of the same object. In the second case, let $S_{d_j}^n$ be the $n$-th segment in the domain block $d_j$ (a block may consist of more than two segments). Similarly, let $S_{r_i}^m$ be the $m$-th segment in the range block $r_i$. Let $\tilde{I}_{d_j}$ be a padded (extrapolated) version of $I_{d_j}$ defined as

$$\tilde{I}_{d_j}(x,y) = \begin{cases} I_{d_j}(x,y) & \text{if } (x,y) \in S_{d_j}^n \\ v & \text{if } (x,y) \notin S_{d_j}^n,\ (x,y) \in S_{r_i}^m \end{cases} \qquad (1)$$

where $v$ is a padding value, typically the mean local intensity in $S_{d_j}^n$ or the intensity value at the nearest pixel within $S_{d_j}^n$. Because only partial matching is performed, we define a new dissimilarity measure as follows:

$$\delta\!\left(I_{r_i}, I_{d_j}, \phi_i^m\right) = \frac{1}{\left|S_{r_i}^m\right|} \sum_{(x,y) \in S_{r_i}^m} \left[ I_{r_i}(x,y) - \tilde{I}_{d_j}^m(x,y) \right]^2 \qquad (2)$$
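As an illustration, the padded-domain construction (1) and the partial dissimilarity measure (2) can be sketched as follows. This is a minimal sketch, assuming NumPy arrays and boolean segment masks, with mean-intensity padding as the choice of $v$; the function names and the explicit photometric parameters `s`, `o` are ours, not the paper's, and the geometric isometry is omitted:

```python
import numpy as np

def padded_domain(domain, domain_seg):
    """Eq. (1): keep pixels inside the domain segment; fill the rest
    with a padding value v (here: the segment's mean intensity)."""
    v = domain[domain_seg].mean()
    return np.where(domain_seg, domain, v)

def partial_dissimilarity(range_blk, range_seg, domain_blk, domain_seg, s=1.0, o=0.0):
    """Eq. (2): mean squared error evaluated only over the range
    segment, after the photometric transform s*d + o."""
    d_tilde = s * padded_domain(domain_blk, domain_seg) + o
    m = range_seg.sum()  # |S_ri^m|
    return float(((range_blk[range_seg] - d_tilde[range_seg]) ** 2).sum() / m)
```

For an int/int pair the masks are all True and the measure reduces to the ordinary full-block MSE.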

where $\phi_i^m$ denotes an affine transformation for $S_{r_i}^m$. Compared to the first case, the above dissimilarity measure is evaluated only at pixel positions within a single range block segment $S_{r_i}^m$. In the third case, intensity extrapolation (1) of the domain block is always needed, since $r_i$ is an interior block whereas $d_j$ is a boundary block. In the fourth case, no padding is needed. The 1+2 search (int/int, bnd/bnd) is computationally attractive while achieving very good performance. As shown in Fig. 2, interior range blocks are searched and matched within interior domain blocks of the same object, and boundary range blocks are partially searched and matched within boundary domain blocks of the same object.

Fig. 2. Schematic illustration of the proposed object-based fractal coding scheme [21].

III. OBJECT-BASED FRACTAL CODING OF MONOCULAR VIDEO SEQUENCE

We proposed a new object-based scheme for fractal monocular video sequence coding [23, 24], based on the hybrid circular prediction mapping (CPM) and non-contractive interframe mapping (NCIM) [17]. This new scheme provides object-based functionality for monocular video sequence coding based on the alpha plane. CPM/NCIM combines fractal video coding with the well-known motion estimation and compensation (ME/MC) algorithm, which exploits the high temporal correlations between adjacent frames. In CPM and NCIM [17], each range block is motion compensated by a domain block in the previous frame that is of the same size as the range block, even though the domain block is always larger than the range block in conventional fractal image coders [19]. The main difference between CPM and NCIM is that CPM must be contractive for the iterative decoding process to converge, while NCIM need not be contractive, since its decoding depends on already decoded frames and is non-iterative. The first $n$ frames of the video sequence are treated as a coding group and are encoded by applying CPM; each frame is predicted block-wise from the circularly previous frame. In other words, the $k$-th frame $F_k$ is partitioned into range blocks, and each range block $r_i$ in $F_k$ is approximated by a domain block $d_j$ in $F_{[k-1]_n}$, where $[k-1]_n$ denotes $(k-1)$ modulo $n$. The remaining frames are encoded by employing NCIM, as shown in Fig. 3.

Fig. 3. Hybrid structure of CPM and NCIM [17].

The structure of NCIM is the same as that of the interframe mapping which forms CPM, except that there is no constraint on the contrast scaling coefficients. Since a moving image sequence exhibits high temporal correlations, this domain-range mapping becomes more effective when the domain block has the same size as the range block. In this case, the domain-range mapping can be interpreted as a kind of motion estimation/compensation technique. In this context, the main advantage of the proposed domain-range mapping is that, in a real moving image sequence, small motion vectors are more probable than large ones; therefore the search region for motion vectors can be localized in the area near the location of the range block. In the decoder, the first $n$ frames are reconstructed by applying CPM iteratively. The remaining frames are then reconstructed by applying NCIM to the previously reconstructed frame without iteration, since NCIM is not a contractive mapping. The first $n$ frames encoded by CPM are the minimal decodable set of all the frames [26], and they can be decoded without reference to other frames. Therefore only CPM affects the convergence of the total fractal mapping, and that is the reason why NCIM need not be contractive. In addition, we use a quadtree-based motion estimation/compensation scheme. A quadtree describes an object by placing non-overlapping squares of different sizes inside the object such that the object is described as accurately as possible. Typically, the sizes of the squares are powers of 2. We define a maximum size and a minimum size for the squares.
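The decoder behaviour just described (iterative CPM for the first $n$ frames, a single non-iterative pass of NCIM for the rest) can be sketched as follows. This is a toy illustration, not the authors' implementation: the per-frame transforms are stood in by arbitrary contractive functions rather than real block mappings:

```python
import numpy as np

def decode_cpm_group(cpm_transforms, shape, iterations=40):
    """Iteratively decode the first n frames (the CPM coding group).
    cpm_transforms[k] predicts frame k from frame (k-1) mod n; because
    CPM is contractive, iteration converges from an arbitrary start."""
    n = len(cpm_transforms)
    frames = [np.zeros(shape) for _ in range(n)]
    for _ in range(iterations):
        frames = [cpm_transforms[k](frames[(k - 1) % n]) for k in range(n)]
    return frames

def decode_ncim(ncim_transforms, decoded_frames):
    """Decode the remaining frames in a single pass: each NCIM frame
    depends only on the previously reconstructed frame, so no
    iteration (and no contractivity) is required."""
    frames = list(decoded_frames)
    for t in ncim_transforms:
        frames.append(t(frames[-1]))
    return frames
```

With contractive toy maps such as `lambda x: 0.5 * x + 1.0`, the CPM group converges to its unique fixed point regardless of the starting frames, which is exactly why only CPM constrains convergence.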


We could describe the shape of the object exactly if the minimum square size were one pixel, but for fractal coding this would cost one transformation per pixel, which is unacceptable. So we fix the minimum square size of the quadtree at 4×4 pixels. In a first step, we place squares of maximum size M×M pixels next to each other over the entire image. Then we decide whether a square needs to be subdivided based on a homogeneity criterion (variance), as shown in Fig. 4. If so, we replace the square with four sub-squares of size M/2×M/2 pixels. We repeat this process recursively until we reach the minimum range block size (4×4 pixels) or the minimum distortion, which is fixed by the user according to the video quality requirements.
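The recursive split above can be sketched as follows; a minimal sketch assuming a grayscale NumPy image whose sides are multiples of the maximum block size (the function names and the variance threshold value are ours):

```python
import numpy as np

def quadtree_partition(img, x, y, size, var_thresh, min_size=4):
    """Recursively split a square block while its intensity variance
    exceeds var_thresh, down to min_size (4x4, as in the text).
    Returns a list of (x, y, size) leaf blocks."""
    block = img[y:y + size, x:x + size]
    if size <= min_size or block.var() <= var_thresh:
        return [(x, y, size)]
    h = size // 2
    leaves = []
    for dy in (0, h):
        for dx in (0, h):
            leaves += quadtree_partition(img, x + dx, y + dy, h, var_thresh, min_size)
    return leaves

def partition_image(img, max_size=16, var_thresh=25.0):
    """Tile the image with max_size blocks, then subdivide each one."""
    h, w = img.shape
    leaves = []
    for y in range(0, h, max_size):
        for x in range(0, w, max_size):
            leaves += quadtree_partition(img, x, y, max_size, var_thresh)
    return leaves
```

A flat region stays at the maximum block size, while a highly textured region is split all the way down to 4×4, which is what keeps the transformation count manageable.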

Fig. 4. Illustration of the quadtree-based partition for the interior and boundary blocks; the same partition is used for the range and domain blocks.

To provide object-based functionality, the video sequence is encoded frame by frame and object by object according to its alpha plane. Using rectangular range and domain blocks, we divide boundary blocks into segments belonging to different objects, as shown in Fig. 5.

Fig. 5. Illustration of the object-based video frame mapping for the interior and boundary blocks.

IV. OBJECT-BASED FRACTAL CODING OF STEREO VIDEO SEQUENCES

A. Block-Based Coding and MPEG-2 Multiview Profile

The most mature technique for stereo video sequence compression is the block-based method defined in the MPEG-2 multiview profile [27]. With this approach, the coder first compresses the left view with a monoscopic video coding algorithm. To code the right view, each macroblock is predicted both from the left view using disparity compensated prediction and from the previous frame of the right view using motion compensated prediction, as shown in Fig. 6. Either or both predictions are used, and the prediction error of whichever gives the smaller error is then coded. To make use of existing monoscopic coders, the disparity vectors can be estimated in the same way as motion vectors, i.e., by assuming the disparity is block-wise constant and finding the best matching macroblock in the left view. The disadvantage of this approach is that the estimated disparities for macroblocks are usually discontinuous and show visible artifacts because of the fixed-size matching blocks; it is also very time consuming because of its exhaustive block matching process.

Fig. 6. MPEG-2 multiview profile [27].

In [28], the input views are first aligned to make their epipolar lines horizontal, and mesh-based disparity estimation is then applied to the aligned images. Node points are iteratively moved in the direction that minimizes the prediction error of the full frame, and disparity compensation is then computed from the nodal displacements mapped back to the 2D coordinates of the original image. However, the computational complexity of this approach is high because of the iteration, and the occlusion problem is not well handled. This scheme yields a visually smoother predicted image but a lower PSNR; some occluded regions become distorted because the left view predicted from the original right view uses a fixed block size. Conversely, even though visible artifacts appear with the block matching algorithm, it seems to keep the predicted view more faithful to the original view, since the PSNR is higher. The exhaustive block matching algorithm used in [28] adopts a fixed block size of 16×16 pixels and a search range of ±16 pixels. We believe that by adopting variable matching block sizes (see Fig. 4) and increasing the search range, we can obtain not only a higher PSNR but also a smoother, visually better predicted image.
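For a single block of the partition, exhaustive disparity search on views with horizontal epipolar lines can be sketched as follows; the block size is a parameter, so the same routine serves fixed 16×16 blocks or variable-size quadtree leaves (the ±16 search range is only an example value, and the function name is ours):

```python
import numpy as np

def disparity_search(left, right, x, y, size, max_disp=16):
    """Find the horizontal disparity d that best predicts the left-view
    block at (x, y) from the right view, minimizing the block MSE.
    With aligned epipolar lines the search runs along the scan line."""
    target = left[y:y + size, x:x + size]
    best_d, best_err = 0, np.inf
    for d in range(-max_disp, max_disp + 1):
        xs = x + d
        if xs < 0 or xs + size > right.shape[1]:
            continue  # candidate block would fall outside the frame
        err = ((target - right[y:y + size, xs:xs + size]) ** 2).mean()
        if err < best_err:
            best_d, best_err = d, err
    return best_d, best_err
```

Because the search is one-dimensional, its cost grows linearly with the search range, which is what makes a wider range affordable when combined with variable block sizes.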


In [29, 30], in order to synthesize a 3D image having a larger number of views from a 3D image having fewer views, a new viewpoint interpolation technique and a fractal-based data compression method scalable to any number of views are proposed. But this scheme is only applicable to multiview-based image interpolation; stereo images alone are not sufficient for this technique.

B. Object-Based Coding and MPEG-4 Multiview Profile

Essentially, the above block-based coding technique approximates the shapes of objects in the scene with fixed-size square blocks. Therefore, this coder generates high prediction errors in blocks on object boundaries. These boundary blocks contain two (or more) objects with different motions, which cannot be accounted for by a single motion vector. Content-dependent coders recognize such problems and try to segment a video frame into regions corresponding to different objects, coding these objects separately. For each object, shape information has to be transmitted in addition to motion and texture information. In [31], an MPEG-4 based stereoscopic video encoder is proposed in which the main view is encoded using an MPEG-4 encoder and the auxiliary view is encoded by joint motion and disparity compensation.

C. A New Object-Based Fractal Coding of Stereo Video Sequences

On the basis of our object-based fractal monocular video sequence coding scheme (Section III), we encode the right-view video sequence by quadtree-based and object-based CPM/NCIM, and we designate the right-view video sequence reconstructed during the coding process as the reference for predicting the left-view video sequence. We propose to use partial $(x, y)$ or full $(x, y, s, o)$ affine-transformation, quadtree-based disparity estimation/compensation, or to simply apply CPM/NCIM motion estimation/compensation to the left-view video sequence.
Because the search block sizes are variable, and the domain-to-range searching and matching are only performed within the same pre-computed domain classes over the entire image [19], we can expect the quadtree-based scheme to provide good performance in both PSNR and visual quality, with much less computation time than the exhaustive search and iterative methods. We benefit from fractal coding by using CPM/NCIM on the right-view video sequence, which supplies a higher compression ratio than other compression methods. After the partial or full affine-transformation, quadtree-based disparity estimation between the left and right view frames, we need only record the $(x, y)$ positions or the full affine transformation parameters $(x, y, s, o)$ of the similar blocks of the left frame relative to the reconstructed right-view frame. The predicted left frame is obtained by replacing each block with its best matching block in the reference right frame; thus we obtain a much higher compression ratio for the left-view video sequence than for the reference right-view video sequence. We can also simply apply CPM/NCIM to the left-view video sequence, depending on the video content and user demands on PSNR and bit rate.


Combining the techniques presented above, we synthesize the coding procedure as follows:

1. Encode the right-view video sequence with the quadtree-based and object-based CPM/NCIM motion estimation/compensation algorithm, thus exploiting the high temporal correlations between adjacent frames;
2. Decode the right-view video sequence with the quadtree-based and object-based CPM/NCIM decoding algorithm;
3. Encode the left-view video sequence by exploiting the spatial correlations between the left and right sequences with quadtree-based and object-based disparity estimation (QDE). Note that if the backgrounds are very different, CPM/NCIM is applied to the left background instead of QDE;
4. Decode the left-view video sequence by quadtree-based and object-based disparity compensation (QDC), replacing each block with its best matching block in the right frame found by disparity estimation. Quadtree-based and object-based CPM/NCIM may be used for the left background, depending on the encoding process.

The proposed hybrid stereo video sequence coding scheme, which combines the quadtree-based and object-based CPM/NCIM motion estimation/compensation algorithm with the quadtree-based and object-based disparity estimation/compensation (QDE/QDC) algorithm, is shown in Fig. 7.

Fig. 7. Hybrid stereo video sequences coding scheme by combining CPM/NCIM with QDE/QDC.
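The disparity compensation of step 4 amounts to pasting, for each quadtree leaf, the matching block from the reconstructed right view; a minimal sketch, assuming leaves as `(x, y, size)` triples that tile the region of interest and purely horizontal disparities (names are ours):

```python
import numpy as np

def disparity_compensate(right_rec, leaves, disparities):
    """QDC sketch: build the predicted left frame by copying, for each
    leaf block (x, y, size), the right-view block at horizontal offset
    d as found by the QDE search."""
    pred = np.empty_like(right_rec)
    for (x, y, size), d in zip(leaves, disparities):
        pred[y:y + size, x:x + size] = right_rec[y:y + size, x + d:x + d + size]
    return pred
```

Since only the `(x, y)` offsets (or the few extra parameters `(x, y, s, o)`) are transmitted per leaf, the left view costs far fewer bits than the reference right view.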

V. EXPERIMENTAL RESULTS

A. Object-Based Monocular Video Sequence Coding

To evaluate the performance of the proposed codec for monocular video sequences, we use two monocular video sequences, Children (352×288 pixels, 80 frames, 8.33 frames/second) and News (352×288 pixels, 24 frames, 8.33 frames/second), and their alpha planes. The maximum and minimum quadtree-based partition block sizes for CPM/NCIM and QDE/QDC are 16×16 pixels and 4×4 pixels, respectively. First, we encode the two monocular video sequences as complete sequences without object-based functionality. Second, we encode the two monocular video sequences object-by-object using the proposed object-based coding scheme. The rate-distortion curves are shown in Fig. 8 for twenty frames of Children, and in Fig. 9 for twenty frames of News. The PSNR/bandwidth (BW) curves for the two twenty-frame video sequences are shown in Fig. 10. Note that object 0 denotes the foreground, object 1 denotes the background, NOB denotes non-object-based and OB denotes object-based. Note that the OB algorithm performs better than NOB from the 4th frame (the beginning of the NCIM) onwards, for


both the bit rate and PSNR curves. In Fig. 8 and Fig. 9, the bit rate curves show that the OB algorithm curve is exactly the sum of the object 0 curve and the object 1 curve. The OB algorithm needs a lower bit rate than the NOB algorithm because it performs a natural mapping between the same object across all the frames. It also needs smaller motion vectors, obtained from the separation of objects with different motion and different texture. In Fig. 8 and Fig. 9, the PSNR curves show that the OB algorithm curve lies between the two object curves and is better than the NOB algorithm curve for all 20 frames (for both CPM and NCIM). We can conclude that the boundary blocks obtain a very good mapping with the OB algorithm.

Fig. 9. The bit rate and PSNR curves of the object-based coding algorithm for the monocular video sequence News.


To verify the performance of the proposed algorithms in a more realistic scenario, we performed a rate-distortion comparison on complete frames over the whole sequences. The rate-distortion curves are shown in Fig. 10. Note that the new algorithm performs slightly better for the two video sequences Children and News. Also note that the rate for shape information (the alpha plane) is not accounted for in our results. However, as mentioned before (Section I) and as pointed out in [10], the shape information rate has negligible impact on the performance, since modern compression methods can encode object boundaries at about 0.01 to 0.03 bpp [15].


Fig. 8. The bit rate and PSNR curves for the object-based coding algorithm for the monocular video sequence of Children.

Fig. 10. The PSNR/bandwidth (BW) curves for the twenty-frame sequences of Children and News.


A visual confirmation of the above claims can be found in Figs. 11 and 12, which show results for encoding Children and News; the coding parameters were chosen so as to assure better visual quality for the foreground than for the background. Note the much less distorted area around the