JACOB: JUST A CONTENT-BASED QUERY SYSTEM FOR VIDEO DATABASES

Marco La Cascia

Edoardo Ardizzone

Dipartimento di Ingegneria Elettrica - Universita di Palermo, Palermo, Italy

ABSTRACT

The increasing development of advanced multimedia applications requires new technologies for organizing and retrieving by content databases of still digital images or digital video sequences. In this paper we describe JACOB, a prototype system allowing content-based browsing and querying in video databases. The JACOB system automatically splits a video into a sequence of shots, extracts a few representative frames (called r-frames) from each shot and computes r-frame descriptors based on features like color and texture. No user action is required during the database population step. Queries exploit this image content description and may be direct or by example.

1. INTRODUCTION

Ideally, queries put to a video database should refer to the content of the stored videos, and results should contain only the few videos matching the query. To this aim, image and image-sequence content must be described and adequately coded. The traditional approach to this problem is based on textual descriptions of the imagery content, manually entered into standard databases. Manual entry of content descriptions is tedious, time-consuming, expensive and subjective, so users enter only the minimum annotations necessary to accomplish a specific task. This makes large databases containing several thousands of images impractical or useless. For these reasons, more convenient approaches have been attempted in recent years, based on the automatic extraction of salient features from images and videos, borrowing methodologies and tools from classic areas of computer vision and image analysis. In this paper we present a prototype system that allows content-based retrieval of videos using global image features like color and texture. Our system is completely automatic: no human interaction is needed to segment the videos or to annotate the images. Moreover, the JACOB system is highly modular and may be improved with more complex and powerful feature extractors and query engines. The rest of the paper is organized as follows. Section 2 addresses related work. Section 3 describes the proposed system. Section 4 presents experimental results on a relatively large database. Finally, Section 5 contains some final considerations about the proposed system and its possible extensions.

2. RELATED WORK

In recent years several content-based retrieval systems have been developed. These systems differ in the image or video features extracted, the degree of automation reached for feature extraction, and the level of domain independence. The QBIC system [4, 3] is a content-based retrieval system treating both images and video. Its data model has scenes (full images) that contain objects (subsets of an image), and video shots that consist of sets of contiguous frames and contain motion objects. Videos are broken into clips called shots. Scenes are characterized through color, sketch and texture; objects are described in terms of color, texture, shape and motion. Object identification is done automatically, manually or semi-automatically, and several tools have been developed to help the user identify objects. Direct queries and queries by example are both allowed. The CHABOT system allows the storage and retrieval of a vast collection of digitized images. One of the goals of CHABOT is to integrate image analysis techniques into text-based retrieval systems; in [8] a simple method for color analysis is presented. The CANDID system [7] is a content-based storage and retrieval system for digital images. For each image a global signature comprising texture, color and shape information is automatically computed; queries are specified by example. Other systems [2, 9] are more task-oriented; several of them have been proposed for the storage and retrieval of faces, although they are usually extensible to other classes of images. The video indexing system proposed in the rest of this paper is general purpose: features are automatically extracted and are based on color and texture.

3. THE PROPOSED SYSTEM

The JACOB system was developed in a highly modular way to facilitate future work and improvement. Its architecture can be subdivided into two functional modules: the first is dedicated to database population and the second to database querying. User interaction is necessary only in the querying phase; the population module is completely automatic. The system operation is summarized in Figure 1. The videos are split into short sequences called shots [1, 5] by the "Shot extractor". A few representative frames (called r-frames) are then extracted from each shot and characterized in terms of their color and texture content. The r-frame extraction and the color and texture feature computation are performed in the "Feature extractor" module; we use a single functional module to extract and characterize the r-frames. We are also developing further shot descriptors based on global motion analysis (methods to detect common camera operations such as pan, zoom and tilt). The shots extracted from the videos and the r-frames extracted from the shots are stored in the "Shot DB"; the features extracted from each r-frame are stored in the "Feature DB". The hierarchy of the "Shot DB" is reported in Figure 2. On the other side, when a query, direct or by example, is put to the "Query interface", the "Match engine" searches for the most similar r-frames by analyzing the data stored in the "Feature DB". The n most similar r-frames, where n is chosen by the user, are shown. The user can browse the resulting r-frames and can iterate the query, changing the query parameters if necessary (restricting the search to the selected shots). This technique leads to fast retrieval of the desired shot in a simple way. More detailed information on the JACOB modules is given in the following. A WWW demo of the JACOB system is available at http://wwwcsai.diepa.unipa.it.
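JACOB's shot extractor is based on a neural approach [1], whose details are outside this paper. As a hedged illustration of the same task, the sketch below uses a simple color-histogram difference between consecutive frames, a common non-neural alternative; the function names and the threshold value are ours, not the paper's.

```python
import numpy as np

def color_histogram(frame, bits=3):
    """512-bin RGB histogram keeping the `bits` most significant
    bits of each color channel (the quantization of Section 3.1)."""
    q = (frame >> (8 - bits)).astype(np.int64)          # quantize channels
    idx = (q[..., 0] << (2 * bits)) | (q[..., 1] << bits) | q[..., 2]
    hist = np.bincount(idx.ravel(), minlength=2 ** (3 * bits))
    return hist / hist.sum()                            # normalized histogram

def detect_cuts(frames, threshold=0.5):
    """Declare a shot boundary wherever the L1 distance between the
    histograms of consecutive frames exceeds `threshold`."""
    cuts = []
    prev = color_histogram(frames[0])
    for i in range(1, len(frames)):
        cur = color_histogram(frames[i])
        if np.abs(cur - prev).sum() > threshold:
            cuts.append(i)                              # a new shot starts at frame i
        prev = cur
    return cuts
```

Splitting the frame list at the returned indices yields the shots from which r-frames are then drawn.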


Figure 1. General architecture of the JACOB system.

3.1. Database population

In the database population step the user inserts a digital video sequence (MPEG or QuickTime) and the system performs all the needed processing. In the first step a scene cut detection is performed to extract the shots from the video. The technique used [1] is based on a simple neural network and works well on a large variety of videos. Once the shots are extracted, the system extracts the r-frames. The technique used is based on heuristics and proves very effective despite its simplicity: if the shot is shorter than one second, only one frame (the middle one) is chosen as r-frame; if the shot is longer than one second, one frame per second is chosen as r-frame. A number of tests showed that this simple technique is sufficient to describe a shot. Videos, shots and r-frames are then stored in the "Shot DB", as shown in Figure 2.

Figure 2. A video stored in the raw-data DB.

The r-frames are then characterized in a global way by computing a few color and texture features. Further work to detect and characterize moving objects is under development. Automatic feature extraction is a critical task, since finding a set of descriptors that are both highly descriptive and automatically computable may be very difficult. The color-based feature we use is a very simple one: for each r-frame we compute a quantized RGB histogram, keeping the three most significant bits of each color channel. Reducing the color space to only 512 colors and using the 512-level histogram as a feature vector turns out to be a very simple and powerful way to describe an image. Various experiments performed on several images showed that a color histogram computed in a similarly reduced YUV space is not as descriptive as one computed in the reduced RGB space. The texture-based features we use are the following:

    max_{m,n} f(m, n, r, θ),        Σ_m Σ_n [f(m, n, r, θ)]²        (1)

where f(m, n, r, θ) is the joint probability that two pixels whose gray levels are respectively m and n lie at relative distance r and orientation θ. We compute (1) for r = 1 and θ = −45°, 0°, 45° and 90°, so obtaining an 8-dimensional vector. Another texture descriptor we use is the edge density, i.e. the ratio between the number of pixels whose intensity gradient is above a fixed threshold and the total number of pixels in the image; this is essentially a smoothness measure. We compute the intensity gradient via directional masks along the directions 0°, 45°, 90° and 135° to get information about the smoothness directionality. This yields 4 edge density values for each r-frame; these 4 values form another texture-based vector. In our implementation we use both texture-based vectors to characterize the texture content of an r-frame. The texture-based vectors and the reduced color histogram are stored in the "Feature DB".
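The texture features of Eq. (1) and the edge density can be sketched as below. This is a minimal illustration, not the paper's implementation: we assume a standard co-occurrence offset convention for the four orientations, 8 quantized gray levels, and an arbitrary gradient threshold; all names are ours.

```python
import numpy as np

def glcm(gray, dy, dx, levels=8):
    """Normalized gray-level co-occurrence matrix f(m, n, r, theta)
    for the pixel offset (dy, dx), i.e. r = 1 at one orientation."""
    g = (gray.astype(np.int64) * levels) // 256         # quantize gray levels
    h, w = g.shape
    y0, y1 = max(0, -dy), h - max(0, dy)                # valid pair positions
    x0, x1 = max(0, -dx), w - max(0, dx)
    a = g[y0:y1, x0:x1]
    b = g[y0 + dy:y1 + dy, x0 + dx:x1 + dx]
    m = np.zeros((levels, levels))
    np.add.at(m, (a.ravel(), b.ravel()), 1)             # count pixel pairs
    return m / m.sum()

def texture_features(gray):
    """8-dim vector of Eq. (1): maximum probability and energy of the
    co-occurrence matrix at theta = -45, 0, 45 and 90 degrees (r = 1)."""
    offsets = [(1, 1), (0, 1), (-1, 1), (-1, 0)]        # -45, 0, 45, 90 degrees
    feats = []
    for dy, dx in offsets:
        m = glcm(gray, dy, dx)
        feats += [m.max(), (m ** 2).sum()]              # max f and sum of f^2
    return np.array(feats)

def edge_density(gray, threshold=32):
    """4-dim vector: fraction of pixels whose directional intensity
    difference exceeds `threshold`, for 0, 45, 90 and 135 degrees."""
    g = gray.astype(np.int64)
    diffs = [g[:, 1:] - g[:, :-1],                      # 0 degrees
             g[1:, 1:] - g[:-1, :-1],                   # 45 degrees
             g[1:, :] - g[:-1, :],                      # 90 degrees
             g[1:, :-1] - g[:-1, 1:]]                   # 135 degrees
    return np.array([(np.abs(d) > threshold).mean() for d in diffs])
```

Together with the 512-bin color histogram, these two vectors form the per-r-frame entry stored in the "Feature DB".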

3.2. Database querying

The querying step in JACOB may be direct or by example. To perform a direct query the user inserts a few values representing the color histogram and/or the texture features. This type of query is not very friendly for a novice user. Better interaction is offered by querying by example: the user only inserts an image and the system returns the n best matching r-frames from the archive, where n is a user-chosen value. The returned r-frames are linked to the corresponding shot and video. If the returned r-frames are not what the user was searching for, another query by example can be performed starting from one of the returned images to obtain a more significant result. The retrieval engine is based on a sequential scan of the "Feature DB"; a vector distance is computed between the features of the example image and the stored features. A similarity index is built and the best matching r-frames are returned in similarity order. When specifying queries based on both color and texture, a method to adequately weight the distances computed for each feature descriptor (color, texture) is needed to obtain a global similarity value: images with a similar color distribution lead to a similarity value different from that obtained from images with a similar texture content. Since the user chooses a value between 0 and 1 to indicate the relative importance of one feature with respect to the other, a distance normalization based on extreme cases has been introduced to make the two distances comparable.
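The weighting and normalization just described can be sketched as follows. The normalization constants (the "extreme cases") are our assumptions, not the paper's: the L1 distance between two probability histograms is at most 2, and each texture component is assumed scaled to [0, 1].

```python
import numpy as np

def weighted_distance(query, stored, w_color=0.5):
    """Global dissimilarity: each per-feature L1 distance is divided by
    its maximum possible value (the extreme case) so that color and
    texture are commensurable, then mixed by the user weight w_color."""
    d_color = np.abs(query['color'] - stored['color']).sum() / 2.0
    d_tex = np.abs(query['texture'] - stored['texture']).sum() / len(query['texture'])
    return w_color * d_color + (1.0 - w_color) * d_tex

def rank_rframes(query, feature_db, w_color=0.5, n=4):
    """Sequential scan of the feature DB; return the keys of the n
    best matching r-frames in similarity order."""
    scored = sorted(feature_db,
                    key=lambda k: weighted_distance(query, feature_db[k], w_color))
    return scored[:n]
```

Setting w_color to 1 or 0 reproduces the color-only and texture-only queries of Section 4.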

4. EXPERIMENTAL RESULTS

In the following, a few sample queries are reported that show the validity of our approach. These queries were performed on our WWW demo database (http://wwwcsai.diepa.unipa.it/research/projects/jacob), which contains about one thousand r-frames acquired from TV. In Figure 3 three queries by example are shown that use the same example image but different values for color and texture importance; the results shown refer to querying by example using only color information, using only texture information, and using both color and texture information. In Figure 4 three results of direct queries are shown, obtained respectively by querying for brown images with no texture information, for brown images with fine texture, and for brown images with coarse texture. Experimental results showed that color and texture are a good starting point for querying by image content; across all the tests we performed, the results were consistent.

5. CONCLUSIONS

In this paper we proposed the JACOB system, which allows the user effective querying by video content. Our system uses only two kinds of descriptors, color and texture, intended for comparing entire images. Better results could certainly be obtained using object descriptors; to perform such a description, a segmentation step is needed to locate the moving objects and discriminate them from the background. Robust, automatic segmentation has proved to be a very difficult task on static images; motion information may play a crucial role in solving this problem. Another improvement to our system may be obtained by integrating textual information about the video typology (i.e. sport, film, news, etc.) with the automatically computed features. This information, although manually inserted by an operator, could dramatically improve system performance. Other areas of future work include motion-based queries, optimized database accesses and a more powerful and friendly user interface. The JACOB system was developed in C and X11 on a DEC AXP3000 workstation; the WWW demo was developed and still runs on a DECstation 5000; a version running on Apple Macintosh is under development.

REFERENCES

[1] E. Ardizzone, G.A.M. Gioiello, M. La Cascia, D. Molinelli: "A Real-Time Neural Approach to Scene Cut Detection". To appear in IS&T/SPIE Storage & Retrieval for Image and Video Databases IV, January 28 - February 2, 1996, San Jose.
[2] J.R. Bach, S. Paul, R. Jain: "A Visual Information Management System for the Interactive Retrieval of Faces". IEEE Transactions on Knowledge and Data Engineering, Vol. 5, No. 4, August 1993.
[3] C. Faloutsos, R. Barber, M. Flickner, J. Hafner, W. Niblack, D. Petkovic, W. Equitz: "Efficient and effective querying by image content". Journal of Intelligent Information Systems, 3(3/4):231-262, July 1994.
[4] M. Flickner, H. Sawhney, W. Niblack, J. Ashley, Q. Huang, B. Dom, M. Gorkani, J. Hafner, D. Lee, D. Petkovic, D. Steele, P. Yanker: "Query by Image and Video Content: the QBIC System". IEEE Computer, Sept. 1995.
[5] A. Hampapur, T. Weymouth, R. Jain: "Digital Video Segmentation". ACM Multimedia '94 Proceedings, ACM Press.
[6] R. Jain, A.P. Pentland, D. Petkovic: "Workshop Report: NSF-ARPA Workshop on Visual Information Management Systems". June 1995, Cambridge, MA.
[7] P.M. Kelly, T.M. Cannon, D.R. Hush: "Query by image example: the CANDID approach". Proceedings of the SPIE: Storage and Retrieval for Image and Video Databases III, Vol. 2420, pages 238-248, 1995.
[8] V.E. Ogle, M. Stonebraker: "Chabot: Retrieval from a Relational Database of Images". IEEE Computer, Sept. 1995.
[9] A. Pentland, R.W. Picard, S. Sclaroff: "Photobook: Tools for Content-based Manipulation of Image Databases". Proceedings of the SPIE: Storage and Retrieval for Image and Video Databases II, No. 2185, February 6-10, San Jose.
[10] M. Swain, D. Ballard: "Color indexing". Int. Journal of Computer Vision, 7(1):11-32, 1991.


Figure 3. Query by example results. On the left is the query image; the others are the r-frames retrieved in similarity order. In (a) only color information was used, in (b) only texture information, and in (c) both. The results are more expressive in the color versions.


Figure 4. Direct query results. The query was specified by color and texture: (a) "find the four mainly brown images", (b) "find the four mainly brown and coarse-textured images", (c) "find the four mainly brown and fine-textured images". The results are more expressive in the color versions.
