The MPEG-7 Multimedia Database System (MPEG-7 MMDB)


Available online at www.sciencedirect.com

The Journal of Systems and Software 81 (2008) 1559–1580 www.elsevier.com/locate/jss

The MPEG-7 Multimedia Database System (MPEG-7 MMDB) ☆
Mario Döller *, Harald Kosch
Institute of Information Technology, University of Passau, 94034 Passau, Germany
Received 23 August 2004; received in revised form 20 December 2005; accepted 31 March 2006
Available online 19 November 2007

Abstract Widely used Database Management Systems (DBMS) offer multimedia extensions, such as Oracle's Multimedia (formerly interMedia). However, these extensions lack the means to meet the requirements of multimedia data in terms of semantically meaningful querying, advanced indexing, content modeling and multimedia programming libraries. In this context, this paper presents the MPEG-7 Multimedia DataBase System (MPEG-7 MMDB). The innovative parts of our system are our metadata model for multimedia content relying on the XML-based MPEG-7 standard, a new indexing and querying system for MPEG-7, the query optimizer and the supporting internal and external application libraries. The resulting system, extending Oracle 10g, is verified and demonstrated by means of two real multimedia applications in the fields of audio recognition and image retrieval. © 2007 Elsevier Inc. All rights reserved. Keywords: Multimedia Database Systems; MPEG-7; Multimedia index structures

☆ This project was funded in part by the FWF (Austrian Science Fund) under Project Number P14789.
* Corresponding author. E-mail addresses: [email protected] (M. Döller), harald. [email protected] (H. Kosch). URL: http://www.fim.uni-passau.de/de/fim/fakultaet/lehrstuehle/verteilte-informationssysteme/forschung/codac.html (M. Döller).
0164-1212/$ - see front matter © 2007 Elsevier Inc. All rights reserved. doi:10.1016/j.jss.2006.03.051

1. Introduction Multimedia Database Systems (MMDBMS) organize and store multimedia data for content-based retrieval (Kosch, 2003). These systems rely on multimedia data models representing high- and low-level abstractions of media objects to facilitate various operations, e.g., insertion, indexing, querying and retrieval. Many models have been proposed in the past which reflect the needs of database users and developers (Tusch et al., 2000; Chen et al., 2000). However, these models reveal important shortcomings: they are limited either to one kind of multimedia data (e.g., only images are supported) or in their capacity for semantic modeling (e.g., only a keyword description of the content may be entered). This is all the more astonishing as a new standard for describing the content of different types of multimedia data has recently been published that offers richer semantics than existing systems. It is MPEG-7 (Martinez et al., 2002), the first standard of the Moving Picture Experts Group that does not deal exclusively with coding. Unfortunately, the benefits of MPEG-7 have not yet reached MMDBMS. In this regard, we realized a full-fledged MPEG-7 Multimedia DataBase System (MPEG-7 MMDB) based on the extensibility services of Oracle 10g in order to use, query, index and store multimedia data and its content description in the form of MPEG-7. This paper presents the innovative aspects of our database system. It explains the MPEG-7 database schema, the multimedia querying system, the multimedia indexing framework to support multimedia similarity searches, the query optimizer to support the processing of the queries and several real-world multimedia applications which have been created on top of the MPEG-7 MMDB. The remainder of this paper is organized as follows: Section 2 describes related work and briefly introduces MPEG-7. In Section 3, we introduce the system and its innovative parts. Section 4 describes how a user could


M. Döller, H. Kosch / The Journal of Systems and Software 81 (2008) 1559–1580

perceive our system. This is followed by Section 5 which demonstrates the mapping strategy of MPEG-7 to an equivalent database model. In Section 6, we introduce our multimedia querying system and discuss all provided services, their advantages and disadvantages. Section 7 presents our Multimedia Indexing Framework (MIF). Section 8 evaluates the query and indexing system. Section 9 details the query optimizer. The MPEG-7 MMDB is demonstrated in Section 10 by means of two multimedia applications. Finally, Section 11 concludes this paper and points to future work. 2. Related work and MPEG-7 2.1. Related work Given the increasing amount of multimedia data, we are forced to think about how to manage and retrieve these data. Most existing Database Management Systems (DBMS) were not originally designed for multimedia. Therefore, they provide extenders that enable fundamental processing of multimedia data (e.g., Oracle Multimedia¹ (formerly Oracle interMedia), IBM DB2 Image Extenders² and IBM Informix DataBlades³). For instance, Oracle Multimedia provides image storage, content-based retrieval (CBR) functionality and format conversion through the new data type ORDImage. Basic CBR functionality is realized by the ability to extract an image feature vector containing four different attributes (global color, local color, texture and shape). However, these systems are limited to basic CBR functionality relying only on low-level features, without the possibility of creating semantically rich queries. Furthermore, no means for video and audio CBR are provided. The QBIC system (Myron et al., 1996; Niblack et al., 1998), developed by the IBM Almaden Research Center, is a well-recognized retrieval system for image databases and has been successfully integrated into the DB2 Image Extenders. QBIC has been designed to query large image databases based on low-level content properties, namely color percentages, color layout, shape and textures occurring in the images.
Queries can be combined with text and keyword predicates to improve the query efficiency. Several real-world applications are available; for instance, the Hermitage Web site uses the QBIC engine for searching its world-famous art collection.⁴ QBIC is part of the multimedia extensions to the IBM Universal Database and integrates smoothly into the basic object hierarchy, enabling image annotation and retrieval. The functionality is limited to low-level CBR. It is not possible to describe the segmentation of images or high-level image content. Furthermore, there is no support for an extensible indexing structure.

¹ http://www.oracle.com/technology/products/intermedia/index.html
² http://www-306.ibm.com/software/data/db2/extenders/aiv/
³ http://www-306.ibm.com/software/data/informix/blades/index.html
⁴ http://www.hermitagemuseum.org/

Then, there exist a handful of special-purpose MMDBMS (Kosch, 2003) and MM-retrieval systems (e.g., SMURF (van Leuken et al., 2006), RETIN (Cord et al., 2007), or Cortina (Gelasca et al., 2007)). As representatives, we pick out the DISIMA and MARS systems. DISIMA (Oria et al., 2000; Oria et al., 2004), an acronym for Distributed Multimedia DBMS, was developed at the University of Alberta and is an image database system which enables content-based querying. Its focus is on multimedia data modeling and query languages. Another project is MARS, an acronym for Multimedia Analysis and Retrieval System (Porkaew et al., 1999; Chakrabarti et al., 2003). This system implements an integrated multimedia information retrieval and database management system that supports multimedia information as first-class objects suited for storage and retrieval based on their semantic content. Both systems provide more sophisticated means for modeling and querying multimedia data than the above-mentioned multimedia extensions, but as a drawback they are not designed to query multimedia and traditional data, nor are efficient access structures available. With regard to the functionality missing in both related approaches, we implemented an MMDBMS which offers support for rich multimedia data, i.e., low- and high-level multimedia data, by mapping MPEG-7 schema types to database types. Furthermore, we introduced a new indexing and querying system, a query optimizer and supporting internal and external application libraries. Obviously, parts of these problems have been considered elsewhere. Some works focused on data modeling, others on query processing aspects, and some on applications. Related works which are relevant to our solutions are highlighted in the following. To the best of our knowledge, none of these systems considers the interaction and integration of the data model, query processing and optimization, and application libraries in one system.
Multimedia data models have mainly been developed for special purposes (such as content-based image processing, spatial and temporal properties, and keyword search) (Tusch et al., 2000; Chen et al., 2000; Oria et al., 2004; Jaimes, 2005; Jiang and Elmagarmid, 1998; Wen et al., 2003; Jianfeng and Li, 2004). For instance, the DISIMA model (Oria et al., 2004) allows the user to assign different semantics to an image component, and an image representation can be changed without any effect on applications using it. An associated query language (MOQL), extending OQL, allows spatio-temporal querying as well as the definition of a presentation specification. However, no processing strategies are presented. Efficient query processing must be supported by the use of index structures (Böhm et al., 2001). The reason is the high dimension (typically above 664) of the feature vectors describing low-level content of multimedia data (Cha et al., 2002). In this context, several index structures for high-dimensional data have been proposed, e.g., SS-tree (White and Jain, 1996), SR-tree (Katayama and Satoh, 1997), M-tree (Ciaccia et al., 1997), X-tree (Berchtold et al., 1996) or


TV-tree (Lin et al., 1994). Unfortunately, these access methods are rarely available in common DBMS, nor have extensions been proposed to integrate them. In this context, Hellerstein and his group at the University of California, Berkeley realized the Generalized Search Tree (GiST), a template indexing framework that allows domain experts to customize a CBR system to index their content (Hellerstein et al., 1995). Several papers have used the GiST framework for indexing low-level features (Bliujute et al., 1998; Thomas et al., 2000; Kornacker, 1999; Kleiner and Lipeck, 2003). Most of them employ this framework for simple CBR without integration into a database system. We are only aware of two database extensions, one for Informix (Kornacker, 1999) and one for Oracle (Kleiner and Lipeck, 2003). However, neither work considers query optimization using indexes or the integration of non-balanced trees for multimedia search. Recently, some MPEG-7 retrieval systems have been demonstrated (Mezaris et al., 2004; Po and Wong, 2004). These systems implement low-level image retrieval based on color, texture and shape descriptors. The user can choose among different descriptors (and combinations of them) to be employed in the retrieval and thereby optimize the search. However, these systems provide no indexing framework for querying large image data sets. Moreover, no query language support is given, nor are query optimization methods supplied. Another MPEG-7-based database is PTDOM (Westermann and Klas, 2006). PTDOM (Persistent Type Document Object Model) is a schema-aware XML (and therefore MPEG-7) database system supporting document validation, typed storage of elements and attribute values, structural indexing facilities and optimizations of query plans. Nevertheless, this system focuses on data-centric retrieval and neglects multimedia retrieval features (QBE, spatio-temporal queries, etc.). 2.2.
MPEG-7 MPEG-7 (Kosch, 2003; Martinez et al., 2002) is an ISO/IEC standard published by MPEG (Moving Picture Experts Group) in its first version in 2002. The second version, with improved descriptions proposed by user and developer communities, has been available since 2006. The standard is organized in ten parts (from systems, low- and high-level multimedia description schemes, to reference software and conformance) and provides a rich set of standardized tools to describe multimedia content. A detailed explanation of all parts is beyond the scope of this paper, but can be found in Martinez et al. (2002) and on the MPEG tutorial page: http://www.chiariglione.org/mpeg/standards/mpeg-7/mpeg-7.htm. The multimedia data is represented with the help of descriptions. A description consists of Description Schemes (DS) and a set of Descriptors (D). A Descriptor is a representation of a feature and defines its syntax and semantics; it may, for instance, be a distinctive characteristic of audio–visual information. A Description Scheme identifies relationships among other components (DS and D). Both DS and D can be defined and modified with the help of the Description Definition Language (DDL), which is based on XML Schema extended by new data types, e.g., for feature vector representation. The example in Fig. 1 illustrates the use of MPEG-7 for describing the content of an image. Fig. 1 uses the ImageType and StillRegion DS for this purpose. The ImageType DS describes the semantics, the representation and the context of images. We describe one image named goal.jpg. This image is decomposed into two sub-images, each specified with a StillRegion DS. This DS offers a large number of different elements (e.g., spatial location information, text annotation, as well as means for further sub-regioning). In our example, we specified our own ID (KloseGoal), a TextAnnotation that contains a text description of the image, and a VisualDescription which contains a color feature vector of size 64 for each StillRegion. As the MPEG-7 standard relies on XML Schema, it is necessary for our work to investigate current XML database solutions for their MPEG-7 support. An interesting analysis is presented in Westermann and Klas (2003). There, native XML database solutions (e.g., Xindice (Staken, 2002), TIMBER (Jagadish et al., 2002)) and database extensions (e.g., Oracle XML DB (Murthy and Banerjee, 2003)) are analyzed for their suitability for managing MPEG-7 descriptions. The analysis relies on five main aspects, namely classic DBMS functionality, extensibility, media description schemes, access to media descriptions and representation of media descriptions. The result is that the examined solutions generally fail to map MPEG-7 descriptors and elements to their native data types. Instead of using database types, these solutions use a text representation. Besides schema deficits, no solution provides adequate indexing functionality that enables similarity search.
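To make the structure of such a description more concrete, the following minimal sketch parses a strongly simplified MPEG-7-like document for goal.jpg and lists its StillRegion elements. It is an illustration under assumptions, not the document of Fig. 1: the element nesting is abbreviated and not schema-validated, the second region ID (Region2), the annotation texts and the four-value vectors (standing in for vectors of size 64) are hypothetical, and the function name still_regions is our own.

```python
import xml.etree.ElementTree as ET

# Namespace of MPEG-7 descriptions; the prefix "m" is an arbitrary local choice.
NS = {"m": "urn:mpeg:mpeg7:schema:2001"}

# Simplified, hypothetical description of goal.jpg with two sub-images.
SAMPLE = """\
<Mpeg7 xmlns="urn:mpeg:mpeg7:schema:2001">
  <Description>
    <MultimediaContent>
      <Image>
        <MediaLocator><MediaUri>goal.jpg</MediaUri></MediaLocator>
        <SpatialDecomposition>
          <StillRegion id="KloseGoal">
            <TextAnnotation>
              <FreeTextAnnotation>Klose scores a goal</FreeTextAnnotation>
            </TextAnnotation>
            <VisualDescriptor><Coeff>12 7 3 0</Coeff></VisualDescriptor>
          </StillRegion>
          <StillRegion id="Region2">
            <TextAnnotation>
              <FreeTextAnnotation>Cheering crowd</FreeTextAnnotation>
            </TextAnnotation>
            <VisualDescriptor><Coeff>1 0 9 4</Coeff></VisualDescriptor>
          </StillRegion>
        </SpatialDecomposition>
      </Image>
    </MultimediaContent>
  </Description>
</Mpeg7>
"""


def still_regions(xml_text):
    """Return id, free-text annotation and feature vector of every StillRegion."""
    root = ET.fromstring(xml_text)
    regions = []
    for region in root.iterfind(".//m:StillRegion", NS):
        text = region.findtext(
            "m:TextAnnotation/m:FreeTextAnnotation", default="", namespaces=NS
        )
        coeff = region.findtext(
            "m:VisualDescriptor/m:Coeff", default="", namespaces=NS
        )
        regions.append(
            {
                "id": region.get("id"),
                "text": text.strip(),
                "vector": [int(token) for token in coeff.split()],
            }
        )
    return regions


for entry in still_regions(SAMPLE):
    print(entry["id"], entry["text"], entry["vector"])
```

The sketch keeps the feature vector as a list of integers rather than as text, which mirrors the point made above that text-only storage of descriptors hinders similarity search.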
Furthermore, native XML database solutions basically have deficiencies with respect to extensibility (e.g., adding a new index structure). As a consequence, our solution, which uses database types for MPEG-7 schema types and employs the extension capacity of the Oracle 10g DBMS to build the MPEG-7 MMDB, is an effective approach. 3. Overview of the system The MPEG-7 MMDB is an extension of the Oracle DBMS based on its Data Cartridge Technology.⁵ Oracle databases are built according to a modular architecture with extensible services. These extensible services enable database designers and programmers to extend, e.g., the type system, the query processing or the data indexing (see Fig. 2). Each extensible service offers an extensibility interface which can be used to enhance and modify the database for the user's needs. The Data Cartridge Technology adds support for new types including user-defined objects, collections and internal large object types and associated methods and functions. The database system can be extended by implementations of these methods and functions in any popular programming language, such as PL/SQL, Java or external C language routines. Further, Oracle introduces the concept of an extensible indextype. The new index has to implement all necessary ODCI (Oracle Data Cartridge Interface) functions for indexing, e.g., ODCIIndexCreate or ODCIIndexInsert. Finally, the extensibility of the query optimizer allows one to guarantee the efficiency of the operation implementations. As mentioned before, the MPEG-7 MMDB uses the data cartridge technology in order to extend Oracle's internal services (e.g., the type system) for multimedia-specific purposes. Currently, our system consists of four main parts (see Fig. 3), which are introduced below: 3.1. Core management system The core management system (1) is composed of a multimedia database schema based on MPEG-7, the multimedia query optimizer and the index processing. The multimedia database schema relies on the extensible type system of the cartridge environment. For this purpose, the MPEG-7 schema is mapped to a database schema, i.e., to respective object types and tables. Detailed information is presented in Section 5. The index processing provides the interface for the index types available in the schema. Its main task is to parse, control and convert the input of the access operators (in PL/SQL) and to call the implementation routines (C++) in the Multimedia Indexing Framework (MIF), see below.

⁵ http://www.oracle.com/technology/documentation/database10g.html

Fig. 1. MPEG-7 example document and its corresponding DOM tree.

Fig. 2. Oracle data cartridge.


Fig. 3. MPEG-7 multimedia database (MPEG-7 MMDB).

The query optimizer (see Section 9) implements a cost model for the similarity search based on an estimation of the number of disk page accesses. 3.2. Multimedia indexing framework (MIF) The indexing facility of the core management system is supported by our Multimedia Indexing Framework (MIF) (2). MIF provides various index structures, among others balanced index trees such as SR- and SS-trees and non-tree indexes such as LPC-files, for fast execution of similarity and exact search. The main part is the GistService, which is located in an external address space. It manages all available access methods. MIF is detailed in Section 7.
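For orientation, the similarity search that such index structures are meant to accelerate can be stated as a k-nearest-neighbor problem over feature vectors. The following sketch is a brute-force baseline that scans every vector. It is not the MIF implementation; the function names and the choice of Euclidean distance are assumptions made for illustration only.

```python
import heapq
import math


def euclidean(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_linear_scan(query, vectors, k):
    """Return the k (distance, key) pairs closest to query, nearest first.

    vectors maps a key (e.g. a region identifier) to its feature vector.
    Every vector is visited once, so the cost grows linearly with the
    size of the collection; an index aims to avoid exactly this.
    """
    candidates = ((euclidean(query, vec), key) for key, vec in vectors.items())
    return heapq.nsmallest(k, candidates)


# Hypothetical two-dimensional vectors; real descriptors are much longer.
example_vectors = {"a": [0, 0], "b": [3, 4], "c": [1, 0]}
print(knn_linear_scan([0, 0], example_vectors, 2))
```

Because the scan reads the whole collection for every query, its cost in page accesses is easy to estimate, which makes it a natural reference point when comparing index-based access paths.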

3.3. Internal libraries On top of the core system reside several internal libraries (3). These libraries provide basic functionality such as inserting, deleting and updating MPEG-7 documents in our database schema, as well as a query library which simplifies querying for product-specific application libraries. Currently, the internal libraries cover the following parts:

- InitLib: This library is used for creating new instances of the multimedia data types. It knows the complete multimedia schema with all essential rights and libraries.
- InsertLib: With the help of the insert-library one can insert MPEG-7 descriptions into the database. The MPEG-7 description is inserted twice. First, the complete description is inserted into a database table, which enables coarse-grained XPath queries. Second, it is broken down according to the MPEG-7-based database schema (see Section 5) in order to allow fine-grained queries. The association between the complete and split documents is maintained by the assigned DOC_ID, which denotes the membership of a table row to a specific MPEG-7 document.
- DeleteLib: The delete-library offers functionality for deleting MPEG-7 descriptions. Update operations are only possible as delete and re-insert.
- QueryLib: Besides the possibility of setting up SQL queries, we have the following query services. The first service allows one to query the database with the help of XPath expressions and produces XML-based output. The main disadvantage of this first approach is the complexity and size of the query statements. Therefore, one may alternatively use our QueryLib, where the user can create individual select statements with the help of a modular construction system. This modular construction enables users to specify the select part as well as the where part according to their needs.

InsertLib and QueryLib are detailed in Section 6.
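The double insertion performed by the insert library can be sketched as follows. This is a minimal illustration under assumptions: SQLite stands in for Oracle, the table and column names (mpeg7_document, still_region) are hypothetical, the sample document is simplified and namespace-free, and only StillRegion elements are split out. It shows the idea of one DOC_ID linking the complete document to its decomposed rows, not the actual InsertLib.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Simplified, hypothetical description with two regions.
SAMPLE_DOC = """\
<Mpeg7>
  <Image>
    <StillRegion id="KloseGoal"><TextAnnotation>Klose scores</TextAnnotation></StillRegion>
    <StillRegion id="Region2"><TextAnnotation>Crowd cheering</TextAnnotation></StillRegion>
  </Image>
</Mpeg7>
"""


def create_schema(conn):
    """Create one table for whole documents and one for split-out regions."""
    conn.execute(
        "CREATE TABLE mpeg7_document (doc_id INTEGER PRIMARY KEY, xml TEXT)"
    )
    conn.execute(
        "CREATE TABLE still_region ("
        "doc_id INTEGER, part_id INTEGER, region_id TEXT, annotation TEXT)"
    )


def insert_description(conn, xml_text):
    """Store the document twice: as a whole and decomposed into rows."""
    # First insertion: the complete description (coarse-grained access).
    cursor = conn.execute(
        "INSERT INTO mpeg7_document (xml) VALUES (?)", (xml_text,)
    )
    doc_id = cursor.lastrowid
    # Second insertion: one row per region, keyed by (doc_id, part_id).
    root = ET.fromstring(xml_text)
    for part_id, region in enumerate(root.iter("StillRegion"), start=1):
        conn.execute(
            "INSERT INTO still_region VALUES (?, ?, ?, ?)",
            (
                doc_id,
                part_id,
                region.get("id"),
                region.findtext("TextAnnotation", default=""),
            ),
        )
    return doc_id


connection = sqlite3.connect(":memory:")
create_schema(connection)
new_doc_id = insert_description(connection, SAMPLE_DOC)
print(
    connection.execute(
        "SELECT part_id, region_id FROM still_region WHERE doc_id = ?",
        (new_doc_id,),
    ).fetchall()
)
```

In this sketch, deleting a description would amount to removing all rows carrying the same doc_id from both tables, which is consistent with updates being realized as delete and re-insert.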

3.4. Application libraries The application libraries (4) serve as interfaces between applications and our MPEG-7 MMDB. In general, these libraries use the underlying internal libraries for their specific needs. For instance, QueryLib is used to create application-specific queries. At the moment, we have implemented libraries for the following applications (see Fig. 4): BlobworldLib for a content-based image retrieval application (see Section 10.2) and AudioLib for an audio recognition tool (see Section 10.1). 4. Usage of the system Based on its layered architecture, our MPEG-7 MMDB provides a broad range of functionality and entry points which differ in their complexity (e.g., development of indexes, query optimization, new similarity algorithms, MPEG-7-based queries, enhancement and development of application libraries).


Fig. 4. Multimedia applications on MPEG-7 MMDB.

We may distinguish among three levels of usage: the applied usage, the sophisticated usage and the administrative usage. The following example is intended to give a better understanding of our usage scenarios. Let us assume that a user wants to provide a new goal detection application for soccer games based on our MPEG-7 MMDB. This application involves audio recognition (e.g., detecting goal squeals in the soccer game), similarity search (e.g., the video sequence shows a penalty area) and text annotation recognition in video sequences (as described, e.g., in Xu et al., 2001). In the following, all three usage scenarios are introduced in more detail. 4.1. Applied usage scenario In an applied usage scenario, the user mainly interacts with the system by maintaining its content (inserting and deleting MPEG-7 documents) and by using the set of provided multimedia applications. Currently, the system offers two applications in the domains of image search and audio recognition and retrieval (for both see Section 10). Using these applications, the user can partly cover the example requirements. For instance, to detect a goal squeal, one first has to record typical goal squeals by microphone and then compare these to the match sequences for similarity. However, we cannot derive discriminating characteristics from the fingerprints of the goal squeals because we have no access to the system libraries. Similar to the restrictions in audio recognition, the image search must rely on sample penalty area images. Thus, we are limited to the features provided by the applications (i.e., the text annotation cannot be considered), but the advantage is that no system knowledge is necessary. 4.2. Sophisticated usage scenario In a sophisticated usage scenario, we take advantage of using the upper layers of the system. We are able to

implement new application libraries on top of the provided internal libraries. In the given scenario, the user will develop a set of complex queries which are necessary for detecting goals in soccer games. These are the retrieval of goal squeals by the commentator, similarity searches for image sequences where penalty areas are shown and the retrieval of text annotations for goal-specific descriptions. These queries will then be expressed with the help of our query library (see Fig. 8b for an example). The user first has to decide which MPEG-7 descriptors need to be queried and then how the results of each individual query are combined. In particular, this means that we will query the AudioSignature DS for goal squeals, then use the EdgeHistogram DS to detect penalty areas and finally employ the TextualType DS to get text annotations. This is a typical cross-media query involving different types of media. The results of querying each descriptor must then be merged to deliver a single result. How these results have to be combined depends on the needs of the user. Typically, we will weight the individual results based on user experience. In our example, we would trust the text annotations most, the audio results second and the image comparisons last. In addition, this user can choose among the available access methods and their operations in order to improve query efficiency if similarity searches among low-level descriptors are necessary. If the user needs to extend the pool of available index implementations with his or her own realizations, we cross the border to the administrative usage of the system. 4.3. Administrative usage scenario In an administrative usage scenario, the user is responsible for maintaining the internal libraries and the core management system. In our example, she/he knows how the multimedia indexing framework can be extended and how new indexes can be integrated into the MPEG-7


MMDB in order to be used in applied/sophisticated usage scenarios. In addition, the user is able to extend the internal library collection (e.g., by a library with improved update functionality) and to extend the DB-schema by new internal data types. The following sections provide detailed information on the important system parts. 5. Multimedia schema The Multimedia Schema provides a partial mapping of the MPEG-7 schema to related object types and tables in the database. Fig. 5 shows parts of the class hierarchy of the MPEG-7 standard. We restricted ourselves to the Content Description and Content Abstraction parts of MPEG-7. The gray filled boxes represent MPEG-7 descriptions that have been mapped completely to our database schema. The dotted boxes are MPEG-7 descriptions that have been partly mapped to our schema. The chosen subset contains mainly content descriptions and is therefore the first candidate to be queried by users. The parts not yet taken into account can, if required, be entirely mapped according to our guidelines. As the MPEG-7 standard relies on XML Schema, our mapping approach deals in general with solutions for XML Schema specific constructs. Most of the constructs, such as complex data types, inheritance and collections, have counterparts in an object-relational DBMS (ORDBMS). But other constructs, such as recursion, sequence, choice, etc., have to be investigated more precisely.


Fig. 5. MPEG-7 class hierarchy.

5.1. Mapping rules The following main guidelines for mapping MPEG-7 were imposed:

- Use the XMLTYPE to limit the number of object types used. MPEG-7 descriptors that are not in the main focus for querying and indexing are represented only with the help of Oracle's XMLTYPE (see the example in Section 5.2). Nevertheless, the information is not lost and can be retrieved by the use of XPath expressions in SQL statements.
- Reduce the mapping complexity by flattening the inheritance hierarchy down to a few levels. In general, all abstract data types within MPEG-7 have been skipped.
- Use object references for navigation throughout the schema. We have introduced a key system (DOC_ID, PART_ID) in each table. The DOC_ID is used to assign each tuple to the corresponding document and the PART_ID is used to identify its position within a document.
- Use a supplemental table column to represent polymorphism. In MPEG-7, polymorphism is realized by the use of "xsi:type". Our adopted mapping solution is the introduction of a new table column containing the name of the target descriptor and the corresponding PART_ID.
- Use nested tables to represent collections. Collections are specified in general by using the occurrence constraint "maxOccurs". They can be bounded (positive integer value) or unbounded (maxOccurs = "unbounded"). Both occurrence types are mapped with the help of nested tables and additional nomenclatures and rules. Unbounded collections cover a set of subelements. These subelements are mapped as attributes to new database types labelled dummyTypeXXX. In addition, we have to introduce a nomenclature for the appropriate nested table name and its type. The nested table type is defined as dummyNTTypeXXX and the corresponding nested table name is called dummyNTTabXXX. The nested table type references the previously created dummy type containing the subelements of the unbounded collection. In order to represent and access the unbounded collection within its origin type, a new attribute has to be introduced which is labelled dummyAttrXXX and is of the nested table dummy type created beforehand. The XXX extension represents consecutive numbering. Example mappings for unbounded collections are shown in Section 5.2; see the gray filled tables and objects in Fig. 6.
- Order database attributes in sequences. Sequences fix the order of elements within an XML instance document. In this respect, the order of the database attributes has to be the same as in the MPEG-7 descriptors.
- Use additional table attributes for choices. Choices are constructs that provide several elements for selection, whereas within an XML instance document only one element is allowed. Every element of the choice construct is mapped to a table attribute and the corresponding database type. In addition, if the choice is an unbounded collection, then the rules from above have to be applied as well.

The combination of object-relational database features, relational keys and object references allows the mapping of the whole MPEG-7 standard into a corresponding database schema.
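The naming rule for collections can be illustrated with a short sketch that scans an XML Schema fragment for elements whose maxOccurs marks them as collections and assigns the four dummy names to each. This is an assumption-laden illustration rather than the mapping tool itself: the helper names are hypothetical, the schema fragment is invented, and the numbering format (plain integers starting at 1) is a guess, since the text only states that XXX is a consecutive number.

```python
import xml.etree.ElementTree as ET
from itertools import count

# Qualified-name prefix for XML Schema elements in ElementTree notation.
XS = "{http://www.w3.org/2001/XMLSchema}"

# Invented schema fragment: one optional element and one unbounded collection.
XSD_SNIPPET = """\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="StillRegionType">
    <xs:sequence>
      <xs:element name="TextAnnotation" type="xs:string" minOccurs="0"/>
      <xs:element name="VisualDescriptor" type="xs:string" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
"""


def collection_names(number):
    """Names for one collection, following the dummy nomenclature."""
    return {
        "type": f"dummyType{number}",
        "nested_table_type": f"dummyNTType{number}",
        "nested_table": f"dummyNTTab{number}",
        "attribute": f"dummyAttr{number}",
    }


def plan_collections(xsd_text):
    """Map each collection element (maxOccurs > 1 or unbounded) to its names."""
    root = ET.fromstring(xsd_text)
    numbering = count(1)  # assumed numbering scheme: 1, 2, 3, ...
    plan = {}
    for element in root.iter(XS + "element"):
        max_occurs = element.get("maxOccurs", "1")  # XML Schema default is 1
        if max_occurs == "unbounded" or int(max_occurs) > 1:
            name = element.get("name") or element.get("ref")
            plan[name] = collection_names(next(numbering))
    return plan


print(plan_collections(XSD_SNIPPET))
```

Only VisualDescriptor is reported for the fragment above, because TextAnnotation keeps the default maxOccurs of 1 and is therefore an ordinary attribute of the parent type.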
The reduction of the MPEG-7 inheritance hierarchy by skipping abstract types and merging types that contain only a few attributes and elements results in a compactly arranged database schema that allows the storage of any kind of MPEG-7 document and offers an efficient and rich model for querying it. The mapping enables us to retrieve multimedia data not only by low-level features, as is commonly done in CBR systems (Yoshitaka and Ichikawa, 1999; Veltkamp and Tanase, 2000), but also by semantically meaningful content in combination with low-level characteristics. For instance, in a sports application we could think of a query like: "give me all goals of Miroslav Klose in the Football World Cup which he scored with his head (high-level) and where he wore the traditional white/black jersey (low-level)". 5.2. Mapping example Fig. 6 shows a small extract of our multimedia database schema which shall demonstrate the mapping

strategy. The complete schema may be obtained from http://www-itec.uni-klu.ac.at/~harald/codac/schema.pdf. The presented multimedia schema contains the mapping of an MPEG-7 StillRegionType, which is a delegate for images in MPEG-7 (i.e., StillRegion denotes complete images and parts of them). In the database schema we created an object type of the same name (StillRegionType). Some of the elements are declared as separate object types, some are defined by the specific SYS.XMLType. The decision which type to use was based on the importance for the querying process. For instance, the element of type TextAnnotationType was chosen to be detailed further, because it is of importance for free-text search in the database, while UsageInformation was chosen to be declared as SYS.XMLType. The latter type contains only little information for a concrete image, because meta-information on the usage may already be declared at the MPEG-7 root level; furthermore, it spans a description scheme which might contain many different descriptions with similar content. This would lead to many database objects and tables, probably containing little content. Therefore, we decided to store the subtree of the MPEG-7 document containing UsageInformation directly in the database type/table. However, this information is not lost for querying, because the XMLType provides XPath query functionality which enables one to reach elements and attributes of this document. In other words, with a combined select and XPath query any information may be reached. However, as pointed out earlier, important information can be reached directly through object navigation, for instance from StillRegion to TextAnnotation. Other object types represent MediaInformation (e.g., MediaInformationType, MediaProfile, MediaFormat, etc.). They describe information on coding, media attributes, locations and physical structure of the data.
Semantic content of an image may be obtained through the Semantic reference to a SemanticBagType, which is an abstract root object type for concrete semantic indexing classes, like events, places and time. Finally, the decomposition of the image (structural aspect) is obtained by following the reference of the SpatialDecomposition element in the StillRegionType. Further important object types are ScalableColorType and ColorStructureType. They store the feature vectors of the color histograms extracted from the images described by a StillRegion and are indexed with the help of our Multimedia Indexing Framework.

Finally, we had to introduce some dummy types. These stem from collections specified in the MPEG-7 schema. As mentioned in Section 5.1, collections cannot be mapped directly to object associations and are therefore represented by "dummy" types, i.e., types not originally named in MPEG-7. In terms of tables, they are nested tables in the respective parent table. An example is the VisualDescriptor, which is an XML element collection in the MPEG-7 StillRegionType.

In order to store structured values, all relevant (queryable) types have to be declared as tables. Table names for types are shown in the last line of each box defining a type. For instance, for the StillRegionType we defined two tables, StillRegion() and Image(). The latter models the delegate functionality of the StillRegion description scheme in MPEG-7.

Fig. 6. Small extract of the MPEG-7 database schema (types and tables).

6. Multimedia insert and querying libraries

On top of the core management system reside several internal libraries which rely on the core components: the DB schema, the Index Processing and the Query Optimizer. Section 3.3 gave a brief overview of these libraries; this section details the two most important ones, the insert and the querying library.


6.1. Multimedia insert library

In order to input MPEG-7 descriptions, we provide an internal library for inserting MPEG-7 documents into the database. Its task is to derive and execute the SQL insert statements on the schema. Thus, it relies on the DB schema but also supplies its insert functionality to the application libraries residing on top of the internal libraries (see Fig. 4).

Let us illustrate the most important insertion steps on our example MPEG-7 document (see Fig. 1, left side). The basic idea is to perform a post-order traversal of the document's DOM tree (see Fig. 1, right side), which is created with the help of an XML parser. A leaf node is defined as either having no child nodes (e.g., FreeTextAnnotation; note that the text element is not shown as a node) or representing an XMLType (e.g., MediaLocator), which is inserted together with all its child elements. Besides XMLType and basic types such as Integer, the insertion process has to identify the following types: a REF Type represents a reference pointing to a row somewhere in the database, a Dummy Type indicates a nested table, and an Empty Type means that the element has no corresponding column in the current table (e.g., the VisualDescriptor element of table StillRegion).

The main part of the insert process is to identify a node in the DOM tree that corresponds to a database table in our schema. After a successful identification, the necessary insert statement has to be created. For this purpose, we have to specify the table name, the column list and the corresponding values. The attributes of the current node (identified as a database table) are fetched with the DOM API methods that retrieve all children. The attribute name/value pairs are stored in the respective column and value list vectors. Finally, the complete insert statement is built and executed. After a successful execution, the algorithm proceeds with the next node in the post-order traversal.
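The traversal described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the insert library itself: the sets TABLE_ELEMENTS and XMLTYPE_ELEMENTS, the function collect_inserts and the sample document are hypothetical, the statements are collected rather than executed, and REF, Dummy and Empty types are left out.

```python
import xml.etree.ElementTree as ET

# Hypothetical schema fragment (assumption): elements backed by a table.
TABLE_ELEMENTS = {"Image", "StillRegion", "TextAnnotation"}
# Hypothetical elements stored whole in an XMLType column (assumption).
XMLTYPE_ELEMENTS = {"MediaLocator"}


def collect_inserts(node, statements):
    """Post-order traversal: visit children first, then the node itself."""
    if node.tag in XMLTYPE_ELEMENTS:
        return  # treated as a leaf: stored with all children in the parent row
    for child in node:
        collect_inserts(child, statements)
    if node.tag not in TABLE_ELEMENTS:
        return
    columns, values = [], []
    # XML attributes become column/value pairs.
    for name, value in node.attrib.items():
        columns.append(name)
        values.append(value)
    for child in node:
        if child.tag in XMLTYPE_ELEMENTS:
            # Store the serialized subtree as one XML value.
            columns.append(child.tag)
            values.append(ET.tostring(child, encoding="unicode").strip())
        elif len(child) == 0 and child.text and child.text.strip():
            # Simple text-only child element.
            columns.append(child.tag)
            values.append(child.text.strip())
    placeholders = ", ".join(":%d" % (i + 1) for i in range(len(columns)))
    sql = "INSERT INTO %s (%s) VALUES (%s)" % (
        node.tag, ", ".join(columns), placeholders)
    statements.append((sql, values))


DOCUMENT = """
<Image id="img1">
  <MediaLocator><MediaUri>file://a.jpg</MediaUri></MediaLocator>
  <StillRegion id="sr1">
    <TextAnnotation><FreeTextAnnotation>goal</FreeTextAnnotation></TextAnnotation>
  </StillRegion>
</Image>
"""

statements = []
collect_inserts(ET.fromstring(DOCUMENT), statements)
for sql, values in statements:
    print(sql, values)
```

Because children are visited before their parent, the statement for TextAnnotation is generated before the one for StillRegion, which in turn precedes Image. A real implementation would additionally have to resolve the references between the rows.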
The algorithm terminates as soon as the root element of the document is reached. In the example, the tree is traversed along the leftmost subtree until the MediaLocator node is reached. As its type is XMLType, the traversal continues at its sibling, SpatialDecomposition. Again, the tree is traversed until the next leaf node is reached, namely FreeTextAnnotation. Its parent node is a possible insert candidate and is therefore processed as described before. The other nodes considered for insertion are (in the order of traversal): VisualDescriptor, StillRegion, SpatialDecomposition, Image, MultimediaContent, Description and Mpeg7.

6.2. Multimedia querying library

The multimedia querying library, QueryLib, builds on the multimedia schema to provide access to the stored multimedia data. The MPEG-7 MMDB supplies three query alternatives, which are detailed in the following paragraphs.

Due to the expressiveness of MPEG-7, we restrict ourselves in the running example to audio descriptions. The described services can be employed similarly for other media types. Fig. 7a shows a possible MPEG-7 description for audio data. For simplicity, the audio description contains only information about MediaLocation and CreationInformation such as Title, Abstract and Creator. Fig. 7b shows the corresponding tree representation.

The first service allows one to query the database traditionally with the help of SQL and XPATH expressions. For instance, suppose one is interested in all music files which are interpreted by Avril Lavigne. The file location is typically described with the help of MPEG-7's MediaLocator descriptor (see the instance document in Fig. 7a). In our database schema, this descriptor is realized as an attribute of the table audio and is of type XMLType. In order to obtain the interpreter, one has to join the audio table with the PersonGroup table, build an intermediate table of Agents in a PersonGroup and select all instances where the interpreter name is like Avril. This results in the following query:

    SELECT extract(medialocator, '/MediaLocator/MediaUri/text()')
    FROM audio v, PersonGroup pg, table(pg.dummyAttr190) pg_dummy190
    WHERE v.doc_id = pg.doc_id
      AND deref(pg_dummy190.COLUMN_VALUE).Name LIKE '%Avril%';

As the type of MediaLocator is XMLType, we have to use the XPATH expression /MediaLocator/MediaUri/ in order to retrieve the required information. The text() method in combination with the extract() method gives the desired data. The main disadvantage is that the output is not formatted in MPEG-7.

The second service allows one to format the output of a query with the help of Oracle's XMLDB functionality. By combining the XMLElement, XMLAgg and XMLAttributes functions, valid MPEG-7 descriptions can be produced. Let us assume that we want to search the database again for songs which have been interpreted by Avril Lavigne.
The selection part is at the bottom of Fig. 8a and remains the same as in the first service, with some supplemental table specifications for the MPEG-7 output. The MPEG-7 output should contain information about the location of the audio file and several CreationInformation elements such as Title, Creator, CreationCoordinates and MediaTime. The top of the query in Fig. 8a fulfills this task. The main advantage of this service is the formatted output: the result can be forwarded to any application that can process MPEG-7. Further, we keep the advantage that any information is reachable in combination with XPATH expressions. The main disadvantage of this approach is the high complexity of the query statements, which makes it usable only by database experts with a good knowledge of MPEG-7.

Fig. 7. (a) MPEG-7 instance document describing audio data and (b) corresponding tree representation.

Fig. 8. (a) Audio query that returns an MPEG-7 description and (b) code fragment that produces same output as (a), but using our QueryLib.

Thus, one may alternatively use our QueryLib, where the user can create individual select statements with the help of a modular construction system. This modular construction enables users to specify the select part as well as the where part according to their needs. The QueryLib exploits the fact that instance documents of the MPEG-7 standard can be represented as a tree structure (see Fig. 7b). In general, a database query consists of two parts, the select clause and an optional where clause. The select clause defines all information a user is interested in, whereas the where clause constrains the set of resulting tuples. The construction of the select clause becomes complex if we require the output of a query to follow the MPEG-7 standard, which prescribes a correct ordering of descriptors within the same level and within the tree hierarchy. For instance, the Creation descriptor can only follow the CreationInformation descriptor, and so on. The main idea of the QueryLib is that the user picks all desired information that the resulting MPEG-7 instance document should contain.

Let us illustrate this approach with the same scenario as above. The user queries for all songs of a certain interpreter (Avril Lavigne), and the resulting matches should be formatted as valid MPEG-7 descriptions containing the same information as mentioned above. Fig. 8b shows the code fragment using our QueryLib. First, the user has to instantiate the QueryLib by specifying the area of interest (see line 3 of the code fragment), which in our case is the audio part of MPEG-7. During instantiation, a full tree representation of the required MPEG-7 description is created internally. Afterwards, the user can modify the select clause by adding the desired information (see lines 6–18), which internally leads to a tagging of the chosen descriptor in the tree representation. The addSelectClause method takes two parameters. The first one is the desired descriptor and the second one contains all father nodes up to the root, which in our example is the audio descriptor. The list of father nodes is needed because descriptors can occur at different levels (e.g., the MediaLocator can be a descriptor of the top-level audio descriptor or of an AudioSegment, which is a subpart). In our example, the resulting MPEG-7 instance document should contain the MediaLocator descriptor for each song found.
For this purpose, as shown at line 6, one has to call the respective method with the name of the descriptor (in our case MediaLocator.NAME) as first parameter and a list of all father nodes as second parameter. This list is empty, because the desired MediaLocator resides directly under the root. Further, we are interested in the title of the song and the name of the music group. Therefore, the user proceeds as follows. In lines 8 to 10, the list of father nodes is specified. This list contains two nodes which represent the path CreationInformation/Creation. Then, in lines 11 and 13, the addSelectClause method is called with the respective parameters (e.g., Title.NAME as first and the vector as second). The result set of our query should be restricted to a certain artist. Therefore, the query has to be extended by a where clause, which is realized in lines 22 to 26 of the code fragment. There, an instance of class PersonGroupName is created that contains the name of the searched music group.

The main advantage of this approach is that it combines the possibilities of the previous services, namely formatted MPEG-7 output and accessibility of all information in combination with XPATH expressions. In addition, it reduces the complexity of the query creation process with the help of a modular construction system that allows the select clause as well as the where clause to be adjusted to the user's needs.

7. Multimedia indexing framework (MIF)

The Multimedia Indexing Framework (MIF) realizes the access to the index structures and provides efficient execution of Point, Range and Nearest Neighbor (NN) queries.

7.1. Supported query types

MIF supports different query types, which are defined in the following. The simplest one is the point query, which retrieves all points in the database whose feature vectors are identical to the query point. In all following definitions, Q represents the query point:

$\mathrm{PointQuery}(DB, Q) = \{P \in DB \mid P = Q\}$

We are commonly interested in objects similar to a given one. The result of a similarity match depends on the metric and the query type used. One similarity query type is the Range Query. It returns all points P whose distance from the query point Q is at most r with respect to the metric M:

$\mathrm{RangeQuery}(DB, Q, r, M) = \{P \in DB \mid d_M(P, Q) \le r\}$

The Nearest Neighbor Query (NNQ) is very important in content-based retrieval applications. In contrast to the Range Query, NNQ returns exactly one result, namely the point most similar (with the lowest distance) to the query point Q. A relative of this query type is the k-Nearest Neighbor Query (k-NNQ), which returns the k nearest points:

$\mathrm{NNQ}(DB, Q, M) = \{P \in DB \mid \forall P' \in DB : d_M(P, Q) \le d_M(P', Q)\}$

$k\text{-}\mathrm{NNQ}(DB, Q, k, M) = \{P_1 \ldots P_k \in DB \mid \neg\exists P' \in DB \setminus \{P_1 \ldots P_k\} \wedge \neg\exists i, 1 \le i \le k : d_M(P_i, Q) > d_M(P', Q)\}$

The choice of the appropriate similarity query type depends on a priori knowledge of a good query range. If the dispersion of the query points is not known a priori, the query range cannot be estimated accurately enough. Consequently, k-NNQ tends to be more useful in this case.
The search process is in any case iterative: one normally starts with a small range and increases it iteratively until the most similar objects are found. Likewise, the k value in the k-NNQ is increased until the most similar objects are found. The distance metric used is the Euclidean distance. In future work, we plan to integrate other metrics into our Multimedia Indexing Framework.
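The four query types can be written down directly from their set definitions. The following Python sketch is a reference formulation by linear scan, assuming feature vectors are tuples of numbers; the function names are hypothetical, and the code does not reflect how MIF evaluates these queries on its index structures.

```python
import heapq
import math


def euclidean(p, q):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def point_query(db, q):
    """All points identical to the query point q."""
    return [p for p in db if p == q]


def range_query(db, q, r, metric=euclidean):
    """All points whose distance to q is at most r."""
    return [p for p in db if metric(p, q) <= r]


def nn_query(db, q, metric=euclidean):
    """The single point with the lowest distance to q."""
    return min(db, key=lambda p: metric(p, q))


def knn_query(db, q, k, metric=euclidean):
    """The k points with the lowest distances to q, nearest first."""
    return heapq.nsmallest(k, db, key=lambda p: metric(p, q))


db = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 3.0)]
q = (0.0, 0.0)
print(point_query(db, q))
print(range_query(db, q, 1.0))
print(nn_query(db, q))
print(knn_query(db, q, 3))
```

Passing the metric as a parameter mirrors the M in the definitions, so a different distance function could be substituted without changing the query functions.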


7.2. Index processing

MIF serves as back-end to the Index Processing module (see Fig. 9), which is located in the database system. The main task of this module is to manage the multimedia index type, consisting of several indextypes, their corresponding operators and the appropriate implementation (objects). The R-tree is one of the index structures supported by MIF. In order to make the R-tree implementation of MIF available to the user, we first had to define a new Oracle indextype for the R-tree. Then, several similarity search operators were realized with this new type. Examples of operators are:

– rt_equal_point(CLOB, CLOB, number, number);
– rt_nearest_point(CLOB, CLOB, number, number).

The first operator defines an equality search and the second one an NN-search for point data. The parameters are defined as follows: (1) element in the database table, (2) search item, (3) number of results and (4) the dimension. Finally, we defined an object that delegates all necessary index methods (e.g., ODCIIndexInsert, ODCIIndexCreate, etc.) to their corresponding implementations.

7.3. Back-end of the MIF

The back-end of MIF is divided into three modules (GistWrapper, GistService and GiST Framework) as shown in Fig. 9. Each module may be used on its own and may be distributed over the network. The front-end connects to the back-end through the shared library called GistWrapper.

7.3.1. GistWrapper

The GistWrapper module is a shared library written in C++ that is used by the Index Processing to connect to the GistService. The library has two main tasks. First, it makes the GistService accessible to database procedures. Second, it is responsible for transforming the input and output data to make it usable by both the GistService and the database. For instance, a simple C char type has to be transformed into a BSTR string (a binary string which represents wide, double-byte (Unicode) strings on 32-bit Windows platforms) to be employed in the database, or a VARIANT type into a String, and so on.

7.3.2. GistService

The GistService is the main part of MIF. It runs as a separate process (called a service) in a Windows operating system environment and manages all available access methods. The current version offers support for Generalized Search Trees (GiST, see Section 7.3.3) and for further access methods not relying on balanced trees (e.g., LPC-files; Cha et al., 2002) to support NN-search in high-dimensional vector spaces. The service is split into two main components: the GistCommunicator and the GistHolder. The GistCommunicator is a COM object (Component Object Model) and supplies the necessary functionality (e.g., creating, inserting, deleting) for accessing the index structures. The results of the operations are forwarded by the GistWrapper to the database. It is the task of the GistHolder to manage all currently running index trees and the accesses to them. Each index tree is identified through a globally unique ID which is forwarded to the accessing process. For simplicity, the index trees are internally stored in an array, but this data structure can easily be replaced by any other, more dynamic structure.
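The role of the GistHolder can be pictured as a registry that hands out IDs and resolves them to running index structures. The sketch below is purely illustrative: the class names IndexHolder and LinearScanIndex and all method names are hypothetical, a dictionary stands in for the array mentioned above, and a trivial list-based index stands in for a real access method.

```python
import itertools


class LinearScanIndex:
    """Stand-in access method (assumption) offering insert, delete, search."""

    def __init__(self):
        self._entries = []

    def insert(self, key, rid):
        self._entries.append((key, rid))

    def delete(self, key, rid):
        self._entries.remove((key, rid))

    def search(self, key):
        return [rid for k, rid in self._entries if k == key]


class IndexHolder:
    """Keeps all running indexes and identifies each by a unique ID."""

    def __init__(self):
        self._indexes = {}
        self._ids = itertools.count(1)

    def create(self, factory):
        index_id = next(self._ids)
        self._indexes[index_id] = factory()
        return index_id  # the ID is what the accessing process keeps

    def get(self, index_id):
        return self._indexes[index_id]

    def drop(self, index_id):
        del self._indexes[index_id]


holder = IndexHolder()
first = holder.create(LinearScanIndex)
second = holder.create(LinearScanIndex)
holder.get(first).insert((0.1, 0.2), "row-1")
print(holder.get(first).search((0.1, 0.2)))
print(holder.get(second).search((0.1, 0.2)))
```

Any access method that offers the same three operations could be registered through the factory argument, which is the property that lets balanced and non-balanced structures be managed side by side.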

7.3.3. GiST framework

MIF relies partially on the GiST Framework (Hellerstein et al., 1995). The theory and implementation of the GiST framework were developed by Hellerstein and his group at the University of California, Berkeley. The GiST framework enables an access method developer to build any kind of balanced index tree on any kind of data by implementing some specific methods for insertion, deletion and search. The GiST framework is widely used in the research community (Bliujute et al., 1998; Carson et al., 1999).

Fig. 9. Multimedia indexing framework.

In general, a GiST is a balanced tree with (key, RID) pairs in the leaves and (predicate, child page pointer) pairs in the internal nodes. The framework and its trees impose no restrictions on the key data stored within the tree or on their organization within and across nodes. The framework itself is extensible by new access methods. The current GiST version, 2.0, already includes source code (see footnote 6) for R-tree, R*-tree, SS-tree, SR-tree, SP-tree and B-tree.

The available GiST software cannot run in a DBMS context without changes. First, only one index tree may be used at a time. Second, no support for additional non-balanced tree indexes is given. Thus, we first extended GiST to manage several trees by introducing data structures for controlling the state of multiple trees. Second, we implemented the GistService in such a way that it can handle other index structures besides GiST. New index structures have to comply with the GistService interface for insertion, deletion and search.

The complete query system and MIF were evaluated intensively: first through performance tests, which are reported in Section 8, and second through the realization of real-world applications, which are described in Section 10.

8. Experimental results

This section describes a significant part of the series of experiments we performed in order to evaluate the efficiency of our query and indexing system. The tests were carried out on two distinct datasets, one synthetic (uniform dataset) and one real.
The experimental settings are as follows:

– The synthetic dataset contains 64- and 96-dimensional feature vectors that are represented as strings. The values were generated uniformly over the normalized [0...1] space.
– The real dataset was generated from a 1 h and 46 min long movie, encoded with DIVX4. From the movie, we extracted 64-dimensional color histograms of 200,000 frames of size 352 × 288 pixels, by retaining the two most significant bits in the RGB space. The generated feature vectors were inserted into the database.
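One plausible reading of the histogram extraction is sketched below in Python: keeping the two most significant bits of each of the three 8-bit RGB channels leaves 6 bits, i.e., 4 × 4 × 4 = 64 bins. The per-channel interpretation, the bin ordering, the normalization to relative frequencies and the function name color_histogram_64 are assumptions made for this illustration.

```python
def color_histogram_64(pixels):
    """64-bin RGB histogram from an iterable of (r, g, b) values in 0..255."""
    histogram = [0] * 64
    count = 0
    for r, g, b in pixels:
        # Keep the two most significant bits per channel (value >> 6 is 0..3)
        # and combine them into a 6-bit bin index (assumed ordering: R, G, B).
        bin_index = ((r >> 6) << 4) | ((g >> 6) << 2) | (b >> 6)
        histogram[bin_index] += 1
        count += 1
    if count == 0:
        return [0.0] * 64
    # Normalize to relative frequencies in [0, 1] (assumption).
    return [value / count for value in histogram]


frame = [(255, 255, 255), (0, 0, 0), (0, 0, 0), (64, 128, 192)]
vector = color_histogram_64(frame)
print(len(vector), vector[0], vector[27], vector[63])
```

For a real frame the pixel list would hold 352 × 288 entries; frames with few distinct colors produce vectors in which most of the 64 components are zero.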

6 http://gist.cs.berkeley.edu/.

We compared MIF first to a non-indexing solution (simple scan over the tables) and then to the only available multi-dimensional indexing method in Oracle, the Oracle Text Index. Other built-in indexing methods, like the B-tree, can be used only for single-valued attributes. The experiments had to be carried out on exact match queries, as the Oracle Text Index supports only this type of query. The remaining retrieval functions supported by MIF, like range search, NN-search and overlap search, go beyond the available built-in index functions.

The retrieval was carried out through server-side JDBC, i.e., the Java classes reside in the database and are executed by Oracle's own JVM (Java Virtual Machine). The insertion was accomplished through a Java class that resides outside of the database and is connected through thin JDBC.

8.1. Indexing details

Due to the high dimensionality of the feature vectors (64 dimensions and more), we have to use a Large Object representation. Oracle offers either a BLOB (binary) or a CLOB (character) type. Both data types can be indexed by the Oracle Text Index. We decided to use a CLOB representation, since it offers the libraries broader filter and access functionality than a BLOB. It can be used for indexing high-dimensional data as follows. The indexing engine of the Oracle Text Index creates an inverted index that maps tokens to the documents that contain them. The inverted list is a list of words from the documents, with each word having a list of documents in which it appears. In our case, the words are the values of a point in the individual dimensions.

Based on this index technique, we compared the response time of a plain (no index) solution, the Oracle Text Index and MIF. The response time was measured for insertion and query operations. The query operation was limited to exact match queries because of the limited functionality of the Oracle Text Index. As mentioned above, this shortcoming can be compensated with MIF.

8.2. Detailed results

The comparison of MIF using R-trees with the Oracle 10g built-in Text Index shows that the MIF-based trees are less efficient for insertion (due to the external procedure calls to MIF) but offer significantly higher query performance.

8.2.1. Indexing – synthetic dataset

The following figures show the results for the insertion process: Fig. 10a for 64-dimensional point entries and Fig. 10b for 96-dimensional point entries. These figures show that MIF has a higher insertion time than the related solutions. The extra time for MIF is caused by the overhead of switching and transferring the data between the Oracle address space and the external address space. This overhead does not seriously penalize MIF: first, insertions are rare in the mostly read-only multimedia application context, and second, the insertion time does not explode even for a large data set.

Fig. 10. Response time of the insert statements (50,000–200,000 points with 64 dimensions (a) and 96 dimensions (b)).

Although the insert time of the Oracle Text Index is better, this index reveals a severe shortcoming compared to our solution: its memory consumption is enormous. The table space consumed is in many cases over 3.8 GB for an insertion operation of 200,000 point elements with 96 dimensions. In comparison, the memory consumption of MIF is significantly smaller; e.g., for the same insertion operation as above, it requires only about 220 MB.

8.2.2. Retrieval query – synthetic dataset

The results for the query evaluation show that our framework MIF clearly outperforms the related solutions. This is important, as in typical ad hoc scenarios the query process is used far more often than the insertion process. Fig. 11 displays the mean value of five measurements, each including 100 select statements. The query was exercised on 64-dimensional point entries for the left figure

Fig. 11. Response time of select statements of points with 64 dimensions (a) and 96 dimensions (b).


Fig. 12. Response time of insert (a) and select (b) statements for color histograms (real dataset) with 64 dimensions.

(a) and 96 dimensions for the right one (b). A sequential search is, of course, significantly slower. On average, the MIF environment is 6 to 7 times faster than the Oracle Text Index, for both dimensions (64 and 96). Furthermore, MIF performs 85 to 134 times better than the no-index solution for the same number of indexed elements.

8.2.3. Indexing and retrieval – real dataset

The response time for the insertion process with the real dataset was similar to the results obtained from the synthetic dataset test series and is presented in Fig. 12a. The response time is again measured as the mean value of five measurements, each including 100 select statements. The response time of the sequential scan was very similar to that measured for the synthetic data set (see Fig. 11) and is not presented in this figure. Fig. 12b shows that the Oracle Text Index performs worse than in the previous test series. On average, we increased our efficiency by a factor of 7 compared to the synthetic data, which leads to an overall improvement of 46 times faster than the Oracle Index (and this even with the call to an external address space). In contrast to the uniform dataset above, the color histogram derived from this particular movie contains far more zero values. This fact seems to reduce the usefulness of the Oracle Text Index significantly.

9. Cost-based query optimizer

The performance of multimedia query execution can be increased by the use of appropriate index structures, as described in Section 7, and by the use of a query optimizer that takes care of the optimal query plan. A cost-based query optimizer uses a cost function for finding the optimal query plan (Stonebraker et al., 1998). This cost function is in general composed of used CPU cycles, used network bandwidth and disk page accesses. We concentrate here on the similarity search operators, which are the performance-critical parts of the query system. The other operators can be handled by the built-in optimization strategies of Oracle. In the context of similarity queries, the number of disk accesses is of interest (Böhm et al., 2001). Therefore, an appropriate cost model for counting the disk accesses for the range and NN search has to be investigated.

Several cost models have been proposed for multidimensional index structures (Berchtold et al., 1997; Lee et al., 1999; Amato et al., 2000). All of them try to approximate the number of disk accesses as accurately as possible, at the price of complex evaluation procedures (e.g., for computing the volume of search spheres which intersect the bounding boxes of the index). Since query optimization is an online process, no complex evaluations can be considered. Therefore, we adopted only the principal idea of the related cost models of Berchtold et al. (1997) and Lee et al. (1999) and developed a new model which, first, can be evaluated online and, second, can be incorporated into the extensible query optimization framework of Oracle.

Cost Model: The cost model is derived for the range search. An extension to the k-NN search is possible through the methodology described in Lee et al. (1999) (approximating the range for a given k). Let us now assume a set of feature vectors M to be queried by a sample object o. Furthermore, let o and M follow a distribution function over a normalized object space of


$[0, 1)^d$ (d is the dimension of the feature vector) and that a balanced tree index structure, such as an R-, SR- or SS-tree, is used. The notations used here are shown in the table below.

Symbol               Notation
$o$                  Query object
$M$                  Set of feature vectors
$d$                  Dimension of the feature vectors in o and M
$N$                  Number of feature vectors in M
$m$                  Number of leaf nodes in the index structure of M
$x_i$                Length vector of the ith leaf node $L_i$ of M, i.e., $x_i = (x_i^1, \ldots, x_i^d)$ with components $x_i^k$ $(1 \le k \le d)$
$C_{\mathit{eff}}$   Average number of feature vectors in one leaf page of M
$V_\varepsilon^j$    Hyper-volume of the hypersphere $S_\varepsilon^j$ with radius $\varepsilon$ for a dimension $0 \le j \le d$
$DA(o)$              Expected disk accesses for the range-search of the object o

The query optimization extension interface of Oracle provides us only with a subset of the information introduced in the table above, namely the number of feature vectors N in M and the range $\varepsilon$. From MIF, we obtain $C_{\mathit{eff}}$ and d. Our strategy was to develop first a general cost model with the parameters given in the table and then to reduce the number of parameters to those available by assuming uniform distributions where reasonable.

The expected number of disk accesses $DA(o)$ for a range search of a given object o is determined by the number of leaf partition nodes of M that intersect the hypersphere $S_\varepsilon^d$ containing the neighbors of o within the distance $\varepsilon$. In order to compute the number of leaf partition nodes of M that intersect $S_\varepsilon^d$, the Minkowski Sum of $S_\varepsilon^d$ and the leaf nodes is determined. The Minkowski Sum corresponds graphically to the volume of the area which results from moving the center of $S_\varepsilon^d$ over the surface of the bounding box of one leaf node. Summing up the Minkowski Sums of all leaf nodes in M results in the expected number of disk accesses $DA(o)$.

For instance, consider Fig. 13 for the Minkowski Sum in a two-dimensional space. One leaf node $L_i$ of M is a rectangle with length vector $x_i = (x_i^1, x_i^2)$. The query circle $S_\varepsilon^2$ with radius $\varepsilon$ is the region which includes the neighbors of the query object. The number of disk accesses for the query object is the sum of the probabilities of intersection with all leaf partitions. The probability that $S_\varepsilon^2$ intersects one leaf partition $L_i$ corresponds graphically (see Fig. 13) to the area which results from moving the circle $S_\varepsilon^2$ around the leaf partition $L_i$. The volume of this area, i.e., the probability that $S_\varepsilon^2$ intersects $L_i$, is:

$\mathrm{volume}(L_i) + \mathrm{perimeter}(L_i) \cdot \varepsilon + V_\varepsilon^2.$

Fig. 13. Minkowski sum in a two-dimensional space.

Summing it up over the index structure with m leaf nodes, we obtain the expected number of disk accesses $DA(o)$ for the range search of an object o, which in the two-dimensional case is:

$\sum_{i=1}^{m} \left( \mathrm{volume}(L_i) + \mathrm{perimeter}(L_i) \cdot \varepsilon + V_\varepsilon^2 \right) = \sum_{i=1}^{m} \left( x_i^1 \cdot x_i^2 + x_i^1 \cdot \varepsilon + x_i^2 \cdot \varepsilon + \pi \cdot \varepsilon^2 \right).$

Expressing the last line in terms of hypersphere volumes $V_\varepsilon^j$ of dimensions lower than d, we obtain a slightly modified formula that corresponds exactly to the Minkowski Sum of $S_\varepsilon^d$: $\sum_{i=1}^{m} \left( x_i^1 \cdot x_i^2 \cdot V_\varepsilon^0 + x_i^1 \cdot V_\varepsilon^1 + x_i^2 \cdot V_\varepsilon^1 + \pi \cdot V_\varepsilon^2 \right)$. In the general case, $DA(o)$ is then computed as:

(1) $DA(o) = \sum_{i=1}^{m} \sum_{j=1}^{d} \sum_{\{y_1, \ldots, y_j\} \in \mathrm{PowerSet}\{x_i^1, \ldots, x_i^d\}} \mathrm{volume}(\{y_1, \ldots, y_j\}) \cdot V_\varepsilon^{d-j}$

where

(a) $V_\varepsilon^j = \frac{\sqrt{\pi^j}}{\Gamma(j/2 + 1)} \cdot \varepsilon^j$, where $\Gamma(j/2 + 1)$ denotes the Gamma Function (see footnote 7);
(b) $\mathrm{volume}(\{y_1, \ldots, y_j\}) = \prod_{k=1}^{j} y_k$.

In order to obtain a cost model for high-dimensional spaces, we have to consider the so-called boundary effects, i.e., the perimeter of the bounding box of a leaf node of M lies partially outside the data space $[0, 1)^d$ and consequently does not contribute to the probability that this leaf node intersects the query object (Berchtold et al., 1997). The main idea for dealing with this problem is to attribute significance to the dimensions, i.e., a dimension is significant if its part in the Minkowski sum contributes to the overall probability. It has been shown by Berchtold et al. (1997) that only the first $d'$ dimensions are significant, with $d' = \lceil \log_2(N / C_{\mathit{eff}}) \rceil$. Consequently, Eq. (1) may be rewritten as Eq. (2) in order to account for boundary effects.

7 See for instance: http://mathworld.wolfram.com/GammaFunction. html.
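As an illustration, Eq. (1) can be evaluated directly for small examples. The following is a minimal sketch only, not the optimizer code of the MPEG-7 MMDB: the function names, the representation of a leaf node as a tuple of side lengths, and the example values are assumptions made here. The sum follows Eq. (1) as printed, i.e., it starts at j = 1 and therefore ranges over the non-empty subsets of the side lengths.

```python
import math
from itertools import combinations


def sphere_volume(j: int, eps: float) -> float:
    """V_eps^j from definition (a): volume of a j-dimensional hypersphere of radius eps."""
    return math.sqrt(math.pi ** j) / math.gamma(j / 2 + 1) * eps ** j


def expected_disk_accesses(leaves, eps: float) -> float:
    """Evaluate Eq. (1) as printed for leaf nodes given as tuples of side lengths.

    For every leaf, all subsets of size j = 1..d of its side lengths are
    enumerated; each subset contributes its product times V_eps^(d-j).
    """
    total = 0.0
    for sides in leaves:
        d = len(sides)
        for j in range(1, d + 1):
            v = sphere_volume(d - j, eps)
            for subset in combinations(sides, j):
                total += math.prod(subset) * v  # definition (b): product of the y_k
    return total


def significant_dimensions(n_vectors: int, c_eff: float) -> int:
    """d' = ceil(log2(N / C_eff)), the number of significant dimensions."""
    return math.ceil(math.log2(n_vectors / c_eff))


if __name__ == "__main__":
    # One two-dimensional leaf with side lengths 1/2 and a query range of 0.1
    # (assumed example values).
    print(expected_disk_accesses([(0.5, 0.5)], 0.1))
    print(significant_dimensions(1_000_000, 32))
```

For the single leaf above, the subsets of size two contribute 0.25 and the two subsets of size one contribute 0.5 · 0.2 each, so the result is approximately 0.45. The enumeration is exponential in d and is only meant to make the structure of the formula visible.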


Restricted to the $d'$ significant dimensions, Eq. (1) becomes:

$$DA(o) = \sum_{i=1}^{m} \sum_{j=1}^{d'} \; \sum_{\{y_1,\ldots,y_j\} \in \mathrm{PowerSet}\{x_i^1,\ldots,x_i^{d'}\}} \mathrm{volume}(\{y_1,\ldots,y_j\}) \cdot V_\varepsilon^{d'-j} \qquad (2)$$

It is common practice in DBMS to assume a uniform object distribution (Stonebraker et al., 1998), for instance for attribute values in order to compute the join selectivity. We adopt this principle here. Uniform object distribution means that the object points follow a random distribution in the normalized object space $[0,1)^d$. The number of leaf nodes $m$ in the index structure of $M$ is computed by fixing an index page size of $B$ bytes (typically 4096 bytes). This implies an effective capacity of $C_{eff} = \frac{B}{d \cdot V}$ data objects per data page, supposing that the value in each dimension is stored in $V$ bytes. Consequently, $m = \frac{d \cdot V \cdot N}{B}$ and $d' = \lceil \log_2(\frac{N \cdot d \cdot V}{B}) \rceil$. Finally, under these assumptions the length vector $x_i$ of every leaf node in $M$ is $(1/2, \ldots, 1/2)$, i.e., $\forall k \, (1 \le k \le d): x_i^k = 1/2$. For a uniform object distribution, Eq. (2) can then be simplified (using $N$) to:

$$DA(o) = \frac{N}{C_{eff}} \cdot \sum_{j=1}^{d'} \binom{d'}{j} \cdot \left(\frac{1}{2}\right)^{j} \cdot V_\varepsilon^{d'-j} \qquad (3)$$

where $V_\varepsilon^{d'-j}$ can be evaluated as

$$V_\varepsilon^{d'-j} = \frac{\sqrt{\pi^{d'-j}}}{\Gamma((d'-j)/2+1)} \cdot \varepsilon^{d'-j}.$$

10. Multimedia applications

Our MPEG-7 MMDB was validated by means of two multimedia applications, one in the field of audio recognition and one for image retrieval.

10.1. Audio recognition tool

The Java-based audio recognition tool (see Fig. 14a) enables one to search for music compositions in our multimedia database. The tool offers two kinds of search techniques. The first is based on a search by music title, performer and genre. This information is described with the CreationInformation descriptor of MPEG-7 and stored internally in the respective database tables and types. The second search technique uses audio signals that are specified with the AudioSignature descriptor. The audio signal is recorded with a microphone, converted to an MPEG-7 description and forwarded to the AudioLib. The results of both techniques are returned in the MPEG-7 format and presented by the audio tool. The next section details the basics of audio recognition, and Section 10.1.2 describes the query process in our MMDB.

10.1.1. Similarity in audio recognition

Besides information about, e.g., format, storage, author or producer, the MPEG-7 description of audio content can also include descriptions of the audio signal on a more or less abstract level. We use the AudioSignatureDS to describe the content of a piece of music for identification. It was shown that this descriptor is very robust against the most common modifications of a piece of music (Hellmuth et al., 2001; Kastner et al., 2002), such as filtering (lowpass, highpass, bandpass, equalizer, etc.), additive noise and amplitude change. To create the AudioSignatureDS, the average flatness of the spectrum is determined approximately once per second (default) for 16 sub-bands (default). Small flatness values indicate peaks in the spectrum, which are typical for tonal components; values close to one indicate a flat spectrum corresponding to a noise-like or an impulse-like signal. The temporal resolution and the bandwidth of the AudioSignatureDS can be varied to find a good trade-off between the size of the description scheme and its robustness.

Fig. 14. (a) Audio recognition client and (b) blob-based image retrieval client.

The AudioSignatureDS consists of two matrices. For each 10-second segment of music, a feature vector is created by reading the values of the corresponding rows of the first matrix and appending the rows with the same index of the second matrix. Note: The order of the values in the feature vector has no influence on the performance of the identification algorithm. It can be adapted to the memory model of the application. If the matrices are stored in column-major order, it might be more efficient to read the values of the corresponding segment column-wise.

With the feature vectors described above, the similarity between two segments can be defined by the Mahalanobis distance:

$$(q - s_{n,\Delta t})^{T} \, R_{xx}^{-1} \, (q - s_{n,\Delta t}) \qquad (1)$$

where $q$ is the feature vector of the query, $s_{n,\Delta t}$ is the feature vector of song number $n$ after $\Delta t$ seconds, and $R_{xx}$ is the covariance matrix:

$$R_{xx} = \begin{pmatrix} \mathrm{Var}(v_1) & \mathrm{Cov}(v_1, v_2) & \cdots & \mathrm{Cov}(v_1, v_N) \\ \mathrm{Cov}(v_2, v_1) & \mathrm{Var}(v_2) & \cdots & \mathrm{Cov}(v_2, v_N) \\ \vdots & \vdots & \ddots & \vdots \\ \mathrm{Cov}(v_N, v_1) & \mathrm{Cov}(v_N, v_2) & \cdots & \mathrm{Var}(v_N) \end{pmatrix} \qquad (2)$$

The covariance matrix $R_{xx}$ of Eq. (2) is an $N \times N$ matrix ($N$: length of the feature vector). It describes the covariance between each pair of dimensions of the feature vector. It depends on the channel through which the music signal is modified and has to be measured or estimated. Sub-bands with a low distortion are weighted more strongly than sub-bands with a high distortion. The Mahalanobis distance also takes into account the correlation between different dimensions, especially the covariance between successive values of the same sub-band and the correlation between values of adjoining sub-bands (Crysandt, 2005).

10.1.2. Music composition recognition in the MPEG-7 MMDB

The music composition recognition consists of an insert process and a query process. During insertion of a new music composition, the audio signal described with the AudioSignature descriptor is split into pieces corresponding to fragments of 10 s of the audio signal. From each fragment, a 160-dimensional vector is computed. To take into account the effect of audio signal inaccuracy, we recalculate the given vector based on the approach described in Section 10.1.1. Further, we use a singular value decomposition (SVD) to reduce the dimensionality of each fragment down to a 10-dimensional feature vector. This vector is indexed by an R-tree provided by our multimedia indexing framework (MIF).

The query process within our MPEG-7 MMDB for audio recognition (see Fig. 4) is realized as follows. The audio recognition tool captures an audio signal with a duration of 10 s of an unknown music composition, recorded with a microphone or an equivalent input device. The signal is then transformed and described with the AudioSignature descriptor. The resulting MPEG-7 description is forwarded to the AudioLib, where the document is parsed and a music composition retrieval is generated. For retrieval, our internal audio index implementation is used with the given audio signature. The audio signature is recalculated and reduced in dimension (see above). The resulting 10-dimensional vector is used as input for the NN-search provided by the MIF. The music composition found is returned to the AudioLib library. The library then selects all additional available information (CreationInformation descriptor) on the music composition and creates an MPEG-7 description. This document is forwarded to the audio recognition tool, where it is parsed and presented.

10.2. Image retrieval

The MPEG-7 MMDB is also demonstrated by a new image retrieval system (see Fig. 14b) which relies on the blob theory (Carson et al., 1999; Thomas et al., 2000). We first describe the blob theory (Section 10.2.1) and then detail the image retrieval in the MPEG-7 MMDB (Section 10.2.2).

10.2.1. Blobs theory

Blobs are regions which are roughly homogeneous with respect to color and texture. In the original system, the characteristics of a blob are described with a maximum of 218 color values ($L^*a^*b^*$ color space) and the texture values for contrast and anisotropy. The polarity description is not included. In addition, the blob description contains two values for the blob position and 40 shape values. These values are stored in a simple text file.
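The layout of such a blob description can be pictured as a small record type. The sketch below is purely illustrative: the class name, the field names and the validation are assumptions made here, and the actual text file format of the original system is not reproduced.

```python
from dataclasses import dataclass
from typing import List, Tuple

MAX_COLOR_VALUES = 218  # "a maximum of 218 color values"
SHAPE_VALUES = 40       # "40 shape values"


@dataclass
class BlobRecord:
    """Hypothetical in-memory form of one blob description."""

    color: List[float]              # up to 218 values in the L*a*b* color space
    contrast: float                 # texture value: contrast
    anisotropy: float               # texture value: anisotropy
    position: Tuple[float, float]   # two values for the blob position
    shape: List[float]              # 40 shape values

    def __post_init__(self) -> None:
        # Reject records that do not match the sizes stated in the text.
        if len(self.color) > MAX_COLOR_VALUES:
            raise ValueError("at most 218 color values are allowed")
        if len(self.shape) != SHAPE_VALUES:
            raise ValueError("exactly 40 shape values are expected")
```

A record built this way carries 218 + 2 + 2 + 40 = 262 values at most, which gives an impression of the size of the feature data per blob.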
MPEG-7 Blobworld uses the originally extracted features and creates an MPEG-7 description for every image. A blob within an image is described with the StillRegion description and a unique ID (within the Image description) as attribute. The (x, y) position of a blob is denoted with the SpatialLocator descriptor. The color, shape and texture feature vectors are characterized with the respective MPEG-7 descriptors (ScalableColor, RegionShape and HomogeneousTexture). Our retrieval web client supports similarity search for existing images that are already stored in the database as well as an upload function for new images. The retrieval is realized by a similarity search of blobs, which are compared according to their low-level features (color, shape and texture).
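A feature-based comparison of blobs can be sketched as follows. This is a simplified illustration under stated assumptions, not the implementation of the MPEG-7 MMDB: it scans all blobs linearly instead of using an index, it assumes the Euclidean distance for every feature, and it combines the three distances with a weighted sum. The function names and the dictionary layout are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

# One blob is represented by its three feature vectors (assumed layout).
Features = Dict[str, Sequence[float]]  # keys: "color", "shape", "texture"


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_blobs(
    query: Features,
    blobs: Dict[str, Features],
    weights: Dict[str, float],
    k: int,
) -> List[Tuple[str, float]]:
    """Return the k blobs with the smallest weighted feature distance to the query."""
    scored = []
    for blob_id, feats in blobs.items():
        # Weighted sum of the per-feature distances (an assumed combination rule).
        score = sum(w * euclidean(query[name], feats[name]) for name, w in weights.items())
        scored.append((blob_id, score))
    scored.sort(key=lambda item: item[1])
    return scored[:k]


if __name__ == "__main__":
    query = {"color": [1.0, 0.0], "shape": [0.0], "texture": [0.5]}
    blobs = {
        "b1": {"color": [1.0, 0.0], "shape": [0.0], "texture": [0.5]},
        "b2": {"color": [0.0, 0.0], "shape": [0.0], "texture": [0.5]},
        "b3": {"color": [1.0, 0.0], "shape": [2.0], "texture": [0.5]},
    }
    weights = {"color": 0.5, "shape": 0.3, "texture": 0.2}
    print(rank_blobs(query, blobs, weights, k=2))
```

The weights play the role of user preferences for color, shape and texture; a larger weight makes differences in that feature count more in the ranking.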


10.2.2. Image retrieval in the MPEG-7 MMDB

An image contains several blobs, each of which is described with the StillRegion descriptor. Further, the low-level features (color, shape and texture) of each blob are specified with the ScalableColor, the RegionShape and the HomogeneousTexture descriptors. During insertion of an image, the blobs are separated and the values of their features are stored in the corresponding database tables and types. To increase the efficiency of the similarity search, we use the R-trees and SS-trees (White and Jain, 1996) of our MIF for indexing the feature vectors.

The query process for image retrieval is realized as follows. First, the user specifies a blob of an image. This can be an image from the database or one provided by the user through our upload functionality. Then, all necessary low-level features are extracted and described with MPEG-7. The MPEG-7 description, in combination with the user preferences (for color, shape and texture) and the requested number of results, is forwarded to the database. There, the document is parsed and the image retrieval triggers three NN-searches for color, shape and texture, respectively. The results are merged and weighted according to the user preferences. The resulting ranking is affected by the user preferences and by multiple occurrences of the same blob in the three result sets. After a successful ranking, the requested number of blobs is returned as an MPEG-7 description to the web client. There, the description is parsed and presented to the user.

11. Conclusion and future work

This paper has introduced the MPEG-7 MMDB as an MPEG-7-based database system extension to the Oracle DBMS. It allows one to process (e.g., insert, query, retrieve) multimedia data more efficiently than related approaches (e.g., DBMS extenders or CBR systems).

The core of our system is the MPEG-7 multimedia schema. This schema is a mapping of the MPEG-7 DDL (Description Definition Language) to database object types and XMLTypes. This is the first time that MPEG-7 has been considered in a DBMS data model. As a consequence, one may query all parts of the MPEG-7 schema, e.g., formulate high-level queries (e.g., for a certain person or object) in connection with CBR queries (e.g., for color similarity).

Second, we have proposed a multimedia querying system in which the user can create individual select statements with the help of a modular construction system. This modular construction enables users to specify the select part as well as the where part according to their needs. The output is formatted in MPEG-7, which makes it usable for any application that is able to process the standardized MPEG-7 descriptors.

Third, we have proposed an extension of Oracle's indexing capabilities based on our MIF (Multimedia Indexing Framework). This framework not only allows the execution of exact-match queries, but also supplies more multimedia-specific operations, such as range-search, NN-search, overlap, etc., and meets precisely the requirements of multimedia search and filter applications. Our experimental analysis has shown that MIF clearly outperforms the built-in indexes of Oracle for exact queries (other query types cannot be handled by the base system).

Fourth, we have integrated a cost-based query optimizer into our MPEG-7 MMDB by implementing the respective Cartridge interface. For this, a cost model was developed for approximating the number of accessed disk pages in similarity searches.

Finally, the resulting system, extending Oracle 10g, was verified and demonstrated by means of two new applications in the field of audio recognition and image retrieval, both of which use the MPEG-7 standard for content retrieval. The proposed methodology applies to other extensible object-relational DBMS as well, as long as they provide an extensible type system, a query processing/optimization interface and access methods, as for instance IBM DB2 and IBM Informix do.

In future work, the MPEG-7 MMDB will be enhanced to support the MPEG Query Format (MPQF) (Adistambha et al., 2007; Döller et al., 2007; Gruhne et al., 2007). The lack of a standardized interface for multimedia retrieval systems prevents clients from experiencing aggregated services from various multimedia databases. Therefore, the MPEG standardization committee decided to start work on a universal query format. The objective of this MPQF framework (see Döller et al., 2006 for a first architecture proposal) is to provide a standardized interface to multimedia databases allowing retrieval by users and client applications, based on a set of precise input parameters for describing the search criteria and a set of output parameters for describing the result sets.

References

Adistambha, Kevin, Doeller, Mario, Tous, Ruben, Gruhne, Matthias, Sano, Masanori, Tsinaraki, Chrisa, Christodoulakis, Stavros, Yoon, Kyoungro, Ritz, Christian, Burnett, Ian, 2007. The MPEG-7 query format: a new standard in progress for multimedia query by content. In: Proceedings of the Seventh International IEEE Symposium on Communications and Information Technologies (ISCIT 2007), Sydney, Australia.
Amato, G., Rabitti, F., Savino, P., Zezula, P., 2000. Estimating proximity of metric ball regions for multimedia data indexing. In: First Biennial International Conference on Advances in Information Systems (ADVIS 2000), vol. 1909 of Lecture Notes in Computer Science, Izmir, Turkey. Springer-Verlag, pp. 77–81.
Berchtold, S., Keim, D.A., Kriegel, H.P., 1996. The X-tree: an index structure for high-dimensional data. In: Proceedings of the 22nd International Conference on Very Large Data Bases (VLDB), Mumbai (Bombay), India. Morgan Kaufmann, pp. 28–39, ISBN: 1-55860-382-4.
Berchtold, Stefan, Böhm, Christian, Keim, Daniel A., Kriegel, Hans-Peter, 1997. A cost model for nearest neighbor search in high-dimensional data space. In: Proceedings of the 16th ACM SIGACT–SIGMOD–SIGART Symposium on Principles of Database Systems, Tucson, Arizona, United States, pp. 78–86.
Bliujute, R., Jensen, C.S., Saltenis, S., Slivinskas, G., 1998. R-tree based indexing of now-relative bitemporal data. In: Proceedings of the 24th International Conference on Very Large Data Bases (VLDB), New York, USA. Morgan Kaufmann, pp. 345–356.

Böhm, C., Berchtold, S., Keim, D.A., 2001. Searching in high-dimensional spaces – index structures for improving the performance of multimedia databases. ACM Computing Surveys 33 (3), 322–372.
Carson, C., Thomas, M., Belongie, S., Hellerstein, J.M., Malik, J., 1999. Blobworld: a system for region-based image indexing and retrieval. In: Proceedings of the Third International Conference on Visual Information Systems, Amsterdam, The Netherlands. Springer-Verlag, pp. 509–517.
Cha, G., Zhu, X., Petkovic, D., Chung, C., 2002. An efficient indexing method for nearest neighbor searches in high-dimensional image databases. IEEE Transactions on Multimedia 4 (1), 76–87.
Chakrabarti, K., Ortega-Binderberger, M., Mehrotra, S., Porkaew, K., 2003. Evaluating refined queries in top-k retrieval systems. IEEE Transactions on Knowledge and Data Engineering (TKDE) 15 (5).
Chen, S.-C., Kashyap, R.L., Ghafoor, A., 2000. Semantic Models for Multimedia Database Searching and Browsing. The KLUWER International Series on Advances in Database Systems, 21.
Ciaccia, P., Patella, M., Zezula, P., 1997. M-tree: an efficient access method for similarity search in metric spaces. In: Proceedings of the 23rd International Conference on Very Large Data Bases (VLDB), Athens, Greece. Morgan Kaufmann, pp. 426–435, ISBN: 1-55860-470-7.
Cord, M., Gosselin, P.-H., Philipp-Foliguet, S., 2007. Stochastic exploration and active learning for image retrieval. Image and Vision Computing 25, 14–23.
Crysandt, Holger, 2005. Hierarchical sound classification using MPEG-7. In: International IEEE Workshop on Multimedia Signal Processing – MMSP 2005, Shanghai, China. IEEE CS Press.
Döller, Mario, Wolf, Ingo, Gruhne, Matthias, Kosch, Harald, 2006. Towards an MPEG-7 query language. In: Proceedings of the International Conference on Signal-Image Technology and Internet-Based Systems (IEEE/ACM SITIS'2006), Hammamet, Tunisia, pp. 36–45.
Döller, Mario, Renner, Kerstin, Wolf, Ingo, Gruhne, Matthias, Kosch, Harald, 2007. Introduction of an MPEG-7 query language. In: Proceedings of the Second International Conference on Digital Information Management (ICDIM 2007), Lyon, France.
Gelasca, Elisa Drelie, De Guzman, Joriz, Gauglitz, Steffen, Ghosh, Pratim, Xu, JieJun, Moxley, Emily, Rahimi, Amir M., Bi, Zhiqiang, Manjunath, B.S., 2007. CORTINA: searching a 10 Million + Images Database. Technical Report, VRL, ECE, University of California, Santa Barbara.
Gruhne, Matthias, Tous, Ruben, Döller, Mario, Delgado, Jaime, Kosch, Harald, 2007. MP7QF: an MPEG-7 query format. In: Proceedings of the Third International Conference on Automated Production of Cross Media Content for Multi-channel Distribution (AXMEDIS 2007), Barcelona, Spain.
Hellerstein, Joseph M., Naughton, Jeffrey F., Pfeffer, Avi, 1995. Generalized search trees for database systems. In: Proceedings of the 21st International Conference of Very Large Databases VLDB, Zurich, Switzerland, pp. 562–573.
Hellmuth, Oliver, Allamanche, Eric, Herre, Jürgen, Kastner, Thorsten, Cremer, Markus, Hirsch, Wolfgang, 2001. Advanced audio identification using MPEG-7 content description. In: Proceedings of the 111th AES Convention, New York. Audio Engineering Society.
Jagadish, H., Al-Khalifa, S., Chapman, A., et al., 2002. TIMBER: A native XML database. The VLDB Journal 11 (4), 274–291.
Jaimes, Alejandro, 2005. A component-based multimedia data model. In: Proceedings of the ACM Workshop on Multimedia for Human Communication – From Capture to Convey (MHC05), Singapore.
Jianfeng, Y., Li, Z., 2004. A novel multimedia data model supporting temporal semantic abstraction. In: Proceedings of the Third International Conference on Information Systems Technology and its Applications (ISTA 2004), vol. 48 of Lecture Notes in Informatics (LNI), Salt Lake City, Utah, USA. Springer-Verlag, pp. 241–246.
Jiang, H., Elmagarmid, A.K., 1998. Spatial and temporal content-based access to hypervideo databases. The VLDB Journal 7 (4), 226–238.


Kastner, Thorsten, Allamanche, Eric, Herre, Jürgen, Hellmuth, Oliver, Cremer, Markus, Grossmann, Holger, 2002. MPEG-7 scalable robust audio fingerprinting. In: Proceedings of the 112th AES Convention, Munich, Germany. Audio Engineering Society.
Katayama, N., Satoh, S., 1997. The SR-tree: an index structure for high-dimensional nearest neighbor queries. In: ACM SIGMOD International Conference on Management of Data, pp. 369–380.
Kleiner, Carsten, Lipeck, Udo W., 2003. OraGiST – how to make user-defined indexing become usable and useful. In: BTW 2003 – Datenbanksysteme für Business, Technologie und Web – Tagungsband der 10. BTW-Konferenz, Leipzig, Germany, pp. 324–334.
Kornacker, M., 1999. High-performance extensible indexing. In: Proceedings of the 25th International Conference on Very Large Data Bases (VLDB), Edinburgh, Scotland. Morgan Kaufmann, pp. 699–708.
Kosch, H., 2003. Distributed Multimedia Database Technologies Supported by MPEG-7 and MPEG-21. CRC Press, 248 pp., ISBN: 0-8493-1854-8.
Lee, Ju-Hong, Cha, Guang-Ho, Chung, Chin-Wan, 1999. A model for k-nearest neighbor query processing cost in multidimensional data spaces. Information Processing Letters 69 (2), 69–76.
Lin, K.I., Jagadish, H.V., Faloutsos, C., 1994. The TV-tree: an index structure for high-dimensional data. VLDB 3 (4), 517–542.
Martinez, J.M., Koenen, R., Pereira, F., 2002. MPEG-7. IEEE Multimedia 9 (2), 78–87.
Mezaris, V., Doulaverakis, H., Beltran de Otalora, R.M., Herrmann, S., Kompatsiaris, I., Strintzis, M.G., 2004. Combining multiple segmentation algorithms and the MPEG-7 experimentation model in the schema reference system. In: Proceedings of the Eighth International Conference on Information Visualisation (IV 2004), IEEE CS Proceedings, London, UK, pp. 253–258.
Murthy, Ravi, Banerjee, Sandeepan, 2003. XML schemas in Oracle XML DB. In: Proceedings of the 29th VLDB Conference, Berlin, Germany. Morgan Kaufmann, pp. 1009–1018.
Myron, Flickner, Harpreet, Sawhney, Wayne, Niblack, Jonathan, Sashley, Qian, Huang, Byron, Dom, Monika, Gorkani, Him, Hafner, Denis, Lee, Dragutin, Petkovic, David, Steele, Peter, Yanker, 1996. Query by image and video content: the QBIC system. IEEE Computer 28 (9), 46–52.
Niblack, W., Zhu, X., Hafner, J.L., Breuel, T.M., Ponceleon, D.B., Petkovic, D., Flickner, M., Upfal, E., Nin, S.I., Sull, S., Dom, B., Yeo, B.-L., Srinivasan, S., Zivkovic, D., Penner, M., 1998. Updates to the QBIC system. In: Proceedings of the SPIE Conference on Storage and Retrieval for Image and Video Databases, vol. 3312 of SPIE Proceedings, San Jose, CA, USA, pp. 150–161.
Oria, Vincent, Özsu, M. Tamer, Iglinski, Paul J., Xu, Bing, Cheng, L. Irene, 2000. DISIMA: an object-oriented approach to developing an image database system. In: Proceedings of the 16th International Conference on Data Engineering, San Diego, California, USA, pp. 672–674.
Oria, V., Özsu, M.T., Iglinski, P.J., 2004. Foundation of the DISIMA Image Query Languages. Multimedia Tools and Applications Journal 23 (3), 185–201.
Po, L.-M., Wong, K.-M., 2004. A new palette histogram similarity measure for MPEG-7 dominant color descriptor. In: Proceedings of the 2004 International Conference on Image Processing (ICIP 2004), IEEE CS Proceedings, Singapore, pp. 1533–1536.
Porkaew, K., Ortega, M., Mehrotra, S., 1999. Query reformulation for content based multimedia retrieval in MARS. In: IEEE International Conference on Multimedia Computing and Systems, vol. 2, Florence, Italy, p. 747.
Staken, K., 2002. Xindice Developers Guide 0.7. The Apache Foundation, http://www.apache.org.
Stonebraker, M., Brown, P., Moore, D., 1998. Object-Relational DBMSs, second ed. Morgan Kaufmann, ISBN: 1-55860-452-9.
Thomas, M., Carson, C., Hellerstein, J.M., 2000. Creating a customized access method for blobworld. In: Proceedings of the 16th International IEEE Conference on Data Engineering, San Diego, CA. IEEE Computer Society, p. 82.


Tusch, R., Kosch, H., Böszörmenyi, L., 2000. VIDEX: an integrated generic video indexing approach. In: ACM Multimedia Conference 2000, Los Angeles, USA, pp. 448–451.
van Leuken, R.H., Veltkamp, R.C., Typke, R., 2006. Selecting vantage objects for similarity indexing. In: Proceedings of the 18th International Workshop on Pattern Recognition (ICPR06), Washington, DC, USA, pp. 453–456.
Veltkamp, Remco C., Tanase, Mirela, 2000. Content-based image retrieval systems: a survey. Technical Report, Department of Computing Science, Utrecht University, The Netherlands.
Wen, J.-R., Li, Q., Ma, W.-Y., Zhang, Z., 2003. A multi-paradigm querying approach for a generic multimedia database management system. ACM SIGMOD Records 32 (1), 26–34.
Westermann, Utz, Klas, Wolfgang, 2003. An analysis of XML database solutions for the management of MPEG-7 media descriptions. ACM Computing Surveys 35 (4), 331–373.
Westermann, Utz, Klas, Wolfgang, 2006. PTDOM: a schema-aware XML database system for MPEG-7 media descriptions. Software: Practice and Experience 36 (8), 785–834.
White, D.A., Jain, R., 1996. Similarity Indexing with the SS-tree. In: Proceedings of the 12th International IEEE Conference on Data Engineering, New Orleans, Louisiana. IEEE Computer Society, pp. 516–523.
Xu, Peng, Xie, Lexing, Chang, Shih-Fu, Divakaran, Ajay, Vetro, Anthony, Sun, Huifang, 2001. Algorithms and systems for segmentation and structure analysis in soccer video. In: IEEE International Conference on Multimedia and Expo (ICME), Tokyo, Japan.
Yoshitaka, A., Ichikawa, T., 1999. A survey on content-based retrieval for multimedia databases. IEEE Transactions on Knowledge and Data Engineering 11 (1), 81–93.

Mario Döller is assistant professor at the Chair of Distributed Information Systems, University of Passau. His research interests include topics in multimedia retrieval systems, such as query languages, metadata and annotation, indexing, clustering and optimization, as well as multimedia middleware systems. He has successfully participated in the ISO/MPEG standardization process, especially in the MPEG-7 development, and is currently the chair of the MPEG Query Format ad hoc group. Döller received a Diplom-Ingenieur and a Doktor-Ingenieur in computer engineering, both from the University of Klagenfurt, Austria.

Harald Kosch is a full professor and the head of the Chair of Distributed Information Systems. His research topics are multimedia metadata, multimedia databases, middleware, and Internet applications. He started his research career at the Ecole Normale Supérieure de Lyon in 1993 as a postgraduate student and entered the PhD program in 1994, completing it in June 1997. In 2002, he attained the habilitation at the University of Klagenfurt. In March 2006, he took over the Chair of Distributed Information Systems. Prof. Kosch participates in the MPEG-7 and MPEG-21 standardization process, is one of the co-founders of the multimedia metadata community, and owns several patents. He has organized many international conferences and workshops covering different aspects of multimedia engineering (e.g., multimedia database workshops at DEXA, the Grid workshop at VLDB 2007 and the Multimedia Information Track at IEEE SITIS 2006). He is the author of several books and publishes regularly in refereed international journals and conference proceedings (more than 60 peer-reviewed publications).
