A Novel Approach for Practical Semantic Web Data Management

July 3, 2017 | Author: Danilo Avola | Category: Knowledge Management, Data Management, Semantic Web, RDF, Knowledge Based, Datalog


Giorgio Gianforme*1, Roberto De Virgilio1, Stefano Paolozzi2, Pierluigi Del Nostro1, Danilo Avola2

1 Università Roma Tre, Rome, Italy
2 National Research Council, Rome, Italy

Abstract. The growing importance of RDF as a tool for describing information on the Web has given rise to a number of works on the management of RDF documents, and the effective management of RDF data has become an increasingly active area of research. However, current data management solutions for RDF data have the following shortcomings: (i) they often define new query languages that are difficult to integrate into database applications, resulting in a lack of uniformity among the different approaches; (ii) they have limited extensibility; and (iii) most of them are considered complicated by end users. In this paper we propose a feasible and intuitive approach to practical RDF management, which provides a logical organization technique for RDF data, a formalism to represent the concepts and properties of RDF and RDF(S), and an extension of a DataLog rule system for querying RDF documents.

Keywords: Knowledge Management, DataLog, Semantic Web, RDF.

1 Introduction

The Semantic Web, as first proposed by Tim Berners-Lee [2], represents the new generation of the Web. Its reference model is the Resource Description Framework, or RDF (http://www.w3.org/RDF/), a simple logical language that allows the specification of binary properties on Web resources. RDF adopts the simplest possible structure for representing information: a labeled graph. Its assertions, called triples, are statements expressing that a resource, identified by a URI, is related to another resource or to a value (a datatype literal or an XML literal) through a property (or predicate). In this paper we use the term predicate for a binary relation between two resources and the term property for a binary relation between a resource and a datatype literal. An RDF graph can be viewed as a set of directed edges, commonly represented by triples of the form ⟨Subject, Predicate, Object⟩. Syntactically, this graph can be serialized in XML (RDF/XML). The main advantage of this approach is its generality: any type of data can be expressed in this format, and it is easy to build tools that manipulate RDF. Furthermore, an increasing amount of data is available on the Web in RDF format. This representation, though flexible, raises serious problems of data management and query execution. Our purpose is not to extend the expressiveness of the RDF data model; rather, we want to explore ways to improve the organization of RDF data and the effectiveness of queries over it. Several solutions have been proposed for RDF data management. Some approaches extend the expressiveness of RDF and provide ad-hoc query languages (mainly SQL-like) on top of these extensions [5, 6]; however, these languages are difficult to integrate into database applications. Other proposals [3, 4, 9] focus on an effective organization of the data in an RDF document, but they address the problem at the physical level for scalability reasons and present various limitations at the logical level (e.g., in the management of NULL values, unions and joins). Moreover, most of these solutions have limited extensibility and provide tools that are often complex to use. In this paper we propose a novel approach for the flexible management of RDF documents, based on a logical organization of data. More precisely, the main contributions of our work are: (i) a high-level description model (called metamodel) that represents the information stored in an RDF document, making the implicit semantics of its elements explicit through constructs; and (ii) an intuitive rule-based system, based on an extension of DataLog, to query and manage RDF documents. The paper is structured as follows. In Section 2 we introduce our metamodel and the rule-based formalism for the management of RDF documents. Finally, in Section 3 we sketch concluding remarks and future work.

* Supported in part by Microsoft Research through the European PhD Scholarship Programme.
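To make the predicate/property terminology above concrete, here is a minimal Python sketch of an RDF graph as a set of labeled directed edges. The names `URI`, `Literal`, `Triple`, `is_predicate` and `is_property` are hypothetical, the `xsd:string` datatype labels are illustrative, and no RDF library is used; the sketch only shows how triples whose object is a resource (predicates, in the sense used here) can be told apart from triples whose object is a datatype literal (properties).

```python
from typing import NamedTuple, Union


class URI(str):
    """A resource identifier; subclassing str lets resources be told apart from literals."""


class Literal(NamedTuple):
    value: str
    datatype: str  # datatype label, e.g. "xsd:string" (illustrative)


class Triple(NamedTuple):
    subject: URI
    predicate: str
    obj: Union[URI, Literal]


def is_predicate(t: Triple) -> bool:
    """In this paper's terminology, a predicate relates two resources."""
    return isinstance(t.obj, URI)


def is_property(t: Triple) -> bool:
    """In this paper's terminology, a property relates a resource to a datatype literal."""
    return isinstance(t.obj, Literal)


# An RDF graph as a set of labeled, directed edges: subject --predicate--> object
graph = {
    Triple(URI("URI1"), "Name", Literal("Priam", "xsd:string")),
    Triple(URI("URI2"), "Name", Literal("Hector", "xsd:string")),
    Triple(URI("URI1"), "Child", URI("URI2")),  # Priam has child Hector
}

predicates = {t for t in graph if is_predicate(t)}
properties = {t for t in graph if is_property(t)}
print(len(predicates), len(properties))
```

Using a set mirrors the view of an RDF graph as a set of edges: asserting the same triple twice adds nothing.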

2 Management of RDF documents

In this section we illustrate our approach to representing RDF and RDF(S) in a compact but expressive form, and show how this methodology allows us to query and manage RDF documents easily and effectively.

2.1 RDF modeling

Our approach is inspired by the work of Atzeni et al. [1, 7], who propose a framework for managing translations between heterogeneous data models in a uniform way. They leverage the concept of a supermodel, which allows a high-level description of models by means of a generic set of constructs, so that every model can be seen as an instance of the supermodel. The meta description of a model is called a metamodel. Following this idea, we propose a simple metamodel in which a set of constructs represents the concepts expressible with RDF and RDF(S): (i) Class, to represent an RDF class; (ii) Property, to represent an RDF statement whose object is of a primitive type; and (iii) Predicate, to represent an RDF statement that involves two classes. Unlike Atzeni et al., who draw a sharp distinction between the schema level and the instance level, we need to manage RDF schemas and instances at the same time; for this reason we introduce three more constructs to represent instances, namely i-Class (resources, with a URI), i-Property (with a value) and i-Predicate. In more detail, each construct has an identifier and a name, is related to other constructs by means of mandatory references, and may have properties that specify details of interest. In our case, the Property construct has a reference to the Class it belongs to and a Type property that stores the primitive type of the values of its instances; the Predicate construct has two Class references, representing the subject and the object of a statement, and three boolean properties representing transitivity, symmetry and functionality. Constructs at the instance level have a reference to the schema-level construct they correspond to and inherit its references; moreover, i-Property has a Value property that stores the actual value of the property, and i-Class has a Name property that stores the URI of a resource. Figure 1 shows a UML diagram of our metamodel, where a dashed line separates the RDF(S) constructs (at the schema level) from the RDF constructs (at the instance level). The dashed box encloses the metamodel core.
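As an illustration only, the six core constructs could be encoded as Python dataclasses, with references stored as integer OIDs. The class and field names (`IClass`, `i_class_oid`, and so on) are our own renderings of the construct names, the OID values are arbitrary, and the Person/Name/Child/Brother fragment anticipates the family example of Section 2.2; the boolean flags on the two predicates are illustrative choices, not taken from the paper.

```python
from dataclasses import dataclass


# ---- Schema level (RDF(S)) ----
@dataclass(frozen=True)
class Class:
    oid: int
    name: str


@dataclass(frozen=True)
class Property:
    oid: int
    name: str
    type: str       # primitive type of the values of its instances
    class_oid: int  # reference to the Class it belongs to


@dataclass(frozen=True)
class Predicate:
    oid: int
    name: str
    is_transitive: bool
    is_symmetric: bool
    is_functional: bool
    subject_class_oid: int  # reference to the subject Class
    object_class_oid: int   # reference to the object Class


# ---- Instance level (RDF) ----
@dataclass(frozen=True)
class IClass:
    oid: int
    name: str       # URI of the resource
    class_oid: int  # reference to the schema-level Class


@dataclass(frozen=True)
class IProperty:
    oid: int
    name: str
    value: str         # actual value of the property
    i_class_oid: int   # reference to the owning i-Class
    property_oid: int  # reference to the schema-level Property


@dataclass(frozen=True)
class IPredicate:
    oid: int
    name: str
    i_subject_class_oid: int
    i_object_oid: int
    predicate_oid: int  # reference to the schema-level Predicate


# A fragment of the family example (OIDs are arbitrary)
person = Class(1, "Person")
name = Property(2, "Name", "string", person.oid)
child = Predicate(3, "Child", False, False, False, person.oid, person.oid)
brother = Predicate(4, "Brother", False, True, False, person.oid, person.oid)

priam = IClass(10, "URI1", person.oid)
hector = IClass(11, "URI2", person.oid)
priam_name = IProperty(20, "Name", "Priam", priam.oid, name.oid)
child1 = IPredicate(30, "child1", priam.oid, hector.oid, child.oid)
```

Storing references as OIDs rather than object pointers keeps each construct a flat record, which matches how the rules in Section 2.2 join constructs through repeated OID variables.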

Schema level:
Class (OID, Name)
Property (OID, Name, Type, ClassOID)
Predicate (OID, Name, isTransitive, isSymmetric, isFunctional, SubjectClassOID, ObjectClassOID)
SubClass (OID, Name, ClassOID, SubClassOID)
SubPredicate (OID, Name, PredicateOID, SubPredicateOID)
Container (OID, Name, Kind, ClassOID)
SimpleElement (OID, Name, Type, ContainerOID)
ResourceElement (OID, Name, ContainerOID, ClassOID)

Instance level:
i-Class (OID, Name, ClassOID)
i-Property (OID, Name, Value, i-ClassOID, PropertyOID)
i-Predicate (OID, Name, i-SubjectClassOID, i-ObjectOID, PredicateOID)
i-Container (OID, Name, i-ClassOID)
i-SimpleEl (OID, Name, Value, i-ContainerOID, SimpleElOID)
i-ResourceEl (OID, Name, i-ContainerOID, i-ClassOID, ResourceElOID)

Fig. 1. Our metamodel for RDF and RDF(S).
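One way to picture the extensibility of this metamodel is to treat the metamodel itself as data, mapping each construct name to its fields as listed in Figure 1, so that extending it amounts to adding entries. This representation is our assumption, not the paper's: `extend` and `references` are hypothetical helpers, and treating every field ending in `OID` (other than the construct's own `OID`) as a reference is a naming convention we assume.

```python
from typing import Dict, List, Tuple

Metamodel = Dict[str, Tuple[str, ...]]

# Core constructs (fields as listed in Fig. 1)
CORE: Metamodel = {
    "Class": ("OID", "Name"),
    "Property": ("OID", "Name", "Type", "ClassOID"),
    "Predicate": ("OID", "Name", "isTransitive", "isSymmetric", "isFunctional",
                  "SubjectClassOID", "ObjectClassOID"),
    "i-Class": ("OID", "Name", "ClassOID"),
    "i-Property": ("OID", "Name", "Value", "i-ClassOID", "PropertyOID"),
    "i-Predicate": ("OID", "Name", "i-SubjectClassOID", "i-ObjectOID", "PredicateOID"),
}

# Extensions for subclasses, subpredicates and collections (fields as listed in Fig. 1)
EXTENSIONS: Metamodel = {
    "SubClass": ("OID", "Name", "ClassOID", "SubClassOID"),
    "SubPredicate": ("OID", "Name", "PredicateOID", "SubPredicateOID"),
    "Container": ("OID", "Name", "Kind", "ClassOID"),
    "SimpleElement": ("OID", "Name", "Type", "ContainerOID"),
    "ResourceElement": ("OID", "Name", "ContainerOID", "ClassOID"),
    "i-Container": ("OID", "Name", "i-ClassOID"),
    "i-SimpleEl": ("OID", "Name", "Value", "i-ContainerOID", "SimpleElOID"),
    "i-ResourceEl": ("OID", "Name", "i-ContainerOID", "i-ClassOID", "ResourceElOID"),
}


def extend(base: Metamodel, extra: Metamodel) -> Metamodel:
    """Return a new metamodel with extra constructs; refuse to redefine existing ones."""
    clash = base.keys() & extra.keys()
    if clash:
        raise ValueError(f"constructs already defined: {sorted(clash)}")
    return {**base, **extra}


def references(metamodel: Metamodel, construct: str) -> List[str]:
    """Assumed convention: fields ending in 'OID', except the construct's own OID, are references."""
    return [f for f in metamodel[construct] if f.endswith("OID") and f != "OID"]


full = extend(CORE, EXTENSIONS)
print(len(full), references(full, "SubClass"))
```

Refusing to redefine existing constructs keeps extensions additive, so rules written against the core constructs remain valid after the metamodel grows.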

The approach based on meta descriptions can be easily extended: when the metamodel is not detailed (i.e., expressive) enough, new constructs, or new properties of existing ones, can be added. Let us illustrate this point with some examples. Starting from the metamodel above, if subclasses and subpredicates also need to be managed, it suffices to add two new constructs, SubClass and SubPredicate, each with two references, to Class and to Predicate respectively. Moreover, three extra constructs are enough to manage RDF collections: Container, SimpleElement and ResourceElement. Container represents the entire collection; it has a reference to the corresponding class and a Kind property that denotes the type of collection. SimpleElement and ResourceElement represent elements of a collection that are, respectively, literals and resources; literal elements have a Type property, resource elements have a reference to their class, and both have a reference to the container to which they belong. A final remark on our metamodel: since the SubClass and SubPredicate constructs store meta-information on the Class and Predicate constructs, respectively, they have no corresponding construct at the instance level.

2.2 Extending DataLog for RDF management

The adopted metamodel allows us to use rules for querying and maintaining RDF documents; following [1, 7], these rules refer to metamodel constructs. They are specified in an extension of DataLog with OID-invention, realized by means of Skolem functors. A query or maintenance process therefore consists of a set of DataLog rules that produce objects (i.e., instances of constructs) belonging to the output, in the former case, or to the input, in the latter. We chose Datalog mainly for its flexibility and its ability to express recursive queries, whose absence from traditional relational languages has been a major limitation. Moreover, since objects have an identity, every newly created object requires a new unique object identifier (OID); this is what OID-invention provides (it is needed, e.g., to maintain blank nodes). Let us explain the query process with an example. Consider an RDF document describing family relationships between persons. The class Person has a property, Name, and two predicates, Child and Brother, that represent family relationships between persons. There are four instances of the class Person (with URIs URI1, URI2, URI3 and URI4, respectively), each with a corresponding instance of the property Name (with values Priam, Hector, Astyanax and Paris, respectively); they are linked by three instances of the predicate Child and one instance of the predicate Brother, which state that Hector and Paris are sons of Priam and are brothers, and that Astyanax is the son of Hector. With our approach, we can represent this document as depicted in Figure 2, where for the sake of simplicity we omit construct properties (we do not use them here) and show references only as arrows. In the figure, URI1, URI2, URI3 and URI4 are the names (i.e., the Name construct property) of the instances of the class Person.
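The recursion argument can be illustrated on the family document just described. Below, a hypothetical `Descendant` predicate (not part of the paper) is defined by the two recursive rules in the docstring and evaluated bottom-up with plain Python sets until no new facts appear. This is naive fixpoint evaluation, a stand-in for an actual DataLog engine; URI1 to URI4 stand for Priam, Hector, Astyanax and Paris as above.

```python
# Child facts from the family example: (parent URI, child URI)
CHILD = {("URI1", "URI2"), ("URI1", "URI4"), ("URI2", "URI3")}


def descendants(child):
    """Naive bottom-up evaluation of the recursive program
         Descendant(x, y) <- Child(x, y)
         Descendant(x, z) <- Child(x, y), Descendant(y, z)
    repeating the rules until no new facts are derived (the least fixpoint)."""
    desc = set(child)  # first rule: every Child fact is a Descendant fact
    while True:
        # second rule: join Child with the Descendant facts derived so far
        new = {(x, z) for (x, y) in child for (y2, z) in desc if y == y2} - desc
        if not new:
            return desc
        desc |= new


result = descendants(CHILD)
print(sorted(result))
```

A non-recursive query can only follow a fixed number of Child edges; the loop above follows as many as the data requires. Semi-naive evaluation, which joins only the facts derived in the previous round, would avoid recomputing old derivations.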
[Figure 2: at the schema level, the Class Person with the Property Name and the Predicates Child and Brother; at the instance level, the Person instances URI1, URI2, URI3 and URI4, their Name values Priam, Hector, Astyanax and Paris, the Child instances child1, child2 and child3, and the Brother instance brother1.]

Fig. 2. A simple RDF document and its RDF schema

If a user wants to know just the names of the persons who have a child, they need to specify just two simple rules: the first selects the persons who have a child, and the second stores the names of such persons. This is done by listing the involved constructs in the body of each rule, relating them to each other through repeated variables for OIDs and references, and by defining the output of the rule in its head, either picking values from the body or creating new ones. The two rules are shown in Figure 3.

Person (OID: #person_0(personOID), Name: uri) ← Person (OID: personOID, Name: uri), Child (subjectClassOID: personOID);

Name (OID: #name_0(nameOID), value: value, personOID: #person_0(personOID)) ← Name (OID: nameOID, value: value, personOID: personOID), Person (OID: personOID), Child (subjectClassOID: personOID);

Fig. 3. Name of persons that have a child
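A minimal sketch of how the two rules in Figure 3 could be evaluated, assuming the constructs are held as Python sets of named tuples. The `skolem` helper is a hypothetical stand-in for the # functors: it memoizes one fresh OID per (functor, arguments) pair, so `#person_0(personOID)` yields the same OID in both rules, which is what lets the Name constructs produced by the second rule reference the Person constructs produced by the first. Input OIDs are arbitrary; this is not the authors' implementation.

```python
from itertools import count
from typing import NamedTuple


class Person(NamedTuple):
    oid: int
    name: str  # the resource URI


class Name(NamedTuple):
    oid: int
    value: str
    person_oid: int  # reference to a Person


class Child(NamedTuple):
    oid: int
    subject_class_oid: int
    object_class_oid: int


_fresh = count(1000)
_skolem_table = {}


def skolem(functor, *args):
    """OID invention: the same functor on the same arguments always yields the same
    new OID, while different (functor, arguments) pairs yield different OIDs."""
    key = (functor, args)
    if key not in _skolem_table:
        _skolem_table[key] = next(_fresh)
    return _skolem_table[key]


# Input document (OIDs are arbitrary)
persons = {Person(1, "URI1"), Person(2, "URI2"), Person(3, "URI3"), Person(4, "URI4")}
names = {Name(11, "Priam", 1), Name(12, "Hector", 2),
         Name(13, "Astyanax", 3), Name(14, "Paris", 4)}
children = {Child(21, 1, 2), Child(22, 1, 4), Child(23, 2, 3)}

# Rule 1: Person(OID: #person_0(p), Name: uri) <- Person(OID: p, Name: uri), Child(subjectClassOID: p)
out_persons = {
    Person(skolem("person_0", p.oid), p.name)
    for p in persons
    for c in children
    if c.subject_class_oid == p.oid
}

# Rule 2: Name(OID: #name_0(n), value: v, personOID: #person_0(p))
#         <- Name(OID: n, value: v, personOID: p), Person(OID: p), Child(subjectClassOID: p)
out_names = {
    Name(skolem("name_0", n.oid), n.value, skolem("person_0", p.oid))
    for n in names
    for p in persons
    for c in children
    if n.person_oid == p.oid and c.subject_class_oid == p.oid
}

print(sorted(p.name for p in out_persons), sorted(n.value for n in out_names))
```

Because the results are sets and the Skolem OIDs are deterministic, Priam, who has two children, is produced only once.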

We use a non-positional notation for rules, indicating only the fields relevant to the rule; for example, in the first rule we use only the field subjectClassOID of the construct Child. Each predicate has an OID argument, used for references. Each construct produced by a rule must have a new OID, which is generated by means of a Skolem functor, denoted by the # sign in the rules. Without loss of generality, we assume that our rules satisfy the standard safety requirements [8]: every construct field in the head must be defined as a constant, as a variable that cannot be undefined (i.e., one that appears somewhere in the body of the rule), or as a Skolem term. The same holds for the arguments of Skolem terms. Moreover, our DataLog programs are assumed to be coherent with respect to referential constraints. More precisely, if a rule produces a construct N that refers to a construct N', then another rule generates a suitable N' that guarantees the satisfaction of the constraint. In the example, the second rule is acceptable because it produces Name constructs that reference Person constructs produced by the first one.
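The safety condition can be checked mechanically. The following sketch assumes a hypothetical rule representation (`Var`, `Const`, `Skolem`, `Atom`, `Rule`) with non-positional field-to-term maps, and assumes, as in Figure 3, that body terms are only variables or constants. It checks that every head term, including each Skolem argument, is a constant or a variable bound in the body; it does not check referential coherence.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple, Union


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Const:
    value: object


@dataclass(frozen=True)
class Skolem:
    functor: str               # e.g. "person_0" for #person_0(...)
    args: Tuple["Term", ...]


Term = Union[Var, Const, Skolem]


@dataclass
class Atom:
    construct: str
    terms: Dict[str, Term]     # non-positional: field name -> term


@dataclass
class Rule:
    head: Atom
    body: List[Atom]


def bound_vars(rule: Rule) -> Set[str]:
    """Variables appearing in the body (body terms assumed to be Var or Const)."""
    return {t.name for atom in rule.body for t in atom.terms.values() if isinstance(t, Var)}


def term_is_safe(term: Term, bound: Set[str]) -> bool:
    if isinstance(term, Const):
        return True
    if isinstance(term, Var):
        return term.name in bound
    # Skolem term: every argument must itself be safe
    return all(term_is_safe(a, bound) for a in term.args)


def is_safe(rule: Rule) -> bool:
    bound = bound_vars(rule)
    return all(term_is_safe(t, bound) for t in rule.head.terms.values())


# First rule of Fig. 3
p = Var("personOID")
rule1 = Rule(
    head=Atom("Person", {"OID": Skolem("person_0", (p,)), "Name": Var("uri")}),
    body=[Atom("Person", {"OID": p, "Name": Var("uri")}),
          Atom("Child", {"subjectClassOID": p})],
)

# Unsafe variant: "uri" occurs in the head but nowhere in the body
rule_bad = Rule(
    head=Atom("Person", {"OID": Skolem("person_0", (p,)), "Name": Var("uri")}),
    body=[Atom("Child", {"subjectClassOID": p})],
)

print(is_safe(rule1), is_safe(rule_bad))
```

Recursing into Skolem arguments implements the requirement that the safety condition also holds for the arguments of Skolem terms.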

3 Conclusions and Future Work

RDF is one of the most representative elements of the Semantic Web. The large amount of information stored as RDF documents in modern Web Information Systems (WIS) has highlighted the importance of methodologies and techniques for managing these documents. In this article we have provided (i) a metamodel approach and (ii) a rule-based system, based on DataLog rules with Skolem functors, for the maintenance and querying of RDF documents. With our system it is possible to organize RDF documents at a logical level through our metamodel and to query them easily using DataLog rules. Future work will address the application of the proposed methodology to other fields of interest, such as RDF information retrieval and Web services management, scalability and performance issues, and a comparison of the expressiveness of our DataLog system with other formalisms.

References
1. P. Atzeni, P. Cappellari, and P. A. Bernstein. Model-independent schema and data translation. In Proc. of the 10th Int. Conference on Extending Database Technology (EDBT'06), Munich, Germany, 2006.
2. T. Berners-Lee. Weaving the Web. Orion Business Books, 1999.
3. J. Broekstra, A. Kampman, and F. van Harmelen. Sesame: A Generic Architecture for Storing and Querying RDF and RDF Schema. In Proc. of the First International Conference on Semantic Web (ISWC'02), Sardinia, Italy, 2002.
4. E. I. Chong, S. Das, G. Eadon, and J. Srinivasan. An Efficient SQL-based RDF Querying Scheme. In Proc. of the 31st International Conference on Very Large Data Bases (VLDB'05), Trondheim, Norway, 2005.
5. T. Furche, B. Linse, F. Bry, D. Plexousakis, and G. Gottlob. RDF Querying: Language Constructs and Evaluation Methods Compared. In Proc. of the Int. Summer School on Reasoning Web, Lisbon, Portugal, 2006.
6. A. Polleres. From SPARQL to rules (and back). In Proc. of the Int. Conference on World Wide Web (WWW'07), Banff, Canada, 2007.
7. R. Torlone and P. Atzeni. A unified framework for data translation over the Web. In Proc. of the 2nd Int. Conf. on Web Information Systems Engineering (WISE'01), Japan, 2001.
8. J. D. Ullman and J. Widom. A First Course in Database Systems. Prentice-Hall, 1997.
9. K. Wilkinson, C. Sayers, H. Kuno, and D. Reynolds. Efficient RDF Storage and Retrieval in Jena2. In Proc. of the First International Workshop on Semantic Web and Databases (SWDB'03), Berlin, Germany, 2003.
