The LOGIDATA+ Prototype System

October 14, 2017 | Autor: Silvio Salza | Categoria: Database Systems
Share Embed


Descrição do Produto

Taxonomic reasoning in LOGIDATA+

Centro di Studio per I’Interazione Operatore-Calcolatore - CNR Dipartimento di Elettronica, Informatica e Sistemistica - Universitk di Bologna Wale Risorgimento, 2 - 1-40136 Bologna - Italy E-Mail: ciocQdeis64.cineca.it - Tel.: f39 (51) 644.3550

Abstract The applicability to LOGIDATA+ schernas of the inference technique named taxonomic reasoning is demonstrated. The LOGIDATA+ model is enhanced with dcfined classes and set cardinalities. In particular, the problem of determining subsnn ption between classes with cyclic descriptions has been faccd. A s reciitsive definitions are allo,:ved, it 11% been necccsary to clioose a strategy for determining fixpoints. The subsumption algorithm presented uses grentest fixpoint scmnntics and this choice is motivated, coinpxing it to some altcrnative q,pioaclics (least fiApoZnt and desc,.ipf!ve).

ships coming from the closure of the explicit isa given are known. In most knfiwlzdge representation models, such as KL-ONE and the subsequent hybrid s y s t e m s , the necessary conditions are expressed by the p r i m i t i v e classes, while for defined clusses the description constitutes a class definition, with necwwry and sufficient conditions. The semantics of defined dasses is the basis for taxonomic reasoning and has eflcci on two levels: B)

2

an individual can be automatically recognized as a meiiiber of a defined class; a class can be a.iiiomatically recognized as subsumed by a defined c-lass.

T h c x i bsnm p ti on wccn two classes is computed by coinparing the syiit ic description of classes with techiiiqiizs which are :,Liiilar to the computation of subtyping i n Object Oriented Daziabase S y s t e m s (OODM)[lo].

The data model delilied in the LOCIDA’l’Af objcctive of the projeit S L L i Iii~~iii~icitici e Cnlcolo Pnrallelo [4] represents c o m p [ e : objects and d i ~ ~ idcas ~ v s both from logic oriented databases [1,2,9] and o b j e c t - o n e n f e d databases [12,11,10]. The featuies of the proposed model are in many aspects similar to those of K n o w l e d g e R e p resentation S y s t e m s (KRS) proposed in the AI area. The open problem is the integration between two different deduction paradigms: rules and hzerarchzes. In the LOGIDATA+ project, the deductive capabilities are made available by the logical component, while the data model makes available a statzc data structure. On the opposite side, KRS provide deductive capabilities based on the structural description of the objects of interest: this deduction is named t a x o n o m z c reasoning [8], founded on the computation of s u b s u m p t i o n relationships between classes on the basis of class descriptions and allows the discovery of implicit zsa relationships.

IIcwever, in spite ;f t,he similarity of the techniques, the purpose is quite d i i f e t . In fact, subtyping in OODM considers primitive c ,es only, hence cannot provide more than a passive consistency check on types. The subsumption algorithm has a more active role: it deduces every isa relationship which is implied by user’s descriptions, even if not explicitly stated. This job can be effectively done if the algorithm is tractable and complete. The subsumption algorithm proposed in this paper is tractable and complete and computes every real subsumption (unlike the refinement algorithm proposed in [lo], where every possible relationship is computed). As has been shown in previous works on conceptual models [5,7], the adoption of taxonomic reasoning as a support for LOGIDATA+ schema design has many advantages: 1. the hierarchies which can be inferred from class descriptions are made explicit; in other words the user hierarchical structure is enriched with implicit hierarch i es ;

The LOGIDATA+ model, like most database models, takes a class description as a set of necessary conditions, to be satisfied by any individual, which will be m a n u a l l y inserted in the class. In addition, only the isa relation-

2. the user hierarchies are checked with respect to the refinement relationship;

Tics work W ~ Spartly supported by tile L0GIDATA-t project f, CNR - Progetto Finalizzato Sistemi Informatici e Calcolo Parallelo.

3. the schema consistency is checked by discovering isa cycles and empty classes;

CH3001-5/91/0000/0894/$01.00 0 1991 IEEE

894

4. the schema can be transformed into a m i n i m a l f o r m where the redundancies with respect to inheritance are removed.

0 0

As explained above, taxonomic reasoning requires an extended semantics with respect to the tradition of database environment. Thus the LOGIDATA++ model is proposed to extend LOGIDATA+ as follows: 0

defined class semantics;

0

set type with cardinality boundaries.

0

0

The second extension allows integration in an object description of the knowledge about cardinalities, which is usually expressed as an integrity constraint. In the following, a formal definition of the subsumption relationship for classes with definitional cycles and a subsumption algorithm are presented. In section 2 the base notions of the LOGIDATA+ model are introduced. In section 3 a LOGIDATA+ schema is formally presented and the notions of possible and legal instance of a schema are introduced. In section 4 the subsumption for a LOGIDATA+ schema is defined and, finally, in section 5 the subsumption is syntactically characterized and a subsumption algorithm presented.

2

0

2.3

every element of B, C , T is a type; if t is a type then {t}(min,mot)r is a set type, where min and mux are non-negative integers and max can be 00;

if t is a type then ( t ,. . .,t k ) is a sequence type; i f t l , . . . , t k , with k 2 0, are types o f 7 and u l , . . .,uk are distinct attributes in A, then [UI : t l , . . . , ak : t k ] is a tuple type: nothing else is a type of I ( A , B,C, T).

Values and objects

Let 0 be a finite set of object identifiers ( o i d ) , O n 5 = 0. The set V ( 0 ) of the values is defined as follows: 0 0

0

0

every element off? U 0 is a value; if 211,. . . , V h are values, then value; if 111,. . . , v h are values, then value;

(211

,.. . , V h }

is a set

. . , v h ) is a sequence

(U,.

if a l l . . . , ah are attributes of A and V I ,. . . , 2)h are values, then [a1 : 211,. . . , u h : v h ] is a tuple value.

Values are assigned to object identifiers by a total value f u n c t i o n 6 from 0 to V ( 0 ) . The pair D = ( 0 , 6 ) is the d o m a i n . Note that the value function is not f i a t , since an oid can be part of the value of another oid.

LOGIDATAS+ model

In this section the syntax of the LOGIDATA++ model is introduced. The model is based on two main concepts:

Given a domain 'Dl the interpretation function Z over V ( 0 ) is a function from 7 to 2V(0) such that:

classes whose instances are identified objects with encapsulated values and

Z(B) = .ZB(B) Z(C)

t y p e s whose instances are values.

Z(T)

Each class has an associated type describing the structure of class instances.

c

0

Y ( 0 ) -0.

The interpretation is extended to types in 7 as follows:

z({tI(n,m)> = 2.1

{tl,. .,th

Base types system

Let B be a countable set of base values ( b l , b 2 , . . .), for instance the union of integers, reals, strings and booleans, as in most programming languages. Let B be a countable set of base type names, including the (mono-valued) base types in 5 and ZB the standard interpretation function from B to 25, such that Vb E f? : Z,(b) = { b } .

2.2

I v i E Z ( ~ )1I 2 i 5 PIn 5 P 5 m}

Schemes and instances

3

Definition 1 Given a set of base types B and a finite set of attributes A , a s c h e m a is a quadruple S = ( C , T , T Y P , I S A ~ where: ),

Types and classes

Let A be a countable set of attributes, C a countable set of class n a m e s and T a countable set of type n a m e s , with

0

A,C and T non-overlapping. The set of all the type descriptors 7 ( A 1B, C , T) (or, for simplicity, types) is defined as follows: 895

C and T are finite and disjoint sets of class n a m e s and 2ype n a m e s respectively; furthermore, C , and Cd are partitions of C; C, is the set of p r i m i t i v e classes and c d the set of defined classes; the interpretation

of primitive classes is user-defined, while the interpretation of defined classes is derived from the given definitions and by interpretation of the primitives;

- TYP(Person) = [name: string] - TYP(Student) = [name: string, number:

0

indegerl

TYP is a function on C U T such that:

0

- for each

- TYP(Vehic1e) = [name: string]

T E T, TYP(T) is a type of 7;

- TYP(car)

- for each C E C , TYP(C) is a tuple type of T

and a total order exists on the symbols T of T; in other words, there exists a sequence of symbols T I , .. . ,Tpsuch that Ti is referred in TYP(T])if and only if i 5 j ; ISA’

0

is a partial, non-reflexive order over C .

Each schema S = (C, T, TYP, ISA‘) is associated with a set of type descriptors 7. The total order on type names prevents recursive type descriptions, which would generate never-ending structures. On the other hand, recursive descriptions involving classes are allowed, since the references are obtained via the object identifiers. The ISA’ relationship is the isa provided by the user. Xnheritance allows the description of a class with reference to the class parents. Multiple inheritance is allowed: a class can inherit from two or more classes, provided the descriptions are C O ~ L S E S ~ CasI ~will ~ , be shown in section 4 by introducing the notion of consz3tcnt schema In the following the tm,lsifive closure I S A ~of the user-supplied ISA’ will also be considered .

Schema example

3.1

An example of a LOGIDATA++ schema is given. The syntax is that proposed in [4], with the obvious meaning of the keywords. The 5 operator introduces a primitive class description, while the standard = operator introduces a defined class. class Person

3.2

Legal instance of a schema

An arbitrary instance of a schema S = (C, T,TYP, EA”) is the range (subset of Y ( 0 ) )of an arbitrary interpretation Z for a given domain ’D. In the following, for simplicity, we will indicate wifli the same symbol Z an interpretation and the instance it produces. Of course, the only interpretations of interest are those giving an instance which is in accordance with the schema descriptions. In order to define these interpretations formally, let us consider, for a given interpretation 1, the set of object identifiers connected to a class name C which are compatible with both the type definitions and the user defined hierarchy I S A ~ :

qc)

=

{0

E c3 I6(0) E Z(TYP(C))}(-)

{0

E0

I o E Z ( C ~ ) , V CI C~ I S A ~C;)

The above set is trsed in order to characterize the interpretations which are possible, for the given definitions. Definition 2 [Possible Instance] Given a domain D ,an interpretation Zis a possible instance of a schema S if and only if:

Z(P)

c

Z(D) Z(T)

= ?(D), V D E C d = I(TYP(T)), VTE T

f ( P ) , VP E c,

In order to find possible instance of the schema shown in section 3.1, let us consider the following domain:

class Car = isa Vehicle [number : integed class Regular-Student =

0 = {01,~2,~,‘04,05i06,~7} = [name : “Andrea”] b ( 0 2 ) = [name : “Giovanni”, number : 145671 6(03) = [name : “Luigi”, number : 45634, colleagues : {os}] 6(o4) = [name : “Ford”,number : 9871 6(05) = [name : “ROCCO”, number : 12967, colleagues : {03}]

isa Person [number : integer, colleagues : { Regular-Student}(l,s)]

6(01)

ISA’) has the following comThe schema S = (C, T,TYP, ponents:

c= CPUCd ~

0

Student ISA’ Person, Car ISA’ Vehicle

0

3 [name : string]

class Vehicle 3 [name : string] class Student = isa Person [number : integer]

0

[name: sfn’ng, n11,mber: inleger]

- TYP(Re~i.ilaP-Seudent) = [name : string, number : integer, colleagues : { Regular-Student}(l,5)]

(a class type is constrained to be a tuple, in

accordance with the LOGIDATA+ model; the subsumption algorithm can be extended to any class type, as shown in [SI);

:I=

{Person, Vehicle} U{Student, Car, Regular-Student} T={}

,

Et96

6(06)

= [name : “Giuseppe”, number : 65483, colleagues : { 03,05,07)]

27’ 5 Z

6(07) = [name : “Marco”, number : 64789, colleagues : { 05,06)]

If the interpretation of class Person is { 0 1 , 0 2 , 0 3 , OS, 0 6 , 0 7 ) and the interpretation of class Vehicle is {Q), a possible instance of the schema is the following:

= {01 ,02,03 , 05 , 06,07) = (04) Z(Student) = ( 0 z , 0 3 , 0 5 , 0 6 , 07) Z(Car) = (04) Z (Person)

Z (Vehicle)

Z(Regu1ar-Student)

=

The above example shows the impact of primitive classes on instances: in fact, TYP(Studel1t) = TYP(Car), but the interpretation of the two classes is different, since they descend from different primitives. In general, for a given schema S and domain V = (0,a), there are many possible instances. For a given interpretation of the primitive classes, the interpretation of defined classes is unique only if the schema does not contain classes with a cyclic description [13]. For instance, the above example contains the class ltcgular-Student with a cyclic description. Thus the above interpretation of primitive classes admits, among the following instances: -

=

Z (Student)

-

Z (Car) Z (Regular-Student) Z (Student) Z (Car) Z (Regular-Student) . -

= =

(02,

03,O5,06r O 7 )

1 0

(04

{0 2

1

tj

Z’(N)

Z(N) N E T U C .

The relationship 5 can be the basis for the definition of greatest fixed point and of a least fixed point instance. On the other hand, a defined class with cyclic description but without any primitive class in the description cycle has always an empty least k e d point interpretation. For this reason the least fixed point semantics is not very useful in database environment. We define the legal instance of a schema as the greatest possible instance with respect to the relationship 5.

(03, OS,061 07)

Z (Student) Z (Car) Z (Regular-Student)

is defined on \E(’D) as follows:

The relation

Definition 3 [Legal Instance] Given a schema S and a domain D , the legal instance is a possible instance Z E Q(D)such that: Z’

5- Z

VZ’ E 9 ( V )

P r o p o s i t i o n 1 Given a domain V and a schema S , the legal instance exists and is unique. The above proposition is proved in [6].

4

Subsumption

The subsumption relationship can be introduced with reference to the legal instance. Definition 4 [Subsumption] Given two types t and t‘ of 5 t’, if and only if

a schema S, we say that t’ subsumes 2 , written t

% I0 5 , 0 6 1 07) (041

VD’ IV

{03,051 (02

i

legal instance onV, Z‘ ( t )E Z’ (t‘)

03, OS, 0 6 , 0 7 ) (04)

(031

<

Os, 0 6 , O7)

be observed that the interpretation of noncyclic classes is invariant, while the interpretation of Regular-Student varies from a minimum ({}) to a maximum ({03,05, 06, 07)). Since the definition of a possible instance is recursive, we can say that every possible instance is a fixed point. The semantics of the model is then to be specified by choosing the possible instances which are legal instances, that is, are accepted as instances of a given schema. A possible choice is known as descriptive semantics, which accept any possible instance as a legal instance. The descriptive semantics has the drawback of allowing many different instances for the same domain. Let us consider the set Q(D)of the possible instances with identical interpretation of the primitive over D(0,6) classes: Z ( P ) = Z ( P ) ’ VZ,Z’ E q v ) ,P E

c,. 89 7

It immediately follows that is transitive, reflexive and not symmetric (i.e. it is a pre-order), and induces an equivalence relationship N on types: t

E

2‘

e t < t‘

At‘

< t.

A class which explicitly inherits properties from another class is, of course, subsumed by that ChS. Proposition

c:

Given a schema C ISA“ C’

jC

and two classes

cl

cl

E

C’.

Proof The proof follows immediately from the definition o f f (c). 0 Since one of the main activities of taxonomic reasoning is to discover implicit subsumptions, the inverse does not generally hold.

~

Definition 5 [Consistency and inheritance] Given a schema S = ( C , T, Typ, ISA"), a class C E C is consistenf with respect to inheritance if and only if:

A schema S is consistent if it contains only consistent classes, i.e. the explicitly declared isa are consistent with type definitions. The definition of consistent schema allows a partial inverse for proposition 2, valid only for primitive classes.

Proposition 3 Given a consistent schema S and two classes P EC, a n d C E C : C

5

5 P e+

C I S A ~P.

Syntactical characterization

The purpose of this section is to give a syntactical characterization for the subsumption relationship in order to produce an algorithm for the computation of subsumption for a given schema.

tuple t y p e s t,t' E 7 t 5 2' Va; with type tiin t', 3 q i n t : ai = aj A ti tj;

The theorem expresses the syntactical rules necessary to compute subsumption between canonical types. Thus, a given subsumption relationship between classes can be extended to types by a recursive application of the above rules. The recursion stops when a rule fails or a base type is found. Next step is the formulation of a syntactical rule for computing the subsumption between classes. For primitive classes the rule is directly obtained by proposition 3. When defined classes and cyclic description are considered, the computation is more complex. This problem has been faced in [lo] for defined classes only, and in [3] for non-cyclic descriptions. When both primitive and defined classes are considered, it becomes essential to consider the two different semantics in subsumption computation. For instance, in [lo] the possible subsumption relationships are computed on the basis of types. This would lead to the following deductions, with reference to the example of section 3.1: Student Car

A set of types 7 ( A , B,C, T) is in canonicalform if T = 0, i.e. it does not contain type names. The canonical form recursively is produced by the function: 7 : 7 ( A , B,C, T) --.* 7 ( A , B,C , T)

5

5 5

Vehicle Person

The inferences above, which contrast the semantics of primitive and defined classes, are avoided by adding conditions when considering primitive classes. In general, the following proposition holds. Proposition 4 Given a schema S and two classes C, C' E

c, C 5 C'

In fact, if P is a primitive class and is an ancestor of C' but not of C, it is easy to find an interpretation Z such that Z ( C ) Z (C'). In order to exploit the proposition above, the notion of primitive generalizafions is introduced as follows:

The total order on type names ensures that there exists an integer n such that:

0

The canonical form is produced by 7", where n is the smallest integer for which the above equation is satisfied.

Theorem 1 Given a schema S and a type set 7 ( A , B, C, T) in canonical form, the subsumption between types is defined as follows: base types: B, B' E B B 5 B' e+ Zg(B)E Zg(B'); set types: t E 7 {tl(n,m) 5 {t'l(p,q) e+ (1 I 1' A n sequence t y p e s t E 7 ( t ) 5 (i')

* {p E C , IC' I S A ~ C , } c {P E C, IC ISAEC,)

G p ( D ) = {P' I D B A E P'} G p (P) = {P' I P I S A ~P'} U {P}

In other words, G p ( C ) is the set of primitive classes reached with the I S A ~relationship (if the class is primitive, the class itself is explicitly included). Theorem 2 Given a schema S and two classes C,c' E C:

L P Am I q);

Theorems 1 and 2 together give a complete set of rules for the computation of subsumption. Unfortunately, cyclic descriptions prevent a simple recursive application of

w t 5 1'; 898

rules, since this could lead to endless loops. An example of this case is the comparison of the following types: TYP(C) = [a1 : C;az : t ] TYP(C‘) = [a1 : C’] The computation of C 5 C’ also mplies that of C’ 5 C and vice-versa. Thus the subsumption algorithm includes a stop condition in order to avoid this problem. Algorithm 1 [Computation of subsumption] initialization let SO=(C 5 C’) for any C, C’ such that G P (C’) S GP (C); iteration is built from Inas follows: 0

let C, C‘ E C such that C

SnC’;

c Sn+lC’ e TYP(C) InTYP(C’), where the type comparison is verified with the rules of theorem 1; s t o p the algorithm stops when

= &.

The algorithm stops after a finite number of steps; in fact, the initialization sets a finite number of relationships and subsequent steps can exclude relationships. The maximum number of iterations is then equal to the number of initial relationships. The initialization is obtained by taking the condition of proposition 4 as a sufficient condition. The implications which are not compatible with type descriptions are then discarded. The application of the subsumption algorithm to the example of section 3.1 produces: Regular-Student 5 Student since TYP(Regu1ar-Student) 5 TYP(Student) Moreover, the initialization immediately gives:

6

[2] S. Abiteboul and R. Hull. IFO: A formal eemantic database model. ACM l h n s a c t i o n s o n Database Sy st e m s, 12(4):525-565, 1987.

[3] A. Artale, F. Cesarini, and G. Soda. Computation of ISA relationships in LOGIDATA+. Technical Report 5/47, CNR - Progetto Finalizzato Sistemi Informatici e Calcolo Parallelo - S5, 1990. [4] P. Atzeni, F. Cacace, S. Ceri, and L. Tanca. The LOGIDATAt model. Technical Report 5/20, CNR - Progetto Finalizzato Sistemi Informatici e Calcolo Parallelo - S5, 1990.

S. Bergamaschi, L. Cavedoni, C. Sartori, and P. Tiberio. On taxonomical reasoning in E/R environment. In C. Batini, editor, 7th Int e m at i onal Conference on the Ent i t y Relationship Approach, pages 443-454. Elsevier Science Publisher B.V. (NorthHolland), 1989. S. Bergamaschi and B. Nebel. Theoretical foundations of complex object data models. Technica: Report 74, CIOC - CNR, Bologna - Italy, December 1990. S. Bergamaschi, C. Sartori, and P. Tiberio. On t a x e nomic reasoning in conceptual design. Technical Report 68, CIOC - CNR, Bologna - Italia, March 1990. R.J. Brachman and J.G. Schmolze. An overview of the KL-ONE knowledge representation system. Cognitive Science, 9(2):171-216, 1985.

F. Cacace, S. Ceri, S. Crespi-Reghizzi, L. Tanca, and R. Zicari. Integrating object oriented data modelling with a rule-based programming paradigm. In S y m p . on Principles of Database Sy st e m s, pages 225-236. ACM - SIGMOD, 1990.

G p (Student) = G p (Regular-Student) = {Person)

Student Car

-

Schmidt, and M. Missikoff, editors, EDBT ’88 Lecture Not e s i n Computer Science N.303, pages 271293. Springer-Verlag, 1988.

C. Lecluse and P. Richard. Modelling complex structures in object-oriented databases. In Sy m p. on Prtnciples of Database Sy st e m s, pages 362-369. ACM SIGACT-SIGMOD-SIGART, 1989.

$ Vehicle $ Person

C. Lkcluse and P. Richard. The 0 2 data model. In Int. Conf. O n Very Large D a t a Bases, 1989.

Conclusions

The paper introduces the notion of defined class in the LOGIDATA++ model. The legaI instance for a schema is formally defined according to the greatest fixed point semantics and a subsumption algorithm is presented. The work of this paper will be the basis of an inference kernel for the validation and restructuration of a LOGIDATAS+ schema.

References [I] S. Abiteboul and S. Grumbach. Col: a logic-based language for complex objects. In S . Ceri, J.W. 899

C. Lecluse, P. Richard, and F. Velez. 0 2 , an objectoriented data model. In SIGMOD, pages 424-433, Chicago - IL, June 1988. ACM. B. Nebel. Terminological cycles: semantics and computational properties, pages -. Morgan Kaufmann, 1991.

Lihat lebih banyak...

Comentários

Copyright © 2017 DADOSPDF Inc.