FEMUS: A federated multilingual database system

June 27, 2017 | Author: Kokou Yetongnon | Category: Database Integration, Data Model, Advanced Database, Database System


FEMUS: a FEderated MUltilingual database System *

Martin Andersson 1, Yann Dupont 1, Markus Tresch 2, Haiyan Ye 2

1 Département d'Informatique, Laboratoire de Bases de Données, EPFL, CH-1015 Lausanne

2 Department Informatik, Informationssysteme-Datenbanken, ETH-Zentrum, CH-8092 Zürich

1 Introduction

This paper describes the objectives of the FEMUS project, a joint research project of the database research groups at EPF Lausanne and ETH Zürich. It presents an overview of FEMUS and the first results of a comparison between ERC+ and COCOON. The aim of the project is to set up a framework for building a federated multilingual database system. Federated means that the global system will provide the functionality to include, as components, different heterogeneous database systems that cooperate with one another. Multilingual means that the ultimate goal is a system that different users can access through the interface (data model and manipulation language) they are used to.

Early work on interoperability between database systems can be found in the research fields of multidatabases [Lit84], federated databases [DBDZ83, DBDZ85, Die87], and databases in the server-workstation environment [DGK87]. Related work has been done on schema integration [BLN86, DYS87, NDS88], data model compilers [Mar86], and schema translation [MFHP87].

Practical demand comes from applications such as multimedia databases, engineering information systems, CIM, and geographical information systems. Thus, interoperability is first intended to allow the integration of existing systems into a global federated system. This research responds to the current need of major enterprises to build a global information management system that includes several computerized applications, which may exist in various parts of the enterprise [Sch90, SL90, LMR90].

2 Project Description

As a first step in the FEMUS project, the data models developed in the two research groups are being compared. The group at EPFL investigated entity-relationship (ER) models and extended this approach to the ERC+ model [PS87, PS89a, PS89b, Par87, SPYA89]. The work includes the specification of an algebra [PS84] and a calculus [PRYS89] for this model. The group at ETHZ extended relational models to non-first-normal-form (NF2) models [SPS87, SPSW90] and to an object model, COCOON. They developed a nested relational query algebra [SS83, SS86], which is also the basis for an object algebra [SS90a, SS90b, SSW90]. While the fundamental concepts in entity-relationship models (like ERC+) are entities and relationships, the fundamental concepts of COCOON are objects and functions.

* This is a working paper based on a revised and extended version of the research proposal submitted in 1990.

The FEMUS Project is supported by the Schweizerischer Schulrat.

Comparing these two different evolution paths (ER → ERC+ vs. relational → NF2 → COCOON) should result in a translation of the static (schemas) and dynamic (operations) aspects of the two models into each other. Since non-invertible (lossy) mappings are not desirable, integrating one model as a view into the other may be a more appropriate approach.

The above considerations form the initial part of the project. Later, a more general approach than transforming one model into the other must be found to achieve the goal of supporting several coexisting data models. A common core object model shall be defined that will serve as a reference model for later work.

A five-level schema architecture for federated database systems (FDBS) is proposed by Sheth and Larson [SL90]. It defines the following levels:

- local schema: The local schema is the conceptual schema of a component DBMS. It is expressed in the native data model of the component DBMS. It is sometimes called the private schema because it is at the bottom level.

- component schema: The component schema is a translation of the local schema into the common data model used by the FDBS.

- export schema: An export schema represents a subset of the component schema that can be accessed by users of the FDBS.

- federated schema: A federated schema represents an integration of multiple export schemas. It also includes information on data distribution. A construction process associates a federated schema with the export schemas that are integrated in creating it.

- external schema: An external schema defines a view for a user or application, or for a class of users and/or applications.

This five-level architecture serves as the basis for our investigations. Figure 1 shows how we will make coexisting data models cooperative using a core object model.

[Figure 1 diagram, from top to bottom: External 1 (COCOON subschema) and External 2 (ERC+ subschema), each obtained by a transform step from the Federated schema (core object model, COCOON); the Federated schema is obtained by a construct step from Export 1 and Export 2; each export schema is obtained by a filter step from Component 1 and Component 2; each component schema is obtained by a transform step from Local 1 (relational schema, ORACLE database) and Local 2 (NF2 schema, DASDBS database).]

Figure 1: Five-level schema architecture of the project prototype (adapted from [SL90])

While on the external level we have subschemas in our favorite data models, such as ERC+ or COCOON, we may use relational or nested relational schemas on the local level, since these are the data models of database management systems such as ORACLE or DASDBS. As a common language to describe the federated and component schemas we use the COCOON core object model. In this context the following research questions must be answered:

- What should an object-oriented core data model include to serve as a common basis for the general integration of different models? Are object models such as COCOON suitable for use as canonical data models?

- How can the transformation from and to this model be described? COCOON is said to be an extensible data model. How does this help the mapping process?

- How can a federated global schema be constructed out of several component schemas? Although many methodologies for database schema integration have been proposed [BLN86], studying these approaches together with schema translation yields new constraints.

Real cooperation between different data models can only be reached by making them operationally integrated. This includes looking at query languages and investigating update operations in the different models.

Besides these theoretical considerations, some parts will be implemented. For a set of selected data models (COCOON, ERC+, ...), a small prototype should be implemented to validate the most important concepts. In particular, the transformation from ERC+ to COCOON and the mechanism for integrating several COCOON schema components will be realized.
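The flow between the five schema levels can be illustrated with a minimal Python sketch. It is not part of the FEMUS prototype: the `Schema` class, the function names `transform`, `filter_export`, and `construct`, and the `AuditLog` class are hypothetical, and a schema is reduced to a mapping from class names to property names. The transform step only relabels the data model here; a real translation would rewrite the structure as well.

```python
from dataclasses import dataclass, field


@dataclass
class Schema:
    level: str   # local, component, export, federated, or external
    model: str   # name of the data model the schema is expressed in
    classes: dict = field(default_factory=dict)  # class name -> property names


def transform(schema, level, model):
    """Re-express a schema in another data model (structure copied unchanged)."""
    return Schema(level, model, {c: list(ps) for c, ps in schema.classes.items()})


def filter_export(component, visible):
    """Keep only the classes of a component schema offered to the federation."""
    kept = {c: list(ps) for c, ps in component.classes.items() if c in visible}
    return Schema("export", component.model, kept)


def construct(exports, model):
    """Build a federated schema as the union of several export schemas."""
    federated = Schema("federated", model)
    for export in exports:
        for name, props in export.classes.items():
            merged = federated.classes.setdefault(name, [])
            for prop in props:
                if prop not in merged:
                    merged.append(prop)
    return federated


# Local schemas in their native models (AuditLog is a made-up private class).
local1 = Schema("local", "relational",
                {"Books": ["title", "author#", "publisher"], "AuditLog": ["entry"]})
local2 = Schema("local", "NF2", {"Persons": ["name", "p#", "age"]})

# transform -> filter -> construct -> transform, following Figure 1.
component1 = transform(local1, "component", "COCOON")
component2 = transform(local2, "component", "COCOON")
export1 = filter_export(component1, {"Books"})
export2 = filter_export(component2, {"Persons"})
federated = construct([export1, export2], "COCOON")
external1 = transform(federated, "external", "COCOON")
external2 = transform(federated, "external", "ERC+")

print(sorted(federated.classes))
```

The private class stays in the local and component schemas but never reaches the federated level, which is the role of the export filter.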

3 First Results

3.1 Three Approaches of Database Integration

In this subsection we give an overview of three techniques for integrating semantically heterogeneous database schemas: a static integration approach, a dynamic extension approach, and a multidatabase approach. Figure 2 below shows a comparison, where E, F, C, L denote the external, federated, component, and local schemas, respectively (cf. Figure 1).

[Figure 2 diagram: three panels titled static integration approach, dynamic extension approach, and multidatabase approach, each built on local schemas L and component schemas C. The static integration panel shows external schemas E over a federated schema F; the dynamic extension panel shows external schemas E over the component schemas; the multidatabase panel shows a global data manipulation and query language.]

Figure 2: Three database integration approaches

Static Integration

This approach is characterized by having a global federated schema F. This schema F is usually generated by a schema integration methodology, as described for example in [BLN86, SP90, Spac91a, Spac91b]. "Statically integrated" means that F is generated once from a set of given component schemas. If there is a large number of component schemas, the creation of the federated schema F becomes difficult, or even impossible. Even though automatic integration generators already exist, the general problem of resolving structural conflicts has not yet been solved.

The federated schema F holds a global dictionary with additional information about fragmentation and allocation of the distributed data. Thus, to the user of F, both fragmentation and allocation are usually fully transparent. Queries and updates are expressed on the federated schema or on an external schema. Each external schema is derived from the global schema F. The system is responsible for transforming the global queries and updates into statements for the component schemas.

Multidatabases

In contrast to the above technique, there is the multidatabase approach. In a multidatabase system, local components are kept separate, so that no global federated schema exists. Conflicts must therefore be resolved by the user at the application level. There is no transparency of distribution, which means that the global language must contain mechanisms for accessing different systems. For example, in an SQL-like notation, a join involving two relations of two different systems could look like: "select * from A@DB1, B@DB2 where A.x = B.x". That is, interfacing between the different systems is realized by a common data access and update language.

Dynamic Extension

The third approach presented here is a compromise between the former two. There is again no global, fully integrated schema. But to make the distribution transparent to a user, we can define views (the external schemas) on component schemas of different systems. The view definition facility includes mechanisms for linking objects across systems and for dealing with semantic conflicts. The necessary view definition method is to extend the local schema by elements of the schema of another system. Consider, for example, Figure 3 below, where one database (DB1) holds information about books and the other (DB2) about persons.
Knowing that the p# attribute of Persons has the same semantics as the author# attribute of Books, one can define an extension view of class Books that holds an additional function written_by, which returns the authors of books [SLT91, SW91, SWS91]. A similar technique, called the multiview approach, has been published by [Bert91]. This approach of stepwise integration of databases by defining views that extend the local schema combines the advantages of the static integration approach (transparency) with those of the multidatabase approach, and avoids the problem of generating a possibly large federated schema.

[Figure 3 diagram: DB1 holds class Books with title, author#, and publisher; DB2 holds class Persons with name, p#, and age; the view XBooks extends Books with the function written_by, which refers to Persons.]

define view XBooks as extend[ written_by:= select[p# = author#](Persons) ] (Books);

Figure 3: Database integration by schema extension.
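The effect of the extension view of Figure 3 can be imitated with a small Python sketch. This is an illustration only, not COCOON syntax or semantics: the helper names `select` and `extend` mirror the operators in the view definition, objects are plain dictionaries, and the sample books and persons are invented.

```python
def select(predicate, objects):
    """Return the objects that satisfy the predicate."""
    return [o for o in objects if predicate(o)]


def extend(new_functions, objects):
    """Return view objects: each original object plus the derived functions."""
    return [
        {**o, **{name: fn(o) for name, fn in new_functions.items()}}
        for o in objects
    ]


# DB1 and DB2 contents (hypothetical sample data).
books = [
    {"title": "Nested Relations", "author#": 1, "publisher": "P1"},
    {"title": "Object Views", "author#": 2, "publisher": "P2"},
]
persons = [
    {"name": "Ada", "p#": 1, "age": 41},
    {"name": "Bob", "p#": 2, "age": 35},
]

# define view XBooks as extend[written_by := select[p# = author#](Persons)](Books)
xbooks = extend(
    {"written_by": lambda b: select(lambda p: p["p#"] == b["author#"], persons)},
    books,
)

for xb in xbooks:
    print(xb["title"], [p["name"] for p in xb["written_by"]])
```

The base class Books is left untouched; only the view carries the cross-database function, which is what keeps the integration stepwise.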

3.2 Mapping ERC+ Schemas to COCOON

We introduce two examples in ERC+ notation from which further schema mapping and query examples will be taken. As usual, rectangles denote entity types and diamonds denote relationship types. A single solid line denotes a 1:1 cardinality, a single dotted line a 0:1 cardinality, and a double dotted line a 0:n cardinality; two lines, one dotted and one solid, denote a 1:n cardinality. Figure 4 shows an example entity type, Employee, including a complex attribute with three levels: the attribute child is made up of the attributes forename, date, sex, and vaccine. Of these, date and vaccine are themselves complex. Figure 5 shows entity and relationship types.

[Figure 4 diagram: entity type Employee with attributes name, forename, bdate (day, month, year), salary, and child; child consists of forename, date (day, month, year), sex, and vaccine; vaccine consists of type and date (day, month, year).]

Figure 4: Example of an ERC+ entity type with multi-valued attributes at several levels

[Figure 5 diagram. Labels: Employee (Ename, sal), Manager (roles Sup and Inf), Supplier (Sname, Addr), Job, Delivery (QT), Department (Dname, Floor), Sale (QT), Article (Aname, Type).]

Figure 5: Example of ERC+ relationship types

The first step in the FEMUS project consists of establishing correspondences between data models. In ERC+ the basic concepts are the entity type and the relationship type. Each entity type can have simple and/or complex attributes, and a relationship type can also have attributes. COCOON distinguishes types from classes. A class is a set of objects that is associated with a type and restricted by a class predicate. Each type describes the functions that are applicable to its instances. Functions are a uniform abstraction of the attributes and relationships of classical data models. By comparing the concepts in ERC+ and COCOON, the following rules have been defined for the mapping from ERC+ to COCOON:

1. Entity type: an entity type in ERC+ corresponds to a COCOON object type.

2. Relationship type:

a) A binary relationship type between two entity types and without any attributes is converted into two inverse functions, one on each type.

b) A binary relationship type with attributes, or a relationship type between more than two entity types, requires a new type in COCOON; each of the relationships between this new type and a participating type is mapped into a pair of inverse functions.

3. Attribute:

a) A simple attribute is mapped into a COCOON function.

b) A complex attribute is mapped into an additional object type. This is necessary because there are no tuple types in COCOON so far.

According to these rules, the two examples of Figures 4 and 5 are converted into COCOON as follows:

[Figure 6 diagram: object types EmployeeT, childT, VaccinationT, and dateT with value types string and integer. Function labels: name, forename, salary, child, and bdate (EmployeeT); forename, date, sex, and vaccination (childT); type and date (VaccinationT); day, month, and year (dateT).]

Figure 6: COCOON types corresponding to ERC+ diagram in Figure 4

[Figure 7 diagram: object types EmployeeT (Ename, sal), DepartmentT (Dname, Floor), SupplierT (Sname, Addr), ArticleT (Aname, Type), Delivery (QT), and Sale (QT) with value types string and integer. Inverse function pairs: Sup and Inf; workfor and staff; S-Deliv and Deliv-S; D-Deliv and Deliv-D; Art-Deliv and Deliv-Art; D-Sale and Sale-D; Art-Sale and Sale-Art.]

Figure 7: COCOON types corresponding to ERC+ diagram in Figure 5

Most of the concepts in ERC+ can be mapped into corresponding concepts in COCOON. Several problems have to be solved if the mapping between ERC+ and COCOON is to work in both directions:

- The mapping described above is not 1-to-1: an additional type is needed for the mapping from ERC+ to COCOON because there is no tuple type in COCOON so far. Therefore, the mapping is not invertible.

- Restructuring problems: if we map the schema back to ERC+ using the rules, we get a new schema that differs from the original one. Is this 'equal' to the original one? Is a restructuring process necessary in order to get the original schema back?

- How can the different generalization concepts in ERC+ and in COCOON be mapped?

- Formal mapping between ERC+ and COCOON: a formal translation from ERC+ to COCOON and back should be investigated.
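The mapping rules of this subsection can be sketched as a short Python function. This is a simplified illustration under several assumptions, not the project's translator: the dictionary encoding of ERC+ schemas, the naming scheme (suffix T for object types, prefix inv_ and underscore-joined names for inverse functions), and the value types in the sample schema are all hypothetical. Cardinalities are ignored, and structurally identical complex attributes are not merged into one shared type such as dateT.

```python
def map_attributes(owner, attributes, types):
    """Rule 3: simple attribute -> function; complex attribute -> extra type."""
    functions = {}
    for name, spec in attributes.items():
        if isinstance(spec, dict):        # complex attribute (rule 3b)
            sub_type = name + "T"
            map_attributes(sub_type, spec, types)
            functions[name] = sub_type
        else:                             # simple attribute (rule 3a)
            functions[name] = spec
    types.setdefault(owner, {}).update(functions)


def map_schema(entities, relationships):
    """Map an ERC+ schema to COCOON types: type name -> {function: result type}."""
    types = {}
    for name, attrs in entities.items():  # rule 1
        map_attributes(name + "T", attrs, types)
    for rel, (participants, attrs) in relationships.items():
        if len(participants) == 2 and not attrs:      # rule 2a
            a, b = participants
            types[a + "T"][rel] = b + "T"
            types[b + "T"]["inv_" + rel] = a + "T"
        else:                                         # rule 2b
            map_attributes(rel, attrs, types)         # new type for the relationship
            for p in participants:
                types[rel][rel + "_" + p] = p + "T"   # pair of inverse functions
                types[p + "T"][p + "_" + rel] = rel
    return types


entities = {
    "Employee": {"Ename": "string", "sal": "integer",
                 "child": {"forename": "string", "sex": "string"}},
    "Department": {"Dname": "string", "Floor": "integer"},
    "Supplier": {"Sname": "string", "Addr": "string"},
    "Article": {"Aname": "string", "Type": "string"},
}
relationships = {
    "Job": (["Employee", "Department"], {}),
    "Delivery": (["Supplier", "Department", "Article"], {"QT": "integer"}),
}

cocoon_types = map_schema(entities, relationships)
for type_name, functions in cocoon_types.items():
    print(type_name, functions)
```

The extra types produced for child and Delivery have no marker that distinguishes them from types that came from entity types, which illustrates why the reverse mapping cannot simply recover the original ERC+ schema.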

3.3 Mapping ERC+ Operators to COCOON Operators

The next step in the FEMUS project is to identify common aspects in the two algebraic query languages, and to propose a mapping between the algebras from one model into the other. The mapping of the algebraic operators does not appear to be a trivial task. Some operations with well-known semantics (like selection or projection) are common to both algebras, while others are model specific. Although simple, the selection operation already differs in these algebras. For example, selection predicates in the ERC+ algebra may contain quantifiers (over set-valued attributes), whereas COCOON selections may contain nested algebra expressions and set comparison operators. In the following discussion we restrict our attention to this particular sub-problem. We denote by EmployeeC the class of objects belonging to the type EmployeeT.

(1) For instance, the query "Select all employees who earn at least one salary greater than 6000" can be written in the ERC+ algebra and in the COCOON algebra on the respective schemas (Figures 4 and 6, respectively). The ERC+ query is

E1 = σ[∨_s (salary_s > 6000)] Employee

and the COCOON query is

C1 = select [select [s > 6000] (s:salary) ≠ Ø] (EmployeeC)

This example shows the similarity between the realizations of the existential quantifier in the ERC+ and the COCOON algebra.

(2) The slightly modified query "Select all employees who have all their salaries greater than 6000" produces the ERC+ algebraic query

E2 = σ[∧_s (salary_s > 6000)] Employee

The corresponding COCOON queries are:

C2 = select [select [s > 6000] (s:salary) = salary] (EmployeeC)

C2' = select [select [NOT(s > 6000)] (s:salary) = Ø] (EmployeeC)

The second one is equivalent to the first one by the standard transformation of universal quantifiers: ∀x: P ⇔ ~∃x: ~P.

We now present our solution for the mapping of selection predicates. It produces a COCOON nested select operator for each ERC+ quantifier (∧ or ∨) and vice versa.

Rules:

R1. Mapping ERC+ to COCOON: an ERC+ selection is mapped to a COCOON selection where the predicate is transformed according to the rules:

. ∨_p (P(v_p)) is mapped into select [P(v_p)] (v_p) ≠ Ø

. ∧_p (P(v_p)) is mapped into select [NOT(P(v_p))] (v_p) = Ø

R2. Mapping COCOON to ERC+: a COCOON selection is mapped to an ERC+ selection where nested selections in the predicate are transformed according to the rules:

. select [P(v_p)] (v_p) ≠ Ø is mapped into ∨_p (P(v_p))

. select [P(v_p)] (v_p) = Ø is mapped into ∧_p (NOT(P(v_p)))

As said above, the mapping of the algebras does not seem to be an easy task. Nevertheless, a number of problems have been identified and some of them are partially solved:

- We have to deal with the problem of reversibility. In other words, we have to find a mapping function that transforms a query expressed in one algebra into a query expressed in the other algebra, and vice versa. These mappings are complicated by the fact that both algebras contain operators that do not have a direct correspondence in the other algebra.

- The mapping of predicates is not trivial; a preliminary solution has been presented above.
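Rules R1 and R2 can be checked on a toy example with the following Python sketch. It is not an implementation of either algebra: quantified ERC+ predicates and nested COCOON selections are encoded as plain tuples, the function names are hypothetical, and the employee data is invented. Set-valued attributes are modeled as lists.

```python
def select(pred, values):
    """Nested selection over a set-valued attribute."""
    return [v for v in values if pred(v)]


def negate(pred):
    return lambda v: not pred(v)


# ERC+ predicate: (quantifier, attribute, P), quantifier "exists" (∨) or "forall" (∧).
# COCOON predicate: (P, attribute, test), meaning select[P](attribute) ≠ Ø
# when test is "nonempty" and select[P](attribute) = Ø when test is "empty".

def erc_to_cocoon(q):
    """Rule R1."""
    quantifier, attribute, p = q
    if quantifier == "exists":
        return (p, attribute, "nonempty")
    return (negate(p), attribute, "empty")


def cocoon_to_erc(c):
    """Rule R2."""
    p, attribute, test = c
    if test == "nonempty":
        return ("exists", attribute, p)
    return ("forall", attribute, negate(p))


def eval_erc(q, obj):
    quantifier, attribute, p = q
    values = obj[attribute]
    if quantifier == "exists":
        return any(p(v) for v in values)
    return all(p(v) for v in values)


def eval_cocoon(c, obj):
    p, attribute, test = c
    result = select(p, obj[attribute])
    return bool(result) if test == "nonempty" else not result


employees = [
    {"name": "a", "salary": [5000, 7000]},
    {"name": "b", "salary": [6500, 8000]},
    {"name": "c", "salary": [4000]},
]

e1 = ("exists", "salary", lambda s: s > 6000)   # query E1
e2 = ("forall", "salary", lambda s: s > 6000)   # query E2

names_e1 = [e["name"] for e in employees if eval_erc(e1, e)]
names_c1 = [e["name"] for e in employees if eval_cocoon(erc_to_cocoon(e1), e)]
names_e2 = [e["name"] for e in employees if eval_erc(e2, e)]
names_c2 = [e["name"] for e in employees if eval_cocoon(erc_to_cocoon(e2), e)]
round_trip = cocoon_to_erc(erc_to_cocoon(e2))
names_rt = [e["name"] for e in employees if eval_erc(round_trip, e)]

print(names_e1, names_c1, names_e2, names_c2, names_rt)
```

Mapping a universal quantifier to COCOON and back yields a doubly negated predicate: the result is semantically equivalent but not syntactically identical to the original, a small instance of the reversibility problem discussed above.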

4 Summary and Future Directions

The project presented in this paper started in summer 1991 and is scheduled for 18 months. Investigations are planned to be separated into three phases, each of which will take about six months:

1. Making our two models (ERC+ and COCOON) cooperative. This includes comparing data model issues and identifying common aspects, as well as integrating one data model into the other by view integration mechanisms.

2. Defining a core object model as the reference model for cooperation.

3. Making the systems operationally cooperative. This includes the comparison of algebraic vs. calculus-based languages, the investigation of updates, and building a small demonstration prototype.

References

[Bert91] E. Bertino. Integration of Heterogeneous Data Repositories by Using Object-Oriented Views. In Proc. IMS Workshop, 1991.

[BLN86] C. Batini, M. Lenzerini, and S.B. Navathe. A comparative analysis of methodologies for database schema integration. ACM Computing Surveys, 18(4):323-364, December 1986.

[DBDZ83] A. Diener, R. Brägger, A. Dudler, and C.A. Zehnder. Database services for personal computers linked by a local area network. In Proc. ACM Conf. on Personal and Small Computers, pages 217-221, San Diego, December 1983. ACM.

[DBDZ85] A. Diener, R. Brägger, A. Dudler, and C.A. Zehnder. Replicating and allocating data in a distributed database system for workstations. In Proc. ACM SIGSMALL Symposium on Small Systems, pages 364-372, New Orleans, LA, March 1985. ACM.

[DGK87] U. Deppisch, J. Günauer, and K. Küspert. Considerations on database cooperation between server and workstations. In Proc. 16th GI Annual Conf., pages 565-580, 1987. In German.

[Die87] A.R. Diener. An Architecture for Distributed Databases on Workstations. PhD thesis, No. 8088, Eidgenössische Technische Hochschule (ETH), Zürich, 1987.

[DYS87] A. Dogac, B. Yuruten, and S. Spaccapietra. VDIES: an expert system for view definition and integration. In 2nd Int'l Symposium on Computer and Information Science, Istanbul, October 1987.

[Lit84] W. Litwin. Malpha: A relational multidatabase manipulation language. In Proc. 1st IEEE Int'l Conf. on Data Engineering, Los Angeles, February 1984. IEEE.

[LMR90] W. Litwin, L. Mark, and N. Roussopoulos. Interoperability of multiple autonomous databases. ACM Computing Surveys, 1990.

[Mar86] F. Maryanski. A model compiler: A tool for generating object-oriented database systems. In Proc. of Workshop on Object-Oriented Database Systems, pages 73-84, New York, 1986. IEEE.

[MFHP87] F. Maryanski, S. Francis, S. Hong, and J. Peckham. Generation of conceptual data models. In Data and Knowledge Engineering, 1987.

[Par87] C. Parent. L'approche ERC: un modèle de données et une algèbre de type entité-relation. Thèse d'état 890401, Université Pierre et Marie Curie (Paris VI), July 1987.

[PRYS89] C. Parent, H. Rolin, K. Yétongnon, and S. Spaccapietra. An ER calculus for the entity-relationship complete model. In Proc. 8th Int'l Conf. on Entity-Relationship Approach, Toronto, October 1989.

[PS84] C. Parent and S. Spaccapietra. An entity-relationship algebra. In Proc. IEEE COMPDEC Int'l Conf. on Data Engineering, Los Angeles, April 1984.

[PS87] C. Parent and S. Spaccapietra. Un modèle et un langage pour les bases de données de type entité-relation. Techniques et Science Informatique, 6(5), 1987.

[PS89a] C. Parent and S. Spaccapietra. About entities, complex objects and object-oriented data models. In E.D. Falkenberg, editor, Information System Concepts - An In-depth Analysis, Proc. of IFIP WG 8.1 Working Conference, Namur, October 1989. North Holland.

[PS89b] C. Parent and S. Spaccapietra. Complex objects modelling: an entity-relationship approach. In S. Abiteboul, P.C. Fisher, and H.-J. Schek, editors, Nested Relations and Complex Objects, LNCS 361. Springer-Verlag, 1989.

[PSSWD87] H.-B. Paul, H.-J. Schek, M.H. Scholl, G. Weikum, and U. Deppisch. Architecture and implementation of the Darmstadt database kernel system. In Proc. ACM SIGMOD Conf. on Management of Data, San Francisco, 1987.

[Sch90] P. Scheuermann, editor. Report of the Workshop on Heterogeneous Database Systems, Northwestern University, Evanston, December 1990. ACM SIGMOD Record.

[SL90] A.P. Sheth and J.A. Larson. Federated database systems for managing distributed, heterogeneous, and autonomous databases. ACM Computing Surveys, 22(3):183-236, September 1990.

[SLT91] M.H. Scholl, C. Laasch, and M. Tresch. Updatable Views in Object-Oriented Databases. In Proc. 2nd Int'l Conf. on Deductive and Object-Oriented Databases (DOOD), Munich, Germany, December 1991.

[SP82] H.-J. Schek and P. Pistor. Data structures for an integrated database management and information retrieval system. In Proc. of Int'l Conf. on Very Large Database Systems (VLDB), pages 197-207, 1982.

[SP90] S. Spaccapietra and C. Parent. Intégration de vues et relativisme sémantique. In Journées Bases de données avancées, Montpellier, October 1990. INRIA.

[Spac91a] S. Spaccapietra, C. Parent, and Y. Dupont. Automating Heterogeneous Schema Integration. Research report, Laboratoire de Bases de Données, Département d'Informatique, École Polytechnique Fédérale de Lausanne, 1991. Submitted to VLDB Journal.

[Spac91b] S. Spaccapietra and C. Parent. Conflicts and Correspondence Assertions in Interoperable Databases. Research report, Laboratoire de Bases de Données, Département d'Informatique, École Polytechnique Fédérale de Lausanne, 1991.

[SPS87] M.H. Scholl, H.-B. Paul, and H.-J. Schek. Supporting flat relations by nested relational kernel. In P.M. Stocker, W. Kent, and P. Hammersley, editors, Proc. 13th Int'l Conf. on Very Large Data Bases (VLDB), Brighton, UK, September 1987. Morgan Kaufmann, Los Altos, CA.

[SPSW90] H.-J. Schek, H.-B. Paul, M.H. Scholl, and G. Weikum. The DASDBS project: Objectives, experiences, and future prospects. IEEE Trans. on Knowledge and Data Engineering, 2(1), 1990.

[SPYA89] S. Spaccapietra, C. Parent, K. Yétongnon, and M.S. Abaidi. Generalizations: a formal and flexible approach. In Proc. Conf. on Management of Data, Hyderabad, India, November 1989.

[SS83] H.-J. Schek and M.H. Scholl. The NF2 relational algebra for uniform manipulation of external, conceptual, and internal data structures. In J.W. Schmidt, editor, Sprachen für Datenbanken, IFB 72. Springer Verlag, 1983.

[SS86] H.-J. Schek and M.H. Scholl. The relational model with relation-valued attributes. Information Systems, 11(2):137-147, 1986.

[SS90a] M.H. Scholl and H.-J. Schek. A relational object model. In Proc. 3rd Int'l Conf. on Database Theory (ICDT'90), Paris, 1990.

[SS90b] M.H. Scholl and H.-J. Schek. A synthesis of complex objects and object-orientation. In IFIP TC2 Conf. on Object-Oriented Databases - Analysis, Design & Construction (DS-4), Windermere, UK, 1990. North-Holland.

[SSW90] H.-J. Schek, M.H. Scholl, and G. Weikum. From the KERNEL to the COSMOS: The database research group at the ETH Zürich. Technical Report 136, ETH Zürich, Dept. of Computer Science, 1990.

[SW91] H.-J. Schek and G. Weikum. Erweiterbarkeit, Kooperation, Foederation von Datenbanksystemen. In Proc. 4th GI Conf. on Database Systems for Office, Engineering, and Scientific Applications (BTW), Kaiserslautern, March 1991. Springer IFB.

[SWS91] H.-J. Schek, G. Weikum, and W. Schaad. A Multi-Level Transaction Approach to Federated DBMS Transaction Management. In Proc. of the First Int. Workshop on Interoperability in Multidatabase Systems (IMS'91), Kyoto, April 1991.
