A methodology for conceptual design of office data bases

June 5, 2017 | Author: Antonio Di Leva | Category: Information Systems, Conceptual Design


Inform. Systems Vol. 9, No. 3/4, pp. 251-263, 1984. Printed in the U.S.A.

0306-4379/84 $3.00 + .00 Pergamon Press Ltd.

A METHODOLOGY FOR CONCEPTUAL DESIGN OF OFFICE DATA BASES†

C. BATINI
Dipartimento di Informatica e Sistemistica, Università di Roma, Via Buonarroti 12, 00185 Roma, Italy

and

B. DEMO and A. DI LEVA
Dipartimento di Informatica, Università di Torino, Corso M. D'Azeglio 42, 10125 Torino, Italy

(Received 7 March 1983; in revised form 25 February 1984)

Abstract-Paper forms are a widely used means to collect and communicate data in the office environment. As a consequence, they are a very natural type of user requirements and an effective starting point in data base design of office applications. In this paper a methodology for conceptual design of office data bases is described. It assumes forms as input documents to express requirements and produces, as the result of the design process, an Entity Relationship conceptual schema. Strategies are proposed to extract conceptual structures from the initial requirements and to integrate them into a global description.

1. INTRODUCTION

The fundamental goal of a methodology for Data Base Design (DBD) is to transform a user oriented linguistic representation of requirements concerning the application of interest into a schema expressed in terms of the Data Description Language of the Data Base Management System (DBMS) chosen for the application. Current approaches to DBD [3, 9] distinguish three main phases:

Requirements Analysis, whose goal is to collect and analyze the heterogeneous linguistic descriptions used to express requirements and transform them into a homogeneous description (usually glossaries), independent as far as possible from the linguistic categories of the user and not yet expressed in terms of formal categories of the conceptual model. User requirements concern both static (data) and dynamic (operations, events) aspects of the observed reality.

Conceptual Design, whose goal is to produce a formal and DBMS independent description of requirements, called Conceptual Schema. We call (Conceptual) Data Schema the representation of static aspects and Dynamics Schema the representation of dynamics.

Logical and Physical Design, whose goal is to express the conceptual schema in terms of the logical structures allowed in the available DBMS and to choose physical parameters.

† This work was supported by the Italian National Research Council (CNR).

User requirements can be expressed in a wide variety of linguistic representations depending on the different roles of the user and on the organization environment. Following the usual distinction between natural and artificial languages, linguistic representations of user requirements may be divided into formatted and not formatted. Not formatted representations use natural language: their role in DBD has been investigated in [1, 3, 4]. Formatted representations can be described by means of a limited number of syntactic rules. According to the type of use, they may be divided into (the following list is not at all exhaustive):
(1) Forms for data acquisition and output description;
(2) Record formats;
(3) Data Divisions of Cobol or similar languages;
(4) Data Schemata described in a Data Description Language of a DBMS;
(5) Procedures, expressed in terms of Data Manipulation Languages or Query Languages.

During the phases of Requirements Analysis and Conceptual Design (particularly during the first one) the design process should be influenced by the type of linguistic representation available. For example, natural language requirements usually need heavy analysis activities due to the ambiguity of the representation; in the case of Cobol data divisions such activities are much less critical due to the formatted representation. The peculiarities of each representation should be exploited in order to make the process smooth and fast.

In this paper we focus on forms, which are the typical requirements available to the designer when office information systems are conceived. Due to the growing demand for such systems, we believe it is worthwhile to develop a conceptual design methodology specific to forms. The goal of the methodology is to produce the Conceptual Data Schema of the application. We assume that forms are the major input to the design process; other requirements describe the operations already performed manually in the office, or else new operations that


may ask for information not appearing in the forms used at present in the office. In the recent literature [8, 12] forms are used as a design tool for the specification of requirements for data base design. Users give their requirements as input to the tool, expressing them in terms of linguistic categories provided by the form interface. In our approach forms used in real life are the input to the design process.

The paper is organized as follows. The conceptual model referred to in the paper is described in Section 2. In Section 3 the structure of forms is discussed. In Section 4 the methodology is discussed and applied to a design example. In Section 5 further research topics and open problems are considered.

2. THE MODEL

The conceptual model used in the methodology is an extension of the Entity Relationship (ER) Model described in [5]. Three basic classes of concepts exist in the original model (see [5]): entities, relationships, attributes. Each concept must have a name and may moreover have an associated set of synonyms. Value sets are associated with attributes, with the usual meaning. Four corresponding domains may be declared: Integer, Boolean, String and User defined, whose semantics are the same as in Pascal. In all those situations in which the designer needs to extend the semantics of concepts and value sets, a Text in natural language may be associated with them.

Attributes can be simple or compound; in the first case, an attribute corresponds to an atomic domain, in the second case the attribute corresponds to a domain built up as a Cartesian product of other domains (e.g. in Fig. 14 Address is a compound attribute). Several possible identifiers may be attached to an entity. An identifier of an entity E is a set I of attributes and/or entities such that every occurrence of E is uniquely identified by an occurrence of I (e.g. in Fig. 14 the pair of attributes Year, Month is an identifier of entity Period).

Following the approach in [13], in the model the functionality and the total or partial nature of relationships may be described by means of min-card(inalities) and max-card(inalities). The min-card is the minimum number of times that each occurrence of the entity may be involved in an occurrence of a relationship. Value 0 means that an occurrence may exist without being involved in any relationship occurrence. Value 1 (or n) means that an occurrence cannot exist without being involved in 1 (or n) relationship occurrences. Correspondingly, the max-card is the maximum number of times that each occurrence of an entity may be involved in an occurrence of a relationship.
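The concepts above can be sketched as plain data structures. The following is a minimal illustration, not part of the original model definition: the class names, the use of None for the unbounded max-card n, the components chosen for Address, and the cardinalities given to Dep-Res are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Attribute:
    name: str
    domain: str = "String"   # Integer, Boolean, String or a user defined domain
    components: tuple = ()   # non-empty only for a compound attribute

    @property
    def is_compound(self):
        return len(self.components) > 0


@dataclass
class Entity:
    name: str
    attributes: list = field(default_factory=list)
    # Each identifier is a set of attribute and/or entity names.
    identifiers: list = field(default_factory=list)


@dataclass
class Participation:
    entity: str
    min_card: int
    max_card: Optional[int]  # None stands for n (assumption of this sketch)

    @property
    def is_total(self):
        # min-card >= 1: an occurrence cannot exist outside the relationship
        return self.min_card >= 1


@dataclass
class Relationship:
    name: str
    participations: list


# Period is identified by the pair (Year, Month), as in the text.
period = Entity(
    "Period",
    [Attribute("Year", "Integer"), Attribute("Month", "Integer")],
    identifiers=[{"Year", "Month"}],
)

# Address is compound; its components here are hypothetical.
address = Attribute("Address", components=(Attribute("Street"), Attribute("City")))

# Hypothetical cardinalities for a Department/Researcher relationship.
dep_res = Relationship(
    "Dep-Res",
    [Participation("Department", 0, None), Participation("Researcher", 1, 1)],
)
```

With these hypothetical cardinalities, Researcher participates totally in Dep-Res (min-card 1) while Department participates partially (min-card 0).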

This approach is extended in our model to describe cardinalities of attributes. As a consequence we also allow repeating attributes, i.e. attributes with max-card equal to n. In order to simplify diagrams, cardinality 1,1 is assumed as the default for attributes.

The model is finally extended with two further abstraction features, i.e. subset (sub) and generalization (gen) hierarchies: they are defined only for entities. An entity E1 is a subset (inversely, superset) of another entity E2 if every occurrence of E1 is also an occurrence of E2. An entity E is a generalization (inversely, specialization) of the entities E1, E2, ..., En if each occurrence of E is also an occurrence of one of the entities E1, E2, ..., En; in case it is the occurrence of only one of them the generalization is called exclusive. The partition over the occurrences of E established by the generalization is seen as induced by a property of E. This is explicitly represented by an attribute of E: the attribute is referred to as the underlying attribute.

Both in subset and in generalization, attributes (and relationships) of the entity at the upper level of the hierarchy are also attributes of the entities at the lower level. Entities at the lower level will generally have additional attributes (and relationships) with respect to the entity at the upper level; in the model we explicitly represent only such attributes (and relationships). For example, in Fig. 14 Publication is a superset of Book and a generalization of Journal and Proceedings.

3. THE STRUCTURE OF FORMS

Forms are documents used in organizations to manage communications (in input, in output, between subsystems of the organization) and usually provided with a sufficient level of structure. Such structure is usually expressed in paper forms by means of bidimensional tables; this feature led several authors [8, 12] to propose hierarchical models of forms, which are in our opinion too restrictive to capture the great variety of structures that occur in practice.

Forms represent an easy to acquire, "mature" and reliable source of knowledge on data managed in the organization. Hence it is fundamental to use the information they provide during Requirements Analysis and Conceptual Design. Furthermore, being user oriented, forms are structured, as we said, in many different shapes; their structure and meaning may be expressed by means of textual descriptions or common implicit conventions.

Two very simple examples of the variety of rules and conventions that may govern the use of forms are shown in Figs. 1 and 2. In Fig. 1 the meaning of fields is suggested, besides the context in which the fields appear, by their format and the partial filling of the second field. In the fragment of Italian text represented in Fig. 2 an article appears that is only partially specified. In Italian two different definite articles are used for masculine and feminine gender: they are "il" and "la". If the person referenced in the form is a man, then an "i" has to be written before the "l"; if the person is a woman an "a" has to be written after the "l".

[Fig. 1. A fragment of form, beginning "The bill has been sent in".]

From the above considerations we may state that managing forms during DBD is crucial, and conversely that no hope exists to build an algorithmic method to extract a conceptual schema from a form. In this section we depict a general framework for classifying forms and their structures.

We can usually distinguish four parts in forms: certificating, extensional, intensional, descriptive.

Certificating part concerns information such as:
-date of issue;
-stamps, marks;
-identifiers, progressive numbers;
-signatures.
This part certifies the existence and correctness of the form and usually does not convey relevant semantic information; in the following we do not make further reference to this part.

Extensional part is the part of the form that is to be filled in when the form is compiled.

Intensional part contains implicit or explicit references to symbolic names of areas or fields that are to be filled with values when the form is compiled.

Descriptive part contains instructions or rules describing the way to fill the extensional part.

Usually the extensional and intensional parts of the form are strictly interconnected and distributed in the form; as a consequence, the analysis of their structure has to proceed jointly. Descriptive parts may appear interconnected with them or kept apart (e.g. notes at the foot of the page) or even, as we said, may be implicit (whenever rules are based on "obvious" conventions; e.g. if the person is male do not fill the field Number-of-pregnancies). We now examine in some more detail the intensional/extensional part and the descriptive part.
Intensional/extensional part

The basic units of information that make up an intensional/extensional part will be called areas in the following. An area is a piece of the form that consists of information pertaining to the same argument. A classification of information within areas that is useful for our goals is driven by the type of linguistic representation used:

(1) Parametric text: a single text in natural language with fields that are to be filled with values extracted from suitable domains (e.g. We certify that Mr. ...... was born in ......). Once such fields have been filled the text has the shape of one or more complete and coherent sentences in natural language. Frequently, in this case, the intensional information is not explicitly present and must be brought out during the design.

(2) Structured frame: an area that has the structure of an n-dimensional table. See in Fig. 7.2 an example of a monodimensional table, and in Fig. 7.1 a tridimensional table, expressed by means of an ordered set of bidimensional tables.

An area is classified according to its information type (if single) or as a compound area.

Descriptive part

The descriptive part concerns in general constraints on the way forms are to be used and filled with data. In the Data Base literature, integrity constraints allow the designer to express prescriptions on the reality to be represented by means of the conceptual schema (see for instance [7] for a general discussion). Usually integrity constraints may be classified as:
-static, when they express properties of data base states;
-dynamic, when they express properties of changes from one data base state to another.

Constraints may be expressed by means of assertions of a special purpose language. In [2], for instance, it is shown how constraints of arbitrary complexity may be expressed in such a language. In the following we consider only constraints that may directly be expressed in the ER model (intrinsic constraints). Among them the following are specially relevant:
-Restrictive constraints, that express rules on conditional filling of fields (e.g. Fill only if sex is "male");
-Functional dependencies between fields;
-Optionalities of fields;
-Existence constraints.
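As an illustration of how restrictive constraints and optionalities could be checked on one filled form, the following is a minimal sketch. The function names, the dictionary encoding of a form and the field names are assumptions made for the example; only the Number-of-pregnancies rule is taken from the text.

```python
def is_filled(values, name):
    """A field counts as filled when it holds a non-empty value."""
    return values.get(name) not in (None, "")


def check_form(values, restrictive, mandatory):
    """Return the list of constraint violations for one filled form.

    restrictive maps a field to the (field, value) condition under which
    it may be filled; mandatory is the set of non-optional fields.
    """
    violations = []
    for name, (cond_field, cond_value) in restrictive.items():
        if is_filled(values, name) and values.get(cond_field) != cond_value:
            violations.append(
                f"{name} may be filled only if {cond_field} is {cond_value!r}"
            )
    for name in sorted(mandatory):
        if not is_filled(values, name):
            violations.append(f"{name} is mandatory")
    return violations


# Hypothetical encoding of the rule "if the person is male do not fill
# the field Number-of-pregnancies".
restrictive = {"number_of_pregnancies": ("sex", "female")}
mandatory = {"name", "sex"}

bad = check_form(
    {"name": "Rossi", "sex": "male", "number_of_pregnancies": 2},
    restrictive,
    mandatory,
)
good = check_form({"name": "Rossi", "sex": "male"}, restrictive, mandatory)
```

In this sketch the first form yields one violation and the second none. Functional dependencies and existence constraints would need a set of filled forms rather than a single one, so they are left out.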

[Fig. 2. A fragment of form, showing a partially specified article, a name describing the title, and a space for name and surname.]


Constraints specially relevant in forms, and that are not managed in our model, are those concerning derived data (e.g. Totals).

4. THE METHODOLOGY

4.1 Introduction

The gross architecture of the methodology is shown in Fig. 3. As we said in the introduction, together with forms also operations are considered. We assume that operation requirements are expressed by means of natural language descriptions, as usual for forms applications. The analysis of forms is anticipated with respect to the analysis of operations for two different reasons:
(1) we have assumed that forms are the major input for the designer.

[Fig. 3. Gross architecture of the methodology. Labels: Forms; Operations (in natural language); Draft Data Schema; Completeness Analysis; Global Data Schema.]

[Fig. 4. Form 1: "LIST OF PUBLICATIONS purchased by Department ...... (address ...... telephone ......) in ...... (month, year)". The form is divided into area 1, area 1.1 and area 2 and includes the labels List of topics, Papers (3), Topic, Subtopics, Editors of Journals and Proceedings.
Notes: (1) Book, Journal, Proceedings. (2) Fill only for Books. (3) Fill only for Journals and Proceedings.]

(2) data appear in forms highly structured; as we'll see later, formatted requirements give semantic information that is easier to conceptualize than that given by not formatted ones.

This approach allows a better convergence and stability in the global design process. Now we deal separately with the two phases.

4.2 Data design

We illustrate the different activities step by step, making use of an example in order to make them easier to understand. We assume that the inputs to the design process are the two forms of Figs. 4 and 5 concerning the organization of a library serving the departments of a University. Three steps are involved in the phase:
(1) Form Analysis and Area Design.
(2) Form Design.
(3) Interschema Integration.

During FORM ANALYSIS forms are investigated in detail in order to identify their structure in terms of the categories described in Section 3, i.e. parts, areas and their properties. For every form a Glossary is produced describing concepts contained in the form, their elementary properties and abstraction hierarchies between concepts present in the form.

During AREA DESIGN glossaries are used to convert every area into a corresponding conceptual schema.

During FORM DESIGN for each form the set of its area schemata is integrated into a single Form Schema.

During INTERSCHEMA INTEGRATION possible conflicts between form schemata are solved and a draft data schema is built.

Form analysis roughly corresponds to the phase called Requirements analysis in Section 1, while Area design, Form design and Interschema integration (together with Completeness analysis) correspond to Conceptual design. Area design in our methodology is joined to Form analysis for two reasons:
-the type of requirements that has been assumed here is homogeneous (forms only);
-there is a strong cohesion between them, due to the iterative process suggested for building glossaries and area schemata.

We now examine the three activities in detail.

FORM ANALYSIS AND AREA DESIGN

The following is a procedural description of the activity:

(Form Analysis)
For each form
  (1) Distinguish parts (descriptive, external/internal)
  (2) Select areas and subareas and give names
  (3) For each area
    (3.1) Extract elementary concepts
    (3.2) Fill glossary with elementary properties
    (3.3) Fill glossary with abstraction hierarchies between concepts.

(Area Design)
For each form
  (4) For each area
    Repeat
      (4.1) Choose ER types for concepts in the glossary
      (4.2) Update and/or enrich the glossary
    Until
      (4.3) All the concepts have been typed
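The Repeat/Until loop of Area Design can be sketched as follows. This is only an illustration under stated assumptions: the glossary entry fields mirror the columns described for the data glossary, while the function names, the round limit and above all the toy typing rule (concepts that aggregate others become entities, the rest attributes) are hypothetical. In the methodology the choice of ER types is a designer's decision, not an automatic rule.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class GlossaryEntry:
    code: str
    name: str
    areas: list
    description: str = ""
    instances: list = field(default_factory=list)
    synonyms: list = field(default_factory=list)
    aggregation_of: list = field(default_factory=list)
    generalization_of: list = field(default_factory=list)
    er_type: Optional[str] = None  # entity, relationship or attribute


def area_design(glossary, choose_type, max_rounds=10):
    """Iterate steps 4.1-4.3 until every concept is typed.

    choose_type stands in for the designer: it may return None to
    postpone a decision and may update the glossary (step 4.2).
    """
    for _ in range(max_rounds):
        for entry in glossary.values():
            if entry.er_type is None:
                entry.er_type = choose_type(entry, glossary)   # step 4.1
        if all(e.er_type is not None for e in glossary.values()):  # step 4.3
            return True
    return False


def toy_rule(entry, glossary):
    # Hypothetical heuristic, for illustration only.
    return "entity" if entry.aggregation_of else "attribute"


# Codes below are placeholders, not the codes of the original glossary.
glossary = {
    "Publication": GlossaryEntry("C1", "Publication", ["1"],
                                 aggregation_of=["Code", "Title"]),
    "Code": GlossaryEntry("C2", "Code", ["1"]),
    "Title": GlossaryEntry("C3", "Title", ["1"]),
}
done = area_design(glossary, toy_rule)
```

The round limit is a safeguard of the sketch: a chooser that keeps postponing decisions makes the function return False instead of looping forever.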

[Fig. 5. Form 2: form for request of publications to an external research group. Fields: Sender, Department, Address; parametric text "Dear ......, I should greatly appreciate receiving a copy of the following publications:"; table with columns Authors and Title.]

[Fig. 6. Example of elementary properties in a parametric text: "Professor ...... (name, surname) on ../../.... (day, month, year) borrows the following books: ......".]

Step 1 is self explanatory from Section 3. The goal of step 2 is to get a first draft specification of the structure of abstraction hierarchies present in the form. The process of locating areas may be recursive and can give rise to several nestings. Areas and subareas for our example are shown in Fig. 4 enclosed within frames. Names for areas can be: Publications, General Information, Detail on papers, Grants.

The activity of extracting elementary concepts is guided by the type of linguistic representation (parametric text, structured area) used in the area. In the case of parametric text, atomic values usually occur whose names appear implicitly or explicitly in the text (see Fig. 6). In the case of a structured area, besides the above case, an elementary concept may be found when (see Fig. 7):
(a) in the table it is possible to single out a set of atomic homogeneous elements, indexes of the table;
(b) in the table it is possible to single out a name and atomic elements that are values in the table and instances of that name.

[Fig. 7. Examples of elementary concepts in a table. 1. Example of case a: "Expenses for the last five years", a table indexed by month (Jan, Feb, Mar, ..., Dec), period within the month (1-15, 16-31) and year (e.g. 1979). 2. Example of case b: "Statistics on books", with "Number of books in the library" for 1978, 1979, 1980.]

Step 3.2 concerns the data glossary, in which for every area the corresponding elementary concepts appear (see in Fig. 8 a fragment of the data glossary for Form 1). For every concept there appear a code, a name, the areas in which the concept occurs, a description, possible instances and synonyms, and the abstraction hierarchies in which the concept is involved. Description, instances and synonyms are useful in order to gain a deep understanding of the meaning of the concept and its role in the application.

[Fig. 8. A fragment of the data glossary for area Publications of form 1. Columns: Code, Name, Form, Area, Description, Instances, Synonyms, Aggreg. of, Gener. of, Notes.]

Two types of abstractions are typically expressed in data models used in conceptual design [10, 11], and appear in the glossary:
(1) Aggregation, that allows a relation between several concepts to be thought of as a (higher level) concept;
(2) Generalization, that allows several concepts to be seen as a higher level concept.

Aggregation may be represented in the model by means of compound attributes, entities, relationships. Generalization may be represented by means of subset and what we have called generalization hierarchies. During Step 3.3 a first draft version of abstractions is produced, that should include the most evident hierarchies existing between concepts. The tree of areas produced during step 2 can be a useful starting point for the investigations involved in Step 3.3. The most common situation while performing Step 3.3 (and possibly at the end of Step 3.3 too) is that a set of small disconnected hierarchies is obtained. See in Figs. 9 and 10 a pictorial representation of the hierarchies resulting from this activity for our example. Notice that concepts may appear in the glossary, used in Form Analysis as intermediate conceptualizations, that will have no counterpart in the conceptual schema (e.g. List of Publications).

[Fig. 9. First draft aggregation tree (nodes: List of Publications, Publications, Code, Title).]

The goal of Area design is to produce a formal description of each area expressed in the model. Area design is in principle an iterative activity: the schema is produced by a set of refinements in which successive details are taken into account and, if needed, previous choices are revised. During such iterative activity glossaries are used as a source of information and cross reference (steps 4.1 and 4.3). For this reason it is very important to keep them up to date (step 4.2).

Step 4.1 is very complex and difficult to formalize. Aggregations and generalizations are expressed in business forms with many different shapes. For instance, in Form 1 (see Fig. 4) the fragment concerning Publication, Code, Title shows a typical graphic structure used for expressing aggregations. Note 2 in Form 1 is an example of what we have called in Section 3 a Restrictive Constraint, and is a typical indication of generalization. Identifiers, value sets and cardinalities are chosen after a deep examination of the meaning of fields. It is very useful to take into account and compare instances of forms filled with values.

In Fig. 11 three refinements of the conceptual schema for Area 1 of Form 1 appear. Notice that in the first refinement two entities have been singled out. In the second refinement Address is acknowledged as a composite attribute. In the third refinement Book is acknowledged as a possible subentity of Publication, as a consequence of Note 2 (see Fig. 4). Furthermore, Month and Year are taken into account for the first time and the relationship between Researcher and Publication is reexamined. As a consequence Author is associated with the new entity. Finally, in Figs. 12 and 13 we show schemata for Area 2 of Form 1, and Form 2.

[Fig. 10. First draft generalization tree (Publication, with Books, Journals and Proceedings below it).]

FORM DESIGN

The activities are:

For each form
  (1) Check and solve type conflicts between area schemata.
  (2) Merge area schemata.
  (3) Analyze schema enrichments and restructurings.

During this activity we make the assumption that in the set of area schemata homonyms and synonyms are absent, i.e. in different areas the same concept has the same name and different concepts have different names. However we cannot exclude type conflicts (Step 1), that occur whenever the same class of objects in the real world appears in two schemata with the same name and different properties (types, cards, etc.). In this case a transformation is needed in order to unify types. For instance, Topics is an entity in Fig. 12 and an attribute in Fig. 11, and Type is an attribute in Fig. 11 and an underlying attribute in Fig. 12. An obvious choice in this case is to change Topics into an entity and Type into an underlying attribute in the first schema.

When all the types have been unified, Merging can take place (Step 2) by means of a simple superimposition of common concepts. At the same time, while obtaining a global integrated representation of the form, it is useful to still be able to distinguish concepts that belong to different areas: such information is needed in the next step. See in Fig. 14 the schema resulting from merging in our example.

The draft integrated schema is now analyzed (Step 3) in order to obtain a more reliable and clear description. Two main tasks are distinguished:

(a) The first task is devoted to enriching the schema with Interschema Properties, i.e. all the modelling features defined between different concepts in different schemata and that were, as a consequence, hidden to the analyst in the design of a single schema. In our example, the hierarchy in Fig. 14 has to be restructured and transformed into a new structure, where Book is acknowledged as a member of the generalization hierarchy.

(b) The second task is devoted to increasing the clarity and expressiveness of the schema. Such features can be achieved by means of restructurings. This activity is highly informal, clarity and expressiveness being design qualities that are difficult to quantify. In our example Journal and Proceedings in Fig. 14 have exactly the same properties, while representing different concepts. Hence they can be merged into a unique entity called Collective Publications with a new attribute Kind whose domain values are Journal and Proceedings. The introduction of new abstractions is justified when they really add further clarity and simplicity to the schema, and is discouraged when "unnatural" entities are to be created.

[Fig. 11. Three refinements for the conceptual schema for area 1 (1st refinement, 2nd refinement, 3rd refinement: final schema).]
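Steps 1 and 2 of Form Design can be sketched as follows, assuming (as the text does) that names are already consistent across areas. The encoding of an area schema as a mapping from concept name to ER type is a simplification made for this example, and the presence of Publication in both area schemata is an assumption; the Topics and Type conflicts are the ones discussed above.

```python
def type_conflicts(schema_a, schema_b):
    """Concepts that appear in both schemata with different ER types."""
    return {
        name: (schema_a[name], schema_b[name])
        for name in schema_a.keys() & schema_b.keys()
        if schema_a[name] != schema_b[name]
    }


def merge(area_schemata):
    """Superimpose common concepts, remembering the areas of origin."""
    merged = {}
    for area, schema in area_schemata.items():
        for name, er_type in schema.items():
            entry = merged.setdefault(name, {"type": er_type, "areas": set()})
            if entry["type"] != er_type:
                raise ValueError(f"unresolved type conflict on {name}")
            entry["areas"].add(area)
    return merged


# Simplified area schemata for Form 1.
area_1 = {"Publication": "entity", "Topics": "attribute", "Type": "attribute"}
area_2 = {"Publication": "entity", "Topics": "entity",
          "Type": "underlying attribute"}

conflicts = type_conflicts(area_1, area_2)

# Step 1: unify types by transforming the first schema, as in the text.
area_1_fixed = dict(area_1, Topics="entity", Type="underlying attribute")

# Step 2: merge, keeping track of which areas each concept comes from.
form_schema = merge({"area 1": area_1_fixed, "area 2": area_2})
```

Keeping the set of originating areas on each merged concept reflects the remark that this information is still needed in Step 3.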

INTERSCHEMA INTEGRATION

This phase is in principle very similar to Form Design: both phases concern a bottom-up integration of several conceptual views. During Interschema Integration, however, several problems usually irrelevant in Form Design become critical. The activities for this phase are:

(Initialization)
(1) Choose a first form schema F.
(Incremental aggregation)
(2) Repeat
  (2.1) Choose a new form schema Fi.
  (Conflict analysis)
  (2.2) Check and solve naming conflicts.
  (2.3) Check and solve type conflicts.
  (Merging)
  (2.4) Merge form schemata.
  (Enrichments and restructurings)
  (2.5) Analyze interschema enrichments and restructurings.
  (2.6) Check for redundancies.
Until
  (2.7) Schemata to be integrated are finished.

[Fig. 12. Schema for area 2 of form 1.]

[Fig. 13. Schema of form 2.]

First of all (Step 1) an order must be chosen for the schemata to be integrated, so as to anticipate the integration of schemata with higher relevance. In our example only two schemata are concerned.

The activity of Conflict Analysis, besides type conflicts (Step 2.3), checks for naming conflicts between concepts (Step 2.2). Naming conflicts are Synonyms and Homonyms. Pairs of concepts are synonyms when they have different names, but represent the same class of objects; they are homonyms when they have the same name but represent different classes of objects. In order to guide the designer in such investigation, several possible types of indications can be taken into account, i.e.:
-multiname anomalies, that occur when several names or synonyms are attached to the same concept in a schema and to different concepts in the other one;
-concept likeness, that occurs when distinct concepts have several common properties and constraints in the schemata;
-concept unlikeness, the inverse of the above.

[Fig. 14. Schema 1 after form design.]

[Fig. 15. Draft data schema.]

[Fig. 16. Operation schema example (entities include Library Paper and Referenced Paper).]

In our example, while integrating Form schemata 1 and 2 a synonymy is found between Sender and Researcher (Sender is renamed in Schema 2); a homonymy is found for Publication (renamed into Paper in Schema 2). Furthermore, different cardinalities are found in the two occurrences of the relationship between Department and Researcher: the representation in Schema 1 is considered more comprehensive and is chosen over the other. Finally, a type incompatibility is found for Author (changed into an entity in Schema 2).

Finally, redundancies must be checked (Step 2.6). Putting together the schemata in the merging phase may indeed give rise to redundant paths. Redundant paths are to be detected during conceptual design since the analysis and inspection of the information content of the schema is a typical DBMS independent activity. On the contrary, it is a task of logical design to choose which paths to get rid of. This activity could also be anticipated during Form Design, but it seems better to perform it only when a global view of the application has been achieved. In our example a redundancy occurs in the cycle between entities Grant, Researcher, Publication in case the proposing researcher can only be the one who has the grant. Notice that this specific redundancy could also be checked in the previous phase of Form Design, since the whole fragment appeared in Schema 1. The draft data schema resulting from these activities is shown in Fig. 15.
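The indications of concept likeness and unlikeness, and the search for cycles that may hide redundant paths, can be supported by simple checks such as the ones sketched below. Everything here is an assumption made for illustration: the overlap measure and its threshold, the property sets given to each concept, and the relationship names. The checks only produce hints; deciding whether a pair really is a synonym or homonym, or whether a cycle really is redundant, remains with the designer.

```python
def likeness(props_a, props_b):
    """Share of common properties between two concepts (0.0 to 1.0)."""
    union = props_a | props_b
    return len(props_a & props_b) / len(union) if union else 0.0


def naming_hints(schema_1, schema_2, threshold=0.5):
    """Candidate synonyms (different names, alike) and homonyms (same name, unlike)."""
    synonyms, homonyms = [], []
    for name_1, props_1 in schema_1.items():
        for name_2, props_2 in schema_2.items():
            score = likeness(props_1, props_2)
            if name_1 != name_2 and score >= threshold:
                synonyms.append((name_1, name_2))
            elif name_1 == name_2 and score < threshold:
                homonyms.append(name_1)
    return synonyms, homonyms


def cycle_closing_relationships(relationships):
    """Relationships (name, entity, entity) that close a cycle between entities."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    closing = []
    for name, a, b in relationships:
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            closing.append(name)   # a and b were already connected
        else:
            parent[root_a] = root_b
    return closing


# Hypothetical property sets for the concepts of the two form schemata.
schema_1 = {"Researcher": {"name", "surname", "department"},
            "Publication": {"code", "title", "type"}}
schema_2 = {"Sender": {"name", "surname", "department"},
            "Publication": {"title", "authors"}}
synonyms, homonyms = naming_hints(schema_1, schema_2)

# Hypothetical relationship names for the Grant/Researcher/Publication cycle.
closing = cycle_closing_relationships([
    ("Has", "Grant", "Researcher"),
    ("Proposes", "Researcher", "Publication"),
    ("Funds", "Grant", "Publication"),
])
```

Which relationship is reported as closing the cycle depends on the order in which they are listed; the sketch only signals that a cycle exists, which is the point at which its information content should be inspected.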

[Fig. 17. Global schema.]

4.3 Completeness analysis

Operations are now analyzed and compared against the schema in order to make the Data schema a complete representation of the requirements. As we shall see, such a comparison can help either to enrich the information content of the schema (by adding new concepts) or to change it, when earlier choices turn out to be unreliable.

In general, an operation on forms may require a combination of queries and updates; it may therefore involve both a navigation in the schema and a modification of the data base with new values. Furthermore, an operation can involve data appearing either in a single form (intraform operation) or in several forms (interform operation).

When the operation is simple, e.g. it concerns a set of attributes within a single entity or a navigation over a single relationship between two entities, the methodology suggests deriving from the draft data schema a subschema that contains the information sufficient to perform the operation. If such a subschema cannot be derived, the schema must be enriched; the enrichment is in general suggested by the derivation activity itself. As an example, consider the operation “Compute the amount of expenses for publications for a given department in a given period of time”. A simple analysis of the schema suggests adding the attribute Price to the entity Publication.

When the operation involves a complex selection pattern, the methodology suggests the following procedure:
(1) First, build the operation schema, i.e. a conceptual schema containing exactly the information needed to perform the operation. This is done to gain a better understanding of the data involved.
(2) Second, integrate the operation schema with the draft data schema, performing the activity already described as Interschema integration.

As an example of the above activity, assume that the following operation appears in the requirements: “For a given paper in the library, find: (a) the authors of referenced papers, (b) the topics of referenced papers which are in the library”. A possible operation schema appears in Fig. 16. Integrating this schema with the draft data schema gives rise to some amendments. A homonymy is found for Paper, which has to be replaced in the draft schema by the hierarchy present in the operation schema. The pairs of relationships (P-T, T-L) and (A-P, A-P) collapse; furthermore, the relationship P-C with the entity Collective and the relationship Req with the entities Researcher and Author are both assigned to the entity Library Paper.


Notice that at this stage a homonymy is indicated when an operation requires an attribute to be updated on one form and not on another. In this case the corresponding entity must be split. As a result of all these activities, the schema of Fig. 17 is obtained at the end of the design process.
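The simple-operation case of the completeness analysis can be sketched in Python as follows. This is an illustrative assumption, not part of the methodology as published: the schema is reduced to a mapping from entity names to attribute sets, and an operation to the list of (entity, attribute) pairs it needs. The attributes Title, Date and Name are hypothetical; only the entities and the attribute Price come from the example in the text.

```python
def missing_attributes(schema, required):
    """List the (entity, attribute) pairs an operation needs but the schema lacks."""
    return [
        (entity, attribute)
        for entity, attribute in required
        if attribute not in schema.get(entity, set())
    ]


def enrich(schema, missing):
    """Return a copy of the schema with the missing attributes added."""
    enriched = {entity: set(attributes) for entity, attributes in schema.items()}
    for entity, attribute in missing:
        enriched.setdefault(entity, set()).add(attribute)
    return enriched


# Hypothetical draft schema: attribute names other than Price are assumptions.
draft_schema = {
    "Department": {"Name"},
    "Publication": {"Title", "Date"},
}

# "Compute the amount of expenses for publications for a given department
# in a given period of time."
expenses_operation = [
    ("Department", "Name"),
    ("Publication", "Date"),
    ("Publication", "Price"),
]

missing = missing_attributes(draft_schema, expenses_operation)
enriched_schema = enrich(draft_schema, missing)
print(missing)
```

An empty result from `missing_attributes` corresponds to the case in which the subschema can be derived directly; a non-empty result plays the role of the enrichment suggested by the derivation activity. The sketch covers attributes only and ignores the navigation over relationships that a fuller check would also need.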

5. CONCLUSIONS AND FURTHER RESEARCH

In this paper we described a methodology for building the conceptual data schema of a data base application when user requirements are expressed by means of forms. The methodology has to be extended to express operations, events and integrity constraints at the conceptual level. Furthermore, automatic tools are needed to support the designer in this activity. A first approach in this direction appears in [6].

REFERENCES

[1] C. Baldissera, S. Ceri, G. Pelagatti and G. Bracchi: Interactive specification and formal verification of user’s views in data base design. Proc. 5th Conf. VLDB (1979).
[2] M. L. Brodie and S. L. Zilles (Eds.): Workshop on data abstraction, databases and conceptual modelling. ACM SIGMOD Record 11(2) (1981).
[3] S. Ceri (Ed.): Methodology and Tools for Data Base Design. North Holland, Amsterdam (1983).
[4] S. Ceri, G. Bracchi and G. Pelagatti: A structured methodology for the design of static and dynamic aspects of database applications. Inform. Systems 6(1) (1981).
[5] P. P. Chen: The Entity Relationship model: toward a unified view of data. ACM TODBS 1(1) (1976).
[6] B. Demo, A. Di Leva and C. Batini: A tool for automatic design of business form databases. Office Information Systems (Edited by Naffah). INRIA/North Holland, Amsterdam (1982).
[7] M. Hammer and D. McLeod: The semantic data model: a modelling mechanism for data base applications. Proc. SIGMOD Conf. (1978).
[8] V. Y. Lum et al.: Automating business procedures with form processing. Proc. Int. Conf. on Office Information Systems. North Holland, Amsterdam (1982).
[9] V. Y. Lum et al.: 1978 New Orleans data base design working report. Proc. 5th Int. Conf. VLDB (1979).
[10] J. M. Smith and D. C. P. Smith: Database abstraction: aggregation. Comm. ACM 20(6) (1977).
[11] J. M. Smith and D. C. P. Smith: Database abstraction: aggregation and generalization. ACM TODBS 2(2) (1977).
[12] N. C. Shu, H. K. Wong and V. Y. Lum: Forms approach to requirements specification for database design. Proc. SIGMOD Conf., San Jose (1981).
[13] Van Griethuysen et al.: Concepts and terminology for the conceptual schema. Preliminary ISO Rept. TC97/SC5/WG3 (1981).
