Knowledge-Based Approaches to Database Design

Share Embed


Descrição do Produto

Database Design

Knowledge-Based Approaches to Database Design

Keywords:Knowledge-based software, database design, expert sytems ACM Categories:H.2.1., 1.2.1

Introduction By: VedaC. Storey Wm.E. Simon Graduate School of BusinessAdministration University of Rochester Rochester, NewYork 14627 U.S.A. Robert C. Goldstein Faculty of Commeme and Business Administration University of British Columbia 2053 Main Mall Vancouver, BoC., CanadaV6T lZ2

Abstract Databasedesign is often describedas an intuitive, evenartistic, process.Many researchers, however, are currently workingon applyingtechniquesfromartificial intelligenceto provideeffective automated assistancefor this task. This article presentsa summary of the current state of the art for the benefitof future researchers and users of this technology.Thirteen examples of knowledge-based tools for databasedesign are briefly describedand then compared in termsof the source, content, and structure of their knowledgebases; the amountof support they provideto the human designer;the data models and phasesof the designprocessthey support; andthe capabilities they expectof their users. Thefindings showthat there hasapparentlybeen very little empiricalverification of the effectivenessof thesesystems.In addition, mostrely exclusively on knowledgeprovided by the developersthemselves andhavetittle ability to expandtheir knowledgebasedon experience. Althoughsuchsystemsideally wouldbe usedby applicationspecialists rather than database professionals,mostof thesesystems expectthe user to havesomeknowledge of databasetechnology.

Databasedesignis the task of "designingthe structure of a databasein a given environment of usersandapplicationssuchthat all users’data requirementsand all applications’ processrequirements are ’best satisfied’ "(Batini, et al., 1986).Database designis typically dividedinto four phases:(1) requirements specification; (2) conceptualdesign; (3) logical design; and(4) physical design.Therequirements specification phaseis concerned with identifying the information needsof varioususersor groups.Theconceptual design phasemodelsthe users’ and applications’viewsof informationand, possibly, includesa specificationof the processing or use of the information(Batini, et al., 1986).Theobjective of this phaseis to produce a precise,formal representationof the requirementsthat is independent of any particular databasemanagement system(DBMS)or approach. During the logical designphase,the conceptualschema is translated into the logical data modelof the selected DBMS; for example,a relational data model. Finally, physical database design transformsthe logical data modeldesigninto a formthat is suitablefor the specifichardware and databasemanagement system that are to be used.Forfurther discussionon database design, seeTeorey,et al. (1986)or Storey(1991a).Figure 1 summarizes the databasedesign process. Database designhasoften beendescribedas an art rather than a science.Traditionally, it has been carried out on an ad hoc basis by a "databasedesign expert" whoobtains information abouta user’s dataneedsthroughinterviewing, examiningexisting documents and systems, and other suchmanual,labor-intensive means (Bouzeghoub,et al., 1985). The main weaknessesof this traditional approach are that expert databasedesigners are scarce, and the design of the databaseis doneby someone who is probably unfamiliarwith thespecificapplication domain. Artificial intelligencetechnology maybeable to helpalleviate theseproblems.Givena set of re-

MIS Quarterly~March 1993 25

Database Design

quirements,an expert systemcould generatea databasedesign automatically or semi-automatically. Sucha systemwouldincorporatethe rules and heuristics used by humandatabase designersand apply themin a systematic way to eachdesigntask (Mylopoulos,1983).The rulebasedstructure of manyexpert systemsfacilitates incremental modification and makesit possiblefor themto explainhowa conclusionhas beenreachedand the assumptionson which it is based.This helps to developuser confidence in the system’s recommendations (Demoand Tilli, 1986). Usersalso benefit from the computer’sinexhaustiblesupplyof patience.It nevergetstired or boredaskingthe manyrepetitive questionsrequiredto elicit all of the attributes in a large database design.It is possible, of course,that the user maybecome boredor impatient answering all these questions. However,with an automateddesigner, whenthe user becomes tired, bored,or gets interrupted, he or shecan simply put the system"on hold" and return to it later. Fromthe developer’sperspective, anotheradvantageof this approach is that knowledge-based systemstypically employa modular,non-proceduralstructure. This allows the developerto focus on oneaspectat a time without havingto worryexplicitly abouthowit will relate to all others. In addition, a knowledge-based approach shouldfacilitate, depending uponcharacteristics of the problem:the useof heuristics, learning fromexperience,incorporationof domain-specific knowledge, andthe choiceof a designstrategy. Theobjectiveof this article is to surveyexisting experimentalknowledge-based systemsfor database design in order to provide guidancefor future research.Thecurrent state of the art is assessedby comparingand contrasting systems usinga five-dimensionalframework developed for that purpose.Thenextsectionof the article provides brief descriptions of a numberof knowledge-based systemsfor databasedesign. Then, the framework is appliedto integratethe results of the surveyto producea coherentview of the currentstate of research in this area.In thefinal section, the framework is usedagainto structure suggestions based on the analysis of past research. Therecommendations should be helpful to researchersand developers.

26 MIS Quarterly~March1993

Knowledge of application; existing formsanddocumentation

I. Requirements Specification Determines informationneeds.

Description of users’needs (often expressed as views);mayberepresented in naturallanguage or usingthe artifacts of a particulardesignmethodology.

I1. Conceptual Design Expresses all information requirements in theformof a high-level DBMS-independent model.

Conceptualschema; Oftenexpressed as E-RModel.

III. LogicalDesign Translatesthe conceptual schema into the data modelof the target DBMS,

Logicalschema; e.g., relational,network, or hierarchical.

IV. PhysicalDesign Determines the datastorage structuresandaccess paths.

Physicalschema; storagerepresentations; access paths; performance-optimizing parameter values. Figure 1. DatabaseDesignPhases

Database Design

Survey of Knowledge-Based Systemsfor Database Design Thesystemssurveyedare all concerned with the first threephases of database design.Bythe conclusionof the third phase,the contentandlogical structure of the database havebeendetermined. Thefinal stage, physicaldesign,is conceptually different because it focuseson performance considerationsandrequires knowledge abouttransaction characteristics and DBMS and hardware specifications--information that is often not available at the time a databaseis being designed. Thechosensystemsare summarized in Table 1. For further discussion, see Storey (1993b)and Storey and Goldstein(1992).

Intelligent interview system The Intelligent Interview System (12S) (Kawaguchi, et al., 1986)extractsinformationrequirementsby interviewing the end users of a database.Theintent is for the systemto learn aboutdifferent applicationdomains andaboutthe interviewingprocessas it interviews different users. 12S assumes that the end users are not familiar with computers or database technology. 12Sconstructsa modelof the user’s application domain(for example, a companyor school), calleda planstructure.Theplanstructurereflects an entity-relationship (E-R) model(Chen,1976) and consists of a set of plans, eachof which representsan activity (for example,supplying part) corresponding to a relationship.Thecurrent version of the systemdealsonly with relationships and cannothandleentity attributes. 12S then usesits domain knowledge to infer whatfurther informationmightbe needed from the user. Eventually,a relational data modelis produced. Themostinteresting featuresof the systemare its: (1) objectiveof storingandbuildinguponthe informationthat it obtainsfromeachsession;(2) employment of natural languageas its modeof input; and(3) useof an"attentionlist" consisting of things to "discuss"with the user. Thelatter is intended to prompt the userto articulate informationrequirementsthat he or she might have missedspecifying previously.

View creation system The View Creation System (VCS) (Storey, 1 Storeyand Goldstein,1988)engages 1988; the

end user of a databasein a dialoguedesigned to elicit the entities, relationships, andattributes of a user view. A user viewis the perceptionof users about what a proposeddatabase(or an ideal database) shouldcontain(Navathe,et al., 1986).Theuseris not expected to knowanything aboutdatabasedesign techniquesor terminology. At appropriatepoints duringa session,the system usesbrief tutorials to explainthe concepts of entities, attributes,andrelationships.Theuser is led to describehis or her applicationusingthe constructsof the entity-relationshipmodelwhile the systemattemptsto detectandresolveinconsistencies, ambiguities,andredundancies in the user’s input. TheViewCreationSystempromptsthe user for entities andrelationshipsthat appearto be implied but werenot specifiedby the user; for example,it searchesfor entity typesthat do not appearin any relationship andpromptsthe user to providerelationships for thoseentities.It is also capableof detecting whena user has usedthe wrongconstruct. For example,if the user providesan attribute of oneentity typethat is the name of another,the systemrecognizes it (unless the secondentity type is needed as part of the keyof the first; a keyis anattributeor set of attributes that uniquelyidentifies an occurrence of ar~ entity type) an attribute hasbeenemployed whena relationship shouldhavebeenused.The knowledgebase contains a numberof other heuristics for identifying suchproblems.Eventually, an entity-relationshipmodel of the user’s application is obtained. The system then automatically selects primary keys from candidatekeysandtranslatesthe entity-relationship modelinto a set of relations. Furtherinteraction with the useris requiredto identify andresolve functionaldependencies, anda set of fourth normalform(4NF)relations, representing the user’s view, is produced.

Computer aided requirements synthesis CARS(Computer Aided Requirements Synthesis) (Demoand Tilli, 1986) is a database designtool, the purposeof whichis to produce a preliminaryuserviewfrompreviouslyspecified entities and relationships. CARS assumes that requirementscollection, specification, and analysishavebeencompleted but that the information obtainedhas not beencombined into a

MIS Quarterly/March 1993 27

Database Design

view.CARS then carries out the aggregation process to producea single user viewbasedon the E-R model. Thesystemhasthree basic modes:(1) synthesis mode;(2) verify mode;and (3) assert mode. the synthesis mode,the systemaccepts, as input, entities andrelationships that havebeen previouslyspecified in the requirements collection phase. CARSthen derives an aggregate representation in the formof a user view.In the verify mode,the system takes an entityrelationship subschema and compares it to requirementsthat are alreadyin the system.When conflicts occur,the systemfirst tries to resolve thembasedon its ownknowledge; if this is not possible, it interacts with the user. Theassert modeusesthe sametype of input as the verify mode. In this mode, however, the system assumes that the newsubschema is correct. Any informationderivablefromit that is not consistent with the previouslyexisting viewis usedto modifythe existing one. Notethat the synthesis modeconstructs views from elementary requirements.Theother two modes really address the problemof viewintegration.

Modeller Modeller(Tauzovich,1989)is an expert system that acceptsa descriptionof the enterpriseand producesa conceptualdata modelexpressedin extendedE-Rform. Theconceptualmodelis then usedas input to a logical designmoduleto produce a relational model. Theoutput from the logical designis used, in turn, as input to a modulefor physical design. Theuser provides,as input, a descriptionof the enterprisein an English-likelanguage.Modeller translates the user’s statementsinto a correspondingset of assertionsaboutentities, relationships, and attributes, eachof whicheither addsnewconclusionsto the evolvingconceptual modelor supportsexisting facts. At anypoint the user canask the systemto take a broadlook at the conceptualmodel(a global exploration), refine it in certain ways,and/orpoint out potential semantic problems. The system has a number of waysof detectingthese. For example, it checks that eachentity is relatedto at leastone other entity. To completethe conceptualmodel, many-to-many relationships are eliminated and defaults are applied.

28 MISQuarterly/March 1993

Duringthe logical designphase,aninitial set of relationsis generated. Then,to ensurefirst normalform(1NF), a rule is applied that converts set-valued attributesinto relations. Rulesarealso applied to select primary keys from candidate keys, generatekeysfor thoserelations that do not haveany, eliminateredundant relations, and "explode" aggregateattributes (such as "address") into their components (e.g., "number," "street," and"city"). Finally, the PhysicalModeller performsfurther optimization and produces a physical design. Interesting features of this systemare the two taxonomiesthat havebeendevelopedto assist the conceptual designprocess.Thefirst classifies conceptsof the extendedE-Rmodel;for exampile, it identifiesrelationships bytype, cardinalio ty, scope, and subclass. Thesecondtaxonomy classifies the facts aboutthe designprocess;for example,it distinguishesfacts aboutthe model being developedaccordingto sourceand reliability andalso keepstrack of the currentstate of the modellingprocess.

Expert database design system based on forms The Expert DatabaseDesign System(EDDS) (Choobineh, et al., 1988)is basedon the belief that one can build a conceptualschemaof an enterpriseby analyzinga set of forms.A formis defined to be any structured collection of variablesthat are appropriatelyformattedto support data entry andretrieval (Choobineh, et al., 1988). Duringthe design process,one form is chosento be analyzed,andit becomes the basis for constructing an E-Rdiagram.Anotherform is then chosento augmentthe design, and the processcontinuesuntil there are no moreforms to analyze.Foreachform,a dialogueis initiated with the user(a database designer)to identify entities, attributes, andrelationshipsin the form fields and to determinethe minimum and maximum cardinalitiesof theentities involvedin each relationship.2 Thesystemhasrules to deal with integrity constraintsandmappings between fields within a form. Thefinal output is expressed as an E-R diagram. Tlhe EDDS knowledgebaseis divided into four sections:(1) formabstractioncontainsthe form schema andformfield flows; (2) designdatabase

Database Design

Table 1. Knowledge-Based Systemsfor DatabaseDesign System

Description

Designers

Emulatesan interviewing processin order to Kawaguchi, et al. (1986) extract from endusersinformationrequirements that are representedby frames. VCS

Elicits user views from an end user and createsa relational model.

Storey(1988); Storeyand Goldstein(1988)

CARS

Produces a preliminary user view from previouslyspecifiedentities andrelationships.

Demo andTilli

Modeller

(1986)

Creates an E-R modelfrom a user’s descrip- Tauzovich (1989) tion of anapplication.

EDDS

Creates a conceptual schemaby incrementally integratingrelatedforms.

Choobineh, et al. (1988)

CAERM

Assists databasedesigners by using its own knowledgeand makingjudgmentsabout the appropriatenessof the user’s model.

Hawryszkiewycz (1985)

AVIS

Automates the integration of individual user viewsinto a global conceptualdesign.

SECSI

Aids a designerin translating a naive description of an application into a logical design.

E-RTranslator Usesa semanticnetworkand Prolog to translate an E-Rdiagraminto a logical design.

Wagner (1989) Bouzeghoub, et al. (1985) Briand,et al. (1985)

GESDD

Acts as a shell for obtaining and using Dogac, et al. (1989) databasedesign methodologies.

CHRIS

Assists in the design of information systems that have a databasecomponent.

Furtado,et al. (1988)

EXIS

Assists in the design of a conceptualschema Yasdi and Ziarko (1987) using machine-learningtechniques.

OICSI

Generates a conceptual schemafrom a description of an applicationdomaingiven in the Frenchnatural language.

Priox and Rolland(1988)

contains the evolving schemadiagram (E-R diagram);(3) designstatus recordsthe current status andpast designdecisions; and (4) data design knowledgebase contains the database designrules.

designinformationthat alreadyexists within an organization.

Thesignificanceof this systemis that it recognizesformsas beinga useful sourceof database

The Computer-Aidfor E-R Modelling (CAERM) (Hawryszkiewycz, 1985)assists databasedesign-

Computer-aid for E-R modelling

MIS Quarterly/March 1993 29

Database Design

ers in developingE-Rmodelsusing a natural languagedialogue. Froma designer’s input aboutobjects(entities, relationships,etc.), the systemtries to constructan E-Rmodelusingits own knowledgeabout E-R modelling to make judgmentsabout the "well-formedness"of the user’s model. Basedon these judgments,the system then either suggests changesto the modelor asksthe user for additional input. The system comprises seven modules and operatesin twomodes. In the first mode,the user enters the modelwithout any initial checking; thus, the modelmaynot be well-formedwhenthe designeris finished. AnEXAMINE moduleis then usedto identify anyill-structured components. It interacts with other modules, such as CORE (checksfor non well-formedness),BUILD(makes entries into an E-Rmodel), and SUGGEST (suggests changesto the user). TheAMEND module is used to makecorrections. Changes madeby AMEND can be simple changes(e.g., movingan attribute fromoneobject to another,changing an attribute’s or an object’s name)or structural changes (e.g., convertingan attribute to anentity, or making an existingentity a subsetof a new entity). In the secondmodeof operation, the ANALYST moduleis used to suggest modelling constructs as the data are described, and the model is checked as it is entered.Asa result, the modelis well-formed uponcompletion. Themodularityof the systemand its different modes of operationmakethe systemflexible in its useby different levels of users.Anotherinterestingaspectof this system is the set of rules it contains for identifying design errors and suggestingcorrections.

Automatedview integration

system

TheAutomatedViewIntegration System(AVIS) integrates user viewsin E-Rforminto a single conceptual schema(Wagner,1989). AVISexaminestwo viewsat a time. When morethan two viewsare to be integrated, the first two are chosen arbitrarily. Then,the result of this procedureis integratedwith a third view,andso on, until all viewshavebeenintegrated. AVISworksby sequentially comparingeachobject in oneviewto heuristically selectedobjects in anotherbasedon four characteristics: name, construct,context, andmeaning (whichis a surrogatefor all otherrelevantcharacteristics).One

30

MISQuarterly~March1993

or moreconflicts maybe identified: nameconflicts (e.g., an entity is called Employee in one viewandWorkerin another);constructconflicts (e.g., Marriage is anentity in oneviewanda relationship between two Personentities in another); contextconflicts (e.g., "grade"is anattribute of Studentin oneview andof the relationship betweenstudentand coursein another); andmean#~gconflicts (e.g., grosspayis "salary" in one view and"take-home pay" in another). Notethat the first threeof theseconflicts dealwith syntax and canbe evaluatedby the systemalone, while the last dependson someunderstandingof the semanticsof the application, and, hence,requires assistancefroma human (e.g., a data administrator). In combination, thesefour general typescanyield 17distinct andmeaningfultypes of conflicts (Wagner, 1989). AVlS’s knowledgebase includes rules for recognizinginter-view conflicts, resolvingconflicts, andsequencing its searchandresolution procedures.For eachobject O1in view-l, AVIS searches for anidentical object 02in view-2.If it doesnotfind one,it infersthat 02is eithermissing or represented by an object that hasthe same meaning but is different alongoneor moreof its other dimensions.AVISusespattern-matching to identify likely candidates onthe basisof their construct,context,andattributes(if appropriate). rhe systemcontains a set of 11 solution proc.edures,whichhavebeenshownto be sufficient to resolveall possibleinter-viewconflicts.

SECSI SECSI(Systemeen Conceptionde Syst~me d’lnformations)(Bouzeghoub, et al., 1985)assumes that the viewspecificationandintegration step has beencompleted and concentrateson logical schema design. Thesystemstarts from a naive description of an application that maybe expressed using either: a formal, declarative language; a restricted subsetof naturallanguage; or a graphicalinterface. SECSI then engages the user in a dialogueto obtain the necessary input for a logical databasedesign. Fromthe user’s initial descriptionof his or her application,the systemtries to detect and resolve (with the assistanceof the user) inconsistencies,suchas synonyms and homonyms. Eventually, a semantic networkrepresentingthe applicationis produced. The nodes of the semantic network represententities and attributes, or their in-

Database Design

stances. They are connected by different categoriesof arcs: aggregation,classification, generalization, and equivalence. Database designrules, including a normalizationprocess, are thenappliedto this representation, anda set of normalized relationsis obtained.Theresulting outputconsistsof: 4NFrelationswith their keys; virtual relations(views);anddomain, referential, andinclusion constraints. SECSIinteracts with end users who are knowledgeable abouttheir application domains. A separateinterface is providedfor database designexpertsto createandmodifythe system’s rules. Provisionsare providedfor the designer to incorporatespecific knowledge aboutan application (althoughit is not clear howextensively this canbe used). Thedistinguishingfeatures of the SECSI system are its capability for dynamicmodification of designrules, incorporationof dataabstractions, andinclusionof semantic integrity constraintsas part of its output.

E-Rtranslator TheE-RTranslatoris a systemfor translating an entity-relationshipdiagram into a logical database design(Briand,et al., 1985).A semantic network andPrologare usedto translate an E-Rdiagram into networkand relational designs.Thesemantic network is usedas a meta-schema, i.e., a highlevel representation of anapplication. Its value lies in thefact that different, but equivalent, E-R diagramswill producethe samemeta-schema. Withinthe semanticnetwork,a noderepresents anentity, an arc (directedor undirected)representsa relationship,anda dot ona line indicates anexclusive-orlink. TheE-Rdiagram is modelled by a set of Prologpredicates.For example,an entity attribute is expressed as "entity-attribute (x, y, z)" where x is theentityset, y is anattribute of the entity set, andz indicateswhether the attribute is a key or a non-key.For example: entity-attribute (product,product-number, key) Similarly, predicatesare usedto representrelationship sets, relationship attributes, andthe dimension of a relationship, e.g.: dimension-relationship (warehouse-productorder, 3)

Notethat this providesa mechanism for capturing a non-binaryrelationship. This systemclearly illustrates howan E-Rmodel canbe thoughtof as a special caseof a semantic network.

Generalizedexpert systemfor databasedesign The Generalized Expert Systemfor Database Design(GESDD) (Dogac,et al., 1989)is actually composed of two expert systems--theExpert Systemfor Generating Methodologies(ESGM) and the Expert Systemfor DatabaseDesign (ESDD). ESGMis basically an automated knowledge acquisitiontool that allowsa database designexpert to specify newdesignmethodologies (or modify existing ones)oTodo so, the designerspecifies procedures for requirements specification, conceptualdesign, and logical design;however,it appears that mostof the effort is concentrated onthe logical designphase. ESDD is essentially a shell that is usedin conjunction with oneof the methodologies to produce a database designstarting fromthe requirements specification phaseand producing a logical schema for either a hierarchical, network,or relational data model. Whenusing ESGM,the database designer specifiesa methodology in termsof sets of concepts,fields, rules, andtheir inter-relationships. Concepts are the basicobjectsthat are dealt with at eachphaseof the designprocess.For a requirementsspecification methodology,"realworld object" and "interrelationship" are examplesof concepts. For a conceptual design methodology basedon the E-Rmodel,relevant concepts are "entity," "attribute," and"transaction." Logical design concepts dependupon whichlogical data modelis used.Concepts relevant to the hierarchical modelare "record" and "parent," whereas "relation" is a conceptin the relational model.A field is a propertyof a concept and hasa uniquename."Object-name" and "entity-name" are fields at the requirements specification and conceptualdesign stages, respectively.Rulesconsistof, for example, field entrancerules definingconcepts (e.g., an object namemust be unique) and schemamapping rules that indicate howa conceptualdesignis transformedinto a logical one.

MIS Quarterly/March 1993 31

Database Design

In order to use ESDD,the databasedesigner selects a methodologyfrom the methodology library andcopiesit into his or her workspace, whereis it called a meta-schema. Thedatabase designer then provides information about the enterprise for the requirementsspecification phase.(ESDD doesnot provide any substantive supportfor this phase.)Thedesignprocessis then carried out according to the selected methodology. Thesystemhasa numberof rules that ensurethe conceptual schema is syntactically correct; for example, at least oneattribute of an object is a key. Semantic aspectsof the conceptual schema are checkedinteractively with the designer;for example,the systemsimulates an add, delete, or modifyoperationand allows the designerto compare the results to whatis expected.Transformation rules are appliedto producea relational, network,or hierarchicaldesign and a set of transactions that includes add, delete, modify, and check commands.

tions (for relationshipsthat are represented by foreignkeys);and(4) repeating-attribute relations (relations that are createdto resolvepreviously definedattributes that weremulti-valued). The last phase, interface generation, uses SQL/DS commands to create tables. Thesystem then allowsthe database to be populated.Informationfromthe integrity constraintsis mapped into twotypesof rules--restrictrulesthat indicate whento reject operationsand propagaterules that indicate whatextra operationsneedto be carried out. Sometimes this requiresinteraction with the user to supplyadditional valuesor to propagateinsertions and updates.

Themostsignificant featureof this system is its support for multiple designmethodologies.

Manyof CHRIS’rules deal with trying to ensure that the conceptual designis good.Toillustrate, whenspecifying an entity E, the systemchecks that E is-an F, F hasbeendefinedpreviously,E inherits all of theattributesof F, andthe keysof E are the sameas thoseof F. Therules also deal with constraints and propagation in is-a hierarchies.

CHRIS

Aninteresting feature of this systemis that it takes the design processall the wayto implementationof the actual database.

CHRIS (ConceptsandHints for Relational InterfaceSystems) (Furtado,et al., 1988)is as an pert tool to assist in the designof information systemsthat have a databasecomponent.The systemconsistsof three phases.Duringthe conceptual designphase,the systemextracts, from a designer,an extended entity-relationshipmodel of an applicationthat includesis-a hierarchies andconstraints.Theconstraintsindicate legitimateattribute typesandkeys,referential integrity, and manyothers. The systemmight ask a designerfor additionalinformationto ensurethat a transactionspecifiedby the useris corPect.For example,a transactionshouldbe rejected if it containsa valuefor anattributethat is notwithin the rangeof legal values;for entities in anis-a hierarchy,aninstanceof a subsetentity typecannot be inserted unlessthere is a corresponding insertioninto the superset entity type. Thesystem obtains this information througha rule-driven dialoguewith the user. Thelogical designphasemapsan E-Rmodelinto a relational model.It doesso by creating:(1) entity tables(onefor eachentity type);(2) relationshiptables (for relationshipsthat are represented by separaterelations); (3) extended entity rela-

32 MISQuarterly/March 1993

F..X/S EXIS(Yasdi and Ziarko, 1987) is a systemfor assisting a user in designing a conceptual schema.Thedatabasedesign procedurestarts froma descriptionof an application.Theconceptual schema is representedas a semanticnetworkthat includesvarioustypesof abstractions, namely classification, association,generalization, andaggregation.Cardinality and attribute domainconstraints are expressedin first-order logic. An event modelcaptures the dynamic behaviorof an applicationby defining preconditions that must be true before an event takes place; for example,hiring canonly refer to a person and must have a correspondinghiring date. T.he systemincorporates somemachinelearning mechanisms. For example,givena set of data (in the form of samplerows in a relational database),the systemtries to infer functional dependencies.Another objective is for the systemto acquiredesignrules over time. This approach involves storing sampledatabase designs in an "examplebase," along with the

Database Design

design rules that were used to generate the designs.Theserules are expressed in the form of predicates.Althoughsomeelements of the approach have been implemented, more work needsto be doneso that, eventually,the system will automaticallygeneralizedecisionrules and will recognizewhennewinformationrequiresexisting rules to be updated.

those by Reiner (1992), Loucopoulos and Theodoulidis(1992), and Bouzeghoub (1992).

Themostinterestingfeatureof this system is the useof formallogic to representa design.Also, current researchis underway to applymachinelearning techniquesto the designproblem.

The13 systemswere analyzedin terms of 12 dimensionsorganizedinto five categories as shown in Table2. Theresults of this analysisare summarized in Figure 2 and are discussedin moredetail below.

Frameworkfor Analyzing Knowledge-BasedDatabase DesignTools

OICSl OICSI(Proix and Rolland, 1988)is an expert system,written in Prolog,that generatesa conceptualschema froma descriptionof an application domaingivenby the user in a subsetof the Frenchnatural language.Thesystemstarts by interpreting a natural language problemdescription andproducingthe first versionof a semantic networkcorrespondingto an intermediate conceptual schema.The nodes and arcs are divided into varioustypes that capturesomeof the semanticsof the application. Examples of nodesinclude: object types(e.g., "customer~’); operationtypes(e.g., "updateof stock"); event types(e.g., "arrival of orders"); or conditions (e.g., "balancedue$100").Arcsare dividedinto thosethat express:(1) binaryrelationships(including generalizationand association)between twoobjects;(2) cardinalitiesof relationships;(3) operationtypes(e.g., update);and(4) triggers (e.g., arrival of productimpliesupdateof stock). The systemapplies its knowledgeof database designto the semanticnetworkin order to check for inconsistencies,homonyms, andincompleteness.A normalization step thentakesplace. OICSl’s rule baseis dividedinto five types:(1) natural language analysisrules; (2) interpretationrules; (3) structuralrules; (4) validationrules; and dialoguerules. Theinterestingaspectsof this systemincludeits useof naturallanguage andcategorization of different types of arcs andnodesin the semantic network.

Other research This article reviewsa representativesampleof existing systems.Otherrelated surveysinclude

Knowledge base Thesystems’knowledgebasesare analyzedin. termsof their representationmethods,sources of knowledge,organization of knowledge,and evolvability. The most widely used approachto knowledge representationis productionrules (for example, CARS,EDDS,AVIS, GESDD,CHRIS).3 There are, however,someinteresting variations. The knowledge baseof the ViewCreationSystemincludesproceduralrules (for example,an entity mustbe identified beforeit canparticipatein a relationship)as well as productionrules (for example,if a relationshipis many-to-many, thenask the userto identify anyrelationshipattributes). CARS includes both meta-level and "ground" productionrules. Themeta-levelrules controlthe structure of the global design process. They causethe systemto focus on the subsetof the groundrules appropriateto eachtask, thereby reducing unnecessaryprocessing. The ground rules examine a relationshipto be addedandindicate howa view shouldbe updatedto incorporatethe information.Theserules are intended to reflect the investigationsthat anexpertwould performin orderto deducea set of possibleactions to be performed.Modelleremploysan extendedproductionrule format; for example,it includes a "causes"clause. Someother knowledgerepresentation techniquesare also found.12Susesframesfor planning. Although planning conceptshave been appliedto otherartificial intelligencetasks(e.g., Barr and Feigenbaum (1981)), this is the only systemreviewedthat attemptsto do so for the databasedesignprocess.Semanticnetworksare employedby SECSIand OICSIand, of course,

MIS Quarterly~March 1993 33

Database Design

Table 2. Analysis Framework

1. KnowledgeBase ¯ Knowledge representationmethod:e.g., production rules, frames, semanticnetworks. ¯ Sourcesof knowledgeembodied in the system: e.g., introspection, standardliterature (textbooks,etc.), experienced database designers,field or laboratorytesting. ¯ Organizationof knowledge embodied in the system:e.g., is the knowledgein the system categorized?If so, how? ¯ Evolution of knowlegebase: Canthe knowledgebaseevolve over time?" If so, how? Hasthis actually occurred? 2. Effectiveness ¯ Howmuchsupport does the systemprovide fo=r databasedesign? ¯ Hasthe systembeentested? Hasit applied to real-world problems? ¯ Doesthe systemmeetSzolovits’ (1986)definition of an expertsystem? Thatis, doesit work the waya humandesigner works, or does it work as well as an expert humandesigner? Is it designedto assist or replacea human designer? 3. DesignPhaseand Data Models ¯ For which phasesof the databasedesign processcan the systembe used? ¯ For which databasetechniquesand modelsis the systemsuited; for example,entityrelationship model,relational model,or semanticmodel? 4. Users ¯ Whoare the intendedusers of the system?Are lhey endusers or databaseprofessionals? ¯ Howmuchdatabaseknowledgeis the user expectedto have? 5. Application Domain ¯ Doesthe system’sknowledge basecontain information aboutparticular application areas, or is it completelydomain-independent?

the E-RTranslator. SECSI’ssemanticnetwork captures entities, attributes,variouskindsof relationships(aggregation, association,classification, generalization, andequivalence),and integrity constraints. Rulesand constraints are represented in SECSIby production rules. OICSI’s semantic network represents the database design part of the knowledge base. In the E-R Translator, semanticnetworksare used as a meta-schema from which E-R diagramsare extracted. In EXIS,both logic and semanticnetworksare employed as knowledge representation techniques. The rules contained within the semanticnetworkare usedto classify entities,

34 MIS Quarterly/March 1993

relationships, and certain data abstractions (generalizationand aggregation). The source of the knowledgeincorporated in mostsystemsappearsto be standardtextbook material, usually augmented with heuristics based on the personal experiences of the systems’developers.This observationsupports the viewof Benbasat andNault (1990)that much of the expert systemswork to date has been modelledafter an individual expertwhois often the designer of the expert system. The knowledgebase of the View Creation System also contains rules elicited from experienced databasedesigners.

DatabaseDesign

In all of the rule-based systems, it is possibleto identify groupsof rules related to particular aspectsof the databasedesigntask; for example, entity identification,cardinalitydetermination, andprimarykey selection. In addition, someof the systemsexplicitly categorizethe knowledge basecontentsin distinctive ways.12Sstores domain-specificrelationships onewouldexpect to find in a particular application.Thisis separated from the domain-independent knowledge about the interviewing process (questioning strategiesthat the systemusesto extract informationfrom a user that are intendedto correspondto those a human databasedesignerwould use wheninterviewing an end user), planning (howto searchfor unknown relationships that might be needed), and conceptual database design(howto translate a plan structureinto conceptual schema). Oneof the statedobjectives of future researchon I~S is to havea long-term memorythat would save knowledgeobtained fromindividual designsessions.Thiswouldallow the systemto skip someof the interactions with subsequent users in the samedomain.Themajor challenges will includehowto selectwhichpieces of information to retain and howto structure them.

cess). Thischaracterization of facts is uniqueto Modellerandassists in detectingpossibledesign errors. Thesystemalso distinguisheswhethera pieceof informationis system-generated or usergeneratedand usesthis to attach a degreeof confidenceto the results. (Moreconfidenceis placedon informationsupplied directly by the user.) EDDS,becauseit is based on forms, supplementsits designrules with a "form abstraction" knowledgebase containing the form schemas andform field flows. Therule set includesformselection rules for determiningthe order in whichthe formsare analyzed.

A distinctive feature of Modeller’s knowledge baseis the useof two taxonomies developed by the researchers.Oneclassifies termsrelevantto the E-Rmodel;the other stores facts aboutthe problembeing solved. Thetaxonomy that deals with E-Rmodelconstructshasentities, relationships, properties,domains, andconstraintsat its top level. Eachof theseis thenfurtherclassified. For example, relationshipsare classified according to:

Effectiveness

1. Type:ordinary(e.g., Course as characteristic of Department), primitive (specialization generalization), or virtual 2. Cardinality (i.e., one-to-one,one-to-many, many-to-many) 3. Scope:internal-internal (e.g., a recursive relationship, suchas the "bill-of-material" relationship)or internal-external 4. Subclass(e.g., specialization) Thetaxonomy that dealswith problemfacts includes modelfacts (components of the conceptual modelbeing developed)and environment facts (recordingthe state of the modellingpro-

Onlyfour of the systems reviewed appearto allow the knowledgebase to evolve over time. The View Creation System’s knowledgebase was significantly modified manually during the system’stesting program,which involved real users workingon real problems.Both SECSI and GESDD allow their knowledgebasesto be updated by (expert) database designers. The knowledgebase of EXISevolvesby "learning" throughuse.

Thecriteria for effectivenessinclude: (1) the degreeof supportthat the systemprovidesfor database design;(2) the extentto whichits performance hasbeenv~rified throughtesting; (3) and whether it qualifies as a true "expert system." First, in termsof the degreeof supportprovided, the systemsvary significantly both in comprehensiveness (howmuchof the total design processthey support)andcapability (the quality of support provided). The comprehensiveness dimension is treatedseparatelyin the next section. In this section, the capability dimension is emphasized. Mostof the systemsdeal with the basic constructs of the E-Rmodel. 12Sis a notableexceptionin that it concentrates on relationshipsbut doesnot provideanyassistancein identifying entities. Some systems"understand" and provide support for incorporating data abstractions, and two systems, SECSIand CHRIS, providea mechanism for enteringintegrity constraints.CHRIS is uniquein that it supports taking the designprocessall the waythroughimplementation to the point wheresample(or real)

MIS Quarterly~March 1993 35

Knowledge System 12S

Base

Representation Method Sources Rules& Frames

VCS

Rules

CARS

Rules

Modeller

Rules

EDDS

Rules

CAERM

Rules

Effectiveness Evolution

FunctionalityTesting

Design "Expert" System

Phase

Requirements Conceptual SpecificationMorielling

Users Logical Modelling

AVIS

Rules& Algorithms SECSI Rules& Semantic Nets ER-Translator Rules& Semantic Nets GESDD

Rules Rules

EXlS OICSI

Minimal Limited Strong Figure

2. Comparison of Knowledge-Based Database Design Systems

End Users

Domain Database IndepenProfessionalsdence

Database Design

data canbe usedto populatethe tables of the logical design. Some systemsare limited in termsof the sources of information they can use. EDDS relies on formsas a specificationof the users’ information requirements.Althoughforms wouldappearto be a useful sourceof suchinformation,they are unlikely to becomplete. In general,therefore,the designermustaugment the information obtained froma form-based analysisto obtain a workable design. TheComputer-Aid for E-RModellingchecksfor problems as guidedby its heuristics. It always seems to create separateentities to represent relationships,eventhoughthis is not necessary for one-to-oneor one-to-many relationships. The system’sheuristics probablydo not cover all aspectsof E-Rmodelling; for example,they do not appearto deal with cardinalities. Theydo, however, check for manyimportant aspects. AVIScorrectlyidentifies andresolvesall naming, construct, and context conflicts betweentwo views.It relies heavilyuponthe human designer, however,to identify andresolve meaningconflicts. TheE-RTranslatortranslatesanE-Rmodel into a relational or networkschema. Thesystem assumes the accuracyof the input E-Rmodeland doesnot containanyrulesor heuristicsfor detecting potentialdesignproblems. This is oneof the few systemsthat allows for the easy, explicit specificationof non-binaryrelationships. 12S carries out a

number of interestingfunctions but cannotbe usedas a generaldatabase design tool because it dealsonly with relationships.In orderto recognize relationships,it relies heavily uponits dictionary of verbs (representedby frames).It doesnot recognizerelationshipsthat are expressedby nouns, such as "marriage." Furthermore, the systemdoesnot dealat all with attributes of entities. 12Srequiresthe user to selecta domain froma list in its dictionary.4 The restriction to domainsrepresented in the system’sdictionary reducesthe range of applicability of this system.It does,of course,make the systemappearmore"intelligent" in the domainsfor whichit hasprior knowledge. 12Sis intendedto "interview" a user both to learn about the user’s application and to designa conceptual schema for the user’s database.Although theseare worthyaims,they haveyet to be realized. Thesystemdoesnot appearto learn about

the interviewingprocessas it proceeds fromone user to the next, nor does it augmentits knowledgeof different domainsbased upon repeateduse. CARS is a simplerule-basedsystembasedonly on standarddatabasedesign methodologies.A stated objective is to allow modificationof the system’sbehaviorby human experts, but it appearsthat this wouldbe doneby direct alteration of the rules in the system’sknowledge base. EXISis capableof inferring functionaldependencies from data and learning about database designfromdoingit. It stores the results and design decisions in a knowledgebase. An interesting feature of OICSI is that it has capabilities for interpreting requirementsin natural language,althoughin a limited form,and transformingtheminto a conceptualmodel.The ViewCreationSystemelicits requirements,expressesthemas anentity-relationshipmodel,and thentranslatesthis into a normalized relational model. Of all the systems,GESDD provides the most limited supportfor the designtask. It doesnot appearto containanydatabase designexpertise of its own, other than a basic frameworkfor representing design methodologies.In this sense,it is morelike a shell combined with an automated knowledge acquisition tool (the Expert Systemfor Generating Methodologies--ESGM) than a true expertsystem.All actual expertise must be provided to GESDDby a human databasedesign expert. GESDD does have explanationcapabilities. However, the explanations that will be providedto a user during a design sessionmusthavebeenspecified by the design expert whocreated the methodology. Second,the testing dimensionis significant because the literature reportsverylittle testing to assessthe effectiveness of these systems. EDDS and AVIShavebeentested extensively on sampleproblemsbut have not beenapplied to real-world situations. TheViewCreationSystem hasbeentested on a number of real-worldproblemswith usersof varyingskills in a variety of typesof organizations.Althoughit wasoriginally intendedfor enduserswith no database design experience, it was found that somebasic knowledgeof database management greatly facilitated the designprocess (Storey,1988).This is likely to betrue for the othersystems as well, mostof whichdo not report any testing.

MIS Quarterly~March 1993 37

Database Design

Third, although most of these systems are characterized by their developersas "expert systems,"there is little evidenceto showthat theyincorporatetrue expertiseor performat the level of a humanexpert. The View Creation Systemis the possible exception becauseits knowledgebase was developed and refined throughextensiveconsultationwith experienced databasedesignersother than the developersof the system. It has also been tested more extensively than the other systems. Althoughthe researchersdonot claim to capture expertise explicitly, both 12Sand EDDS do emulate,to someextent, an expert designerin interesting ways.Thequestioningstrategiesthat 12Susesto extract informationfrom a user are intended to correspond to those a human databasedesignerwouldusein interviewing an enduser, as is the system’suseof an attention list to remember things that needto be "discussed"with the user. EDDS emulates,in part, the behaviorof a human designerwhowouldexamineformsas one sourceof information about the requirementsof a database. Manyof the systems(for exampleI~S, CARS, Modeller, EDDS,the Computer-Aid for E-R Modelling, SECSI,EXlS, and OICSI)appearto function best as "intelligent" assistantsrather than as replacementsfor the humandesigner. In fact, SECSland the Computer-Aidfor E-R Modellingwerespecifically intendedto do so. TheE-RTranslator, although claimedto be an expertsystem,is bestthoughtof as an automated tool. Similarly, in AVIS(as in manyothers), attemptwasmadeto incorporateexpertise from humandesigners into the knowledgebase. GESDD canonly operateas well as the designer whoprovidesthe knowledge for the system.The ambitiousobjective of the EXlSresearchis for the systemto acquireenough experienceso that it can replace a humandesigner.

Design phase and data models A systemis givena strongrating for a particular designphase in Figure2 if it providesa relatively completeapproachto that phase.A limited rating indicates secondary support.Mostof the systems address morethan one phaseof the databasedesign process. Thesystemsdevelopedprimarilyfor requirements specificationtend to proceedthroughthe remainingphasesto pro-

38 MIS Quarterly/March 1993

duce a logical databasedesign. Manyof the systemsfor conceptualdesign seemcontent to consideronly that phase. Only 12Sandthe ViewCreationSystemstart at the beginningwith requirementsspecification. Nearlyall of the rest assume that requirements are knownand begin with the conceptualmodelling phase. The E-R Translator and CHRIS assumea conceptual modelhas already been developed andconcentrateon convertingit into a logical design.Only the ViewCreationSystem effectively supportsall threephases. Both CARS and AVISdeal with view integration using the E-Rmodel,althoughAVISdoesso to a muchgreater extent than CARS.AVISwas davelopedspecifically to add viewintegration capability to a morecomprehensive database designaid. It is notsurprisingthat all of thesystems that deal with the conceptualdesign phaseare basedon someform of the entity-relationship model;the relational model is employed by all thosethat perferm logical design, although GESDD could be applied to any data model.

Users The"end user" columnin Figure 2 assessesthe suitability of eachsystemfor useby individuals with no databaseknowledge.Approximately half ot the systemswerejudgedto be usableby an enduser. Noneof these,however, is suitable for a user with no understanding of database concepts. Twoof the systems,I~S and the View Creation System, are specifically intendedto interact with end users to obtain input to the requirements specification process. Both employa systemdirecteddialoguethat is generallyconsidered appropriatefor dealingwith endusers. I~S allows a formof natural language input. This is often seen as an advantagefor inexperiencedusers. Unfortunately,the type of statementrequiredas input to 12Smaynot be very natural for many users. The View Creation System uses a question-and-answer dialogueinterface together wiith menusand somesimple "structured English" expressionsfor relationships suchas "managersupervises employees." The system has beenfoundto function best for peoplewith someunderstandingof databaseconceptsand, therefore,includesa tutorial capability.

Database Design

Thedesignphasesafter requirements specification are moreeffectively performed by database professionals.The"databaseprofessional"columnin Figure2 assesses the level of supportprovided to an experienceddatabasedesigner. Systems givena strongrating providea veryhigh degreeof support,whereas thosewith a limited rating requiresignificantparticipationonthepart of the designer.As notedby severalresearchers (e.g., Bouzeghoub, et al., 1985;Demo andTilli, 1986;Storey, 1988), these systemsmight also serve as good tools for training database designers. For experienceddesigners, the advantagesof these systemsare to confirmtheir ownjudgmentandto relieve themof someof the moreroutine tasks. Someof the systems can only be used by database professionals(notably CARS, Modeller, EDDS,AVIS, E-R Translator, and CHRIS). Modeller,for example is explicitly intendedto act as a consultantto database designers.It plays a relatively passiverole, allowing the user to develophis or her ideas, andprovidesassistance only whenasked to do so. BecauseAVISaddresses specificallythe viewintegrationtask, its user is mostlikely to be a data administrator whoseresponsibility spansthe variousapplication domains.TheAVISuser interface employs windowscontaining: (1) each view being integrated;(2) queriesposedby AVISto the user; and (3) user responses,whichmaybe multiplechoiceor free-form. Thesystemhasno graphical capabilities, so the viewwindows simplylist entities, relationships,andattributes, rather than depict themas E- R diagrams.

Applicationdomain All of the systemsare application-domainindependent with the exceptionof 12S, whichasks the user to start a designsessionby selecting an applicationdomainfroma set that the system already"knows" about.It is not clear if, andhow, the systemdealswith applicationsof whichit has no prior knowledge.TheViewCreationSystem, although containing no prior knowledgeabout specific domains,hasa limited capability for learning aboutattributes that are commonly used as keysin a particularapplication.Similarly, the SECSI systemhas a facility to allow a designer to provide user-specific and domain-specific knowledge.AlthoughCARS is domain-indepen-

dent, there are plannedextensionsto incorporate domain-specific information.Anotherobjectiveis for the systemto gainknowledge as it is applied to various real-worldproblems.

Development and Research Guidelines A numberof experimental knowledge-based systems for database design have been reviewed, employing various techniques for knowledgerepresentationand reasoning. This sectionattempts to distill outof this earlyresearch someguidancefor future development efforts. Similarto the analysisin the precedingsection, this discussionis basedon the frameworkpreviously introduced.

Know~edge base It seemsclear that there shouldbe moreuseof experienceddatabasedesignersas sourcesof knowledge for thesesystems. In addition, the application of the systems to actualproblems, in the laboratoryor the real world, wouldalso help to validate the existing knowledge basesor identify areasrequiring modification. Variousresearchershaveidentified the needto capturemoreof the meaning of the information stored in a databasethrough the use of data abstractions(see, for example,Brodie, 1986; Meersman, 1988). Examples of data abstraction include generalization(Smithand Smith, 1977; Teorey,et al., 1986), aggregation(Smith and Smith,1977),andassociation(Brodie,1981). moderndatabasedesign methodology,whether manualor automated,should havecapabilities for capturingand incorporatingthese abstractions into a design.Research by DavisandBonnell (1989)hastried to capturemuch moreof the semantics of the is-a relationshipby performing an exhaustiveenumeration of all optional and mandatory subtype-supertype relationships betweentwo entity types. Othertypesof semantic relationshipshavebeenidentified by researchers in areassuchas linguistics andcognitivescience (VVinston,et al., 1987).Theimpacton database designof a portion of these(called meronymic, or part-whole)relationshipsis exploredby Storey (1991b;1993a).Provisions could also be made for capturingand incorporatingthese semantic relationships into databasedesignsystems.

MIS Quarterly~March 1993 39

DatabaseDesign

Human databasedesignersapply to their work whateverknowledge of the design processthey havegained throughtraining and experience. This is the type of knowledge capturedby most of the systemsreviewedin this article. Human designers, however,also apply their general knowledgeof the world and any specialized knowledge they mayhaveobtainedfrom previous assignments in the sameapplication area. For example,mentioninga "university" to a human databasedesigner immediately suggests the needfor entities suchas Student,Course,Department,andso forth. This knowledge allowsthe designerto be an active participant alongwith the user in the designprocessandsignificantly eases the user’s task. Noneof the systems reviewed in this article, with the possibleexception of 12S,haveanysuchcapability, andit is an area that must be addressed before these systemsbecome a realistic alternative to human designers. Ongoingresearch in commonsense reasoningfor expert systems,as for the CYC project (Guha and Lenat, 1990; Lenat, et al., 1990),couldbe evaluatedand, possibly, 5 applied. CYC is an ambitiousattemptto automatean extremelylarge collection of knowledge aboutthe world in an attempt to overcomesomeof the "brittleness"problem (relatedto the fact that expert systems performwell in a relatively narrow domain,but their performance deterioratesrapidly outsidethat domain).Theobjectiveof the CYC projectis to build a singleintelligent agentwhose knowledge basecontainstens of millions of entries of generalworldknowledge. Thedevelopers suggestthat this systemcan then be usedas a resourceto be accessedby expert systems. With respectto knowledge representationtechniques,it appears that thereis nosingleonethat can be recommended over all others. Thereare, however,certain situations whereone might be moreappropriate than another. For example, becausethe entity-relationship diagramcanbe consideredas a special caseof a semanticnetwork,it mightbe bestfor thosesystems that are basedon someform of the E-R model to use semanticnetworksas a knowledgerepresentation scheme.The databasedesign task is an iterative one;therefore,if production rules areto be employed,there needsto be somewayfor themto "undo"previousdesign decisions. This

40 MISQuarterly/March 1993

maymean havingexplicit proceduralrules, as in the ViewCreationSystem,or "meta-level"rules, as found in CARS. Systemdevelopersshould recognizethat it is veryunlikely that the initial specificationof a knowledge basewill beentirely completeandcorrect. Specializeddomainknowledge will virtually have to be acquired as the systemis used (Kawaguchi, et al., 1986).Attention, therefore, shouldbe given to capabilities for augmenting and/or modifying the knowledgebase as the systemis used. Machine-learningtechniques (Michalski, 1987), suchas learning from exampies, learningfromanalogy,etc., needto be investigated andapplied.

Effectiveness Noneof the systemssurveyedcan completely replace a databasedesigner. Their mainshortco=ming is that they lack both adequatemechanisms for capturing and reasoning with the semantics of the real worldandadequate natural languageprocessingcapabilities. Thesethings are still muchbetter performed by humans. Thus, the best applicationfor thesesystemsis to act as a consultant to an (inexperienced)database designer. The systems’ value as consultants relies in their ability to: (1) "listen" attentively a user’s needs;(2) be consistentin the application of database designrules andheuristics; and (3) bereadily availablewhenever their expertise is needed. Because usersoften havedifficulty articulating their informationrequirements, it wouldbe helpful to employsourcesof input aboutthe application beyondthe traditional user interviews. A number of alternativesexist andshouldbe investigated. Onepossibility, as shownby the EDDS system, is an organization’sset of businessforms.They shouldhelp, at least, to identify importantdata elementsand someof the relationships among them.Analysisof current database queriescould also providevaluableinput aboutthe data that is neededand common patterns of accessingit (Mizoguchi, et al., 1985). Of the systems reviewed,the only onethat attemptsto do so is 12S. Anotherpossible sourceof input to the design processis the reverseengineeringof existing databases.A large numberof databasesare in use in organizationstoday. Mostof thesewere

Database Design

probablydeveloped withoutthe benefit of a formal conceptual modelor design methodology. Someresearch has been undertakenthat attemptsto reverseengineera conceptualmodel froma relational database design(Chiang,et al., 1992; Davis and Arora, 1988; Johannesson and Kalman,1989). This processcannotbe completely successful becausemuchinformation needed for a conceptual databasedesign is not present in the implemented database.However, reverseengineeringmayprovea valuablesource of input. Except for the E-R Translator, most of the systemsaccommodate only binary relationships. Althoughbinary relationships can representa large fraction of mostapplications, non-binary relationships do sometimes occur (e.g., Project requires Employeeswith Skills). Automated databasedesign systemsshould be extendedto captureand representthese morecomplexrelationships. In doingso, caremustbe takento obtain the minimumkey required to uniquely identify the relationship.Forexample, non-binary relationshipsare usuallyrepresented by newentity typeswhosekeysconsist of the concatenation of the keysof all the involvedentity types. This canresult in an unnaturalrepresentation and, possibly, an over-specifiedkey. Knowledge-based systemsfor databasedesign couldalso benefit greatly fromdevelopingmore capability for capturingandincorporatingsemantic integrity constraintsinto a database design. In thepast,integrityconstraints that areidentified duringthe conceptual phasehaveoften beenlost duringthe translationinto the logical phase(De Troyer, 1989).Part of the problemhasbeenthe lack of a formalizedmethodology for capturing semanticintegrity constraints, evenin manual databasedesign methodologies.Researchis needed to identify the different typesof integrity constraintsassociated withentities, relationships, andattributes. Then,for eachconstrainttype, a mechanismis needed that would enable a knowledge-based systemto both captureand incorporateit into the design.Yang(1992)provides a comprehensive analysis of the problemof capturing and representing semantic integrity constraints duringthe database designprocess. Testing is an area in which muchmoreworkis needed in orderto assessthe usefulness of these systems.Giventhe diversity of the approaches andthelack of testing,it is impossible to assess:

¯ Whetherone approach,in general, worksbetter than others ¯ Whichknowledgerepresentation techniques are mostappropriateand whetherit depends on the type of problem ¯ Whetherthe data modelsused influence the effectivenessof the systems ¯ Whethersomedesign phasesare moreamenable to automationthan others Testingcouldtake the following forms: Laboratoryexperimentsto comparethe performanceof a humanexpert to that of a knowledge-based design system Comparison of one systemto anotheron the samedesignproblem(This wouldhelp to identify where rules or heuristicsare missingin one systemthat could be adaptedfrom another. Work on developing consensusknowledge bases(e.g., Trice, 1991)mightalso beapplied to compare systems,independent of particular design problems.) Field studies to examinethe actual use of thesesystems,focusingon usability, usefulness,andthe typesof usersandapplications (However, it mustberecognized that all of the systemsdescribedhere are researchprototypes that are not yet readyfor commercial use.) A system satisfies Szolovitz’s(1986)definition an expert systemif it embodies knowledge obtainedfromreal expertsor performsat the level of a humanexpert. Noneof the systemshave beentestedsufficiently to qualify onthis performance criterion. Manyof themmerelyemploythe representationaland inferencingtechniquesof expert systemswithout necessarily embodying actual expertise. In order to developtrue "expert" systems, expansion of the knowledge basesandinput sourcesis needed,as discussed earlier in this section.

Design phase and data model Noneof the systemsdescribedhere deals with the entire databasedesign process,including viewintegration andincorporationof integrity constraints.Mostphases,though,are dealt with

MIS Quarterly/March 1993 41

Database Design

reasonablywell by at least oneof the systems. Thereare also somespecific, detailed design problemsthat have beenaddressedby various researchers.Ram (1989), for example, reports anexpertsystem for the specific taskof identifying functional dependencies. It shouldbe possible to combinethe techniques developedby these different researchers into a more comprehensivesystem. In addressingthe requirementsspecification phase,knowledge-based design systemsshould containrules andheuristics for testing the consistency of eachnewinput with that previously obtainedfromthe sameor other users. It would also be helpful to be able to recognize when somerequirementshave been omitted or when userinput is implausible.Thesedifficult-sounding tasksmaybegreatly facilitated by incorporating domainknowledgeand general world knowledge into the systems’knowledgebases. Animportanttask during this phaseconsists of verifying that the set of requirements providedis correct and complete.This task requires human judgmentand a goodunderstandingof the application. In traditional methodologies, it is often performedby the designer, whomaynot have extensiveknowledge of the application domain. With a knowledge-based design tool, however, it is likely that the designwill bedoneby anapplication specialist. Furthermore, the explanatory capabilities that should be included in these systemscan be used to verify aspects of the designthat might appearquestionable. In conceptual design,the mainproblemis choosing the correct modellingconstructfor eachrequirementobtainedfromthe user. Sincethis is the point whererequirements fromdifferent users are merged,it is also necessary to identify and resolve anyinter-view conflicts. An automated designtool shouldretain the detailedinformation obtainedduringrequirements elicitation to help resolve conflicts amongrequirementscoming from different users. Thetaxonomies developed by Tauzovich (1989)providea starting point for the development of in-depth techniquesfor conceptualdesign.It appears that moreuseful work could be donein this area. Anotherarea to be explored for conceptualdesign is the use of graphicalfacilities for input andoutput. Theprincipal task of the logical designphaseis the translation of conceptualmodelling con-

42 MIS Quarter/y/March 1993

structs to corresponding onesin the logical data model.Because there are effective algorithmic procedures for this translation, knowledge-based approaches are less importantfor this phasethan for thefirst two.Thisis thepoint, though, at which semantic integrity constraintsimplicit in the conceptualmodelmustbe represented explicitly. It shouldbe notedthat at this stageof the database designprocess,thereis little benefit to having direct end-userinvolvement.Therefore,tools should be designed to assist professional databasedesigners. A user-directed style of interactionis appropriate. Mostof the systemsdescribedare basedeither directly or indirectlyontheentity-relationship data model.It mightbe useful to explorethe useof otherdatamodels, particularly thosethat provide greater semanticrichness(Hull and King, 1987; Peckham and Maryanski, 1988).

Users Althoughthere are obviousadvantages of obtaining information requirementsdirectly fromthe enduser, it appearsthat neither of the systems focusingonrequirements specificationis entirely successfulin dealingwith this classof user. Futuresystems for requirement elicitation should consider that humandatabasedesigners often give naiveusersa tutorial onbasicdatabase conceptsat the beginning of a designproject(Storey, 1988). Theyfind that a user whois somewhat farniliar with the relevant conceptsand terminology is a muchmoreeffective participant in the designprocess.Database designtools could also adoptthis strategyby includingtutorials as part of the system.In fact, sucha feature may be particularly well-suited to a knowledge-based approach. Because conventionaldatabasedesignrelies so heavily on the useof diagrams,a graphicaluser interface would be advantageous. Manyresearchersallude to this andplan to incorporate graphicsinto their systemseventually.Thelack of graphical interfaces in current systemscan probablybeattributed primarily to the amount of work required to implementthemwith present tools. Tools to support graphical input could assist in minimizing connector lengths and overlap,clusteringentity typesto makea design morereadable,andmaintainingentity hierarchies andpositioning labels (Reiner, 1992).

DatabaseDesign

Application domain Someissues relevant to this topic have already been discussed under the knowledge base dimension becausethey are concernedwith incorporating domain-specific knowledgeinto a system’s knowledgebase. Onewould expect that a truly successful databasedesign expert system should be able to operate in any application domain and accumulatedomain-specific knowledge over time, muchas a humandesigner would. Such a system could provide useful suggestions to the user as a session progressed, thereby shortening and simplifying the development process.

Endnotes Storey (1988)is the published versionof thefirst author’s dissertation, fromthe Facultyof Commerce andBusiness Administration, University of BritishColumbia, Vancouver, British Columbia, 1988. Cardinalitiesrefers to the minimum andmaximum number of instances of anentitytypethatcanberelatedto a single instance of a relationship type. Readers whoareunfamiliarwiththe variousapproaches to knowledge representation arereferredto anintroductory book onexpertsystems, suchas PigfordandBaur(1990). It is notclearfromthe literatureexamined exactlyhowthe system dealswithanapplication domain for whichit does not have anyprior information. Some research currentlyunderway to doso is discussed in Goldstein andStorey(1991).

Conclusion This article has examinedsystems that apply knowledge-based technology to the database design task. Basedon the analysis, it is obvious that, although a great deal of workhas beendone in this area, the existing systemsare generally at quite an early stage of development.Most of them deal only with parts of the overall design task. In addition, they havebeenexposedto very little real-world,or evenlaboratory,testing. Before these systems can provide serious competition for human designers, they will require muchmore comprehensive stocks of knowledge, mechanisms for extending knowledge based on experience, and improved user interfaces employingthe graphics that are a standard part of manual database design methodologies.

Acknowledgements This research was supported by Xerox Corporation, the William E. SimonGraduate School of Business Administration, and by the Information SystemsResearchBureau, University of British Columbia. The authors wish to thank the editor, associate editor, and anonymous reviewers for helpful suggestionson an earlier version of this paper.

References Barr, A. and Feigenbaum, E.A. The Handbook of Artificial Intelligence, Vol. 1, WilliamKaufmannInc., Los Altos, CA, 1981. Batini, C., Lenzerini, M., and Navathe, S.B. "A Comparative Analysis of Methodologies for Database SchemaIntegration," ACMComputing Surveys (18:2), December1986, pp. 323-364. Benbasat,I. and Nault, B. "An Evaluation of Empirical Research in Managerial Support Technologies," Decision Support Systems(6), 1990, pp. 203-226. Bouzeghoub, M. "Using Expert Systems in SchemaDesign," in Conceptual Modeling, Databases, and CASE, P. Loucopoulos and R. Zicari (eds.), Wiley, NewYork, NY, 1992, pp. 465-487. Bouzeghoub, M., Gardarin, G., and Metais, E. "Database Design Tools: An Expert System Approach," Proceedingsof the 1 lth International Conferenceon Very Large Databases, A. Pirotte and Y. Vassilious, MorganKaufmann, San Mateo, CA, 1985, pp. 82-95. Briand, H., Habrias, H., Hue, J-F., and Simon, Y. "Expert System for Translating an E-R Diagraminto Databases," Proceedingsof the 4th Entity-Relationship Conference, NorthHolland, Amsterdam,1985, pp. 199-206. Brodie, M. "Association: A Database Abstraction" in Entity-Relationship Approach to Information Modeling and Analysis, P.P. Chen (ed.), North-Holland, Amsterdam,1981, pp. 583-608.

MIS Quarterly/March

1993 43

DatabaseDesign

Brodie, M. "DatabaseManagement: A Survey," in On Knowledge Base ManagementSystems, M.L. Brodieand J. Mylopoulos(eds.), Springer-Verlag, NewYork, NY, 1986, pp. 201-218. Carsnell, J.L. and Navathe, S.B. "SA-ER:A Methodology that Links Structured Analysis and Entity-Relationship Modeling for Database Design,"in Entity-Relationship,S. Spaccapietra (ed.), Elsevier Science Publishers, Amsterdam, 1987, pp. 381-397. Ceri, S. and Widom,J. "Deriving Production Rules for Incremental View Maintenance," Proceedingsof the 17th International Conference on Very Large Databases, G.M. Lohman,A. Sernadas,and R. Camps (eds.), Morgan Kaufmann, San Mateo, CA, September1991, pp. 577-589. Chen, P.P. "The Entity-Relationship Model: Towarda Unified Viewof Data," ACM Transactions on DatabaseSystems(1:1), March 1976,pp 9-36. Chiang, R.H.L., Barron, T., and Storey, V.C. "Reverse Engineering of Relational Databases: Extraction of an Extended EntityRelationship Model from a Relational Database," working paper, University of Rochester,Rochester,NY, 1992. Choobineh, J., Konsynski,B.R., Mannino,M.V., and Nunamaker,J.F. "An Expert System Basedon Forms," IEEESoftware Engineering (14:2), February1988,pp. 108-120. Davis, J.P. andBonnell, R.D."ModelingSemantics with ConceptAbstraction in the EARL DataModel," Proceedingsof the Eighth International Conference on Entity-Relationship Approach,Toronto, Canada,October 1989, pp. 102-117. Davis,K.H. andArora, A.K. "Convertinga Relational Database Modelinto anEntity-Relationship Model,"in Entity-RelationshipApproach, S.T. March(ed.), ElsevierSciencePublishers, Amsterdam,1988, pp. 271-285. Demo,B. and Tilli, M. "Expert SystemFunctionalities for Database DesignTools,"in Applications of Artificial Intelligence in EngineeringProblems:Proceedings of the Ist International Conference,D. Sriram and R. Adey(eds.), April 1986, Springer-Verlag, Berlin, 1986,pp. 1073-1082. DeTroyer,O. "RIDL*: A Tool for the ComputerAssistedEngineeringof LargeDatabasesin the Presence of Integrity Constraints,"Pro-

44 MISQuarterly/March 1993

ceedingsof the ACMSlGMOD International Conferenceon the Manage.ment of Data, Association for ComputingMachinery,New York, NY, June 1989, pp. 418-429. Dogac,A., Yuruten,B., and Spaccapietra, S. "A Generalized Expert Systemfor Database Design," IEEE Transactions on Software Engineering(15:4), April 1989,pp. 479-491. Furtado, A.L., Casanova, M.A., and Tucherman, L. "The CHRISCONSULTANT," in EntityRelationship Approach,S.T. March(ed.), Elsevier Science Publishers, Amsterdam, 1988,pp. 515-532. Goldstein, R.C. and Storey, V.C. "Commonsense Reasoningin DatabaseDesign," Proceedings of the lOth International Conference on the Entity-Relationship Approach, T. Teorey(ed.), North-Holland, Amsterdam,October 23-25, 1991,pp. 77-91. Guha,R.V. and Lenat, D.B. "CYC:A Midterm Report," TheAI Magazine (11:3), Fall 1990, pp. 32-59. Hawryszkiewycz, I.T. "A Computer-Aid for E-R Modelling," Proceedings of the 4th International Conference on the Entity-Relationship Approach,1985, Chicago,IL, pp. 64-69. Hsu, C., Perry, A., Bouziane,M., and Cheung, w. "TSER:A Data Modelling SystemUsing the Two-Stage Entity-RelationshipApproach," in Entity-RelationshipApproach,S.T. March (ed.), Elsevier SciencePublishers, Amsterdam,1988, pp. 497-514. Hull, R. andKing, R. "SemanticDatabase Modeling: Survey, Applications, and Research Issues," ACMComputingSurveys (19:3), September1987, pp. 201-260. Johannesson,P. and Kalman,K. "A Methodfor into ConcepTranslatingRelational Schema tual Schemas," in Proceedings of the Eighth International Conferenceon the EntityRelationshipApproach,F. Lochovsky(ed.), North-Holland, Amsterdam, 1989, pp. 279-293. Kawaguchi, A., Taoka, N., Mizoguchi, R., Yamaguchi,T., and Kakusho, O. "An Intelligent Interview Systemfor Conceptual Designof Database," ECAI’86: The7th EuropeanConference on Artificial Intelligence, Conference ServicesLtd., London,1986,pp. 1-7. Kerstern, M.L., Weigand,H., Dignum,F., and Boom, J. "A ConceptualModelling Expert System,"in Entity-RelationshipApproach,S.

DatabaseDesign

Spaccapietra (ed.), Elsevier Science Publishers, Amsterdam, 1987, pp. 35-48. Kozaczynski,W. and Lilien, L. "An Extended Entity-Relationship(E2R)Database Specification and its Automatic Verification and Transformationinto the Logical Relational Design,"in Entity-RelationshipApproach, S.T. March(ed.), Elsevier SciencePublishers, Amsterdam, 1988, pp. 533-549. Lenat,D.B., Guha,R.V., Pittman,K., Pratt, D., and Shepherd, M. "CYC: TowardPrograms With Common Sense," Communicationsof the ACM (33:8), August1990,pp. 30-49. Loucopoulos,P. and Theodoulidis, B. "CASE-Methodsand SupportTools," in Conceptual Modeling, Databases, and Case, P. Loucopoulos and R. Zicari (eds.), Wiley, New York, NY, 1992, pp. 373-388. Meersman, R.A. Dataand Knowledge (DS-2), in R. Meersmanand A.C. Sernadas(eds.), North-Holland,Amsterdam, 1988,pp. vii,xii. Michalski, R.S. "Learning Strategies and AutomatedKnowledge Acquisition: An Overview," in Computational Modelsof Learning L. Bolc (ed.), Springer-Verlag, NewYork, NY, 1987,pp. 1-19. Mizoguchi, R., Kobayashi,H., Isomoto, Y., Nomura, Y., Toyoda,J., andKakusho,O. "Interactive Synthesis of ConceptualSchema Basedon Queries," Journal of Information Processing (8:3), January1985,pp. 206-216. Mylopoulos,J. "Knowledge Representationand Databases," Proceedings of the Eighth International Joint Conference on Artificial Intelligence, A. Bundy(ed.), Wm.Kaufmann Inc., LosAltos, CA,1983,pp. 1199-1206. Narate,S.B., Elmasri, I., and Larson,J. "Integrating UserViewsin DatabaseDesign," IEEEComputer,January 1986, pp. 50-62. Obretenov, D., Angelov, Z., Mihaylov, J., Dishlieva,P., and Kirova, N. "A KnowledgeBased Approachto Relational Database Design," Data and KnowledgeEngineering (3:3), November 1988,pp. 173-180. Peckham, J. and Maryanski,F. "SemanticData Models," ACMComputingSurveys (20:3), September1988, pp. 153-189. Pigford, D.V. and Baur, G. Expert Systems for Business:Conceptsand Applications, Boyd & FraserPublishingCo., Cincinnati,OH,1990. Proix, C. andRollandC. "A Knowledge Basefor Information SystemDesign," in Data and Knowledge(DS-2), R.A. Meersman and A.C.

Sernadas (eds.), Elsevier SciencePublishers, Amsterdam, 1988, pp. 293-306. Ram,S. "An Expert Systemfor Deriving Functional Dependencies fromthe Entity Relationship Model," working paper, Departmentof Management InformationSystems,Collegeof Businessand Public Administration,University of Arizona, Tucson,AZ, 1989. Reiner, D. "DatabaseDesignTools," in Conceptual Database Design:AnEntity-Relationship Approach, C. Batini, S. Ceri andS.B.Navathe, (eds.), Benjamin/Cummings Publishing Company, Inc., Redwood City, CA, 1992, pp. 411-454. Smith, J.M. and Smith, D.C.P. "Database Abstractions: Aggregationand Generalization," ACMTransactions on Database Systems(2:2), June 1977,pp.105-133. Storey, V.C. ViewCreation: An Expert System for Database Design,InternationalCenterfor InformationTechnologies Press, Washington, DC,1988. Storey, V.C. "Relational Database DesignBased on the Entity-RelationshipModel," Dataand KnowledgeEngineering (7:1), November 1991a,pp. 47-83. Storey, V.C. "Meronymic Relationships," Journal of Database Administration(2:3), Summer 1991b,pp. 1-14. Storey, V.C. "Understanding SemanticRelationships," VeryLargeDataBasesJournal, forthcoming1993a. Storey, V.C. "A Selective Surveyof the Useof Artificial Intelligence for Database Design Systems,"Dataand Knowledge Engineering, forthcoming1993b. Storey, V.C. andGoldstein,R.C. "A Methodology for the Creation of User Views During DatabaseDesign," ACMTransactions on DatabaseSystems(13:3), September1988, pp. 305-338. Storey, V.C. and Goldstein, R.C. "KnowledgeBasedApproachesto DatabaseDesign," technical report, University of Rochester, Rochester,N¥, 1992. Szolovits, P. "Knowledge-Based Systems: A Survey," On KnowledgeBase Management Systems: IntegratingArtificial Intelligenceand DatabaseTechnologies,M.L. Brodie and J. Mylopoulos (eds.), Springer-Verlag, NewYork, NY, 1986, pp. 339-352. Tauzovich,B. "AnExpertSystemfor Conceptual DataModelling," Proceedings of the 8th In-

MIS Quarterly/March 1993 45

Database Design

ternational Conference on the EntityRelationship Approach,Toronto, Ontario, October1989, pp. 329-344. Teorey,T.J., Yang,D., andFry, J.P. "A Logical DesignMethodology for Relational Databases Using the ExtendedEntity-Relationship Model," ACM Computing Surveys(18:2), June 1986, pp. 197-222. Trice, A. and Davis, R. "Consensus Knowledge Acquisition," Proceedingsof the Seventh Banff Knowledge Acquisition Workshop, B.R. Gaines and J.H. Boose(eds.), S.R.D.G. Publications,Calgary,Alberta, October1991, pp. 20.1-20.2. Wagner,C. ViewIntegration in Database Design, unpublished doctoral dissertation, University of British Columbia,Vancouver, British Columbia,1989. Winston,M.E., Chaffin, R., and Herrmann, D. "A Taxonomy of Part-Whole Relations," Cognitive Science(11), 1987,pp. 417-444. Yang,H-L. IncorporatingSemantic Integrity Constraints in a DatabaseSchema, unpublished doctoraldissertation,Universityof British Columbia, Vancouver,British Columbia,1992. Yasdi, R. and Ziarko, W. "ConceptualSchema Design: A MachineLearning Approach,"in Methodologies for Intelligent Systems,Z.W. Ras and M. Zemankova(eds.), Elsevier Science Publishers, Amsterdam,1987, pp. 379-391.

About the Authors VedaC. Storey is assistant professor of computersandinformationsystemsat the WilliamE. SimonGraduateSchoolof BusinessAdministration, Universityof Rochester. Sheearneda B.S.

46 MISQuarterly/March 1993

from Mt. Allison University, Sackville, New Brunswick(1978), an M.B.A.from Queen’sUniversity, Kingston,Ontario(1980), andher Ph.D. in management information systemsfrom the Universityof British Columbia (1986).in addition, shereceivedin 1978her Associateof the Royal Conservatory of Musicfor flute performance from the Universityof Toronto.Herresearchinterests are in the areas of databasemanagement systernsandartificial intelligence.Professor Storey’s work has beenpublished in ACMTransactions on DatabaseSystems, Information Systems R~.~search,TheVery Large DataBaseJournal and Dataand Knowledge Engineering.Sheis the author of ViewCreation: An Expert Systemfor DatabaseDesign,a book basedon her doctoral dissertationandpublishedby ICIT Pressin 1988. RobertC. Goldsteinis associateprofessor and chairman of the ManagementInformation Systems Division in the Facultyof Commerce and BusinessAdministration at the University of British Columbia. Heholdsa B.S.in physicsfrom the Massachusetts Institute of Technology and a Ph.D.fromthe HarvardBusinessSchool.Prior to .joining UoB.C. in 1974,heheld positionswith IBM, the LawrenceLivermoreNational Laboratory of the U.S. AtomicEnergyCommission, and the Laboratoryfor Computer Scienceat M.I.T. ProfessorGoldstein’s researchhas generally beenin the areasof databasemanagement, expert systems,andcomputerprivacy issues. His currentresearchis concerned with increasingthe effectiveness and usability of expert systems through the incorporation of learning and common-sense reasoningcapabilities. His most recent book, DATABASE:Technology and Management, is publishedby JohnWiley & Sons.

Lihat lebih banyak...

Comentários

Copyright © 2017 DADOSPDF Inc.