Query by diagram: A fully visual query system

May 22, 2017 | Autor: Michele Angelaccio | Categoria: Cognitive Science, Computer Software, Visual Languages

Descrição do Produto

QBD * : A Fully Visual Query System1

Michele Angelaccio°, Tiziana Catarci•, Giuseppe Santucci•

° Dipartimento di Ingegneria Elettronica Universita' degli Studi di Roma II "Tor Vergata" via O. Raimondo 38- 00173 Roma, Italy

•Dipartimento di Informatica e Sistemistica Universita' degli Studi di Roma "La Sapienza" Via Buonarroti 12 - 00185 Roma, Italy

1

Research partially supported by Istituto Centrale di Statistica (ISTAT) and by Progetto Finalizzato Sistemi Informatici e Calcolo Parallelo, National Research Council (CNR) Italy.

ABSTRACT

The need of a friendly man-machine interaction is becoming crucial for a large variety of applications, in particular, those requiring frequent extraction of information from the database. Experience suggests that traditional query languages are not friendly enough for the casual user: she/he is requested to formulate queries in a textual language, without any iconic or spatial clues to help the querying process. A new generation of languages (visual languages) has been recently investigated, that attempts to make extensive use of the person's instincts and senses. In this paper we propose a fully visual system, called Query by Diagram* (QBD*), which is based on a conceptual data model, a query language defined on this model and a graphical user interface. The main characteristics of the interface are the ease of use, and the availability of a rich set of primitives for both schema selection and query formulation. Unlike many present proposals of graphical query systems, graphical operations are formally defined from both a syntactic and a semantic point of view.

1. Introduction

A crucial point for the effective success of databases in organizations is the achieving of a friendly interaction of the user with the system. Experience suggests that traditional query languages ([11], [29], [16], [15], [22]) are not friendly enough for the casual user: she/he is requested to formulate queries in a textual language, without any iconic or spatial clues to help the querying process. Moreover, no mechanism is generally available in order to guide the user in the understanding of the database information content. A new generation of Query Languages (QL) has been recently investigated, that attempts to make extensive use of the person's instincts and senses. Recent approaches in the literature can be classified as follows: a) "intelligent" QL systems ([27], [18], [20], [24]) which use artificial intelligence techniques to make man-machine interaction more similar to communication among people; b) "graphical" QL systems ([13], [26], [28], [12], [10], [19], [7]) which use appropriate icons to represent objects, color, highlighting, and the natural "arrangement" of the objects. The system presented in this paper, called QBD* (Query by Diagram*), is located in the second class. QBD* is characterized by: a) a uniform graphical interface for both schema specification and query formulation; b) the use of an Entity-Relationship [9] Oriented Data Model; c) the availability of a relationally complete query language, whose expressive power allows also recursive queries. The main idea of the system is to provide the user with a large set of Graphical Primitives, in order to friendly extract the required information from the database schema. Such an idea stems from the observation that several difficulties may arise in dealing with previously unknown database schemas, such as: "underground" implicit information, analyze too much detailed schemas, need of restore preacquired information, etc. The available graphical primitives are formally described by means of a uniform mechanism:

a) the database is defined in terms of a pair , where G is a graph corresponding to the intentional level of the representation, and M (called interpretation) is a set-theoretical structure corresponding to the extensional level; b) the graphical primitives are functions mapping the database into a structure called the schema of interest, which is the subset of the database containing the concepts needed in the query formulation. The primitives are grouped into three classes, according to the type of objects they operate on: the graph (Location Primitives), the graph and the interpretation (Replacing Primitives), the interpretation (Navigational Primitives). Our approach allows the user to deal uniformly with the same graphical environment during all the phases of the interaction with the database (without textual intermediate). The data model is very expressive and independent from the implementation model. Moreover, the set of the expressible graphical queries includes the class of recursive queries. In [3] we show this property by defining a textual language isomorphic to the graphical one, and by evaluating its expressive power. The paper is organized as follows. In Section 2 we define the general architecture of the system. In Section 3 the definition of the data model is provided, together with some preliminaries needed for formally defining the graphical primitives, that are discussed in Section 4. In Section 5 we review the Query Language, extensively described in previous works [3], [6]. Finally, some conclusions are drawn in Section 6.

2. System architecture

In this section we present a general overview of the system, shown in fig 1: we assume that the DBMS is based on the relational model, although this is not a strict requirement. The general architecture of the system is based on three main modules:

1. The Graphical Interface enables the user to query the database. The query is expressed on an E-R schema, and is built using the graphical commands available in the interface.

2. The Translator analyzes the queries and, using the target (relational) schema, translates the nonrecursive queries into relational algebra expressions, and the recursive ones into programs in a host language. 3. The DBMS Interface translates the algebraic expression into a query formulated in terms of the target DBMS.

Fig 1: Architecture of the query environment

During the query formulation activity, the user is provided with a large set of facilities. In fact, the user may first interact with the conceptual schema to understand its information content by means of a topdown browsing mechanism; then, using the location primitives, she/he may extract the subschema containing the concepts involved in the query (schema of interest). During this phase, the user may perform transformations on the schema of interest, by means of replacing primitives, bringing it "close to the query"; in this way, a (temporary or permanent) user view is built, that may be helpful in the subsequent activities of query formulation. Finally, the user may resort to the navigational primitives in order to complete the specification of the query, defining all its procedural characteristics. A complex query may be expressed by means of a sequence of elementary operations, each one represented by a navigation or a selection on the conceptual schema; even typical textual operations (e.g. conditions on attributes) are reduced to easy graphical interactions. It is worth noting that the above interactions should be seen as a kit of tools, that will be chosen by the user according to his needs.

The activities involved in query formulation, described so far as logically distinct, can be interactively intermixed. For example, if during a navigation a concept is needed, which was not previously selected, it is possible to switch to the location phase, adding the symbol, and then going back to the navigation phase for resuming the query.

In synthesis, the activity of query formulation is composed of several phases, each one supported by a wide set of primitives (location, replacing, navigational). In the sequel, we focus on the location and replacing primitives, while navigational primitives, which will be mentioned in section 5, are described in detail in [3] [6].

3. Preliminaries

In this section we give the definitions that will be used in section 4 in order to formally describe the

syntax and semantics of the Graphical Primitives.

Let us state the following definitions: d = is the set of all domains (i.e. set of values) {D1, .., Dk} of interest. a = {a1, .., ah ) is the set of attribute names. val : a→ d is a function associating a domain to each attribute name. If A = {a1.. ak} is a subset of a , and x is an element of the cartesian product Val(a1) × .. × Val(ak), we write x[ai] to denote the component of x corresponding to ai, and we write x[A] for .

Definition (typed graph) A typed graph G is defined as a quadruple,< N(G), E(G), att , id>, where: N(G) is the set of nodes; E(G) is the set of edges, i.e. E(G) 1 N(G) × N(G); att : N(G) → 2a ; id : N(G) → 2a such that id(n)1 att(n), for all n [ N(G). If n is a node, then we say that att(n) is the type of n . If a [ a , then: n satisfies a iff a [ att(n). Let α and β be boolean expressions defined over a , we have: n satisfies α 3 β iff n satisfies α and n satisfies β; n satisfies α £ β iff n satisfies α or n satisfies β.

Definition (interpretation) An interpretation MG of a typed graph G is a mapping from N(G) to a set of value-sets (a value-set groups atomic values of a certain type). MG(N(G)) = {Mn | n [ N(G) and Mn = {x | x [ Val(a1) × .. × Val(ak) and ∪iai = att(n)}}.

Definition (Database Schema) A Database Schema S is defined by a pair: S = < GS, MS >, where GS is a typed graph and MS is an interpretation of GS. Moreover, G S and MS verify several properties. In particular, we are interested in the following properties: 1) N(GS) = EntS ∪ RelS ∪ GenS with EntS ≠ ∅, where: EntS is the set of entity nodes; RelS is the set of relationship nodes; GenS is the set of generalization nodes (subset and generalization hierarchies); 2) E(GS) ∩ EntS × EntS = ∅; 3) ∀ e [ EntS, ∀ x, y [ Me x ≠ y iff ¡ ai [ id(e) x[ai] ≠ y[ai] . 4) ∀r [ RelS , ∀e1 ..en [EntS such that (e1,r)..(en,r) [ E(GS), it holds that: id(r)= ∪id(ei). 5) ∀ g [ GenS , ∀ e1 ..en [ EntS such that (e1,g)..(en,g) [ E(GS) ¡ ef [ e1..en such that ∀ i [Mef 2 Mei 3 id(ef) 2 id(ei)]. We call ef the father in g. Definition (Function of Interest) We call Function of Interest any boolean function ranging on the set of nodes N(GS) of a database:

i : N(GS) → {0,1} The functions of interest represent the semantics of the selection operations performed by the user. Let us introduce three basic functions of interest. 1) If n is a node in the typed graph, then:

in (x) = 1 iff x = n 2) If m, n are two nodes in the typed graph, then:

im,n (x) = 1 iff x [ minpath (m,n) where minpath stands for a function computing the minimal path between two nodes in a graph (see [1]). 3) If α is a boolean expression defined on a set of attribute names, then:

iα (x) = 1 iff x satisfies α.

The above functions correspond respectively to three selection cases: picking up a concept in the schema (the node n in the typed graph); selecting two concepts and asking for the minimal connecting path; specifying a boolean expression α defined on a set of attribute names. Let i be a function of interest defined on a database schema S = < GS , MS >. We set: Ni (GS) = { n | i(n) = 1} Ei (GS) = { (n,m) | i(n) = 1 3 i(m) = 1}; [GS]i = [MS]i = {Mn | i(n) = 1} The above definitions are used to denote the subschema of S selected by i. The functions of interest may be generalized in order to allow for the automatic selection of the nodes adjacent to the originally selected ones. The resulting restrictions are: Ni (GS) = Ni (GS) ∪ { n | ¡ m such that i(m) = 1 and (n,m) [ E (GS)} ext

Ei (GS) = { (n,m) | i(n) = 1 £ i(m) = 1} ext [GS]i

ext

=

ext

(GS)}.

4. Building the Schema of Interest

In this section, after discussing the browsing mechanism, we give both the informal semantics of each graphical primitive, and the formal syntax and semantics. We also present an example illustrating the interaction of the user with the system. The example refers to a schema containing information about a university, and in particular about travels of professors and students.

Fig. 2 : Database Schema

We suppose the user is interested in the following query: "Find students taking part in a journey to Boston, which did not book any means of transport".

4.1 Top-down Browsing Mechanism

The first activity a user performs when interacting with a schema that she/he did not know previously, is understanding its information content. To simplify this activity, the system provides a library of topdown refinements for each schema, documenting the design at various levels of detail, and supports a top-down browsing mechanism. Top-down browsing may be used selecting specific objects or whole schemas, and navigating up and down among the different top-down levels of the schema in order to better locate the fundamental concepts and the links among them. It is worth noting, that the top-down browsing may be very helpful also when used in connection with the location techniques described in section 4.2.1; in this case the user may locate the desired concept in an intermediate version of the schema: such selection results into the automatic selection of all the concepts of the final schema derived from it.

We show in figg. 3 and 4 two intermediate levels of the schema in fig. 2; the top-down development of the concepts of the schemas is displayed by means of different filling.

Fig. 3: Database schema at the most abstract level Fig. 4: First refinement of the database schema

4.2 Graphical Primitives

The location and replacing primitives are two types of graphical primitives, defined as follows: Definition (Graphical Primitive) A graphical primitive (GP) p is a function from databases to databases. If S is a database p(S) is called Database Schema of Interest. Two basic classes of GPs are discussed in this section : Location primitives and Replacing primitives. The former are used to locate a collection of nodes on the basis of a given function of interest; they do

not alter neither the type of the nodes nor their interpretation; the latter replace a part of the database with a single node, which may have any type and interpretation. In the following, we assume the GPs applied to a database S = < GS , MS >.

4.2.1 Location Primitives

Location primitives are formally defined in the following way. Let S and i be a database and a function of interest respectively. Two types of location primitives are defined: Definition (Simple Location Primitive) A simple location primitive, w.r.t. i, denoted as loci , is defined as follows: loci (S) = < [GS]i,[MS]i > Definition (Enlarged Location Primitive) An enlarged location primitive, w.r.t. i, denoted as eloci , is defined as follows: eloci (S) = < [GS]i

ext

,[MS]i

ext

>.

Note that the Database Schema of Interest only depends on the function of interest without any reference to the type, and the interpretation of nodes. In other words, location primitives allow the incremental building of the schema of interest. For this purpose, the user may adopt three different strategies:

1) Direct extraction - The most immediate method of extracting the schema of interest, is to pick up the objects of the schema. This activity may produce a disconnected schema, since the schema of interest is an intermediate step for the construction of the query. In such a case, it is possible to ask the system to replace the schema of interest with the minimal connected schema (Steiner tree) covering the selected objects. 2) Expression of meta-queries on the schema - This strategy corresponds to querying the schema structure, and extracting from it meaningful information. For this purpose three methods are available:

a) Extraction by attributes - The user may select the concepts by putting some conditions on their properties. For example, the expression (NAME AND AGE) OR CITY will select all the concepts having either attributes NAME and AGE or attribute CITY in their attribute set. b) Extraction by path search - The system provides an algorithm which searches all the existing paths connecting the two concepts. The user may specify conditions on the length of the path and/or on the presence of particular concepts. Such paths are displayed to the user which decides wether include them in the schema of interest. c) Extraction using an "oil-stain" strategy - Once a symbol, or a group of connected symbols, has been selected, it is possible to enlarge the sub-schema of interest by an "oil-stain" expansion. For this purpose, the concepts which are close to the temporary schema are displayed to the user. 3) Use of a library of schemas - The system is able to manage libraries of tailored schemas: schemas (or selected parts) can be extracted and integrated with the current schema of interest, using a "cut-and-paste" strategy. Analogously, the user may add new schemas to the library.

All the described activities add concepts to the schema of interest: since uninteresting concepts could be included in the result, the system provides the user with a de-location mechanism in order to eliminate such concepts. It is worth noting, that deleting a concept may result in a schema restructuring, automatically performed by QBD*; for example, the elimination of a higher entity in a generalization causes its properties to be associated to every lower entity. We assume that in the previous phase of top-down browsing, the information content of the database has been fully comprehended by the user. So, the problem is now to extract the schema of interest needed in the query formulation.

The user may apply the location primitives in the following way:

•) the desired top-down refinement of the schema is selected (level 2 in the example); •) the system shows a set of icons, each one representing a possible location strategy;

•) the user points at the desired icon, in the example " Extraction by path search". At this point, the

starting and the ending concept (PERSON and CITY) are picked up;

moreover, TRIP is

requested to appear in the path. The schema resulting from the above operations is shown in fig. 5. Fig. 6 represents the schema of interest at the most detailed level; •) the user de-locates the uninteresting concepts (PROFESSOR and TEACHES), and selects, by means

of the direct extraction strategy, the entities RESERVATION and TICKET, and the relationships LIVES and IN, producing the schema in fig. 7.

Fig.5: Resulting schema at the second level Fig.6: Schema of interest at the most detailed level Fig.7: Final schema of interest

We may verify the correspondence between the graphical operations and the functions of interest by means of the above example (in the following we denote with RS = < G RS , MRS > the schema in fig. 2). We assume: - N(GRS) is the set of nodes labeled with uppercase words; - iPERSON,TRIP and iTRIP, CITY equal to the functions of interest involved in the "Extraction by pathsearch" in fig. 6 (note that the minimal path computed on the schema at the second refinement level, in fig. 5, automatically results, at the most detailed level, in the set of paths in fig. 6); - iRESERVATION, iTICKET, and iLIVES equal to the functions of interest involved in the "Direct extraction" in fig. 7.

At this point, the Database Schema of Interest is completely specified. Formally, it is defined as the composition of all the location primitives used: loci ° loci ° loci PERSON, TRIP

TRIP, CITY

(where ° is the composition operator).

4.2.2 Replacing Primitives

RESERVATION

° loci

TICKET

° loci

LIVES

(RS)

Once the schema of interest has been defined, the user is enabled to transform the schema itself into a new version, closer to the query expression. In this phase several replacing primitives are available. The main difference between replacing and location primitives is that the former allow for modifying the schema, possibly with loss of information content. In other words, the user may build a proper view of the schema, which is not isomorphic to any schema of interest resulting from applying location primitives.

Replacing primitives may be reversible or non-reversible. Reversible primitives preserve the information content of the extensional level of the database. For example, if a generalization hierarchy exists among entities PERSON, MAN and WOMAN, the user may want to simplify the schema by deleting the entity PERSON. Such an operation is reversible, because the extension of PERSON can be obtained by the union of MAN and WOMAN. On the other hand, the user may want to delete uninteresting information, producing a more concise schema. Operations of this kind are non-reversible. For example, if the user is interested in the states where persons are born, the attribute STATE of the entity CITY may be directly assigned to the entity PERSON, deleting the entity CITY and the relationship BORN.

A distinction must be made between primitives that operate on generalization hierarchies and other primitives. Inside a replacement area, primitives of the second type can be applied only when all generalization hierarchies have been eliminated. This choice is due to the need of formally defining the meaning of the replacing primitives, in terms of the semantics of the database schema. Therefore, we first give the semantics of the primitives concerning the replacing of generalization hierarchies. In order to specify the replacing primitives, we introduce the concept of Star(n), where n is a node. Star(n) is the set of nodes at distance 1 from n. Let g be a node in GenS, and let ef [ Star(g) be the father in g. Several strategies may be adopted for replacing the node g. We describe here the two extreme cases (other cases may be easily derived).

G1) g and the entity nodes e1.. en [ Star(g) are replaced with an entity node ef* such that: att(ef*) = att (e1) ∪..∪ att (en) | Mef* | = | Mef | ¢ x [ Mef* ¡! ek, ¡! y [ Mek such that ¢ ai [ att (ek) {x[ai] = y [ai] iff x[id(ef*)] = y [id(ek)]} G2) g is replaced with n relationship nodes r1.. rn, such that: ¢ e1.. en [ Star(g) ¡! (ef, ri), (ri,ei) [ E(GS)

Besides the replacing of hierarchies, other primitives may be applied in any order to entities and relationships, their essential characteristic is the fact that the Database Schema of Interest is obtained by replacing a portion of the schema by a single node. Such primitives are distinguished into monadic, and polyadic : 1. Monadic: only one node of GS is replaced, 2. Polyadic: a single node n together with Star(n) are replaced.

The monadic primitives allow adding or deleting an attribute and replacing one schema concept with another. Attributes can be added only when they refer to the information content of the schema; so, they are to be defined by means of a generic query on the schema itself. For example, in a schema with entities

PERSON

and

HOUSE,

linked

by

the

relationship

OWN,

an

attribute

NUMBER_OF_OWNED_HOUSES may be added to the entity PERSON. Concepts are replaced by other concepts specifying restriction conditions on their attributes. In this case, two types of monadic primitives are provided: M1) a node n is replaced with a node n' such that: att(n') = att(n) and Mn' 1 Mn M2) a node n is replaced with a node n' such that: att(n') 1 att(n) and |Mn' |= |Mn|

The polyadic primitives replace (with or without loss of information content) one central concept (n) linked to a generic number (m) of other neighbour concepts (star(n)), with a unique concept of the same kind of the neighbour concepts. Polyadic primitives fall under two categories: P1) an entity node e and the relationship nodes r1.. rn [ Star(e) are replaced with a relationship node r, such that: id(r) = id(e) ∪ id (r1) ∪..∪ id(rn) att(r) = att(e) ∪ att (r1) ∪..∪ att (rn) ¢ x [ Mr ¡ y [ Mri such that {x[att (ri) - id(ri)] = y[att (ri) - id(ri)] iff x[id(ri)] = y[id(ri)]} ¢ x [ Mr ¡ y [ Mri, ¡ z [ Mrj such that ¢ ai [ id(ri), ¢ aj [ id(rj) with i+j {x[ai] = y[ai] and x[aj] = z[aj]} ¢ x [ Mr ¡ y [ Mri, ¡ z [ Mrj such that ¢ ai [ id(ri)(id(e), ¢ aj [ id(rj)(id(e) {ai = aj and x[ai] = y[ai] = z[ai] iff ai [ id(e)}

P2) a relationship node r and the entity nodes e1.. en [ Star(e) are replaced with an entity node e, such that: id(e) = id(r) att(e) = att(r) ∪ att (e1)∪ ..∪ att (en) ¢ x [ Me ¢ ei ¡ y [ Mei such that {x[att (ei) - id(ei)] = y[att (ei) - id(ei)] iff x[id(ei)] = y[id(ei)]} ¢ x [ Me x[id(e)] = x[id(r)]

In particular, if n=2, two notable cases may occur: in the former, the path relationship-entityrelationship is transformed into a relationship; in the latter, the path entity-relationship-entity is transformed into an entity. We now describe the interaction of the user with the system, in particular referring to the schema in fig. 7:

•) the user points at the desired icon:

- Monadic case: the system first enables the user to select the concept to be replaced in the schema. Thereafter a window is shown, displaying the attributes of the concept. The "add attribute" case results in specifying the corresponding query. In the example, the user derives the attributes STARTING_CITY and ENDING_CITY for the entity TRIP, by means of the paths TRIP-TO-CITY, and TRIP-FROM-CITY. Similarly, the attributes LIVING_CITY for the entity PERSON, and TO_CITY for the entity TRAVEL are defined; - Poliadyc case: the user specifies the central concept in the part of the schema to be replaced, then the system focuses on the concepts involved in the transformation. In the example, the user collapses the hierarchy RESERVATION-TYPE-TICKET in the entity TICKET, and the hierarchy PERSON-KIND-STUDENT in the lower entity STUDENT.

The result of all previous replacing activities is shown in fig. 8. In this section the formal semantics has been provided only for the basic replacing primitives. All the other replacing primitives, informally described, may be derived from the composition of the basic ones. For example, adding the attribute LIVING_CITY to the entity PERSON may be realized by means of two steps: 1) the polyadic primitive P2 applied to the path PERSON-LIVES-CITY; 2) the monadic primitive M2 applied to the resulting entity, deleting all the attributes of CITY apart from CITY_NAME (renamed in LIVING_CITY). Generally, all the non-reversible primitives may be obtained by applying the primitives M1 and M2 to the result of the application of the other reversible primitives.

Fig.8: Schema of interest resulting from the application of the replacing primitives

5. Navigational Primitives

After the schema of interest building, the user may work on the interpretation by using the navigational primitives. Such navigational primitives constitute a graphical query language whose expressive power

allows recursion to be expressed. In this section we review such query language. The language is a visual navigation language on E-R schemas, represented by means of diagrams; the basic idea is to decompose a query into its elementary steps, expressed by a limited set of graphical operations, such as choice of icons, selection of concepts, and navigation. In order to formally express the syntax and the semantics of the language, a one-to-one correspondence between the graphical operations and the syntactic constructs of a textual query language is defined. The general structure of the query is based on the location of a distinguished concept, called main concept (entity or relationship), that can be seen as the entry point of one or more subqueries. Subqueries can be combined by means of the usual union and intersection operators. Such subqueries express possible navigations from the main concept to other concepts in the schema, in order to extract information from any number of different entities or relationships. Once the main concept has been chosen, two different types of primitives are available for selecting its instances. The first one (called FIND in the textual language) is based on the idea of following paths of linked concepts and establishing conditions on them; the second one (called BRIDGE) is used for joining two concepts without following schema links, just comparing their attributes. Such

two

primitives can be arbitrarily nested. If the previous constructs are applied to generalization hierarchies the following rules hold: if the query involves a child entity, this entity inherits all logical relationships and attributes coming from the parent entity, in addition to its own links. On the other hand, if the involved entity is the parent entity, it does not inherit the properties of child entities. In the following, we give the informal semantics of each graphical primitive, showing the interaction of the user with the system in the navigational phase, and referring to fig. 8. First, the user selects all the students taking part in the journey in Boston; after that, she/he extracts from the above set of students those which have completed the reservation of tickets. To find such students it is necessary first to select all booked trips, then to compute the transitive closure of the result, looking for the chains having the attribute FROM_CITY equal to the attribute LIVING_CITY, and the attribute TO_CITY equal to "Boston". The difference between the two previous student sets give the desired result, i.e. the set of students that have not completed the reservations.

The general sequence of the user operations is the following: •) selecting the icon corresponding to "navigation"; •) choosing the main concept of the query, by picking it up with the mouse (the entity STUDENT in

the example); •) extracting the desired instances from the main concept, specifying possible conditions on its

attributes. The list of attributes is shown in a separate window, containing also the elements involved in the comparison (i.e. constants, other attributes, etc.), and a set of icons suitable to formulate conditions on the attributes. Conditions are expressed selecting the attributes and the icon corresponding to the required operator (conditions on the attribute TO_CITY of the entity TRAVEL are visualized in fig.9). The system shows the result of the operation by displaying a graph, where vertices correspond to attributes, and labelled edges correspond to the operators; •) extracting the desired instances from the main concept, specifying conditions on different concepts.

This operation may be carried out either by picking up an existing path of concepts, or joining concepts, by comparing their attributes using the windowing mechanism described above. In the example the former mechanism is used, the first selected path is STUDENT-INVOLVEDTRAVEL, giving as result the set of students taking part in the journey in Boston.

The system manages differently the attributes belonging to the main concept with respect to the others. More precisely, the attributes of the main concept are automatically considered in the final result of the query (unless explicitly deleted by the user), while the other ones are shown in the result only if requested. The reason for that is twofold: first an unsuitable growth of the result is avoided; second, since we consider the main concept as the "heart" of the query, its attributes are given higher importance with respect to the otherones.

Fig.9: Conditions on the attribute TO_CITY of the entity TRAVEL

It is worth noting that QBD* allows performing non-first order queries by means of a simple graphical mechanism, which is the same underlying the BRIDGE operator. In fact, an operator computing the

generalized transitive closure (GTC) [21] is available (the GTC is a transitive closure of an entity, in which the cycle conditions are extended to be boolean expressions with equality and inequality operators). Such cycle conditions are equivalent to particular BRIDGE conditions where an entity is compared with itself, therefore the window used for the GTC looks like the one adopted in the BRIDGE definition, the difference being the pre-selection of a particular icon, that signals the beginning of a recursive session. In such a session the user may: •) select the desired entity; •) specify the cycle conditions on the attributes of the entity. In the example, the cycle condition is the

equality of the attributes FROM_CITY and TO_CITY (see fig. 10); •) specify a set of conditions in order to select a subset of the result of the transitive closure.

Notice that in this case it is possible to compare a single attribute value with a set of values, which can also refer to other attributes. The main idea is to visualize the generic step of recursion (corresponding to a cartesian product) displaying a double list of the same attributes; the user may specify the cycle conditions comparing the attributes with each other. Then, the system determines the attributes involved in the projection (shown in reverse in fig.10).

During the query formulation the following features are also available: •) building new attributes either by manipulating existing attributes, or by renaming them. For this

purpose a set of functions typically supported by advanced query languages, such as SQL [25], is available (COUNT, SUM, AVG, etc.), in addition to the mathematical operators. A new attribute may be defined by picking up a suitable icon, possibly using the primitives described above for referring to attributes belonging to other entities; •) storing queries, or parts of them, in a query library. Each query is stored together with a label,

automatically generated during the query formulation. Such a label, expressed in a restricted natural language, is used as an echo, to check the correctness of the query generated by the system. When the stored query is retrieved from the library, the system displays it as an entity

whose name is the label of the query. Such new entity may be linked to the schema either by means of the BRIDGE operator or by means of the set operators;

Fig.10: An example of transitive closure

•) selecting the icon corresponding to the set operators (union, difference, intersection), for composing

different subqueries (the operator "difference" is used in the example for computing the desired set of students).

6. Conclusions

We have presented the query system QBD*, that allows the user to interact with a database in a fully graphical way. Moreover, we have defined an Entity-Relationship Oriented Data Model within the context of a more general graph-based model. On the basis of such a model, we have formally described the syntax and the semantics of a set of graphical primitives, which have been implemented in the system. It is worth noting that the formalization mechanism is not strictly dependent on the particular graphical primitives of QBD*. Rather, it may be adapted in order to describe graphical primitives working on a generic diagrammatic representation.

A prototype of the QBD* has been developed at Dipartimento di Informatica e Sistemistica of the University of Rome "La Sapienza"; the C language is the development environment, operating system MS-DOS; the graphical interface is supported by the tool PAINTER, part of the MAST_ER Case tool by GESI s.r.l. [17], using algorithms for automatic layout described in [23].

Further research will be devoted to the extension of the system for dealing with statistical data. In this case, the query must allow the definition of aggregation and statistical operations; the graphical interface is based on the model defined in [5] .

Acknowledgements

We are grateful for the encouragement and support of Carlo Batini. We would like also to thank Maurizio Lenzerini for many extremely helpful suggestions, ideas, and comments.

References

1. A.V.Aho, J.E.Hopcroft and J.D.Ullman, The Design and Analysis of Computer Algorithms, Addison Wesley, 1974. 2. A.V.Aho and J.D.Ullman, Universality of Data Retrieval Language, Proc. of the 6th ACM SIGACT-SIPLAN Symposium on Principles of Programming Languages, 1979, pp. 110-120. 3. M.Angelaccio, T.Catarci and G.Santucci, QBD*: A Graphical Query Language with Recursion, Technical Report N.31.88 of the Dipartimento di Informatica e Sistemistica, Universita' di Roma "La Sapienza", dicembre 1988. 4. M.Angelaccio, T.Catarci and G.Santucci, QBD*: A Graphical Query Language with Recursion, Proc. of the 3rd Human-Computer Interaction Conference, Boston, 1989. 5. C.Batini and G.Di Battista, Design of Statistical Databases: a Methodology for the Conceptual Step, Information Systems, Vol.13, 4, 1988. 6. T.Catarci and G.Santucci, QBD: A Graphic Query System, Proc. of the 7th Entity-Relationship Conference, Rome, November 1988. 7. A.K.Chandra and D.Harel, Computable Queries for Relational Databases, Journal of Computer Sciences, Vol. 21, 1980, pp. 156-178. 8. Visual Languages, (S.K.Chang ed.), Plenum Publishing, New York, 1986. 9. P.P.Chen, The Entity-Relationship Model toward a Unified View of Data, ACM Transactions on Data Base Systems, 1976. 10. B.Czejdo and D.W.Embley, An approach to computation specification for an entity-relationship query language, Proc. of the 6th International Conference on Entity-Relationship Approach (S.March ed.), New York, 1987, pp. 307-322 ,1987. 11. C.J.Date, An Introduction to Database Systems, Vol.I, Addison-Wesley Publishing Company, 1987.

12. R.Elmasri and J.A.Larson, A Graphical Query Facility for ER Databases, Proc. of the 4th International Conference on Entity-Relationship Approach, Chicago, Illinois, 1985, pp. 236-245. 13. R.Elmasri and G.Wiederhold, GORDAS : a formal high-level query language for the entityrelationship model, Proc. of the 2nd International Conference on Entity-Relationship Approach, Washington, D.C., 1981, pp. 49-72. 14. M.Jarke and Y.Vassiliou, A Framework for Choosing a Database Query Language, ACM Computing Surveys, Vol.17, 3, 1985, pp. 313-340. 15. V.M.Markowitz, ERROL : an entity-relationship role oriented query language, Proc. of the 3rd Conference on Entity-Relationship Approach, 1983. 16. H.M.Markowitz, A.Malhotra, and D.P.Pazel, The ER and EAS Formalism for System Modeling, and the EAS-E Language, Entity Relationship Approach to Information Modeling and Analysis (P.P. Chen ed.), ER Institute, 1981, pp. 29-47. 17. MAST_ER User Manual, GESI Gestione Sistemi per l'Informatica, Roma, 1987. 18. A.Motro, Query generalization: A technique for handling query failure, Proc. of the 1st International Workshop on Expert Database Systems (L. Kershberg ed.), Island, October 1984, pp. 314-326. 19. T.R.Rogers and R.G.G.Cattell, Entity-Relationship Database User Interfaces, Proc. of the 6th International Conference on Entity-Relationship Approach (S.March ed.), New York 1987, pp. 323-336. 20. R.C.Schank and S.Slade, Advisory systems, Artificial Intelligence Applications for Business, W.Reitman (ed.), Norwood, N.J., 1984, pp. 249-265. 21. S.Sippu and E.Soisalon-Soininen , A Generalized Transitive Closure for Relational Queries, Proc. of the International Conference on Principle of Database Systems, 1988. 22. K.Subieta and M.Missala, Semantics of query languages for the entity relationship model, Proc. of the 6th International Conference on Entity-Relationship Approach (S.March ed.), New York 1987.

23. R. Tamassia, G. Di Battista, and C. Batini, Automatic Graph Drawing and Readability of Diagrams, IEEE Transactions on Systems, Men and Cybernetics, 1988. 24. F.N.Tou, M.D.Williams, R.E.Fikes, D.A.Enderson, and T.W.Malone, Rabbit: An intelligent database assistant, Proc. of the National Conference on Artificial Intelligence, Menlo Park, Calif., 1982, pp. 314-318. 25. J.D. Ullman, Principles of Database and Knpwledge-Base Systems, Computer Science Press, 1988. 26. H.K.T.Wong and I.Kuo, GUIDE : Graphical User Interface for Database Exploration, Proc. of the 8th VLDB Conference, Mexico City, 1982,pp. 22-31. 27. A.W.Woods, Natural language communication with machines: an ongoin goal, Artificial Intelligence Application for Business (W.Reitman ed.), N.J., 1984, pp. 195-209. 28. Z.Q.Zhang and A.O.Mendelzon, A Graphical Query Language for Entity-Relationship Databases, Proc. of 3rd International Conference on Entity-Relationship Approach, 1983. 29. M.M.Zloof, Query-by-Example; a database language, IBM Syst.Journal, 1977.

View publication stats

Lihat lebih banyak...

Query by diagram: A fully visual query system

Descrição do Produto

Comentários