A graph based knowledge retrieval system

Share Embed


Descrição do Produto

SIGDOC

Graph-Based

Retrieval Yuri

of Information

Quintana,

Mohamed

Department

of

Waterloo,

Current

hypertext

systems

to browsing),

them

farther

text

Retrieval

away

from

the information

System)

is comprised

means

users can get disoriented that

allows

of an authoring

they

users to retrieve

system,

a browser,

Waterloo

specific

in large hypertext

seek. This

paper

information

Lo

Engineering

Canada

for finding

Systems

Andrew

Design of

Ontario,

have no intelligent

Kamel,

Systems

University

(as opposed

in Hypertext

92

N2L

3G1

information.

documents

describes

an information

in hypertext

and a graph-based

When

searching

for specific

and may end up following retrieval

documents

information

based

retrieval

system

called

on its semantic

facility.

The

information

a path

that

takefs

HRS (Hypercontent.

graph-baaed

HR.S

retrieval

facility

allows users to retrieve specific information in hypertext documents by posing English language queries. The retrieval Graphs, a knowledge representation scheme. The English language queries posed is based on the use of Conceptual by users are automatically converted to Conceptual Graphs by a parser. The information in hypertext documents is also

facility

represented

using

by a semantic

Conceptual

baaed

Graphs.

search.

needs to find specific document. The paper

This

information concludes

both the browsing capabilities

1

Query

technology

processing is useful

and does not bow that natural language

of hypertext

is treated for retrieval

Hypertext

[1].

that

and the semantic search capabilities

formation

that

(GUI)

will

A user may view

HyperCardR

allow

that

users

Sys-

of nodes

and retrieval

knowledge

of natural

is performed

domains

language query processing.

to display

a hypertext or just

document

document

in hypertext

to address

such as Keyword

this

[15, 1], and Document ods requires

the in-

the hypertext

document.

in order

to

is usually

non-

documents

is

highly interconnected into a web-like pattern, and links allow users to selectively choose the order of nodes to be viewed.

by providing

tools

[12, 28], Recency

Markers

user to either

the

document

links

problem

Searching

the hypertext

to ezplore the contents

[22]. Browsing

since the information

have tried

the

connect

nodes and select hypertext nodes.

information

of a hypertext

links

its way

graphical

have

is comprised

and

systems

that

as Apple’s

document

in hypertext display other

find qwcijic linear

such

information

Hypertext

14, 25, 38] has made

applications

A hypertext

contain

nodes.

[10,

computer

user interfaces tem

in large

Current hypertext systems have no intelligent mearus of finding speci$c information. Many hypertext system:~

technology

many

matchingprocess,

where a user the organisation of the hypertext document or the words used in the retrieval of information in hypertext documents can provide users with

Introduction

into

as a graph of information

History

[4]. Each of these methknow

the words used in

or to know the organization

of

Keyword Search methods are only useful if the user has knowledge of the words used in the document. Evidence of this can be found in the evaluation of Superbook [12], a hypertext system with a rich indez keyword search facility.

Statistics

and asked to find pertext document

students

were given questions

answers to these questions in a hyon statistics. When questions con-

Hypertext has been shown to be a useful way to [15]. However, when searching for browse information

tained words that did not occur in the neighborhood of the target information, users took longer and tended to give up searching without finding the desired information more often than users with conventional docu-

specific

mentation.

information,

users can get disoriented

and may

is in relation to other nodes in the document. Some studies [12, 24] have shown that users may end up fol-

Recency History methods display a recent history of nodes visited. Sometimes recent nodes visited are displayed as a map [15]. Document Markers (sometimes

lowing

called

no longer have any indication

a path

that

of where the current

takes them

farther

node

away from

the

information they seek. Users can become frustrated and confused in the web of information. This situation has been called hyper-chdter [10] and it is especially prevalent when using large hypertext documents.

from

@

1992 ACM

089791 -533-W92/0010/0157

$1.50

anywhere

) can also be used to mark

users can quickly in the hypertext

History and Document Markers user is searching for information viously viewed information.

P~mcqytimtfea

mFdti&dkm+tibthe copies at. net reads or dkibutcd for dixect -acid advantsgs, rhe ACM wpyisht nodcx and the tide oftlu pMication md b date.w==. ~d n~= iS @UI tit COPY%~ by -on of the Aswcialim fa GxqnUing kfschmy. To COQyothsrwi% or to Rpubhi-1, Xequilu afss Snd/or SpeCmcprmisim.

Bookmarks[4]

cific node so that

an

a spe-

go to a marked node document.

Recency

are useful only if the that is similar to rwe-

of In this paper, we report on the development intelligent graph-based approach for representing

and retrieving

specijic

information

in hypertext

docu-

157

S/GDOC

92

ments. HRS (Hypertext Retrieval System) uses concep tual graphs [29] to represent the information in hyper-

The display of a hypertext document in the version of HRS is shown in FigMicrosoft’ Wlndows=w

text documents. A user can ask a natural language question regarding a topic and the system tinds the nodes in the hypertext document that answer the question.

a CO1OUI monitor. When the cursor is moved to an area with a link, the cursor changes from an arrow to a

HRS consists ●

A

of the following

Browsing

lows

Systern.

The browsing

users to display

follow

hypertext

components:

a hypertext

links

system

document

in a non-linear

aland

ure 1.

Links

to other

to pose queries

on documents. Keyword

The Keyword ●

A Keyword keywords



A

Graph-Based

cility. tic

This

based

retrieval Each

glish

which

language

node

as a graph

trieved,

in

the

Graphs. matching

a seman-

hypertext

document

in

by users

Query

Conceptual base.

Once

is treated

a node

a user can browse the node, follow

pose additional

is re-

links, or

queries.

HRS is an improvement tems because it providea ing capabilities

En-

are converted

processing

process.

Fa-

in a hypertext

in a knowledge posed

over existing hypertext sysusers with both the brows-

of hypertext

and the semantic

search

capabilities of natural language queries. The paper is organized as follows. Section 2 describes the user interface and gives examples of browsing and searching in HRS. Examples illustrate the problems encountered by users when searching for specific information by following hypertext links or using keyword search. Section 3 describes the graph-bssed retrieval facility in HRS that is based on conceptual graphs. Section 4 describes the implementation of HRS in Microsoft Windows and Prolog. Section 5 gives a comparison other knowledge retrieval systems. conclusions system.

and future

extensions

enter keywords ument.

performs

representation

is stored

searches for

Retrieval

information

queries

to Conceptual

This

document.

facility

of the

a corresponding

Graphs

Facility.

Information

search

document. has

Search

in the hypertext

between HRS and Section 6 includes to the current

HRS

are displayed

in red on

‘hand” icon. A single click on the mouse button will display a new node, namely the node that is linked to the highlighted text area. The Search command allows users are supported:

sequence.

nodes

Two

Search facility that

The Query

retrieval

methods

and Knowledge-Based

are to be searched Result

Search.

in HRS allows

dialogue

users to

for in the doc-

box lists the nodes

that contain one or more of the keywords that occur in the query. This dialogue box allows users to select a new node for display. Users often have problems finding information with keyword searches. For example, consider a user who is trying to find the answer to the question ‘How do I read a line from

a jile

?“ in the

One

is the

description

answer

fgets.

However,

fgets function stream”.

of the

the hypertext states

The

C programming

node that

“fgets gets a string

keywords

from

language.

C library

function

describes from

the question

the

an input would

not

match the words in the hypertext node that describes fgets. The word read corresponds to get, the word line corresponds to the word st~”ng, and the word jile corresponds to the words input stream. However, this correspondence

applies

in this situation queries.

and may not necillustrates

that

essarily

be true for other

keyword iar with

searching is only useful when the user is familthe worda in the hypertext document.

This

The graph-based information retrieval facility allows users to enter a natural language query. The information in hypertext nodes is represented in a knowledge base and the knowledge and compared

with

base is semantically

the query.

The words

searched used in the

query need not be the same as the words used in the nodes that are returned. The processing of queries is based on a semantic-based edge representation [29].

The

approach

formalism

semantic-based

that

uses a knowl-

called conceptual

approach

graphs

to information

re-

trieval in hypertext domains is a more accurate and precise retrieval method than existing keyword meth2

The

HRS

System

This section discusses how users interact with HRS to browse a hypertext document and retrieve information. HRS is currently running on three operating system platforms:

PC

DOS’~,

MicrosoftR

Windowsrx

3.0,

and Unix=~ [3]. The Microsoft Windows version allows users to display nodes and select links using the mouse. Dialogue boxes are used to allow users to pose natural language questions for information retrieval. The Unix and DOS versions of HRS input to select links.

158

currently

use line-oriented

ods. HRS is designed with a layered architecture. user interface is separated from the application by an Applications Programming Interface (API)

The code layer.

This allows the application code, which includes the search facilities and database management system, to be moved other computer platforms and new interfaces can then be integrated. Figure 3 shows the HRS architecture. Subsequent sections of this paper describe the design and implementation facility in HRS.

of the graph-based

retrieval

SIGDC)C

92

a

.EiIe

~ome

search

Qoto

Help

C Library

INDEX TOPIC

DESCRIPTION

ftLLOCftT 10N CHIIIWCTER CONUERS 10N DNE DEU I CE DI RECTORY Dos FILE

tteamry Relocation Character ttanagement conversion Time and Date

Device

Dos File Hanagenent Heap Hanagenent Interrupt Handling Ilath & Trig Functions tkmory Ilanagement Miscellaneous Ilultibyte Characters ~~:ng System 1/0

HE(iP INTERRUPT ?tftTH HEflORY ruse ttULTIBYTE OPEIMTING PROCESS STREM STRING Other

areas

1/0

Directory

stream

1/0

String

ttanagemant

of

Help:

C_Language_Syntax

C_HELP_WIN_PtWEL

WtTCOFt_C

b

Figure

1: Browsing

a

a Hypertext

Document

WA

\hrs\c.hlp

~le

~ome

search

~oto

Help

INDEX

C Library

TOPIC ftLLOC9TION CIMRRCTER CONVERSION DflTE DEU I CE DIRECTORY

Itenory Charac Convers Time an Deu ice Directo]

Dos FILE HEfIP

HULTIBYTE OPERMING

PROCESS STR131H STRING areas

What

a natural

of

Help:

language

question

such

as...

is . . . ?

Howmany...?

H?e ttal Howdolreadal~ne Heap Intarru tlath & ?lemor ttisce f: 1 Ilult ibyt Operat ir l-q Process Straam String Ra nagenbcvw

INTERRUPT HftTH ttEHORY ttISC

Dther

Enter

+

fromafile?

C_Language_Syntax

& ●

-I

WITCOtl_C

CJtELPJ’lRINflfNWL

7 I r.L~

I*

!

Flgure2:

Natural

Language

Query

Facility

159

SIGDOC’92

Application

Graph Based Knowledge

Programming Interface Natural Language Query

*

System

El

H Keyword

❑ l-n n A

Retrieval

Retrieval

System

Keyword

.

.

Search

-

Dkplay Node

End

\ Hypertext

Database Management

n c1

System

Hypertext Oocument

n n

user

Open Hypertext Document

Node Index

close

Hypertext Document

Figure

3

Graph

Based

Knowledge

3: The HRS System

Re-

trieval to improve

One approach is to provide language in

a natural

interface

a natural

language

the retrieval

language

allows

users

of information

interface. to

such as English

to specijy how to tlnd the information.

information

without

having

Semantic

search

users’ requests

with

To date, most of the existing natural language interf&ccs for retrieving information have focused on interfaces to database systems [7, 11]. These systems provide the novice user with a means of accessing a database in a natural cryptic

language without requiring the user to learn the commands generally associated with database

query languages.

Examples

have been considered

of application

include

airline

domains

fight

that

information

[36], statistics of lunar rocks [37], airplane maintenance data [31], and naval ship data [16]. However, these syatema provide data retrieval as opposed to knowledge retrieval. Few natural language interfaces oped to retrieve information from [13, 18,27, procedures,

160

35]. Knowledge objects,

have been knowledge

can be description

or general

facts

The representation of knowledgel is therefore more complex than of data in databases. In [19] we introduced a method for knowledge retrieval based on matching conceptual tneval).

A natural

request

methods can then be used to match the information they seek.

:=:



H=

about

develbases

of events, the world.

grapk,

GKR

Integrating GKR to retrieve

(Graph-based

as a component

information

from

Knowledge

R.e-

of HRS allows users

hypertext

documents

us-

ing natural language queries. A knowledge base ia used to represent the information in a hypertext document. GKR interprets natural language queries using a semantic interpreter and performs a semantic based search of the knowledge base using a graph matching process. The following sections describe the GKR facility artd gives examples of knowledge retrieval from a hypertext document

on the C programming

Creating

3.1

The semantic

that

represents

Conceptual

interpreter

icon, and a join guage sentence,

language.

is comprised

Graphs of a parser,

a lex-

algorithm. The input is a natural lanand the output is a conceptual graph the meaning

of the sentence.

The parser grammatically analyses tences, both declarative and interrogative, lFor a survey see [5, 6].

English senand produces

SIGDCIC

a syntactic parse structure for the sentence. The parser has the abfity to parse the sentence into one or

be parsed using the DCG rules shown in Figure 4. The parae tree is shown in Figure 5. The nodes of the tree

more syntactic

are syntactic

parse structures.

tures may reflect

different

Different

interpretationa

parse strucof a sentence.

structures

of the sentence.

the tree are the actual

consider the simple sentence: “John seea The two possible parses would Bti with a telescope”. show that, in one case, John has the telescope and seea

natural

language

92

The leaves of words.

For example,

Bill;

in the other,

Bill

has the telescope

sent ence / noun-phrase

and is seen by

John.

/

/

GKR allows a programmer to deline a Dejinite Clawe Grammar (DCG) similar to that of [26]. These clauses specify the syntactic structure of the English sentences these to be analyzed. A DCG converter transforms DCG rules into Prolog [2] rules that are then executed by the parser. A simple example of DCG rules and their corresponding

sent ence noun.phras

Prolog

-->

rules is given in Figure

noun-phrase

e -->

noun-phrase

--> * You ~ .

noun

-->

‘files

~non_hidden

verb_phrase

-->

verb,

verb-aur, __> )c~).

verb_inf

-->

~erase]

verb_obj

-->

noun_phrase.

noun-phrase(Sl

noun( noun(

verb(

:- adjective(Sl noun(S2, S3) .

,S3)

S1 ,S3

)

:-

verb-aux(

[canl

verb-inf(

[erase

:-

noun(Sl

,S2),

canonical

verb_obj

(S1 ,S2)

Figure4:

lSl], :-

DCGRules

5: A Parse Tree Structure

_

associated with a word (such as the noun), a word also has a conceptual

,S2).

1S11

:-

,S2)

,

graph.

of things

are represented

(S2 ,S3) . S1, S2) , S2, S3 ).

Sl).

by denoting

converted

(Si,

that

are perceived,

and 2) relatiorm

S2).

into Prolog

graphically

by rectangles,

The noconcepts

relations

by

circles, and arrows are used to ahow the direction of relations. Individuals are represented as particular instances of a concept and appear to the right of the COIlcept name. Conceptual graphs can also be represented

Sl). noun-phrase

the word jile can

relationships that can exist between concepts. tation is a connected graph notation, where

, S2 ).

verb(Sl

For example,

The conceptual graph notation consists of two basic typea of symbok, 1) concepts: peoples underlying interpretations

verb-object verb-aux( verb_fin(

Sl],

\ files

mean a collection of information stored on a computer, or a metal tool found in machine shops. Each meaning is referred to as a word sense. A conceptual graph that defines the meaning of a word sense is referred to as a

noun-phrase(Sl ,S2) , verb-phrase (S2, S3 ) .

( Ih’kon-hidden

verb_phrase(Sl

/ non-hidden

erase

can

graph for each meaning.

[Youl Sll ,S1). [files 1S11 ,S1) .

adjective

/

graph that defines the meaning of the word. Words with multiple meanings or usages will have a conceptual

.

S2)

/

/

matical information part of speech,eg.

j ect.

verb_inf.

,S3)

noun_phrase(Sl,

J.

verb_ob

verb_aur

:-

/ // You

I noun_phrase / \ adjective noun

verb.inf /

Figure

~. -->

,S3)

verb.aur /

GKR uses a lexicon to convert parse trees into con~ceptual graphs. The lexicon is a dictionary that contains the words used by the parser. In addition to the gram~-

adjective

sentence(Sl

/

verb.obj

/\

noun.

noun.

-->

-->

4.

/ /

\

verb

noun

, verb_phrase.

adjective,

noun

verb

\ verb-phrase

concepts

between

square brackets

and rela-

tions between round brackets. To clarify this, Figure 6 shows the canonical Rules

graphs

for the word jile. The first canonical graph is for the word sense of tile as in a metal tool found in machine shops represented

by the concept

[tile.tool].

The second

The parser generates only parse trees that follow the patterns defined in the DCG grammar. However, this grammar may not have all the possible patterns for the entire English language, and furthermore, the gram-

is for the word sense of a computer

tile used to organixe

and store data, represented The join algorithm

a

sentation

for

mar may not necessarily

semantic

lexicon.

follow

formal

English

grammar

rules. The sentence,

“You can erase non-hidden

files”,

can

a sentence,

The

by the concept creates

using

algorithm

the

[tile].

conceptual parse

replaces

tree

repreand

natural

the lan-

guage words in the parse tree with the canonical graph that defines the meaning of the word. It then combines

161

SIGDOC’92

sent ence Word ----

Concept -------

you

[user]

erase

[erase] [f ilel

file

[file-tool]

->

[file:

(purp)

->

{*3]

-> (char)->

defined.

[comp-storagel ->(obj )-> [binary-digits]

6: Semantic

Lexicon

and Canonical

As discussed in the previous than

one word

one canonical algorithm,

canonical

graphs

section,

(one for

will

to create

will

sense).

to use all

a conceptual

possible

graph

for the

sentence. For example, if there are K words in a sentence, where the a+h word has Ni canonical graphs, then the total number of possible combinations of canonical graphs that can be used to create the conceptual graph N1. NZ. . . . NK. isT= a different interpretation

a canonical

be joined

larger the

with

graph.

is produced

incorrect

word

using senses

Consider the sentence, jilesn. The join algorithm join

two

until

or

the

more

final

shows

how

to the

graph

will be eliminated canonical

Hence no resulting

sentence

manner,

graph other

the

graphs

The heuristic

that are

“You that

is created.

graph

[tile:{*}]

word

rules define

graph

sense.

a for

In this

eliminated.

can erase non-hidden semantic

lticon

share

common

concepts

For

example,

Figure

for the noun for

if it canto form

conceptual

uses the

graph

[non_hidden]

graphs

the

what

adjective

concepts

jiles

to 7

is joined

nonhidden. may occur

in the same tree for the sentence to be semantically valid. For example, not all syntactically valid sentences

162

it is possible

in GKR

basis by noting

sleep.

could

Heuristic

rules

be added

to reject

this

that ideas cannot to reject

these

Some sentences are both syntactically and semantically correct but may have two interpretations. In some cases, a previous sentence can be used to select one of the interpretations. If the sentence “John bought a tele‘John

in the same situation

sees Bill with a telescope”,

as the sentence

then one can infer that

John has the telescope and ia using it to see Bill (as ok posed to Bill having the telescope). GKR produces all semantic interpretations (as defied by join rules) and does not use reasoning among previous sentences (context reasoning) to select a single correct interpretation. Context reasoning is a challenging ever, GKR can later be extended capabfity.

Each combination represents of the sentence. Some combi-

nations will be eliminated by heuristic join rules ~oin rules combhe canonical graphs into larger graphs). For example,

However,

sentence on a semantic

scope” occurred

have more

each word

attempt

rules) graph

a word may have

sense and hence graph

The join

not

of a Join

types of sentences. It is important to note that the domain of knowledge must be limited in scope because the number of heuristic join rules increases with the size of the domain.

Graphs

graphs in the tree (through the use of heuristic until a single conceptual graph is created. This represents the meaning of the entire sentence. than

7: Example

are semantically valid. Consider Chomsky’s [9] &nous sentence “Colorless green ideas sleep furioudyn which may be syntactically correct depending on the grammar

[filel->(purp)->

more

fjron-hidd~

[grindl

schema:

Figure

/ /

Figure

[file]

/

[erase]

[user]

bon-hiddenl

\ verb-obj

/

/

non-hidden

schema:

/ verb

/ noun

[file-tool] Chidderd

[file.tool]

verb-phrase

noun-phrase

hidden

file

\

/

It is possible

that

more

than

task in itself. to incorporate one conceptual

Howthis graph

can be created from a sentence. This typically occurs with compound sentences where more than one idea may be expressed. A compound sentence occurs when two valid sentences are joined by the word and. An exand -Pie of this type of sentence is “John likes Mary Bill likes Cindyn. In this me, the two ideas are not directly connected. Hence two conceptual graphs would be created.

3.2

Processing

Conceptual

Graph

Queries The retrieval of knowledge in GKR is done by matching the query, that is expressed as a conceptual graph, to conceptual graphs in the knowledge base. The query

SIGD(2C’92

graph is compared to other grapk using a Projection Algorithm. For example, the question “Can I emse jiles Qti can be answered . non-hidden

jiles”.

by the

CYOU can ewe

assertion

The projection is shown in Figure matches each concept and relation. It

9. This example also assumes that

individuals

have unique

Often, the retrieval of information straight forward matching of concepts example.

In the previous

of the assertion

graph,

names.

is not a simple as in the previous

case, the query

w=

a sub-set

and this was sufficient

reason to

answer the query. However in other ceses, the sub-set condition is not valid. For example, consider the sentence ‘Can I erase the jile myjile ?“ whose conceptual graph is shown in Figure 10. GKR generates an additional query (sub-goal), namely to determine if “myfile is a non-hidden jile”. Once this is determined, the initial query can be answered. GKR automatically generates subgoals. The sub-goals are generated by noting that the mapping requires

of the concept

a unification

the set referent ents

and

{*}.

[fde:

between (For

individuals, of individuals

dividual

myjile

that

to [file

the individual

a detailed

see [29]

to a set

myfile]

). are

is a particular

myjile

discussion

The

symbol

not

specified.

instance

{*}], and

of refer{*}

refers The

in-

maintain a list of all previous sub-goals and detect a loop when a sub-goal is generated which has previously occurred

in the deduction.

significant

amounts

To retrieve

This solution

of additional

conceptual

would

require

memory.

graphs

from

the knowledge

base, GKR uses an exhaustive sequential seareh of conceptual graphs in the knowledge base. Conceptual graph called

matching is performed Concept Relation (CR)

tual graph

into

using a data structure list. It splits a concep

concept-relation-concept

triplets

which

form the members of a CR list. A query graph is a subset of another conceptual graph if the query graph’s CR list has all of its members in the CR list of the other conceptual graph. Hence the query shown in Figure 9 is represented

[erase]

by a CR list that

->

[user]

bon-hidde~
Lihat lebih banyak...

Comentários

Copyright © 2017 DADOSPDF Inc.