LEXICO: A system for lexicographic processing

June 14, 2017 | Author: Nathan Relles | Category: Cognitive Science, Data Format



Computers and the Humanities, Vol. 11, pp. 127-137. Pergamon Press, 1977. Printed in the U.S.A. 0010-4817/77/0501-0127$02.00/0 Copyright © 1978 Pergamon Press

LEXICO: A System for Lexicographic Processing

RICHARD L. VENEZKY, NATHAN RELLES and LYNNE PRICE

LEXICO is an interactive system that assists lexicographers in storing, editing, and concording texts; lemmatizing word lists; and generating slips. A slip is equivalent to a file card and contains a single word from a text, together with its lemma, context, and source. From a file of such slips (sometimes running into the millions), plus supporting reference materials and word studies, dictionary editors compile the entries for a dictionary. The system, originally developed as a research tool for exploring different approaches to man-machine integration in lexical processing, has now been implemented as a user system at the Madison Academic Computing Center, University of Wisconsin. Recent applications of the system include studies of style in Modern French literature and text processing for the Dictionary of Old English. This article presents a sampling of LEXICO's capabilities, with emphasis on interaction techniques and user aids.
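A slip, as defined above, is a small record: one word, its lemma, its context and its source. The Python sketch below models that record and generates one slip per lemmatized word occurrence. It is an illustration only: the names `Slip` and `make_slips`, the whitespace word splitting, and the "text name plus citation identifier" format for the source are assumptions, not LEXICO's design.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Slip:
    """One 'file card': a word, its lemma, its context and its source."""
    word: str
    lemma: str
    context: str
    source: str  # assumed format: text name plus citation identifier


def make_slips(citations: List[Tuple[str, str]],
               lemmas: Dict[str, str],
               text_name: str) -> List[Slip]:
    """Generate one slip per word occurrence that has a lemma.

    citations: (identifier, citation text) pairs.
    lemmas: hypothetical mapping from word form to base form.
    Words are split on whitespace only (a simplification); words without
    a lemma are skipped rather than reported.
    """
    slips = []
    for ident, text in citations:
        for word in text.split():
            lemma = lemmas.get(word)
            if lemma is not None:
                slips.append(Slip(word, lemma, text, f"{text_name} {ident}"))
    # File the slips by lemma, then by word form, as an editor would.
    slips.sort(key=lambda s: (s.lemma, s.word))
    return slips
```

Sorting by lemma rather than by word form reflects the fact that the editor compiles entries by headword. A real slip file would also keep the citation's position within the text.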

Richard L. Venezky is Unidel Professor of Educational Foundations at the University of Delaware. Nathan Relles and Lynne Price are Ph.D. candidates in the Department of Computer Sciences at the University of Wisconsin-Madison.

Design goals

LEXICO was designed for three different types of users: the novice, the sustained user and the occasional user. Because of the large number of parameters that must be specified in certain tasks and the relatively unfamiliar vocabulary of lexicography, these three types of users have widely differing needs. The novice needs to learn the system's capabilities and the methods of requesting different functions. He must discover how and when to create backup files, how to recover from syntax errors, and which servicing priorities to select for different cost and time demands. In contrast, the sustained user has already acquired this knowledge and wants to make requests as brief as possible. He needs little extra documentation; his errors are quickly recognized and therefore do not require extensive explication. Between the novice and the sustained user is another class, the occasional users. These are people who once mastered the system but, because of infrequent practice, sometimes forget an essential item. They need some assistance, but their memories can often be jarred with a minimum of verbiage. Thus, where the novice may require a full-page explanation of a certain command, the occasional user may need only a sentence or two, and the sustained user desires at most an abbreviation.

The needs of these different types of users strongly influenced the design of LEXICO. Other design criteria were based upon analyses of existing text processing and interactive systems. From these considerations, six general goals were established for the system.

A. The system should appear to the user as a single, unified entity. Tasks to be performed on texts should be specified in a similar manner for all tasks. The manner in which the system responds to user requests should likewise be consistent throughout the system.

B. The user should be totally isolated from the operating system and certainly should not be required to understand file formats, program characteristics, and the like. Just as important is isolation from the ciphers and incantations required to create files, use programs, and enter data. At any level of discourse, the user should be aware only of data and tasks.

C. Task specification should be flexible and easy, minimizing the chance of user errors. Previous experience had demonstrated the inadequacy of an ever-expanding number of fixed-field control cards for specifying task parameters, so it became evident that a command language should be used for task specification.

D. User declarations should be minimized; the system should therefore have several levels of default values to be assumed for any parameters not overtly specified by the user.

E. The system should be easy to maintain. In particular, programmers responsible for implementing new capabilities, modifying the command language to accommodate new or revised capabilities, and tracing system errors should need minimal effort. Facilities should be provided for monitoring both user and system behavior.

F. The system should not be too expensive. While this might seem an obvious goal, it needs to be emphasized that LEXICO was designed to be practical. In addition to demonstrating the feasibility of automating certain tasks, LEXICO provides a useful and practical alternative to other methods. Therefore, all design decisions included a careful consideration of user costs.
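Goal D's "several levels of default values" can be pictured as a chain of lookups. A setting given in the current task wins; failing that, the collection default applies; failing that, the system default. The sketch below uses Python's `collections.ChainMap` to express that resolution order. The three-level split follows the text, but the parameter names (`word_delimiters`, `print_concordance`, `stopwords`) are hypothetical and do not come from LEXICO.

```python
from collections import ChainMap

# Hypothetical system defaults (LEXICO keeps its own in a special collection).
SYSTEM_DEFAULTS = {
    "word_delimiters": " ,;",
    "sentence_delimiter": ".",
    "print_concordance": True,
    "stopwords": frozenset(),
}


def resolve(task_params, collection_defaults):
    """Return a read-through view: task settings override collection
    defaults, which override system defaults. Lookups search the maps
    left to right; the system defaults themselves are never modified."""
    return ChainMap(dict(task_params), dict(collection_defaults), SYSTEM_DEFAULTS)


# A collection that declares stopwords, and a task that suppresses printing.
collection = {"stopwords": frozenset({"DEL", "UN", "IL"})}
task = {"print_concordance": False}
params = resolve(task, collection)
```

Because nothing is copied into the task level, changing a collection default immediately affects every later task that does not override it. This matches the statement that collection defaults "need not be respecified for each text".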

The user's view

The user of LEXICO works with one or more collections of texts that the user wants to process in similar ways. In addition to the texts themselves, a collection contains a directory identifying and describing each text, and may contain any of the following:

1. conventions to be used when entering or concording texts;
2. spelling conversion rules to be used for headword classification;
3. a text form/base form conversion list to be used for headword classification;
4. concordances of texts;
5. for each concorded text, a list of the words occurring in that text.

Features that may be common to all texts in a collection are called collection defaults. Once specified for a collection, these features need not be respecified for each text. The typical processes that can occur for a collection of texts include:

1. creation of a collection and, optionally, definition of collection default values;
2. entry of one or more texts into the collection, optionally producing a listing of each text;
3. correction of errors in texts (editing);
4. concording texts;
5. formation of a headword classification (lemmatization) process for the collection by definition of spelling conversions and base form/text form associations;
6. headword classification of the words that occur in texts, producing, for each text, a listing of the headword (base form) associated with each text form;
7. deletion of texts from the collection; and
8. deletion of the collection from the system (after all processing is complete).
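The collection described above behaves like a small directory of texts, each with its own history of processes. In the dialogue of Fig. 2, each text also receives a numeric text code, so that "JOHN" can later be referred to as "1". The sketch below models only those two points. The class names, the code-assignment rule (one more than the highest code in use) and the process labels are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Text:
    name: str
    code: int
    history: List[str] = field(default_factory=list)  # processes performed, e.g. "ADD"


@dataclass
class Collection:
    name: str
    defaults: Dict[str, object] = field(default_factory=dict)  # collection defaults
    texts: Dict[int, Text] = field(default_factory=dict)       # directory keyed by text code

    def add_text(self, name: str) -> Text:
        """Enter a text and assign it one more than the highest code in use."""
        code = max(self.texts, default=0) + 1
        text = Text(name.upper(), code)
        text.history.append("ADD")
        self.texts[code] = text
        return text

    def find(self, ref: str) -> Text:
        """Resolve a reference given either as a text code or as a text name."""
        if ref.isdigit() and int(ref) in self.texts:
            return self.texts[int(ref)]
        for text in self.texts.values():
            if text.name == ref.upper():
                return text
        raise KeyError(f"no text {ref!r} in collection {self.name}")

    def delete_text(self, ref: str) -> None:
        del self.texts[self.find(ref).code]
```

Because each text keeps its own history, the "summary of the processes performed on each text" that users may request becomes a simple read of that list.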

In addition, users may at any time examine a summary of the processes performed on each text, listings of any text, listings of spelling conversion rules and base form/text form associations, or the base forms associated with each word in a text. Also, collection default values may be inspected or updated, the headword classification process may be revised, and the text form/base form associations of any text may be corrected.

The user enters commands to LEXICO from a remote terminal. Many tasks are performed on-line, with the results displayed immediately at the same terminal. However, some tasks require so much time, generate so much output, or cost so much that they are performed off-line. In this case, the user describes a task by commands entered at the terminal, but the task is actually performed later and the results are printed at the computing center.

Tasks are specified to the system by entering statements in the LEXICO command language. Most tasks are specified as functional blocks. A block begins with a block header, a command that identifies the task (e.g., CREATE, ADD, EDIT). Subsequent commands specify how the designated task is to be performed (e.g., delimiter specifications, printing options, collating sequence). Following these commands and declarations, the END command signifies that the task has been completely described. (Many tasks are specified in one command and therefore do not require an END command.) After some commands are entered, LEXICO asks the user a question; for example, the command EDIT; is followed by the prompt WHICH TEXT?.

During interaction, the user may enter special commands to LEXICO for assistance in understanding error messages, questions, or requests for information. These user aids, especially suited to the novice but, of course, available to anyone, include:

Explain error (*err). If an error is made, the system displays a brief description of that error. In response to *err, a more detailed explanation is displayed. *Err may be entered several times for progressively more detailed explanations.

Explain question (*equ). Occasionally the system will solicit information in the form of a brief question. Successively entering *equ results in progressively more detailed explanations of the request.

Example (*exa). Whenever the system solicits information or when an error has been made, examples of correct input may be obtained with this command.

Menu (*mnu). This command causes the system to display all commands allowed in the block in which the user is working.
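The escalating behaviour of *err and *equ, where each repetition yields a more detailed explanation, can be modelled as a list of explanations ordered from terse to detailed, with a counter that advances on every request. The sketch below is an illustration under that assumption only. The class name `HelpLevels` is hypothetical, and the message texts loosely paraphrase the dialogue of Fig. 3 rather than reproduce LEXICO's message files.

```python
class HelpLevels:
    """Serve progressively more detailed explanations of one message.

    The first call returns the terse text (the message shown when the
    error occurs); each later call, like a repeated *err or *equ, returns
    the next level. The most detailed level repeats once the list is used up.
    """

    def __init__(self, *levels: str) -> None:
        if not levels:
            raise ValueError("at least one explanation level is required")
        self._levels = levels
        self._next = 0

    def explain(self) -> str:
        text = self._levels[min(self._next, len(self._levels) - 1)]
        self._next += 1
        return text

    def reset(self) -> None:
        """Start again from the terse message, e.g. after a new error."""
        self._next = 0


# Hypothetical explanation levels for an unrecognized keyword.
unrecognized = HelpLevels(
    "UNRECOGNIZED KEYWORD OR PUNCTUATION",
    "I WAS EXPECTING A KEYWORD OR PUNCTUATION IN PLACE OF THE ABOVE STRING",
    "CHECK FOR A MISSPELLED KEYWORD OR A COMMAND NOT ALLOWED IN THIS BLOCK",
)
```

Keeping the levels as data rather than code fits the later remark that the messages displayed by the system can be revised through LEXICO commands.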

Processing procedures

To demonstrate how data are processed by LEXICO, several examples of user-system interactions are shown below. They illustrate many of the system's capabilities but do not in all cases reflect normal interactive sessions: typically, a user would remain in a single task and would request fewer user aids than are shown here. The text used for these examples was written specifically to illustrate some of the capabilities of the system. The first step is to prepare the text for input into a collection. Shown below is a sample text as it exists, prior to processing by LEXICO, on a card file called INPUT*TEXT.

'JOHN'
(10) JOHN E FRANCESCO SEDEVANNO SUL BANCO [JOHN AND FRANCESCO WERE SITTING ON A BENCH].
(20) FRANCESCO LAVORAVA NEGLI UFFICJ DEL BANCO [FRANCESCO WORKED IN THE OFFICES OF THE BANK]: JOHN ERA UN UOMO SENZA PRINCIPJ [JOHN WAS A MAN WITHOUT PRINCIPLES].
(30) C'EFA UN BUCO NELLA SCARPA DI JOHN [THERE WAS A HOLE IN JOHN'S SHOE]. E COSI/ VOLEVA UN NUOVO PAJO DI SCARPE [SO HE WNATED A NEW PAIR OF SHOES]; MA NON AVEVA DENARO [BUT HE DIDN'T HAVE MONEY].
(40) JOHN VOLEVA CHE FRANCESCO LO AIUTASSE DI RUBARE IL BANCO [JOHN WNATED FRANCESCO TO HELP HIM ROB THE BANK].
(50) FRANCESCO LO HA CONSEGNATO AGLI UFFICCI DEL BANCO [FRANCESCO TURNED HIM IN AT THE OFFICES OF THE BANK].
(60) MORALE: DECISIONI FATTE SU UN BANCO NON TI FA PADRONE DEL BANCO [MORAL: DECISIONS MADE ON A BENCH DON'T MAKE YOU BOSS OF THE BANK (IT LOSES SOMETHING IN TRANSLATION)]. AA

The text begins with its title. The body of the text has been divided into citations (in this case, sentences), each of which has been given an identifier. Here the identifiers are multiples of ten, but the system allows alphabetic or numeric identifiers of up to three levels; typical identifiers may be page and line numbers, or psalm and verse numbers. The identifiers are enclosed in parentheses. The English translation of the text has been included but is enclosed in square brackets to indicate that the English words are notes to the text and should not be concorded. The text ends with ' - '. (In citation 30 the symbol "/" marks the accent in 'cosi'.) In this text a blank, comma, or semicolon marks the end of a word, a period indicates the end of a sentence, and parentheses, square brackets and ' ~ ' have the uses mentioned above. All of these symbols are standard delimiters, called system defaults. However, a user may select any other symbols to be used for these purposes.

Once the text has been prepared in this manner, it is ready to be entered into a collection. In the next example, Fig. 1, a collection called "parables" is created and certain collection defaults are specified. (In all examples of dialogues between a user and LEXICO, user input is shown in lower case and is preceded by the symbol '>'. Answers to system questions are indented.) In this same block a variety of other collection defaults could be entered, including concording specifications and input/output conventions. These can also be added in an UPDATE block. After each block is concluded, LEXICO prompts the user with 'TASK COMMAND'. In the next example, Fig. 2, the text shown above is scheduled to be added to


The user initiates interaction with LEXICO.

>@lexico.
LEXICO VERSION 2.0   08/07/76   12:17:55
COLLECTION NAME?

LEXICO responds and asks for a collection name. The user enters the collection name.

  >parables
NEW COLLECTION MAY BE CREATED

LEXICO recognizes that the user wants to work in a new collection.

TASK COMMAND:

LEXICO asks for a task to perform.

>create;

The user wants to create a collection.

COLLECTION 'PARABLES' TO BE CREATED: YOU MAY ENTER COLLECTION DEFAULTS.

LEXICO acknowledges.

>add stopwords del di e era negli nella sul un
>'c''era' cosi/ ma non aveva che lo il ha
>agli su ti;

The user enters the stopwords; 'c'era' is enclosed in apostrophes because it contains one and LEXICO uses this symbol as a quotation mark.

>end;

The user ends the CREATE block.

COLLECTION CREATED.

LEXICO creates the collection.

Fig. 1.

TASK COMMAND:

LEXICO asks what to do next.

>add john;

The user wants to add a text called "John".

TEXT CODE ASSIGNED TO 'JOHN' IS 1.

From now on the text may be referred to as "John" or as "1".

>input on card file input*text;

The user tells LEXICO where to find the text.

>end;

The user has no further specifications.

CREATE BACKUP IMMEDIATELY BEFORE THIS PROCESS? (Y OR N)
  >n

LEXICO asks if a copy of the collection should be saved in case the computer goes down while the text is being added. The user does not want a backup.

WHEN? (I, T, O, W)
  >*equ

LEXICO asks for a run priority. The user requests an explanation of the question.

THE CHARACTER YOU ENTER WILL DETERMINE WHEN THE RUN WILL BE INITIATED:
I => IMMEDIATELY
T => TODAY
O => OVERNIGHT
W => WEEKEND
ANYTHING ELSE => CANCEL RUN
(FOR COST IMPLICATIONS, PRESS EXPLAIN-QUESTION AGAIN)
WHEN? (I, T, O, W)
  >*equ

LEXICO explains the options. The user wants more detail.

NOTE: I = VERY EXPENSIVE
T = LESS EXPENSIVE
O = INEXPENSIVE (BUT NOT CHEAP)
W = LEAST EXPENSIVE
  >o

The user selects overnight priority.

RUN IDENTIFICATION: XDO848 (SAVERN0807*6122421)

LEXICO tells the user how to identify the output at the computing center.

Fig. 2

>@lexico.
LEXICO VERSION 2.0   08/07/76   12:52:26
COLLECTION NAME?
  >parables
COLLECTION ACCESSIBLE: PROCEED

LEXICO recognizes the collection created above.

TASK COMMAND:
>edit john;
TEXT 'JOHN' (TEXT CODE: 1) AVAILABLE
CURRENT CITATION IS SEQUENCE NUMBER 1; PROCEED TO EDIT.

The user wants to edit the text.

>3;

The user selects the third citation.

ID: 30
C'EFA UN BUCO NELLA SCARPA DI JOHN [THERE WAS A HOLE IN JOHN'S SHOE], E COSI/ VOLEVA UN NUOVO PAJO DI SCARPE [SO HE WANTED A NEW PAIR OF SHOES]; MA NON AVEVA DENARO [BUT HE DIDN'T HAVE MONEY].

LEXICO displays the citation and its identifier.

>change f re r;

The user attempts to correct the spelling of "c'era" but enters "re" instead of "to".

UNRECOGNIZED KEYWORD OR PUNCTUATION: F

LEXICO does not understand the previous input.

>*err

The user wants the error message explained.

I WAS EXPECTING A KEYWORD OR PUNCTUATION IN PLACE OF THE ABOVE STRING. YOUR ERROR IS DUE TO: 1. A MISSPELLED KEYWORD, 2. A COMMAND NOT ALLOWED IN THIS BLOCK, 3. A WORD THAT ISN'T A KEYWORD, OR 4. INCORRECT OR OMITTED PUNCTUATION.
PRESS EXPLAIN-ERROR FOR CORRECTIVE MEASURES, EXAMPLE-KEY FOR EXAMPLES.

The keyword "to" was misspelled.

>f to r;

Beginning with the symbol where the error occurred, the command is reentered.

C'ERA UN BUCO NELLA SCARPA DI JOHN [THERE WAS A HOLE IN JOHN'S SHOE], E COSI/ VOLEVA UN NUOVO PAJO DI SCARPE [SO HE WANTED A NEW PAIR OF SHOES]; MA NON AVEVA DENARO [BUT HE DIDN'T HAVE MONEY].

>next;
ID: 40
JOHN VOLEVA CHE FRANCESCO LO AIUTASSE DI RUBARE IL BANCO [JOHN WNATED FRANCESCO TO HELP HIM ROB THE BANK].

The user continues to the next citation, and it is displayed.

>c na to an;

The spelling of "wanted" is corrected, using "c" as an abbreviation for "change".

JOHN VOLEVA CHE FRANCESCO LO AIUTASSE DI RUBARE IL BANCO [JOHN WANTED FRANCESCO TO HELP HIM ROB THE BANK].
>end;

All corrections are made, so the block is ended.

Fig. 3


collection Parables. After the text has been added, errors can be corrected as shown in Fig. 3. The corrected text is then ready for concording. In the exchange illustrated in Fig. 4, the user schedules the concordance. For this text, it is unnecessary to enter other commands in the CONCORD block. However, at this point the user may specify, for example, that a different set of delimiters is to be used for this text than for other texts in the collection, or that the concordance is to be stored on tape instead of in the collection. For texts that do not require editing, there is an ADDCONCORD block, which allows a text to be added and concorded in one run. It is also possible to add, or to add and concord, several texts at one time.

One result of concording a text is to store in the collection an alphabetized list of all the keywords and stopwords that appeared in the text. Headword classification is the process of associating a base form or headword with each entry in this list; when it has been completed, this list and the concordance may be used to generate slips.

An optional step in the headword classification process is the standardization of spelling. When applying spelling rules of the form new:old, LEXICO replaces every occurrence of the string "old" with "new" in the word list of a text. No change is made in the body of the text. Blanks are used in spelling rules to indicate that the rules apply only at word beginnings or endings. Fig. 5 shows spelling rules that convert word-final "j" to "ii" and every other occurrence of "j" to "i", and the application of these rules to the word list of the sample text. Since headwords are assigned to respelled forms rather than to text forms, respelling can greatly reduce the number of bases that must be specified for texts with many variant spellings.

A base form may be associated with each entry in the word list of a text in either of two ways. Previously defined basetype rules may be applied to the word list in an off-line process called LOOKUP, or the bases may be entered explicitly in a CLEANUP block. Basetype rules may be entered as in Fig. 6. The CLEANUP block may be used to enter bases in the word list and to associate citations with each occurrence of a homograph. In Fig. 7, the SHOW UNMATCHED command is used to have LEXICO display each entry that has no base. When the headword classification process is completed, LEXICO will output this information either as slips or as a concordance alphabetized on lemmata (base forms).

Although LEXICO's headword classification processes are all word-oriented, the system can be used to perform some phrase lemmatization. One approach is to separate the words in a phrase with an otherwise unused character, such as a hyphen. An alternative is to specify that the blank is not to be treated as a word delimiter and to punctuate all phrases explicitly. The disadvantage of both methods, that individual words within a phrase are not recognized, can be circumvented by concording each text twice, specifying a different set of delimiters each time. The word list produced with one concordance (e.g., with blanks treated as word delimiters) can be used for word lemmatization; the one generated with the other concordance (e.g., with blanks not recognized as word delimiters) can be used for phrase lemmatization. LEXICO's default system makes this double concording extremely easy. A different method is to associate a lemma for an entire phrase with a single keyword that appears within the phrase. If the keyword also appears in other contexts, the homograph capabilities may be used to specify appropriate bases for them. Finally, a future modification to the system may allow phrases as well as single words to be concorded. If so, the headword classification processes would automatically expect lemmata to be entered for the phrases.
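LOOKUP, followed by a SHOW UNMATCHED pass, can be summarized as a dictionary lookup over the respelled word list. The sketch below assumes the basetype rules are already parsed into (base, forms) pairs, because the base precedes the colon. It also assumes homograph rules are parsed into (form, bases) pairs, because the bases follow the colon. The example rules are those of Fig. 6. The function and field names are hypothetical, and so is the choice to check homograph rules before basetype rules. Homographs are only flagged here, since dividing their citations among bases is left to the user in CLEANUP.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence, Tuple


@dataclass
class LookupResult:
    bases: Dict[str, str] = field(default_factory=dict)             # form -> single base
    homographs: Dict[str, List[str]] = field(default_factory=dict)  # form -> candidate bases
    unmatched: List[str] = field(default_factory=list)              # forms still without a base


def lookup(word_list: Sequence[str],
           basetype_rules: Sequence[Tuple[str, Sequence[str]]],
           homograph_rules: Sequence[Tuple[str, Sequence[str]]]) -> LookupResult:
    """Assign bases to a (respelled) word list.

    basetype_rules: (base, forms) pairs; one base may cover several forms.
    homograph_rules: (form, bases) pairs; one form may have several bases.
    """
    form_to_base = {form: base for base, forms in basetype_rules for form in forms}
    homographs = {form: list(bases) for form, bases in homograph_rules}
    result = LookupResult()
    for form in word_list:
        if form in homographs:
            # Citations must still be divided among the bases (done in CLEANUP).
            result.homographs[form] = homographs[form]
        elif form in form_to_base:
            result.bases[form] = form_to_base[form]
        else:
            # Left for SHOW UNMATCHED, where the user supplies a base.
            result.unmatched.append(form)
    return result


# The rules of Fig. 6, written in upper case to match the word list.
BASETYPE_RULES = [
    ("UFFICIO", ["UFFICII"]),
    ("FARE", ["FA", "FATTE"]),
    ("AIUTARE", ["AIUTASSE"]),
    ("NEL GLI", ["NEGLI"]),
]
HOMOGRAPH_RULES = [("BANCO", ["BANCO (= BANK)", "BANCO (= BENCH)"])]

WORDS = ["AGLI", "AIUTASSE", "AVEVA", "BANCO", "BUCO", "UFFICII"]
result = lookup(WORDS, BASETYPE_RULES, HOMOGRAPH_RULES)
```

With these rules, AIUTASSE is resolved during LOOKUP while AGLI, AVEVA and BUCO remain unmatched. This agrees with the order in which the entries are presented in the Fig. 7 dialogue.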

>concord 1;

The user refers to the text by text code instead of text name.

TEXT 'JOHN' (TEXT CODE: 1) AVAILABLE: YOU MAY ENTER CONCORDANCE SPECIFICATIONS.
>end;
CREATE BACKUP IMMEDIATELY BEFORE THIS PROCESS? (Y OR N)
  >n

The user does not want a backup.

WHEN? (I, T, O, W)
  >o

He selects the overnight priority.

RUN IDENTIFICATION: XDO851 (SAVERN0807*6125642)

Fig. 4
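The concording step scheduled in Fig. 4 depends on the input conventions of the sample text: parenthesized identifiers open citations, bracketed notes are not concorded, and blanks, commas and semicolons end words. The Python sketch below applies those conventions to an abridged copy of the sample text. It produces the alphabetized word list (stopwords included) and a concordance mapping each remaining word to its citation identifiers. This is not LEXICO's code. Treating periods and colons as boundaries in every mode is a simplification. Passing a `word_delims` without the blank mimics the second concordance used for phrase lemmatization.

```python
import re
from collections import defaultdict

# Conventions assumed from the sample text: "(10)" opens a citation,
# "[...]" encloses notes that are not concorded, and the characters in
# word_delims end a word. Periods and colons are always treated as
# boundaries here (a simplification).
NOTE = re.compile(r"\[[^\]]*\]")
CITATION = re.compile(r"\((\w+)\)")


def split_citations(text):
    """Return (identifier, body) pairs; the title before the first identifier is skipped."""
    parts = CITATION.split(NOTE.sub(" ", text))
    # parts == [title, id1, body1, id2, body2, ...]
    return [(parts[i], parts[i + 1]) for i in range(1, len(parts) - 1, 2)]


def concord(text, stopwords=frozenset(), word_delims=" ,;"):
    """Return (word_list, concordance).

    word_list: alphabetized distinct words, stopwords included.
    concordance: word -> identifiers of the citations containing it,
    with stopwords left out.
    """
    boundary = re.compile("[" + re.escape(word_delims + ".:") + "]+")
    index = defaultdict(list)
    for ident, body in split_citations(text):
        for word in (w.strip() for w in boundary.split(body)):
            if word and ident not in index[word]:
                index[word].append(ident)
    word_list = sorted(index)
    concordance = {w: index[w] for w in word_list if w not in stopwords}
    return word_list, concordance


# Abridged from the sample text shown earlier.
SAMPLE = ("'JOHN' (10) JOHN E FRANCESCO SUL BANCO [JOHN AND FRANCESCO WERE SITTING "
          "ON A BENCH]. (20) FRANCESCO LAVORAVA NEGLI UFFICJ DEL BANCO "
          "[FRANCESCO WORKED IN THE OFFICES OF THE BANK].")
```

For example, `concord(SAMPLE, word_delims=",;")` treats each blank-separated clause as a single entry, which gives the phrase-level word list described in the text.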


TASK COMMAND:
>update;

The spelling rules are entered in an UPDATE block.

COLLECTION 'PARABLES' TO BE UPDATED: YOU MAY ENTER COLLECTION DEFAULTS.
>add spelling rules ii:'j' ('j'), i:j;
>end;
TASK COMMAND:
>respell 1;

The user requests that the rules be applied to the word list.

HERE AND NOW? (Y OR N)
  >y

LEXICO asks if this should be done on-line or off-line. The user wants the RESPELL done on-line.

WANT A LISTING? (Y OR N)
  >y

The user wants the original and respelled forms to be displayed.

CURRENT SPELLING RULES ARE:
'I' : 'J'
'II' : 'J'

The system first displays the spelling rules and then shows the word list.

(WORDS THAT ARE CHANGED ARE MARKED WITH ***)
1 AGLI AGLI
2 AIUTASSE AIUTASSE
20 IL IL
21 JOHN IOHN ***
22 LAVORAVA LAVORAVA
32 PAJO PAIO ***
41 UFFICII UFFICII
42 UFFICJ UFFICII ***
45 VOLEVA VOLEVA
RESPELLING COMPLETED.
TASK COMMAND:
>cleanup 1;
TEXT 'JOHN' (TEXT CODE: 1) AVAILABLE: PROCEED WITH CLEANUP.

The CLEANUP block is used to correct an exception to the spelling rules.

>type 21;
21 JOHN IOHN

The user selects the word to correct, and LEXICO displays the entry in the word list.

>new respelled john;
21 JOHN JOHN

The user enters the correction, and the change is displayed.

>end;

The user ends the block.

Fig. 5
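The RESPELL step shown in Fig. 5 replaces every occurrence of "old" with "new" in each word-list entry, and uses blanks in a rule to mark word beginnings or endings. A minimal Python sketch follows. It assumes that rules are applied in the order listed, with the word-final rule placed first so that it fires before the general j-to-i rule. It also assumes that word boundaries are detected by padding each entry with one blank on either side. Both are assumptions about the mechanism, not descriptions of LEXICO's code.

```python
def respell(word, rules):
    """Apply ordered spelling rules, given as (new, old) pairs, to one entry.

    Every occurrence of `old` is replaced by `new`. The entry is padded with
    a blank on each side, so a blank in a rule matches a word beginning or end.
    """
    padded = f" {word} "
    for new, old in rules:
        padded = padded.replace(old, new)
    return padded.strip()


def respell_listing(word_list, rules):
    """Return (original, respelled, changed) triples, as in a RESPELL listing."""
    listing = []
    for word in word_list:
        respelled = respell(word, rules)
        listing.append((word, respelled, respelled != word))
    return listing


# Word-final J -> II first (the trailing blank marks the word end), then any
# other J -> I. The blank is kept in the replacement so later rules still see
# the word boundary.
RULES = [("II ", "J "), ("I", "J")]
```

Under these rules JOHN becomes IOHN, which is exactly the exception the user then repairs in the CLEANUP block of Fig. 5. Only the word list changes; the text itself is untouched.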

Costs

The cost of performing different tasks with LEXICO depends on the priority at which off-line jobs are run and the time of day at which on-line interaction takes place. Both of these choices are, of course, at the user's discretion. Table 1 gives the cost of some tasks performed by the system in the last few months; all tasks were billed at the lowest priority and, hence, at the least expensive rate. Table 2 gives some typical file charges. Any attempt to estimate the cost of a particular task would be subject to the following considerations:

1. Continued system monitoring, permitting the refinement of system components, has resulted in cost decreases. For example, a recent change reduced the charge for initiating interaction with LEXICO from about $1.00 to $0.43 during the day, and from $0.35 to $0.17 at night. Further cost reductions are likely as data from future system use become available.

2. The cost of the most expensive task, concording, is highly dependent on user-specified options. In particular, a considerable cost reduction is possible if a concordance is not printed. (This might be the case if, for example, a user were primarily interested in obtaining magnetic tape output.) The cost of a concordance can also be reduced through the use of stopwords.

3. Since space limitations do not allow a full description of the University of Wisconsin Computing Center charges, or of the degree to which computing is subsidized by the University, the figures in Tables 1 and 2 should be interpreted with caution.
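Off-line cost is set at the WHEN? prompt of Fig. 2. There, LEXICO ranks the four run priorities from very expensive (I) to least expensive (W), and treats any other answer as cancelling the run. The sketch below encodes only that rule and ordering. The enum name and the numeric `cost_rank` values are hypothetical: they preserve LEXICO's stated order but are not the Computing Center's actual rates.

```python
from enum import Enum
from typing import Optional


class Priority(Enum):
    """Run priorities offered at the WHEN? prompt.

    cost_rank only encodes LEXICO's stated ordering (higher = more
    expensive); it is not an actual billing rate.
    """
    IMMEDIATELY = ("I", 4)
    TODAY = ("T", 3)
    OVERNIGHT = ("O", 2)
    WEEKEND = ("W", 1)

    def __init__(self, code: str, cost_rank: int) -> None:
        self.code = code
        self.cost_rank = cost_rank


def parse_when(answer: str) -> Optional[Priority]:
    """Return the chosen priority, or None: 'anything else = cancel run'."""
    answer = answer.strip().upper()
    for priority in Priority:
        if priority.code == answer:
            return priority
    return None
```

Returning `None` rather than raising an error mirrors the prompt's behaviour: an unrecognized answer is a legitimate way to cancel, not a syntax error.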

Table 1. Recent costs for LEXICO tasks.

Columns: Process; Total cost (including printing); Cost of printed output.

Processes:
1. Concord a text of 12,338 words; no stopwords; print concordance
2. Concord a text of 21,807 words; no stopwords; print concordance
3. Concord a text of 10,195 words; no stopwords; concordance not printed
4. Concord a text of 17,309 words; no stopwords; concordance not printed
5. Generate a base concordance for 1000 words comprising a text of 1100 words
6. RESPELL a word list with 1083 entries; no listing
7. LOOKUP a word list with 1083 entries
8. 189 hours of on-line interaction

Reported figures: $20.66, $14.75, $35.55, $24.69, $2.64, $4.96, $1.64, $1.20, $.12, $1.00, $7.99, $.68

Table 2. Typical file charges (daily file charge).

A collection with 6 concorded texts totalling over 14,000 text words; no basetype rules; concordances stored on tape: $1.21
The concordances for the above collection (if stored in the collection rather than on tape): $3.44
A file of about 8600 basetype rules: $.79

Programmer aids

Several features of LEXICO were developed to simplify debugging and modification and to encourage more consistent, reliable programs. Some of these have been incorporated into the command language itself. For example, there are LEXICO commands to revise the command language grammar and any messages displayed by the system. The system default values are stored in a special collection and may be modified by entering the same commands used to change any other collection. By means of either the LEXICO interactive system or a special-purpose routine, the internal representation of user data may be inspected or changed. Other command language statements may be entered at any time to activate system tracing facilities, to allow testing of revised system components, or to permit monitoring of the system. Many of these facilities may

>add basetype rules ufficio : ufficii, fare : fa fatte, aiutare : aiutasse, 'nel gli' : negli;

In basetype rules, the base precedes the colon.

>add homograph rule banco : banco (= bank) banco (= bench);

Homograph rules permit more than one base to be associated with a word; in these the bases follow the colon.

Fig. 6

>show unmatched;

The SHOW UNMATCHED process begins.

ENTER BASE FOR TYPE 1 AGLI
RESPELLED FORM IS AGLI
  >*equ

The user asks for an explanation.

ENTER BASE (WITHOUT ';') OR '?' OR '=' OR 'H' OR 'R' OR ';'.
  >*equ

The system displays the codes that may be entered. The user asks what they mean.

? LEAVES TYPE UNMATCHED
= ENTERS A BASE WHICH IS THE SAME AS THE RESPELLED TYPE, IF ANY EXISTS, AND OTHERWISE IS THE SAME AS THE ORIGINAL TYPE
H ALLOWS YOU TO ENTER HOMOGRAPHS
R REJECTS THE TYPE
; TERMINATES THE 'SHOW UNMATCHED' PROCESS.
(FURTHER EXPLANATION AVAILABLE)

The system explains. The user enters the base for the first word.

  >a gli
BASE IS A GLI
'A GLI' : 'AGLI'
ADD BASETYPE RULE? (Y OR N)
  >y

LEXICO formulates the corresponding basetype rule and asks if it should be saved in the collection for use in future LOOKUPs (if other texts are processed).

ENTER BASE FOR TYPE 3 AVEVA
RESPELLED FORM IS AVEVA
  >avere
BASE IS AVERE
'AVERE' : 'AVEVA'
ADD BASETYPE RULE? (Y OR N)
  >y

The third concordance word is displayed next, since the second word, "aiutasse", was assigned to the base "aiutare" during LOOKUP.

4 BANCO BANCO *HOMOGRAPH*
BASE 1: BANCO (= BANK)
BASE 2: BANCO (= BENCH)
6 CITATIONS. NO BASE ASSOCIATIONS EXIST.
ENTER BASE 3
  >;

The fourth word is a homograph; LEXICO displays the information it has and asks for additional bases. The user does not want to add other bases.

ENTER BASE NUMBER OR ;
  >1

He wants to associate citations with the first base.

ENTER SEQUENCE NUMBERS FOR BASE 1 BANCO (= BANK) (TERMINATE WITH ;)
  >2-4 6;

The second, third, fourth and sixth occurrences of "banco" in the text belong with the first base.

Fig. 7 continued.

ENTER BASE NUMBER OR ;
  >2
ENTER SEQUENCE NUMBERS FOR BASE 2 BANCO (= BENCH) (TERMINATE WITH ;)
  >rest;

All other occurrences of the word in the text are matched with the second base.

4 BANCO BANCO *HOMOGRAPH*
BASE 1: BANCO (= BANK) CITATIONS 2 3 4 6
BASE 2: BANCO (= BENCH) CITATIONS 1 5

When all citations are matched, LEXICO displays the results and continues to the next word.

ENTER BASE FOR TYPE 5 BUCO
RESPELLED FORM IS BUCO
  >=

"=" is the code for entering a base identical to the respelled form.

BASE IS BUCO
'BUCO' : 'BUCO'
ADD BASETYPE RULE? (Y OR N)
  >y
ENTER BASE FOR TYPE 6 CHE

Fig. 7

also be used in off-line programs. For monitoring the system's performance, a file is kept that records the use of various components, along with their cost, resource utilization, and other relevant data. A copy of any interactive session may be generated for later inspection by maintenance staff. Most of the routines that provide these (and other) features of LEXICO can be used by programs not connected with LEXICO. Especially useful are facilities for resource allocation, automatic scheduling of off-line jobs, and user communication.

Availability

LEXICO was designed both for a specific local environment and for a more diffuse, expanded environment. The local environment was the UNIVAC 1108 at the University of Wisconsin, which has since been upgraded to a dual-processor 1110 with 3.5 million characters of core memory and almost 1.5 billion characters of random-access secondary storage. Currently, up to 72 remote sites (keyboard terminals and Remote Job Entry stations) may have simultaneous access to the 1110 and, hence, to LEXICO. The system is now maintained by the University of Wisconsin Academic Computing Center, and expanded availability will be made possible by the University's participation in EDUNET, which uses the public TELENET communications network.

At the beginning of its development, nearly five years ago, the only available high-level language considered practical for implementing LEXICO was FORTRAN. It was also our original intention to make the system relatively transportable; in the course of development, however, this goal was dropped because the increased costs could not be justified for an experimental system. The current system, therefore, runs on a UNIVAC 1100 series machine operating under EXEC-8, and requires a minimum of 100K of core storage, a remote terminal, and a modest amount of random-access secondary storage. The system consists of 20 separately executable subsystems (programs), composed of 285 distinct functional modules (subroutines). The entire system contains about 28,000 lines of code, 95% of which are FORTRAN and the remainder assembly language; this is equivalent to about 100,000 machine instructions. Although the system could be converted to operate on a non-UNIVAC machine, such an effort would be impractical at best and is not recommended, since LEXICO may be accessed by telephone from any teletype-compatible remote terminal.

Comparison of LEXICO to other text-processing systems

Since its functions are narrowly defined by the needs of lexicographers, LEXICO cannot easily be compared to other text-processing systems. In addition, it incorporates into a single system tasks that are generally done by a variety of utility packages, e.g., text editing and slip generation. Thus, LEXICO generates concordances as does JEUDEMO (Bratley, Lusignan and Ouellette, 1974), but lacks all of the capabilities which that system has for stylistic analysis, such as word-pattern searching. On the other hand, JEUDEMO was not designed for text-collection editing and maintenance, spelling conversion, or lemmatization, and therefore does these either in a roundabout way or not at all. Most other humanities languages, like EYEBALL (Ross and Rasche, 1972), SCAN (Brown, 1972) and RATS (Smith, 1972), are oriented towards literary computing and therefore tend to have better facilities than LEXICO for phrase handling and grammatical parsing, but have no facilities for the specialized lexicographic functions that LEXICO performs. LEXICO might be compared to many of these systems in the cost of generating concordances, but no data are available for determining such costs for any of the other systems mentioned above. For lexicographers, however, the choice of processing procedures does not pit LEXICO against literary processing systems, but against either a self-programmed system to do the same tasks or non-computational procedures. In such a comparison, access to LEXICO, processing costs, training, and documentation, among other factors, will be important.

Conclusions

Although several hundred hours of user interaction have been logged by LEXICO, it is still too early to determine how well the design criteria mentioned earlier have been met. The system permits a unique symbiosis of user and computer. Lemmatization, for example, is too time-consuming to be achieved by human effort alone and too ill-defined to allow complete algorithmic specification. One advantage LEXICO has over most other user systems is that, since the system began as a research project, several methods of performing certain tasks could be tried. Some components of the system were developed, tested, discarded and redesigned once, twice and even three times before the current version was implemented. Nevertheless, improvements can still be made, especially in file handling. If the project were started over today, different approaches might be taken to several processes, including text input and editing. The use of microprocessors would also be considered for certain tasks. For a full evaluation, the system must (as stated above) be compared to other procedures for preparing dictionary materials. One important measure of the project's success, however, will be the degree to which other interactive systems adopt the man-machine techniques employed by LEXICO.

The initial work on LEXICO was supported by Grant GJ-32764 from the National Science Foundation. Subsequent work has been supported by the Graduate Research Committee at the University of Wisconsin, by the Foundation for Education and Social Development, and by the Canada Council through a grant to the Medieval Centre at the University of Toronto. Some of the material presented here has been adapted from a forthcoming article in Cahiers de Lexicologie and from the LEXICO User Guides. Requests for reprints should be addressed to Professor Richard L. Venezky, Department of Educational Foundations, University of Delaware, Newark, DE 19711.

REFERENCES

Bratley, P., S. Lusignan and F. Ouellette, "Jeudemo: A Text Handling System". Computers in the Humanities, ed. J. L. Mitchell. Minneapolis: University of Minnesota Press (1974).

Brown, P. J., "Scan: A Simple Conversational Programming Language for Text Analysis". Computers and the Humanities 6 (1972), 223-227.

Ross, D., Jr. and R. H. Rasche, "Eyeball: A Computer Program for Description of Style". Computers and the Humanities 6 (1972), 213-222.

Smith, J. B., "Rats: A Middle-Level Text Utility System". Computers and the Humanities 6 (1972), 277-284.
