ISIS Project: Final Report

July 5, 2017 | Author: Martin Rajman | Category: Information Systems, Information Retrieval, Speech Recognition

ISIS Project

ÉCOLE POLYTECHNIQUE FÉDÉRALE DE LAUSANNE

FINAL REPORT

Jean-Cédric Chappelier, Martin Rajman, Pierrette Bouillon, Susan Armstrong, Vincenzo Pallotta, Afzal Ballim

September 15, 1999


Contents

1 Introduction
2 Data
    2.1 French PolyPhone Corpus
    2.2 Working database
        2.2.1 Reformatting
        2.2.2 Splitting
    2.3 Raw Data Analysis
3 Data Annotation
    3.1 Syntactic annotations
        3.1.1 Resources for annotation: Grammar and Lexicon
        3.1.2 Annotations
    3.2 Semantic annotations
4 Syntactic analysis
    4.1 Resource production from ISSCO’s work
        4.1.1 Context-Free Grammar
        4.1.2 Example of an ELU rule translation
        4.1.3 Lexicon
    4.2 Lattice Parsing
        4.2.1 Lattice Quality
        4.2.2 Experimental Results
    4.3 Syntactic forests
5 Robust Analysis and Frame Filling
    5.1 Computational logic for robust analysis
    5.2 Implementation of the semantic module
        5.2.1 Tree-paths representation
        5.2.2 Discourse markers
        5.2.3 Generation of hypotheses
6 Conclusion
    6.1 Resource unification
    6.2 How to build good language models for PolyPhone?
    6.3 Robustness through Dialog
A Swiss French PolyPhone database
B Missing record numbers in the French PolyPhone 1.0a CD-ROM
C Errors encountered in the French PolyPhone database
    C.1 Wrong prompts
    C.2 Miscellaneous errors
    C.3 Transcription errors
D Raw Statistics about French PolyPhone corpus
E Lattice Quality
F Semantic Annotations of the 30 bootstrap sentences
G Semantic analysis on the 30 bootstrap sentences
H Left-corner Head-driven Island Parser


1 Introduction

The ISIS project started on April 1st 1998 and finished on April 19th 1999. The project was funded and overseen by SwissCom; the partners were EPFL/LIA, EPFL/LITH, ISSCO and IDIAP.

The general applicative framework of the ISIS project was the design of an NLP interface to an information system for automated telephone-based phonebook inquiry. The objective of the project was to define an architecture that improves speech recognition results by integrating higher-level linguistic knowledge. Its concrete objective was to propose and evaluate a first functional prototype of a software architecture for voice access to databases over the phone. This prototype focused on phonebook requests, using the SwissCom French PolyPhone database as typical input.

The proposed architecture for the functional prototype contained 3 modules:
• a speech recognition system, taking the speech signal as input and providing N-best sequences in the form of a lattice;
• a stochastic syntactic analyser (i.e. parser) extracting the k-best analyses;
• a semantic module in charge of filling the frames required to query the database.

Three work packages were carried out in the ISIS project:
1. Annotation of the PolyPhone database
2. Syntactic filtering
3. Semantic processing: frame filling

This report first describes the database and then details what was achieved in each of these work packages.

2 Data

2.1 French PolyPhone Corpus

The data treated in the ISIS project consist of a subset of the original Swiss French PolyPhone database (version 1.0a) restricted to the items related to the 111 service calls (“rubrique 38” of the calling sheet [12]). This database contains 4’293 recordings, 1’886 from male and 2’407 from female speakers. Each recording consists of a request for a phone number based on the name and address of a fictitious person provided to the speaker. The recordings were transcribed and provided to the project as a text file (see Appendix A for further description of the original data).

As far as the address fields are concerned, the data in the PolyPhone database were unfortunately not tagged and were even inconsistent. The corresponding last three lines in the .txt file were thus processed with the following heuristic:

line1 = name
line2 = address if line3 is not empty, and town otherwise
line3 = town if non-empty

This heuristic seems to perform quite well, but it could not be evaluated: not enough information about the original data was available to perform an automatic validation.

Concerning the structure of the fields in the Swiss Phone-Book database, it was assumed to be the same as the one that appears on the web (http://www.ife.ee.ethz.ch/cgi-bin/etvq/f), namely (one field per line):


Nom de famille / Firme
Prénom / Autres informations
No de téléphone
Rue, numéro
NPA, localité
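The three-line heuristic of Section 2.1 can be written down directly. The following is a minimal sketch in Python; the function name `assign_address_fields` and the use of an empty string for a missing street are assumptions made for illustration, not a description of the scripts actually used in the project.

```python
def assign_address_fields(line1, line2, line3):
    """Map the last three prompt lines to (name, address, town).

    line1 is always the name. If line3 is non-empty, line2 is the
    address and line3 the town; otherwise line2 is taken as the town.
    """
    name = line1.strip()
    if line3.strip():
        return name, line2.strip(), line3.strip()
    # No third line: the second line is interpreted as the town
    # (an empty string stands for the missing address; assumption).
    return name, "", line2.strip()


full = assign_address_fields("MOTTAZ MONIQUE", "rue du PRINTEMPS 4", "SAIGNELEGIER")
short = assign_address_fields("MOTTAZ MONIQUE", "SAIGNELEGIER", "")
```

In the first call all three fields are filled; in the second, the town is recovered from the second line and the address is left empty.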

The addresses were compiled in the following way [10]:
• A database was automatically extracted from the Romand Phonebook, each entry consisting of 3 fields: Name, Address, Street (occasionally empty).
• New addresses were composed by randomly mixing these 3 fields.

One point remains unclear about the PolyPhone database (no answer was found in either [12] or [4]): what set of annotations was used for the transcription of the utterances? Several speech annotations such as "" appear in the text. Were they used systematically? Are there other such markers?

2.2 Working database

2.2.1 Reformatting

From the database available on the French PolyPhone CD-ROM, a working database was extracted, containing only the information of interest for the project. Prompts and information requests expressed by users were extracted from the CD-ROM files and regrouped into a single representation in the following format:

id:cd1/b00/f0000o06:sid17733
prompt:1
adr1:MOTTAZ MONIQUE
adr2:rue du PRINTEMPS 4
adr3:SAIGNELEGIER
text[123]: Bonjour j’aimerais un numéro de téléphone à Saignelegier c’est Mottaz m o deux ta z Monique rue du printemps numéro quatre
sample:0.200000:10.820000:88160:42801
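A record in this layout is easy to read programmatically. The sketch below is one possible reader, not the tool used in the project; the function name `parse_record` and the names of the dictionary keys are illustrative assumptions.

```python
import re

# Example record in the working-database layout
# (the transcription is shortened here for readability).
RECORD = """\
id:cd1/b00/f0000o06:sid17733
prompt:1
adr1:MOTTAZ MONIQUE
adr2:rue du PRINTEMPS 4
adr3:SAIGNELEGIER
text[123]: Bonjour j'aimerais un numéro de téléphone à Saignelegier
sample:0.200000:10.820000:88160:42801
"""


def parse_record(block):
    """Turn one 'key:value' record into a dictionary."""
    record = {}
    for line in block.strip().splitlines():
        key, _, value = line.partition(":")
        text_key = re.fullmatch(r"text\[(\d+)\]", key)
        if text_key:
            # The bracketed number is the declared length of the request.
            record["declared_length"] = int(text_key.group(1))
            record["text"] = value.strip()
        elif key == "id":
            # File location on the CD-ROM, then the sheet identifier.
            location, sheet_id = value.rsplit(":", 1)
            record["location"] = location
            record["sheet_id"] = sheet_id
        elif key == "sample":
            begin, end, count, checksum = value.split(":")
            record["sample"] = (float(begin), float(end), int(count), int(checksum))
        else:
            record[key] = value.strip()
    return record


record = parse_record(RECORD)
```

Splitting on the first colon only keeps any colon inside the transcription intact.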

In this format, id identifies the original location of the file on the CD-ROM and the sheet_id field from the NIST header.

prompt identifies both the prompt field of the NIST header and the begin prompt ... end prompt segment of the file itself:
• prompt1 corresponds to: Veuillez maintenant faire comme si vous étiez en ligne avec le 111 pour demander le no de téléphone de la personne imaginaire dont les coordonnées se trouvent ci-dessous:
• prompt2 corresponds to: Simulez une demande de renseignement au 111 concernant la personne correspondant aux coordonnées fictives ci-dessous:
• prompt3 corresponds to: Prononcez une demande, au 111, du numéro de téléphone de:

adr1:, adr2: and adr3: are the address parts extracted from the begin prompt ... end prompt segment of the file. Provided that the original file formatting is correct, these fields should correspond to:

adr1: name
adr2: street
adr3: town

It was the responsibility of ISSCO to check the correctness of these 3 fields during the processing/annotation of the database.

text corresponds to the text_transcription field of the NIST header. The number in square brackets is the total number of characters in the request.

sample regroups the information from the sample_begin, sample_end, sample_count and sample_checksum segments as they appear in the NIST header.

2.2.2 Splitting

The reformatted data were split into 4 parts:
• A first set of 500 entries was kept apart for final tests (the test set). These sentences remained untouched during the ISIS project.
• A second set of 30 sentences, the bootstrap set, was extracted to serve as an initial bootstrap for the whole process. They constitute the very first examples on which speech recognition, annotation and grammar production were carried out.
• A third set of 970 sentences complements the bootstrap set so as to constitute a first training set of 1000 sentences on which the process was run both manually and automatically.
• The fourth set consists of the remaining 2792 entries. It was kept apart as an extended training set on which the automatic processing/learning could be carried out.
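The four-way split can be illustrated with a few lines of Python. The report does not say how entries were assigned to each part, so the sequential slicing below is purely an assumption, as are the function name `split_corpus` and the part names.

```python
def split_corpus(entries):
    """Cut a list of entries into the four parts described above.

    Entries are taken in order (assumption): 500 for the test set,
    30 for the bootstrap set, 970 to complete the first training set,
    and everything that remains for the extended training set.
    """
    sizes = [("test", 500), ("bootstrap", 30), ("training", 970)]
    parts = {}
    start = 0
    for name, size in sizes:
        parts[name] = entries[start:start + size]
        start += size
    parts["extended_training"] = entries[start:]
    return parts


# Stand-in for the reformatted records: 500 + 30 + 970 + 2792 items.
parts = split_corpus(list(range(4292)))
```

The bootstrap and training parts together give the 1000-sentence first training set mentioned above.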

[Figure: distribution of the data among the partners ([IDIAP], ISSCO, LIA, LITH). Labels: 30 (bootstrap) transcription; Prompt; Polyphone (queries); 970 working set; 2792 validation set; 500 test set; Polyphone (10 sentences).]

2.3 Raw Data Analysis

An a priori analysis of the corpus, consisting of textual statistics, was carried out. Among the 4’293 requests, 5 were actually empty (no text: [\inintelligible]) and 766 contained undocumented annotations (e.g. [\prononciation bizarre Montilier]). The vocabulary contained 1’075 occurrences of punctuation signs (among 6 punctuation signs, see Appendix D for further details), which are useless in speech transcriptions. Once undocumented annotations and punctuation signs were removed, the corpus contained 70’665 word occurrences from a vocabulary of 5’877 different word forms. This led to an average length of 16.5 words per request.

The most frequent N-grams of words at the beginning of the request were also studied and are reported in Appendix D.

During this first phase, several format errors were detected in the PolyPhone data, as reported in Appendix C.
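The kind of counting described in this section can be sketched as follows. This is an illustration under stated assumptions, not the script used for the report: the punctuation set is a guess (the six signs are only listed in Appendix D), word forms are compared case-sensitively, and the three sample requests are made up for the example.

```python
import re
from collections import Counter

# Bracketed annotations of the form [\...] as quoted in the text.
ANNOTATION = re.compile(r"\[\\[^\]]*\]")
# Assumed punctuation set; the report only says there are 6 signs.
PUNCTUATION = re.compile(r"[.,;:!?]")


def corpus_statistics(requests):
    """Count word occurrences and distinct word forms after cleaning."""
    counts = Counter()
    for text in requests:
        cleaned = PUNCTUATION.sub(" ", ANNOTATION.sub(" ", text))
        counts.update(cleaned.split())
    tokens = sum(counts.values())
    return {
        "tokens": tokens,
        "types": len(counts),
        "mean_length": tokens / len(requests) if requests else 0.0,
    }


sample_requests = [
    "Bonjour j'aimerais le numéro de Mottaz Monique",
    "[\\inintelligible]",
    "le numéro de [\\prononciation bizarre Montilier] Montilier",
]
stats = corpus_statistics(sample_requests)
```

On the three sample requests this yields 11 word occurrences and 8 distinct word forms; the second request becomes empty once its annotation is removed.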

3 Data Annotation

3.1 Syntactic annotations

3.1.1 Resources for annotation: Grammar and Lexicon

Development and Coverage

The preliminary ELU grammar and lexicon were developed by ISSCO in order to cover the final corpus. This work was done in three stages:
• Selection of 488 new sentences and extension of the preliminary grammar/lexicon;
• First evaluation of the grammar with the new corpus (518 sentences composed of the 488 new sentences, plus the 30 sentences used for the development of the preliminary grammar);
• Second evaluation of the grammar with 130 unseen sentences (without any modification of the grammar).

The results of the two evaluations are comparable, as shown in the next two sections. This suggests that the grammar is adequate for the domain.

First evaluation

The first grammar gives a correct analysis for 83% of the sentences considered. Among the sentences that are not parsed, 2.5% are not pragmatically well formed (for example Mais alors là je n’y comprends rien du tout; oui bonjour pouvez-vous s’il vous plaît m’ indiquer l’ adresse exacte Gogniat Etienne poste la Conversion de monsieur; S’il vous plaît est-ce que je pourrais avoir le numéro de telle ou telle personne oh c’est un peu con comme machin). The remaining sentences contain very rare syntactic structures, such as s’il vous plaît j’ aimerais à Valeyres-Ursins le numéro de Fournier Georges Georges s’il vous plaît. Most of them could easily be added to the grammar, but they would increase the ambiguity without drastically improving the coverage. In summary:

sentences tested: 518
sentences which receive one correct analysis: 381 (73.5%)
sentences which receive one correct analysis after small modifications of the input text (correction of grammatical errors, suppression of repetitions, etc.): 41 (7.9%)
sentences not analysed with the current grammar: 96 (18.5%)

Second evaluation

The results of the second evaluation are summarized below:

sentences tested: 130
sentences which receive one correct analysis: 103 (79.23%)
sentences not analysed with the current grammar: 27 (20.77%)
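The percentages in both summaries follow directly from the sentence counts. The short computation below reproduces them from the counts given in the text; the function name `rates` and the dictionary keys are illustrative only.

```python
def rates(counts):
    """Return each count as a percentage of the total, rounded to 2 digits."""
    total = sum(counts.values())
    return {name: round(100 * value / total, 2) for name, value in counts.items()}


# Counts taken from the two evaluation summaries.
first = rates({"correct": 381, "correct_after_edit": 41, "not_analysed": 96})
second = rates({"correct": 103, "not_analysed": 27})
```

With these counts, the share of sentences analysed correctly without any modification of the input is 73.55% in the first evaluation and 79.23% in the second.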


Linguistic description

The lexicon contains 1652 words, i.e. 1358 proper nouns (Jean, Gentil, etc.), 7 adjectives (téléphonique, bonne, exacte, etc.), 2 conjunctions (et, ou), 55 verbs (connais, est, manque, habite, etc.), 22 pronouns (que, l’, le, la, etc.), 20 nouns (numéro, téléphone, etc.), 119 numerals (un, deux, vingt-deux, etc.), 14 prepositions (dans, près-de, etc.), 12 interjections (bonjour, allo, etc.), 2 auxiliaries (est, a, etc.), 12 adverbs (plutôt, ici, non, bien, ne, etc.) and 32 codes (a, b, c, d, tiret, apostrophe, etc.).

Each word is annotated at two different levels: general morpho-syntactic features (such as category, number, person, mode, etc.) that could be found in any dictionary, and other features specific to this description, such as semantic features and sub-categorisation. This distinction is illustrated below:

Mohamed * npr
##traits morpho-syntaxiques
= sg
= 3
##traits specifiques
= prenom

The grammar contains 94 rules. The distribution into categories is given below:

TOP      1
P        17   ([ici madame Plant])
SV       15   (qui [habite à Neyruz])
V2       6    (qui [s’écrit])
SN       30   ([dix-neuf Tavernier])
NPR2     5    ([Jean Gentil])
N2       3    ([Sébastopol ville])
NUM      8    ([mille deux cent vingt])
Code     3    ([g e n t i l])
Rel      1
Interj   4    ([oui bonjour madame])
SP       2    ([de partir])
Total    94

3.1.2 Annotations

The 518 sentences were syntactically annotated with the NEGRA tool, developed at the University of Saarbruecken. This program has two interesting functionalities. First, Negra offers a tool that makes it possible to manually build consistent trees. Secondly, the result of the annotation is saved in an SQL database, whose content can be exported in an ASCII-based format that is both easy for humans to read and easy for machines to parse. This ASCII format is intended for data exchange and for efficient processing with standard Unix tools.

The methodology used for the construction of the tree-bank exploits these characteristics. For the sentences correctly analyzed with the ELU grammar, a program was written that automatically transforms the ELU format into the Negra ASCII format. This program takes as input the ambiguous trees given by the ELU grammar. A linguist can then choose the correct tree among those proposed by the ELU grammar, and this tree is automatically converted into the Negra ASCII format. Once imported into Negra, the tree can be visualized and modified as necessary. For grammatical sentences that are not analyzed by the current ELU grammar, the tree is built manually with the Negra tool. Apragmatic sentences have not been annotated. Ungrammatical sentences were corrected before annotation.

An example of the two formats is given below. Here is the result of the ELU parser (4 interpretations) for the sentence pourriez-vous m’indiquer le numéro de téléphone de Gaillard Martine Clos-du-Four deux Neyruz dans le canton de Fribourg:

(TOP (P (V2 ("pourriez")) ("_vous") (SV ("m’") (V2 ("indiquer")) (SN (SN (SN ("le") ("numéro")) (SP ("de") (SN ("téléphone")))) (SP ("de") (SN (SN (SN (SNOMPR ("Gaillard"))) (SN (SNOMPR ("Martine")))) (SN (SN (SN (SNOMPR ("Clos_du_Four"))) ("deux")) (SP ("à") (SN (SN (SNOMPR ("Neyruz"))) (SP ("dans") (SN (SN ("le") ("canton")) (SP ("de") (SN (SNOMPR ("Fribourg")))))))))))))))
----------------------------------------------------------------------
(TOP (P (V2 ("pourriez")) ("_vous") (SV ("m’") (V2 ("indiquer")) (SN (SN (SN (SN ("le") ("numéro")) (SP ("de") (SN ("téléphone")))) (SP ("de") (SN (SN (SN (SNOMPR ("Gaillard"))) (SN (SNOMPR ("Martine")))) (SN (SN (SNOMPR ("Clos_du_Four"))) ("deux"))))) (SP ("à") (SN (SN (SNOMPR ("Neyruz"))) (SP ("dans") (SN (SN ("le") ("canton")) (SP ("de") (SN (SNOMPR ("Fribourg"))))))))))))
----------------------------------------------------------------------
(TOP (P (V2 ("pourriez")) ("_vous") (SV ("m’") (V2 ("indiquer")) (SN (SN (SN (SN (SN ("le") ("numéro")) (SP ("de") (SN ("téléphone")))) (SP ("de") (SN (SN (SN (SNOMPR ("Gaillard"))) (SN (SNOMPR ("Martine")))) (SN (SN (SNOMPR ("Clos_du_Four"))) ("deux"))))) (SP ("à") (SN (SNOMPR ("Neyruz"))))) (SP ("dans") (SN (SN ("le") ("canton")) (SP ("de") (SN (SNOMPR ("Fribourg"))))))))))
----------------------------------------------------------------------
(TOP (P (V2 ("pourriez")) ("_vous") (SV ("m’") (V2 ("indiquer")) (SN (SN (SN (SN ("le") ("numéro")) (SP ("de") (SN ("téléphone")))) (SP ("de") (SN (SN (SN (SNOMPR ("Gaillard"))) (SN (SNOMPR ("Martine")))) (SN (SN (SN (SNOMPR ("Clos_du_Four"))) ("deux")) (SP ("à") (SN (SNOMPR ("Neyruz")))))))) (SP ("dans") (SN (SN ("le") ("canton")) (SP ("de") (SN (SNOMPR ("Fribourg"))))))))))
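The bracketed ELU output shown above is a plain nested-parenthesis notation, so it can be loaded with a small recursive reader. The sketch below is an illustration only: `parse_elu` and `leaves` are hypothetical helper names, and the sample string is a shortened fragment built from the beginning of the first analysis.

```python
import re

# Tokens: an opening or closing parenthesis, a quoted word, or a bare label.
TOKEN = re.compile(r'\(|\)|"[^"]*"|[^\s()"]+')


def parse_elu(text):
    """Parse '(LABEL child ...)' into nested Python lists."""
    tokens = TOKEN.findall(text)
    pos = 0

    def node():
        nonlocal pos
        assert tokens[pos] == "("
        pos += 1
        items = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                items.append(node())
            else:
                items.append(tokens[pos])
                pos += 1
        pos += 1  # skip the closing parenthesis
        return items

    return node()


def leaves(tree):
    """Collect the quoted words of a parsed tree, left to right."""
    words = []
    for item in tree:
        if isinstance(item, list):
            words.extend(leaves(item))
        elif item.startswith('"'):
            words.append(item.strip('"'))
    return words


fragment = '(TOP (P (V2 ("pourriez")) ("_vous") (SV ("m’") (V2 ("indiquer")))))'
tree = parse_elu(fragment)
words = leaves(tree)
```

Reading the leaves back in order recovers the word sequence, which gives a simple check that two competing analyses cover the same input.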

We now give the Negra format for the correct analysis of the same sentence. The first column indicates the different nodes of the tree, the second their syntactic category, and the last specifies the mother node. For example, pourriez is a verb and its mother node is called 500 and is defined as a V2.

Node           Category   Mother
pourriez       V          500
_vous          PRON       529
m’             PRON       528
indiquer       V          501
le             DET        502
numéro         N          502
de             PREP       510
téléphone      N          503
de             PREP       526
Gaillard       NPR        504
Martine        NPR        505
Clos_du_Four   NPR        506
deux           NUM        518
à              PREP       523
Neyruz         NPR        507
dans           PREP       521
le             DET        508
canton         N          508
de             PREP       519
Fribourg       NPR        509
#500           V2         529
#501           V2         528
#502           SN         516
#503           SN         510
#504           SNOMPR     511
#505           SNOMPR     512
#506           SNOMPR     513
#507           SNOMPR     514
#508           SN         520
#509           SNOMPR     515
#510           SP         516
#511           SN         517
#512           SN         517
#513           SN         518
#514           SN         522
#515           SN         519
#516           SN         527
#517           SN         525
#518           SN         524
#519           SP         520
#520           SN         521
#521           SP         522
#522           SN         523
#523           SP         524
#524           SN         525
#525           SN         526
#526           SP         527
#527           SN         528
#528           SV         529
#529           P          530
#530           TOP        0
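Because every row names its mother node, the tree can be rebuilt by following mother pointers up to node 0. The sketch below works on the three-column view described in the text (node, category, mother) and uses only the rows on the path from pourriez to the root; the function names `load_rows` and `path_to_root` are assumptions, and a real Negra export contains more columns than shown here.

```python
# Rows on the path from "pourriez" to the root, in the three-column view.
ROWS = """\
pourriez V 500
_vous PRON 529
#500 V2 529
#529 P 530
#530 TOP 0
"""


def load_rows(text):
    """Map each node name to its (category, mother) pair."""
    table = {}
    for line in text.splitlines():
        node, category, mother = line.split()
        # Internal nodes are written '#500'; mothers are written '500'.
        table[node.lstrip("#")] = (category, mother)
    return table


def path_to_root(table, node):
    """List the categories met when climbing from a node to the root."""
    path = []
    while node != "0":
        category, mother = table[node]
        path.append(category)
        node = mother
    return path


table = load_rows(ROWS)
path = path_to_root(table, "pourriez")
```

For pourriez the climb visits V, V2, P and TOP, matching the example given in the text.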


3.2 Semantic annotations

The goal here is to build a valid query to an information system, using limited world knowledge of the domain in question. Although such a task may, in its simplest form, be performed quite effectively using heuristic methods (e.g. keyword spotting), such a baseline approach is brittle and does not scale up easily to real dialogues. We therefore chose to provide a frame containing more information than the original (simple) form, so as to allow better processing when forming the query. The full frame description is given below (see footnote 1):

[Caller]
    Title:
    Name:
    Locality:
Target_Identification
    Name (default: Person)
    *Person
        Family name:
        [Title]:
        [First name]:
        [Second name]:
        [Occupation]
            Description:
            [Class]: {yellow pages categories}
    *Company
        Name:
        [Description]:
        [Category]: {yellow pages categories}
        [Owner]:
        [Contact person]: {repres., direction, secretariat, ...}
Target_Address
    [Appart n.]:
    [Street n.]:
    [Building]:
    [Street name]:
    [Village]:
    [NPA]:
    Loc_type:
    Locality (at least one of the sub-fields)
        City:
        “Environs”:
        Region:
        Canton:
        Telephone prefix:
Request type
    Phone type: (default: standard) {standard, privé, fax, natel}
    Request status: (default: ok) {ok, ill-formed, missing-information, ...}

The result of the semantic annotation of the 30 bootstrap sentences is given in Appendix F.
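To make the frame more concrete, part of it can be written as a data structure. The sketch below covers only a few slots (a Person target, the Locality sub-fields and the Request type defaults); the class and attribute names are illustrative assumptions, and the rule that an empty Locality leads to the status missing-information is one possible reading of the "at least one of the sub-fields" constraint, not something stated in the report.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Person:
    family_name: str                    # mandatory slot
    title: Optional[str] = None         # optional (bracketed) slots
    first_name: Optional[str] = None
    second_name: Optional[str] = None


@dataclass
class Locality:
    city: Optional[str] = None
    environs: Optional[str] = None
    region: Optional[str] = None
    canton: Optional[str] = None
    telephone_prefix: Optional[str] = None

    def is_filled(self) -> bool:
        # "At least one of the sub-fields" must be present.
        return any(value is not None for value in vars(self).values())


@dataclass
class Frame:
    target: Person
    locality: Locality = field(default_factory=Locality)
    street_name: Optional[str] = None
    street_number: Optional[str] = None
    phone_type: str = "standard"        # default given in the frame
    request_status: str = "ok"          # default given in the frame


def update_status(frame):
    """Flag frames without any locality information (assumed rule)."""
    if not frame.locality.is_filled():
        frame.request_status = "missing-information"
    return frame


complete = update_status(Frame(
    target=Person("Mottaz", first_name="Monique"),
    locality=Locality(city="Saignelegier"),
    street_name="rue du Printemps",
    street_number="4",
))
incomplete = update_status(Frame(target=Person("Mottaz")))
```

Keeping the defaults in the structure itself means a frame built from a partial analysis is still a valid object that later processing can refine.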

Footnote 1: Bracketed slots are optional and starred slots are disjunctive.

4 Syntactic analysis

The aim of this work package was to provide syntactic analyses of the speech hypotheses produced by the speech recognizer. Concretely, this task consists of parsing the lattices coming out of the speech recognizer with a context-free grammar (CFG) derived from ISSCO’s ELU grammar, as summarized by the following figure:

[Figure: overview of the syntactic analysis task. Labels: Speech Recognizer and Human Listener (IDIAP); Lexicon1; transcriptions; Human Analysis; Grammar CFG; Syntactic Parser; example word lattice over positions 1 to 4 (le, la, roi, leroy, dort, dos, bien); CYK table (LIA); Lexicon2 and Grammar ELU (ISSCO); Trees.]

The work package consists of 2 main tasks:
1. translating ISSCO’s grammar into a CFG and a lexicon;
2. parsing IDIAP’s lattices.

4.1 Resource production from ISSCO’s work

This task consists of:
1. Translation of ISSCO’s ELU grammar into a Context-Free Grammar;
2. Lexicon extraction and tag set generation.

4.1.1 Context-Free Grammar

The type-2 grammar developed by ISSCO (ELU format) was translated into a (non-probabilistic) context-free grammar (CFG), which is necessary for parsing lattices. The original ISSCO grammar had 94 rules. We implemented an optimised CFG translation leading to 28’253 context-free rules (a naive translation would have given over 700’000 rules), among which 21’604 (i.e. more than 76%) are unary rules involving new non-terminals (NewNTxx) artificially introduced so as to reduce the combinatorial complexity. The resulting CFG was checked to be free of cycles. The number of non-terminals is 6’474 (+ 174 tags), whereas the context-free backbone of the ELU grammar has 25 main categories.

4.1.2 Example of an ELU rule translation

Num -> Hn Num2 = num = n = num = numéro = non =

is translated into:

num.sem=lettre.conjoined=non -> "numéro" num.sem=lettre.conjoined=non
num.sem=lettre.conjoined=non -> "numéro" num.sem=lettre.conjoined=oui
num.sem=lettre.conjoined=oui -> "numéro" num.sem=lettre.conjoined=non
num.sem=lettre.conjoined=oui -> "numéro" num.sem=lettre.conjoined=oui

4.1.3 Lexicon

In order to get a lexicon fully equivalent to ISSCO’s lexicon, we had to generate tags for attribute-value pairs that were not explicitly expressed in the ELU lexicon (since unification grammars expand values of attributes dynamically during unification, only when needed). For this task we applied the same mechanism as for translating the grammar, since a lexicon can also be viewed as (and actually is) a set of grammar rules. The generated lexicon had 6’962 words and 174 tags. A better description of lexical items at the ELU level would significantly reduce the number of ambiguities at this level. Here is an example:

Acacias   npr.genre=fem.nombre=sg.sem=loc_name_rue.pers=3
Acacias   npr.genre=masc.nombre=sg.sem=loc_name_rue.pers=3
Acacias   npr.genre=masc.nombre=pl.sem=loc_name_rue.pers=3
Acacias   npr.genre=fem.nombre=pl.sem=loc_name_rue.pers=3
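The Acacias example shows what happens when an attribute is left unspecified: one tag has to be generated per combination of possible values. The sketch below illustrates this expansion with a Cartesian product. It is a simplified illustration, not the translation procedure used in the project; the function name `expand_entry` and the way the features are passed in are assumptions.

```python
from itertools import product


def expand_entry(word, category, features):
    """Generate one (word, tag) pair per combination of feature values.

    `features` is an ordered list of (attribute, possible_values);
    a fully specified attribute has a single possible value.
    """
    names = [name for name, _ in features]
    value_lists = [values for _, values in features]
    entries = []
    for combination in product(*value_lists):
        tag = category + "".join(
            f".{name}={value}" for name, value in zip(names, combination)
        )
        entries.append((word, tag))
    return entries


# Gender and number are unspecified for "Acacias", so both values are tried.
entries = expand_entry("Acacias", "npr", [
    ("genre", ["fem", "masc"]),
    ("nombre", ["sg", "pl"]),
    ("sem", ["loc_name_rue"]),
    ("pers", ["3"]),
])
```

Two unspecified binary attributes already give four entries for a single word, which is why a more precise lexical description at the ELU level would reduce the ambiguity.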

4.2 Lattice Parsing

4.2.1 Lattice Quality

The aim of this work package was to apply syntactic filtering directly on the lattice coming out of the speech recognizer rather than iteratively on each of the hypotheses. The very first problem we met with the lattices obtained for the 30 bootstrap sentences was their very low quality (due to the lack of both a unified lexicon and a good language model): none of them contained the correct sentence, and for 29 of them some words of the reference sentence were not even in the lattice (see Appendix E for further details). For this reason we decided to repeat the experiment with another set of data: speech lattices produced for 10 sentences of another part of PolyPhone (resulting from other work [5]), for which better acoustic and language models were available and for which we were sure that the lattices contained at least all the words of the correct sentence.

4.2.2 Experimental Results

We wanted to experimentally address two questions: does the sequential coupling significantly improve the recognition? And is it feasible in a reasonable time?

The first preliminary experiments we made on sequential coupling with lattice parsing were promising in both respects. They were made on a set of ten spoken utterances with lots of ambiguities and a high recognition word-error rate, as illustrated in Table 1 [5]. Experiments were carried out for several different parameter settings of the speech recognizer (several values for the "acoustic factor" were tested). On average over all the experiments, in 35% of the cases the coupling with an SCFG strictly improved the results and in 67% of the cases it did at least as well as without SCFG. Furthermore, when restricted to the two parameter sets for which the speech recognizer produced its best results, these figures improved to 50% and 80% respectively. It is worth emphasizing that, using the computationally efficient parser we developed [11], the overhead in time due to the addition of an SCFG filter is negligible with respect to the speech recognition time, as shown in Table 2 and the figure below.

Table 1: Ambiguities for the ten considered sentences. (a) number of word sequences in the word lattice produced by the speech recognizer; (b) number of parses in the lattice.

[Table 1 cell values in extraction order, with the sentence labels that precede them; the original column alignment is lost. (1) > 5 × 10^10; (2, 3) 521 243, > 5 × 10^10, 862 334 592, > 5 × 10^10; (6, 7) > 5 × 10^10, > 5 × 10^10, 290 658, 5 873; (4, 5) 3 261 227 208, 4 594 041, > 5 × 10^10, 3 326 421 643, 984; (8, 9, 10) > 5 × 10^10 92 322, > 5 × 10^10 26 740, 6 009 538 238 273.]

      SR     average total coupling time (a+b+c)   (a)    (b)    (c)
P1    219    0.72                                  0.08   0.54   0.1
P2    209    0.85                                  0.08   0.67   0.1
P3    81     0.81                                  0.08   0.63   0.1

Table 2: Coupling time vs. recognition time for different parameter settings (P1, P2 and P3). SR represents the average speech recognition time using the STRUT recognizer [19] and the NOWAY decoder [17]. For the parser: (a) speech lattice to parser chart conversion, (b) actual parsing time, (c) parser input and output. Times are given in seconds of CPU time on a SPARC Ultra 1.
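The claim that the coupling overhead is negligible can be checked against the figures of Table 2. The short computation below expresses the total coupling time (a+b+c) as a percentage of the average recognition time, assuming that the SR column is given in the same unit (seconds of CPU time) as the parser times; the variable names are illustrative.

```python
# Table 2 values: setting -> (SR time, (a), (b), (c)), all assumed in seconds.
table2 = {
    "P1": (219, 0.08, 0.54, 0.1),
    "P2": (209, 0.08, 0.67, 0.1),
    "P3": (81, 0.08, 0.63, 0.1),
}

overhead_percent = {}
for setting, (sr_time, convert, parse, io) in table2.items():
    coupling_time = convert + parse + io          # (a) + (b) + (c)
    overhead_percent[setting] = 100 * coupling_time / sr_time
```

Under this assumption the overhead stays at or below about 1% of the recognition time for all three settings, and most of it comes from the parsing step (b).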

[Figure: Coupling timings. For each of the ten sentences (Phrase1 to Phrase10), breakdown of the coupling time into translating, analyzing and other operations; time axis from 00:00.00 to 00:00.78.]

These preliminary results are encouraging, demonstrating that sequential coupling with lattice parsing both improves the recognition and can be achieved with realistic computation times.
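Column (a) of Table 1 counts the word sequences contained in a lattice. Such counts can be obtained without enumerating the sequences, by summing path counts over the lattice seen as a directed acyclic graph. The sketch below is a toy illustration: the words are those of the small lattice in the figure of Section 4, but the node numbering, the edge layout and the function name `count_word_sequences` are assumptions.

```python
from collections import defaultdict


def count_word_sequences(edges, start, end):
    """Count the distinct paths from start to end in an acyclic lattice.

    `edges` is a list of (source, target, word) triples; parallel edges
    carrying different words count as different sequences.
    """
    outgoing = defaultdict(list)
    for source, target, _word in edges:
        outgoing[source].append(target)

    memo = {}

    def paths_from(node):
        if node == end:
            return 1
        if node not in memo:
            memo[node] = sum(paths_from(nxt) for nxt in outgoing[node])
        return memo[node]

    return paths_from(start)


# Toy lattice (hypothetical layout): "leroy" spans the first two positions.
toy_lattice = [
    (0, 1, "le"), (0, 1, "la"), (0, 2, "leroy"),
    (1, 2, "roi"),
    (2, 3, "dort"), (2, 3, "dos"),
    (3, 4, "bien"),
]
n_sequences = count_word_sequences(toy_lattice, 0, 4)
```

The toy lattice contains 6 word sequences; the memoisation keeps the cost linear in the number of edges, which matters for real lattices where the count exceeds 5 × 10^10.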

4.3 Syntactic forests

The syntactic filter developed at LIA uses the stochastic context-free grammar (CFG) obtained as explained above to produce syntactic interpretations for the semantic module. The output format consists of a bracketed version of the syntactic tree. Here is an example:

Sentence: b00/f0062o06
Transcription: Bonjour est-ce que vous pourriez me donner le numéro de téléphone de madame Blaser Anne-Lise qui habite à la Sapelle Villars-Tiercelin merci
Prompt: prompt:1 adr1:BLASER ANNE-LISE adr2:LA SAPELLE adr3:VILLARS-TIERCELIN

Example of raw output: [ top [ p.sem=requ_compl.conjoined=oui.politesse=non.type=sn [ NewNT3 [ p.sem=debut.conjoined=oui .politesse=oui.type=sn [ NewNT37.sem=debut [ :4 bonjour ] ] ] ] [ NewNT4 [ p.sem=requ_merci .conjoined=oui.politesse=oui.type=sn [ NewNT13 [ p.sem=r_principal.conjoined=oui .politesse=oui.type=estceque [ :26 est-ce que ] [ p.sem=r_principal.conjoined=non.politesse=non .type=decla [ :15 vous ] [ NewNT31.nombre=pl.sem=r_principal.pers=2 [ sv.nombre=pl .sem=r_principal.mode=indi.pers=2.souscat=transsv [ v2.nombre=pl.mode=indi.pers=2 .souscat=transsv [ :109 pourriez ] ] [ NewNT65 [ sv.nombre=pl.sem=r_principal.mode=infi .pers=1.souscat=bitrans [ NewNT53 [ :18 me ] ] [ v2.nombre=pl.mode=infi.pers=1.souscat=bitrans [ :47 donner ] ] [ NewNT54 [ sn.genre=masc.nombre=sg.sem=abstrait.pers=3.conjoined=non.num=oui [ sn.genre=masc.nombre=sg.sem=abstrait.pers=3.conjoined=non.num=oui [ sn.genre=masc.nombre=sg .sem=abstrait.pers=3.conjoined=non.num=oui [ NewNT120.genre=masc.nombre=sg [ :114 le ] ] [ :122 numéro ] ] [ sp.sem=artefact [ :27 de ] [ sn.genre=masc.nombre=sg.sem=artefact.pers=3 .conjoined=non.num=oui [ :136 téléphone ] ] ] ] [ sp.sem=nom_com [ :27 de ] [ sn.genre=fem .nombre=sg.sem=nom_com.pers=3.conjoined=non.num=oui [ sn.genre=fem.nombre=sg.sem=nom_com.pers=3


.conjoined=non.num=oui [ NewNT85.genre=fem.nombre=sg.pers=3 [ sn.genre=fem.nombre=sg.sem=titre .pers=3.conjoined=non.num=oui [ :146 madame ] ] ] [ NewNT86 [ sn.genre=fem.nombre=sg.sem=nom_com2 .pers=3.conjoined=non.num=oui [ NewNT93.genre=fem.nombre=sg.pers=3 [ sn.genre=fem.nombre=sg .sem=nom.pers=3.conjoined=non.num=oui [ snpr.genre=fem.nombre=sg.sem=nom.pers=3.conjoined=non [ :30 Blaser ] ] ] ] [ NewNT94 [ sn.genre=fem.nombre=sg.sem=prenom.pers=3.conjoined=non.num=oui [ snpr.genre=fem.nombre=sg.sem=prenom.pers=3.conjoined=non [ :28 Anne-Lise ] ] ] ] ] ] ] [ rel.souscat=intrans_loc [ NewNT134 [ :11 qui ] ] [ NewNT135.souscat=intrans_loc [ sv.nombre=sg .sem=r_principal.mode=indi.pers=1.souscat=intrans_loc [ v2.nombre=sg.mode=indi.pers=1 .souscat=intrans_loc [ :64 habite ] ] [ sp.sem=loc [ :27 à ] [ sn.genre=fem.nombre=sg.sem=loc .pers=3.conjoined=oui.num=oui [ NewNT107.genre=fem.nombre=sg.pers=3 [ sn.genre=fem.nombre=sg .sem=loc_rue.pers=3.conjoined=non.num=oui [ :116 la ] [ snpr.genre=fem.nombre=sg.sem=loc_rue .pers=3.conjoined=non [ :45 Sapelle ] ] ] ] [ NewNT108 [ sn.genre=fem.nombre=pl.sem=loc.pers=3 .conjoined=non.num=oui [ snpr.genre=fem.nombre=pl.sem=loc.pers=3.conjoined=non [ :36 Villars-Tiercelin ] ] ] ] ] ] ] ] ] ] ] ] ] ] ] ] ] ] ] ] [ NewNT14 [ p.sem=fin.conjoined=oui.politesse=oui.type=sn [ NewNT37.sem=fin [ :6 merci ] ] ] ] ] ] ] ]

which is, in a more human readable form (once the "technical non-terminals" (NewNTxx) have been removed):

[ top [ p.sem=requ_compl.conjoined=oui.politesse=non.type=sn [ p.sem=debut.conjoined=oui.politesse=oui.type=sn [ :4 bonjour ] ] [ p.sem=requ_merci.conjoined=oui.politesse=oui.type=sn [ p.sem=r_principal.conjoined=oui.politesse=oui.type=estceque [ :26 est-ce que ] [ p.sem=r_principal.conjoined=non.politesse=non.type=decla [ :15 vous ] [ sv.nombre=pl.sem=r_principal.mode=indi.pers=2.souscat=transsv [ v2.nombre=pl.mode=indi.pers=2.souscat=trans [ :109 pourriez ] ] [ sv.nombre=pl.sem=r_principal.mode=infi.pers=1.souscat=bitrans [ :18 me ] [ v2.nombre=pl.mode=infi.pers=1.souscat=bitrans [ :47 donner ] ] [ sn.genre=masc.nombre=sg.sem=abstrait.pers=3.conjoined=non.num=oui [ sn.genre=masc.nombre=sg.sem=abstrait.pers=3.conjoined=non.num=oui [ sn.genre=masc.nombre=sg.sem=abstrait.pers=3.conjoined=non.num=oui [ :114 le ] [ :122 numéro ] ] [ sp.sem=artefact [ :27 de ] [ sn.genre=masc.nombre=sg.sem=artefact.pers=3.conjoined=non.num=oui [ :136 téléphone ] ] ] ] [ sp.sem=nom_com [ :27 de ] [ sn.genre=fem.nombre=sg.sem=nom_com.pers=3.conjoined=non.num=oui [ sn.genre=fem.nombre=sg.sem=nom_com.pers=3.conjoined=non.num=oui [ sn.genre=fem.nombre=sg.sem=titre.pers=3.conjoined=non.num=oui [ :146 madame ] ] [ sn.genre=fem.nombre=sg.sem=nom_com2.pers=3.conjoined=non.num=oui [ sn.genre=fem.nombre=sg.sem=nom.pers=3.conjoined=non.num=oui [ snpr.genre=fem.nombre=sg.sem=nom.pers=3.conjoined=non [ :30 Blaser ] ] ] [ sn.genre=fem.nombre=sg.sem=prenom.pers=3.conjoined=non.num=oui [ snpr.genre=fem.nombre=sg.sem=prenom.pers=3.conjoined=non [ :28 Anne-Lise ] ] ] ] ] [ rel.souscat=intrans_loc [ :11 qui ] [ sv.nombre=sg.sem=r_principal.mode=indi.pers=1.souscat=intrans_loc [ v2.nombre=sg.mode=indi.pers=1.souscat=intrans_loc [ :64 habite ] ] [ sp.sem=loc [ :27 à ] [ sn.genre=fem.nombre=sg.sem=loc.pers=3.conjoined=oui.num=oui [ sn.genre=fem.nombre=sg.sem=loc_rue.pers=3.conjoined=non.n [ :116 la ] [ snpr.genre=fem.nombre=sg.sem=loc_rue.pers=3.conjoined [ :45 Sapelle ] ] ] [ 
sn.genre=fem.nombre=pl.sem=loc.pers=3.conjoined=non.num=o [ snpr.genre=fem.nombre=pl.sem=loc.pers=3.conjoined=non


[ :36 Villars-Tiercelin ] ] ] ] ] ] ] ] ] ] ] ] ] ] [ p.sem=fin.conjoined=oui.politesse=oui.type=sn [ :6 merci ] ] ] ] ]

5 Robust Analysis and Frame Filling

5.1 Computational logic for robust analysis

Logic-based programming languages are generally credited with two advantages: their symbol processing capability, and the way they abstract from the actual implementation of the required data structures. Definite Clause Grammars come to mind when relating Logic Programming and Natural Language Processing. This is one of the best couplings between Computational Linguistics and Logic, supporting both (i) the development of linguistic models of natural language (Computational Linguistics) and (ii) the design of real-life applications (Language Engineering). The main drawback of this approach is efficiency, but it is not the only one. In recent years several efforts have been made to improve the efficiency of logic and functional programming languages by means of powerful abstract machines and optimized compilers. Sometimes, recovering efficiency leads to the introduction of non-logical features into the language, and programmers must be aware of them in order to exploit them in their applications (e.g. the cut in logic programming).

An important question to ask is: “how can computational logic contribute to robust discourse analysis?” A partial answer is that logic-based programming languages are currently able to integrate, in a unifying framework, all or most of the techniques necessary for robust text analysis. Furthermore, this can be done in a rigorous “mathematical” fashion. In this sense robustness is related to correctness and provability with respect to the specifications. An NLP system developed within a logical framework has a predictable behavior, which is useful for checking the validity of the underlying theories.

5.2 Implementation of the semantic module

In our approach we tried to integrate the above principles into our system in order to compute hypotheses for the frame filling task effectively. This is done by building a lattice of frame filling hypotheses and possibly selecting the best one. Hypotheses are typically sequences of proper names. The lattice of hypotheses is generated by means of an LHIP discourse grammar (footnote 4). This type of grammar is used to extract name chunks and assemble them into the hypothesized frame structure.

Footnote 4: LHIP stands for Left-corner Head-driven Island Parser and is explained further in Appendix H.

5.2.1 Tree-paths representation

Parse trees obtained from the previous module are encoded into a path representation, which allows us to specify constraints over the tree structure easily. A path-sentence is a list of path-words, which in turn are compound terms of the type terminal(word, path), where word is a constant term and path is a list of arc identifiers, that is, compound terms 'cat'(#number_of_nodes, #node, #identifier), each uniquely identifying an arc in the parse tree. The functor 'cat' is a category name and its arguments are positive integers. For instance, the representation of the parse tree:

P
  ADV  ici
  SN
    SN
      N  madame
    SN
      SNOMPR
        NPR  Plant

is given by:

[terminal(ici,['ADV'(1,1,14),'P'(2,1,12),'P'(2,1,11)]),
 terminal(madame,['N'(1,1,19),'SN'(1,1,17),'SN'(2,1,16),'P'(2,2,15),'P'(2,1,11)]),
 terminal('Plant',['NPR'(1,1,24),'SNOMPR'(1,1,22),'SN'(1,1,21),'SN'(2,2,20),'P'(2,2,15),'P'(2,1,11)])].

Using this representation it is possible to define a grouping operator (e.g. group/2) which, given a sequence of adjacent names, finds a subsequence of words whose least common ancestor is closer to the words (i.e. lower in the tree) than the least common ancestor (e.g. lca/2) of the whole sequence. These two operators are very useful for imposing structural knowledge constraints, and they are straightforwardly defined as PROLOG programs by:

lca([terminal(_,W)],W).
lca([terminal(_,W)|R],P) :-
    lca(R,P1),
    prefix_path(P1,P),
    prefix_path(W,P), !.

group([],[]).
group(L,X) :-
    lca(L,P),
    proper_sublist(L,X),
    length(X,N),
    N>1,
    lca(X,P1),
    proper_sublist(P1,P).

prefix_path(A,A).
prefix_path([_|B],C) :-
    prefix_path(B,C).
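To make the behaviour of these two operators concrete, the following Python sketch re-implements them on the example sentence above. It is only an illustration, not the project's code: arcs are modelled as plain tuples (category, number_of_nodes, node, identifier), the function names are hypothetical, and proper_sublist/2 (whose definition is not given in this report) is assumed to select contiguous proper sub-sequences.

```python
def common_suffix(p, q):
    """Longest common suffix of two leaf-to-root arc paths."""
    shared = []
    for a, b in zip(reversed(p), reversed(q)):
        if a != b:
            break
        shared.append(a)
    return shared[::-1]


def lca(path_words):
    """Path of the least common ancestor of all words (root arc last)."""
    paths = [path for _word, path in path_words]
    result = paths[0]
    for path in paths[1:]:
        result = common_suffix(result, path)
    return result


def groups(path_words):
    """Contiguous sub-sequences of at least two words whose LCA lies
    strictly below the LCA of the whole sequence (assumed reading of
    proper_sublist/2)."""
    top_depth = len(lca(path_words))
    n = len(path_words)
    found = []
    for start in range(n):
        for end in range(start + 2, n + 1):
            sub = path_words[start:end]
            if len(sub) < n and len(lca(sub)) > top_depth:
                found.append([word for word, _path in sub])
    return found


# The path-sentence of the example, one (word, path) pair per terminal.
sentence = [
    ("ici", [("ADV", 1, 1, 14), ("P", 2, 1, 12), ("P", 2, 1, 11)]),
    ("madame", [("N", 1, 1, 19), ("SN", 1, 1, 17), ("SN", 2, 1, 16),
                ("P", 2, 2, 15), ("P", 2, 1, 11)]),
    ("Plant", [("NPR", 1, 1, 24), ("SNOMPR", 1, 1, 22), ("SN", 1, 1, 21),
               ("SN", 2, 2, 20), ("P", 2, 2, 15), ("P", 2, 1, 11)]),
]

print(lca(sentence))
print(groups(sentence))
```

On this sentence the least common ancestor of all three words is the root arc, and the only group found is "madame Plant", whose common ancestor is the SN node one level below the root.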

5.2.2 Discourse markers

Discourse segments allow us to model dialog by a set of pragmatic concepts (dialogue acts) representing what the user is expected to utter (for example initiation of a dialogue: init, expression of gratitude: thank, demand for information: request, etc.), and in that way they are useful for reducing syntactic and semantic ambiguity. They are domain-dependent and must be defined for a given corpus. For their definition, we followed the experiments done in the context of Verbmobil (see for example [13, 14]). In our specific case, identifying special words that serve both as separators between logical subparts of the same sentence and as introducers of semantic constituents allows us to search for the name sequences filling a particular slot only in the relevant part of the sentence. One of the most important separators is the announcement-query separator. The LHIP clauses defining this separator are rules covering one or more words, such as:

ann_query_separator #1.0 ~~> @terminal('téléphone',_).
ann_query_separator #1.0 ~~>
    ( @terminal('numéro',_) :
      @terminal('de',_) :
      (? @terminal('téléphone',_) ?) ).

As an example of a semantic constituent introducer, consider the rule

street_intro([T,Prep,Det],1) #1.0 ~~>
    * street_type(T),
    preposition(Prep),
    determiner(Det).

which makes use of word knowledge about street types coming from an external thesaurus, for instance:

street_type(terminal(X,P)) ~~>
    @terminal(X,P),
    {thesaurus(street,W),member(X,W)}.
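The role of the announcement-query separator can be illustrated with a small Python sketch. It is a deliberately simplified stand-in for the LHIP rules (plain token matching, no island parsing, no thresholds); the function names are hypothetical and the example request is taken from the annotated sentences in Appendix F with punctuation removed.

```python
def find_separator(tokens):
    """Return (start, end) of the first announcement-query separator,
    i.e. 'téléphone' alone or 'numéro de' optionally followed by
    'téléphone'; return None when no separator is present."""
    for i, tok in enumerate(tokens):
        if tok == "numéro" and tokens[i + 1:i + 2] == ["de"]:
            end = i + 2
            if tokens[end:end + 1] == ["téléphone"]:
                end += 1  # optional third word of the separator
            return i, end
        if tok == "téléphone":
            return i, i + 1
    return None


def split_request(tokens):
    """Split a request into the announcement part (before the separator)
    and the query part (after it)."""
    span = find_separator(tokens)
    if span is None:
        return tokens, []
    start, end = span
    return tokens[:start], tokens[end:]


request = ("bonjour madame ici madame Plant pouvez-vous me donner le "
           "numéro de téléphone de mademoiselle Martine Grosjean").split()
announcement, query = split_request(request)
print(announcement)
print(query)
```

With such a split, names for the caller slot would be searched for only in the announcement part, and names for the target slots only in the query part.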

5.2.3 Generation of hypotheses

The generation of hypotheses for filling the frame is performed by composing weighted rules, assembling chunks and filtering possible hypotheses. The main assumption on which a probabilistic approach to NLP is based is that language is a random phenomenon with its own probability distribution function: coverage is often translated as expectation in a probabilistic sense. Changing perspective, and considering language just as an uncertain and imprecise phenomenon and understanding as a perception process, it is natural to think of fuzzy models of language (see [15] and [6]). Recently, fuzzy reasoning has been partially integrated into a CLP paradigm (see [18]) in order to deal with so-called soft constraints in weighted constraint logic grammars. We took some inspiration from this proposal for integrating fuzzy logic and parsing in order to compute the weight assigned to each frame filling hypothesis.

Each LHIP rule returns a confidence factor together with the sequence of names. The confidence factor of a rule can either be assigned statically (e.g. for pre-terminal rules) or be computed by recursively composing the confidence factors of its sub-constituents. Confidence factors are combined by choosing the minimum among the confidences of the sub-constituents. It is possible that there is not enough information for filling a slot. In this case the grammar should provide a means of producing an empty constituent when all possible hypothesis rules have failed. This is possible using negation and epsilon-rules in LHIP, as shown in the following rules for dealing with street names:

found_street_name(L,Conf) #1.0 ~~>
    * street_intro(Intro,Conf),
    name_list(X),
    {append(Intro,X,L)}.
found_street_name(X,0.3) ~~>
    * name_list(X).
hyp_street_name(Street,Conf) ~~>
    * found_street_name(Street,Conf).
hyp_street_name([],1) ~~>
    ~found_street_name(_,_),
    lhip_true.

where name_list(X) accounts for a sequence of adjacent proper names and lhip_true corresponds to the empty sequence. Observe that in this particular case there is no need to select the minimum confidence factor among the sub-constituents of the rule found_street_name, since only street_intro(Intro,Conf) carries a confidence factor, which is simply propagated. The highest-level constituent is the whole frame structure, which simply specifies the possible orders of chunks relative to slot hypotheses. A rule for a possible frame hypothesis is:

frame(Caller_title, Caller_name, Target_title, Target_name,
      Street_name, Street_number, Locality, Weight) ~~>
    hyp_caller(Caller_title,Caller_name,C1),
    * ann_query_separator,
    hyp_target(Target_title,Target_name,C2),
    * location_intro,
    hyp_street_name(Street_name,C3),
    hyp_street_number(Street_number,C4));
    hyp_locality_name(Locality,C5),
    {minlist([C1,C2,C3,C4,C5],Weight)}.
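The way confidence factors are assigned and combined in the rules above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the constants 1.0 and 0.3 are those appearing in the street-name rules, and the sketch returns a single preferred hypothesis whereas the LHIP grammar may produce several.

```python
def street_hypothesis(intro, names):
    """Mimic hyp_street_name: return (street words, confidence).

    intro -- None, or a pair (intro words, confidence) found before the names
    names -- list of adjacent proper names (possibly empty)
    """
    if names and intro is not None:
        intro_words, conf = intro
        # An introducer was found: its confidence is propagated unchanged.
        return intro_words + names, conf
    if names:
        # Names without any introducer: low static confidence.
        return names, 0.3
    # Nothing found: empty constituent, which does not penalise the frame.
    return [], 1.0


def frame_weight(confidences):
    """Global weight of a frame hypothesis: the minimum of the confidences
    of its slot hypotheses (the role played by minlist in the frame rule)."""
    return min(confidences)


print(street_hypothesis((["rue", "du"], 1.0), ["Château"]))
print(street_hypothesis(None, ["Sapelle"]))
print(frame_weight([1.0, 1.0, 0.3, 1.0, 1.0]))
```

Taking the minimum means that a frame is never considered more reliable than its weakest slot hypothesis, which is the usual conjunction in fuzzy logic.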

The frame rule above specifies one possible order of chunks, interleaved with separators and introducers. The computation of the global weight may be more complex than in this rule, which simply takes the minimum of the hypothesis confidence values. In this case we did not provide any structural constraint (e.g. preferring name chunks belonging to the minimal common sub-tree, or those having the longest sequence of names belonging to the same sub-tree). The frame hypotheses obtained can be further filtered using both structural knowledge (e.g. constraints over the tree-path representation) and word knowledge.

In order to combine the information extracted in the previous analysis step into the final query representation, which can be directly mapped into the database query language, we make use of a frame structure in which slots represent information units or attributes in the database. A simple notion of context can be useful to fill by default those slots for which we have no explicit information. For this type of hierarchical reasoning we exploit the meta-programming capabilities of logic programming, using a meta-interpreter which allows multiple inheritance among logical theories [9]. More precisely, we made use of a special retraction operator, written $\ominus$ here, for composing logic programs, which allows us to model the concept of inheritance in hierarchical reasoning easily. The expression $P \ominus Q$, where P and Q are meta-variables denoting arbitrary logic programs, means that the resulting logic program contains all the definitions of P except those that are also defined in Q. The definition of the isa operator is obtained by combining the retraction operator with the union operator $\cup$, which simply makes the physical union of two logic programs:

$$P \;\mathrm{isa}\; Q = P \cup (Q \ominus P).$$

As an example of the above definition, we provide some default definitions which have been used to represent part of the world knowledge in our domain. The rules theory contains rules for inferring the locality or the locality type when they are not explicitly mentioned in the query.

rules:
    locality(City) :-
        caller_prefix(X),
        prefix(X,City).
    loc_type(Type) :-
        locality(City),
        gis(City,Type).

where prefix/2 and gis/2 are world knowledge bases (i.e. collections of facts grouped in a theory called kb) and caller_prefix/1 can easily be provided by the answer system. If some information is missing, the system tries to provide default additional information to complete the query. The following theory, query_defaults, contains definitions for some mandatory slots which need to be filled in case of incomplete queries:

query_defaults:
    identification(person).
    phone_type(standard).
    loc_type(city).

Finally, starting from an incomplete query which does not account for all the required information, we can use deduction to generate the query completion, for instance by asking:

?- demo(((query isa query_defaults) ∪ rules ∪ kb), loc_type(X)).
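The effect of the isa composition on default slots can be illustrated with a short Python sketch. It is not the meta-interpreter of [9]: theories are modelled here simply as dictionaries mapping a predicate name to its list of clauses, the function names are hypothetical, and the content of the incomplete query is invented for the example.

```python
def retract(p, q):
    """Definitions of p except those for predicates also defined in q."""
    return {pred: clauses for pred, clauses in p.items() if pred not in q}


def union(p, q):
    """Physical union of two theories: clauses of shared predicates are
    simply concatenated."""
    result = {pred: list(clauses) for pred, clauses in p.items()}
    for pred, clauses in q.items():
        result.setdefault(pred, []).extend(clauses)
    return result


def isa(p, q):
    """p inherits from q: everything in p, plus the definitions of q that
    p does not override."""
    return union(p, retract(q, p))


query_defaults = {
    "identification": ["person"],
    "phone_type": ["standard"],
    "loc_type": ["city"],
}

# Hypothetical incomplete query: only the locality type is known.
query = {"loc_type": ["village"]}

completed = isa(query, query_defaults)
print(completed)
```

In the completed query the explicit value of loc_type overrides the default, while identification and phone_type are inherited from query_defaults.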

6 Conclusion

The first main conclusion is that the chosen architecture, which has been fully tested on the 20 bootstrap sentences, is viable. We are, however, conscious of the fact that a lot of work remains to be done to demonstrate that it is operational on real-size data, in particular a quantitative evaluation of its impact on performance. Several limitations encountered during the ISIS project will also need to be addressed:

• unification of the resources to be used by the different modules;
• improvement of models (language models, grammars, lexica, ...);
• robustness handling through real dialog management.

These points are detailed below.

6.1 Resource unification

A key problem highlighted by the ISIS project was resource unification.

For instance, even when considering annotation alone, the question arises of how a “word” is defined. Lexica at the speech recognition level may differ from lexica at the syntactic analysis level, which may in turn differ from lexica at the semantic level. For a fully integrated prototype all of these lexica must be the same, or at least transformations must be defined from the resources at one level to the resources at another. For example, consider the request:

bonjour est ce que vous pourriez me donner le numéro de téléphone de madame bien anne lise qui habite à la sappelle villars tiercelin merci

1. Is "est ce que" one or three words? Should it be written "est ce que" (certainly better at the speech level) or "est-ce que" (certainly better at the syntactic level)?
2. How should the first name Anne-Lise be considered: anne lise, Anne Lise or Anne-Lise?
3. The same question arises for Villars-Tiercelin.

The resource unification problem that arose during the ISIS project at least showed that, in a follow-up, such a project needs a precise definition of a reference lexicon common to all modules (which certainly means, because of the speech recognition module, adopting "atomic" definitions for words (est ce que, anne lise, villars tiercelin)).
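One possible form of such a transformation between lexica is sketched below in Python. It is an assumption-laden illustration, not something implemented in ISIS: the function names are hypothetical, the "atomic" form is taken to be lower-case with hyphens treated as spaces, and the regrouping uses a simple greedy longest match against the entries of the syntactic-level lexicon.

```python
def to_atomic(entry):
    """Speech-level ("atomic") form of a lexicon entry: lower-case words,
    with hyphens treated as word boundaries."""
    return entry.lower().replace("-", " ").split()


def build_index(syntactic_lexicon):
    """Map each atomic word sequence to its syntactic-level entry."""
    return {tuple(to_atomic(entry)): entry for entry in syntactic_lexicon}


def regroup(atomic_tokens, index):
    """Rewrite a speech-level token sequence with syntactic-level entries,
    preferring the longest matching entry at each position."""
    max_len = max((len(key) for key in index), default=1)
    out, i = [], 0
    while i < len(atomic_tokens):
        for n in range(min(max_len, len(atomic_tokens) - i), 0, -1):
            key = tuple(atomic_tokens[i:i + n])
            if key in index:
                out.append(index[key])
                i += n
                break
        else:
            # No multi-word entry starts here: keep the token as it is.
            out.append(atomic_tokens[i])
            i += 1
    return out


index = build_index(["est-ce que", "Anne-Lise", "Villars-Tiercelin"])
speech = "bonjour est ce que vous pourriez anne lise villars tiercelin merci".split()
print(regroup(speech, index))
```

The same index, read in the other direction through to_atomic, gives the mapping from the syntactic level back to the speech level, so that a single reference lexicon could serve both modules.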

6.2 How to build good language models for PolyPhone?

Apart from the resource question addressed in the previous section, one can wonder how good language models could be built for the PolyPhone Corpus. Indeed, such models cannot be learned from the transcriptions currently available, since these transcriptions contain:

1. numerous transcription errors (cf. Appendix C.3);
2. specific undocumented annotations ([\... ]);
3. non-uniform transcription: accents, capitalization and punctuation are not uniform at all but vary from one transcribed sentence to another.

A great deal of work is still required to produce transcriptions for the PolyPhone database at a quality level that allows them to be used to train a proper language model.

6.3 Robustness through Dialog

Even a very superficial observation of the human language understanding process makes it clear that no deep competence in the underlying structure of the spoken language is required in order to process distorted utterances acceptably. On the other hand, the more experienced the speaker, the more probable a successful understanding of that distorted input. How can this kind of fault-tolerant behavior be reproduced in an artificial system by means of computational techniques? Several answers have been proposed to this question and many systems have been implemented so far, but none of them is capable of dealing with robustness as a whole, which should be the challenge for future systems [2, 8]. Robustness in dialogue is crucial when the artificial system takes part in the interaction, since inability or low performance in processing utterances will cause unacceptable degradation of the overall system. As pointed out in [3], it is better to have a dialogue system that tries to guess a specific interpretation in case of ambiguity rather than ask the user for a clarification. If this first commitment later proves incorrect, a robust behavior will be able to interpret subsequent corrections as repair procedures to be issued in order to get the intended interpretation.


APPENDICES


A Swiss French PolyPhone database

The data considered in the ISIS project consist of a subset of the original Swiss French PolyPhone database (version 1.0a), restricted to the items related to the 111 service calls (“rubrique 38” of the calling sheet [12]). This database contains 4293 recordings, each consisting of 2 files: an ASCII file (.txt) corresponding to the initial prompt and address request, and a data file (.alw) containing the sound data itself in a-law format, along with a header in NIST format containing further information (including the transcription of the speaker's request) [1, 12]. Here is an example of the ASCII file:

Veuillez maintenant faire comme si vous étiez en ligne avec le 111 pour demander le no de téléphone de la personne imaginaire dont les coordonnées se trouvent ci-dessous
BUCHWALDER-CUENAT PASCALE
MOULIN D’ALLERES
SEMBRANCHER

and its corresponding NIST header:

NIST_1A
   1024
database_id -s22 Swiss_French_Polyphone
recording_site -s17 Swiss_Telecom_PTT
sheet_id -i 13952
prompt -s168 Veuillez maintenant faire comme si vous étiez en ligne avec le 111 pour demander le no de téléphone de la personne imaginaire dont les coordonnées se trouvent en face.
text_transcription -s114 Bonjour mademoiselle, j’aimerais le numéro de téléphone de Buchwalder-Cuenat Pascale Moulin d’Allères Sembrancher
speaking_mode -s11 spontaneous
sample_begin -r 0.200000
sample_end -r 10.151875
sample_count -i 82815
sample_n_bytes -i 1
channel_count -i 1
sample_coding -s4 alaw
sample_rate -i 8000
sample_byte_format -s1 1
sample_sig_bits -i 8
sample_checksum -i 62637
database_version -s3 1.0
utterance_id -s8 m3212o06
end_head
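A header of this kind can be read with a few lines of Python. The sketch below is not part of the ISIS tools; it assumes the header has already been decoded to text with one field per line, as in the example above, and the function name is hypothetical. The type tags are interpreted as they appear in the example: -i for integers, -r for reals and -sN for strings of declared length N (the declared length is not checked here).

```python
def parse_nist_header(text):
    """Parse the textual part of a NIST_1A header into a dictionary."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "NIST_1A":
        raise ValueError("not a NIST_1A header")
    fields = {}
    # lines[1] holds the header size in bytes; fields start on the third line.
    for line in lines[2:]:
        line = line.strip()
        if line == "end_head":
            break
        if not line:
            continue
        name, ftype, value = line.split(None, 2)
        if ftype == "-i":
            fields[name] = int(value)
        elif ftype == "-r":
            fields[name] = float(value)
        elif ftype.startswith("-s"):
            fields[name] = value
        else:
            raise ValueError("unknown field type: " + ftype)
    return fields


example = """NIST_1A
   1024
database_id -s22 Swiss_French_Polyphone
sheet_id -i 13952
speaking_mode -s11 spontaneous
sample_begin -r 0.200000
sample_coding -s4 alaw
sample_rate -i 8000
utterance_id -s8 m3212o06
end_head
"""

header = parse_nist_header(example)
print(header["sheet_id"], header["sample_rate"], header["utterance_id"])
```

The sheet_id field recovered this way is the value that links a recording to its calling sheet.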

The database is organised in blocks of fewer than 100 recordings. block00 to block24 contain 2’407 female speaker recordings and block30 to block49 contain 1’886 male speaker recordings. Note that not all possible recording numbers are present; the missing numbers are listed in Appendix B. Files are named according to the sex of the speaker, their block number and the number of the subdirectory in the block. For instance, f1234o06.txt concerns a female speaker and is stored in block12, subdirectory 34. The part ’o06’ has no meaning within the current project. It may be, but this is not completely clear yet, that the block + directory number corresponds to the speaker number as referred to in [4] and may then be used to retrieve all the information about the user (footnote 5). The link between the recording and the calling sheet is given in the NIST header of the .alw file, in the line:

sheet_id -i 13952

Footnote 5: but where
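The file naming convention described above can be decoded as in the following Python sketch. The function name and the returned field names are hypothetical, and the pattern only covers names of the form shown in this appendix (sex letter, two-digit block, two-digit subdirectory, a suffix such as o06, and a .txt or .alw extension).

```python
import re

# Sex letter, block (2 digits), subdirectory (2 digits), free suffix, extension.
_NAME = re.compile(
    r"^(?P<sex>[fm])(?P<block>\d{2})(?P<subdir>\d{2})(?P<suffix>\w*)"
    r"\.(?P<ext>txt|alw)$"
)


def parse_recording_name(filename):
    """Split a PolyPhone file name into its components."""
    match = _NAME.match(filename)
    if match is None:
        raise ValueError("unexpected file name: " + filename)
    return {
        "sex": "female" if match["sex"] == "f" else "male",
        "block": int(match["block"]),
        "subdir": int(match["subdir"]),
        "suffix": match["suffix"],
        "ext": match["ext"],
    }


print(parse_recording_name("f1234o06.txt"))
```

For f1234o06.txt this yields a female speaker, block 12 and subdirectory 34, in agreement with the example given in the text.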


B Missing record numbers in the French PolyPhone 1.0a CD-ROM

block00: 0016 0017
block01: 0118
block02: 0221 0276 0294
block03: 0356 0369 0371 0372 0373 0374 0375
block04: 0416 0467
block05: 0505 0529 0559
block06: 0603 0626
block07: 0732
block08: 0816 0841 0878 0879
block09: 0905 0907 0945 0994
block10: 1014 1094
block11: 1118 1127 1174 1175
block12: 1286 1291
block13: 1327 1394 1397
block14: 1429 1449 1477
block15: 1515 1526 1571 1579 1585 1590
block16: 1630 1656 1657 1660 1661 1673
block17: 1711 1751 1774 1781 1788 1789 1795
block18: 1800 1827 1828 1844 1875
block19: 1914 1940 1962 1992 1993
block20: 2006 2055 2063 2067 2075 2084
block21: 2143 2158 2183 2198
block22: 2215 2260 2275 2296
block23: 2304 2315 2333 2360 2385
block24: 2442 2494
block30: 3012 3050
block31: 3117 3156 3176 3186 3191 3194
block32: 3223 3245 3267
block33: 3304 3308 3314 3343 3344 3359 3381 3387
block34: 3427 3441 3449 3450 3458 3463 3464 3467 3481 3488 3497
block35: 3501 3503 3506 3514 3520 3521 3525 3531 3540 3546 3558 3564 3566 3576 3578 3580 3583 3589
block36: 3603 3621 3655 3660 3676
block37: 3701 3707 3711 3724 3735 3752 3774 3782
block38: 3807 3883
block39: 3951 3967 3997
block40: 4048 4067 4077
block41: 4124 4129 4136 4151 4152 4173 4181 4185
block42: 4226 4250
block43: 4332 4361
block44: 4400 4427 4437 4439 4443 4448 4477 4479 4484 4496
block45: 4517 4537 4551 4555 4558 4562 4594
block46: 4603 4645 4653 4662
block47: 4709 4728 4736 4739 4758 4762 4794
block48: 4851 4858
block49: 4908 4965 4976


C Errors encountered in the French PolyPhone database

C.1 Wrong prompts

prompt -s10 Non defini
    [1 occurrence] -> removed
prompt -s111 Indiquez le niveau final de votre formation scolaire (école primaire, école professionnelle ou école supérieure
    [1 occurrence] -> removed
prompt -s118 Simulez une demande de renseignement au 111 concernant la personne correspondant aux coordonnées fictives ci-dessous :
    [1 occurrence] -> space added at end of line
prompt -s14 Francey Martin
    [1 occurrence] -> removed
prompt -s15 Gschwind Michel
    [1 occurrence] -> removed
prompt -s168 Veuillez [...] se trouvent en face .
    [4137 occurrences] -> replaced by: Veuillez [...] se trouvent ci-dessous:
prompt -s180 Indiquez le type de téléphone que vous utilisez en ce moment précis (standard Tritel, téléphone sans fil, Natel C, Natel D, un appareil importé, ou une cabine téléphonique publique
    [6 occurrences] -> removed
prompt -s52 Simulez une demande aux renseignements téléphoniques
    [139 occurrences] -> replaced by: Simulez une demande de renseignement au 111 concernant la personne correspondant aux coordonnées fictives ci-dessous:
prompt -s79 Simulez une demande aux renseignements téléphoniques concernant cette personne.
    [1 occurrence] -> replaced by: Simulez une demande de renseignement au 111 concernant la personne correspondant aux coordonnées fictives ci-dessous:
prompt -s82 Veuillez maintenant faire comme si vous étiez en ligne avec le 111...pour demander
    [1 occurrence] -> replaced by: Simulez une demande de renseignement au 111 concernant la personne correspondant aux coordonnées fictives ci-dessous:

C.2 Miscellaneous errors

Entries similar to

Prononcez une demande ,au 111, du numéro de téléphone de TORNAY MAURICE habitant LE MARTINET à GIMEL. (votre réponse)

were transformed into

Prononcez une demande ,au 111, du numéro de téléphone de:
TORNAY MAURICE
LE MARTINET
GIMEL

Other errors:

id:cd1/b00/f0052o06:sid14282
text[137]: Non défini oui bonjour j’aimerais le numéro de téléphone de madame Megroz Colette Bureau de poste à [\hésitation Misery] s’il vous plaît
-> text[137]: Oui bonjour j’aimerais le numéro de téléphone de madame Megroz Colette Bureau de poste à [\hésitation Misery] s’il vous plaît

id:cd1/b01/f0155o06:sid13339
text[62]: Non défini Bircher-Schmid Elisabeth Vie-du-Haut trente à Buix
-> text[62]: Bircher-Schmid Elisabeth Vie-du-Haut trente à Buix

C.3 Transcription errors

Of the 30 ’bootstrap’ sentences, 9 had transcription errors (30 %):

id:cd1/b00/f0024o06:sid14067
Bonjour j’aurais aimé connaître le numéro de téléphone de monsieur Oyvaert Steve je vous l’épelle o y v a e r t Stev e habitant le Grand Clos je ne connais pas le numéro postal Pailly p a i l l y merci
    p a i l l y -> p a i deux l y
----------------------------------------------------------------------
id:cd1/b00/f0055o06:sid13400
Oui bonjour pouvez-vous me donner [\hésitation le] les coordonnées de monsieur [\prononciation bizarre Hofer] Franci s chemin de la Carrière Saint-Cergue merci au revoir.
    1) [\prononciation bizarre Hofer] -> Hoffman
    2) Franci s -> Francis
    3) undocumented annotations ([\... ])
----------------------------------------------------------------------
id:cd1/b00/f0056o06:sid14212
Oui bonjour j’aimerais savoir [\hésitation euh] le numéro de téléphone de Raphaël Glassey qui vit [\hésitation euh] à le Châble c’est en Valais
    undocumented annotations ([\... ])
----------------------------------------------------------------------
id:cd1/b00/f0060o06:sid14187
Oui est-ce-que vous pouvez m’indiquer le numéro de téléphone de monsieur Simonin Gérard à Collex
    pouvez -> pourrez
----------------------------------------------------------------------
id:cd1/b00/f0094o06:sid13244
Auriez-vous la gentillesse de me communiquer le numéro de téléphone de monsieur Crettaz-[\prononciation bizarre Void e] Marie-Laurentin les Charbonnières
    1) there is nothing odd about the pronunciation (Voïdé)
    2) "s’il vous plait" is missing at the end
----------------------------------------------------------------------
id:cd1/b01/f0104o06:sid13300
    -> Oui (?) at the beginning
----------------------------------------------------------------------
id:cd1/b01/f0176o06:sid13496
Oui bonjour est-ce-que je pourrais avoir le numéro de monsieur Fawer Patrice place de la Cite à Yens s’il vous plaît
    Cite -> cité
----------------------------------------------------------------------
id:cd1/b01/f0199o06:sid13982
Oui bonjour c’est depuis Yens que je téléphone j’aimerais avoir le numéro de téléphone de madame Delaloye Bernadette Bernadette Oeuches Domont vingt trois Alle
    Bernadette is said only once!
----------------------------------------------------------------------
id:cd1/b02/f0241o06:sid14549
Oui bonjour madame je cherche le numéro de téléphone de Voirol Eric je vous épelle v o i r o l prénom Eric e r i c i l habite rue du Château [\prononciation bizarre Môtiers] Neuchâtel merci
    1) [\prononciation bizarre Môtiers] -> moitié
    2) Why is there no accent on Eric?
----------------------------------------------------------------------


D Raw Statistics about French PolyPhone corpus

The corpus contains 4293 requests, of which 5 are actually empty (either no text or [\inintelligible]) and 766 contain undocumented annotations (e.g. [\prononciation bizarre Montilier]). The corpus contains 1’075 occurrences of 6 distinct punctuation signs, with the following distribution:

count  sign
  894  ,
   80  .
   52  :
   30  !
   11  ?
    8  ;

Once undocumented annotations and punctuation signs are removed, the corpus contains 70’665 words from a vocabulary of 5’877 different words. This leads to an average length of 16.5 words per request. Here are the 50 most frequent words:

count  word                   count  word        count  word
 4650  de                       716  merci         304  que
 4065  le                       697  donner        294  vingt
 3134  numéro_de_téléphone      581  numéro        283  route
 2971  à                        551  rue           276  r
 2754  bonjour                  451  habite        259  est-ce
 1968  j’                       450  qui           254  poste
 1859  aimerais                 400  euh           248  connaître
 1693  monsieur                 376  a             247  des
 1558  madame                   369  e             231  m’
 1316  s’il_vous_plaît          367  un            215  pourrais
  999  je                       349  les           213  est
  997  vous                     342  du            212  au
  868  mademoiselle             339  pourriez-     211  indiquer
  836  me                       339  chemin        203  trois
  834  la                       322  savoir        201  o
  776  avoir                    321  l’            193  huit
  775  oui                      319  deux
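Statistics of the kind reported in this appendix (number of words, vocabulary size, average request length, words occurring only once) could be computed along the lines of the following Python sketch. It is not the script used for the report: the function name is hypothetical, requests are lower-cased and split on white space, and every bracketed annotation of the form [\... ] is dropped as a whole, so the tokenisation differs from the one behind the figures above (which, for instance, treats numéro_de_téléphone as a single word).

```python
import re
from collections import Counter

ANNOTATION = re.compile(r"\[\\[^\]]*\]")          # e.g. [\hésitation euh]
PUNCTUATION = str.maketrans({c: " " for c in ",.:!?;"})


def corpus_statistics(requests):
    """Return word count, vocabulary size, average length and hapax count."""
    counts = Counter()
    total = 0
    for request in requests:
        cleaned = ANNOTATION.sub(" ", request).translate(PUNCTUATION)
        tokens = cleaned.lower().split()
        counts.update(tokens)
        total += len(tokens)
    hapax = sum(1 for c in counts.values() if c == 1)
    return {
        "words": total,
        "vocabulary": len(counts),
        "average_length": total / len(requests),
        "hapax": hapax,
        "most_frequent": counts.most_common(3),
    }


sample = [
    "Bonjour, j' aimerais le numéro [\\hésitation euh] merci.",
    "bonjour merci",
]
stats = corpus_statistics(sample)
print(stats)
```

On the two-request sample the sketch finds 8 words, a vocabulary of 6 words and 4 words occurring once; the annotation and the punctuation signs do not contribute to the counts.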

Here is the distribution of vocabulary frequencies (in log scale):

[Figure: log-log plot; x-axis: word frequency (1 to 10000), y-axis: number of occurrences (1 to 10000).]


This vocabulary contains 2584 hapax legomena (i.e. 44%) and 64% of the vocabulary occurs less than 2 times. We also studied the N-grams of words at the beginning of requests (i.e. the introduction of the request). Here are, for N = 1 to N = 9, the 10 most frequent N-grams with their number of occurrences:

N = 1
 1966  bonjour
  730  oui
  250  j’
  120  euh
   85  je
   81  alors
   80  veuillez
   68  mademoiselle
   67  pourriez-
   66  bonsoir

N = 2
  648  oui bonjour
  594  bonjour j’
  453  bonjour mademoiselle
  343  bonjour madame
  242  j’ aimerais
  138  bonjour je
   69  bonjour est-ce
   67  pourriez- vous
   66  bonjour pourriez-
   52  pouvez- vous

N = 3
  576  bonjour j’ aimerais
  247  oui bonjour j’
  165  bonjour mademoiselle j’
  141  bonjour madame j’
  131  j’ aimerais le
   99  oui bonjour madame
   96  oui bonjour mademoiselle
   68  bonjour est-ce que
   66  bonjour pourriez- vous
   59  bonjour je voudrais

N = 4
  295  bonjour j’ aimerais le
  234  oui bonjour j’ aimerais
  160  bonjour mademoiselle j’ aimerais
  139  bonjour madame j’ aimerais
  100  j’ aimerais le numéro_de_téléphone
   93  bonjour j’ aimerais avoir
   76  bonjour j’ aimerais savoir
   55  bonjour mademoiselle pourriez- vous
   52  oui bonjour madame j’
   50  bonjour est-ce que je

N = 5
  222  bonjour j’ aimerais le numéro_de_téléphone
  116  oui bonjour j’ aimerais le
   94  j’ aimerais le numéro_de_téléphone de
   87  bonjour mademoiselle j’ aimerais le
   87  bonjour j’ aimerais avoir le
   71  bonjour j’ aimerais savoir le
   69  bonjour madame j’ aimerais le
   67  bonjour j’ aimerais le numéro
   48  bonjour est-ce que je pourrais
   47  oui bonjour madame j’ aimerais

N = 6
  211  bonjour j’ aimerais le numéro_de_téléphone de
   94  oui bonjour j’ aimerais le numéro_de_téléphone
   80  bonjour j’ aimerais avoir le numéro_de_téléphone
   74  bonjour mademoiselle j’ aimerais le numéro_de_téléphone
   63  bonjour j’ aimerais le numéro de
   60  bonjour j’ aimerais savoir le numéro_de_téléphone
   56  bonjour madame j’ aimerais le numéro_de_téléphone
   43  bonjour est-ce que je pourrais avoir
   39  j’ aimerais le numéro_de_téléphone de monsieur
   39  bonjour j’ aimerais connaître le numéro_de_téléphone

N = 7
   91  oui bonjour j’ aimerais le numéro_de_téléphone de
   88  bonjour j’ aimerais le numéro_de_téléphone de monsieur
   76  bonjour j’ aimerais avoir le numéro_de_téléphone de
   71  bonjour mademoiselle j’ aimerais le numéro_de_téléphone de
   58  bonjour j’ aimerais savoir le numéro_de_téléphone de
   54  bonjour madame j’ aimerais le numéro_de_téléphone de
   50  bonjour j’ aimerais le numéro_de_téléphone de madame
   42  bonjour est-ce que je pourrais avoir le
   38  bonjour j’ aimerais connaître le numéro_de_téléphone de
   31  pourriez- vous me donner le numéro_de_téléphone de

N = 8
   42  oui bonjour j’ aimerais le numéro_de_téléphone de monsieur
   38  bonjour est-ce que je pourrais avoir le numéro_de_téléphone
   35  bonjour mademoiselle j’ aimerais le numéro_de_téléphone de monsieur
   30  bonjour pourriez- vous me donner le numéro_de_téléphone de
   30  bonjour j’ aimerais avoir le numéro_de_téléphone de monsieur
   26  oui bonjour j’ aimerais savoir le numéro_de_téléphone de
   26  oui bonjour j’ aimerais avoir le numéro_de_téléphone de
   26  bonjour mademoiselle pourriez- vous me donner le numéro_de_téléphone
   26  bonjour j’ aimerais savoir le numéro_de_téléphone de monsieur
   25  oui bonjour j’ aimerais le numéro_de_téléphone de madame

N = 9
   37  bonjour est-ce que je pourrais avoir le numéro_de_téléphone de
   24  bonjour mademoiselle pourriez- vous me donner le numéro_de_téléphone de
   18  bonjour pourriez- vous me donner le numéro_de_téléphone de monsieur
   17  oui bonjour est-ce que je pourrais avoir le numéro_de_téléphone
   14  oui bonjour mademoiselle j’ aimerais le numéro_de_téléphone de monsieur
   11  oui bonjour j’ aimerais savoir le numéro_de_téléphone de monsieur
   11  oui bonjour j’ aimerais avoir le numéro_de_téléphone de madame
   11  bonjour mademoiselle pouvez- vous me donner le numéro_de_téléphone de
   11  bonjour mademoiselle est-ce que je pourrais avoir le numéro_de_téléphone
   10  oui bonjour pourriez- vous me donner le numéro_de_téléphone de
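Counts of request-initial N-grams such as those listed above can be obtained with a very small Python sketch. The function name is hypothetical, requests are assumed to be already tokenised, and requests shorter than N words are assumed to be ignored for that N.

```python
from collections import Counter


def leading_ngrams(requests, n, top=10):
    """Most frequent N-grams found at the very beginning of the requests.

    requests -- list of token lists
    n        -- length of the N-gram
    top      -- number of N-grams to return
    """
    counts = Counter(tuple(tokens[:n]) for tokens in requests if len(tokens) >= n)
    return counts.most_common(top)


toy_requests = [
    ["bonjour", "j’", "aimerais", "le", "numéro"],
    ["bonjour", "j’", "voudrais", "le", "numéro"],
    ["oui", "bonjour"],
]
for n in (1, 2, 3):
    print(n, leading_ngrams(toy_requests, n))
```

Only the first N tokens of each request are counted, so the result describes how requests are introduced rather than the N-grams of the corpus as a whole.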


E Lattice Quality

Words of sentence 1 not found in the lattice: mademoiselle
Words of sentence 2 not found in the lattice: bonjour, mademoiselle, madame, bongard, veuillez
Words of sentence 3 not found in the lattice: aimé, oyvaert, steve, pailly
Words of sentence 4 not found in the lattice: vous, plaît, zbinden, merci
Words of sentence 5 not found in the lattice: souhaiterai, numéro, genet, conthey
Words of sentence 6 not found in the lattice: vedo, brignon, baar
Words of sentence 7 not found in the lattice: bonjour, aimerais, numéro, celui, zahnd, coop
Words of sentence 8 not found in the lattice: aimerais, madame, claire, lise, ow, randin, w, trois, clarens
Words of sentence 9 not found in the lattice: gisiger, agnès, du, motiers, neuchâtel
Words of sentence 10 not found in the lattice: bonjour, donner, laurentin
Words of sentence 11 not found in the lattice: oui, bonjour, pouvez, carrière
Words of sentence 12 not found in the lattice: oui, bonjour, aimerais, numéro, vit
Words of sentence 13 not found in the lattice: oui, vous, pouvez, indiquer, monsieur, simonin, collex
Words of sentence 14 not found in the lattice: pierre, yens, merci, revoir
Words of sentence 15 not found in the lattice: madame, lopez, auriez, numéro, jean, cinq, glovelier
Words of sentence 16 not found in the lattice: demeurant, neuf
Words of sentence 17 not found in the lattice: laurence, moyen, monthey, aimerais, numéro, ça, deux
Words of sentence 18 not found in the lattice: aimerais, un, numéro, monsieur, sauser, jean, claude, rue, vingt
Words of sentence 19 not found in the lattice: auriez, gentillesse, crettaz
Words of sentence 20 not found in the lattice: excusez, déranger, veuillez, téléphone, monsieur, noirjean, jean, develier
Words of sentence 21 not found in the lattice: oui, bonjour, mademoiselle, marthe, otcha, pourriez, numéro, richard, revoir
Words of sentence 22 not found in the lattice: oui, aimerais, numéro
Words of sentence 23 not found in the lattice:
Words of sentence 24 not found in the lattice: mademoiselle, vous, sebastopol, pantet, imier
Words of sentence 25 not found in the lattice: madame, guyot, qui, habite, neuves
Words of sentence 26 not found in the lattice: oui, monsieur, fawer, yens
Words of sentence 27 not found in the lattice: joris, repos
Words of sentence 28 not found in the lattice: depuis, yens, delaloye, oeuches, alle
Words of sentence 29 not found in the lattice: blaser
Words of sentence 30 not found in the lattice: oui, madame, cherche, rue, neuchâtel
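A per-sentence list of this kind can be obtained by comparing the words of the reference transcription with the word labels present in the lattice. The following is only a minimal sketch under stated assumptions: the report does not describe the script that was actually used, the lattice is reduced here to a plain collection of word labels, and the function name `missing_words` and the tokenization rule (lowercasing, splitting on anything that is not a letter or digit) are hypothetical choices that merely mimic the shape of the list above.

```python
import re


def missing_words(sentence, lattice_words):
    """Return the distinct words of `sentence` that do not occur in
    `lattice_words`, in order of first occurrence (hypothetical helper)."""
    # Assumption: the lattice is summarized by the set of its word labels.
    vocabulary = {word.lower() for word in lattice_words}
    missing = []
    # Assumption: hyphens and apostrophes split words, so "Jean-Claude"
    # yields "jean" and "claude", as in the list above.
    for token in re.findall(r"\w+", sentence.lower()):
        if token not in vocabulary and token not in missing:
            missing.append(token)
    return missing


if __name__ == "__main__":
    lattice = ["bonjour", "le", "numéro", "de", "monsieur", "sauser"]
    print(missing_words("Bonjour le numéro de monsieur Sauser Jean-Claude", lattice))
```

Under these assumptions, an empty result (as for sentence 23) would mean that every word of the transcription appears somewhere in the lattice; it says nothing about whether the words appear along a single path.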

F Semantic Annotations of the 30 bootstrap sentences

id: cd1/b00/f0002o06:sid13609 adr1: GROSJEAN MARTINE adr2: adr3: VAULION text[165]: Bonjour madame, ici madame Plant, pouvez-vous, s’il vous plaît, me donner le numéro de téléphone de mademoiselle Martine Grosjean g r o s j e a n qui habite Vaulion [User]: madame Plant Identification Person Name [Title]: mademoiselle Family name: Grosjean [First name]: Martine Address Locality City: Vaulion

id: cd1/b00/f0014o06:sid14011 adr1: CRITTIN DANIEL adr2: adr3: LE CERNEUX-VEUSIL text[145]: Bonjour mademoiselle c’est madame Bongard veuillez me donner le numéro de téléphone de monsieur Crittin Daniel le Cerneux-Veusil s’il vous plaît [User]: madame Bongard Identification Name Person [Title]: monsieur Family name: Crittin [First name]: Daniel Address Locality City: le Cerneux-Veusil

id: cd1/b00/f0024o06:sid14067 adr1: OYVAERT STEVE adr2: LE GRAND CLOS adr3: PAILLY text[201]: Bonjour j’aurais aimé connaître le numéro de téléphone de monsieur Oyvaert Steve je vous l’épelle o y v a e r t Steve habitant le Grand Clos je ne connais pas le numéro postal Pailly p a i l l y merci Identification Name Person [Title]: monsieur Family name: Oyvaert [First name]: Steve Address [Street name]: le Grand Clos Locality City: Pailly

id: cd1/b00/f0027o06:sid14186 adr1: AUBERT-ZBINDEN HUGUETTE
adr2: LE BANNE 201 B adr3: FONTENAIS text[142]: Bonjour veuillez s’il vous plaît me donner le numéro de téléphone de madame Aubert-Zbinden Huguette la Banne deux cent un b à Fontenais merci Identification Person Name [Title]: madame Family name: Aubert-Zbinden [First name]: Huguette Address [Street n.]: 201 [Building]: b [Street name]: la Banne Locality City: Fontenais

id: cd1/b00/f0029o06:sid14294 adr1: GENET DANIEL adr2: RTE DE VETROZ adr3: CONTHEY text[95]: Bonjour je souhaiterai le numéro téléphonique de monsieur Genet Daniel route de Vétroz Conthey Identification Name Person [Title]: monsieur Family name: Genet [First name]: Daniel Address [Street name]: route de Vétroz Locality City: Conthey

id: cd1/b00/f0032o06:sid14128 adr1: VEDO-MOSER BRIGITTE adr2: BRIGNON adr3: BAAR (NENDAZ) text[77]: J’aimerais le numéro de téléphone de Vedo-Moser Brigitte Brignon Baar-Nendaz Identification Name Person Family name: Vedo-Moser [First name]: Brigitte Address [Street name]: Brignon Locality City: Nendaz Region: Baar

id: cd1/b00/f0034o06:sid14131 adr1: ZAHND-CHARMILLOT SYLVIA adr2: BATIMENT DE LA COOP adr3: SAVIESE text[178]: Bonjour j’aimerais un numéro de téléphone s’il vous plaît celui de madame Sylvia Zahnd z a h n d Charmillot c h a r m i deux l o t à Savièse s a v i e s e au bâtiment de la Coop
Identification Name Person [Title]: madame Family name: Zahnd-Charmillot [First name]: Sylvia Address [Additional info]: bâtiment de la Coop Locality City: Savièse

id: cd1/b00/f0043o06:sid14302 adr1: VON OW-RANDIN CLAIRE-LISE adr2: GRAMMONT 3 adr3: CLARENS text[160]: Oui bonjour j’aimerais le numéro de téléphone de madame Claire-Lise Von Ow-Randin qui s’écrit v o n o w tiret r a n d i n et qui habite trois Grammont à Clarens Identification Name Person [Title]: madame Family: Von Ow-Randin [First name]: Claire-Lise Address [Street n.]: 3 [Street name]: Grammont Locality City: Clarens

id: cd1/b00/f0047o06:sid14065 adr1: MAITRE-GISIGER AGNES adr2: CLOS DU TERREAU adr3: MOTIERS NE text[105]: Je désirerais connaître le numéro de téléphone de Maitre-Gisiger Agnès Clos du Terreau Motiers Neuchâtel Identification Name Person Family: Maitre-Gisiger [First name]: Agnès Address [Street name]: Clos du Terreau Locality City: Motiers Region: Neuchâtel

id: cd1/b00/f0050o06:sid14209 adr1: CRETTAZ-VOIDE MARIE-LAURENTIN adr2: POSTE VILLAGE adr3: LEYSIN 1 text[151]: Bonjour est-ce que vous pourriez me donner le numéro de téléphone de Crettaz-Voide Marie-Laurentin elle habite à poste Village à Leysin s’il vous plaît Identification Name Person Family name: Crettaz-Voide
[First name]: Marie-Laurentin Address [Village]: poste Village Locality City: Leysin

id: cd1/b00/f0055o06:sid13400 adr1: HOFER FRANCIS adr2: CH. DE LA CARRIERE adr3: ST-CERGUE text[169]: Oui bonjour pouvez-vous me donner [\hésitation le] les coordonnées de monsieur [\prononciation bizarre Hofer] Francis chemin de la Carrière Saint-Cergue merci au revoir. Identification Name Person [Title]: monsieur Family name: Hofer [First name]: Francis Address [Street name]: chemin de la Carrière Locality City: Saint-Cergue

id: cd1/b00/f0056o06:sid14212 adr1: GLASSEY RAPHAEL adr2: adr3: LE CHABLE VS text[143]: Oui bonjour j’aimerais savoir [\hésitation euh] le numéro de téléphone de Raphaël Glassey qui vit [\hésitation euh] à le Châble c’est en Valais Identification Name Person Family name: Glassey [First name]: Raphaël Address Locality City: le Châble Region: Valais

id: cd1/b00/f0060o06:sid14187 adr1: SIMONIN GERARD adr2: adr3: COLLEX text[97]: Oui est-ce-que vous pouvez m’indiquer le numéro de téléphone de monsieur Simonin Gérard à Collex Identification Name Person [Title]: monsieur Family name: Simonin [First name]: Gérard Address Locality City: Collex

id: cd1/b00/f0062o06:sid13842
adr1: MICHEL PIERRE adr2: adr3: VILLARS-SOUS-YENS text[148]: Bonjour madame j’aimerais le numéro de téléphone de Michel Pierre le nom de famille est Michel il habite Villars-sous-Yens merci beaucoup au revoir Identification Name Person Family name: Michel [First name]: Pierre Address Locality City: Villars-sous-Yens

id: cd1/b00/f0065o06:sid14099 adr1: SORDET JEAN-PIERRE adr2: MONTATES 5 adr3: GLOVELIER text[172]: Bonjour c’est madame Lopez est-ce-que vous auriez la gentillesse de me donner le numéro de téléphone de monsieur Sordet Jean-Pierre Montates cinq Glovelier s’il vous plaît [User]: madame Lopez Identification Name Person [Title]: monsieur Family name: Sordet [First name]: Jean-Pierre Address [Street n.]: 5 [Street name]: Montates Locality City: Glovelier

id: cd1/b00/f0071o06:sid14259 adr1: HOHERMUTH JACQUELINE adr2: R.TAVERNIER 19 adr3: AUBONNE text[148]: Bonjour pourriez-vous s’il vous plaît me donner le numéro de téléphone de madame Hohermuth Jacqueline demeurant dix neuf rue de Tavernier à Aubonne Identification Name Person [Title]: madame Family name: Hohermuth [First name]: Jacqueline Address [Street n.]: 19 [Street name]: rue de Tavernier Locality City: Aubonne

id: cd1/b00/f0075o06:sid14268 adr1: BESSE-ZAMBAZ MURIELLE adr2: FRUENCE 335 adr3: CHATEL-ST-DENIS
text[230]: Oui bonjour Laurence Moyen de Monthey qui téléphone j’aimerais le numéro de téléphone de madame Besse-Zambaz ça s’écrit b e deux s e tiret z a m b a z Murielle elle habite à Fruence trois cent trente cinq Châtel-Saint-Denis France [User]: Laurence Moyen de Monthey Identification Name Person [Title]: madame Family name: Besse-Zambaz [First name]: Murielle Address [Street n.]: 335 [Street name]: Fruence Locality City: Châtel-Saint-Denis

id: cd1/b00/f0077o06:sid14235 adr1: SAUSER JEAN-CLAUDE adr2: RUE DES COLOMBES 22 adr3: COLLOMBEY text[110]: Bonjour j’aimerais un numéro de téléphone à Collombey monsieur Sauser Jean-Claude rue des Colombes vingt deux Identification Name Person [Title]: monsieur Family name: Sauser [First name]: Jean-Claude Address [Street n.]: 22 [Street name]: rue des Colombes Locality City: Collombey

id: cd1/b00/f0094o06:sid13244 adr1: CRETTAZ-VOIDE MARIE-LAURENTIN adr2: adr3: LES CHARBONNIERES text[153]: Auriez-vous la gentillesse de me communiquer le numéro de téléphone de monsieur Crettaz-[\prononciation bizarre Voide] Marie-Laurentin les Charbonnières Identification Name Person [Title]: monsieur Family name: Crettaz-Voide [First name]: Marie-Laurentin Address Locality City: les Charbonnières

id: cd1/b01/f0104o06:sid13300 adr1: NOIRJEAN ALAIN JEAN-LOUIS adr2: PL. DE LA POSTE 1 adr3: DEVELIER text[188]: Excusez-moi de vous déranger mademoiselle, veuillez s’il vous plaît, me donner le numéro de téléphone de
monsieur Noirjean Alain Jean-Louis qui habite à la place de la poste un à Develier Identification Name Person [Title]: monsieur Family: Noirjean [First name]: Alain [Second name]: Jean-Louis Address [Street n.]: 1 [Street name]: place de la poste Locality City: Develier

id: cd1/b01/f0107o06:sid13111 adr1: MELET RICHARD adr2: CH. DE TREMBLEY 38 adr3: PRANGINS text[191]: Oui bonjour madame, c’est mademoiselle Marthe Otcha, pourriez-vous me donner le numéro de téléphone de monsieur Melet Richard chemin de Trembley trente huit à Prangins merci madame au revoir [User]: mademoiselle Marthe Otcha Identification Name Person [Title]: monsieur Family name: Melet [First name]: Richard Address [Street n.]: 38 [Street name]: chemin de Trembley Locality City: Prangins

id: cd1/b01/f0138o06:sid13424 adr1: GABIOUD-FURLETTI PATRICIA adr2: adr3: COUSSET text[100]: Oui bonjour j’aimerais connaître le numéro de téléphone de madame Gabioud-Furletti Patricia Cousset Identification Name Person [Title]: madame Family name: Gabioud-Furletti [First name]: Patricia Address Locality City: Cousset

id: cd1/b01/f0146o06:sid13279 adr1: GLASSEY PIERRE-LOUIS adr2: RUE DU MONT 7 adr3: SION text[125]: Bonjour j’aurais voulu avoir le numéro de téléphone de monsieur Glassey Pierre-Louis qui habite à la rue du Mont sept à Sion Identification
Name Person [Title]: monsieur Family name: Glassey [First name]: Pierre-Louis Address [Street n.]: 7 [Street name]: rue du Mont Locality City: Sion

id: cd1/b01/f0149o06:sid13365 adr1: GROSJEAN MARTINE adr2: SEBASTOPOL 18 PANTET adr3: ST-IMIER text[155]: Bonjour mademoiselle puis-je connaître s’il vous plaît le numéro de téléphone de madame Martine Grosjean habitant à Sebastopol dix huit Pantet Saint-Imier Identification Name Person [Title]: madame Family: Grosjean [First name]: Martine Address [Street n.]: 18 [Street name]: Sebastopol [Additional info]: Pantet Locality City: Saint-Imier

id: cd1/b01/f0170o06:sid13443 adr1: GUYOT-BLANC MONIQUE adr2: LES NEUVES 1 adr3: LAMBOING text[100]: J’aimerais le numéro de téléphone de madame Monique Guyot-Blanc qui habite les Neuves un à Lamboing Identification Name Person [Title]: madame Family name: Guyot-Blanc [First name]: Monique Address [Street n.]: 1 [Street name]: les Neuves Locality City: Lamboing

id: cd1/b01/f0176o06:sid13496 adr1: FAWER PATRICE adr2: PL. DE LA CITE adr3: YENS text[117]: Oui bonjour est-ce-que je pourrais avoir le numéro de monsieur Fawer Patrice place de la Cite à Yens s’il vous plaît Identification Name
Person [Title]: monsieur Family name: Fawer [First name]: Patrice Address [Street name]: place de la Cite Locality City: Yens

id: cd1/b01/f0186o06:sid13095 adr1: JORIS-SAUDAN MAURICETTE adr2: CHANDOSSEL adr3: VILLAREPOS text[134]: Bonjour pourriez-vous s’il vous plaît me donner le numéro de téléphone de madame Joris-Saudan Mauricette Chandossel Villa repos merci Identification Name Person [Title]: madame Family name: Joris-Saudan [First name]: Mauricette Address [Street name]: Chandossel Locality City: Villa repos

id: cd1/b01/f0199o06:sid13982 adr1: DELALOYE BERNADETTE adr2: OEUCHES DOMONT 23 adr3: ALLE text[160]: Oui bonjour c’est depuis Yens que je téléphone j’aimerais avoir le numéro de téléphone de madame Delaloye Bernadette Bernadette Oeuches Domont vingt trois Alle [User]: Yens Identification Name Person [Title]: madame Family name: Delaloye [First name]: Bernadette Address [Street n.]: 23 [Street name]: Oeuches Domont Locality City: Alle

id: cd1/b02/f0206o06:sid13990 adr1: BLASER ANNE-LISE adr2: LA SAPELLE adr3: VILLARS-TIERCELIN text[140]: Bonjour est-ce que vous pourriez me donner le numéro de téléphone de madame Blaser Anne-Lise qui habite à la Sapelle Villars-Tiercelin merci Identification Name Person [Title]: madame
Family name: Blaser [First name]: Anne-Lise Address [Street name]: la Sapelle Locality City: Villars-Tiercelin

id: cd1/b02/f0241o06:sid14549 adr1: VOIROL ERIC adr2: RUE DU CHATEAU adr3: MOTIERS NE text[189]: Oui bonjour madame je cherche le numéro de téléphone de Voirol Eric je vous épelle v o i r o l prénom Eric e r i c il habite rue du Château [\prononciation bizarre Môtiers] Neuchâtel merci Identification Name Person Family name: Voirol [First name]: Eric Address [Street name]: rue du Château Môtiers Locality City: Neuchâtel

G Semantic analysis on the 30 bootstrap sentences query1========================== caller_title([madame]) caller_name([Plant]) title([mademoiselle]) name([Martine,Grosjean]) street_name([Vaulion]) street_number([]) locality([]) weight(0.3) query1========================== caller_title([madame]) caller_name([Plant]) title([mademoiselle]) name([Martine,Grosjean]) street_name([]) street_number([]) locality([Vaulion]) weight(0.5) query1========================== caller_title([madame]) caller_name([Plant]) title([mademoiselle]) name([Martine,Grosjean]) street_name([Vaulion]) street_number([]) locality([]) weight(0.3) query1========================== caller_title([madame]) caller_name([Plant]) title([mademoiselle]) name([Martine,Grosjean]) street_name([]) street_number([]) locality([Vaulion]) weight(0.5) query2========================== caller_title([madame]) caller_name([Bongard]) title([monsieur]) name([Crittin,Daniel]) street_name([le,Cerneux_Veusil]) street_number([]) locality([]) weight(0.5) query2========================== caller_title([madame]) caller_name([Bongard]) title([monsieur]) name([Crittin,Daniel]) street_name([]) street_number([]) locality([Cerneux_Veusil]) weight(0.5) query2========================== caller_title([madame]) caller_name([Bongard]) title([monsieur]) name([Crittin,Daniel]) street_name([le,Cerneux_Veusil]) street_number([]) locality([]) weight(0.5) query2========================== caller_title([madame])

query18========================= caller_title([]) caller_name([]) title([]) name([Sauser,Jean_Claude]) street_name([rue,des,Colombes]) street_number([vingt_deux]) locality([]) weight(0.3) query18========================= caller_title([]) caller_name([]) title([]) name([Sauser,Jean_Claude]) street_name([]) street_number([vingt_deux]) locality([]) weight(0.3) query18========================= caller_title([]) caller_name([]) title([]) name([Sauser,Jean_Claude]) street_name([]) street_number([vingt_deux]) locality([]) weight(0.3) query18========================= caller_title([]) caller_name([]) title([]) name([Collombey]) street_name([rue,des,Colombes]) street_number([vingt_deux]) locality([]) weight(0.3) query18========================= caller_title([]) caller_name([]) title([]) name([Collombey]) street_name([]) street_number([vingt_deux]) locality([]) weight(0.3) query18========================= caller_title([]) caller_name([]) title([]) name([Collombey]) street_name([]) street_number([vingt_deux]) locality([]) weight(0.3) query18========================= caller_title([]) caller_name([]) title([]) name([Sauser]) street_name([rue,des,Colombes]) street_number([vingt_deux]) locality([]) weight(0.3) query18========================= caller_title([]) caller_name([])

45

ISIS Project: Final Report

caller_name([Bongard]) title([monsieur]) name([Crittin,Daniel]) street_name([]) street_number([]) locality([Cerneux_Veusil]) weight(0.5) query3========================== caller_title([]) caller_name([]) title([monsieur]) name([Oyvaert,Steve]) street_name([le,Grand^Clos]) street_number([]) locality([Pailly]) weight(0.5) query3========================== caller_title([]) caller_name([]) title([monsieur]) name([Oyvaert,Steve]) street_name([]) street_number([]) locality([Grand^Clos]) weight(0.5) query3========================== caller_title([]) caller_name([]) title([monsieur]) name([Oyvaert,Steve]) street_name([]) street_number([]) locality([Pailly]) weight(0.5) query3========================== caller_title([]) caller_name([]) title([monsieur]) name([Oyvaert,Steve]) street_name([le,Grand^Clos]) street_number([]) locality([Pailly]) weight(0.5) query3========================== caller_title([]) caller_name([]) title([monsieur]) name([Oyvaert,Steve]) street_name([]) street_number([]) locality([Grand^Clos]) weight(0.5) query3========================== caller_title([]) caller_name([]) title([monsieur]) name([Oyvaert,Steve]) street_name([]) street_number([]) locality([Pailly]) weight(0.5) query4========================== caller_title([]) caller_name([]) title([madame]) name([Aubert_Zbinden,Huguette]) street_name([la,Banne])

September 15, 1999

title([]) name([Sauser]) street_name([]) street_number([vingt_deux]) locality([]) weight(0.3) query18========================= caller_title([]) caller_name([]) title([]) name([Sauser]) street_name([]) street_number([vingt_deux]) locality([]) weight(0.3) query18========================= caller_title([]) caller_name([]) title([]) name([Jean_Claude]) street_name([rue,des,Colombes]) street_number([vingt_deux]) locality([]) weight(0.3) query18========================= caller_title([]) caller_name([]) title([]) name([Jean_Claude]) street_name([]) street_number([vingt_deux]) locality([]) weight(0.3) query18========================= caller_title([]) caller_name([]) title([]) name([Jean_Claude]) street_name([]) street_number([vingt_deux]) locality([]) weight(0.3) query18========================= caller_title([]) caller_name([]) title([]) name([Colombes]) street_name([]) street_number([vingt_deux]) locality([]) weight(0.3) query18========================= caller_title([]) caller_name([]) title([]) name([Colombes]) street_name([]) street_number([vingt_deux]) locality([]) weight(0.3) query19========================= caller_title([]) caller_name([]) title([monsieur]) name([Crettaz_Voide,Marie_Laurentin]) street_name([les,Charbonnières]) street_number([]) locality([])

46

ISIS Project: Final Report

street_number([deux,cent,un]) locality([Fontenais]) weight(0.5) query4========================== caller_title([]) caller_name([]) title([madame]) name([Aubert_Zbinden,Huguette]) street_name([]) street_number([deux,cent,un]) locality([Fontenais]) weight(0.7) query4========================== caller_title([]) caller_name([]) title([madame]) name([Aubert_Zbinden,Huguette]) street_name([Fontenais]) street_number([deux,cent,un]) locality([]) weight(0.3) query4========================== caller_title([]) caller_name([]) title([madame]) name([Aubert_Zbinden,Huguette]) street_name([]) street_number([deux,cent,un]) locality([Fontenais]) weight(0.7) query5========================== caller_title([]) caller_name([]) title([monsieur]) name([Genet,Daniel]) street_name([route,de,Vétroz, Conthey]) street_number([]) locality([]) weight(1) query5========================== caller_title([]) caller_name([]) title([monsieur]) name([Genet,Daniel]) street_name([]) street_number([]) locality([Vétroz,Conthey]) weight(0.5) query5========================== caller_title([]) caller_name([]) title([monsieur]) name([Genet,Daniel]) street_name([]) street_number([]) locality([Vétroz]) weight(0.5) query5========================== caller_title([]) caller_name([]) title([monsieur]) name([Genet,Daniel]) street_name([]) street_number([]) locality([Conthey]) weight(0.5) query5==========================

September 15, 1999

weight(0.5) query19========================= caller_title([]) caller_name([]) title([monsieur]) name([Crettaz_Voide,Marie_Laurentin]) street_name([]) street_number([]) locality([Charbonnières]) weight(0.5) query19========================= caller_title([]) caller_name([]) title([monsieur]) name([Crettaz_Voide,Marie_Laurentin]) street_name([les,Charbonnières]) street_number([]) locality([]) weight(0.5) query19========================= caller_title([]) caller_name([]) title([monsieur]) name([Crettaz_Voide,Marie_Laurentin]) street_name([]) street_number([]) locality([Charbonnières]) weight(0.5) query20========================= caller_title([]) caller_name([]) title([monsieur]) name([Noirjean,Alain,Jean_Louis]) street_name([place,de,la,poste]) street_number([un]) locality([Develier]) weight(0.7) query20========================= caller_title([]) caller_name([]) title([monsieur]) name([Noirjean,Alain,Jean_Louis]) street_name([]) street_number([un]) locality([Develier]) weight(0.7) query20========================= caller_title([]) caller_name([]) title([monsieur]) name([Noirjean,Alain,Jean_Louis]) street_name([Develier]) street_number([un]) locality([]) weight(0.3) query20========================= caller_title([]) caller_name([]) title([monsieur]) name([Noirjean,Alain,Jean_Louis]) street_name([]) street_number([un]) locality([Develier]) weight(0.7) query21========================= caller_title([mademoiselle]) caller_name([Marthe,Otcha])

47

ISIS Project: Final Report

caller_title([]) caller_name([]) title([monsieur]) name([Genet,Daniel]) street_name([route,de,Vétroz, Conthey]) street_number([]) locality([]) weight(1) query5========================== caller_title([]) caller_name([]) title([monsieur]) name([Genet,Daniel]) street_name([]) street_number([]) locality([Vétroz,Conthey]) weight(0.5) query5========================== caller_title([]) caller_name([]) title([monsieur]) name([Genet,Daniel]) street_name([]) street_number([]) locality([Vétroz]) weight(0.5) query5========================== caller_title([]) caller_name([]) title([monsieur]) name([Genet,Daniel]) street_name([]) street_number([]) locality([Conthey]) weight(0.5) query6========================== caller_title([]) caller_name([]) title([]) name([Vedo_Moser,Brigitte, Brignon, Baar_Nendaz]) street_name([]) street_number([]) locality([]) weight(1) query6========================== caller_title([]) caller_name([]) title([]) name([Vedo_Moser,Brigitte, Brignon,Baar_Nendaz]) street_name([]) street_number([]) locality([]) weight(1) query7========================== caller_title([]) caller_name([]) title([madame]) name([Sylvia,Zahnd]) street_name([la,Coop]) street_number([]) locality([]) weight(0.5) query7========================== caller_title([])

September 15, 1999

title([monsieur]) name([Melet,Richard]) street_name([chemin,de,Trembley]) street_number([trente_huit]) locality([Prangins]) weight(0.7) query21========================= caller_title([mademoiselle]) caller_name([Marthe,Otcha]) title([monsieur]) name([Melet,Richard]) street_name([]) street_number([trente_huit]) locality([Prangins]) weight(0.7) query21========================= caller_title([mademoiselle]) caller_name([Marthe,Otcha]) title([monsieur]) name([Melet,Richard]) street_name([Prangins]) street_number([trente_huit]) locality([]) weight(0.3) query21========================= caller_title([mademoiselle]) caller_name([Marthe,Otcha]) title([monsieur]) name([Melet,Richard]) street_name([]) street_number([trente_huit]) locality([Prangins]) weight(0.7) query22========================= caller_title([]) caller_name([]) title([madame]) name([Gabioud_Furletti,Patricia]) street_name([Cousset]) street_number([]) locality([]) weight(0.3) query22========================= caller_title([]) caller_name([]) title([madame]) name([Gabioud_Furletti,Patricia]) street_name([]) street_number([]) locality([Cousset]) weight(0.5) query22========================= caller_title([]) caller_name([]) title([madame]) name([Gabioud_Furletti,Patricia]) street_name([Cousset]) street_number([]) locality([]) weight(0.3) query22========================= caller_title([]) caller_name([]) title([madame]) name([Gabioud_Furletti,Patricia]) street_name([]) street_number([]) locality([Cousset])

48

ISIS Project: Final Report

caller_name([]) title([madame]) name([Sylvia,Zahnd]) street_name([]) street_number([]) locality([Savièse]) weight(0.7) query7========================== caller_title([]) caller_name([]) title([madame]) name([Sylvia,Zahnd]) street_name([la,Coop]) street_number([]) locality([]) weight(0.5) query7========================== caller_title([]) caller_name([]) title([madame]) name([Sylvia,Zahnd]) street_name([]) street_number([]) locality([Savièse]) weight(0.7) query8========================== caller_title([]) caller_name([]) title([madame]) name([Claire_Lise,Von^Ow_Randin]) street_name([Grammont]) street_number([]) locality([Clarens]) weight(0.3) query8========================== caller_title([]) caller_name([]) title([madame]) name([Claire_Lise,Von^Ow_Randin]) street_name([Clarens]) street_number([]) locality([]) weight(0.3) query8========================== caller_title([]) caller_name([]) title([madame]) name([Claire_Lise,Von^Ow_Randin]) street_name([]) street_number([trois]) locality([Clarens]) weight(0.7) query8========================== caller_title([]) caller_name([]) title([madame]) name([Claire_Lise,Von^Ow_Randin]) street_name([Grammont]) street_number([trois]) locality([Clarens]) weight(0.3) query8========================== caller_title([]) caller_name([]) title([madame]) name([Claire_Lise,Von^Ow_Randin]) street_name([Clarens]) street_number([trois])

September 15, 1999

weight(0.5) query23========================= caller_title([]) caller_name([]) title([monsieur]) name([Glassey,Pierre_Louis]) street_name([rue,du,Mont]) street_number([sept]) locality([Sion]) weight(0.7) query23========================= caller_title([]) caller_name([]) title([monsieur]) name([Glassey,Pierre_Louis]) street_name([]) street_number([sept]) locality([Sion]) weight(0.7) query23========================= caller_title([]) caller_name([]) title([monsieur]) name([Glassey,Pierre_Louis]) street_name([Sion]) street_number([sept]) locality([]) weight(0.3) query23========================= caller_title([]) caller_name([]) title([monsieur]) name([Glassey,Pierre_Louis]) street_name([]) street_number([sept]) locality([Sion]) weight(0.7) query24========================= caller_title([]) caller_name([]) title([madame]) name([Martine,Grosjean]) street_name([Pantet,Saint_Imier]) street_number([]) locality([]) weight(0.3) query24========================= caller_title([]) caller_name([]) title([madame]) name([Martine,Grosjean]) street_name([Sebastopol]) street_number([dix_huit]) locality([Pantet,Saint_Imier]) weight(0.3) query24========================= caller_title([]) caller_name([]) title([madame]) name([Martine,Grosjean]) street_name([Sebastopol]) street_number([dix_huit]) locality([Pantet]) weight(0.3) query24========================= caller_title([]) caller_name([])

49

ISIS Project: Final Report

locality([]) weight(0.3) query8========================== caller_title([]) caller_name([]) title([madame]) name([Claire_Lise,Von^Ow_Randin]) street_name([]) street_number([trois]) locality([Clarens]) weight(0.7) query9========================== caller_title([]) caller_name([]) title([]) name([Maitre_Gisiger,Agnès]) street_name([Clos,du,Terreau, Motiers,Neuchâtel]) street_number([]) locality([]) weight(1) query9========================== caller_title([]) caller_name([]) title([]) name([Maitre_Gisiger,Agnès]) street_name([]) street_number([]) locality([Terreau,Motiers, Neuchâtel]) weight(0.5) query9========================== caller_title([]) caller_name([]) title([]) name([Maitre_Gisiger,Agnès]) street_name([]) street_number([]) locality([Terreau,Motiers]) weight(0.5) query9========================== caller_title([]) caller_name([]) title([]) name([Maitre_Gisiger,Agnès]) street_name([]) street_number([]) locality([Motiers,Neuchâtel]) weight(0.5) query9========================== caller_title([]) caller_name([]) title([]) name([Maitre_Gisiger,Agnès]) street_name([]) street_number([]) locality([Terreau]) weight(0.5) query9========================== caller_title([]) caller_name([]) title([]) name([Maitre_Gisiger,Agnès]) street_name([]) street_number([]) locality([Motiers]) weight(0.5) query9==========================

September 15, 1999

title([madame]) name([Martine,Grosjean]) street_name([Sebastopol]) street_number([dix_huit]) locality([Saint_Imier]) weight(0.3) query24========================= caller_title([]) caller_name([]) title([madame]) name([Martine,Grosjean]) street_name([Pantet]) street_number([]) locality([Saint_Imier]) weight(0.3) query24========================= caller_title([]) caller_name([]) title([madame]) name([Martine,Grosjean]) street_name([Saint_Imier]) street_number([]) locality([]) weight(0.3) query24========================= caller_title([]) caller_name([]) title([madame]) name([Martine,Grosjean]) street_name([]) street_number([dix_huit]) locality([Pantet,Saint_Imier]) weight(0.5) query24========================= caller_title([]) caller_name([]) title([madame]) name([Martine,Grosjean]) street_name([]) street_number([dix_huit]) locality([Pantet]) weight(0.5) query24========================= caller_title([]) caller_name([]) title([madame]) name([Martine,Grosjean]) street_name([]) street_number([dix_huit]) locality([Saint_Imier]) weight(0.5) query24========================= caller_title([]) caller_name([]) title([madame]) name([Martine,Grosjean]) street_name([Pantet,Saint_Imier]) street_number([dix_huit]) locality([]) weight(0.3) query24========================= caller_title([]) caller_name([]) title([madame]) name([Martine,Grosjean]) street_name([Pantet]) street_number([dix_huit]) locality([Saint_Imier]) weight(0.3)

50

ISIS Project: Final Report

caller_title([]) caller_name([]) title([]) name([Maitre_Gisiger,Agnès]) street_name([]) street_number([]) locality([Neuchâtel]) weight(0.5) query9========================== caller_title([]) caller_name([]) title([]) name([Maitre_Gisiger,Agnès]) street_name([Clos,du,Terreau, Motiers,Neuchâtel]) street_number([]) locality([]) weight(1) query9========================== caller_title([]) caller_name([]) title([]) name([Maitre_Gisiger,Agnès]) street_name([]) street_number([]) locality([Terreau,Motiers, Neuchâtel]) weight(0.5) query9========================== caller_title([]) caller_name([]) title([]) name([Maitre_Gisiger,Agnès]) street_name([]) street_number([]) locality([Terreau,Motiers]) weight(0.5) query9========================== caller_title([]) caller_name([]) title([]) name([Maitre_Gisiger,Agnès]) street_name([]) street_number([]) locality([Motiers,Neuchâtel]) weight(0.5) query9========================== caller_title([]) caller_name([]) title([]) name([Maitre_Gisiger,Agnès]) street_name([]) street_number([]) locality([Terreau]) weight(0.5) query9========================== caller_title([]) caller_name([]) title([]) name([Maitre_Gisiger,Agnès]) street_name([]) street_number([]) locality([Motiers]) weight(0.5) query9========================== caller_title([]) caller_name([]) title([]) name([Maitre_Gisiger,Agnès])

September 15, 1999

query24========================= caller_title([]) caller_name([]) title([madame]) name([Martine,Grosjean]) street_name([Saint_Imier]) street_number([dix_huit]) locality([]) weight(0.3) query24========================= caller_title([]) caller_name([]) title([madame]) name([Martine,Grosjean]) street_name([]) street_number([dix_huit]) locality([Pantet,Saint_Imier]) weight(0.5) query24========================= caller_title([]) caller_name([]) title([madame]) name([Martine,Grosjean]) street_name([]) street_number([dix_huit]) locality([Pantet]) weight(0.5) query24========================= caller_title([]) caller_name([]) title([madame]) name([Martine,Grosjean]) street_name([]) street_number([dix_huit]) locality([Saint_Imier]) weight(0.5) query25========================= caller_title([]) caller_name([]) title([madame]) name([Monique,Guyot_Blanc]) street_name([les,Neuves]) street_number([un]) locality([Lamboing]) weight(0.5) query25========================= caller_title([]) caller_name([]) title([madame]) name([Monique,Guyot_Blanc]) street_name([]) street_number([un]) locality([Lamboing]) weight(0.7) query25========================= caller_title([]) caller_name([]) title([madame]) name([Monique,Guyot_Blanc]) street_name([Lamboing]) street_number([un]) locality([]) weight(0.3) query25========================= caller_title([]) caller_name([]) title([madame]) name([Monique,Guyot_Blanc])


street_name([]) street_number([]) locality([Neuchâtel]) weight(0.5)

street_name([]) street_number([un]) locality([Lamboing]) weight(0.7)

query10========================= caller_title([]) caller_name([]) title([]) name([Crettaz_Voide, Marie_Laurentin]) street_name([poste^Village]) street_number([]) locality([Leysin]) weight(0.3) query10========================= caller_title([]) caller_name([]) title([]) name([Crettaz_Voide, Marie_Laurentin]) street_name([Leysin]) street_number([]) locality([]) weight(0.3) query10========================= caller_title([]) caller_name([]) title([]) name([Crettaz_Voide, Marie_Laurentin]) street_name([]) street_number([]) locality([poste^Village]) weight(0.7) query10========================= caller_title([]) caller_name([]) title([]) name([Crettaz_Voide, Marie_Laurentin]) street_name([poste^Village]) street_number([]) locality([Leysin]) weight(0.3) query10========================= caller_title([]) caller_name([]) title([]) name([Crettaz_Voide, Marie_Laurentin]) street_name([Leysin]) street_number([]) locality([]) weight(0.3) query10========================= caller_title([]) caller_name([]) title([]) name([Crettaz_Voide, Marie_Laurentin]) street_name([]) street_number([]) locality([poste^Village]) weight(0.7)

query26========================= caller_title([]) caller_name([]) title([monsieur]) name([Fawer,Patrice]) street_name([place,de,la,Cité]) street_number([]) locality([Yens]) weight(0.7) query26========================= caller_title([]) caller_name([]) title([monsieur]) name([Fawer,Patrice]) street_name([]) street_number([]) locality([Yens]) weight(0.7) query26========================= caller_title([]) caller_name([]) title([monsieur]) name([Fawer,Patrice]) street_name([place,de,la,Cité]) street_number([]) locality([Yens]) weight(0.7) query26========================= caller_title([]) caller_name([]) title([monsieur]) name([Fawer,Patrice]) street_name([]) street_number([]) locality([Yens]) weight(0.7)

query11========================= caller_title([]) caller_name([])

query27========================= caller_title([]) caller_name([]) title([madame]) name([Joris_Saudan,Mauricette]) street_name([Chandossel,Villa^repos]) street_number([]) locality([]) weight(0.3) query27========================= caller_title([]) caller_name([]) title([madame]) name([Joris_Saudan,Mauricette]) street_name([Chandossel]) street_number([]) locality([Villa^repos]) weight(0.3) query27========================= caller_title([]) caller_name([]) title([madame]) name([Joris_Saudan,Mauricette]) street_name([Villa^repos]) street_number([]) locality([]) weight(0.3)


title([monsieur]) name([Hofer,Francis]) street_name([chemin,de,la, Carrière,Saint_Cergue]) street_number([]) locality([]) weight(1) query11========================= caller_title([]) caller_name([]) title([monsieur]) name([Hofer,Francis]) street_name([]) street_number([]) locality([Carrière,Saint_Cergue]) weight(0.5) query11========================= caller_title([]) caller_name([]) title([monsieur]) name([Hofer,Francis]) street_name([]) street_number([]) locality([Carrière]) weight(0.5) query11========================= caller_title([]) caller_name([]) title([monsieur]) name([Hofer,Francis]) street_name([]) street_number([]) locality([Saint_Cergue]) weight(0.5) query11========================= caller_title([]) caller_name([]) title([monsieur]) name([Hofer,Francis]) street_name([chemin,de,la, Carrière,Saint_Cergue]) street_number([]) locality([]) weight(1) query11========================= caller_title([]) caller_name([]) title([monsieur]) name([Hofer,Francis]) street_name([]) street_number([]) locality([Carrière,Saint_Cergue]) weight(0.5) query11========================= caller_title([]) caller_name([]) title([monsieur]) name([Hofer,Francis]) street_name([]) street_number([]) locality([Carrière]) weight(0.5) query11========================= caller_title([]) caller_name([]) title([monsieur]) name([Hofer,Francis]) street_name([]) street_number([])


query27========================= caller_title([]) caller_name([]) title([madame]) name([Joris_Saudan,Mauricette]) street_name([]) street_number([]) locality([Chandossel,Villa^repos]) weight(0.5) query27========================= caller_title([]) caller_name([]) title([madame]) name([Joris_Saudan,Mauricette]) street_name([]) street_number([]) locality([Chandossel]) weight(0.5) query27========================= caller_title([]) caller_name([]) title([madame]) name([Joris_Saudan,Mauricette]) street_name([]) street_number([]) locality([Villa^repos]) weight(0.5) query27========================= caller_title([]) caller_name([]) title([madame]) name([Joris_Saudan,Mauricette]) street_name([Chandossel,Villa^repos]) street_number([]) locality([]) weight(0.3) query27========================= caller_title([]) caller_name([]) title([madame]) name([Joris_Saudan,Mauricette]) street_name([Chandossel]) street_number([]) locality([Villa^repos]) weight(0.3) query27========================= caller_title([]) caller_name([]) title([madame]) name([Joris_Saudan,Mauricette]) street_name([Villa^repos]) street_number([]) locality([]) weight(0.3) query27========================= caller_title([]) caller_name([]) title([madame]) name([Joris_Saudan,Mauricette]) street_name([]) street_number([]) locality([Chandossel,Villa^repos]) weight(0.5) query27========================= caller_title([]) caller_name([]) title([madame]) name([Joris_Saudan,Mauricette]) street_name([])


locality([Saint_Cergue]) weight(0.5) query12========================= caller_title([]) caller_name([]) title([]) name([Raphaël,Glassey]) street_name([le,Châble]) street_number([]) locality([Valais]) weight(0.5) query12========================= caller_title([]) caller_name([]) title([]) name([Raphaël,Glassey]) street_name([]) street_number([]) locality([Valais]) weight(1) query12========================= caller_title([]) caller_name([]) title([]) name([Raphaël,Glassey]) street_name([le,Châble]) street_number([]) locality([Valais]) weight(0.5) query12========================= caller_title([]) caller_name([]) title([]) name([Raphaël,Glassey]) street_name([]) street_number([]) locality([Valais]) weight(1) query13========================= caller_title([]) caller_name([]) title([monsieur]) name([Simonin,Gérard]) street_name([Collex]) street_number([]) locality([]) weight(0.3) query13========================= caller_title([]) caller_name([]) title([monsieur]) name([Simonin,Gérard]) street_name([]) street_number([]) locality([Collex]) weight(0.7) query13========================= caller_title([]) caller_name([]) title([monsieur]) name([Simonin,Gérard]) street_name([Collex]) street_number([]) locality([]) weight(0.3) query13========================= caller_title([])


street_number([]) locality([Chandossel]) weight(0.5) query27========================= caller_title([]) caller_name([]) title([madame]) name([Joris_Saudan,Mauricette]) street_name([]) street_number([]) locality([Villa^repos]) weight(0.5) query28========================= caller_title([]) caller_name([]) title([madame]) name([Delaloye,Bernadette]) street_name([Oeuches^Domont]) street_number([vingt_trois]) locality([Alle]) weight(0.3) query28========================= caller_title([]) caller_name([]) title([madame]) name([Delaloye,Bernadette]) street_name([Alle]) street_number([]) locality([]) weight(0.3) query28========================= caller_title([]) caller_name([]) title([madame]) name([Delaloye,Bernadette]) street_name([]) street_number([vingt_trois]) locality([Alle]) weight(0.5) query28========================= caller_title([]) caller_name([]) title([madame]) name([Delaloye,Bernadette]) street_name([Alle]) street_number([vingt_trois]) locality([]) weight(0.3) query28========================= caller_title([]) caller_name([]) title([madame]) name([Delaloye,Bernadette]) street_name([]) street_number([vingt_trois]) locality([Alle]) weight(0.5) query29========================= caller_title([]) caller_name([]) title([madame]) name([Blaser,Anne_Lise]) street_name([la,Sapelle, Villars_Tiercelin]) street_number([]) locality([]) weight(0.5)


caller_name([]) title([monsieur]) name([Simonin,Gérard]) street_name([]) street_number([]) locality([Collex]) weight(0.7) query14========================= query15========================= caller_title([madame]) caller_name([Lopez]) title([monsieur]) name([Sordet,Jean_Pierre]) street_name([Montates]) street_number([cinq]) locality([Glovelier]) weight(0.3) query15========================= caller_title([madame]) caller_name([Lopez]) title([monsieur]) name([Sordet,Jean_Pierre]) street_name([Glovelier]) street_number([]) locality([]) weight(0.3) query15========================= caller_title([madame]) caller_name([Lopez]) title([monsieur]) name([Sordet,Jean_Pierre]) street_name([]) street_number([cinq]) locality([Glovelier]) weight(0.5) query15========================= caller_title([madame]) caller_name([Lopez]) title([monsieur]) name([Sordet,Jean_Pierre]) street_name([Glovelier]) street_number([cinq]) locality([]) weight(0.3) query15========================= caller_title([madame]) caller_name([Lopez]) title([monsieur]) name([Sordet,Jean_Pierre]) street_name([]) street_number([cinq]) locality([Glovelier]) weight(0.5) query16========================= caller_title([]) caller_name([]) title([madame]) name([Hohermuth,Jacqueline]) street_name([rue,de,Tavernier]) street_number([]) locality([Aubonne]) weight(0.7) query16========================= caller_title([]) caller_name([]) title([madame])


query29========================= caller_title([]) caller_name([]) title([madame]) name([Blaser,Anne_Lise]) street_name([]) street_number([]) locality([Sapelle,Villars_Tiercelin]) weight(0.5) query29========================= caller_title([]) caller_name([]) title([madame]) name([Blaser,Anne_Lise]) street_name([]) street_number([]) locality([Sapelle]) weight(0.5) query29========================= caller_title([]) caller_name([]) title([madame]) name([Blaser,Anne_Lise]) street_name([]) street_number([]) locality([Villars_Tiercelin]) weight(0.5) query29========================= caller_title([]) caller_name([]) title([madame]) name([Blaser,Anne_Lise]) street_name([la,Sapelle, Villars_Tiercelin]) street_number([]) locality([]) weight(0.5) query29========================= caller_title([]) caller_name([]) title([madame]) name([Blaser,Anne_Lise]) street_name([]) street_number([]) locality([Sapelle,Villars_Tiercelin]) weight(0.5) query29========================= caller_title([]) caller_name([]) title([madame]) name([Blaser,Anne_Lise]) street_name([]) street_number([]) locality([Sapelle]) weight(0.5) query29========================= caller_title([]) caller_name([]) title([madame]) name([Blaser,Anne_Lise]) street_name([]) street_number([]) locality([Villars_Tiercelin]) weight(0.5) query30========================= caller_title([]) caller_name([]) title([])


name([Hohermuth,Jacqueline]) street_name([]) street_number([dix_neuf]) locality([Aubonne]) weight(0.7) query16========================= caller_title([]) caller_name([]) title([madame]) name([Hohermuth,Jacqueline]) street_name([rue,de,Tavernier]) street_number([dix_neuf]) locality([Aubonne]) weight(0.7) query16========================= caller_title([]) caller_name([]) title([madame]) name([Hohermuth,Jacqueline]) street_name([]) street_number([dix_neuf]) locality([Aubonne]) weight(0.7) query17========================= caller_title([]) caller_name([]) title([]) name([]) street_name([Châtel_Saint_Denis, France]) street_number([]) locality([]) weight(0.3) query17========================= caller_title([]) caller_name([]) title([]) name([]) street_name([Fruence]) street_number([trois,cent, trente_cinq]) locality([Châtel_Saint_Denis,France]) weight(0.3) query17========================= caller_title([]) caller_name([]) title([]) name([]) street_name([Fruence]) street_number([trois,cent, trente_cinq]) locality([Châtel_Saint_Denis]) weight(0.3) query17========================= caller_title([]) caller_name([]) title([]) name([]) street_name([Fruence]) street_number([trois,cent, trente_cinq]) locality([France]) weight(0.3) query17========================= caller_title([]) caller_name([]) title([]) name([])


name([Voirol,Eric]) street_name([rue,du,Château,Môtiers, Neuchâtel]) street_number([]) locality([]) weight(1) query30========================= caller_title([]) caller_name([]) title([]) name([Voirol,Eric]) street_name([]) street_number([]) locality([Château,Môtiers,Neuchâtel]) weight(0.5) query30========================= caller_title([]) caller_name([]) title([]) name([Voirol,Eric]) street_name([]) street_number([]) locality([Château,Môtiers]) weight(0.5) query30========================= caller_title([]) caller_name([]) title([]) name([Voirol,Eric]) street_name([]) street_number([]) locality([Môtiers,Neuchâtel]) weight(0.5) query30========================= caller_title([]) caller_name([]) title([]) name([Voirol,Eric]) street_name([]) street_number([]) locality([Château]) weight(0.5) query30========================= caller_title([]) caller_name([]) title([]) name([Voirol,Eric]) street_name([]) street_number([]) locality([Môtiers]) weight(0.5) query30========================= caller_title([]) caller_name([]) title([]) name([Voirol,Eric]) street_name([]) street_number([]) locality([Neuchâtel]) weight(0.5) query30========================= caller_title([]) caller_name([]) title([]) name([Voirol,Eric]) street_name([rue,du,Château,Môtiers, Neuchâtel]) street_number([]) locality([])


street_name([Châtel_Saint_Denis]) street_number([]) locality([France]) weight(0.3) query17========================= caller_title([]) caller_name([]) title([]) name([]) street_name([France]) street_number([]) locality([]) weight(0.3) query17========================= caller_title([]) caller_name([]) title([]) name([]) street_name([]) street_number([trois,cent, trente_cinq]) locality([Châtel_Saint_Denis,France]) weight(0.5) query17========================= caller_title([]) caller_name([]) title([]) name([]) street_name([]) street_number([trois,cent, trente_cinq]) locality([Châtel_Saint_Denis]) weight(0.5) query17========================= caller_title([]) caller_name([]) title([]) name([]) street_name([]) street_number([trois,cent, trente_cinq]) locality([France]) weight(0.5) query17========================= caller_title([]) caller_name([]) title([]) name([]) street_name([Châtel_Saint_Denis, France]) street_number([trois,cent, trente_cinq]) locality([]) weight(0.3) query17========================= caller_title([]) caller_name([]) title([]) name([]) street_name([Châtel_Saint_Denis]) street_number([trois,cent, trente_cinq]) locality([France]) weight(0.3) query17========================= caller_title([]) caller_name([]) title([]) name([])


weight(1) query30========================= caller_title([]) caller_name([]) title([]) name([Voirol,Eric]) street_name([]) street_number([]) locality([Château,Môtiers,Neuchâtel]) weight(0.5) query30========================= caller_title([]) caller_name([]) title([]) name([Voirol,Eric]) street_name([]) street_number([]) locality([Château,Môtiers]) weight(0.5) query30========================= caller_title([]) caller_name([]) title([]) name([Voirol,Eric]) street_name([]) street_number([]) locality([Môtiers,Neuchâtel]) weight(0.5) query30========================= caller_title([]) caller_name([]) title([]) name([Voirol,Eric]) street_name([]) street_number([]) locality([Château]) weight(0.5) query30========================= caller_title([]) caller_name([]) title([]) name([Voirol,Eric]) street_name([]) street_number([]) locality([Môtiers]) weight(0.5) query30========================= caller_title([]) caller_name([]) title([]) name([Voirol,Eric]) street_name([]) street_number([]) locality([Neuchâtel]) weight(0.5)


street_name([France]) street_number([trois,cent, trente_cinq]) locality([]) weight(0.3) query17========================= caller_title([]) caller_name([]) title([]) name([]) street_name([]) street_number([trois,cent, trente_cinq]) locality([Châtel_Saint_Denis,France]) weight(0.5) query17========================= caller_title([]) caller_name([]) title([]) name([]) street_name([]) street_number([trois,cent, trente_cinq]) locality([Châtel_Saint_Denis]) weight(0.5) query17========================= caller_title([]) caller_name([]) title([]) name([]) street_name([]) street_number([trois,cent, trente_cinq]) locality([France]) weight(0.5)


H Left-corner Head-driven Island Parser

LHIP [7, 16] is a system which performs robust analysis of its input, using a grammar defined in an extended form of the Definite Clause Grammar (DCG) formalism used for implementing parsers in Prolog. The chief modifications to the standard Prolog ‘grammar rule’ format are of two types: one or more right-hand side (RHS) items may be marked as ‘heads’, and one or more RHS items may be marked as ‘ignorable’. LHIP employs a different control strategy from that used by Prolog DCGs, in order to allow it to cope with ungrammatical or unforeseen input.

The behavior of LHIP can best be understood in terms of the complementary notions of span and cover. A grammar rule is said to produce an island which spans input terminals $t_i$ to $t_{i+n}$ if the island starts at the $i$th terminal, and the $(i+n)$th terminal is the terminal immediately to the right of the last terminal of the island. A rule is said to cover $m$ items if $m$ terminals are consumed in the span of the rule. Thus $m \le n$. If $m = n$ then the rule has completely covered the span. As implied here, rules need not cover all of the input in order to succeed. More specifically, the constraints applied in creating islands are such that islands do not have to be adjacent, but may be separated by non-covered input.

There are two notions of non-coverage of the input: unsanctioned and sanctioned non-coverage. The former arises when the grammar simply does not account for some terminal. Sanctioned non-coverage means that special rules, called “ignore” rules, have been applied so that, by ignoring parts of the input, the islands become adjacent. Those parts of the input that have been ignored are considered to have been consumed. Ignore rules can be invoked individually or as a class. It is this latter capability which distinguishes ignore rules from regular rules: the two are otherwise functionally equivalent, and ignore rules mainly serve as a notational aid for the grammar writer.
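The notions of span and cover defined above can be made concrete with a small sketch. It is not part of LHIP (which is implemented in Prolog); the `Island` class and `make_island` helper are hypothetical names introduced here only to illustrate the definitions, assuming terminals are indexed from 0 and the span is the half-open range from index i to index i+n.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Island:
    """Illustrative model of an island (hypothetical, not LHIP's own data structure)."""
    start: int            # index i of the first terminal of the island
    end: int              # index i + n, immediately right of the last terminal
    consumed: frozenset   # indices of the terminals consumed inside the span

    @property
    def span(self) -> int:
        # n: number of terminals spanned by the island
        return self.end - self.start

    @property
    def cover(self) -> int:
        # m: number of terminals consumed within the span
        return len(self.consumed)

    def is_completely_covered(self) -> bool:
        # m = n means the rule has completely covered its span
        return self.cover == self.span


def make_island(start: int, end: int, consumed) -> Island:
    """Build an island, rejecting consumed terminals that lie outside the span."""
    consumed = frozenset(consumed)
    if end <= start:
        raise ValueError("an island must span at least one terminal")
    if any(k < start or k >= end for k in consumed):
        raise ValueError("consumed terminals must lie inside the span")
    return Island(start, end, consumed)


# An island spanning terminals 0..3 that consumes three of the four terminals.
island = make_island(0, 4, {0, 1, 3})
print(island.span, island.cover, island.is_completely_covered())
```

Because consumed terminals are required to lie inside the span, the relation m ≤ n holds by construction for every island built this way.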
Strict adjacency between RHS clauses can be specified in the grammar. It is possible to define global and local thresholds for the proportion of the spanned input that must be covered by rules; in this way, the user of an LHIP grammar can exercise quite fine control over the required accuracy and completeness of the analysis. A chart is kept of successes and failures of rules, both to improve efficiency and to provide a means of identifying unattached constituents. In addition, feedback is given to the grammar writer on the degree to which the grammar is able to cope with the given input; in a context of grammar development, this may serve as notification of areas to which the coverage of the grammar might next be extended.

Extensions of Prolog DCG grammars in LHIP permit:

1. nominating certain RHS clauses as heads;
2. marking some RHS clauses as being optional;
3. invocation of ignore rules;
4. imposing adjacency constraints between two RHS clauses;
5. setting a local threshold level in a rule for the fraction of spanned input that must be covered.

A threshold defines the minimum fraction of terminals covered by the rule in relation to the terminals spanned by the rule in order for the rule to succeed. For instance, if a rule with threshold $T$ spans terminals $t_i$ to $t_{i+n}$, covering $j$ terminals in that span, then the rule can only succeed if $j/n \ge T$.

The following is an example of a LHIP rule. At first sight this rule appears left-recursive. However, the sub-rule “conjunction(Conj)” is marked as a head and is therefore evaluated before either “s(Sl)” or “s(Sr)”. Provided that the conjunction-rule does not end up invoking the s-rule (directly or indirectly), the s-rule is not left-recursive.


    s(conjunct(Conj,Sl,Sr)) ~~> s(Sl), * conjunction(Conj), s(Sr).

LHIP provides a number of ways of applying a grammar to input. The simplest allows one to enumerate the possible analyses of the input with the grammar. The order in which the results are produced reflects the lexical ordering of the rules as they are converted by LHIP. With the threshold level set to 0, all analyses possible with the grammar by deletion of input terminals can be generated. Thus, supposing a suitable grammar, for the sentence John saw Mary and Mark saw them there would be analyses corresponding to the sentence itself, as well as John saw Mary, John saw Mark, John saw them, Mary saw them, Mary and Mark saw them, etc.

By setting the threshold to 1, only those partial analyses that have no unaccounted-for terminals within their spans can succeed. Hence, Mark saw them would receive a valid analysis, as would Mary and Mark saw them, provided that the grammar contains a rule for conjoined NPs; John saw them, on the other hand, would not. As this example illustrates, a partial analysis of this kind may not in fact correspond to a true sub-parse of the input (since Mary and Mark was not a conjoined subject in the original). Some care must therefore be taken in interpreting results.

The following rule illustrates two features: negation and optional forms. The rule will only succeed if (with respect to the area of input in which it might occur) there is a noun with no determiner. In addition, there can be optional adjectives before the noun.

    np(propernoun(N,Mods)) ~~> ~ determiner(_), (? adjectives(Mods) ?), * noun(N).

The following rule illustrates the use of disjunction and embedded Prolog code. It should be noted that within the scope of a disjunction or negation, a head is local to the disjunct or negation.

    noun(X) ~~> ( * @pussy, (? @cat ?); * @cat), {X=cat}.
The following rule illustrates a typical use of adjacency: specifying compound nouns. Adjacency is not restricted to such a use, however, but may generally be used anywhere.

    noun(missionary_camp) ~~> @missionary : @camp.

A number of tools are provided for producing analyses of input by the grammar under certain constraints. For example, there are tools to find the set of analyses that provide maximal coverage over the input, to find the subset of the maximal coverage set that have minimum spans, and to find analyses that have maximal thresholds. In addition, other tools can be used to search the chart for constituents that have been found but are not attached to any complete analysis. The conversion of the grammar into Prolog code means that the user of the system can easily develop analysis tools that apply different constraints, using the given tools as building blocks.
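The effect of the threshold on the example sentence John saw Mary and Mark saw them can be checked with a short sketch. This is only an illustration of the condition j/n ≥ T applied to a single top-level analysis; it does not reproduce LHIP's rule-by-rule evaluation, and the functions `coverage`, `passes` and `words` are hypothetical helpers. The sets of consumed terminal indices are chosen by hand here, not produced by a parser.

```python
from fractions import Fraction

SENTENCE = "John saw Mary and Mark saw them".split()


def coverage(consumed):
    """Return j/n: consumed terminals over terminals spanned (first to last consumed)."""
    first, last = min(consumed), max(consumed)
    n = last - first + 1          # terminals spanned
    j = len(set(consumed))        # terminals covered
    return Fraction(j, n)


def passes(consumed, threshold):
    """An analysis succeeds only if its coverage reaches the threshold T."""
    return coverage(consumed) >= threshold


def words(consumed):
    """Render the consumed terminals as a word string, in input order."""
    return " ".join(SENTENCE[k] for k in sorted(consumed))


candidates = [
    {4, 5, 6},            # Mark saw them (contiguous)
    {2, 3, 4, 5, 6},      # Mary and Mark saw them (contiguous)
    {0, 1, 6},            # John saw them (skips four terminals)
]

for consumed in candidates:
    print(words(consumed), coverage(consumed),
          passes(consumed, 0), passes(consumed, 1))
```

With a threshold of 0 all three candidates pass, while with a threshold of 1 only the two contiguous ones do, which matches the behavior described in the text for Mark saw them, Mary and Mark saw them and John saw them.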


References

[1] readme.txt on Polyphone 1.0a CD-ROM for ISIS project.
[2] Dario Albesano, Paolo Baggia, Morena Danieli, Roberto Gemello, Elisabetta Gerbino, and Claudio Rullent. Dialogos: a robust system for human-machine spoken dialogue on the telephone. In Proc. of ICASSP, Munich, Germany, 1997.
[3] J.F. Allen, B. Miller, E. Ringger, and T. Sikorski. A robust system for natural spoken dialogue. In Proc. 34th Meeting of the Assoc. for Computational Linguistics. Association of Computational Linguistics, June 1996.
[4] J. M. Andersen, G. Caloz, and H. Bourlard. SwissCom "Advanced Vocal Interfaces Services" project – Technical report for 1997. Technical Report COM-97-06, IDIAP, December 1997.
[5] R. Aragues, J.-C. Chappelier, and M. Rajman. Integration of syntactic constraints within a speech recognition system: Coupling a speech recognizer and a stochastic context-free parser. Technical Report DI-98/309, EPFL, Lausanne (Switzerland), February 1999.
[6] Peter R.J. Asveld. Towards robustness in parsing - fuzzifying context-free language recognition. In J. Dassow, G. Rozenberg, and A. Salomaa, editors, Developments in Language Theory II - At the Crossroad of Mathematics, Computer Science and Biology, pages 443–453. World Scientific, Singapore, 1996.
[7] A. Ballim and G. Russell. LHIP: Extended DCGs for Configurable Robust Parsing. In Proceedings of the 15th International Conference on Computational Linguistics, pages 501–507, Kyoto, Japan, 1994. ACL.
[8] Manuela Boros, Gerhard Hanrieder, and Ulla Ackermann. Linguistic processing for spoken dialogue systems - experiences made in the syslid project -. In Proceedings of the third CRIM-FORWISS Workshop, Montreal, Canada, 1996.
[9] A. Brogi and F. Turini. Meta-logic for program composition: Semantic issues. In K.R. Apt and F. Turini, editors, Meta-Logics and Logic Programming. The MIT Press, 1995.
[10] Gilles Caloz. Private discussion.
[11] J.-C. Chappelier and M. Rajman. A generalized CYK algorithm for parsing stochastic CFG. In TAPD’98 Workshop, pages 133–137, Paris (France), 1998.
[12] G. Chollet, J.-L. Cochard, A. Constantinescu, C. Jaboulet, and Ph. Langlais. Swiss French PolyPhone and PolyVar: Telephone speech databases to model inter- and intra-speaker variability. Technical Report RR-96-01, IDIAP, April 1996.
[13] S. Jekat, A. Klein, E. Maier, I. Maleck, M. Mast, and J.J. Quantz. Dialogue acts in Verbmobil. Verbmobil Report 65, DFKI, 1995.
[14] R. Kompe, A. Kießling, T. Kuhn, M. Mast, H. Niemann, E. Nöth, K. Ott, and A. Batliner. Prosody takes over: A prosodically guided dialog system. Verbmobil Report 47, DFKI, 1994.
[15] E.T. Lee and L.A. Zadeh. Note on fuzzy languages. Information Sciences, 1:421–434, 1969.


[16] C. Lieske and A. Ballim. Rethinking natural language processing with Prolog. In Proceedings of Practical Applications of Prolog and Practical Applications of Constraint Technology (PAPPACTS98), London, UK, 1998. Practical Application Company.
[17] S. Renals. Noway’s manual page. University of Cambridge, 1994. http://www.clsp.jhu.edu/ws96/ris/man/noway.doc.
[18] Stefan Riezler. Quantitative constraint logic programming for weighted grammar applications. In Logical Aspects of Computational Linguistics (LACL’96), LNCS. Springer, 1996.
[19] Speech Training and Recognition Unified Tool. http://tcts.fpms.ac.be/speech/strut.html, 1996.
