Design of a Natural Command Language Dialogue System


José F. Quesada, Doroteo Torre, J. Gabriel Amores

Distribution: Public

Specification, Interaction and Reconfiguration in Dialogue Understanding Systems: IST-1999-10516

Deliverable D3.2
December, 2000

Specification, Interaction and Reconfiguration in Dialogue Understanding Systems: IST-1999-10516

Göteborg University, Department of Linguistics
SRI Cambridge, Natural Language Processing Group
Telefónica Investigación y Desarrollo SA Unipersonal, Speech Technology Division
Universität des Saarlandes, Department of Computational Linguistics
Universidad de Sevilla, Julietta Research Group in Natural Language Processing

For copies of reports, updates on project activities and other SIRIDUS-related information, contact:

The SIRIDUS Project Administrator
SRI International
23 Millers Yard, Mill Lane
Cambridge, United Kingdom CB2 1RQ
[email protected]

See also our internet homepage http://www.cam.sri.com/siridus

© 2000, The Individual Authors

No part of this document may be reproduced or transmitted in any form, or by any means, electronic or mechanical, including photocopy, recording, or any information storage and retrieval system, without permission from the copyright owner.


Contents

1 Introduction 8

2 Hardware Architecture 10
  2.1 Decomposition 10
    2.1.1 The PABX 11
    2.1.2 The PCU (PABX Control Unit) 13
    2.1.3 The VRU (Voice Response Unit) 13
  2.2 Interconnectivity 14
    2.2.1 PCU-PABX 14
    2.2.2 VRU-PCU 14
    2.2.3 VRU-PABX 16

3 Software Architecture 17
  3.1 Goals 17
  3.2 Multi-Agent Distributed Architecture 18
  3.3 Description of the Agents in the Demonstrator 18
    3.3.1 Agent Manager (AM) 20
    3.3.2 Console Input/Output Agents 20
    3.3.3 Speech Input / Output Agents 22
    3.3.4 PABX Control Agents 23
    3.3.5 Database Control Agent 24
    3.3.6 Dialogue Management Agents 27

4 Dialogue Management Agent Design 28
  4.1 The Dialogue Management Agent in the Multi Agent-based Architecture 28
  4.2 Main Components of the Dialogue Management Agent 29
  4.3 The Speech Prospector: Incorporating dialogue knowledge during speech recognition 33
    4.3.1 Description 33
    4.3.2 Interface 33
  4.4 The Input Analyser (Natural Language Understanding): Interpreting the Input into a Set of Dialogue Moves 34
    4.4.1 Description 34
    4.4.2 Interface 34
  4.5 Dialogue Moves Representation: the DTAC protocol 35
  4.6 The Dialogue Move Selector: Re-organisation of the Dialogue Move Input Pool 37
    4.6.1 Description 37
    4.6.2 Interface 37
  4.7 The Dialogue Manager 38
    4.7.1 Description 38
    4.7.2 Interface 39
  4.8 The Output Generator 40
    4.8.1 Description 40
    4.8.2 Interface 41

5 The Natural Language Understanding Module 42
  5.1 Introduction 42
  5.2 Computational Techniques and Formal Properties 42
  5.3 Specifying Linguistic Knowledge 43
  5.4 Lexical and Morphological Analysis 44
    5.4.1 Simulating Morphological Analysis through Morphological Generation and Efficient Knowledge-Base Retrieval 44
    5.4.2 Mph 44
  5.5 Analysis Grammar 47
    5.5.1 Unification 49
    5.5.2 Semantically-oriented Grammar 50

6 Spoken Language Parsing Strategies 52
  6.1 Introduction 52
  6.2 Characteristics of Spoken Language 53
    6.2.1 The Output of the Speech Recogniser 53
  6.3 Parsing Strategies 54
    6.3.1 Lexical Analysis: Void Words 54
    6.3.2 Analysis of Partial Strings 55
  6.4 Ambiguity 56
    6.4.1 Length and Position Algorithms 56
    6.4.2 Global Criterion 57

7 The Dialogue Manager 60
  7.1 Design Constraints: A Dialogue System for the Automatic Telephone Task Scenario 61
  7.2 Specification Level: Dialogue Rules 63
    7.2.1 RuleId 63
    7.2.2 TriggeringConditions 64
    7.2.3 PriorityLevel 65
    7.2.4 PreActions 67
    7.2.5 ActionsExpectations 67
    7.2.6 DeclareExpectations 69
    7.2.7 SetExpectations 71
    7.2.8 PostActions 72

8 Dialogue Moves and Types for the Telephone Scenario 74
  8.1 Dialogue Moves and Types 74
    8.1.1 askCommand 74
    8.1.2 specifyCommand 75
    8.1.3 informExecution 79
    8.1.4 askParameter 81
    8.1.5 specifyParameter 82
    8.1.6 askConfirmation 82
    8.1.7 answerYN 83
    8.1.8 askContinuation 84
    8.1.9 askRepeat 84
    8.1.10 askHelp 85
    8.1.11 answerHelp 86
    8.1.12 errorRecovery 86
    8.1.13 greet 87
    8.1.14 quit 87


Chapter 1

Introduction

This document describes the design of the demonstrator to be developed in SIRIDUS Work Package 3. This demonstrator is intended to provide a real example of the development of a natural command language dialogue system within the overall context of the SIRIDUS project.

The work underlying this deliverable relates to other work being carried out inside the project. First, this demonstrator may be considered as an instantiation of the intended system architecture described in Deliverable D6.1 Siridus System Architecture and Interface Report (Baseline) [Lewin et al 2000]. We have therefore followed the general architecture specification described in Chapter 7 of that report, with the following modifications. First, the Trindi DME component (Figure 7.1, [Lewin et al 2000]) will be replaced by the DM described in Chapter 4 of this document. Secondly, we will not attempt to use prosodic information during the recognition and synthesis stages of the system. However, the similarities between both architectures are worth noting. First, we will use an agent-based architecture (Chapter 3 of this document) following the KQML standard [Labrou & Finnin 1996, Finnin et al 1993]. Second, we have adopted an Information State Update approach to dialogue management, as proposed in the TRINDI project [Traum et al 1999].

This document first presents the hardware architecture considered for the demonstrator, describing the parts integrating the dialogue system and the interconnections between them (Chapter 2). Then, the document describes the software architecture that will be used in the development of the demonstrator (Chapter 3): a multi-agent distributed architecture. Chapter 3 describes why this architecture has been chosen, the agents that have been considered, and the messages that the different agents interchange in order to obtain the global functionality. Chapter 4 describes in more detail one of the most important agents in the demonstrator, the Dialogue Manager, which is in charge of interpreting the user's input, sending the appropriate messages to the other agents in the system, processing their responses, and producing a suitable output for the user.


The most important components of the Dialogue Manager are analysed in more detail in the following chapters. Chapter 5 concentrates on the Natural Language Understanding Module, paying special attention to the specification of linguistic knowledge and the main techniques involved in morphological and lexical analysis, parsing and unification. Chapter 6 presents some parsing and unification strategies implemented to adapt the Natural Language Understanding Module to spoken input. The main goal at this level has been to detect and correct some common speech recognition errors. Chapter 7 describes the Dialogue Manager Module of the Dialogue Management Agent. The chapter analyses the main design constraints which have been taken into account during this phase of system design and describes the programming language designed for the specification of dialogue systems.

Finally, Chapter 8 contains the complete list of Dialogue Moves (DMOVE) and Types (TYPE) that will be considered during the implementation of the Natural Command Language System. This chapter may be characterised as the instantiation of the Dialogue Move Scheme for Natural Command Languages presented in Dialogue Moves in Natural Command Languages [Amores & Quesada 2000], for the scenario described in User Requirements on a Natural Command Language Dialogue System [Torre, Amores & Quesada 2000], and based on the architecture and specification formalism presented in this document.


Chapter 2

Hardware Architecture

This chapter describes the hardware involved in the demonstrator that will be built as part of SIRIDUS Work Package 3. The demonstrator comprises a series of hardware elements, which are described individually in Section 2.1, together with the role each plays within the complete demonstrator. To achieve the desired functionality these hardware elements need to interact with each other. Section 2.2 describes how this interaction is carried out through several communication lines, as well as the information they carry. Figure 2.1 represents the hardware architecture chosen for the demonstrator, including the different hardware elements composing it and the communication lines interconnecting them.

2.1 Decomposition

The demonstrator to be developed is a system that allows personnel in a large or medium sized institution to perform simple and some advanced telephone functions, as well as some information query functions, using their voice over the phone and natural language (in particular commands issued in a natural way). To achieve this goal, the demonstrator will be composed of three hardware elements:

1. A PABX providing:

   - Internal telephone extensions for the people in an institution.
   - External telephone lines for external telephone calls.

2. A PABX Control Unit (PCU) that will control the PABX in order to perform the telephone functions that the system will provide.


3. A Voice Response Unit (VRU) in charge of the user interaction. As this interaction will be held over the phone using only speech and natural language, this unit will be responsible for:

   - Speech recognition.
   - Speech synthesis.
   - Natural language processing.
   - Dialogue management.

The following three subsections describe each of these three hardware elements.

2.1.1 The PABX

The demonstrator will make use of an Ericsson MD110 BC9 PABX currently in operation at Telefónica I+D. This PABX currently provides about 200 internal extensions for the employees of Telefónica I+D at Valladolid. The following telephone services are accessible (by dialling the appropriate codes) from the internal extensions:

1. Internal telephone calls.
2. External telephone calls.
3. Redialling.
4. Multi-party conference (up to 8 speakers).
5. Automatic notification when an internal extension becomes free.
6. Notification of a second incoming call.
7. Switching between two active incoming calls.
8. Transfer of an established incoming call.
9. Capture of incoming calls from another extension.
10. Automatic transfer of incoming calls (immediate / if busy / if there is no answer).

The demonstrator will make some of these telephone services (only those described under 1, 2, 3, 4 (partially) and 10) accessible through an automatic dialogue system, via voice over the phone. In order to do so, it is necessary to be able to control the PABX in a completely different way, from an external computer connected both to the PABX itself and to the computer on which the dialogue system runs. We have called this external computer the PABX Control Unit (PCU), and the computer in charge of the dialogue system and all the aspects related to the user input/output via voice the Voice Response Unit (VRU). Both computers are described in the next two sections.


Figure 2.1: Hardware Architecture Decomposition and Connectivity


2.1.2 The PCU (PABX Control Unit)

The PABX Control Unit (PCU) will be a Pentium III PC running under Windows NT 4.0. In order to be able to control the PABX it needs two special software packages:

- CTConnect V4.0. This software facilitates the control of PABXs from several different vendors from an external PC. Only certain PABX models are supported, and the software is available only for a limited number of platforms, not including Solaris for PC.

- Ericsson ApplicationLink/CSTA V3.0.2. This is lower-level software, specific to the Ericsson PABX, which allows CTConnect to effectively control this particular model of PABX.

2.1.3 The VRU (Voice Response Unit)

The Voice Response Unit (VRU) will be in charge of the system–user interaction. This interaction will be held entirely via voice over telephone lines. Therefore, it will need software to perform voice recognition and voice synthesis over the phone. In order to provide these means, the VRU consists of a Pentium II PC running under SUN Solaris 2.5.1 and some specialised hardware and software:

- Two Dialogic Antares 3000/50 PC boards. These are DSP boards with four Texas Instruments DSPs. They extend the digital signal processing capabilities of the PC so that digital samples of telephonic speech can be processed for recognition more efficiently, and digital samples of telephonic synthetic speech can be produced. In the VRU, one of these boards will be used to process the samples of the user's voice in order to perform speech recognition; Telefónica's Speech Recogniser runs partially on this board. The other board will be used to generate the synthetic voice that constitutes the system's output towards the user; Telefónica's Speech Synthesiser runs on this board.

- A Dialogic D41E PC board. This board provides an interface for up to four telephone lines, which can be sampled and connected digitally to the Antares DSP boards through a specialised bus (SC-BUS) separate from the internal PC bus. The libraries that come with this board allow a program to perform basic functions on the telephone line, including waiting for an incoming telephone call, connecting the incoming call to an Antares board, and disconnecting the line.


Figure 2.2 represents the hardware elements that compose the VRU, as well as the interconnections between them and the external inputs and outputs.

2.2 Interconnectivity

In order to perform the intended functions, the three hardware elements that compose the demonstrator will need to be interconnected and interchange information of different sorts.

2.2.1 PCU-PABX

The PABX Control Unit (PCU) will be connected to the PABX (as can be seen in Figure 2.1) by means of a serial line. Through this line, the PCU will issue control commands to the PABX and will receive notifications from it. Communication over this serial line is made possible by the CTConnect software and the underlying Ericsson ApplicationLink software. Making use of this software, an application will be developed to properly handle the notifications from the PABX and to issue commands to the PABX as required. This application will run on the PCU and will need to communicate with the Voice Response Unit (VRU). This communication is analysed in the next section.

2.2.2 VRU-PCU

One of the main goals of the demonstrator to be developed is to facilitate access to several telephone functions by means of natural language speech over the phone. For that reason, it is essential that communication exists between the Voice Response Unit (VRU), in charge of the voice interaction with the user, and the PABX Control Unit. This communication will take place over a Local Area Network (LAN) (as shown in Figure 2.1) to which both computers (PCU and VRU) will be connected. The communication will mainly flow from the VRU to the PCU, as its main goal is to allow the VRU to send the appropriate commands to the PCU in order to perform the functions requested by the user. The PCU will thus be in charge of translating these commands into PABX commands, which will be, in turn, sent to the PABX. Communication in the opposite direction may also be necessary under certain circumstances. Therefore, in the design phase of the demonstrator we leave open the possibility of communication from the PCU to the VRU.


Figure 2.2: Voice Response Unit Hardware Architecture


2.2.3 VRU-PABX

As can be seen in Figure 2.1, the VRU will be connected to the PABX by means of four telephone lines. All system-user interaction will be held through these telephone lines, which will carry both the user's voice and the system's synthetic voice. These telephone lines will also carry the usual telephone signalling information (e.g. the incoming call tone), which the VRU will need to be able to handle as well (in particular, the telephone line interface PC board will handle it).


Chapter 3

Software Architecture

This chapter describes the software architecture for the demonstrator that will be built in SIRIDUS Work Package 3. This software will run on the two PCs that form part of the demonstrator, the PABX Control Unit (PCU) and the Voice Response Unit (VRU).

3.1 Goals

The goals pursued for the software architecture are basically the following:

- Modularity. Modularity is a highly desirable characteristic in any software development. In the case of this demonstrator it is of capital importance, because the development of the demonstrator's software will involve three different groups at three different locations in Spain: Telefónica I+D at Valladolid, Telefónica I+D at Madrid, and the University of Seville. For this reason it is essential to make the software to be developed as modular as possible.

- Facilities for distributed execution. Due to technical limitations, it is necessary to use two computers (PCU and VRU) in the demonstrator. The software running on each of these computers needs to communicate with the software on the other. Therefore, it is important that the chosen architecture facilitates the distributed execution of programs and the communication between them.

- Facilities for multi-platform communication. Due to the same technical limitations, the two computers (PCU and VRU) need to run different operating systems, Windows NT and SUN Solaris. Therefore it is desirable to choose an architecture that facilitates communication between programs running on different platforms.

3.2 Multi-Agent Distributed Architecture

To try to achieve these goals, a multi-agent distributed architecture will be used. The software will be divided into modules (or agents), each playing a particular role within the complete system and communicating with the others in order to obtain the global functionality. Each agent will be a separate program that communicates with other agents through KQML ASCII text messages. KQML stands for Knowledge Query and Manipulation Language, and is a standardised language for inter-agent communication. Different specification versions of this language can be found in [Labrou & Finnin 1996, Finnin et al 1993].

The agent architecture we envisage for the demonstrator will be a centralised architecture, in the sense that there will be a central agent through which all the messaging will pass. This central agent will route the messages to their destinations, and will provide additional services to the rest of the agents integrating the demonstrator.

This choice seems to be consistent with the trend of some of the most advanced dialogue systems. For instance TRAINS [Trains Web Page, Ferguson et al 1996], TRIPS [Trips Web Page, Allen et al 2001], and the GALAXY architecture [Seneff et al 1998, Seneff et al 1999] make use of multi-agent distributed architectures. Moreover, the architectures they use also rely on a central agent that controls the communications. The TRAINS and TRIPS systems also use KQML as the communication language between the agents.

The following section describes in more detail the different agents that integrate the complete demonstrator, the KQML messages that they will interchange, and their meanings.
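
As an informal illustration of this message format (a sketch based on the examples of Section 3.3, not code from the deliverable), the following Python fragment builds KQML-style ASCII strings of the kind the agents interchange; the performative names and :key parameters are taken from those examples, while the helper function itself is our own assumption.

    # Minimal sketch of building KQML-style ASCII messages; illustrative only.
    def kqml(performative, **params):
        """Serialise a performative and its :key value pairs as an ASCII string."""
        parts = [performative]
        for key, value in params.items():
            parts.append(f":{key} {value}")
        return "(" + " ".join(parts) + ")"

    # The registration message sent by the TEXT-IN agent (Section 3.3.2).
    register_msg = kqml("register", receiver="AM", name='"TEXT-IN"')
    # -> (register :receiver AM :name "TEXT-IN")

    # A broadcast wrapping a nested tell message carrying the user's input.
    tell_msg = kqml("tell", content='(user-input "transfer my calls")')
    broadcast_msg = kqml("broadcast", receiver="AM", content=tell_msg)
    # -> (broadcast :receiver AM :content (tell :content (user-input "transfer my calls")))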

3.3 Description of the Agents in the Demonstrator

As a result of the adoption of a multi-agent distributed architecture for our demonstrator, the software to be developed can be described as a set of independent programs, or agents, that run concurrently on the same or different machines and interchange KQML messages in order to achieve the desired functionality. This section describes the different agents that are considered for the demonstrator, as well as the messages that they will be able to handle and send to other agents. These messages are the only interaction between agents; therefore, defining the messages handled by an agent is equivalent to completely defining its interface. Figure 3.1 shows the agents integrating the demonstrator. As can be seen, all the agents will run on the Voice Response Unit (VRU); the only exception is the PABX control agent, which will run on the PABX Control Unit (PCU).


Figure 3.1: Software Architecture. Distributed Multi-Agent Architecture


3.3.1 Agent Manager (AM)

This agent is the centre of communications of the agent architecture. All messages between any two agents necessarily pass through this agent. Every agent in the architecture has to register by sending a message to the AM before it can send a message to any other agent. This way the AM maintains a record of the agents connected to the system, and knows how to send a message to any of the connected agents.

Messages sent: None. It only routes messages to the appropriate agents.

Messages received:

- (register :receiver AM :name N)
  This causes an agent to be registered by the AM with the name N.

- (broadcast :receiver AM :content C)
  This makes the AM send a message with content C to all the agents connected at that moment.
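
The following minimal Python sketch, under our own assumptions, illustrates the routing behaviour just described: a registry of connected agents plus forwarding of broadcast and point-to-point messages. The in-memory delivery callbacks stand in for whatever transport the real demonstrator uses; none of this is the project's actual code.

    # Illustrative sketch of the Agent Manager's routing role.
    class AgentManager:
        def __init__(self):
            self.agents = {}                          # agent name -> delivery callback

        def register(self, name, deliver):
            # Corresponds to (register :receiver AM :name N): record the agent so
            # that later messages can be routed to it.
            self.agents[name] = deliver

        def handle(self, message):
            # message is a (performative, parameters) pair already parsed from KQML.
            performative, params = message
            if performative == "broadcast":
                # Send the embedded :content to every agent connected at this moment.
                for deliver in self.agents.values():
                    deliver(params["content"])
            else:
                # Any other message is routed to the agent named in :receiver.
                self.agents[params["receiver"]](message)

    # Usage: two toy agents register, then a broadcast carrying a user input is routed.
    am = AgentManager()
    am.register("DM", lambda msg: print("DM received:", msg))
    am.register("TRANSCRIPTION", lambda msg: print("TRANSCRIPTION received:", msg))
    am.handle(("broadcast",
               {"receiver": "AM",
                "content": '(tell :content (user-input "transfer my calls"))'}))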

3.3.2 Console Input/Output Agents

The main input and output modality in the demonstrator to be produced in SIRIDUS Work Package 3 is voice over the phone. However, during the development and use of the demonstrator it is highly convenient to have a collection of agents that allow the user's input to be typed as text and the system's output to be obtained as text. It is also useful to have a complete transcription of the dialogue in text format. For these reasons the following agents will be developed.

TEXT-IN Agent

This agent allows a user to introduce textual input to the demonstrator. This way the demonstrator will work with either speech or text input, which is very convenient for its debugging and also for performing demonstrations. This agent simulates the message sent to the rest of the demonstrator when a speech input has been produced (provided that the recognised sentence is the same as the typed text). Therefore, for the rest of the demonstrator it will be indistinguishable whether the input has been spoken or typed.

Messages sent:

- (register :receiver AM :name "TEXT-IN")
  TEXT-IN agent registration message.


- (broadcast :receiver AM :content (tell :content (user-input "xxxx")))
  Each time a text input is finished, the TEXT-IN agent sends a broadcast message to inform all the agents in the system of the new user input. The agents most interested in this information are the Dialogue Manager (DM) agent and the TRANSCRIPT agent; other agents will normally discard this message.

Messages received: None expected.

TEXT-OUT Agent

The system will always produce output in two modalities: voice over the phone and text on the screen. The main modality is voice over the phone, with text on the screen being only a useful add-on for demonstration and debugging purposes. This agent presents the output of the system in textual form on the screen of the VRU. In order to do so, it processes the broadcast messages which the LINE-INT agent issues to inform the rest of the system of a new spoken system output.

Messages sent:

- (register :receiver AM :name "TEXT-OUT")
  TEXT-OUT agent registration message.

Messages received:

- (tell :content (system-output "xxxx"))
  This message is sent by the LINE-INT agent to confirm that a new spoken system output has been issued.

TRANSCRIPTION Agent

This agent maintains the transcription of the whole user-system dialogue on the screen of the VRU in a readable format. Like the other console input/output agents, its main utility is to facilitate demonstrations of the system and to support debugging and dialogue refinement.

Messages sent:

- (register :receiver AM :name "TRANSCRIPTION")
  TRANSCRIPTION agent registration message.


Messages received:

- (tell :content (user-input "xxxx"))
  This message is sent by either the LINE-INT agent or the TEXT-IN agent to inform the whole system of a new user input (either spoken or written). This way the TRANSCRIPTION agent can update the transcription on the screen.

- (tell :content (system-output "xxxx"))
  This message is sent by the LINE-INT agent as confirmation of a new system output, and is used by the TRANSCRIPTION agent to add the system output to the transcription on the screen.

3.3.3 Speech Input / Output Agents

The agents described in this section allow the demonstrator to work with speech input and output over a telephone line, which is the main input/output modality in the intended demonstrator. The restriction of using voice over the phone imposes a series of additional requirements on these agents. They have to manage not only a speech recogniser and synthesiser, but also a telephone line. In particular, the state of the telephone line has to be monitored in order to connect and disconnect the line to the speech recogniser and synthesiser when appropriate, and to notify the rest of the system when an incoming call is received and when the telephone call has ended. Given these requirements, it is easier to group all the functions related to the telephone line (including speech recognition and synthesis) in a single agent which acts as an abstraction of the telephone line. We have called this agent the Telephone Line Interface (LINE-INT) agent.

LINE-INT Agent

This agent provides the necessary interface to control a telephone line with speech recognition and speech synthesis capabilities.

Messages sent:

- (register :receiver AM :name "LINE-INT")
  LINE-INT agent registration message.

- (broadcast :receiver AM :content (tell :content (line-start)))
  With this message LINE-INT notifies all the agents in the system that an incoming call has been established. It should be processed at least by the Dialogue Manager agent (it indicates that a new conversation starts) and by the TRANSCRIPTION agent.

- (broadcast :receiver AM :content (tell :content (line-stop)))
  With this message LINE-INT notifies all the agents in the system that a previously established telephone call has ended (on the user's or the system's initiative). It should be taken into account at least by the Dialogue Manager agent (as it indicates that the conversation has finished) and by the TRANSCRIPTION agent.


Messages received:

- (request :receiver LINE-INT :content (say "xxxx"))
  This message is sent to the LINE-INT agent to ask it to produce a synthetic speech output to the user. Once the synthetic speech has been produced and sent to the user, LINE-INT replies with a broadcast message indicating that a new system output has been produced:
  (broadcast :receiver AM :content (tell :content (system-output "xxxx")))
  This message is used by the TRANSCRIPTION and the TEXT-OUT agents.

- (request :receiver LINE-INT :content (recognize))
  This message is sent to the LINE-INT agent to start the speech recogniser. The speech recogniser ends automatically when it detects a long silence. When it has finished, the LINE-INT agent broadcasts a message to inform all the agents in the system of the recognition results:
  (broadcast :receiver AM :content (tell :content (user-input "xxxx")))
  This message is used by the Dialogue Manager (DM) agent and the TRANSCRIPTION agent.

- (request :receiver LINE-INT :content (line-stop))
  This message requests the LINE-INT agent to disconnect a previously established telephone call. It may be sent by the Dialogue Manager (DM) agent to ask the LINE-INT agent to finish a conversation with a user. When the telephone call has finished, the LINE-INT agent informs the rest of the agents by broadcasting the message:
  (broadcast :receiver AM :content (tell :content (line-stop)))
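
As an illustration of how these requests and broadcasts fit together, the Python sketch below dispatches the three request types and emits the corresponding broadcast confirmations. The synthesise(), recognise() and hangup() stubs, as well as the welcome prompt, are hypothetical placeholders standing in for the Antares and D41E board libraries; this is not the deliverable's code.

    def synthesise(text):
        # Stub standing in for the speech synthesiser on the Antares board.
        print(f"[TTS] {text}")

    def recognise():
        # Stub standing in for the speech recogniser; it would end on a long silence.
        return "transfer my calls"

    def hangup():
        # Stub standing in for the D41E line-interface library call.
        print("[LINE] call disconnected")

    class LineInterfaceAgent:
        def __init__(self, send_to_am):
            self.send = send_to_am                    # delivers a KQML string to the AM

        def on_incoming_call(self):
            # An incoming call has been established: notify the whole system.
            self.send('(broadcast :receiver AM :content (tell :content (line-start)))')

        def on_request(self, content):
            # Dispatch the :content of a request addressed to LINE-INT.
            if content.startswith('(say '):
                text = content[len('(say '):-1].strip('"')
                synthesise(text)
                self.send(f'(broadcast :receiver AM :content (tell :content (system-output "{text}")))')
            elif content == '(recognize)':
                result = recognise()
                self.send(f'(broadcast :receiver AM :content (tell :content (user-input "{result}")))')
            elif content == '(line-stop)':
                hangup()
                self.send('(broadcast :receiver AM :content (tell :content (line-stop)))')

    # Usage with print() standing in for delivery to the Agent Manager.
    line_int = LineInterfaceAgent(send_to_am=print)
    line_int.on_incoming_call()
    line_int.on_request('(say "Welcome to the automatic telephone service")')
    line_int.on_request('(recognize)')
    line_int.on_request('(line-stop)')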

3.3.4 PABX Control Agents

The control of the PABX will be the responsibility of the PABX Control Unit (PCU). The agents involved in this control will run on this computer. In principle, only one agent will be in charge of controlling the PABX, the PABX agent.


PABX Agent

This agent will accept requests from the Dialogue Manager (DM) agent asking it to perform several control actions over the PABX, such as connecting two or more extensions, establishing or cancelling the forwarding of incoming calls, etc. The messages that this agent will interchange with the dialogue manager have not been completely defined yet; in particular, the need for other messages may arise during the development of the PABX agent. The messages that have already been identified as necessary for the PABX agent to handle are the following:

Messages sent:

- (register :receiver AM :name "PABX")
  PABX agent registration message.

Messages received (not completely defined yet):

- (request :content (connect number1 number2))
  This message asks the PABX control agent to connect two extensions / external lines to establish a communication, or to connect one additional extension to a previously established communication (for three-party conference calls).

- (request :content (activate-forwarding extension-number forward-number))
  This message asks the PABX control agent to activate the forwarding of incoming calls to a certain extension.

- (request :content (deactivate-forwarding extension-number))
  This message is sent to the PABX control agent to deactivate the forwarding of incoming calls for an extension number.

3.3.5 Database Control Agent

The demonstrator to be developed in SIRIDUS Work Package 3 is designed to provide voice access to two external resources: the PABX and a database containing the directory of an institution (Telefónica I+D in our case). The PABX and its control are conceptualised as one agent, and the database is conceptualised as another agent, the Data Base (DB) agent. This way, all the necessary interactions with the PABX will be done through the PABX agent, and all the necessary interactions with the database will be done through the DB agent.


DB Agent

This agent will handle requests from the dialogue system to access the database in either read or write mode. The database will contain the following fields for each user of the system (each person in the example institution, Telefónica I+D):

- Database record number (record).
- First name (name).
- First surname (surname1).
- Second surname (surname2).
- Telephone extension (extension).
- Office number (office).
- E-mail address (e-mail).
- The extension to which this terminal is forwarded (fwd-ext), or 0 if not forwarded.
- The last number called by the user (last-call).
- The service mode (mode). Each user may choose between having this automatic service deactivated, partially activated or completely activated. This field will indicate the degree of activation of the automatic service chosen by the user.

The DB agent will allow access to all this information. Under certain circumstances the data supplied may match several entries. The interface with the DB Agent will consist of four functions (supported as messages between the agents).

- Request a select operation.
- Return the number of entries that match the pattern of the last select operation.
- Return the Nth selected record.
- Update a record.

In cases of ambiguity (more than one record matches the query) these operations will allow the dialogue manager to set up a disambiguation sub-dialogue. For instance, after the user has requested to call Peter, and if there are two 'Peter' entries, the system should ask the user:


Do you want to call Peter Johnson or Peter Smith?

instead of just repeating

Who do you want to call?

Messages sent:

- (register :receiver AM :name "DB")
  DB agent registration message.

Messages received:

- (request :content (select :name XXXX :surname1 YYYY :surname2 ZZZZ ) )
  This message requests the DB agent to look for all the records in the database that match the partial record specification in the select command. The answer from the DB agent will take the form:
  (reply :content (found :matches N) )
  This answer indicates the number of records that match the currently activated select message.

- (request :content (get :record_number N) )
  This message requests the database agent to return the Nth record that matched the previous select operation. The database agent will reply to this message with a message that indicates the values of every field in that record:
  (reply :content (record :record_number N : .... : ) )

- (request :content (update :record_number N : .... : ) )
  This message asks the DB agent to modify (update) the specified fields in record number N. Not all the record fields can be modified (for instance, it will not be possible to modify the record number). Also, the DB agent may allow only some agents to modify the database (messages always include a field :sender that specifies the sender of the message, so that the DB agent may identify the agent that requested an update command). The DB agent may reply indicating that the operation has been successfully executed with the following message:


(reply :content (updated))

or with an indication of an error:

(reply :content (update-error))
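
The Python sketch below illustrates, under our own assumptions, how this select / found / get interface supports the disambiguation sub-dialogue described above. The directory contents and the DBAgent class are invented for the example; the field names follow the list given earlier in this section.

    # Illustrative sketch of the DB interface and the disambiguation it enables.
    DIRECTORY = [
        {"record": 1, "name": "Peter", "surname1": "Johnson", "extension": "4321"},
        {"record": 2, "name": "Peter", "surname1": "Smith", "extension": "1234"},
    ]

    class DBAgent:
        def __init__(self, records):
            self.records = records
            self.selection = []

        def select(self, **pattern):
            """Request a select operation; returns the number of matching records."""
            self.selection = [r for r in self.records
                              if all(r.get(k) == v for k, v in pattern.items())]
            return len(self.selection)          # corresponds to (found :matches N)

        def get(self, n):
            """Return the Nth record matched by the last select (1-based)."""
            return self.selection[n - 1]        # corresponds to (record :record_number N ...)

    db = DBAgent(DIRECTORY)
    matches = db.select(name="Peter")
    if matches > 1:
        # More than one record matches: build a disambiguation question instead of
        # simply repeating the original prompt.
        options = [f'{db.get(i)["name"]} {db.get(i)["surname1"]}'
                   for i in range(1, matches + 1)]
        print("Do you want to call " + " or ".join(options) + "?")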

3.3.6 Dialogue Management Agents

Last but not least, there will be a set of agents that will be in charge of the natural language processing and the interpretation of the user’s input, so that they produce the desired results. In order to do so, these agents will need to interact and control in some way the rest of the agents in the system. Initially everything related to natural language processing, interpretation of the user input, execution of the appropriate actions and generation of the output language will be performed by a single agent, the Dialogue Manager (DM) agent.

DM Agent

This agent will receive as its input the results of the speech recogniser (output of the LINE-INT agent) or the output of the TEXT-IN agent. It will analyse and process this input, generating the appropriate messages to the PABX control (PABX) agent and the database (DB) agent, processing their answers, and generating an appropriate output that will be sent to the LINE-INT agent to produce the synthetic voice.

Messages sent:

- (register :receiver AM :name "DM")
  DM agent registration message.

- The DM agent also generates most of the input messages for the PABX, DB and LINE-INT agents.

Messages received: The DM agent handles most of the output messages from the PABX, DB, LINE-INT and TEXT-IN agents.


Chapter 4

Dialogue Management Agent Design

This chapter describes the system design of the dialogue manager in a spoken language dialogue system for Spanish in the domain of Natural Command Languages. From a functional point of view, the overall system can be seen as an interface to a telephone service, where the dialogue manager acts as an automatic telephone operator. This design is based on the analysis of the domain, the user requirements and the illustrative corpus outlined in User Requirements on a Natural Command Language Dialogue System [Torre, Amores & Quesada 2000]. From an operational point of view, the design of the dialogue manager is based on, and makes extensive use of, the notion of dialogue move, and specifically of the set of dialogue moves described in Dialogue Moves in Natural Command Languages [Amores & Quesada 2000].

4.1 The Dialogue Management Agent in the Multi Agent-based Architecture

At a high level of abstraction, the dialogue manager may be considered as an agent communicating dynamically with several other agents in the overall architecture of the system:

- The Database Agent, which allows consulting operations on the general (organisation) directory.
- The Input/Output Agent, which provides a functional abstraction over the specific characteristics of the components in charge of the actual input and output. The interface between the dialogue manager and the input/output agents may be seen as an asynchronous channel.
- The PABX Agent, which will actually execute the commands detected and confirmed by the Dialogue Management Agent.
- The Process Manager (or Agent Manager) Agent, in charge of the control of the different agents involved.


Figure 4.1: The Dialogue Management Agent in the Multi Agent-based Architecture

Figure 4.1 shows the relations between these four agents, including the control and data flow of information.

4.2 Main Components of the Dialogue Management Agent

Moving down a level of abstraction, this section analyses the main components of the Dialogue Management Agent (DMA). At this level, we differentiate five modules:

- Speech Prospector,
- Input Analyser (Natural Language Understanding),
- Dialogue Move Selector,
- Dialogue Manager, and
- Output Generator,

two main stores:

- Dialogue Moves Input Pool, and
- Dialogue History Blackboard,

and an external resource (controlled by the Database Agent):

- Directory and User Database.

Figure 4.2 presents the internal structure of the Dialogue Management Agent, including the main data and control connections between the different elements.

In order to get a general picture of the functions and relationships of the different components, let us assume that at some point in the man-machine interaction the user says something (issues a command, replies to a question, etc.). The user's input itself will be processed by the Input/Output Agent. In the real scenario of the target application, the input module will be a speech recogniser. Therefore, at first glance, the result of the first input analysis should be a string of words containing the textual transcription of the most likely sentence recognised by the speech recogniser.

Nevertheless, one of the main goals of the SIRIDUS project is to take advantage, at every stage of the whole dialogue management system, of the integration of the different components. At this point, the speech recognition task may take advantage of the current informational status of the whole system (last dialogue moves, currently active expectations, dialogue history, etc.) and of the domain knowledge (set of possible commands and the parameter model for each command, current user's device configuration, special or high priority commands, etc.). In order to accomplish this functionality, the Dialogue Management Agent includes a Speech Prospector. Ideally, this module will have access to the internal representation of the speech recogniser (word lattice, timing marks, acoustic scores, and so on). As a result of the integration of this information and the dialogue history, we expect the Speech Prospector to improve the semantic adequacy of the string of words obtained from the analysis of the user's input.

A string of words will be the input to the Input Analyser (Natural Language Understanding) module. The goal of this module is to obtain the dialogue move, or the set of dialogue moves in the case of a multi-functional or ambiguous input. The architecture proposed is based on a unification-based grammar. That is, the Input Analyser contains a lexico-morphological submodule, a parser and a unifier. They use the DTAC protocol for the representation of the dialogue moves.

The Input Analyser module sends the dialogue moves obtained to the Dialogue Move Selector. This module controls the Dialogue Moves Input Pool, the store containing the ordered set of dialogue moves not yet processed by the Dialogue Manager module.


Figure 4.2: Main Components of the Dialogue Management Agent


The Speech Prospector, the Input Analyser and the Dialogue Move Selector modules may be connected using a sequential and synchronous strategy, defining what could be called the User Input Analysis block of the Dialogue Management Agent. Nevertheless, this Agent has two main synchronisation barriers.

The first one appears at the interface between the Dialogue Manager and the Dialogue Move Selector (in fact, between the Dialogue Manager and the User Input Analysis block as a whole). The idea is that the Dialogue Manager may be processing a dialogue move while, at the very same time, the user makes a new utterance, activating (in parallel to the Dialogue Manager) the Input Agent, and, in a cascade, the Speech Prospector, the Input Analyser, and the Dialogue Move Selector. So, one of the main goals of the Dialogue Moves Input Pool is to allow this asynchronous and parallel work of the User Input Analysis block of modules and the Dialogue Manager.

To allow this, the first function of the Dialogue Move Selector is to store each dialogue move received from the Input Analyser in the Dialogue Moves Input Pool. The Dialogue Move Selector must also reorder the dialogue moves in the Input Pool according to the priority level of each dialogue move, maintaining a totally ordered set of dialogue moves. The score used for this operation may be statically assigned by the Input Analyser (for instance, if the user requests the cancellation of the current command, this dialogue move should have a very high priority level, in order to be able to interrupt the Dialogue Manager). Alternatively, the Dialogue Move Selector may assign the scores dynamically, taking into account the relationships between the different dialogue moves in the pool, the status of the Dialogue Manager, and the dialogue history stored in the Dialogue History Blackboard store.

The Dialogue Manager module will take the top dialogue move in the Dialogue Moves Input Pool (the one with the highest priority level) and will analyse it using the domain knowledge previously stored as a set of dialogue rules, together with the dialogue history. During this operation, the Dialogue Manager may use additional information stored in the external Data Base resource (for instance, to consult a directory entry, or to update the last call of the user in order to retry it later). The processing of the Dialogue Manager may be interrupted by a higher priority dialogue move through a control call from the Dialogue Move Selector.

The successful completion of the Dialogue Manager will generate a dialogue move as a result. That is, the dialogue manager is a function that takes as parameters the dialogue move under processing, the set of dialogue rules, and the dialogue history, and returns the dialogue move that contains the information state corresponding to the integration of the new dialogue move into the dialogue history according to the dialogue rules.

The resulting dialogue move will be stored in the Dialogue History Blackboard. This store marks the second synchronisation barrier in the whole Dialogue Management Agent. The dialogue history contains the time-ordered set of dialogue moves that defines the evolution of the information state of the Dialogue Management Agent. This information is not merely anecdotal but very informative, as it contains the set of open dialogue moves (corresponding to actions triggered by the user but not yet completed) and the previous commands and parameter values, which may be used to resolve anaphoric references.


The Dialogue Manager is also connected to the Output Generator, which will have to translate the dialogue move obtained by the Dialogue Manager into a sentence that will be communicated to the user by means of the Output Agent (a Display Agent or a Speech Synthesiser).

4.3 The Speech Prospector: Incorporating dialogue knowledge during speech recognition

4.3.1 Description

The goal of this module is to improve speech recognition by exploiting the advantages of an information-state change view of dialogue, one of the main objectives of Work Package 2: Relating Information States to Speech Input and Output. This module will be an experimental workbench for investigating different integration strategies between the natural command language-based dialogue system in Spanish and the speech input analyser.

This module is connected to the input component of the Input/Output Agent. Typically, this component will be a speech recogniser: the LINE-INT Agent. When the input is typed text, via the Keyboard Agent, this module has no function to perform. The actual implementation of this module will depend on technical limitations and on the results obtained in other SIRIDUS work packages. Therefore, the demonstrator may or may not include this component in its final version.

4.3.2 Interface

The interface between the Speech Input agent and the Speech Prospector module of the Dialogue Management Agent will consist of the following main functions:

UserUtterance(speechRecDataStructure)
By means of this function, the Speech Input agent sends the internal data structure obtained by the Speech Recogniser to the Speech Prospector. Depending on the speech recognition technique used, this data structure may or may not contain the following items:

- The n-best hypotheses, with explicit scores.
- The word lattice, possibly extended with timing marks, and
- Acoustic annotations (such as prosody, noise, etc.).


In the Agent-based Architecture this function will be supported by the following message:

(broadcast :content (user-input "xxxx"))

sent by the LINE-INT Agent (Section 3.3.3), where "xxxx" contains the information obtained by the Speech Recogniser (either the n-best hypotheses or the word lattice).

It is important to note that in the final implementation, due mainly to technical limitations related to the integration of the Speech Input/Output Interfaces, the PABX and the PABX Control Unit, it will not be possible to access the Speech Recogniser's word lattice in the real-time final application. Therefore, the main goal of the Speech Prospector is to serve as a research workbench for investigating off-line several techniques to improve the speech input recognition process, taking into account the information state of the Dialogue Manager.

ImprovedUserUtterance(string)
Using the information state of the dialogue stored in the Dialogue History Blackboard, the Speech Prospector will try to improve the recognised user utterance. This function constitutes the output interface of the Speech Prospector.
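
As a purely illustrative sketch of one possible integration strategy (the deliverable deliberately leaves these strategies open for experimentation), the Python fragment below rescores n-best recognition hypotheses against words made likely by the current dialogue state. The scoring scheme, the weight, and the example hypotheses are assumptions, not the project's method.

    def prospect(n_best, active_expectations):
        """Pick the hypothesis that best matches the currently expected words.

        n_best: list of (sentence, acoustic_score) pairs from the recogniser.
        active_expectations: words made likely by the dialogue state, e.g. the
        names offered in a pending disambiguation question.
        """
        def combined_score(hypothesis):
            sentence, acoustic_score = hypothesis
            overlap = sum(word in active_expectations for word in sentence.split())
            return acoustic_score + 0.5 * overlap      # weight chosen arbitrarily
        return max(n_best, key=combined_score)[0]

    # Example: after the system asks "Do you want to call Peter Johnson or Peter Smith?"
    hypotheses = [("call beater johnson", 0.40), ("call peter johnson", 0.38)]
    print(prospect(hypotheses, {"peter", "johnson", "smith", "call"}))
    # -> "call peter johnson": the dialogue expectations override the acoustic ranking.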

4.4 The Input Analyser (Natural Language Understanding): Interpreting the Input into a Set of Dialogue Moves

4.4.1 Description

The goal of this module is to transform the (possibly improved) user input (extended with additional telephone operations such as TimeOut or HangUp) into a set of Dialogue Moves. The NLP system that constitutes the kernel of the Input Analyser [López & Quesada 1998] carries out the lexical, morphological, grammatical and semantic analysis, augmented with a wide range of speech repair strategies [López & Quesada 1999]. The final semantic interpretation produced by this module will then be a set of one or more dialogue moves. Chapters 5 and 6 describe the Natural Language Understanding module in detail.

Dialogue Moves will be represented in a uniform and consistent format across the different modules of the Dialogue Management Agent. Specifically, this design is based on the DTAC protocol, which is presented below.

4.4.2 Interface

ImprovedUserUtterance(string)
Receives as the input of the Input Analyser the result of the Speech Prospector.


In text-to-text environments (mainly for debugging) this function will correspond to the following message (see Section 3.3.2, Console Input/Output Agents):

(broadcast :content (user-input "xxxx"))

UserDialogueMove(dialogueMove)
This function sends each dialogue move obtained by the Input Analyser to the Dialogue Move Selector.

4.5 Dialogue Moves Representation: the DTAC protocol

This protocol, formally based on the extended Lexical Object Theory [Quesada 1998], guarantees an efficient, bi-directional, flexible and transparent communication between the Input Analyser (NLP system) and the rest of the dialogue management modules. The NLP system transforms each utterance into a list of one or more feature structures according to the DTAC protocol. DTAC stands for Dialogue Move, Type, Arguments and Contents.

1. DMOVE: This feature identifies the kind of dialogue move. Its range of values is the set of dialogue moves for Natural Command Languages specified in the document Dialogue Moves in Natural Command Languages [Amores & Quesada 2000]:

   Command-oriented DMs: askCommand, specifyCommand, informExecution
   Parameter-oriented DMs: askParameter, specifyParameter
   Interaction-oriented DMs: askConfirmation, answerYN, askContinuation, askRepeat, askHelp, answerHelp, errorRecovery, greet, quit

2. TYPE: This feature identifies the specific dialogue move in the kind of the corresponding DMOVE. For instance, the sentence “I would like to transfer my telephone calls” belongs to the specifyCommand DMOVE category, and to the TransferCall TYPE.


While the DMOVE classification is intended to be domain and implementation independent, the set of TYPEs will be domain dependent. In some sense, the TYPE classification instantiates the DMOVE model for the specific domain. Chapter 8 contains the set of TYPEs defined for the spoken dialogue system, with some examples of their representation.

3. ARGS: Some types of dialogue moves may require the presence of one or more arguments. The ARGS feature specifies the argument structure of the DMOVE/TYPE pair. This takes the form of a list in which conjunction, disjunction and optional operators may appear.

4. CONT: This feature represents the particular values associated with each element of the ARGS attribute. For terminal DTAC structures (with an empty ARGS list), CONT specifies the value of the structure. For non-terminal DTAC structures (with a non-empty ARGS list), CONT is represented recursively by the CONT feature of each feature whose name equals an ARGS value.

As an example, the DTAC representation of the utterance

“Transfer my calls”

will be:
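A sketch of this representation, using the flat DTAC notation employed elsewhere in this document (the Destination argument name follows the discussion below; the CONT value is still unspecified at this point), is:

(DMOVE: specifyCommand,
 TYPE:  TransferCall,
 ARGS:  [Destination])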

The corresponding DTAC representation for the utterance

“The number is 1234”

is:
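A sketch of this representation, again in flat DTAC notation (a terminal structure, so CONT simply holds the value):

(DMOVE: specifyParameter,
 TYPE:  Extension,
 CONT:  1234)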

And the utterance:


“Transfer my calls to the number is 1234”

which includes both the command and the parameter, will be incrementally represented as follows:
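A sketch of the resulting structure, in the same flat notation (the Destination sub-structure embeds the specifyParameter/Extension move, and the main CONT follows it recursively):

(DMOVE: specifyCommand,
 TYPE:  TransferCall,
 ARGS:  [Destination],
 Destination: (DMOVE: specifyParameter,
               TYPE:  Extension,
               CONT:  1234),
 CONT:  1234)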

In this case, the value of the CONT feature of the main structure (specifyCommand / TransferCall) points to the CONT feature of the feature named Destination (ARGS value) in the main structure, that is, the CONT (1234) of the specifyParameter/Extension sub–structure.

4.6

The Dialogue Move Selector: Re–organisation of the Dialogue Move Input Pool

4.6.1

Description

The result of the Input Analyser will be sent to the Dialogue Move Selector module. The goal of this module is therefore to re-organise the dialogue moves stored in this pool in order to take into account possible dependencies between them, as well as to raise high priority functions, such as HangUp. This module uses the Dialogue Move Input Pool Store both as its input and as its output, and is activated by each new entry sent by the Input Analyser. For this work, the Dialogue Move Selector may use the previous dialogue history stored in the Dialogue History Blackboard, and it may also ask the Dialogue Manager about its current state of work (waiting, dialogue move under processing, ...).

4.6.2

Interface

UserDialogueMove(dialogueMove): This function sends each dialogue move obtained by the Input Analyser to the Dialogue Move Selector.


GetInputPoolSize(): Returns the number of dialogue moves currently stored in the Dialogue Moves Input Pool.

GetNDialogueMove(N): Returns the Nth dialogue move stored, where N ranges from 1 (highest priority or top score) to GetInputPoolSize() (lowest priority or bottom score).

SaveDialogueMove(dialogueMove, priorityLevel): This function saves a dialogue move in the Dialogue Moves Input Pool with an associated priority level, which automatically reorders the dialogue moves stored.

These last three functions define the interface between the Dialogue Move Selector and the Dialogue Moves Input Pool.

AlertNewDM(priorityLevel): This function defines a control interface between the Dialogue Move Selector and the Dialogue Manager modules, intended to avoid the problems raised by the asynchronous behaviour of both modules. Basically, this function tells the Dialogue Manager that a new dialogue move with priority priorityLevel has just been received.

4.7

The Dialogue Manager

4.7.1

Description

The man–machine dialogue itself is under the control of this module. Its main sources of knowledge will be:

Domain knowledge: represented as a set of declarative dialogue rules.

Input Dialogue Move: the top dialogue move selected by the Dialogue Move Selector at any moment.

Dialogue History: the whole dialogue so far, including the successful commands and those unsuccessfully completed, as well as the command under completion and execution (which corresponds to the expectations of the question under discussion).

The dialogue manager may be viewed as a function that goes from these arguments: the set of dialogue rules (domain knowledge), the current input dialogue move, and the context (dialogue history), to a new dialogue move, which, functionally speaking, represents the information state of the dialogue manager. The dialogue moves obtained by this module will be sequentially stored in the Dialogue History Blackboard.


Usually, the Dialogue Manager will require extra information from the external database. To allow a high degree of functionality, the interface for this communication will support SQL. In addition, this module will have to communicate to the PABX agent the commands to be executed, once these have been detected, completed and confirmed. Chapter 7 describes the Dialogue Manager in detail.

4.7.2

Interface

ExtractTopDialogueMove(): This function returns the move with the highest priority in the Input Pool (functionally equivalent to GetNDialogueMove(1)) and also deletes it from the pool. There is no need for an additional reorganisation of the remaining dialogue moves, as the order is maintained. This function constitutes the main input interface of the Dialogue Manager module.

In order to conveniently handle high priority user commands (such as cancellation of the command under execution, hanging up of the line, or a waiting request), the Dialogue Move Selector must send the priority level of each dialogue move received from the Input Analyser. Thus, the control function AlertNewDM(priorityLevel) is part of the input interface of the Dialogue Manager. This message will awaken the Dialogue Manager if it was waiting for new input, or it may interrupt the process under execution if there is a dialogue move with a higher priority.

ConsultDatabase(SQLsentence): This function defines the bi-directional flow of information between the Dialogue Manager and the Database Access agent. It consists of SQL commands and results. This function will be implemented in the Agent Architecture by means of the different messages supported by the Database Control agent (Section 3.3.5):

– (request :content (select :name XXXX :surname1 YYYY :surname2 ZZZZ))
– (reply :content (found :matches N))
– (request :content (get :record_number N))
– (reply :content (record :record_number N : .... : ))
– (request :content (update :record_number N : .... : ))


– (reply :content (updated))
– (reply :content (update-error))

ConsultDialogueHistory(DTACpattern): This function will return the first dialogue move in the Dialogue History Blackboard that matches the pattern used as argument. It is worth noting that, for efficiency reasons, the dialogue moves will be stored as a FIFO queue, so the search process activated by this function begins with the most recent dialogue move stored.

SaveDialogueMove(dialogueMove): Once the Dialogue Manager has finished the analysis and processing of a user input dialogue move (taken from the Dialogue Moves Input Pool), the result of this process will change the information state of the Dialogue Management agent. This new information state may be represented as a dialogue move using the DTAC protocol. The SaveDialogueMove(dialogueMove) function will then store the new information state in the Dialogue History Blackboard, allowing its consultation by any module of the agent.

ExecuteCommand(command): This function defines the output interface between the Dialogue Manager and the PABX Control agent in charge of the actual execution of the commands. In the Agent-based Architecture, this function will be supported by the different messages in charge of the telephone functions (see Section 3.3.4: PABX Control Agents):

– (request :content (connect number1 number2))
– (request :content (activate-forwarding extension-number forward-number))
– (request :content (deactivate-forwarding extension-number))
– ...

OutputDialogueMove(dialogueMove): If the new information state contains information that is not shared by the user (for instance, as the result of a database consultation), the dialogue move that contains this new information state will be sent to the Output Generator module.

4.8

The Output Generator

4.8.1

Description

The Dialogue Manager will also be connected to the Output Generator. The interface between these two modules will make use of the DTAC protocol, and the contents of the communication will be the dialogue move, which has to be transformed into a sentence.


In some situations, the Dialogue Manager may decide to communicate some information to the user: the result of a database query, a question about underspecified parameters, the confirmation of an operation, etc. This operation has informational content, as the user will probably change his/her information state once the system reply has been received. It therefore seems reasonable to represent this exchange of information as a dialogue move (DTAC), although in this case the flow of information goes from the system to the user. Making use of the information stored in the Dialogue History Blackboard, if needed, and based on the dialogue move sent by the Dialogue Manager, the Output Generator will generate the corresponding natural language sentence as a string of characters. This string will then be sent to the agent in charge of the communication with the user (normally the Speech Synthesiser included in the Input/Output Agents).

4.8.2

Interface

OutputDialogueMove(dialogueMove): The output of the Dialogue Manager in its interface to the Output Generator defines the input of the latter as well.

ConsultDialogueHistory(DTACpattern): This function, previously introduced as part of the interface between the Dialogue Manager and the Dialogue History Blackboard, may also be used by the Output Generator during the generation of the output sentence. This way, the Output Generator may adapt the style of the sentence taking into account, for instance, whether this is the first or the second time that the same question has been asked to the user.

OutputSentence(string): The sentence itself will be sent to the Input/Output agent, which in its turn may send this string to a Speech Synthesiser, a Display Agent, etc.

The string itself will be sent to the LINE-INT Agent (see Section 3.3.3: Speech Input/Output Agents) using the following message:

(broadcast :content (system-output "xxxx"))

This message will be received by the LINE-INT Agent (in the real scenario using speech-to-speech communication), as well as by the TRANSCRIPTION Agent and the TEXT-OUT Agent (in case of a text-to-text simulation).


Chapter 5

The Natural Language Understanding Module

5.1

Introduction

This chapter describes the Natural Language Understanding module that will be used in our system. This module was originally designed to meet the requirements of modern language engineering with respect to the following criteria: efficiency, robustness, reusability, linguistic motivation, and the capacity to handle large amounts of data. The system incorporates a series of novel computational techniques which enhance the overall performance significantly: representation, storage and retrieval of very large feature structure-based knowledge bases; bidirectional event-driven bottom-up parsing with top-down predictions; and constructive unification with post-copy. The goal of this chapter is to show that this module meets all those criteria. Section 5.2 describes the main computational techniques exhibited by the NLU system. The remaining sections describe the main submodules in the system: lexical and morphological analysis, parsing and unification.

5.2

Computational Techniques and Formal Properties

Our goal has been to obtain a system inspired by unification grammars while achieving the best performance in analysis times. The system has been written in C, comprising more than 20,000 lines of code. Its efficiency results from the implementation of the following techniques:

1. Representation, Storage and Retrieval of Very Large Feature Structure–based Knowledge Bases. The lexical module is based on Improved Binary Trees with Vertical Cut [Quesada & Amores 1995].


The computational complexity of this technique is logarithmic in the size of the lexicon. We have tested it with artificial dictionaries, finding that the time necessary to analyse a single lexical item varies from 1 millisecond (with dictionaries of up to 5,000 entries) to 3 milliseconds (with dictionaries of up to 134,000,000 lexical entries).

2. Bidirectional Event-driven Bottom-Up Parsing with Top-Down Predictions. From a theoretical point of view, this parsing technique [Quesada & Amores forthcoming] reduces the number of arcs and nodes of a conventional chart parser by 80 to 95%. That is, the system only generates 20% of the structures in the worst case. The practical consequence of this is that we can parse between 2,000 and 5,000 words per second. These results have been tested with grammars including recursion, and local and non-local dependencies, and with sentences from 1 to 1,024 words. In a different work, it has been demonstrated that the parser is sound and complete [Quesada 1997].

3. Constructive Unification with Post-Copy. This algorithm incorporates the following strategies: structure-sharing, reversible unification, constructive unification, disunification and post-copying [Quesada 2000]. Constructive unification by itself eliminates the problems of pre-copying, over-copying and redundant copying completely. This set of techniques reduces the computational load (memory and time) by up to 98% when compared with basic unification algorithms such as the naïve algorithm or the default use of unification in Prolog.

The basic architecture of our system relies on a modular separation between the algorithms which perform the specific tasks of parsing and unification, and the specification languages used to express linguistic knowledge and control commands.

5.3

Specifying Linguistic Knowledge

Our system is equipped with a series of specification languages whereby the necessary linguistic knowledge is incorporated into the system, and a series of control commands to configure the functioning of the system. The main concept is that of language. For each language, we may build an analysis grammar and lexicon. A configuration command indicates which languages will be active in the current application:

/***
 *** SampleTrans.epstm
 ***/
BeginningOfLanguage English
    AnalysisGrammar FromFile E_grammar
    AnalysisLexicon FromFile E_lexicon UsingPred Elex
EndOfLanguage

In addition, the system includes a tool to measure the statistical performance of the system, and a ’trace’ capability which provides detailed information about all the operations involved in the process.


In the following sections we will offer a concise description of each of the processes which take place from the moment the system receives a string of words as input until it generates the corresponding feature representation.

5.4

Lexical and Morphological Analysis

The phase of lexical analysis receives a string of words as input and its goal is to output a list of the syntactic category/ies and functional structure/s associated with each word.

5.4.1

Simulating Morphological Analysis through Morphological Generation and Efficient Knowledge–Base Retrieval

The lexicon is built following the Mph-Vtree syntax [Quesada & Amores 1995]. Mph defines a sophisticated language for the specification of lexicons for unification grammars. Its output may be linked to Vtree, a powerful system for the efficient storage and retrieval of large feature structure-based knowledge bases. The joint use of both systems creates a specification environment close to the linguist and a very efficient module for lexical and morphological analysis.

From a computational point of view, our model is based on inflected forms, as opposed to other paradigms based on two-level morphology [Koskenniemi 1984]. Nevertheless, the system does not require that all inflected entries be coded manually. Instead, Mph permits the definition of regular morphological phenomena. An additional advantage is that the same specification language may be used to define lexical redundancy rules and some cases of derivational morphology.

Finally, using Vtree to store and retrieve lexical items once they have been generated by Mph results in excellent times. Specifically, Vtree obtains a performance rate of the order of milliseconds per word, independently of the morphological complexity of the languages involved. Complexity only affects Mph, and the time consumed by Mph to generate all forms is compilation time, which is spent just once.

In sum, a lexical analysis model based on morphological generation such as the one presented here is preferable to models based on morphological analysis in real-time applications with a large lexicon, where a maximum response time is usually required.

5.4.2

Mph

Mph has been designed for typed feature structures. Namely, it uses the notion of shape to refer to complex feature structures permitted in a language. A shape defines the skeleton of a structure, that is, the attributes which may or must be instantiated.


A shape definition consists of four components: its name or identifier, a body, a list of indexes, and a set of associated transformational rules. The body of a shape is a list of attribute-value pairs. The last component of a shape definition is the set of transformational rules. One of the goals of transformational rules over shapes is to capture morphological generalisations found in natural languages. They may also be used as lexical redundancy rules, for instance to capture transitive alternations in verbs. Each rule contains two components: a pattern and a set of target structures. Meta-relations allow us to associate a shape with generation models for new shapes.

Despite the use of macros and a transformational rule to obtain the plural of regular nouns, each of the entries above requires specifying all new values, which in fact are the same for all entries. This situation may be simplified by using input forms or iforms. These constructions make it possible to associate a flat, Prolog-like predicate with a complex feature structure, incorporating all the expressive power of Mph. That is, iforms allow the inclusion of macros, functions, multi-word entries, etc.

Effectively, the simultaneous use of shapes and macros permits the design of a hierarchy of typed feature structures. In addition, Mph incorporates a default inheritance strategy, whereby assigning different values to the same attribute does not result in an error. The next example illustrates the expressive power of Mph:

/* English
 * Analysis Lexicon
 */
BeginningOfLexicon

Macro Definitions

DefMacros
    = (agr:(gen:masc,num:sing,per:3))
    = (agr:(gen:masc,num:plur,per:3))

Shapes Definition

DefShapes

Elex: Meta–predicate to establish the link

Elex (LU)
Eagr (agr:(gen,num,per))


@shape: shapes hierarchy, default inheritance, and exceptions
Enoun (LU,CAT:n,MOR,head,ggf,@Eagr)

Morphological Rules
RulePattern (MOR:[N1])
RuleTarget {
    (MOR:null())
    (LU:strcat(base->LU:,s),,MOR:null())
}

Everb (LU,CAT:v,MOR,pred,ggf,@Eagr, tense,pobj:(pcase))
RulePattern (MOR:[V1])
RuleTarget {
    (MOR:extract(base->MOR:,[V1]))
    (MOR:extract(base->MOR:,[V1,VED1]), agr:(per:1|2,num:sing))
    (MOR:extract(base->MOR:,[V1,VED1]), LU:strcat(base->LU:,s), agr:(per:3,num:sing))
    (MOR:extract(base->MOR:,[V1,VED1]), agr:(num:plur))
}
RulePattern (MOR:[VED1])
RuleTarget {
    (MOR:extract(base->MOR:,[VED1]), LU:strcat(base->LU:,ed),tense:past)
}

Lexical Redundancy Rules
RulePattern (MOR:[LR1])
RuleTarget {
    (MOR:extract(base->MOR:,[LR1]))
    (MOR:extract(base->MOR:,[LR1]), ggf:[subj,obj,obj2],pobj:null())
}

Edet (LU,CAT:det,@Eagr,spec)
Eprep (LU,CAT:prep,pcase)

Meta–Relations
DefMetas
MetaPattern Enoun()
MetaTarget Elex(LU:base->LU:)
MetaPattern Everb()
MetaTarget Elex(LU:base->LU:)

Input Form Definitions


DefIForms
IFnoun(LU,MOR,ggf)
    Enoun (LU:base->LU,MOR:base->MOR, head:base->LU,ggf:base->ggf,)

Regular Entries through the activation of Input Forms

ActIForm IFnoun
    (present,[N1],[])
    (girl,[N1],[])
    (telescope,[N1],[])
    (John,[],[])
ActIForm IFverb
    (present,[V1,VED1],[subj,obj,pobj],with)

Irregular entries through the activation of Shapes

ActShape Everb
    (LU:give, MOR:[V1,LR1], pred:give, ggf:[subj,obj,pobj], pobj:(pcase:to), tense:pres)
    (LU:gave, MOR:[LR1],    pred:give, ggf:[subj,obj,pobj], pobj:(pcase:to), tense:past)
ActShape Edet
    (LU:a,agr:(num:sing),spec:a)

Additional Features: Lexical Ambiguity, Homonymy, Disjunction and Negation in Atomic Values, etc.

EndOfLexicon

5.5

Analysis Grammar

From a functional point of view, the parsing [Bunt & Tomita 1996, Quesada 1997] and unification [Shieber 1986, Quesada 2000] modules in our system have been implemented following an interleaving strategy. That is, the parser interacts with the unifier during the analysis process.


RootsOfGrammar: S NP VP
SubcatControl: ggf
SubcatFunctions: subj obj obj2 pobj
HeadFeatures: pred head form quant

(1:S  -> NP VP)    {@up.subj = @self-1; @up = @self-2;}
(2:NP -> n)        {@up = @self-1;}
(3:NP -> det n)    {@up = @self-1; @up = @self-2;}
(4:NP -> NP PP)    {@up = @self-1; @up.pobj = @self-2; @completeness(@sf);}
(5:VP -> v NP)     {@up = @self-1; @up.obj = @self-2; @completeness(@sf-[subj]); @coherence(@sf-[subj]);}
(6:VP -> v NP VP)  {@up = @self-1;}

(In the figure, RootsOfGrammar lists the possible termination nodes, SubcatControl the subcategorization controlling feature, SubcatFunctions the subcategorization grammatical functions, and HeadFeatures the possible head features; the productions are context-free PS rules annotated with functional equations, including coherence and completeness control.)

Figure 5.1: English Basic Grammar

Broadly speaking, the parser in this system may be described as a bidirectional bottom-up chart parser [Kay 1980, Quesada 1997] incorporating top-down predictions. The efficiency of a bottom-up chart parser may be increased if useless arcs are eliminated in the first stages of the process. Top-down predictions have been incorporated with that goal. We have implemented a set of simple and intuitive mathematical relations between the nodes in a grammar which allow us to determine whether certain arcs have no guarantee of success on certain occasions. The model of top-down predictions requires that the parser know all the information regarding possible arc applications over current nodes. This information is obtained through a model of bidirectional event generation.

Our parsing strategy approaches the problem of efficiency from an algorithmic point of view. In addition, the system incorporates a computational approach to increase the parsing efficiency further. Namely, the grammar is compiled beforehand, obtaining an internal representation of it which reduces the comparison of strings of characters considerably.

An analysis grammar consists of three components:

A series of configuration parameters, of which only RootsOfGrammar will be used by the parser. RootsOfGrammar specifies possible termination symbols in the grammar, thus allowing for grammars with multiple root symbols.

A set of context-free productions, each of which contains an identifier, a non-terminal symbol on the left-hand side of the rule, and one or more terminal or non-terminal symbols on the right-hand side of the rule.

A set of functional equations, which each production may contain and which will be passed on to the unification module.

Figure 5.1 shows a simple English grammar written in a format acceptable for the system.

5.5.1

Unification

The unification module of any unification-based NLP system usually consumes around 80 to 90% of the total computation time. The relevance of this module justifies a special effort in the design of the algorithms and implementation strategies. In addition, we should take into account the linguistic requirements regarding the expressive power that unification grammars usually demand.

Thus, we could divide the unification module into two distinct components. On the one hand, we have the unification algorithm proper, which is independent of any linguistic formalism. On the other, we find the specification layer, which tries to capture the strategies and notations found in the particular theory being implemented. In our case, the latter has been designed with LFG in mind, although the algorithm is valid for any unification-based formalism.

The core of the module implements a reversible unification strategy, based on disunification and post-copying [Quesada 2000]. The strategy relies on a sophisticated data organisation which obviates most copying processes during unification. If unification fails, the disunification algorithm recovers the original data structures faithfully. If unification succeeds, the result is copied (post-copied) and the disunification process recovers the original input structures.

The current version allows the use of atomic values, atom negation and disjunction (negated or not), and lists. As regards the LFG notation, the unification algorithm covers basic equational unification (=) plus: functional constraints (=c), evaluation and conditional execution (if ... then ... else), specific functions for the manipulation of character strings and lists (@concat, @member, @count), mathematical operators (+, -, *, /), logical (!, &&, ||) and relational operators (==, !=, <, <=, >, >=), and coherence and completeness control. The classical LFG metavariables ↑ and ↓ are called @up and @self-N in our system. At this point we have not implemented functional uncertainty [Kaplan & Zaenen 1989]. The example in Figure 5.1 includes some of these functions.

After the analysis phase, the parser and the unification algorithm will have obtained one or more constituent structures (c-structure in LFG) and one or more functional structures (f-structure) for


each grammatical sentence. The next chapter describes the techniques and strategies which have been incorporated into this system in order to parse spoken output, a string of words generated by a speech recognition system.

5.5.2

Semantically–oriented Grammar

The general NLP framework outlined above has been adapted to better cope with domain–specific applications which exhibit syntactic and semantic peculiarities. Two main changes have been incorporated:

1. Typification of Non-Terminal Nodes allows us to assign DTAC patterns (see Chapter 8 and Section 4.5) to each semantically rich expression in this domain:

NonTerminalTypes:
    TelephoneCall = (DMOVE: specifyCommand,
                     TYPE: TelephoneCall,
                     ARGS: [Name]|[Number]|[Extension])
    TransferCall  = (DMOVE: specifyCommand,
                     TYPE: TransferCall,
                     ARGS: [Name]|[Extension])

    ConferenceCall = (DMOVE: specifyCommand,
                      TYPE: ConferenceCall,
                      ARGS: [Name,Name2]|
                            [Name,Extension]|
                            [Extension,Extension2])
    ...

2. Semantically-oriented CF grammar. The nodes in the grammar express domain-specific semantics:

//////////////////////////////////////
/////// Function:TelephoneCall:
//////////////////////////////////////
(P11P : TelephoneCall -> LTelephoneCall)
(P12P : TelephoneCall -> LTelephoneCall Name)
    { @up = @self-2; }
(P13P : TelephoneCall -> LTelephoneCall Number)
    { @up = @self-2; }
(P14P : TelephoneCall -> LTelephoneCall Extension)
    { @up = @self-2; }


Chapter 6

Spoken Language Parsing Strategies

In this chapter we present the parsing strategies developed for our NLU system, which will eventually be integrated with a speech recogniser and a dialogue manager (DM). This chapter focuses on the description of the syntactic notation schemes which have been implemented in the system: syntactic representation of freely ordered constructions (for the treatment of free order in Spanish and in spoken language in general), typified semantic nuclei specification (DTAC) on the basis of which grammar rules are built, and parsing strategies for the fragmentation of a sentence into autonomous blocks (one input, several semantic nuclei). Likewise, the original parsing algorithm has been modified in order to accommodate several strategies for the detection and correction (whenever possible) of the deficiencies originated at the speech recognition stage (over-recognition, under-recognition, close-recognition and mis-recognition). The output of the NLU is then passed on to the Dialogue Manager module, as described in Chapters 4 and 7.

6.1

Introduction

Speech recognition and language engineering have recently been integrated in real applications. Nevertheless, continuous, speaker-independent speech recognition technology is not always efficient as regards the precision of the recognised input. Thus, it is necessary to develop strategies which help detect and correct the errors generated at the recognition stage. This chapter presents a series of strategies based on language engineering techniques, which have proved very useful for the task of detecting and correcting errors generated at the speech recognition stage. We present a systematic study of the most common errors in speech recognition and propose a classification of those errors into four categories: under-recognition, over-recognition, close-recognition and mis-recognition.


6.2


Characteristics of Spoken Language

Spoken language is characterized by the following features:

1. Syntactic free order: the canonical syntactic constituency may be disrupted. Sometimes there is a serious interruption in the flow of discourse, as in the case of so-called "speech disfluencies" (repetitions, false starts, speech repairs [Heeman 1997]).

2. Vocabulary: the speaker freely expresses his/her emotions, state of mind, attitudes and other situational information by using idioms, colloquial expressions, colloquial terms, etc.

In the corpus studied for this work, speakers use a number of terms which merely reflect situational information. For the task of telephone commands, those terms have been considered as lacking relevant information for the semantic representation of the sentence. As explained below, errors made at the recognition stage will determine the need to recover information from partial strings. However, speech disfluencies have not been dealt with in this work.

6.2.1

The Output of the Speech Recogniser

The NLU module will receive as input the best hypothesis of the series of N-best hypotheses generated by the speech recogniser. It is widely known that even the most efficient and robust speech recognisers show an error rate of approximately 5%. Thus, in the design of the NLU module we must include appropriate mechanisms to overcome the errors produced at the speech recognition stage. Speech recognition errors have been classified into the following four blocks:

1. Under-Recognition. Under-recognition takes place when a fragment of the original sequence is deleted. The speech recogniser commonly under-recognises monosyllabic words, such as prepositions and other grammatical words (determiners, adverbs, etc.).

OriginalSentence: (hacer una llamada)
InputSentence: (hacer llamada)
*(OriginalSentence: make a telephone call)
*(InputSentence: make telephone call)

2. Over-Recognition. Over-recognition takes place when extra fragments, which were not uttered, are inserted. Very often, the speech recogniser generates two words when only one was uttered. If the term inserted corresponds to a grammatical word, the meaning of the sentence is not usually altered.


OriginalSentence: (tengo algun mensaje almacenado)
InputSentence: (tengo algun mensaje la almacenado)
*(OriginalSentence: do I have any stored message)
*(InputSentence: do I have any stored the message)

However, the meaning of the sentence may change:

OriginalSentence: (desactiva el modo transferencia)
InputSentence: (es activa el modo transferencia)
*(OriginalSentence: deactivate the transfer mode)
*(InputSentence: is active the transfer mode)

In the second example, the verb "desactiva" (deactivate) has been recognised in the sequence as "es activa" (is active).

3. Close-Recognition. Close-recognition takes place when the speech recogniser generates sequences of words which are similar to the ones originally uttered. The words that have been closely recognised often have the same morphological root as the original ones.

OriginalSentence: (anotar en mi agenda un nombre)
InputSentence: (anota en mi agenda un nombre)
*(OriginalSentence: including a new name in my directory)
*(InputSentence: include a new name in my directory)

4. Mis-Recognition. Mis-recognition takes place when the speech recogniser generates sequences of words which bear very little relation to the original sentence uttered by the speaker.

OriginalSentence: (quiero hablar con Celinda)
InputSentence: (y si quien pasa Celinda)
*(OriginalSentence: I want to talk to Celinda)
*(InputSentence: and yes who pass Celinda)

These are the most difficult errors to solve. We hope that the integration of the Speech Prospector as part of the Dialogue Management agent will help us repair some of these limitations.

6.3

Parsing Strategies

6.3.1

Lexical Analysis: Void Words

As stated above, the terms deleted at the recognition stage usually correspond to grammatical words. The same happens when over–recognition takes place. The existence of these two kinds of error


has led to the adoption of the so–called “concept–spotting” technique. Thus, the NLU module does not take into account those terms which are not relevant to the understanding of the meaning of the sequence. This is possible because the system has a very restricted domain and the semantic representation obtained is usually the combination of a telephone function and its arguments. To obtain the basic semantic representation, the lexical category VOID is assigned to all meaningless words if the ConfVoidWordsIgnoreOn configuration option has been activated. VOID words are not processed at the morphological level. In the following example, the user has been speaking to a colleague while issuing a telephone command.

Esto, Pepe... quiero que grabes el mensaje

*(Er Umm, Pepe... I want to store the message) The mechanism deletes those terms which are not relevant to the current task:

grabes mensaje

*(store message) The final sequence will then be matched to a complete semantic structure. When the number of VOID words in the sequence is too high for a correct understanding of the message, the VoidThreshold control mechanism rejects the whole sequence. Very often, rejected sentences correspond to ill–formed sentences at the recognition stage.

no yo yo yo yo eh

*(no I I I I eh) The mechanism prevents the construction of a parse tree when serious errors have been produced at the recognition stage (mis–recognition).

6.3.2

Analysis of Partial Strings

The NLU module takes a relaxation approach at the analysis stage [Carbonell & Hayes 1983] [Hobbs et al 1992] [López & Quesada 1998]. On the one hand, spoken language frequently shows syntactic disorder. On the other, when recognition errors are produced, the meaning of the utterance may not be constructible from the whole sequence. The parser includes a full string parsing mechanism whereby it only accepts root nodes which span the whole sequence (the classical approach). However, the partial string parsing option will accept any root node obtained while parsing the sequence, even though the whole input string has not been consumed.


This mechanism also incorporates several strategies for the deletion of redundant nodes. That is, for any two nodes associated with the same symbol, if the interval of the first node includes the interval of the second one, then the second one is erased. In addition, another filter deletes embedded nodes even when they are not associated with the same symbol. The activation of partial string analysis is very useful when the NLU system has to recover missing information: all partial analyses will be kept active while the DM selects the correct one to complete the DTAC structure.
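To illustrate the redundant-node filters described above with a hypothetical case: given a node Message spanning the interval [1-3] and a second Message node spanning [2-3], the second node is erased because its interval is included in that of the first; with the additional filter active, a node over [2-3] would be removed even if it carried a different symbol. The node labels and intervals here are purely illustrative.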

6.4

Ambiguity

The system incorporates a powerful disambiguation mechanism. Ambiguity poses serious problems when the analysis of partial strings is active, since more than one parse tree may be chosen for a given sequence of words. Disambiguation is carried out with three different algorithms:

1. Length: the system gives priority to the parse tree which contains a larger number of nodes in its interval.

2. Position: the system gives priority to information updating; that is, later trees are preferred to earlier ones.

3. Global criterion: the final result is the combination of the two previous ones.

6.4.1

Length and Position Algorithms

In order to better understand the functionality of these two algorithms let us consider the following sequence generated by the lexical analyser as an example:

OriginalSentence:(quiero escuchar los mensajes almacenados) InputSentence:(cero escuchar los mensajes almacenados)

*(OriginalSentence: I want to listen to stored messages)
*(InputSentence: zero listen to stored messages)

>> LexicalAnalysis (LNumber LListento LMessage LStored)

The Length and Position algorithms operate based on the following concepts:


1. First, numerical values are marked between every item generated by the lexical analyser, as follows:

[0] LNumber [1] LListento [2] LMessage [3] LStoredMessage [4]

2. Each node is bounded by the two extremes which define its interval. In our example, LNumber would be enclosed within the [0-1] extremes.

3. The left and right extremes of the interval are called LExtreme and RExtreme, respectively.

The Position algorithm is defined by the following formula:

Position = LExtreme + RExtreme

The Length algorithm is defined by the following equation:

Length = RExtreme - LExtreme

Finally, the P*L value is calculated:

P*L = Position * Length

The values assigned by these formulae are then shown as part of the analysis output.
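Applied to the example above, LNumber occupies the interval [0-1], so Position = 0 + 1 = 1, Length = 1 - 0 = 1 and P*L = 1; the StoredMessage tree spans [1-4], so Position = 1 + 4 = 5, Length = 4 - 1 = 3 and P*L = 15. These are exactly the values shown in the analysis output below.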

InputSentence (cero escuchar los mensajes almacenados)
LexicalAnalysis (LNumber LListento LMessage LStored)
OutputParser:
* Number(LNumber(zero))
  Position = 1, Length = 1, P*L = 1
* StoredMessage(LListento(listen_to), LStored(stored), LMessage(message))
  Position = 5, Length = 3, P*L = 15

In the example (Zero listen to stored messages), the disambiguation tool in the system selects the second parse tree (StoredMessage), taking into account the value provided by the Position and Length variables (P*L=15).

6.4.2

Global Criterion

There is still a third disambiguation mechanism in the system: the global criterion or G criterion. This algorithm is based on a combination of the Position and Length variables as a priority control over ambiguous analyses. We can see this with an example.

InputSentence: (borrar numero grabado)


*(delete stored number)

LexicalAnalysis (LDelete LAuxNumber LStored)
OutputParser:
* StoredMessage(LStored(stored))
  Position = 5, Length = 1, P*L = 5, G = 8
* DeleteNumber(LDelete(delete), LAuxNumber(number))
  Position = 2, Length = 2, P*L = 4, G = 8

The G criterion is activated via the ActivateGCriterion configuration command. The mechanism is applied to every single node in the analysis tree, and not only to root nodes. The formula for the algorithm is as follows:

The Complexity of every terminal node is obtained from its Length. That is, for every terminal node TN, its complexity is defined as:

The Complexity of every non–terminal node will be taken from adding the Complexity values of its daughters.

where N refers to the node and N1, ..., Nk to the set of its daughters. The algorithm selects the analysis with the highest value. Let us see the functioning of the G criterion with an example:

InputSentence (cinco ... desactiva el modo llamada en espera)
*(five ... deactivate call waiting)
LexicalAnalysis (LName LDisconnect LCallWaiting)
OutputParser:
* Number:(LNumber(five))
  Position = 1, Length = 1, P*L = 1, G = 8
* OffCallWaiting(LOff(deactivate), CallWaiting:(call_waiting))
  Position = 4, Length = 2, P*L = 8, G = 36


The G criterion has proved to be efficient in the selection of the correct analysis when the partial strings mechanism is activated.


Chapter 7

The Dialogue Manager

This chapter concentrates on the design of the Dialogue Manager module (Section 4.7) of the Dialogue Management Agent (Chapter 4). From an operational point of view, this module can be described as a function:

DManager(DomKnow, InputDMove, DHist) -> OutputDMove

That is, the sources of knowledge used by the Dialogue Manager (DManager) are:

Domain knowledge (DomKnow): represented as a set of declarative dialogue rules. These rules constitute the Specification Level of the Dialogue Manager (Section 7.2).

Input Dialogue Move (InputDMove): the top dialogue move selected by the Dialogue Move Selector at any moment.

Dialogue History (DHist): the whole dialogue so far, including the successful commands and those unsuccessfully completed, as well as the command under completion and execution (which corresponds to the expectations of the question under discussion). This information is stored as an ordered set of information states at the Representation Level of the Dialogue Manager.

As a result of its application, the Dialogue Manager will return a new Dialogue Move (OutputDMove). Both Input and Output Dialogue Moves (along with their corresponding user and system utterances) will be stored at the User-Interface Level of the Dialogue Manager (specifically, at the Dialogue History Blackboard). Therefore, the Dialogue Manager may be viewed as a function from these arguments: the set of dialogue rules (domain knowledge), the current input dialogue move, and the context (dialogue history), to a new dialogue move which, functionally speaking, represents the information state of the dialogue manager.


The dialogue moves obtained by this module will be sequentially stored in the Dialogue History Blackboard. Usually, the Dialogue Manager will require extra information from the external database. To allow a high degree of functionality, the interface for this communication will support SQL. In addition, this module will have to communicate to the PABX agent the commands to be executed, once these have been detected, completed and confirmed. Section 7.1 describes the main design constraints taken into account during the specification of this module. Section 7.2 then presents the Specification Level of the Dialogue Manager.

7.1

Design Constraints: A Dialogue System for the Automatic Telephone Task Scenario

This section describes the design constraints of the system. We illustrate a sample conversation in order to present the functionality we are aiming at (although our corpus is in Spanish, we have translated the conversation into English for expository reasons).

S-0: [Welcome message and management instructions]
U-1: Are you in transfer mode?
S-1: Yes, to your office number. Shall I cancel it?
U-2: I want to transfer all incoming calls to my mobile and to make a phone call.
S-2: Transfer to number 111 111 111 activated. Who do you want to call?
U-3: Peter Adamson.
(Let's suppose that the Speech Recognition system recognises Peter Abramson)
S-3: Shall I call Peter Abramson, number 222 222 222?
U-4: No, Peter Adamson.
S-4: Shall I call Peter Adamson, number 333 333 333?
U-5: OK.
S-5: I'm calling Peter Adamson.
(Let's suppose now that Peter Adamson's phone is engaged. The user hangs up. Some minutes later ...)
U-6: Please call him again.


S-6: Shall I call Peter Adamson?
U-7: Right.
S-7: Calling Peter Adamson.

This short conversation will help us to describe the design restrictions we have taken into account:

Interaction with the speech recognition system. Our system is embedded in a Spoken Dialogue System application which takes as input the output of a speech recognition system through the telephone line. Speech recognition errors such as those reported in our sample conversation have been dealt with in previous research work in our group. The natural language understanding system (Chapter 5) incorporates a number of techniques for the detection and correction of recognition errors during the natural language understanding phase (described in Chapter 6). Nevertheless, the dialogue management system should also be capable of handling recognition errors when configuring the dialogue interaction, by using both direct confirmation questions (as in S-3) and indirect ones (S-2).

Task Detection. Our scenario differs from task-oriented systems in that the system does not know beforehand the task that the user has in mind. Rather, the user may choose between any of the different functions which have been designed to interact with the PABX. Therefore, the first problem that the system must solve is to figure out which task(s) the user may want to perform.

Incomplete functions. In our scenario, it is common to find situations in which the requested functions are not complete, that is, commands for which the user has not specified all the arguments required to fulfil the task. For instance, in the second part of U-2 the user is requesting the system to place a phone call, but s/he has not specified the destination number. In this case, the system must be capable of keeping track of the requested information and generating those questions necessary to complete the missing information.

Expectations. The dialogue manager can benefit from knowledge of the previous history of the dialogue. In fact, this dialogue history generates answering expectations: in S-4 the system is waiting for a confirmation answer, either affirmative ("Yes", "OK", ...) or negative ("No", "That's wrong!", ...); in S-5 the system is expecting a destination (name or number) from the user. This knowledge (expectations generated from the dialogue history) can be further used both by the speech recognition system and by the natural language processing module in order to improve the efficiency and restrict the semantic search in the grammar.

Multiple paths in a dialogue. As we can observe in U-2, it is common to find cases where one interaction initiates several functions. The system must be capable of handling all of them, while recalling unfulfilled work at every single point. In our example, S-2 is asking for indirect confirmation of one of the functions. Later on, the system carries on with the fulfilment of the second function requested in U-2.

Dialogue history and anaphoric references. Representing the previous dialogue history is also useful in order to deal with discourse phenomena such as anaphoric references. In our sample conversation, the pronoun in U-6 ("call him again") refers anaphorically to "Peter Adamson".


7.2


Specification Level: Dialogue Rules

This level is in charge of the specification of the domain and task knowledge involved in the management of the dialogue. We propose a declarative approach, where this knowledge is represented as a set of Dialogue Rules. By means of the dialogue rules, the Specification Level of the Dialogue Manager specifies the conversational structure which will allow the management of the dialogue itself. From a functional point of view, the dialogue rules consist of a triggering mechanism (allowing both data and procedure–driven strategies), a set of internal operations (including the control of expectations), and a set of pre– and post–actions that will be executed at different stages of the rule. From a descriptive viewpoint, each rule consists of a set of attributes, where only the RuleId is compulsory:

RuleId
TriggeringConditions
PriorityLevel
PreActions
DeclareExpectations
ActionsExpectations
SetExpectations
PostActions

7.2.1

RuleId

This feature identifies the rule. It must be a string of characters. Some identifiers have a special meaning:

STARTUP: Each Dialogue Manager must have a StartUp rule, which will be automatically triggered each time the dialogue manager starts.

EXIT: This rule will be activated as the very last rule when the Dialogue Manager finishes its work.


7.2.2


TriggeringConditions

Rules can be activated by means of two mechanisms. Firstly, as part of their actions, a rule may activate another rule. For instance, the following version of the STARTUP rule shows this mechanism:

(RuleId: STARTUP;
 PreActions: {
     UserPrompt("Welcome message");
     ActivateRule(FUNCTION);
 })

In this case, once the Dialogue Manager has activated the STARTUP rule, this will show a Welcome message, and then the rule will directly activate the FUNCTION rule.

Secondly, a rule may be activated by the user's input. As has been shown in Chapters 5 and 6, the user's input is analysed by the Natural Language Understanding module, obtaining a representation based on the DTAC protocol. The Dialogue Manager will try to unify this DTAC structure with the TriggeringConditions of all the rules, obtaining the set of candidate rules (rules that should be applied to the user's input). For instance, the TRANSFERCALL rule is defined as:

(RuleId: TRANSFERCALL;
 TriggeringConditions: (DMOVE:specifyCommand, TYPE:TransferCall),
 ....)

Taking into account that the user's input

I would like to transfer my calls.

will be represented using the DTAC protocol as:
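A sketch of this representation, restricted to the features relevant for rule triggering (the remaining features of the full structure are omitted):

(DMOVE: specifyCommand,
 TYPE:  TransferCall,
 ...)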

it is obvious that this DTAC structure will unify with the TriggeringConditions of the TRANSFERCALL rule.


7.2.3


PriorityLevel

In case the triggering mechanism obtains a set of two or more candidate rules, the priority level strategy will allow the dialogue manager to select the most prioritised rule. To illustrate this feature with an example, let’s consider that the dialogue manager allows a general Help rule (for help subdialogues):

(RuleId: HELP;
 TriggeringConditions: (DMOVE:askHelp, TYPE:AskHelp),
 ....)

as well as a specific Help rule on a domain topic, for instance, a help on transfer calls:

(RuleId: TRANSFERHELP;
 TriggeringConditions: (DMOVE:askHelp, TYPE:AskHelp, CONT:TransferCall),
 ....)

This way, the following user’s utterance: What functions are available?

should be represented as:
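A sketch of this representation, restricted to the triggering features:

(DMOVE: askHelp,
 TYPE:  AskHelp)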

This DTAC structure will unify with the TriggeringConditions of the HELP rule, but will not unify with the corresponding conditions of the TRANSFERHELP rule. Nevertheless, the user's input: How can I transfer my telephone calls?


which will be represented as:
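A sketch of this representation, again restricted to the triggering features:

(DMOVE: askHelp,
 TYPE:  AskHelp,
 CONT:  TransferCall)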

will unify with both rules. To avoid this ambiguity, prioritising the most specific rule, it is possible to include the following priority levels as part of the rules (where 0, the default value, represents the highest priority):

(RuleId: HELP;
 PriorityLevel: 20;
 TriggeringConditions: (DMOVE:askHelp, TYPE:AskHelp),
 ....)

(RuleId: TRANSFERHELP;
 PriorityLevel: 10;
 TriggeringConditions: (DMOVE:askHelp, TYPE:AskHelp, CONT:TransferCall),
 ....)

A second useful application of the Priority Level mechanism allows the specification of a kind of safety net rule. This rule will have the lowest priority level, and will have no triggering conditions. It will be triggered by any user’s input, but will actually be selected if no other rule with a higher priority has been triggered.

(RuleId: SAFETY;
 PriorityLevel: XXXXX;    // the lowest priority level
 TriggeringConditions: ( )
 ....)

This rule will act as an error recovery mechanism which will improve the robustness of the dialogue manager.


7.2.4


PreActions

This section of the rule specifies the set of actions that must be executed once the rule has been activated. The action mechanism (used both for PreActions and PostActions) will include several functions internal to the Dialogue Manager:

ActivateRule
EndDialogueManager
CreateInformationState
...

as well as several external functions that will communicate the Dialogue Manager with other agents:

ConsultDirectory
UserPrompt
...

Also, this component will allow the specification of conditional (if ... then ... else ...) and iterative (while (...) ... ) structures.

7.2.5

ActionsExpectations

Once the PreActions of a rule have been executed, the DTAC associated with the rule will be analysed in order to determine the consistency of the feature structure. The consistency principle is:

A DTAC structure is consistent if:
a) the ARGS feature is empty, or
b) for each value V in the ARGS feature, there exists a consistent feature V.

For instance, the following DTAC structures are consistent:
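Two sketches of consistent structures, in the flat DTAC notation used earlier: a terminal structure with an empty ARGS list, and a completed TransferCall in which the Destination argument required by ARGS is present (both are illustrative):

(DMOVE: specifyParameter,
 TYPE:  Extension,
 ARGS:  [ ],
 CONT:  1234)

(DMOVE: specifyCommand,
 TYPE:  TransferCall,
 ARGS:  [Destination],
 Destination: (DMOVE: specifyParameter,
               TYPE:  Extension,
               CONT:  1234),
 CONT:  1234)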


while the following DTAC structure is inconsistent:
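An illustrative sketch of an inconsistent structure: the ARGS list requires a Name argument, but no Name feature is present (this corresponds to the TelephoneCall situation discussed in Section 7.2.6):

(DMOVE: specifyCommand,
 TYPE:  TelephoneCall,
 ARGS:  [Name])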

The ActionsExpectations component of a dialogue rule specifies the actions that must be executed for each inconsistent argument. The following example indicates that the query Specify the destination of the call should be prompted to the user if s/he has activated the telephone call function without specifying the destination of the call:

(RuleId: TELEPHONECALL;
 PriorityLevel: 15;
 TriggeringConditions: (DMOVE:specifyCommand, TYPE:TelephoneCall),
 ActionsExpectations: {
     [Name] => { UserPrompt("Specify the destination of the call, please."); }
 }
 ....)


7.2.6


DeclareExpectations

This feature declares how to compose the information states of different dialogue rules. Suppose that the user’s input is:

U-1: I would like to make a phone call.

This sentence will be represented by the Natural Language Understanding Module as:
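A sketch of this representation in flat DTAC notation (as stated below, Name is assumed to be the only valid argument of a TelephoneCall; no Name feature is present yet):

(DMOVE: specifyCommand,
 TYPE:  TelephoneCall,
 ARGS:  [Name])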

To simplify the example, we have assumed that the only valid argument of a TelephoneCall is a Name. The previous DTAC structure will trigger the TELEPHONECALL rule above. It is obvious that this structure is inconsistent, as there is no Name feature (required by the ARGS feature). This feature (Name) is therefore taken as an expectation of this information state (DTAC). Next, the ActionsExpectations component prompts the user:

S-1: Specify the destination of the call, please.

A natural continuation of this dialogue will consist of the user specifying the destination of the call:

U-2: Peter Evans

This utterance is represented as the following DTAC structure:
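A sketch of this representation (a terminal structure whose CONT holds the recognised name):

(DMOVE: specifyParameter,
 TYPE:  Name,
 CONT:  Peter Evans)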

This DTAC structure will trigger the following rule:


(RuleId: NAME;
 PriorityLevel: 20;
 TriggeringConditions: (DMOVE:specifyParameter, TYPE:Name))

This structure is consistent, as there is no ARGS feature missing. The problem now is how to combine the information state obtained by the NAME rule with the information state previously captured by the TELEPHONECALL rule. This function is specified by the DeclareExpectations component of the dialogue rule:

(RuleId: TELEPHONECALL;
 PriorityLevel: 15;
 TriggeringConditions: (DMOVE:specifyCommand, TYPE:TelephoneCall),
 ActionsExpectations: {
     [Name] => { UserPrompt("Specify the destination of the call, please."); }
 }
 DeclareExpectations: {
     Name