A Computational Model of Man-Machine Dialogue
Shuji DOSHITA, Masahiro ARAKI, and Tatsuya KAWAHARA
Department of Information Science, Kyoto University
Sakyo-ku, Kyoto 606-01, Japan
e-mail: doshita@kuis.kyoto-u.ac.jp
{1. Introduction}
We propose a dialogue model that reflects two important aspects of a
spoken dialogue system (SDS): being 'robust' and being 'cooperative'.
To this end, our model has two main inference spaces: the
Conversational Space (CS) and the Problem Solving Space (PSS). CS is a
kind of Bayesian network, constructed dynamically, that represents the
meaning of utterances and general dialogue rules; the 'robust' aspect
is handled in CS. PSS is a network, called an Event Hierarchy, that
represents the structure of task-domain problems; the 'cooperative'
aspect is mainly handled in PSS. To construct CS and to make
inferences in PSS, the system's processing, from meaning understanding
through response generation, is divided into five steps. From our
point of view, cooperative problem-solving dialogue is a process of
constructing CS and achieving a goal in PSS through these five steps.
{2. Outline of Our Approach}
As mentioned above, the major problems in constructing an SDS are how
to deal with uncertainty and how to manage cooperative dialogue.
{2.1 Dealing with Uncertainty}
In an SDS, the user's input carries various kinds of ambiguity and
uncertainty, such as uncertain speech recognition results, syntactic
and semantic ambiguity, ill-formed utterances, and uncertainty about
the user's intention. Many probabilistic methods have been developed
for each of these problems individually. However, to deal with all of
them in an integrated manner, we need a general framework of
probabilistic reasoning. We therefore adopt the Bayesian network
formalism.
A Bayesian network is a kind of probabilistic causal network. Each
node represents a random variable, that is, the value of a
proposition. In this paper, every random variable is binary, taking
the value true or false. Each link represents a causal relationship.
Each node is assigned a certainty measure that is consistent with the
axioms of probability theory. The computational cost of updating the
certainty measures is proportional to the longest path in the network.
Because a Bayesian network propagates evidential messages
bidirectionally, it can combine multiple evidence inputs. A Bayesian
network is therefore well suited to treating uncertainty in natural
language processing.
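To make the bidirectional propagation concrete, here is a minimal sketch over a hypothetical three-node chain of binary propositions: an utterance type U causes a concept C, which in turn causes a spotted phrase F. The node names and all probabilities are invented for illustration. The sketch computes posteriors by brute-force enumeration of the joint distribution, not by the message-passing scheme whose cost is discussed above, but it yields the same posteriors: evidence at F raises belief in U (diagnostic direction), and evidence at U raises belief in F (predictive direction).

```python
from itertools import product

# Hypothetical chain U -> C -> F of binary propositions (True/False).
P_U = 0.3                                  # prior P(U = True)
P_C_GIVEN_U = {True: 0.9, False: 0.2}      # P(C = True | U)
P_F_GIVEN_C = {True: 0.8, False: 0.1}      # P(F = True | C)
NAMES = ("U", "C", "F")


def joint(u: bool, c: bool, f: bool) -> float:
    """Probability of one full assignment, factored along the causal links."""
    pu = P_U if u else 1 - P_U
    pc = P_C_GIVEN_U[u] if c else 1 - P_C_GIVEN_U[u]
    pf = P_F_GIVEN_C[c] if f else 1 - P_F_GIVEN_C[c]
    return pu * pc * pf


def posterior(query: str, evidence: dict[str, bool]) -> float:
    """P(query = True | evidence), by enumerating all 2**3 assignments."""
    num = den = 0.0
    for values in product((True, False), repeat=len(NAMES)):
        assignment = dict(zip(NAMES, values))
        if any(assignment[k] != v for k, v in evidence.items()):
            continue  # inconsistent with the observed evidence
        p = joint(*values)
        den += p
        if assignment[query]:
            num += p
    return num / den


print(round(posterior("U", {"F": True}), 3))  # diagnostic: F observed, belief in U
print(round(posterior("F", {"U": True}), 3))  # predictive: U observed, belief in F
```

Enumeration is exponential in the number of nodes, so it is only a way to check small examples; a real CS would rely on propagation.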
We regard utterance understanding as the dynamic construction of a
Bayesian network, which we call the Conversational Space (CS). The
input to CS is the set of phrase hypotheses output by the phrase
spotting module. Each phrase hypothesis is represented as a node whose
certainty measure is its spotting score. Several classes of linguistic
instances are then inferred from this evidence: the network expansion
procedure infers propositions stating the existence of an instance of
a conceptual class, an utterance type class, or an action type class.
The way each kind of uncertainty is dealt with is described in
section 3. CS is also used to record the dialogue history and to
generate surface responses.
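The text does not say how several phrase-hypothesis nodes are combined into a concept node. One common choice, assumed here purely for illustration, is a noisy-OR node: each phrase hypothesis independently supports the existence of the concept instance. The sketch treats each spotting score as the probability that its (independent) phrase hypothesis is true; the function name `concept_certainty` and the parameters `link_strength` and `leak` are hypothetical.

```python
from typing import Sequence


def concept_certainty(spotting_scores: Sequence[float],
                      link_strength: float = 0.9,
                      leak: float = 0.01) -> float:
    """Noisy-OR certainty that a concept instance exists.

    Assumes each spotting score is P(phrase hypothesis is true), that the
    hypotheses are independent, and that a true hypothesis activates the
    concept with probability link_strength. `leak` covers other causes.
    """
    for s in spotting_scores:
        if not 0.0 <= s <= 1.0:
            raise ValueError(f"spotting score out of range: {s}")
    p_absent = 1.0 - leak
    for s in spotting_scores:
        # This hypothesis fails to activate the concept with prob 1 - w*s.
        p_absent *= 1.0 - link_strength * s
    return 1.0 - p_absent


# Hypothetical hypotheses that both point to the same concept instance.
print(concept_certainty([0.6]))
print(concept_certainty([0.6, 0.5]))  # additional evidence raises certainty
```

Noisy-OR keeps the number of parameters linear in the number of hypotheses, which matters when the network is expanded on the fly; other combination rules would fit the description equally well.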
{2.2 Managing Cooperative Dialogue}
We perceive an SDS as 'cooperative' if it gives proper answers and/or
good suggestions. To generate such responses, the SDS must recognize
the user's plan and select an appropriate speech act as its response.
However, if the system cannot respond until the user's plan is
recognized, the dialogue does not proceed smoothly. An SDS therefore
needs a dialogue strategy that is both cooperative and responsive.
In our model, task-domain knowledge for plan recognition is
represented by a static network whose main structure is the same as
an Event Hierarchy. It represents the relationships between plans and
subplans, and between plans and actions. We call this network the
Problem Solving Space (PSS). We apply the minimal covering method to
plan recognition in PSS. The basic idea of this procedure is to find
a forest that covers all of the subplans and actions achieved so far.
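The minimal covering idea can be sketched as follows, under these hypothetical assumptions: the Event Hierarchy is stored as a map from each node to the plans it can be a step of; an observed step is explained by any top-level plan reachable upward from it; and a cover is minimal when it uses the fewest top-level plans. The hierarchy contents and function names are invented for illustration, and the brute-force search suits only small hierarchies. It returns every minimum-size cover so that competing alternatives remain visible.

```python
from itertools import combinations

# Hypothetical Event Hierarchy for a travel-information task:
# each node maps to the plans it can be a step (subplan or action) of.
PARENTS = {
    "ask_fare": ["plan_trip"],
    "ask_departure_time": ["plan_trip", "meet_arrival"],
    "ask_arrival_time": ["meet_arrival"],
    "plan_trip": [],       # top-level plan
    "meet_arrival": [],    # top-level plan
}


def top_plans(node: str) -> set[str]:
    """Top-level plans reachable by following step-of links upward."""
    if not PARENTS[node]:
        return {node}
    tops: set[str] = set()
    for parent in PARENTS[node]:
        tops |= top_plans(parent)
    return tops


def minimal_covers(observed: list[str]) -> list[set[str]]:
    """All smallest sets of top-level plans that explain every observed step."""
    explains = {step: top_plans(step) for step in observed}
    candidates = sorted(set().union(*explains.values()))
    for size in range(1, len(candidates) + 1):
        covers = [set(combo) for combo in combinations(candidates, size)
                  if all(explains[step] & set(combo) for step in observed)]
        if covers:
            return covers
    return []


print(minimal_covers(["ask_fare", "ask_departure_time"]))  # one plan explains both
print(minimal_covers(["ask_departure_time"]))               # two competing plans
```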
Depending on the result of plan recognition, the system uses one of
two processing types: 'surface understanding' or 'deep
understanding'.
'Surface understanding' is a process of generating a response without
using information about the user's plan or the dialogue context that
precedes the user's latest utterance. 'Deep understanding', on the
other hand, is a process of identifying the user's plan, updating the
mental state, and selecting a cooperative system reaction.
The choice between surface and deep processing is made according to
the result of plan recognition. If the recognized minimal cover
includes only one top-level plan node (that is, the minimal cover is
a tree), we regard the user's plan as recognized and choose the deep
understanding process. When the minimal cover has several top-level
plan nodes, we call the situation one of competing plans. In this
situation, the system does not have enough information to identify a
single user plan, so the 'surface understanding' process is used to
generate an immediate response. The surface response is generated in
CS using a trigram model of utterance types.
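A minimal sketch of the switching rule together with a surface response chosen by utterance-type trigrams. The utterance-type labels and the toy training dialogues are hypothetical, and the trigram here is an unsmoothed maximum-likelihood count model, which is only one possible reading of "a trigram model of utterance types". An empty set of top-level plans is treated as unrecognized and also falls back to surface processing; the text does not address that case.

```python
from collections import Counter, defaultdict
from typing import Optional


def choose_process(top_level_plans: set[str]) -> str:
    """Deep understanding only when the minimal cover is a single tree."""
    return "deep" if len(top_level_plans) == 1 else "surface"


class UtteranceTypeTrigram:
    """Predicts the next utterance type from the two preceding ones."""

    def __init__(self, dialogues: list[list[str]]):
        self.counts = defaultdict(Counter)  # (prev2, prev1) -> Counter(next)
        for dialogue in dialogues:
            padded = ["<s>", "<s>"] + dialogue  # pad for dialogue openings
            for prev2, prev1, nxt in zip(padded, padded[1:], padded[2:]):
                self.counts[(prev2, prev1)][nxt] += 1

    def next_type(self, prev2: str, prev1: str) -> Optional[str]:
        dist = self.counts.get((prev2, prev1))  # .get does not insert a key
        return dist.most_common(1)[0][0] if dist else None


# Hypothetical training dialogues, one utterance type per turn.
corpus = [
    ["greet", "ask_schedule", "inform_schedule"],
    ["greet", "ask_schedule", "inform_schedule"],
    ["greet", "ask_schedule", "ask_clarification"],
]
model = UtteranceTypeTrigram(corpus)
if choose_process({"plan_trip", "meet_arrival"}) == "surface":
    print(model.next_type("greet", "ask_schedule"))
```

Because surface processing ignores the plan, the trigram needs only the last two utterance types, which keeps the immediate response cheap while plan recognition is still undecided.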
{2.3 Five Steps Model of Dialogue Processing}
To construct CS and to make inferences in PSS, the process from
understanding the user's utterance through generating the system's
response is divided into five steps: (1) meaning understanding,
(2) intention understanding, (3) communicative effect, (4) reaction
generation, and (5) response generation.
The meaning understanding step constructs CS. The intention
understanding step maps each utterance type in CS to an action in
PSS. The communicative effect step records the status of problem
solving and the user's declared preferences in the mental state. The
reaction generation step selects a cooperative reaction in PSS and
expands it into an utterance type in CS. Finally, the response
generation step composes a surface expression of the system's
response from the relevant part of CS.
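The five steps can be laid out as a single turn-processing function. This is a skeleton under heavy simplifying assumptions: CS is reduced to a list of utterance-type labels, PSS to two hypothetical lookup tables, and the mental state to a dictionary. The Bayesian inference and plan recognition described earlier are replaced by table lookups, so only the order of the steps and the data each one touches are shown. All names (`process_turn`, `UTTERANCE_TO_ACTION`, `NEXT_REACTION`, `TEMPLATES`) are illustrative.

```python
from dataclasses import dataclass, field

# Hypothetical knowledge tables; names are illustrative, not from the paper.
UTTERANCE_TO_ACTION = {"inform_destination": "set_destination"}  # CS type -> PSS action
NEXT_REACTION = {"set_destination": "ask_departure_time"}         # PSS action -> reaction
TEMPLATES = {
    "ask_departure_time": "When would you like to leave for {destination}?",
    "request_repeat": "Could you say that again?",
}


@dataclass
class DialogueState:
    cs: list = field(default_factory=list)            # utterance-type nodes in CS
    mental_state: dict = field(default_factory=dict)  # problem-solving status, preferences


def process_turn(state: DialogueState, utterance_type: str, slots: dict) -> str:
    # (1) Meaning understanding: add the user's utterance type to CS.
    state.cs.append(utterance_type)
    # (2) Intention understanding: link the CS utterance type to a PSS action.
    action = UTTERANCE_TO_ACTION.get(utterance_type)
    # (3) Communicative effect: record achieved actions and declared preferences.
    if action is not None:
        state.mental_state.setdefault("achieved", []).append(action)
        state.mental_state.update(slots)
    # (4) Reaction generation: pick a reaction and expand it into a CS utterance type.
    reaction = NEXT_REACTION.get(action, "request_repeat")
    state.cs.append(reaction)
    # (5) Response generation: compose surface text from that part of CS.
    return TEMPLATES[reaction].format(**state.mental_state)


state = DialogueState()
print(process_turn(state, "inform_destination", {"destination": "Osaka"}))
```

An utterance type with no matching action falls through to a generic repair request, which stands in for the surface-understanding path.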
{3. Conclusion}
We have presented an outline of a five-step model of cooperative
problem-solving dialogue. We have shown that robustness can be
implemented within a Bayesian network framework, and that
cooperativeness can be implemented by two-level processing: surface
understanding and deep understanding. For this purpose, we introduced
the Conversational Space and the Problem Solving Space, whose roles
are explained by the five-step model of the whole dialogue process.
Keywords: dialogue model, speech understanding, plan recognition, Bayesian network