Understanding Spontaneous Speech in Spoken Dialogue
Hiroaki SAITO
Department of Mathematics, Keio University
Yokohama 223, Japan
e-mail: hxs@nak.math.keio.ac.jp
It is hard to write context-free grammar rules that cover the free word
order phenomena of spontaneous speech. Even if the recognition
performance is improved, the utterance itself might be
ungrammatical. Thus it is not practical to pursue precise, highly
specified syntactic rules.
An approach which combines a context-free principle with case frame
instantiation has been claimed to be robust against ungrammaticality.
This approach, however, is not effective in speech applications,
because its strategy leans heavily on the word which specifies the case,
and such a word is often short and pronounced without stress, especially
in English. Case frame instantiation also relies on the verb. Depending
on such particular words is risky in speech applications.
For instance, it would be easy to parse the written sentence ``I send a
mail to Smith.'' When the sentence is spoken, however, the word `to' is
often hard to recognize. This makes it very difficult for the
system to understand the sentence, because that tiny word plays an
important semantic role. If the task is small, we can effortlessly
build the knowledge that `send' customarily puts `to' before its
destination. As the task grows, however, building such knowledge by
hand becomes hard and time-consuming. Thus such knowledge should be
extracted automatically from a corpus.
This research proposes a method which handles syntax loosely and
extracts the meaning of an utterance with the help of word co-occurrence
as an important piece of semantic information. Co-occurrence information
is obtained by parsing corpus sentences with a generalized LR parser. A
function which detects word connectivity is attached to each rule;
e.g., the co-occurrence $<$send, to$>$ is found by the rule ``S $-->$ S PP''
and its semantic function ``(cooccur (x1 head) (x2 prep)).'' This method
extracts syntactic connectivity between words, whereas conventional
statistical measures such as bigrams and trigrams only capture surface
adjacency.
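The following sketch (in Python, with hypothetical names such as Node and
COOCCUR; it is not the actual implementation) illustrates how such a
co-occurrence function attached to ``S $-->$ S PP'' could record the pair
$<$send, to$>$ when the rule is reduced during corpus parsing.
\begin{verbatim}
from collections import Counter

COOCCUR = Counter()              # collects pairs such as ('send', 'to')

class Node:
    """A parse-tree node carrying the features the semantic function reads."""
    def __init__(self, head=None, prep=None):
        self.head = head         # head word of the constituent, e.g. 'send'
        self.prep = prep         # preposition of a PP, e.g. 'to'

def cooccur(head_word, prep_word):
    """Record the co-occurrence <head, prep> found by a rule."""
    if head_word and prep_word:
        COOCCUR[(head_word, prep_word)] += 1

def reduce_S_PP(x1, x2):
    """Run when ``S --> S PP'' is reduced, mirroring the annotation
    (cooccur (x1 head) (x2 prep))."""
    cooccur(x1.head, x2.prep)
    return Node(head=x1.head)    # the head propagates to the new S

# Reducing "I send a mail" + "to Smith" in a corpus sentence:
reduce_S_PP(Node(head='send'), Node(prep='to'))
print(COOCCUR)                   # Counter({('send', 'to'): 1})
\end{verbatim}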
It is impossible to write context-free grammar rules that parse all of
the corpus sentences. Thus the generalized LR parser should be
equipped with the following four error recovery techniques.
Situation: Suppose $S$ is the top state of the parsing stack and
$X$ is the current input symbol.
The action function `action($S$,$X$)' returns the possible actions
(shift, reduce, accept, or error) by looking up the action table of
the grammar. Multiple actions might be returned
because of generalized LR parsing.
Suppose action($S_i$,$X_i$) returns `error'
while parsing an input string $X_1, X_2, \ldots, X_n$.
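As a minimal sketch (the table entries below are illustrative, not derived
from a real grammar), the action function can be viewed as a table lookup
that returns a set of actions; an empty set plays the role of `error' and
triggers error recovery:
\begin{verbatim}
# Illustrative generalized LR action table: one cell may hold several
# actions (shift/reduce conflicts); absent cells mean `error'.
ACTION_TABLE = {
    (0, 'N'): {('shift', 3)},
    (3, 'P'): {('shift', 5), ('reduce', 'NP --> N')},
}

def action(state, symbol):
    """Return every possible action for (state, symbol); the empty set
    corresponds to `error' and triggers error recovery."""
    return ACTION_TABLE.get((state, symbol), set())

acts = action(3, 'P')   # two actions: the GLR parser pursues both branches
err  = action(3, 'V')   # set() -> error; recovery techniques take over
\end{verbatim}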
The first three techniques are described for single-word substitution,
deletion, and insertion. These techniques can of course be applied to
manipulations of two or more words; in practice, however, that may expand
the search too much. Applying the gap-filling technique too loosely may
also blow up the search. Thus a heuristic such as
``two consecutive dummy nonterminals must not be created'' should be adopted.
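A rough sketch of candidate generation under these techniques is given
below; the vocabulary, the Config record, and the treatment of the dummy
nonterminal as a pseudo-input symbol are simplifying assumptions, not the
actual implementation.
\begin{verbatim}
from dataclasses import dataclass, replace

VOCABULARY = ['to', 'from', 'a', 'mail']      # hypothetical terminals

@dataclass
class Config:
    remaining: tuple          # input symbols not yet consumed
    dummy_run: int = 0        # consecutive dummy nonterminals created

def recovery_candidates(cfg):
    """Generate single-word recovery candidates when action(S, X) errs."""
    cands, rest = [], cfg.remaining
    if rest:
        # (1) substitution: replace the offending word
        cands += [replace(cfg, remaining=(w,) + rest[1:]) for w in VOCABULARY]
        # (2) deletion: drop the offending word
        cands.append(replace(cfg, remaining=rest[1:]))
    # (3) insertion: put a word in front of the offending one
    cands += [replace(cfg, remaining=(w,) + rest) for w in VOCABULARY]
    # (4) gap filling, obeying the heuristic that two consecutive
    #     dummy nonterminals must not be created
    if cfg.dummy_run < 1:
        cands.append(replace(cfg, remaining=('*dummy*',) + rest,
                             dummy_run=cfg.dummy_run + 1))
    return cands

# e.g. recovery after the error at `Smith' in "I send a mail Smith.":
cands = recovery_candidates(Config(remaining=('Smith', '.')))
\end{verbatim}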
With the four error recovery mechanisms above, co-occurrence data are
extracted from the corpus. Now we see how these data are utilized in parsing
the erroneous speech input ``I send a mail Smith.''
Parsing proceeds without trouble up to ``I send a mail.''
When the next word `Smith' is parsed, the action function returns error.
The same four error recovery mechanisms are used in this parsing phase
as well. Although the actual action depends on the grammar, [word
insertion] interposes a preposition. If the terminal symbols of the grammar
are lexical entries, an actual preposition is chosen; if the terminals are
parts of speech, a preterminal such as *preposition is chosen.
In either case, if the co-occurrence data show that `to' is
frequently used after `send' in a particular rule, `to' can
be inserted with high probability.
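A sketch of how the extracted counts could rank candidate prepositions
during [word insertion] follows; the counts, vocabulary, and helper name
are illustrative only.
\begin{verbatim}
from collections import Counter

# Co-occurrence counts as they might come out of corpus parsing.
COOCCUR = Counter({('send', 'to'): 57, ('send', 'from'): 4,
                   ('send', 'at'): 1})
PREPOSITIONS = ['to', 'from', 'at', 'on']

def best_insertions(head_word, k=2):
    """Rank prepositions to insert after `head_word' by how often the
    pair <head_word, prep> was seen for the rule S --> S PP."""
    scored = sorted(((COOCCUR[(head_word, p)], p) for p in PREPOSITIONS),
                    reverse=True)
    return [p for count, p in scored[:k] if count > 0]

# Recovering "I send a mail Smith.":
print(best_insertions('send'))   # ['to', 'from'] -> try `to' first
\end{verbatim}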
Checking word co-occurrence in the semantic action of each rule
enhances the robustness of our error-recoverable generalized LR parser,
especially against the ungrammaticalities and ellipses of spontaneous speech.
----------------------------------------------------------------------
As related work, a parser generator called NLyacc has been
released as free software. NLyacc accepts an arbitrary
context-free grammar (cyclic rules such as A $-->$ A are excluded)
written in the yacc format and produces a generalized LR parser for it.
Unlike yacc, NLyacc accepts multiple values from the lexical analyzer,
which is useful for handling words with ambiguous parts of speech
in natural language applications.
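Independently of NLyacc's actual programming interface, the following
sketch (with a hypothetical dictionary and function) shows why returning
multiple values from the lexical analyzer is useful: the generalized LR
parser can fork on each part of speech instead of committing to one.
\begin{verbatim}
POS_DICT = {
    'mail':  ['noun', 'verb'],        # "a mail" vs. "mail a letter"
    'send':  ['verb'],
    'smith': ['noun', 'proper-noun'],
}

def lex(word):
    """Return every part of speech of `word'; a generalized LR parser
    forks its stacks on each value instead of choosing one."""
    return POS_DICT.get(word.lower(), ['unknown'])

print(lex('mail'))    # ['noun', 'verb'] -> both readings are pursued
\end{verbatim}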
Keywords: spontaneous speech, parsing, word co-occurrence