Decoder JULIUS

Interface with Acoustic & Language Model
Input
- waveform files (16-bit PCM), MFCC files (HTK)
- microphone (Sun, SGI, PC/Linux, DAT-LINK/netaudio)
2-pass decoding (tree-trellis search)
- 1st pass: real-time processing with simple constraint
- intermediate: word-trellis with index
- 2nd pass: stack-decoder to output N-best candidates

Overview of Decoder JULIUS

pass     | cross-word phone model | language model | search approx.
1st pass | approximate            | 2-gram         | 1-best
2nd pass | accurate               | 3-gram         | N-best

Decoding Parameters

1st pass: frame-synchronous beam search
- tree-structured lexical entry vs. (partially) linear entry
- 1-best approx. vs. word-pair approx.
- 1-gram factoring vs. 2-gram factoring
- cross-word triphone handling with 1-best approximation
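
The frame-synchronous beam search of the 1st pass can be illustrated with a small sketch. This is a toy Viterbi search over generic states, not the Julius implementation: the function name `beam_viterbi`, the dictionary-based inputs, and the example scores are all assumptions made for illustration, and the tree lexicon, factoring, and cross-word handling listed above are left out.

```python
import math


def beam_viterbi(frame_scores, transitions, initial, beam_width):
    """Frame-synchronous Viterbi search with beam pruning (toy version).

    frame_scores: list of dicts, one per frame: state -> log emission score
                  (at least one frame is assumed)
    transitions:  dict: state -> list of (next_state, log transition score)
    initial:      dict: state -> log score before the first frame
    beam_width:   states scoring below (best - beam_width) are dropped
    Returns a dict of the states alive after the last frame and their scores.
    """
    neg_inf = -math.inf
    active = {s: v + frame_scores[0].get(s, neg_inf) for s, v in initial.items()}
    for frame in frame_scores[1:]:
        if not active:
            break
        # Prune: keep only states within beam_width of the current best.
        best = max(active.values())
        survivors = {s: v for s, v in active.items() if v >= best - beam_width}
        # Expand all survivors by one frame, keeping the best score per state.
        expanded = {}
        for s, v in survivors.items():
            for t, log_p in transitions.get(s, []):
                cand = v + log_p + frame.get(t, neg_inf)
                if cand > expanded.get(t, neg_inf):
                    expanded[t] = cand
        active = expanded
    return active


# Hypothetical two-state example.
frames = [{"A": -1.0, "B": -1.0}, {"A": -1.0, "B": -1.0}, {"A": -1.0, "B": 0.0}]
trans = {"A": [("A", -1.0), ("B", -2.0)], "B": [("B", -0.5)]}
start = {"A": 0.0}

wide = beam_viterbi(frames, trans, start, beam_width=10.0)
narrow = beam_viterbi(frames, trans, start, beam_width=0.5)
print(wide["B"], narrow["B"])  # -4.5 -5.0
```

With the narrow beam, state B is pruned before the last frame, so the better path through B is lost. This is the usual trade-off controlled by the beam width: a smaller beam is faster but may discard the best hypothesis.
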
2nd pass: stack decoding search
- word graph vs. word trellis → word-trellis index
- beam search vs. best-first search → enveloped best-first
- N-best candidates [standard] vs. 1-best candidate [fast]
- accurate cross-word triphone handling
  without delay [standard] vs. with delay [fast]
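
The idea of a stack decoder that outputs N-best candidates can be sketched as a plain best-first search. This is a generic illustration under stated assumptions, not the Julius code: `stack_decode`, `successors`, `heuristic`, and the toy word graph are hypothetical names, and the `heuristic` function only stands in for the look-ahead scores that a tree-trellis search would take from the 1st-pass word trellis. The "enveloped" variant listed above is not modeled.

```python
import heapq


def stack_decode(successors, heuristic, start, is_final, n_best):
    """Best-first search that returns up to n_best complete hypotheses.

    successors(word) -> iterable of (next_word, log score of that step)
    heuristic(word)  -> estimate of the best remaining log score from word
    is_final(word)   -> True if a hypothesis ending in word is complete
    Hypotheses are popped in order of (score so far + heuristic).
    """
    counter = 0  # tie-breaker so the heap never compares word lists
    # heapq is a min-heap, so the priority is stored negated.
    stack = [(-heuristic(start), counter, 0.0, [start])]
    results = []
    while stack and len(results) < n_best:
        _, _, score, words = heapq.heappop(stack)
        last = words[-1]
        if is_final(last):
            results.append((score, words))
            continue
        for nxt, step in successors(last):
            counter += 1
            new_score = score + step
            priority = -(new_score + heuristic(nxt))
            heapq.heappush(stack, (priority, counter, new_score, words + [nxt]))
    return results


# Hypothetical toy word graph with exact look-ahead scores.
graph = {"<s>": [("a", -1.0), ("b", -2.0)],
         "a": [("</s>", -3.0)],
         "b": [("</s>", -1.0)],
         "</s>": []}
lookahead = {"<s>": -3.0, "a": -3.0, "b": -1.0, "</s>": 0.0}

hyps = stack_decode(lambda w: graph[w], lambda w: lookahead[w],
                    "<s>", lambda w: w == "</s>", n_best=2)
for score, words in hyps:
    print(score, " ".join(words))
# -3.0 <s> b </s>
# -4.0 <s> a </s>
```

When the look-ahead never underestimates the best remaining score, complete hypotheses come off the stack in order of their total score, so setting `n_best=1` gives the fast 1-best mode and larger values give the N-best list.
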
Others
- beam width
- LM weight
- insertion penalty
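
As an illustration of how the LM weight and insertion penalty typically enter a hypothesis score, here is a minimal sketch. The source lists only the parameter names, so the log-linear formula below is the commonly used form and is an assumption here; the function name and the example values are hypothetical.

```python
def hypothesis_score(acoustic_logprob, lm_logprobs, lm_weight, insertion_penalty):
    """Combine acoustic and language model scores in the log domain.

    acoustic_logprob:  total acoustic log score of the hypothesis
    lm_logprobs:       one language model log probability per word
    lm_weight:         scaling factor applied to the language model score
    insertion_penalty: constant added once per word in the hypothesis
    """
    n_words = len(lm_logprobs)
    return (acoustic_logprob
            + lm_weight * sum(lm_logprobs)
            + insertion_penalty * n_words)


# Hypothetical two-word hypothesis.
print(hypothesis_score(-100.0, [-2.0, -3.0], lm_weight=8.0, insertion_penalty=-1.0))
# -142.0
```

A larger LM weight makes the search trust the language model more, and a more negative insertion penalty discourages hypotheses with many short words.
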
Gaussian pruning when applying the PTM (phonetic tied-mixture) model
- safe: use the already computed k-best values as the threshold
- beam: set a beam width at intermediate dimensions
- heuristic: add a heuristic estimate for the yet-to-be-computed dimensions
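
The "safe" variant can be sketched as follows, assuming diagonal-covariance Gaussians whose log-likelihood is accumulated one dimension at a time. Each dimension can only lower the partial score, so once the partial score drops below the k-th best full score found so far, the Gaussian cannot enter the k-best list and the remaining dimensions can be skipped. The function names and the data layout are assumptions for illustration; the beam and heuristic variants are not shown.

```python
import heapq
import math


def log_gauss_const(var):
    """Constant term of a diagonal Gaussian log density."""
    return -0.5 * sum(math.log(2.0 * math.pi * v) for v in var)


def safe_pruned_kbest(x, gaussians, k):
    """Return the k best (log score, index) pairs, highest score first.

    x:         input feature vector (list of floats)
    gaussians: list of (mean, var) pairs with diagonal covariance
    k:         number of best-scoring Gaussians to keep
    """
    best = []  # min-heap holding at most k (score, index) pairs
    for idx, (mean, var) in enumerate(gaussians):
        # Threshold is the k-th best full score computed so far.
        threshold = best[0][0] if len(best) == k else -math.inf
        score = log_gauss_const(var)
        pruned = False
        for xd, md, vd in zip(x, mean, var):
            score -= 0.5 * (xd - md) ** 2 / vd
            if score < threshold:
                pruned = True  # cannot recover: later terms only subtract
                break
        if pruned:
            continue
        if len(best) < k:
            heapq.heappush(best, (score, idx))
        else:
            heapq.heapreplace(best, (score, idx))
    return sorted(best, reverse=True)


# Hypothetical codebook of four 2-dimensional Gaussians.
codebook = [([0.0, 0.0], [1.0, 1.0]),
            ([5.0, 5.0], [1.0, 1.0]),
            ([1.0, 0.0], [1.0, 1.0]),
            ([10.0, 10.0], [1.0, 1.0])]
top = safe_pruned_kbest([0.0, 0.0], codebook, k=2)
print([idx for _, idx in top])  # [0, 2]
```

Because the threshold is a score that has actually been reached, this pruning returns the same k-best set as a full computation; the beam and heuristic variants prune earlier at the cost of that guarantee.
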


Tatsuya Kawahara
5/31/2000