- Interface with Acoustic & Language Model
- Input
  - waveform files (16-bit PCM), MFCC files (HTK)
  - microphone (Sun, SGI, PC/Linux, DAT-LINK/netaudio)
- 2-pass decoding (tree-trellis search)
  - 1st pass: real-time processing with simple constraints
  - intermediate: word trellis with index
  - 2nd pass: stack decoder to output N-best candidates
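
The two-pass idea above can be illustrated on a toy word lattice. The following is a minimal sketch, not the JULIUS implementation: the lattice, its scores, and the names `forward_best` and `nbest_backward` are hypothetical. It also assumes both passes use the same scores, whereas JULIUS rescores with a more accurate model in the 2nd pass. The 1st pass stores the best forward score at each node, and the 2nd pass runs a best-first stack search backward from the end, using those stored scores as the estimate for the part not yet expanded.

```python
import heapq

def forward_best(edges, start, order):
    """1st pass: best log score from `start` to each node.
    `order` must list the nodes in topological order."""
    best = {n: float("-inf") for n in order}
    best[start] = 0.0
    for n in order:
        for src, dst, _word, score in edges:
            if src == n and best[src] + score > best[dst]:
                best[dst] = best[src] + score
    return best

def nbest_backward(edges, start, end, best, n):
    """2nd pass: best-first stack decoding from `end` back to `start`.
    Priority = backward score so far + forward score stored by the 1st pass."""
    incoming = {}
    for src, dst, word, score in edges:
        incoming.setdefault(dst, []).append((src, word, score))
    # Stack entries: (-estimated total, tie-breaker, node, backward score, words)
    stack = [(-best[end], 0, end, 0.0, ())]
    counter = 1
    results = []
    while stack and len(results) < n:
        neg_total, _, node, g, words = heapq.heappop(stack)
        if node == start:  # a complete hypothesis reached the start node
            results.append((list(words), -neg_total))
            continue
        for src, word, score in incoming.get(node, []):
            if best[src] == float("-inf"):
                continue  # not reachable in the 1st pass
            g2 = g + score
            heapq.heappush(stack, (-(g2 + best[src]), counter, src, g2, (word,) + words))
            counter += 1
    return results

# Hypothetical lattice: (source node, destination node, word, log score)
EDGES = [
    (0, 1, "the", -1.0), (0, 1, "a", -1.5),
    (1, 2, "cat", -1.0), (1, 2, "cap", -2.0),
    (2, 3, "sat", -0.5),
]
best = forward_best(EDGES, start=0, order=[0, 1, 2, 3])
results = nbest_backward(EDGES, start=0, end=3, best=best, n=3)
```

Because the estimate is exact in this toy setting, hypotheses are popped in order of decreasing total score. When the 2nd pass rescores with a different model, the stored scores act only as a heuristic.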
Overview of Decoder JULIUS
|          | cross-word phone model | language model | search approx. |
| 1st pass | approximate            | 2-gram         | 1-best         |
| 2nd pass | accurate               | 3-gram         | N-best         |
Decoding Parameters
- 1st pass: frame-synchronous beam search
  - tree-structured lexical entry vs. (partially) linear entry
  - 1-best approx. vs. word-pair approx.
  - 1-gram factoring vs. 2-gram factoring
  - cross-word triphone handling with 1-best approximation
- 2nd pass: stack decoding search
  - word graph vs. word trellis -> word-trellis index
  - beam search vs. best-first search -> enveloped best-first
  - N-best candidates [standard] vs. 1-best candidate [fast]
  - accurate cross-word triphone handling: without delay [standard] vs. with delay [fast]
- Others
  - beam width
  - LM weight
  - insertion penalty
- Gaussian pruning when applying the PTM (phonetic tied-mixture) model
  - safe: use the already computed k-best values as the threshold
  - beam: set a beam width at intermediate dimensions
  - heuristic: heuristic estimation of the yet-to-be-computed dimensions
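
The "safe" Gaussian pruning variant can be sketched as follows. This is a minimal illustration under stated assumptions, not the JULIUS code: it assumes diagonal-covariance Gaussians, and the names `log_gauss_const` and `kbest_safe_pruning` are hypothetical. The log-likelihood is accumulated dimension by dimension. Since each dimension can only lower the score, a Gaussian whose partial score already falls below the current k-th best complete score can be dropped without changing the k-best result.

```python
import math

def log_gauss_const(var):
    """Dimension-independent term of a diagonal Gaussian log-likelihood."""
    return -0.5 * sum(math.log(2.0 * math.pi * v) for v in var)

def kbest_safe_pruning(x, gaussians, k):
    """Return the k best (score, index) pairs, best first.
    `gaussians` is a list of (mean, variance) vectors (diagonal covariance)."""
    best = []  # kept sorted in descending score order, at most k entries
    for idx, (mean, var) in enumerate(gaussians):
        # The constant term is an upper bound on the final score.
        score = log_gauss_const(var)
        # Threshold: k-th best fully computed score so far (none until k are found).
        threshold = best[-1][0] if len(best) == k else float("-inf")
        pruned = False
        for xi, m, v in zip(x, mean, var):
            score -= 0.5 * (xi - m) ** 2 / v  # each dimension only lowers the score
            if score < threshold:
                pruned = True  # cannot enter the k-best; skip remaining dimensions
                break
        if not pruned:
            best.append((score, idx))
            best.sort(reverse=True)
            del best[k:]
    return best

# Hypothetical 2-dimensional codebook with unit variances
CODEBOOK = [
    ([0.0, 0.0], [1.0, 1.0]),
    ([1.0, 1.0], [1.0, 1.0]),
    ([5.0, 5.0], [1.0, 1.0]),
    ([0.5, 0.0], [1.0, 1.0]),
]
top2 = kbest_safe_pruning([0.0, 0.0], CODEBOOK, k=2)
```

The "beam" and "heuristic" variants trade this exactness for speed by pruning against a threshold at intermediate dimensions, so they may discard a Gaussian that would have entered the k-best.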
Tatsuya Kawahara
5/31/2000