Language Model


Training corpus: Mainichi newspaper article texts

             45 month         75 month
period       '91/01-'94/09    '91/01-'94/09, '95/01-'97/06
data amount  65M words        118M words
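The '45 month' and '75 month' labels can be checked against the listed periods. The sketch below assumes each period includes both of its endpoint months. `months_inclusive` is a hypothetical helper, and it subtracts two-digit years directly, which works only because every date here falls in the 1990s.

```python
def months_inclusive(period: str) -> int:
    """Count calendar months in an inclusive "'YY/MM-'YY/MM" period."""
    start, end = period.replace("'", "").split("-")
    start_year, start_month = (int(x) for x in start.split("/"))
    end_year, end_month = (int(x) for x in end.split("/"))
    # Two-digit years are compared directly (all dates are in the 1990s).
    return (end_year - start_year) * 12 + (end_month - start_month) + 1


periods_45 = ["'91/01-'94/09"]
periods_75 = ["'91/01-'94/09", "'95/01-'97/06"]

total_45 = sum(months_inclusive(p) for p in periods_45)
total_75 = sum(months_inclusive(p) for p in periods_75)
print(total_45, total_75)
```

The second period contributes 30 months, so the 75-month corpus is the 45-month corpus plus the '95/01-'97/06 articles.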


Language Model Compression

    Baseline model (cutoff-1-1)
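A minimal sketch of count cutoffs, assuming the usual reading that cutoff-k-k discards 2-grams and 3-grams occurring k times or fewer (so cutoff-1-1 drops singletons and cutoff-4-4 drops everything seen four times or less). `ngram_counts` and `apply_cutoff` are hypothetical names, and sentence-boundary tokens are omitted for brevity.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def ngram_counts(sentences: Iterable[List[str]], n: int) -> Counter:
    """Count n-grams of order n over tokenized sentences."""
    counts: Counter = Counter()
    for words in sentences:
        for i in range(len(words) - n + 1):
            counts[tuple(words[i:i + n])] += 1
    return counts


def apply_cutoff(counts: Counter, cutoff: int) -> Dict[Tuple[str, ...], int]:
    """Keep only n-grams seen more than `cutoff` times (assumed convention)."""
    return {gram: c for gram, c in counts.items() if c > cutoff}


corpus = [["a", "b", "c"], ["a", "b", "d"], ["a", "b", "c"]]
bigrams = ngram_counts(corpus, 2)
trigrams = ngram_counts(corpus, 3)

# cutoff-1-1: drop 2-grams and 3-grams that occur only once
kept2 = apply_cutoff(bigrams, 1)
kept3 = apply_cutoff(trigrams, 1)
print(kept2, kept3)
```

In a back-off model the dropped entries are not lost entirely: their probabilities are estimated from lower-order n-grams, which is how a higher cutoff can shrink the tables considerably, as the cutoff-4-4 rows show.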


List of 20K-Vocabulary Language Models

model                2-gram entries   3-gram entries
45month cutoff-1-1        1,238,929        4,733,916
45month cutoff-4-4          657,759        1,593,020
45month compress10%       1,238,929          473,176
75month cutoff-1-1        1,675,803        7,445,209
75month cutoff-4-4          901,475        2,629,605
75month compress10%       1,675,803          744,438
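In both compress10% models the 2-gram count equals that of cutoff-1-1, so only the 3-gram table is reduced. The tables do not say how the surviving 3-grams were chosen; the sketch below ranks entries by raw count purely as a stand-in criterion, and `compress_trigrams` and `keep_ratio` are hypothetical names.

```python
from typing import Dict, Tuple

Trigram = Tuple[str, str, str]


def compress_trigrams(trigrams: Dict[Trigram, int],
                      keep_ratio: float = 0.10) -> Dict[Trigram, int]:
    """Keep roughly `keep_ratio` of the 3-gram entries.

    Ranking by raw count is a placeholder criterion, not the
    selection method used for the compress10% models.
    """
    n_keep = round(len(trigrams) * keep_ratio)
    ranked = sorted(trigrams.items(), key=lambda kv: kv[1], reverse=True)
    return dict(ranked[:n_keep])


# 20 toy 3-grams with counts 1..20; keeping 10% leaves the top 2.
toy = {("w", str(i), "x"): i for i in range(1, 21)}
kept = compress_trigrams(toy)
print(kept)
```

A real pruning scheme would more likely score each 3-gram by how much its removal changes the model when the back-off estimate is used instead; raw count is used here only because it is simple to show.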


List of 60K-Vocabulary Language Models

model                2-gram entries   3-gram entries
75month cutoff-1-1        2,420,231        8,368,507
75month compress10%       2,420,231          836,852
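The retained fraction can be read directly off the two tables by dividing each compress10% 3-gram count by the matching cutoff-1-1 count. The numbers below are copied from the tables; nothing else is assumed.

```python
# (cutoff-1-1 3-gram entries, compress10% 3-gram entries), from the tables
models = {
    "20K 45month": (4_733_916, 473_176),
    "20K 75month": (7_445_209, 744_438),
    "60K 75month": (8_368_507, 836_852),
}

ratios = {}
for name, (baseline, compressed) in models.items():
    ratios[name] = compressed / baseline
    print(f"{name}: {ratios[name]:.4f}")
```

Each ratio lies within 0.001 of 0.10, matching the '10%' in the model names; the counts are not exactly one tenth, so the reduction is approximate.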

Backward 3-gram (for forward-backward search)
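A backward 3-gram gives the probability of a word from the two words that follow it, P(w_i | w_{i+1}, w_{i+2}), which suits a search pass that runs from the end of the utterance toward its start. Below is a maximum-likelihood sketch under that assumption: it counts ordinary 3-grams over reversed sentences and applies no smoothing or back-off, unlike a model usable in a decoder. `backward_trigram_mle` is a hypothetical name.

```python
from collections import Counter
from typing import Dict, List, Tuple


def backward_trigram_mle(
        sentences: List[List[str]]) -> Dict[Tuple[str, str, str], float]:
    """Return {(w, next1, next2): P(w | next1, next2)} by maximum likelihood.

    Each sentence is reversed and ordinary left-to-right 3-grams are
    counted over the reversed word sequence.
    """
    tri_counts: Counter = Counter()
    hist_counts: Counter = Counter()
    for words in sentences:
        rev = words[::-1]
        for i in range(len(rev) - 2):
            # In reversed order the history (next2, next1) precedes w.
            next2, next1, w = rev[i], rev[i + 1], rev[i + 2]
            tri_counts[(w, next1, next2)] += 1
            hist_counts[(next1, next2)] += 1
    return {
        (w, n1, n2): c / hist_counts[(n1, n2)]
        for (w, n1, n2), c in tri_counts.items()
    }


corpus = [["I", "saw", "it"], ["we", "saw", "it"]]
probs = backward_trigram_mle(corpus)
print(probs)
```

Both sentences share the right context "saw it", so each of "I" and "we" receives half of its probability mass.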


Tatsuya Kawahara
5/31/2000