Bayesian Variable Order n-gram Language Model Based on Hierarchical Pitman-Yor Processes (original Japanese title: 階層Pitman-Yor過程に基づく可変長n-gram言語モデル)

Abstract

This paper proposes a variable order n-gram language model that extends the hierarchical Pitman-Yor process, a recently proposed hierarchical generative model of n-gram distributions. The model automatically infers the order of the hidden Markov process that generated each word and uses a context of the appropriate length. By introducing a stochastic process on a prediction suffix tree of infinite depth, it discovers phrases stochastically and learns appropriate context lengths, which makes it possible to learn high-order n-grams that were previously infeasible. The approach is not limited to language models: it is a variable order generative model that can estimate the order of general Markov models from data. Experiments on standard large English and Japanese corpora confirmed the validity and efficiency of the proposed model.

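
As an illustration of how a stochastic process on a suffix tree can induce a distribution over context lengths, the following minimal sketch assumes that each node along one context path carries a stopping probability, and that the context length of a word is the depth of the first node at which the descent stops. This stick-breaking-style construction, and the names `order_distribution` and `stop_probs`, are assumptions made for illustration only; the abstract does not specify the process, and this is not the authors' implementation.

```python
from typing import List


def order_distribution(stop_probs: List[float]) -> List[float]:
    """Return P(context length = n) for n = 0 .. len(stop_probs) - 1.

    stop_probs[n] is the (hypothetical) probability of stopping at the
    node of depth n, given that every shallower node was passed through.
    """
    dist = []
    pass_mass = 1.0  # probability of having passed all shallower nodes
    for q in stop_probs:
        if not 0.0 <= q <= 1.0:
            raise ValueError("stop probabilities must lie in [0, 1]")
        dist.append(pass_mass * q)   # stop here after passing the nodes above
        pass_mass *= 1.0 - q         # otherwise keep descending
    return dist


# Hypothetical stopping probabilities for depths 0 (root) to 3.
probs = order_distribution([0.1, 0.5, 0.8, 0.9])
for depth, p in enumerate(probs):
    print(f"context length {depth}: {p:.3f}")
```

The values returned need not sum to one: the remainder is the probability of descending past the deepest node listed, which is how a tree of unbounded depth can be handled with a finite list.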

Journal

  • IPSJ journal

    IPSJ journal 48(12), 4023-4032, 2007-12-15

    Information Processing Society of Japan (IPSJ)

References:  29

Codes

  • NII Article ID (NAID)
    110006531976
  • NII NACSIS-CAT ID (NCID)
    AN00116647
  • Text Lang
    JPN
  • Article Type
    Journal Article
  • ISSN
    1882-7764
  • NDL Article ID
    9303533
  • NDL Source Classification
    ZM13 (Science and technology -- Science and technology in general -- Data processing and computers)
  • NDL Call No.
    Z14-741
  • Data Source
    CJP  NDL  NII-ELS  IPSJ 