Efficient EM learning of probabilistic CFGs and their extensions by using WFSTs (WFSTに基づく確率文脈自由文法およびその拡張文法の高速EM学習法)

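  • KAMEYA Yoshitaka (亀谷 由隆)
    Department of Computer Science, Graduate School of Information Science and Engineering, Tokyo Institute of Technology
  • MORI Takashi (森 高志)
    Department of Computer Science, Graduate School of Information Science and Engineering, Tokyo Institute of Technology
  • SATO Taisuke (佐藤 泰介)
    Department of Computer Science, Graduate School of Information Science and Engineering, Tokyo Institute of Technology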

Bibliographic Information

Alternative titles
  • Efficient EM learning of probabilistic CFGs and their extensions by using WFSTs
  • WFST ニ モトヅク カクリツ ブンミャク ジユウ ブンポウ オヨビ ソノ カクチョウ ブンポウ ノ コウソク EM ガクシュウホウ

Abstract

Probabilistic context-free grammars (PCFGs) are a widely known class of statistical language models, and the Inside-Outside (I-O) algorithm is well known as an efficient EM algorithm tailored for PCFGs. Although the I-O algorithm requires only inexpensive linguistic resources, its efficiency remains a problem. In this paper, we present a new framework for efficient EM learning of PCFGs in which the parser is separated from the EM algorithm, assuming that the underlying CFG is given. The new EM procedure exploits the compactness of the well-formed substring tables (WFSTs) generated by the parser. The framework is general in that the input grammar need not be in Chomsky normal form (CNF), while in the CNF case the new EM algorithm is equivalent to the I-O algorithm. In addition, we propose a polynomial-time EM procedure for CFGs with context-sensitive probabilities, and report experimental results with the ATR corpus and a hand-crafted Japanese grammar.
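
For orientation only, the inside pass that the I-O algorithm relies on in the CNF case can be sketched as below. This is the textbook dynamic program, not the WFST-based procedure proposed in the paper; the function name `inside_probabilities`, the dictionary-based rule encoding, and the toy grammar mentioned afterwards are all assumptions made for illustration.

```python
from collections import defaultdict


def inside_probabilities(words, lexical, binary):
    """Inside table for a PCFG in Chomsky normal form (illustrative sketch).

    lexical: dict mapping (A, word) -> P(A -> word)      (assumed encoding)
    binary:  dict mapping (A, B, C) -> P(A -> B C)       (assumed encoding)
    Returns beta with beta[(i, j, A)] = P(A derives words[i:j]).
    """
    n = len(words)
    beta = defaultdict(float)

    # Spans of length 1 are covered by lexical rules A -> word.
    for i, w in enumerate(words):
        for (a, word), p in lexical.items():
            if word == w:
                beta[(i, i + 1, a)] += p

    # Longer spans: sum over binary rules A -> B C and split points k.
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for (a, b, c), p in binary.items():
                for k in range(i + 1, j):
                    beta[(i, j, a)] += p * beta[(i, k, b)] * beta[(k, j, c)]
    return beta
```

With a hypothetical grammar consisting of S -> S S (0.4) and S -> a (0.6), calling `inside_probabilities(["a", "a", "a"], {("S", "a"): 0.6}, {("S", "S", "S"): 0.4})` gives a sentence probability `beta[(0, 3, "S")]` of approximately 0.06912, the sum over the two possible binary trees. The three nested loops over span start, span end, and split point are the source of the cubic cost in sentence length.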

Published in

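  • Journal of Natural Language Processing (自然言語処理)

    Journal of Natural Language Processing 8 (1), 49-84, 2001

    The Association for Natural Language Processing (一般社団法人 言語処理学会)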

Cited by (3)

References (26)
