Dependency Analysis Based on Word Co-occurrence Probabilities and Its Evaluation

Masakazu Fujio, Yuji Matsumoto

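This paper proposes and evaluates a coarse-grained method for Japanese dependency analysis based on word co-occurrence probabilities. We use the EDR corpus for both training and evaluation, and measure dependency accuracy at the bunsetsu (phrase) level and at the sentence level. To find out which dependency relation types are most error-prone, we also measure accuracy for each relation type. We compared bunsetsu-level dependency accuracy against the model of Collins (1996), which uses a relatively similar model and similar information for English; for Japanese analysis on the EDR corpus, our model was more accurate than Collins' model. To further improve accuracy under the current statistical model, we also propose a method that raises precision at the cost of recall (partial parsing) and a method that raises recall at the cost of precision (redundant parsing). We evaluated three methods: Global, which uses the "confidence" measure (Inui et al., 1998), together with Local/norm and Ratio/next. The results show that, at least with our statistical model, Ratio/next is the better choice when parsing accuracy, speed, and other factors are taken into account.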

We present statistical models of Japanese dependency analysis based on lexical collocation probability. We use the EDR corpus for both training and evaluation, and evaluate the precision of the models in terms of correct dependency pairs and correct sentences. We also measure the rate of correct dependency pairs for each type of dependency relation. To achieve higher performance under the current statistical parsing model, we propose a method that aims for higher precision at the cost of recall (partial parse), and a method that aims for higher recall at the cost of precision (redundant parse). We propose and compare three partial (redundant) parse methods, Global, Local/norm, and Ratio/next, and find that Ratio/next is superior to the others among our methods.
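The precision/recall trade-off behind partial and redundant parsing can be illustrated with a small sketch. This is not the authors' evaluation code: the function name `precision_recall`, the data layout, and the toy sentence are all assumptions made for illustration, and the metrics follow the usual definitions (correct proposed pairs over proposed pairs, and over gold pairs), which may differ in detail from the paper's. Each modifier bunsetsu has exactly one gold head; a parser may propose no head (partial parse) or several heads (redundant parse).

```python
from typing import Dict, Set, Tuple


def precision_recall(
    gold: Dict[int, int], predicted: Dict[int, Set[int]]
) -> Tuple[float, float]:
    """Score proposed dependency pairs against gold heads.

    gold:      modifier index -> its single correct head index
    predicted: modifier index -> set of proposed head indices
               (empty set = parser abstained, >1 element = redundant output)
    """
    # Every (modifier, head) pair the parser emitted counts as a proposal.
    proposed = sum(len(heads) for heads in predicted.values())
    # A modifier is correct if its gold head is among the proposed heads.
    correct = sum(1 for mod, heads in predicted.items() if gold.get(mod) in heads)
    precision = correct / proposed if proposed else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical 4-bunsetsu sentence: bunsetsu 0 -> 1, 1 -> 3, 2 -> 3.
gold = {0: 1, 1: 3, 2: 3}

# Partial parse: abstain on the uncertain modifier 1.
partial = {0: {1}, 1: set(), 2: {3}}

# Redundant parse: emit two candidate heads for modifier 1.
redundant = {0: {1}, 1: {2, 3}, 2: {3}}

partial_p, partial_r = precision_recall(gold, partial)
redundant_p, redundant_r = precision_recall(gold, redundant)
print(f"partial:   precision={partial_p:.2f} recall={partial_r:.2f}")
print(f"redundant: precision={redundant_p:.2f} recall={redundant_r:.2f}")
```

In this toy case, abstaining keeps every proposed pair correct but leaves one gold pair unrecovered, while emitting an extra candidate recovers every gold pair at the price of one wrong proposal. How a parser decides when to abstain or add candidates (the role of Global, Local/norm, and Ratio/next) is not specified in the abstract and is not modelled here.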
