A Linguistics-Driven Approach to Statistical Parsing for Low-Resourced Languages

BOONKWAN Prachya, SUPNITHI Thepchai

doi:10.1587/transinf.2014dap0024

Abstract

Developing a practical and accurate statistical parser for low-resourced languages is a hard problem, because it requires large-scale treebanks, which are expensive and labor-intensive to build from scratch. Unsupervised grammar induction theoretically offers a way to overcome this hurdle by learning hidden syntactic structures from raw text automatically. The accuracy of grammar induction is still impractically low because frequent collocations of non-linguistically associable units are commonly found, resulting in dependency attachment errors. We introduce a novel approach to building a statistical parser for low-resourced languages by using language parameters as a guide for grammar induction. The intuition of this paper is: most dependency attachment errors are frequently used word orders which can be captured by a small prescribed set of linguistic constraints, while the rest of the language can be learned statistically by grammar induction. We then show that covering the most frequent grammar rules via our language parameters has a strong impact on the parsing accuracy in 12 languages.

Journal

IEICE Transactions on Information and Systems E98.D (5), 1045-1052, 2015

The Institute of Electronics, Information and Communication Engineers (IEICE)

Details

CRID: 1390001204377424896

NII Article ID: 130005067748

DOI: 10.1587/transinf.2014dap0024

ISSN: 17451361; 09168532

Web Site: https://www.jstage.jst.go.jp/article/transinf/E98.D/5/E98.D_2014DAP0024/_pdf

Text language code: en

Data source type

JaLC
Crossref
CiNii Articles

Abstract license flag: Disallowed
