文字間統計情報に基づく口語文字列の自動抽出

延澤 志保, 斎藤 博昭, 中西 正和

doi:10.5715/jnlp.8.3_39

Bibliographic information

Alternative title

Automatic Extraction of Oral Expressions Based on Letter Cooccurrence Statistics
Title reading (kana): モジカントウケイジョウホウニモトヅクコウゴモジレツノジドウチュウシュツ

Abstract

Research based on statistical information has become increasingly important in natural language processing. Raw corpora are attractive because large amounts of untagged text are easy to obtain. However, raw corpora often contain unknown words and phrases, which lowers experimental accuracy. Largely because of this problem, colloquial language has received little study, even though processing it is strongly needed for email and other tasks. In this paper we propose a simple method that obtains domain-specific character sequences from unrestricted text using statistical information alone. The method requires an untagged training corpus. Statistics drawn from this corpus are used to extract semantically meaningful character sequences automatically. In sequence-extraction experiments on email texts, the system successfully extracted significant semantic sequences from the test corpus. The recovered sequences include casual terms, proper nouns, and sequences whose written form has changed, such as lengthened pronunciations.
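The abstract does not say which inter-character statistic the method uses. The following is only a rough illustration of the general idea of extracting character sequences from letter co-occurrence statistics. It assumes pointwise mutual information (PMI) between adjacent characters, estimated from an untagged training corpus. A test string is cut wherever the PMI of an adjacent pair falls below a threshold.

This is not the authors' algorithm. The statistic (PMI), the function names `train_counts`, `pmi`, and `extract_sequences`, and the parameters `threshold` and `min_len` are all hypothetical.

```python
import math
from collections import Counter


def train_counts(corpus: list[str]) -> tuple[Counter, Counter, int, int]:
    """Count characters and adjacent character pairs in an untagged corpus.

    Assumption: adjacent-pair counts stand in for the paper's unspecified
    inter-character statistics.
    """
    uni: Counter = Counter()
    bi: Counter = Counter()
    for line in corpus:
        uni.update(line)  # one count per character
        bi.update(line[i:i + 2] for i in range(len(line) - 1))  # adjacent pairs
    return uni, bi, sum(uni.values()), sum(bi.values())


def pmi(pair: str, uni: Counter, bi: Counter, n_uni: int, n_bi: int) -> float:
    """Pointwise mutual information (log base 2) of an adjacent pair; -inf if unseen."""
    if bi[pair] == 0:
        return float("-inf")
    p_xy = bi[pair] / n_bi
    p_x = uni[pair[0]] / n_uni
    p_y = uni[pair[1]] / n_uni
    return math.log2(p_xy / (p_x * p_y))


def extract_sequences(text: str, uni: Counter, bi: Counter, n_uni: int, n_bi: int,
                      threshold: float = 1.0, min_len: int = 2) -> list[str]:
    """Cut text where adjacent-pair PMI < threshold (hypothetical parameter).

    Keep only the resulting runs of at least min_len characters.
    """
    segments = []
    start = 0
    for i in range(len(text) - 1):
        if pmi(text[i:i + 2], uni, bi, n_uni, n_bi) < threshold:
            segments.append(text[start:i + 1])  # weak link: close the current run
            start = i + 1
    segments.append(text[start:])
    return [s for s in segments if len(s) >= min_len]


if __name__ == "__main__":
    counts = train_counts(["ab", "ab", "cd", "cd"])
    print(extract_sequences("abcd", *counts))  # ['ab', 'cd']
```

In the toy run, the pair "bc" never occurs in training, so its PMI is negative infinity and the string splits there. Raising `threshold` above the PMI of "ab" and "cd" (3 bits each here) splits every pair, and the `min_len` filter then discards all of them. On real email text, the threshold would need tuning against held-out data.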

Published in

自然言語処理 (Journal of Natural Language Processing)

自然言語処理 8 (3), 39-57, 2001

Publisher: 一般社団法人 言語処理学会 (The Association for Natural Language Processing)

Details

CRID: 1390282679453456512

NII article ID: 10021991458

NII bibliographic ID: AN10472659

DOI: 10.5715/jnlp.8.3_39

ISSN: 21858314; 13407619

NDL bibliographic ID: 5840998

Web Site: https://ndlsearch.ndl.go.jp/books/R000000004-I5840998; http://www.jstage.jst.go.jp/article/jnlp1994/8/3/8_3_39/_pdf

Text language code: ja

Data source types

JaLC
NDL
Crossref
CiNii Articles

Abstract license flag: use not permitted

Cited by (1)

References (17)