Bibliographic information
- Alternative titles
  - Automatic Extraction of Oral Expressions Based on Letter Cooccurrence Statistics
  - モジ カン トウケイ ジョウホウ ニ モトヅク コウゴ モジレツ ノ ジドウ チュウシュツ (katakana reading of the Japanese title)
Abstract
Research based on statistical information has become increasingly important in natural language processing. Raw corpora are attractive because large amounts of untagged text are easy to obtain. However, raw corpora often contain unknown words and phrases, which lowers experimental accuracy. Because of this problem, colloquial language has received little attention, even though processing it is strongly needed for email and other tasks. In this paper, we propose a simple method for obtaining domain-specific sequences from unrestricted text using only statistical information. The method requires an untagged training corpus. Statistical information drawn from the training corpus is used to extract semantically meaningful character sequences automatically. In sequence-extraction experiments on email text, the method succeeded in extracting meaningful sequences from the test corpus. The extracted sequences include casual terms, proper nouns, and sequences with representational variation, such as lengthened pronunciations.
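The abstract does not spell out which co-occurrence statistic is used. The following is therefore a minimal sketch of the general idea, not the authors' method. It assumes the following:

- "Letter cooccurrence statistics" can be approximated by counts of adjacent character pairs in an untagged corpus.
- A boundary falls wherever a symmetric-conditional-probability style score P(b|a)·P(a|b) drops below a threshold.

The function names, the score, and the 0.5 threshold are all hypothetical choices made for illustration.

```python
from collections import Counter


def train_stats(corpus):
    """Count adjacent-character pairs in an untagged corpus (one string per line)."""
    bigrams = Counter()  # occurrences of each adjacent pair "ab"
    left = Counter()     # how often each character is followed by another character
    right = Counter()    # how often each character is preceded by another character
    for line in corpus:
        for a, b in zip(line, line[1:]):
            bigrams[a + b] += 1
            left[a] += 1
            right[b] += 1
    return bigrams, left, right


def cohesion(a, b, stats):
    """Hypothetical score P(b|a) * P(a|b) for the pair ab; 0.0 if the pair is unseen."""
    bigrams, left, right = stats
    n = bigrams[a + b]
    if n == 0:
        return 0.0
    return (n / left[a]) * (n / right[b])


def extract_sequences(text, stats, threshold=0.5, min_len=2):
    """Cut text wherever adjacent-character cohesion drops below threshold.

    Returns the resulting runs that are at least min_len characters long.
    """
    if not text:
        return []
    runs, current = [], text[0]
    for a, b in zip(text, text[1:]):
        if cohesion(a, b, stats) >= threshold:
            current += b  # strongly associated pair: extend the current run
        else:
            runs.append(current)  # weak association: treat as a boundary
            current = b
    runs.append(current)
    return [r for r in runs if len(r) >= min_len]


# Toy usage: a lengthened casual form observed before several different particles.
corpus = ["すごーいね", "すごーいよ", "すごーいな"]
stats = train_stats(corpus)
print(extract_sequences("今日すごーいね", stats))  # ['すごーい']
```

Multiplying the forward and backward conditional probabilities lowers the score for a character such as い that is followed by many different characters, so a boundary is placed after it. Plain pointwise mutual information, by contrast, tends to overrate pairs seen only once in small corpora. Any real system would need smoothing and a tuned threshold.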
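Published in
- Journal of Natural Language Processing (自然言語処理), Vol. 8, No. 3, pp. 39-57, 2001
- Publisher: The Association for Natural Language Processing (一般社団法人 言語処理学会)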
Details
- CRID: 1390282679453456512
- NII article ID: 10021991458
- NII bibliographic ID: AN10472659
- ISSN: 2185-8314, 1340-7619
- NDL bibliographic ID: 5840998
- Text language code: ja
- Data sources: JaLC, NDL, Crossref, CiNii Articles
- Abstract license flag: use not permitted