Automatic Extraction of Oral Expressions Based on Letter Cooccurrence Statistics

NOBESAWA SHIHO, SAITO HIROAKI, NAKANISHI MASAKAZU

doi:10.5715/jnlp.8.3_39

Bibliographic Information

Other Title

文字間統計情報に基づく口語文字列の自動抽出
モジカントウケイジョウホウニモトヅクコウゴモジレツノジドウチュウシュツ

Abstract

Research based on statistical information has become increasingly important in the field of natural language processing. Raw corpora are attractive because a certain amount of untagged text is easy to obtain. However, raw corpora often contain unknown words and phrases, which lowers the accuracy of experiments. Because of this problem, colloquial language has not been studied enough, even though processing it is strongly needed for email and other tasks. In this paper we propose a simple method for obtaining domain-specific sequences from unrestricted text using statistical information only. The method requires an untagged training corpus. Statistical information drawn from the training corpus is used to extract semantic character sequences automatically. In experiments on sequence extraction from email text, the method succeeded in extracting significant semantic sequences from the test corpus. The extracted sequences include casual terms, proper nouns, and sequences whose written form has been altered, for example by lengthening the pronunciation.
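
The abstract does not state which statistic or extraction rule the method uses, so the following is only a minimal sketch of how character co-occurrence counts from an untagged corpus could drive sequence extraction. The conditional-probability score, the `threshold` and `min_len` parameters, and all function names are assumptions made for illustration, not the authors' method.

```python
from collections import Counter


def train_bigram_counts(corpus_lines):
    """Count single characters and adjacent character pairs in untagged text."""
    unigrams, bigrams = Counter(), Counter()
    for line in corpus_lines:
        unigrams.update(line)                 # one count per character
        bigrams.update(zip(line, line[1:]))   # one count per adjacent pair
    return unigrams, bigrams


def link_score(a, b, unigrams, bigrams):
    """Estimate P(b follows a) from the training counts (assumed statistic)."""
    if unigrams[a] == 0:
        return 0.0  # character never seen in training
    return bigrams[(a, b)] / unigrams[a]


def extract_sequences(text, unigrams, bigrams, threshold=0.5, min_len=2):
    """Return maximal runs whose adjacent links all score >= threshold."""
    sequences = []
    if not text:
        return sequences
    current = text[0]
    for a, b in zip(text, text[1:]):
        if link_score(a, b, unigrams, bigrams) >= threshold:
            current += b                      # strong link: extend the run
        else:
            if len(current) >= min_len:
                sequences.append(current)     # weak link: close the run
            current = b
    if len(current) >= min_len:
        sequences.append(current)
    return sequences
```

Because the sketch works on raw characters and needs no dictionary or tagger, it can in principle pick up strings a word-based analyzer would treat as unknown. A real system would also need smoothing and a tuned threshold.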

Journal

Journal of Natural Language Processing

Journal of Natural Language Processing 8 (3), 39-57, 2001

The Association for Natural Language Processing

Details

CRID: 1390282679453456512

NII Article ID: 10021991458

NII Book ID: AN10472659

DOI: 10.5715/jnlp.8.3_39

ISSN: 21858314; 13407619

NDL BIB ID: 5840998

Web Site: https://ndlsearch.ndl.go.jp/books/R000000004-I5840998; http://www.jstage.jst.go.jp/article/jnlp1994/8/3/8_3_39/_pdf

Text Lang: ja

Data Source

JaLC
NDL
Crossref
CiNii Articles

Abstract License Flag: Disallowed
