A Method for Syntactic Behavior Analysis

Authors

    • HOGENHOUT Wide R.
    • Note: This work was carried out while the first author was affiliated with the Nara Institute of Science and Technology.

Abstract

We show how a treebank can be used to cluster words on the basis of their syntactic behavior. By extracting statistics on the structures in which words appear, it is possible to discover similarities and differences in usage between words with the same part of speech. We compare this clustering with conventional clustering based on co-occurrences. While conventional clustering can discover semantic similarities or the tendency of words to appear together, the method we present ignores these factors and focuses on syntactic usage, that is, the kinds of structures in which a word appears. We present a case study on prepositions, showing how they can be automatically subdivided by their syntactic behavior, and we discuss the appropriateness of such a subdivision. We have also carried out experiments to compare the quality of clusters quantitatively. To this end, we used clusters based on syntactic behavior to improve the estimation of the distribution of the dependency relation between words. Since such a distribution is necessarily estimated from sparse data, an entropy test can show how informative the classes are about syntactic usage. Finally, we discuss a number of ways in which a classification of words can contribute to applications of natural language processing.
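
The general idea in the abstract, grouping words by the distribution of structures they appear in, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the counts are invented toy data, the structure labels `NP-attach` and `VP-attach` are hypothetical, and Jensen-Shannon divergence with a greedy threshold grouping merely stands in for whatever similarity measure and clustering procedure the authors actually used.

```python
import math
from collections import Counter

# Invented toy counts: how often each preposition occurs in each
# (hypothetical) structure type. Real counts would come from a treebank.
COUNTS = {
    "of": Counter({"NP-attach": 9, "VP-attach": 1}),
    "for": Counter({"NP-attach": 8, "VP-attach": 2}),
    "into": Counter({"NP-attach": 1, "VP-attach": 9}),
    "onto": Counter({"NP-attach": 2, "VP-attach": 8}),
}

def normalize(counter):
    """Turn raw structure counts into a probability distribution."""
    total = sum(counter.values())
    return {structure: n / total for structure, n in counter.items()}

def entropy(dist):
    """Shannon entropy in bits of a distribution given as a dict."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two distributions."""
    keys = set(p) | set(q)
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl_to_mean(a):
        return sum(a[k] * math.log2(a[k] / m[k]) for k in a if a[k] > 0)

    return 0.5 * kl_to_mean(p) + 0.5 * kl_to_mean(q)

def cluster(dists, threshold):
    """Greedy grouping: a word joins the first cluster that already
    contains a word whose distribution is within `threshold` of its own."""
    clusters = []
    for word in sorted(dists):
        for group in clusters:
            if any(js_divergence(dists[word], dists[other]) < threshold
                   for other in group):
                group.append(word)
                break
        else:
            clusters.append([word])
    return clusters

DISTS = {word: normalize(counts) for word, counts in COUNTS.items()}
CLUSTERS = cluster(DISTS, threshold=0.1)
print(CLUSTERS)
```

On this toy data the prepositions that mostly attach to noun phrases end up in one group and those that mostly attach to verb phrases in another. The `entropy` helper only hints at the kind of quantity an entropy-based comparison would compute; it does not reproduce the paper's test.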

Published in

  • 自然言語処理 = Journal of Natural Language Processing 5(2), 25-46, 1998-04-10

    一般社団法人 言語処理学会 (The Association for Natural Language Processing)

Codes

  • NII Article ID (NAID)
    10008827576
  • NII Bibliographic ID (NCID)
    AN10472659
  • Text language code
    ENG
  • Material type
    ART
  • ISSN
    13407619
  • Data source
    CJP bibliography, J-STAGE