Noun Clustering Based on Highly Similar Subclusters (Symposium on "Language Processing Technologies for Realizing Universal Communication") [in Japanese]
Extraction of Noun Synonyms and Other Related Words Using Dense-Subclusters

Abstract

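Pantel et al. proposed CBC, a clustering method that first builds highly similar subclusters, then performs stable merging based on those subclusters, followed by re-merging that takes word senses into account. Building on CBC, this study performs noun clustering using dependency patterns, with the aim of acquiring clusters of synonyms and near-synonyms. We propose a Japanese noun clustering method that replaces CBC's existing similarity formula with a similarity measure based on probability distributions (Jensen-Shannon) and uses a new scoring method to select subcluster candidates. Using one year of Mainichi Shimbun articles (1994), we compare the similarity formula used in CBC with Jensen-Shannon and show the effectiveness of Jensen-Shannon. We also propose and compare several variants of the scoring formula to find a scoring method that selects subcluster candidates appropriately.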

In this paper we propose a noun clustering approach based on CBC, proposed by Pantel et al. CBC is a clustering approach that carefully extracts clusters by first finding sub-clusters, called committees, whose members share the same meaning, and then tries to extract unknown clusters from the remaining elements. In preliminary experiments on Japanese noun clustering, however, we found that CBC does not work well in two respects: its measure of basic similarity between words represented as context vectors, and its scoring method for deciding which sub-clusters to merge. To address these problems, we propose applying the Jensen-Shannon formula as the similarity measure, together with a new scoring method. In experiments constructing sub-clusters of Japanese nouns from newspaper articles, we show that the proposed approaches outperform those of CBC in clustering accuracy.
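
The Jensen-Shannon measure named in the abstract can be illustrated with a minimal Python sketch. It assumes each noun is represented by a probability distribution over the dependency patterns it co-occurs with. The counts, pattern labels, and function names below are hypothetical, and this record does not specify the authors' exact formulation (for example, the logarithm base, or how the divergence is converted into a similarity).

```python
import math


def to_distribution(counts):
    """Normalize raw co-occurrence counts into a probability distribution."""
    total = sum(counts.values())
    return {pattern: c / total for pattern, c in counts.items()}


def kl_divergence(p, q):
    """KL(p || q) in bits; q must be nonzero wherever p is nonzero."""
    return sum(pv * math.log2(pv / q[k]) for k, pv in p.items() if pv > 0)


def jensen_shannon(p, q):
    """Jensen-Shannon divergence between two distributions.

    With base-2 logarithms the value lies in [0, 1]:
    0 for identical distributions, 1 for distributions with no shared support.
    """
    keys = set(p) | set(q)
    # Mixture distribution m = (p + q) / 2, defined over the union of patterns.
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Toy data (made up for illustration): dependency-pattern counts per noun.
dog = to_distribution({"OBJ-of:keep": 4, "SUBJ-of:bark": 5, "SUBJ-of:run": 1})
cat = to_distribution({"OBJ-of:keep": 5, "SUBJ-of:meow": 4, "SUBJ-of:run": 1})
car = to_distribution({"OBJ-of:drive": 7, "SUBJ-of:run": 3})

# A lower divergence means the two nouns occur in more similar contexts.
print("dog vs cat:", jensen_shannon(dog, cat))
print("dog vs car:", jensen_shannon(dog, car))
```

Unlike plain KL divergence, this measure is symmetric and stays finite when one noun never occurs with a pattern the other does, which is convenient for sparse co-occurrence data.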

Journal

IEICE technical report. Natural language understanding and models of communication 108(408), 31-35, 2009-01-19

The Institute of Electronics, Information and Communication Engineers

References:  8

Codes

  • NII Article ID (NAID) :
    110007138259
  • NII NACSIS-CAT ID (NCID) :
    AN10091225
  • Text Lang :
    JPN
  • Article Type :
    ART
  • ISSN :
    09135685
  • NDL Article ID :
    9794021
  • NDL Source Classification :
    ZN33 (Science and technology -- Electrical engineering and electrical machinery industry -- Electronics and telecommunications)
  • NDL Call No. :
    Z16-940
  • Databases :
    CJP  NDL  NII-ELS 
