A Bayesian Similarity Measure for Large-scale Calculation of Distributional Similarities (大規模分布類似度計算のためのベイズ手法を用いた新しい類似尺度)

Jun'ichi Kazama (風間 淳一), Stijn De Saeger (ステインデ・サーガ), Kow Kuroda (黒田 航), Masaki Murata (村田 真樹), Kentaro Torisawa (鳥澤 健太郎)

Bibliographic information

Alternative titles

ダイキボブンプルイジドケイサンノタメノベイズシュホウオモチイタアタラシイルイジシャクド
A Bayesian Similarity Measure for Large-scale Calculation of Distributional Similarities

Abstract

Existing word similarity measures are not robust to data sparseness because they rely on point estimates of words' context profiles obtained from a limited amount of data. This paper proposes a Bayesian method for robust calculation of distributional word similarity. The method computes similarity as the expectation of a base similarity measure under the distribution of context profiles obtained by Bayesian estimation. When the context profiles are multinomial distributions, the priors are Dirichlet distributions, and the base measure is the Bhattacharyya coefficient, this expectation has an analytical form that can be computed efficiently. In experiments on word similarity calculation for a large-scale Japanese vocabulary, the proposed measure outperforms other well-known similarity measures.
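The abstract does not give the closed form, so the following is only a minimal sketch of how such an expectation could be computed. It rests on assumptions that are not taken from the record: the two words' posteriors are treated as independent, a symmetric Dirichlet prior with a hypothetical hyperparameter `prior` is used, and the context vectors are dense lists of counts. Under those assumptions, the standard Dirichlet moment E[p_k^(1/2)] = Γ(α_0) Γ(α_k + 1/2) / (Γ(α_0 + 1/2) Γ(α_k)) gives the expected Bhattacharyya coefficient as a sum of products of such terms. The function names are illustrative and the paper's exact formulation may differ.

```python
import math


def expected_sqrt(alpha_k, alpha_0):
    """E[sqrt(p_k)] for p ~ Dirichlet(alpha), where alpha_0 = sum(alpha).

    Uses log-gamma to avoid overflow for large counts.
    """
    return math.exp(
        math.lgamma(alpha_0) - math.lgamma(alpha_0 + 0.5)
        + math.lgamma(alpha_k + 0.5) - math.lgamma(alpha_k)
    )


def expected_bhattacharyya(counts_1, counts_2, prior=1.0):
    """Expected Bhattacharyya coefficient under two independent Dirichlet posteriors.

    Assumption: a symmetric Dirichlet prior `prior` on every context,
    so each posterior parameter is count + prior.
    """
    if len(counts_1) != len(counts_2):
        raise ValueError("count vectors must cover the same contexts")
    post_1 = [c + prior for c in counts_1]
    post_2 = [c + prior for c in counts_2]
    total_1, total_2 = sum(post_1), sum(post_2)
    return sum(
        expected_sqrt(a, total_1) * expected_sqrt(b, total_2)
        for a, b in zip(post_1, post_2)
    )


def point_bhattacharyya(counts_1, counts_2):
    """Bhattacharyya coefficient from maximum-likelihood point estimates."""
    n_1, n_2 = sum(counts_1), sum(counts_2)
    return sum(math.sqrt((a / n_1) * (b / n_2)) for a, b in zip(counts_1, counts_2))


if __name__ == "__main__":
    # Two words each observed once, in the same context.
    sparse_1, sparse_2 = [1, 0], [1, 0]
    print(point_bhattacharyya(sparse_1, sparse_2))
    print(expected_bhattacharyya(sparse_1, sparse_2))
```

With a single observation per word, the point estimate reports a perfect match, while the expected value is discounted because the posteriors are still wide. As counts grow, the two values converge, which is the robustness to sparse data that the abstract describes.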

Journal

IPSJ Journal (情報処理学会論文誌)

IPSJ Journal 52 (12), 3349-3362, 2011-12-15

Tokyo: Information Processing Society of Japan

Details

CRID: 1050564287854162432

NII Article ID: 110008719913

NII Bibliographic ID: AN00116647

ISSN: 18827764; 18827837; 03875806

NDL Bibliographic ID: 023426495

Web Site: http://id.nii.ac.jp/1001/00079541/; https://ndlsearch.ndl.go.jp/books/R000000004-I023426495

Text language code: ja

Material type: journal article

Data source type

IRDB
NDL
CiNii Articles
