頂点間の類似度の足し合わせによるリンク予測精度の改善

  • 元田 剛史
    東京工業大学 大学院情報理工学研究科 計算工学専攻
  • 村田 剛志
    東京工業大学 大学院情報理工学研究科 計算工学専攻

書誌事項

タイトル別名
  • Improving the accuracy of link prediction by combining similarities of node pairs

抄録

Recently, network analysis has been intensively investigated in several fields of science. Link prediction is a problem of predicting the existence of a link between two entities based on observed links, and it is one of the popular link mining tasks. Although many link prediction methods have been proposed, they have their merits and demerits. In this paper, we present two topics as follows: 1) In order to obtain the strategies of selecting the best link prediction methods, we perform experiments of six link prediction methods (Common Neighbors (CN) , Jaccard's Coefficient (JC) , Adamic/Adar (AA) , Shortest Path (SP) , Preferential Attachment (PA) and Hierarchical Random Graph (HRG) ) for 39 real networks. 2) We propose a new similarity that is the summation of similarities based on the logistic regression. We used 10-fold cross validation and bagging for model selection of proposed method. We estimate the accuracy and computation time of HRG, proposed method (bagging) and proposed method (10-fold cross validation) for 28 data sets. As a result of 1) , CN, JC and AA achieve good performance for the networks that has higher clustering coefficient than 0.4. SP achieves good performance for the network that has higher average shortest path length than 3. PA underperforms the random predictor for the network has lower variance of degrees than 0.5. HRG performs consistently well. As a result of 2) , accuracy of proposed methods (both of bagging and 10-fold cross validation) are reached higher than the accuracy of HRG for 17 data sets and finishes the calculation faster than HRG. Proposed methods perform good accuracy for social network, citation network, dictionary network, biological network and transfer network (journey). Proposed methods underperform for trade network, circuit network, and food web network. Sometimes, proposed method (bagging) reaches higher accuracy than the accuracy of proposed method (10-fold cross validation). Proposed method (10-fold cross validation) finishes the calculation faster than proposed method (bagging). In conclusion, proposed methods finish the calculation faster than HRG and accuracy of proposed methods reaches higher than HRG.

収録刊行物

被引用文献 (1)*注記

もっと見る

参考文献 (5)*注記

もっと見る

関連プロジェクト

もっと見る

キーワード

詳細情報 詳細情報について

問題の指摘

ページトップへ