Improving the accuracy of link prediction by combining similarities of node pairs

  • Motoda Takeshi
    Department of Computer Science, Graduate School of Information Science and Engineering, Tokyo Institute of Technology
  • Murata Tsuyoshi
    Department of Computer Science, Graduate School of Information Science and Engineering, Tokyo Institute of Technology

Bibliographic Information

Other Title
  • 頂点間の類似度の足し合わせによるリンク予測精度の改善

Abstract

Recently, network analysis has been intensively investigated in several fields of science. Link prediction is a problem of predicting the existence of a link between two entities based on observed links, and it is one of the popular link mining tasks. Although many link prediction methods have been proposed, they have their merits and demerits. In this paper, we present two topics as follows: 1) In order to obtain the strategies of selecting the best link prediction methods, we perform experiments of six link prediction methods (Common Neighbors (CN) , Jaccard's Coefficient (JC) , Adamic/Adar (AA) , Shortest Path (SP) , Preferential Attachment (PA) and Hierarchical Random Graph (HRG) ) for 39 real networks. 2) We propose a new similarity that is the summation of similarities based on the logistic regression. We used 10-fold cross validation and bagging for model selection of proposed method. We estimate the accuracy and computation time of HRG, proposed method (bagging) and proposed method (10-fold cross validation) for 28 data sets. As a result of 1) , CN, JC and AA achieve good performance for the networks that has higher clustering coefficient than 0.4. SP achieves good performance for the network that has higher average shortest path length than 3. PA underperforms the random predictor for the network has lower variance of degrees than 0.5. HRG performs consistently well. As a result of 2) , accuracy of proposed methods (both of bagging and 10-fold cross validation) are reached higher than the accuracy of HRG for 17 data sets and finishes the calculation faster than HRG. Proposed methods perform good accuracy for social network, citation network, dictionary network, biological network and transfer network (journey). Proposed methods underperform for trade network, circuit network, and food web network. Sometimes, proposed method (bagging) reaches higher accuracy than the accuracy of proposed method (10-fold cross validation). Proposed method (10-fold cross validation) finishes the calculation faster than proposed method (bagging). In conclusion, proposed methods finish the calculation faster than HRG and accuracy of proposed methods reaches higher than HRG.

Journal

Citations (1)*help

See more

References(5)*help

See more

Related Projects

See more

Details 詳細情報について

Report a problem

Back to top