A Study of Learning a Sparse Metric Matrix using <i>l</i><sub>1</sub> Regularization Based on Supervised Learning [in Japanese]

Author(s)

    • MIKAWA Kenta (三川 健太)
      Department of Industrial and Management Systems Engineering, School of Creative Science and Engineering, Waseda University
    • KOBAYASHI Manabu (小林 学)
      Department of Information Management Science, Shonan Institute of Technology
    • GOTO Masayuki (後藤 正幸)
      Department of Industrial and Management Systems Engineering, School of Creative Science and Engineering, Waseda University

Abstract

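Metric learning is known as a methodology for learning a distance structure that takes the statistical characteristics of the data into account, and various methods have been proposed for it. Metric learning estimates the Mahalanobis matrix (hereafter, the metric matrix) of the Mahalanobis distance, but the number of parameters is known to be proportional to the square of the dimensionality of the input data. The amount of data required for learning grows in the same way, so a very large amount of data must be prepared for high-dimensional data. In this study, we adopt an approach based on <i>l</i><sub>1</sub> regularization to reduce the number of parameters of the metric matrix, and show how to derive the optimal metric matrix using ADMM (Alternating Direction Method of Multipliers). We apply the proposed method to a high-dimensional, sparse dataset and to low-dimensional, dense datasets, and demonstrate its effectiveness.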

In this paper, we focus on classification problems based on the vector space model. Distance metric learning, which estimates a metric matrix suited to classification through an iterative optimization procedure, is known to be an effective method for such problems. However, distance metric learning on high-dimensional data tends to overfit the training dataset and to require long computation times. In addition, the number of parameters to be estimated is proportional to the square of the input data dimension. Therefore, as the input dimension grows, the amount of training data needed to obtain a sufficiently accurate metric matrix becomes enormous. These problems arise especially when analyzing high-dimensional, sparse data such as document data and purchase history data stored on e-commerce (EC) sites. To avoid these problems, we propose a method of <i>l</i><sub>1</sub> regularized distance metric learning that uses the alternating direction method of multipliers (ADMM) algorithm. The effectiveness of the proposed method is demonstrated by classification experiments using newspaper article data, which is high-dimensional and sparse, and data from the UCI Machine Learning Repository, which is low-dimensional and dense.
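To make the ingredients named in the abstract concrete, the following is a minimal Python sketch, not the authors' algorithm. It shows a squared Mahalanobis-type distance under a metric matrix M, the number of free parameters of a symmetric d x d metric matrix (d(d+1)/2, which grows with the square of d), and ADMM applied to a toy l1-regularized problem whose solution is known in closed form. The function names, the toy objective 0.5 * ||M - A||_F^2 + lam * ||M||_1, and the parameters rho and n_iter are assumptions made for illustration; the objective actually optimized in the paper is not given in this record.

```python
import numpy as np


def mahalanobis_sq(x, y, M):
    """Squared distance (x - y)^T M (x - y) under a metric matrix M."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(diff @ M @ diff)


def n_metric_params(d):
    """Free parameters of a symmetric d x d metric matrix: d(d+1)/2."""
    return d * (d + 1) // 2


def soft_threshold(A, tau):
    """Elementwise soft-thresholding, the proximal operator of tau * ||.||_1."""
    return np.sign(A) * np.maximum(np.abs(A) - tau, 0.0)


def admm_l1_toy(A, lam, rho=1.0, n_iter=200):
    """ADMM on a toy problem (an assumption, not the paper's objective):

        minimize 0.5 * ||M - A||_F^2 + lam * ||Z||_1  subject to  M = Z

    The variable is split into a smooth part (M) and a sparse part (Z),
    with U as the scaled dual variable.
    """
    A = np.asarray(A, dtype=float)
    Z = np.zeros_like(A)
    U = np.zeros_like(A)
    for _ in range(n_iter):
        # M-update: minimizer of the quadratic term plus the penalty term
        M = (A + rho * (Z - U)) / (1.0 + rho)
        # Z-update: soft-thresholding sets small entries exactly to zero
        Z = soft_threshold(M + U, lam / rho)
        # Scaled dual update on the constraint M = Z
        U = U + M - Z
    return Z
```

In this toy setting the result should agree with soft_threshold(A, lam), which gives a simple check that the iteration behaves as intended. The soft-thresholding step is what drives entries of the matrix to exactly zero, which is how an l1 penalty reduces the effective number of parameters.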

Journal

  • Journal of Japan Industrial Management Association 66(3), 230-239, 2015

    Japan Industrial Management Association

Codes

  • NII Article ID (NAID)
    130005107068
  • NII NACSIS-CAT ID (NCID)
    AN10561806
  • Text Lang
    JPN
  • ISSN
    1342-2618
  • NDL Article ID
    026816796
  • NDL Call No.
    Z4-298
  • Data Source
    NDL, J-STAGE