Clustering by Clique Enumeration and Data Cleaning like Method
(Original Japanese title: データ研磨によるクリーク列挙クラスタリング)




Abstract

Recent developments in information technology have made big data analysis increasingly common, and increasingly important, in research and industry. Beyond its sheer size, however, big data poses a major difficulty: diversity. Diverse data are usually composed of many groups, each with its own characteristics, so analyses based on global structure usually fail to capture the details of the data; analyzing such data correctly starts with uncovering its group structure. Existing clustering and pattern mining algorithms aim to extract this structure, but they typically produce a huge number of solutions, find too few groups, generate many similar groups, or return groups of widely varying sizes, and they often require long computation times. In this paper we address the graph clustering problem and ask what kind of graph makes its group structure easy to capture. From this discussion we define a graph class that models noiseless, clarified data, and we introduce data polishing (データ研磨), a data-cleaning-like method that transforms a noisy raw data graph into a clarified graph without breaking its group structure. We also present mathematical and algorithmic properties of this graph class and of the polishing algorithm, as a foundation for future research on "clarified data".
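
The abstract describes data polishing only at a high level. The sketch below illustrates the general idea under explicit assumptions that this abstract does not state: two vertices are linked in the polished graph when the Jaccard similarity of their closed neighborhoods is at least a threshold `theta` (assumed to be 0.5 here), the rule is reapplied until the graph stops changing, and groups are then read off as maximal cliques. Clique enumeration uses Bron-Kerbosch with pivoting, a standard algorithm substituted here, not necessarily the one used in the report. The names `polish`, `maximal_cliques`, and `closed_neighborhoods` are hypothetical.

```python
from itertools import combinations


def closed_neighborhoods(n, edges):
    """Return N[v] = {v} plus the neighbors of v, for vertices 0..n-1."""
    nbr = [{v} for v in range(n)]
    for u, v in edges:
        nbr[u].add(v)
        nbr[v].add(u)
    return nbr


def polish(n, edges, theta=0.5, max_rounds=20):
    """Hypothetical polishing step (assumed Jaccard rule, not from the report).

    Link u and v iff len(N[u] & N[v]) / len(N[u] | N[v]) >= theta, and
    repeat until the edge set is stable or max_rounds is reached.
    """
    current = {tuple(sorted(e)) for e in edges}
    for _ in range(max_rounds):
        nbr = closed_neighborhoods(n, current)
        polished = set()
        for u, v in combinations(range(n), 2):
            # Closed neighborhoods contain the vertex itself, so the union is never empty.
            jaccard = len(nbr[u] & nbr[v]) / len(nbr[u] | nbr[v])
            if jaccard >= theta:
                polished.add((u, v))
        if polished == current:
            break  # fixed point: polishing no longer changes the graph
        current = polished
    return current


def maximal_cliques(n, edges):
    """Enumerate maximal cliques with Bron-Kerbosch plus pivoting."""
    adj = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(sorted(r))  # r cannot be extended: it is maximal
            return
        # Choose the pivot that covers the most candidates to prune branches.
        pivot = max(p | x, key=lambda w: len(adj[w] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(range(n)), set())
    return cliques


if __name__ == "__main__":
    # Toy data: groups {0..4} and {5..9}; edge (0, 1) is missing (noise)
    # and edge (4, 5) is a spurious link between the groups (noise).
    group_a = [e for e in combinations(range(5), 2) if e != (0, 1)]
    group_b = list(combinations(range(5, 10), 2))
    raw = group_a + group_b + [(4, 5)]
    print(sorted(maximal_cliques(10, raw)))
    # [[0, 2, 3, 4], [1, 2, 3, 4], [4, 5], [5, 6, 7, 8, 9]]
    print(sorted(maximal_cliques(10, polish(10, raw))))
    # [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]]
```

On this toy graph the raw data yield four overlapping maximal cliques, echoing the "many similar solutions" problem the abstract mentions, while the polished graph yields exactly the two intended groups. The all-pairs loop costs O(n²) per round and is meant only to show the rule, not to scale to large data.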


Published in

  • IPSJ SIG Technical Report: Algorithms (AL)

    2014-AL-146(2), pp. 1-8, 2014-01-23

    Information Processing Society of Japan (IPSJ)


Identifiers

  • NII Article ID (NAID)
    110009659424
  • NII Bibliographic ID (NCID)
    AN1009593X
  • Text language code
    JPN
  • Document type
    Technical Report
  • ISSN
    0919-6072
  • Data source
    NII-ELS, IPSJ