Fast Extraction of Asynchronous and Useful Sequential Patterns Based on Information Content and Frequency

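  • 村田 順平
    Department of Computer Science and Media Engineering, Interdisciplinary Graduate School of Medicine and Engineering (Education Division), University of Yamanashi
  • 岩沼 宏治
    Interdisciplinary Graduate School of Medicine and Engineering (Research Division), University of Yamanashi
  • 大塚 尚貴
    Department of Computer Science and Media Engineering, Interdisciplinary Graduate School of Medicine and Engineering (Education Division), University of Yamanashi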

Bibliographic Information

Alternative Title
  • Mining Asynchronous Interesting Sequential Patterns based on Frequency and Self-Information

Abstract

In this paper, we propose new methods and present a system, called IFMAP, for extracting interesting patterns from long sequential data based on frequency and self-information, and we experimentally evaluate the proposed methods on a newspaper article corpus.

Sequential data mining methods based on frequency have been studied intensively. These methods, however, are neither effective nor valuable for applications in which almost all high-frequency patterns should be regarded as meaningless noise.

The concept of information gain is important for suppressing such noisy patterns, and its integration with a frequency criterion has already been studied. Yang et al. presented a sequential mining system, InfoMiner, which finds periodic synchronous patterns that are interesting and well balanced from the viewpoints of both frequency and self-information.

In this paper, we refine and extend the InfoMiner technology in the following respects. First, our method can handle ordinary patterns, i.e., asynchronous and non-periodic ones, by using a sliding-window mechanism, whereas InfoMiner cannot. Second, we give several combination measures for choosing valuable patterns based on frequency and self-information, while InfoMiner has just one measure, which we show in this paper to be neither appropriate nor effective for newspaper article corpora. Third, we propose a new unified method for pruning the search space of sequential data mining, which can be applied uniformly to any of the combination measures proposed here.

We conduct experiments to evaluate the effectiveness and efficiency of the proposed method with respect to runtime and the number of noisy patterns excluded.
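To make the ideas in the abstract concrete, the following is a minimal Python sketch of how frequency and self-information might be combined under a sliding window. It is not the IFMAP implementation: the function names, the subsequence-based notion of occurrence, the estimate of event probabilities from the sequence itself, and the product used as the combination measure are all assumptions made for illustration, and the abstract does not specify any of them.

```python
import math
from collections import Counter


def self_information(sequence):
    """Self-information -log2 p(e) of each event, with p(e) estimated
    from relative frequencies in the sequence (an assumption)."""
    counts = Counter(sequence)
    total = len(sequence)
    return {e: -math.log2(c / total) for e, c in counts.items()}


def occurs_in(window, pattern):
    """True if `pattern` appears in `window` as an ordered,
    not necessarily contiguous, subsequence."""
    it = iter(window)
    # Each `any` call consumes the iterator up to the next match,
    # so the events of the pattern must appear in order.
    return all(any(x == p for x in it) for p in pattern)


def window_frequency(sequence, pattern, width):
    """Number of sliding windows of size `width` that contain `pattern`."""
    n = len(sequence)
    return sum(
        occurs_in(sequence[i:i + width], pattern)
        for i in range(max(n - width + 1, 0))
    )


def score(sequence, pattern, width):
    """Hypothetical combination measure: window frequency multiplied by
    the summed self-information of the events in the pattern."""
    info = self_information(sequence)
    if any(e not in info for e in pattern):
        return 0.0  # a pattern with an unseen event never occurs
    pattern_info = sum(info[e] for e in pattern)
    return window_frequency(sequence, pattern, width) * pattern_info


if __name__ == "__main__":
    seq = list("aabaab")
    print(window_frequency(seq, ("a", "b"), 3))
    print(score(seq, ("a", "b"), 3))
```

Under a product measure like this, a pattern made only of very common events receives a low score even when it occurs in many windows, which is the kind of noise suppression the abstract describes. Other combinations of the two quantities can be substituted in `score` without changing the rest of the sketch.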
