Mining Infrequent Patterns of Two Frequent Substrings from a Single Set of Biological Sequences

Abstract

This paper considers the mining of infrequent patterns from biological sequences. One such mining algorithm, FPCS (Finding Peculiar Composite Strings), has been proposed; in it, two substrings x and y are determined from the given data and their concatenation xy is evaluated in a model-driven manner. Although its effectiveness has already been shown, it requires a background set of sequences in addition to the target set. In this paper, we propose another approach to infrequent patterns which, given a single set of sequences, finds string patterns composed of two substrings that are frequent in that set. Because it needs no background set, the proposed approach is simpler than FPCS. Its effectiveness is evaluated using biological features, such as RNA, of popular bacterial DNA sequences. For B. subtilis and C. perfringens, the proposed approach finds RNA regions as well as FPCS does, whereas for E. coli and S. enterica it fails to do so, because FPCS is more fine-grained than the proposed approach.
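The general idea of looking, within a single set of sequences, for two frequent substrings whose concatenation is rare can be sketched as follows. This is only a naive illustration and not the paper's algorithm: the fixed substring length `k`, the thresholds `min_part_count` and `max_concat_count`, and both function names are assumptions made for this example, and the abstract does not say how the authors select or score patterns.

```python
from collections import Counter
from itertools import product


def count_substrings(sequences, length):
    """Count every substring of the given length across all sequences."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - length + 1):
            counts[seq[i:i + length]] += 1
    return counts


def infrequent_concatenations(sequences, k, min_part_count, max_concat_count):
    """Return tuples (x, y, count_x, count_y, count_xy) in which x and y are
    frequent k-length substrings but the concatenation xy is rare.

    Hypothetical criterion for illustration only: "frequent" means at least
    min_part_count occurrences, "rare" means at most max_concat_count.
    """
    part_counts = count_substrings(sequences, k)
    concat_counts = count_substrings(sequences, 2 * k)
    frequent = sorted(s for s, c in part_counts.items() if c >= min_part_count)
    results = []
    for x, y in product(frequent, repeat=2):
        c_xy = concat_counts.get(x + y, 0)
        if c_xy <= max_concat_count:
            results.append((x, y, part_counts[x], part_counts[y], c_xy))
    return results


if __name__ == "__main__":
    # Toy input, not real genomic data.
    toy = ["ACGTACGTAC", "GTACGTACGT"]
    for row in infrequent_concatenations(toy, k=2, min_part_count=4, max_concat_count=0):
        print(row)
```

Only the input set itself is counted here, which reflects the single-set setting described above; no background set is involved.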

Details

  • CRID
    1571698602834295296
  • NII Article ID
    110009586917
  • NII Bibliographic ID
    AN10505667
  • ISSN
    09196072
  • Text language code
    en
  • Data source type
    • CiNii Articles
    • KAKEN
