Mining Infrequent Patterns of Two Frequent Substrings from a Single Set of Biological Sequences

Daisuke Ikeda

Abstract

This paper considers mining infrequent patterns from biological sequences. One such mining algorithm, FPCS (Finding Peculiar Composite Strings), has been proposed; in it, two substrings x and y are determined from the given data and their concatenation xy is evaluated in a model-driven manner. Although its effectiveness has already been shown, it requires a background set of sequences in addition to the target set. In this paper, we propose another approach to infrequent patterns which, given a single set of sequences, finds string patterns composed of two substrings that are frequent in the set. Because it needs only one set, the proposed approach is simpler than FPCS. Its effectiveness is evaluated using biological features, such as RNA, of well-known bacterial DNA sequences. For B. subtilis and C. perfringens, the proposed approach finds RNA regions as well as FPCS does, whereas it fails to do so for E. coli and S. enterica because FPCS is more fine-grained than the proposed approach.

Journal

情報処理学会研究報告. MPS, 数理モデル化と問題解決研究報告 (IPSJ SIG Technical Reports. MPS, Mathematical Modeling and Problem Solving) 2013 (3), 1-4, 2013-07-15

Information Processing Society of Japan (一般社団法人情報処理学会)

Details

CRID: 1571698602834295296

NII Article ID: 110009586917

NII Bibliographic ID: AN10505667

ISSN: 09196072

Text language code: en

Data source type

CiNii Articles
KAKEN
