Mining Infrequent Patterns of Two Frequent Substrings from a Single Set of Biological Sequences
- Daisuke Ikeda
- Department of Informatics, Kyushu University
Abstract
This paper considers the mining of infrequent patterns from biological sequences. FPCS (Finding Peculiar Composite Strings) was previously proposed as such a mining algorithm: two substrings x and y are determined from the given data, and their concatenation xy is evaluated in a model-driven manner. Although its effectiveness has already been shown, FPCS requires a background set of sequences in addition to the target set. In this paper, we propose another approach to infrequent patterns which, given a single set of sequences, finds string patterns consisting of two substrings that are each frequent in that set. Because it needs no background set, the proposed approach is simpler than FPCS. We evaluate its effectiveness using biological features, such as RNA regions, of widely studied bacterial DNA sequences. For B. subtilis and C. perfringens, the proposed approach finds RNA regions as well as FPCS does. For E. coli and S. enterica it fails to do so, because FPCS has finer granularity than the proposed approach.
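The abstract describes the single-set idea only at a high level: patterns xy whose parts x and y are each frequent in one set of sequences, while xy itself is infrequent. The sketch below illustrates that idea in a minimal form. It is not the author's algorithm, and it rests on several hypothetical simplifications:

- x and y have a fixed length `k`.
- Occurrences are counted with overlaps across all sequences.
- `min_freq` and `max_freq` are illustrative thresholds.
- xy must occur at least once so that it is an observed pattern.

```python
from collections import Counter
from itertools import product


def count_substrings(seqs, k):
    """Count overlapping occurrences of every length-k substring over all sequences."""
    counts = Counter()
    for s in seqs:
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts


def infrequent_pairs(seqs, k, min_freq, max_freq):
    """Return (x, y, count of xy) for frequent x, y whose concatenation xy is rare.

    Hypothetical criteria (not from the paper): x and y each occur at least
    `min_freq` times, and xy occurs between 1 and `max_freq` times.
    """
    single = count_substrings(seqs, k)      # counts of candidate parts x, y
    pair = count_substrings(seqs, 2 * k)    # counts of concatenations xy
    frequent = sorted(w for w, c in single.items() if c >= min_freq)
    result = []
    for x, y in product(frequent, repeat=2):
        c = pair.get(x + y, 0)
        if 1 <= c <= max_freq:              # observed, but infrequent
            result.append((x, y, c))
    return result


if __name__ == "__main__":
    seqs = ["AAAA", "AAAA", "CCCC", "CCCC", "AACC"]
    print(infrequent_pairs(seqs, k=2, min_freq=3, max_freq=1))
```

In this toy input, AA and CC are each frequent (7 occurrences), but AACC occurs only once, so the script prints `[('AA', 'CC', 1)]`. A real miner would allow variable-length substrings and use index structures such as suffix trees rather than enumerating all pairs.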
Published in
- IPSJ SIG Technical Report, MPS (Mathematical Modeling and Problem Solving), 2013 (3), 1-4, 2013-07-15
Information Processing Society of Japan (IPSJ)
Details
- CRID: 1571698602834295296
- NII Article ID: 110009586917
- NII Bibliographic ID: AN10505667
- ISSN: 0919-6072
- Full-text language code: en
- Data sources: CiNii Articles, KAKEN