近傍ハッシュ法によるエラー許容頻出パターン列挙(一般セッション3) Neighborhood hashing for enumerating all frequent patterns allowing errors

Abstract

ゲノム配列から,ある機能発現に関わる遺伝子を発見したいという要望がある.このような遺伝子は対象となるゲノム配列集合においてエラーを含んだ形で類出する配列パターンの形をとることがしばしばある.本研究では,エラーを含む文字列集合から,ある長さの頻出部分文字列パターンを高速に全列挙するアルゴリズムを提案する.エラーの影響から通常のパターン列挙と異なり,入力文字列には現れないパターンも列挙の対象となる.提案手法ではパターン頻出性の必要条件を利用して最小限の候補パターンをハッシュ格納することにより,高速な全列挙を実現する.

We propose a practically fast algorithm that enumerates all m-length substring patterns appearing in at least θ sequences among a given set of string sequences, where at most k errors are allowed for each appearance. The problem of enumerating such substring patterns is derived from the genome science, where frequent substring patterns allowing errors are candidates of a gene related to a certain function. From this context, some pattern should be enumerated even if it does not appear in any sequence, because it may match a sufficient number of sequences by allowing errors. In order to prevent overlooking such potential patterns, we propose a hash-based enumeration algorithm. The algorithm stores not only a substring pattern appearing in a sequence but also its neighboring patterns. By using several techniques/conditions to exclude non-frequent patterns, our algorithm achieves an efficient enumeration of frequent substring patterns allowing errors.

Journal

IPSJ SIG technical reports   [List of Volumes]

IPSJ SIG technical reports 2008(58), 63-66, 2008-06-19  [Table of Contents]

Information Processing Society of Japan (IPSJ)

References:  3

You must have a user ID to see the references.If you already have a user ID, please click "Login" to access the info.New users can click "Sign Up" to register for an user ID.

Preview

Preview

Codes

  • NII Article ID (NAID) :
    110006839053
  • NII NACSIS-CAT ID (NCID) :
    AA12055912
  • Text Lang :
    ENG
  • Article Type :
    ART
  • ISSN :
    09196072
  • NDL Article ID :
    9570405
  • NDL Source Classification :
    ZM13(科学技術--科学技術一般--データ処理・計算機)
  • NDL Call No. :
    Z14-1121
  • Databases :
    CJP  NDL  NII-ELS 

Export