Scalable Detection of Frequent Substrings by Grammar-Based Compression

NAKAHARA Masaya, MARUYAMA Shirou, KUBOYAMA Tetsuji, SAKAMOTO Hiroshi

doi:10.1587/transinf.e96.d.457

Abstract

A scalable pattern discovery by compression is proposed. A string is representable by a context-free grammar deriving the string deterministically. In this framework of grammar-based compression, the aim of the algorithm is to output as small a grammar as possible. Beyond that, the optimization problem is approximately solvable. In such approximation algorithms, the compressor based on edit-sensitive parsing (ESP) is especially suitable for detecting maximal common substrings as well as long frequent substrings. Based on ESP, we design a linear time algorithm to find all frequent patterns in a string approximately and prove several lower bounds to guarantee the length of extracted patterns. We also examine the performance of our algorithm by experiments in biological sequences and other compressible real world texts. Compared to other practical algorithms, our algorithm is faster and more scalable with large and repetitive strings.
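To illustrate the first idea in the abstract, a grammar that derives exactly one string, the following minimal sketch expands a small straight-line grammar and counts how often each nonterminal occurs in its derivation tree. It is not the ESP-based algorithm of the paper: the grammar, the rule names (`S`, `A`, `B`), and the helper functions `expand` and `occurrence_counts` are hypothetical and chosen only for illustration. The count for a nonterminal is a lower bound on how often the substring it derives occurs in the text, which hints at why a small grammar can expose frequent substrings.

```python
from collections import Counter


def expand(symbol, rules):
    """Return the string derived from `symbol` under straight-line rules."""
    if symbol not in rules:  # a symbol without a rule is a terminal character
        return symbol
    left, right = rules[symbol]
    return expand(left, rules) + expand(right, rules)


def occurrence_counts(start, rules):
    """Count how often each nonterminal appears in the derivation tree."""
    counts = Counter()

    def visit(symbol):
        if symbol in rules:
            counts[symbol] += 1
            left, right = rules[symbol]
            visit(left)
            visit(right)

    visit(start)
    return counts


# Hypothetical grammar (an assumption for illustration, not from the paper):
# S -> B B, B -> A A, A -> a b
rules = {"S": ("B", "B"), "B": ("A", "A"), "A": ("a", "b")}

text = expand("S", rules)
counts = occurrence_counts("S", rules)

for name in sorted(counts):
    print(name, repr(expand(name, rules)), counts[name])
```

Here three rules describe an eight-character string, and the rule deriving `ab` is used four times in the derivation. A real compressor has to construct such rules from the input; the paper does this with edit-sensitive parsing, which this sketch does not attempt.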

Journal

IEICE Transactions on Information and Systems

IEICE Transactions on Information and Systems E96.D (3), 457-464, 2013

The Institute of Electronics, Information and Communication Engineers (IEICE)


Details

CRID: 1390282679356401920

NII Article ID: 10031167431

NII Book ID: AA10826272

DOI: 10.1587/transinf.e96.d.457

ISSN: 1745-1361; 0916-8532

Web Site: https://www.jstage.jst.go.jp/article/transinf/E96.D/3/E96.D_457/_pdf

Text language code: en

Data source

JaLC
Crossref
CiNii Articles

Abstract license flag: Disallowed


Cited by: 3

References: 30
