A Fast Algorithm for Computing Longest Common Subsequences of Small Alphabet Size

Authors

    • C.K.Poon
    • Department of Computer Science, University of Hong Kong

Abstract

Given two strings of lengths m and n ⩾ m on an alphabet of size s, the longest common subsequence (LCS) problem is to determine the longest subsequence that can be obtained by deleting zero or more symbols from each string. The first O(mn) algorithm was given by Hirschberg in 1975. The algorithm was later revised to O(ln), where l is the length of an LCS between the two strings. Another strategy, given by Hunt and Szymanski, takes O(r log n) time, where r ⩽ mn is the total number of matches between the two strings. Apostolico and Guerra combined the two approaches and derived an O(m log n + d log(mn/d)) algorithm, where d ⩽ r is the number of dominant matches (minimal candidates) between the two strings. Efficient algorithms for two similar strings were devised by Nakatsu et al. [7] and Myers [6], with time complexities of O(n(m-l)) and O(n(n-l)), respectively. This paper presents a new algorithm for this problem, which requires preprocessing that is nearly standard for the LCS problem and has time and space complexity of O(ns + min{ds, lm}) and O(ns + d), respectively. This algorithm is particularly efficient when s (the alphabet size) is small. Different data structures are used to obtain variations of the basic algorithm that require different time and space complexities.
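
To make the parameters in the abstract concrete, the following is a minimal sketch of the textbook O(mn) dynamic program, not the algorithm the paper proposes. It returns the LCS length l, the number of matches r, and the number of dominant matches d. The function name `lcs_stats` is hypothetical, and the dominance test assumes the usual characterization: a match at (i, j) of rank k is dominant when the table entries directly above and directly to the left are both k - 1.

```python
def lcs_stats(a: str, b: str) -> tuple[int, int, int]:
    """Return (l, r, d) for strings a and b.

    l: length of a longest common subsequence
    r: number of matches, i.e. pairs (i, j) with a[i] == b[j]
    d: number of dominant matches (assumed characterization, see lead-in)
    """
    m, n = len(a), len(b)
    # L[i][j] = LCS length of the prefixes a[:i] and b[:j]
    L = [[0] * (n + 1) for _ in range(m + 1)]
    r = d = 0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                r += 1
                L[i][j] = L[i - 1][j - 1] + 1
                # Dominant: no match of the same rank lies above or to the left.
                if L[i - 1][j] < L[i][j] and L[i][j - 1] < L[i][j]:
                    d += 1
            else:
                L[i][j] = max(L[i - 1][j], L[i][j - 1])
    return L[m][n], r, d


if __name__ == "__main__":
    print(lcs_stats("abcbdab", "bdcaba"))
```

This baseline uses O(mn) time and space regardless of the alphabet size; the bounds quoted in the abstract depend on d, l, and s instead, which is why they can be much smaller when d is far below mn.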

Published in

  • Journal of Information Processing 13(4), 463-469, 1991-02-10

    Information Processing Society of Japan

Identifiers

  • NII Article ID (NAID)
    110002673543
  • NII Bibliographic ID (NCID)
    AA00700121
  • Text language code
    ENG
  • Material type
    Article
  • ISSN
    1882-6652
  • Data source
    CJP citations, NII-ELS, IPSJ