Under-representation of repetitive sequences in whole-genome shotgun sequence databases: an illustration using a recently acquired transposable element.



Abstract

It is widely accepted, at least conceptually, that repetitive sequences, especially those whose copies share high sequence homogeneity, tend to be under-represented in whole-genome shotgun sequence databases because sequence reads from them are difficult to assemble into contigs. Although this is easily inferred, the phenomenon has not been illustrated quantitatively. An example drawn from a database in current use should help convey, intuitively, how serious the under-representation can be. The present study provides the first quantitative example, in the case of 16 copies of a virtually identical 4.7-kb sequence in a genome of 7 × 10^8 bp, by comparing the results of BLAST searches against a sequence database (contig N50 of 9.8 kb) with those of Southern blot analysis of genomic DNA. This comparison revealed that the internal regions of the repetitive sequences are under-represented to a striking extent.
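The comparison described above amounts to asking, position by position along the 4.7-kb repeat, how many of the 16 copies detected by Southern blot are recovered in the assembled database. The following is a minimal Python sketch of that bookkeeping under stated assumptions: BLAST hits are taken to be already parsed into query-coordinate intervals (1-based, inclusive, as in BLAST tabular output), and the Southern blot copy number is used as the expected count. The function names and the toy intervals in the usage note are hypothetical; this is not presented as the authors' procedure or data.

```python
"""Hypothetical sketch: per-position representation of a repeat in an assembly.

Assumes BLAST hits of the repeat (query) against the contig database have
already been reduced to (qstart, qend) query coordinates, 1-based and
inclusive. The expected copy number comes from an independent estimate
(here, Southern blot analysis).
"""

from typing import List, Tuple


def per_position_depth(hits: List[Tuple[int, int]], query_len: int) -> List[int]:
    """Count how many hits cover each query position (index 0 = position 1)."""
    # Difference array: +1 where a hit starts, -1 just past where it ends.
    diff = [0] * (query_len + 1)
    for qstart, qend in hits:
        lo, hi = min(qstart, qend), max(qstart, qend)  # tolerate reversed coordinates
        diff[lo - 1] += 1
        diff[hi] -= 1
    depth: List[int] = []
    running = 0
    for i in range(query_len):
        running += diff[i]
        depth.append(running)
    return depth


def representation(depth: List[int], expected_copies: int) -> List[float]:
    """Fraction of the expected copies found in the database at each position."""
    return [d / expected_copies for d in depth]


if __name__ == "__main__":
    # Toy, hypothetical hits: two copies recovered only near each end of the repeat.
    toy_hits = [(1, 1200), (1, 900), (3900, 4700), (4100, 4700)]
    depth = per_position_depth(toy_hits, 4700)
    frac = representation(depth, 16)
    print(depth[0], depth[2000], depth[4699])  # per-position hit counts
    print(frac[0], frac[2000], frac[4699])     # fraction of 16 expected copies
```

With these toy intervals the depth is 2 at both ends of the repeat and 0 in the middle, so the representation is 0.125 at the ends and 0.0 internally. A difference array keeps the computation linear in the repeat length regardless of how many hits are supplied.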

Journal

  • Genome / National Research Council Canada = Génome / Conseil national de recherches Canada

    55(2), 172-175, 2012-02

    Canadian Science Publishing

Codes

  • NII Article ID (NAID)
    120003988291
  • Text Lang
    ENG
  • Article Type
    journal article
  • ISSN
    0831-2796
  • Data Source
    IR 