A Method for Isoform Prediction from RNA-Seq Data by Iterative Mapping

  • Ohno Tomoshige
    Department of Bioinformatic Engineering, Graduate School of Information Science and Technology, Osaka University
  • Seno Shigeto
    Department of Bioinformatic Engineering, Graduate School of Information Science and Technology, Osaka University
  • Takenaka Yoichi
    Department of Bioinformatic Engineering, Graduate School of Information Science and Technology, Osaka University
  • Matsuda Hideo
    Department of Bioinformatic Engineering, Graduate School of Information Science and Technology, Osaka University

Abstract

Alternative splicing plays an important role in eukaryotic gene expression by producing diverse proteins from a single gene, so predicting how genes are transcribed is of great biological interest. To this end, massively parallel whole-transcriptome sequencing, often referred to as RNA-Seq, is becoming widely used and is revolutionizing the cataloging of isoforms using a vast number of short mRNA fragments called reads. Conventional RNA-Seq analysis methods typically align reads onto a reference genome (mapping) to determine, from an RNA-Seq dataset, which isoforms each gene yields and how much of each isoform is expressed. However, a considerable number of reads cannot be mapped uniquely. These so-called multireads, which map to multiple locations because of short read length and similar sequences, increase the uncertainty as to how genes are transcribed. This causes inaccurate gene expression estimates and leads to incorrect isoform prediction. To cope with this problem, we propose a method for isoform prediction by iterative mapping. The positions from which multireads originate can be estimated from expression levels, whereas quantification of isoform-level expression requires accurate mapping. These procedures are mutually dependent, and therefore remapping reads is essential. By iterating this cycle, our method estimates gene expression levels more precisely and hence improves predictions of alternative splicing. Our method simultaneously estimates isoform-level expression by using an EM algorithm to compute, within each gene, how many reads originate from each candidate isoform. To validate the effectiveness of the proposed method, we compared its performance with conventional methods using an RNA-Seq dataset derived from a human brain. The proposed method had a precision of 66.7% and outperformed conventional methods in terms of the isoform detection rate.
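
The EM step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `em_isoform_abundance`, the compatibility-list input format, and the toy data are assumptions made for illustration, and the sketch ignores isoform length and the remapping cycle that the paper describes. It only shows the general idea of splitting ambiguous reads among candidate isoforms in proportion to the current abundance estimates and then re-estimating those abundances.

```python
from typing import List, Sequence


def em_isoform_abundance(
    compat: Sequence[Sequence[int]],
    n_isoforms: int,
    n_iter: int = 200,
    tol: float = 1e-9,
) -> List[float]:
    """Estimate relative isoform abundances from read compatibility lists.

    compat[r] holds the indices of the candidate isoforms that read r could
    have come from (hypothetical input format). Returns fractions summing to 1.
    """
    theta = [1.0 / n_isoforms] * n_isoforms  # start from uniform abundances
    for _ in range(n_iter):
        counts = [0.0] * n_isoforms
        # E-step: share each read among its compatible isoforms in
        # proportion to the current abundance estimates.
        for isoforms in compat:
            total = sum(theta[i] for i in isoforms)
            if total == 0.0:
                continue  # read is compatible only with zero-abundance isoforms
            for i in isoforms:
                counts[i] += theta[i] / total
        # M-step: renormalize the expected read counts into abundances.
        n_assigned = sum(counts)
        if n_assigned == 0.0:
            break  # no read could be assigned; keep the current estimate
        new_theta = [c / n_assigned for c in counts]
        converged = max(abs(a - b) for a, b in zip(theta, new_theta)) < tol
        theta = new_theta
        if converged:
            break
    return theta


# Toy data (made up): 6 reads unique to isoform 0, 2 unique to isoform 1,
# and 4 ambiguous reads compatible with both.
reads = [[0]] * 6 + [[1]] * 2 + [[0, 1]] * 4
print(em_isoform_abundance(reads, n_isoforms=2))
```

On this toy input the estimates settle near 0.75 and 0.25, so the ambiguous reads end up shared 3:1 in favor of the isoform with more unique support. A fuller treatment would at least normalize by isoform length and feed the updated abundances back into read placement.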

Details

  • CRID
    1390282680241543296
  • NII Article ID
    130002073619
  • DOI
    10.11185/imt.7.921
  • ISSN
    18810896
  • Text language code
    en
  • Data source type
    • JaLC
    • CiNii Articles
  • Abstract license flag
    Not permitted