Protein Fold Recognition with Representation Learning and Long Short-Term Memory

  • Tsubaki Masashi
    Graduate School of Information Science, Nara Institute of Science and Technology
  • Shimbo Masashi
    Graduate School of Information Science, Nara Institute of Science and Technology
  • Matsumoto Yuji
    Graduate School of Information Science, Nara Institute of Science and Technology

Abstract

<p>Predicting the 3D structure of a protein from its amino acid sequence is an important challenge in bioinformatics. Because predicting the 3D structure directly is difficult, an intermediate step is commonly used: classifying a protein into one of the “folds”, which are pre-defined structural labels in protein databases such as SCOP and CATH. This classification task is called protein fold recognition (PFR), and much research has addressed either (i) extracting features from amino acid sequences or (ii) methods for classifying protein folds. In this paper, we propose a new approach to PFR that (i) learns feature representations from a large protein database with unsupervised methods, instead of relying on manual feature selection and external tools, and (ii) learns a deep neural architecture, namely a recurrent neural network (RNN) with long short-term memory (LSTM) units, while re-training the representations rather than keeping the extracted features fixed. On a benchmark dataset, our approach outperforms existing methods that use various physicochemical features.</p>
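As a rough illustration of the pipeline the abstract describes (embedded sequence pieces fed to an LSTM whose final state is classified into folds), here is a minimal pure-Python forward pass. It is a sketch under stated assumptions, not the authors' implementation: splitting sequences into overlapping 3-mers, the randomly initialised embedding table (a stand-in for vectors learned without supervision), the class name `LSTMFoldClassifier`, and all dimensions are hypothetical. Training, including the re-training of the embeddings, is omitted.

```python
import math
import random


def kmers(seq, k=3):
    """Split an amino acid sequence into overlapping k-mers (hypothetical k=3)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class LSTMFoldClassifier:
    """Hypothetical LSTM over k-mer embeddings, followed by a softmax over folds.

    Weights are random here; in the approach described above, the embeddings
    would be pre-trained without supervision and then re-trained with the LSTM.
    """

    def __init__(self, vocab, emb_dim, hidden_dim, n_folds, seed=0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        # Embedding table: one vector per k-mer (stand-in for unsupervised vectors).
        self.emb = {w: [rng.uniform(-0.1, 0.1) for _ in range(emb_dim)] for w in vocab}
        self.unk = [0.0] * emb_dim  # vector for unseen k-mers
        # One weight matrix per gate (input, forget, output, candidate) on [x_t; h_{t-1}].
        in_dim = emb_dim + hidden_dim
        self.W = {gate: rand_matrix(hidden_dim, in_dim, rng) for gate in "ifoc"}
        self.b = {gate: [0.0] * hidden_dim for gate in "ifoc"}
        # Output layer mapping the final hidden state to fold scores.
        self.W_out = rand_matrix(n_folds, hidden_dim, rng)
        self.b_out = [0.0] * n_folds

    def step(self, x, h, c):
        """One LSTM time step; returns the new hidden and cell states."""
        z = x + h  # list concatenation: [x_t; h_{t-1}]
        pre = {gate: [a + b for a, b in zip(matvec(self.W[gate], z), self.b[gate])]
               for gate in "ifoc"}
        i = [sigmoid(v) for v in pre["i"]]
        f = [sigmoid(v) for v in pre["f"]]
        o = [sigmoid(v) for v in pre["o"]]
        g = [math.tanh(v) for v in pre["c"]]
        c_new = [ft * c_prev + it * gt for ft, c_prev, it, gt in zip(f, c, i, g)]
        h_new = [ot * math.tanh(ct) for ot, ct in zip(o, c_new)]
        return h_new, c_new

    def predict_proba(self, seq, k=3):
        """Run the LSTM over the k-mers of seq and return fold probabilities."""
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        for w in kmers(seq, k):
            h, c = self.step(self.emb.get(w, self.unk), h, c)
        logits = [a + b for a, b in zip(matvec(self.W_out, h), self.b_out)]
        m = max(logits)  # subtract the max for a numerically stable softmax
        exps = [math.exp(v - m) for v in logits]
        total = sum(exps)
        return [e / total for e in exps]


if __name__ == "__main__":
    seq = "MKTAYIAKQRQISFVKSHFSRQ"  # arbitrary example sequence
    model = LSTMFoldClassifier(set(kmers(seq)), emb_dim=8, hidden_dim=16, n_folds=5)
    print(model.predict_proba(seq))
```

Classifying from the final hidden state is the simplest choice for a sketch; pooling over all hidden states or using a bidirectional LSTM are common alternatives, and which one the paper uses is not stated in the abstract.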
