Analysis of Conventional Dropout and its Application to Group Dropout

Abstract

Deep learning is a state-of-the-art learning method used in fields such as visual object recognition and speech recognition. Because it uses very deep layers and a huge number of units and connections, overfitting is a serious problem, and the dropout method is used to address it. Dropout is a regularizer that neglects randomly selected inputs and hidden units with probability q during learning; after learning, the neglected inputs and hidden units are combined with the learned network to express the final output. Wager et al. pointed out that conventional dropout acts as an adaptive L2 regularizer, so we compared the learning behavior of conventional dropout with that of stochastic gradient descent with an L2 regularizer. Because combining the neglected hidden units with the learned network can be regarded as ensemble learning, we analyzed conventional dropout from the viewpoint of ensemble learning within an on-line learning framework. We then compared conventional dropout and ensemble learning from two additional viewpoints and confirmed that conventional dropout can be regarded as ensemble learning that divides a student network into two sub-networks. On the basis of this finding, we developed a novel dropout method, group dropout, that divides the network into more than two sub-networks. Computer simulations demonstrated that this method enhances the benefit of ensemble learning.
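
Only the abstract is included in this record, so the NumPy sketch below merely illustrates the two mechanisms it describes: conventional dropout (neglect each hidden unit with probability q during training, then combine all units at prediction time) and a group-dropout-style split into K disjoint sub-networks whose outputs are ensembled. The network shape, the test-time (1 - q) rescaling, and the one-group-per-step schedule are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout_mask(n_units, q, rng):
    """Mask that neglects each unit with probability q (keeps it with 1 - q)."""
    return (rng.random(n_units) >= q).astype(float)

def forward(x, W1, W2, mask=None):
    """One-hidden-layer network; `mask` silences dropped hidden units."""
    h = np.tanh(x @ W1)
    if mask is not None:
        h = h * mask
    return h @ W2

n_in, n_hidden, q = 5, 8, 0.5
W1 = rng.normal(size=(n_in, n_hidden))
W2 = rng.normal(size=(n_hidden, 1))
x = rng.normal(size=n_in)

# --- Conventional dropout ---
# Training: each step draws a fresh Bernoulli mask over the hidden units,
# which effectively trains a randomly chosen sub-network.
train_output = forward(x, W1, W2, mask=dropout_mask(n_hidden, q, rng))
# Prediction: all units are combined; scaling the output weights by (1 - q)
# keeps the expected pre-activation consistent with training.
prediction = forward(x, W1, (1 - q) * W2)

# --- Group dropout (K > 2 sub-networks) ---
# Partition the hidden units into K disjoint groups; each training step
# activates one group, and prediction averages the K sub-networks,
# i.e. an explicit ensemble rather than a two-way random split.
K = 4
groups = np.array_split(rng.permutation(n_hidden), K)
masks = []
for g in groups:
    m = np.zeros(n_hidden)
    m[g] = 1.0
    masks.append(m)

train_output_group = forward(x, W1, W2, mask=masks[0])  # one group per step
ensemble_prediction = np.mean([forward(x, W1, W2, mask=m) for m in masks])
```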

Details

  • CRID
    1050001337908425728
  • NII Article ID
    170000148786
  • NII Bibliographic ID
    AA11464803
  • ISSN
    18827780
  • Web Site
    http://id.nii.ac.jp/1001/00182644/
  • Text Language Code
    en
  • Material Type
    article
  • Data Source Type
    • IRDB
    • CiNii Articles
