A Study on Restoration of Bone-Conducted Speech in Noisy Environments with LP-based Model and Gaussian Mixture Model

  • Nghia Trung Phung
    School of Information Science, Japan Advanced Institute of Science and Technology
  • Unoki Masashi
    School of Information Science, Japan Advanced Institute of Science and Technology
  • Akagi Masato
    School of Information Science, Japan Advanced Institute of Science and Technology

Abstract

Restoring bone-conducted speech is important for robust speech communication in extremely noisy environments. In our previous studies, we proposed a blind restoration method that combines a linear prediction (LP) scheme with training and prediction based on a simple recurrent neural network. However, neural-network-based prediction is not well suited to training on large corpora, which real applications require. The over-training problem of simple recurrent neural networks makes it difficult to train on various kinds of bone-conducted speech in one session. It is also difficult to adapt the neural network model to bone-conducted speech in unknown noisy environments, which is needed for open-dataset restoration of bone-conducted speech. In this research, we therefore used training and prediction based on a Gaussian mixture model (GMM) instead of a neural network. We also propose a method of re-estimating the residual ratio in the LP scheme, and we investigated how the proposed method restores bone-conducted speech in extremely noisy environments. Objective and subjective evaluations were carried out to assess the improvements in the sound quality and intelligibility of the restored speech. The results showed that the proposed method outperformed previous methods for both human listeners and automatic speech recognition systems, even in extremely noisy environments.

Published in

  • 信号処理 (Journal of Signal Processing) 16 (5), 409-417, 2012

    信号処理学会
