Voice Activity Detection Using DNN Adaptation with Auxiliary Speech Features

太刀岡, 勇気

Voice activity detection (VAD) is an essential preprocessing step for automatic speech recognition (ASR) in noisy environments. Power-based methods are widely used, but their performance degrades markedly under high noise, so methods that consider the shape of the spectrum have recently been proposed. Among these, deep neural network (DNN) based methods have outperformed previous methods. In ASR and speech enhancement, auxiliary features are used to adapt DNNs to a target environment and thereby improve performance. To further improve DNN-based VAD, this paper proposes two types of auxiliary features based on speech modeling, together with their combination. The first is the activation of non-negative matrix factorization, and the second is the acoustic score of ASR acoustic models. Experiments on noisy VAD tasks showed that DNN-based methods outperformed one of the most effective conventional methods, and that both auxiliary features improved performance in terms of both frame-level VAD accuracy and ASR word accuracy.
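As a rough illustration of the first auxiliary feature, the sketch below estimates non-negative matrix factorization activations for a spectrogram against a fixed basis and appends them to per-frame input features. It is a minimal sketch under stated assumptions, not the paper's implementation: the function names `nmf_activations` and `append_auxiliary`, the Euclidean-cost multiplicative update, the pre-trained basis `W`, and the simple concatenation with the DNN input are all assumptions made for illustration, since the abstract does not specify these details.

```python
import numpy as np


def nmf_activations(V, W, n_iter=100, eps=1e-12):
    """Estimate activations H >= 0 such that V is approximated by W @ H.

    V: (n_bins, n_frames) non-negative magnitude spectrogram.
    W: (n_bins, n_bases) non-negative basis, held fixed (assumed pre-trained).
    Uses the standard multiplicative update for the Euclidean cost.
    """
    rng = np.random.default_rng(0)
    # Strictly positive initialization so the multiplicative update can move H.
    H = rng.random((W.shape[1], V.shape[1])) + eps
    WtV = W.T @ V
    WtW = W.T @ W
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H); eps guards against division by zero.
        H *= WtV / (WtW @ H + eps)
    return H


def append_auxiliary(features, H):
    """Concatenate activations to per-frame features (hypothetical helper).

    features: (n_frames, n_dims); H: (n_bases, n_frames).
    Returns an array of shape (n_frames, n_dims + n_bases).
    """
    return np.concatenate([features, H.T], axis=1)
```

In this sketch the activations act as a per-frame summary of how strongly each basis vector is present, and the concatenated array would serve as the input to a frame-level speech/non-speech classifier.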
