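A Proposed Method for Speech Stream Segregation and Preliminary Experiments on Simultaneous Recognition of Multiple Speech Signals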

Hiroshi Okuno (奥乃 博), Tomohiro Nakatani (中谷 智広), Takeshi Kawabata (川端 豪)

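This paper examines the problems that arise when acoustic stream segregation is used as a preprocessing stage for speech recognition in ordinary environments. The first half proposes a method for speech stream segregation. The method consists of extracting harmonic stream fragments, grouping those fragments, and filling in the non-harmonic structure from the residue that remains after all harmonic structures are removed from the input sound. The second half identifies the problems in recognizing the segregated speech streams with a discrete, single-codebook HMM-LR recognizer and presents solutions. The binaural input that the proposed segregation method uses to extract direction information was found to cause spectral distortion, which affects speech recognition. As a countermeasure, we proposed re-training the HMM-LR parameters on training data processed with head-related transfer functions for four directions. Five speech recognition experiments were run on 500 pairs of utterances containing consonants from two speakers (SNR 0 to -3 dB). Speech stream segregation reduced the recognition errors caused by the mixture, measured against the cumulative recognition rate of the top 10 candidates, by up to 77%.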

This paper reports the preliminary results of experiments on listening to several sounds at once. Two issues are addressed: segregating speech streams from a mixture of sounds, and interfacing speech stream segregation with automatic speech recognition (ASR). Speech stream segregation (SSS) is designed as three processes: extracting harmonic fragments; grouping these extracted harmonic fragments according to their directions; and substituting the non-harmonic residue of harmonic fragments for non-harmonic parts of each group. The main problem in interfacing SSS with HMM-based ASR is how to reduce the recognition errors caused by spectral distortion of segregated sounds, which is mainly due to binaural input. Our solution is to re-train the parameters of the HMM with training data binauralized for four directions. Experiments with five sets of 500 mixtures of two women's/men's utterances of a word (SNR from 0 dB to -3 dB) showed that speech stream segregation reduced the word recognition error, counted over the top 10 candidates, by up to 77%.
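The re-training data mentioned above are utterances binauralized for four directions. A minimal sketch of what such a preprocessing step could look like is given here, assuming time-domain head-related impulse responses are available for each direction. The azimuth values, the toy impulse responses, and the names `convolve`, `binauralize`, and `TOY_HRIRS` are illustrative assumptions and do not come from the paper.

```python
# Hypothetical sketch: binauralize one mono utterance for several directions
# by filtering it with a left/right impulse-response pair per direction.
# Nothing here reproduces the authors' actual data or implementation.


def convolve(signal, impulse_response):
    """Direct-form linear convolution of two finite sequences."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def binauralize(mono, hrir_pairs):
    """Return {azimuth: (left, right)} for every impulse-response pair."""
    return {
        azimuth: (convolve(mono, left_ir), convolve(mono, right_ir))
        for azimuth, (left_ir, right_ir) in hrir_pairs.items()
    }


# Toy impulse responses (NOT measured HRIRs). The four azimuths are assumed.
# The far (left) ear gets a longer delay and more attenuation as the
# source moves toward the right.
TOY_HRIRS = {
    0: ([1.0, 0.0, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0]),
    30: ([0.0, 0.8, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0]),
    60: ([0.0, 0.0, 0.6, 0.0], [1.0, 0.0, 0.0, 0.0]),
    90: ([0.0, 0.0, 0.0, 0.5], [1.0, 0.0, 0.0, 0.0]),
}

mono_utterance = [1.0, 0.5, -0.25]  # stand-in for real speech samples
training_set = binauralize(mono_utterance, TOY_HRIRS)

for azimuth, (left, right) in sorted(training_set.items()):
    print(azimuth, left, right)
```

Each mono utterance yields one binaural version per direction, so the re-training set in this sketch is four times the size of the original one. A real system would use measured head-related impulse responses and an FFT-based convolution instead of the quadratic loop shown here.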
