Distant Speech Recognition Using a Microphone Array Network

この論文をさがす

抄録

In this work, spatial information consisting of the position and orientation angle of an acoustic source is estimated by an artificial neural network (ANN). The estimated position of a speaker in an enclosed space is used to refine the estimated time delays for a delay-and-sum beamformer, thus enhancing the output signal. On the other hand, the orientation angle is used to restrict the lexicon used in the recognition phase, assuming that the speaker faces a particular direction while speaking. To compensate the effect of the transmission channel inside a short frame analysis window, a new cepstral mean normalization (CMN) method based on a Gaussian mixture model (GMM) is investigated and shows better performance than the conventional CMN for short utterances. The performance of the proposed method is evaluated through Japanese digit/command recognition experiments.

収録刊行物

参考文献 (23)*注記

もっと見る

詳細情報 詳細情報について

問題の指摘

ページトップへ