Noisy Speech Recognition by Multi-band Modeling Based on Feature-level Combination


OKAWA SHIGEKI: Department of Information and Network Science, Chiba Institute of Technology
SHIRAI KATSUHIKO: Department of Information and Computer Science, Waseda University

This paper presents a new approach to sub-band recombination in the framework of multi-band ASR. Recent work suggests that multi-band ASR, which processes partial frequency bands of the input speech independently and then recombines them, gives more accurate recognition, especially in noisy acoustic environments. In this framework, two questions need to be addressed: (i) how to recombine the sub-band outputs, and (ii) how to split the input speech frequencies. For point (i), we propose and evaluate a "feature combination" (FC) approach in place of the "likelihood combination" (LC) approach proposed by Bourlard et al. For point (ii), we introduce the mutual information between sub-band features and target phoneme categories to find the optimal splitting frequencies. The experimental results show that the FC-based system yields better performance than both the conventional ASR and the LC-based system for band-limited noisy speech. We also obtained a favorable band-splitting strategy by using the optimization method.
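
The difference between the two recombination schemes, and the mutual-information criterion for band splitting, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify the features, the recognizer, or the LC weighting, so the function names, the equal default weights, and the use of discretized feature values are all assumptions made for illustration.

```python
import math
from collections import Counter


def feature_combination(subband_features):
    """FC (illustrative): concatenate per-band feature vectors into one
    vector, to be scored by a single recognizer."""
    combined = []
    for feats in subband_features:
        combined.extend(feats)
    return combined


def likelihood_combination(subband_log_likelihoods, weights=None):
    """LC (illustrative): each band is scored by its own recognizer, and
    the per-class log-likelihoods are merged by a weighted sum.
    Equal weights are an assumption, not taken from the paper."""
    n_bands = len(subband_log_likelihoods)
    if weights is None:
        weights = [1.0 / n_bands] * n_bands
    n_classes = len(subband_log_likelihoods[0])
    return [
        sum(w * band[c] for w, band in zip(weights, subband_log_likelihoods))
        for c in range(n_classes)
    ]


def mutual_information(feature_bins, phoneme_labels):
    """Mutual information I(X;Y) in bits, estimated from paired discrete
    observations (a quantized sub-band feature and a phoneme label)."""
    n = len(feature_bins)
    joint = Counter(zip(feature_bins, phoneme_labels))
    px = Counter(feature_bins)
    py = Counter(phoneme_labels)
    mi = 0.0
    for (x, y), count in joint.items():
        pxy = count / n
        mi += pxy * math.log2(pxy / ((px[x] / n) * (py[y] / n)))
    return mi
```

Under these assumptions, a candidate splitting frequency could be scored by computing `mutual_information` for the features of each resulting sub-band against the phoneme labels, with higher values indicating bands that carry more phonetic information.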

Transactions of Information Processing Society of Japan
43(7) pp.2046-2054 2002-07-15
