音源分離との統合によるミッシングフィーチャマスク自動生成に基づく同時発話音声認識

山本 俊一, Valin Jean-Marc, 中臺 一博, 中野 幹生, 辻野 広司, 駒谷 和範, 尾形 哲也, 奥乃 博

doi:10.7210/jrsj.25.92

Bibliographic Information

Alternative titles

Simultaneous Speech Recognition Based on Automatic Missing Feature Mask Generation by Integrating Sound Source Separation
オンゲンブンリトノトウゴウニヨルミッシングフィーチャマスクジドウセイセイニモトヅクドウジハツワオンセイニンシキ


Abstract

Our goal is to realize a humanoid robot capable of recognizing simultaneous speech. A humanoid robot in a real-world environment usually hears a mixture of sounds, so three capabilities are essential for robot audition: sound source localization, sound source separation, and recognition of the separated sounds. The interface between sound source separation and speech recognition is particularly important. In this paper, we designed such an interface by applying Missing Feature Theory (MFT). In this method, spectral sub-bands distorted by sound source separation are detected in the input speech as missing features. The detected missing features are masked during recognition so that they do not degrade the result. This makes the method more flexible when noise changes dynamically and drastically. The most important issue is how to detect the distorted spectral sub-bands. To address it, we used a speech feature appropriate for MFT-based automatic speech recognition (ASR) and developed automatic missing feature mask generation. As the speech feature, we used a Mel-Scale Log Spectral (MSLS) feature instead of the Mel-Frequency Cepstral Coefficients (MFCC) commonly used for ASR. The missing feature mask is generated automatically from information provided by sound source separation. To evaluate the method, we implemented it on the humanoid robot SIG2 and performed experiments on the recognition of three simultaneous isolated words. Our method outperformed conventional ASR with the MSLS feature.
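The masking idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the mask criterion or the decoder equations, so everything specific here is an assumption made for illustration: the per-sub-band energy estimates, the 0.5 threshold, and the function names make_hard_mask and masked_log_likelihood are hypothetical, and marginalizing masked sub-bands in a diagonal Gaussian is one common way to apply a missing feature mask, not necessarily the one used in the paper.

```python
import math


def make_hard_mask(signal_energy, distortion_energy, threshold=0.5):
    """Return 1 (reliable) or 0 (missing) for each spectral sub-band.

    Assumption: the separation stage supplies, per sub-band, an estimate
    of the target energy and of the distortion energy. A sub-band is
    treated as reliable when target / (target + distortion) > threshold.
    """
    mask = []
    for s, d in zip(signal_energy, distortion_energy):
        total = s + d
        reliability = s / total if total > 0 else 0.0
        mask.append(1 if reliability > threshold else 0)
    return mask


def masked_log_likelihood(features, mask, means, variances):
    """Diagonal-Gaussian log-likelihood over reliable sub-bands only.

    Masked sub-bands are marginalized out; for a diagonal Gaussian this
    amounts to skipping their terms in the sum.
    """
    total = 0.0
    for x, m, mu, var in zip(features, mask, means, variances):
        if m:
            total += -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)
    return total


# Hypothetical three-band frame: the middle band is dominated by distortion.
mask = make_hard_mask([4.0, 1.0, 3.0], [1.0, 3.0, 0.5])
score = masked_log_likelihood([0.0, 100.0, 0.0], mask,
                              [0.0, 0.0, 0.0], [1.0, 1.0, 1.0])
```

In this toy frame the outlier value in the middle band does not influence the score because that band is masked. Such per-band masking presupposes a spectral-domain feature in which each dimension corresponds to one sub-band, as with the MSLS feature named in the abstract.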

Journal

Journal of the Robotics Society of Japan (日本ロボット学会誌) 25 (1), 92-102, 2007

The Robotics Society of Japan


Details

CRID: 1390282679703802752

NII Article ID: 10018695563

NII Book ID: AN00141189

DOI: 10.7210/jrsj.25.92

ISSN: 18847145; 02891824

NDL BIB ID: 8635901

Web Site: https://ndlsearch.ndl.go.jp/books/R000000004-I8635901

Text language code: ja

Data Source

JaLC
NDL
Crossref
CiNii Articles
KAKEN

Abstract License Flag: Disallowed

