Distant-talking Speech Recognition with Asynchronous Speech Recording (Speech; 音学シンポジウム2014) [in Japanese]

Abstract

Applications running on mobile terminals have attracted increasing attention, yet few studies have addressed distant-talking speech recognition from asynchronous recordings made with several mobile terminals. In this paper, we propose a speech recognition system that is robust in distant-talking environments and assumes asynchronous speech recording. The system first applies a denoising autoencoder (DAE) in the cepstral domain to suppress reverberation and then performs Large Vocabulary Continuous Speech Recognition (LVCSR). It then automatically selects, for each speech segment, the recording channel (mobile terminal) to use, and performs environment adaptation with the speech segments from the selected terminals. The proposed method was evaluated on simulated meeting speech: utterances from the WSJCAM0 corpus were played through multiple loudspeakers in a meeting room and recorded by multiple mobile terminals placed far from them. By integrating the cepstral-domain DAE with automatic mobile terminal selection and environment adaptation, the average Word Error Rate (WER) with multi-condition acoustic models was reduced from 51.8% for the baseline system to 28.8%, a relative error reduction of 44.4%.
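The abstract names the cepstral-domain DAE but not its architecture. The Python sketch below shows what such a dereverberating DAE could look like under loudly hypothetical assumptions: one tanh hidden layer, a linear output layer, ±2 frames of input context, plain SGD on squared error, and a toy synthetic "reverberation" (earlier frames leaking into later ones) in place of real WSJCAM0 recordings. The names `splice`, `CepstralDAE`, `train_dae` and `toy_data` are illustrative and do not come from the paper.

```python
import math
import random

# Hypothetical sketch of a cepstral-domain denoising autoencoder (DAE) for
# dereverberation. Assumptions (not from the paper): one tanh hidden layer,
# linear output, +/- `context` frames of input splicing, plain SGD on squared error.


def splice(frames, context):
    """Stack each cepstral frame with its +/- context neighbours (edge frames repeated)."""
    n = len(frames)
    spliced = []
    for t in range(n):
        vec = []
        for k in range(-context, context + 1):
            vec.extend(frames[min(max(t + k, 0), n - 1)])
        spliced.append(vec)
    return spliced


class CepstralDAE:
    """Maps spliced reverberant cepstra to the clean cepstrum of the centre frame."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        s1, s2 = 1.0 / math.sqrt(n_in), 1.0 / math.sqrt(n_hidden)
        self.w1 = [[rng.uniform(-s1, s1) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.uniform(-s2, s2) for _ in range(n_hidden)] for _ in range(n_out)]
        self.b2 = [0.0] * n_out

    def _forward(self, x):
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        y = [sum(w * hj for w, hj in zip(row, h)) + b for row, b in zip(self.w2, self.b2)]
        return h, y

    def predict(self, x):
        """Estimate the clean cepstrum for one spliced reverberant input vector."""
        return self._forward(x)[1]

    def train_step(self, x, target, lr):
        """One SGD step on 0.5 * squared error; returns the loss before the update."""
        h, y = self._forward(x)
        err = [yk - tk for yk, tk in zip(y, target)]
        # Hidden-layer gradient, computed with the output weights before they change.
        dh = [(1.0 - h[j] ** 2) * sum(err[k] * self.w2[k][j] for k in range(len(err)))
              for j in range(len(h))]
        for k, e in enumerate(err):
            for j in range(len(h)):
                self.w2[k][j] -= lr * e * h[j]
            self.b2[k] -= lr * e
        for j, d in enumerate(dh):
            for i in range(len(x)):
                self.w1[j][i] -= lr * d * x[i]
            self.b1[j] -= lr * d
        return 0.5 * sum(e * e for e in err)


def train_dae(reverberant, clean, context=2, n_hidden=16, epochs=30, lr=0.01, seed=0):
    """Train on parallel (reverberant, clean) frames; returns the model and per-epoch mean loss."""
    inputs = splice(reverberant, context)
    dae = CepstralDAE(len(inputs[0]), n_hidden, len(clean[0]), seed)
    order = list(range(len(inputs)))
    rng = random.Random(seed)
    losses = []
    for _ in range(epochs):
        rng.shuffle(order)
        losses.append(sum(dae.train_step(inputs[t], clean[t], lr) for t in order) / len(order))
    return dae, losses


def toy_data(n_frames=200, dim=4, seed=1):
    """Toy stand-in, not a physical model: each frame leaks decaying copies of earlier clean frames."""
    rng = random.Random(seed)
    clean = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_frames)]
    reverb = []
    for t in range(n_frames):
        frame = list(clean[t])
        for lag, gain in ((1, 0.6), (2, 0.3)):
            if t - lag >= 0:
                frame = [f + gain * c for f, c in zip(frame, clean[t - lag])]
        reverb.append(frame)
    return reverb, clean


if __name__ == "__main__":
    reverb, clean = toy_data()
    dae, losses = train_dae(reverb, clean)
    print(f"mean loss: epoch 1 = {losses[0]:.3f}, epoch {len(losses)} = {losses[-1]:.3f}")
```

In a real system the enhanced cepstra from `predict` would replace the reverberant features fed to the LVCSR decoder; the toy data only shows that the mapping can be learned.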
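The abstract does not say how the best terminal is chosen for each segment or how segments from unsynchronized devices are matched. The sketch below assumes, hypothetically, that segments have already been aligned across terminals under a shared segment ID and that each terminal's LVCSR pass reports an average per-frame log-likelihood, used here as the selection score. Grouping the selected segments per terminal as adaptation data is likewise an assumption; `Hypothesis`, `select_terminals` and `adaptation_sets` are illustrative names.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Hypothesis:
    """Hypothetical per-terminal LVCSR result for one speech segment."""
    terminal: str
    words: List[str]
    avg_log_likelihood: float  # assumed selection score: mean per-frame acoustic log-likelihood


def select_terminals(segments: Dict[str, List[Hypothesis]]) -> Dict[str, Hypothesis]:
    """For every segment, keep the hypothesis from the terminal with the highest score."""
    return {seg_id: max(hyps, key=lambda h: h.avg_log_likelihood)
            for seg_id, hyps in segments.items()}


def adaptation_sets(selected: Dict[str, Hypothesis]) -> Dict[str, List[str]]:
    """Group selected segment IDs by terminal, e.g. as data for per-terminal environment adaptation."""
    groups: Dict[str, List[str]] = {}
    for seg_id, hyp in sorted(selected.items()):
        groups.setdefault(hyp.terminal, []).append(seg_id)
    return groups


if __name__ == "__main__":
    # Made-up scores for illustration only; not results from the paper.
    segments = {
        "seg01": [Hypothesis("phoneA", ["hello"], -62.0), Hypothesis("phoneB", ["hello"], -55.5)],
        "seg02": [Hypothesis("phoneA", ["world"], -48.0), Hypothesis("phoneB", ["word"], -51.0)],
    }
    chosen = select_terminals(segments)
    print({seg: hyp.terminal for seg, hyp in chosen.items()})
    print(adaptation_sets(chosen))
```

Selecting per segment rather than per recording lets the system follow whichever device happens to be closest to the current talker.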
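The quoted relative error reduction follows directly from the baseline and proposed WER figures in the abstract:

```latex
\mathrm{RERR}
  = \frac{\mathrm{WER}_{\text{baseline}} - \mathrm{WER}_{\text{proposed}}}{\mathrm{WER}_{\text{baseline}}}
  = \frac{51.8 - 28.8}{51.8}
  = \frac{23.0}{51.8}
  \approx 0.444 = 44.4\%
```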

Journal

  • IEICE technical report. Speech

    IEICE technical report. Speech 114(52), 153-157, 2014-05-24

    The Institute of Electronics, Information and Communication Engineers

Codes

  • NII Article ID (NAID)
    110009903128
  • NII NACSIS-CAT ID (NCID)
    AN10013221
  • Text Lang
    JPN
  • ISSN
    0913-5685
  • NDL Article ID
    025512913
  • NDL Call No.
    Z16-940
  • Data Source
    NDL, NII-ELS