Utterance-based Selective Training for Task-Dependent Acoustic Modeling (original title: タスク依存音響モデルのための発話レベルでの選択学習法) [in Japanese]

Abstract

Large amounts of speech data are necessary to construct high-performance acoustic models. Because recognition performance depends on the target task, a separate acoustic model has to be prepared for each task. However, the effort and cost of collecting and transcribing speech data are very high, so it is infeasible to prepare enough data for every new application of speech recognition technology. In this paper, an algorithm for utterance-based selective training is proposed that enables the automatic and cost-effective construction of task-dependent acoustic models. Training utterances are selected from existing speech databases so that the likelihood of an independent development data set for the target task is maximized. Using sufficient statistics makes the likelihood calculation fast. The algorithm is evaluated by constructing an infant-dependent model from the speech of elementary school children and an elderly-dependent model from adult speech data. Selective training is effective even with only about ten development utterances. Furthermore, acoustic models retrained on the selected utterances achieved higher word accuracy than models obtained from the development data with the standard adaptation methods MAP and MLLR.
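The selection idea in the abstract can be illustrated with a toy sketch. The abstract does not spell out the search procedure or the model structure, so everything below is an assumption made for illustration: a single one-dimensional Gaussian stands in for the acoustic model, utterances are plain lists of numbers, and a greedy deletion loop drops a training utterance whenever doing so raises the development-set log-likelihood. The function names (`suff_stats`, `estimate`, `select_utterances`, and so on) are hypothetical. What the sketch does show is why sufficient statistics make the search cheap: removing an utterance is a subtraction of its statistics, not a retraining pass over the data.

```python
import math

def suff_stats(frames):
    """Sufficient statistics of a 1-D utterance: (count, sum, sum of squares)."""
    return (len(frames), sum(frames), sum(x * x for x in frames))

def combine(a, b, sign=1):
    """Add (sign=1) or subtract (sign=-1) two sets of statistics."""
    return tuple(x + sign * y for x, y in zip(a, b))

def estimate(stats, var_floor=1e-3):
    """Maximum-likelihood Gaussian (mean, variance) from accumulated statistics."""
    n, sx, sxx = stats
    mean = sx / n
    return mean, max(sxx / n - mean * mean, var_floor)

def dev_log_likelihood(dev, model):
    """Log-likelihood of the development data, computed from its statistics only."""
    n, sx, sxx = dev
    mean, var = model
    sq_err = sxx - 2.0 * mean * sx + n * mean * mean  # sum of (x - mean)^2
    return -0.5 * (n * math.log(2.0 * math.pi * var) + sq_err / var)

def select_utterances(train_utts, dev_utts):
    """Greedy deletion (an assumed search strategy, not taken from the paper)."""
    per_utt = [suff_stats(u) for u in train_utts]
    dev = (0, 0.0, 0.0)
    for u in dev_utts:
        dev = combine(dev, suff_stats(u))
    total = (0, 0.0, 0.0)
    for s in per_utt:
        total = combine(total, s)

    selected = set(range(len(train_utts)))
    best = dev_log_likelihood(dev, estimate(total))
    improved = True
    while improved:
        improved = False
        for i in sorted(selected):
            if len(selected) == 1:
                break  # always keep at least one training utterance
            # Removing utterance i only requires subtracting its statistics.
            candidate = combine(total, per_utt[i], sign=-1)
            ll = dev_log_likelihood(dev, estimate(candidate))
            if ll > best:
                best, total = ll, candidate
                selected.remove(i)
                improved = True
    return sorted(selected), best

# Toy data: utterances 0 and 2 do not match the development data, 1 and 3 do.
train = [[0.0, 0.2, -0.1], [5.0, 5.2, 4.9], [0.1, -0.2, 0.0], [5.1, 4.8, 5.0]]
dev = [[5.0, 5.1, 4.9]]
selected, best = select_utterances(train, dev)
print(selected, round(best, 3))
```

In this toy setup the mismatched utterances are discarded because each removal raises the development-set likelihood. A real system would accumulate per-utterance statistics for every HMM state and mixture component, which this sketch does not attempt.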

Journal

  • IPSJ SIG Notes

    IPSJ SIG Notes 2005(127(2005-SLP-059)), 235-240, 2005-12-22

    Information Processing Society of Japan (IPSJ)

References:  14

Codes

  • NII Article ID (NAID)
    110003494762
  • NII NACSIS-CAT ID (NCID)
    AN10442647
  • Text Lang
    JPN
  • Article Type
    Technical Report
  • ISSN
    09196072
  • NDL Article ID
    7768396
  • NDL Source Classification
    ZM13 (Science and Technology -- Science and Technology in General -- Data Processing and Computers)
  • NDL Call No.
    Z14-1121
  • Data Source
    CJP  NDL  NII-ELS  IPSJ 