Style Classification of Speech Based on MSD-HMM [in Japanese]

Author(s)

    • KAWASHIMA Keigo
    • Interdisciplinary Graduate School of Science and Engineering, Tokyo Institute of Technology
    • TACHIBANA Makoto
    • Interdisciplinary Graduate School of Science and Engineering, Tokyo Institute of Technology
    • YAMAGISHI Junichi
    • Interdisciplinary Graduate School of Science and Engineering, Tokyo Institute of Technology
    • KOBAYASHI Takao
    • Interdisciplinary Graduate School of Science and Engineering, Tokyo Institute of Technology

Abstract

This paper describes a technique for classifying the emotional expressions and speaking styles of speech based on the multi-space probability distribution HMM (MSD-HMM). Using the MSD-HMM, we model spectral and fundamental frequency (F0) features simultaneously. A universal background model (UBM) is trained on neutral-style speech from multiple speakers and then adapted simultaneously to the target speaker and style using a small number of sentences; the resulting adapted models are used for classification. First, we investigate the effect of F0 and show that including F0 in the feature vector improves the classification rate. Next, we compare using the UBM as the initial model for adaptation with using a speaker-dependent model built from the target speaker's neutral-style (read) speech, and show that even when the UBM is adapted to speaker and style simultaneously, its classification results are close to those of the speaker-dependent model. Finally, we report classification experiments using speech recorded from non-professional speakers with no narration experience.
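The classification step described in the abstract (an MSD-HMM per style that models spectrum and F0 jointly, with the test utterance assigned to the style whose model scores highest) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `MSDState`, `MSDHMM`, `classify`, and `toy_model` are hypothetical; F0 is modeled as a two-space MSD (a 1-D Gaussian on log F0 for voiced frames and a zero-dimensional space for unvoiced frames); the spectral stream uses a single diagonal Gaussian per state; the decision rule is assumed to be maximum log-likelihood; and the UBM training and speaker/style adaptation steps are omitted, so the per-style models simply stand in for adapted UBMs.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Tuple

# One frame: (spectral feature vector, log F0 or None when the frame is unvoiced)
Frame = Tuple[Sequence[float], Optional[float]]

def log_gauss_diag(x, mean, var) -> float:
    """Log density of a diagonal-covariance Gaussian."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))

def logsumexp(xs: Sequence[float]) -> float:
    m = max(xs)
    if m == -math.inf:
        return m
    return m + math.log(sum(math.exp(x - m) for x in xs))

@dataclass
class MSDState:
    spec_mean: List[float]
    spec_var: List[float]
    voiced_weight: float  # weight of the 1-D voiced space (must be in (0, 1)); unvoiced weight is 1 - this
    lf0_mean: float
    lf0_var: float

    def log_output(self, frame: Frame) -> float:
        spec, lf0 = frame
        lp = log_gauss_diag(spec, self.spec_mean, self.spec_var)
        if lf0 is None:
            # Zero-dimensional (unvoiced) space: its probability is just the space weight.
            lp += math.log(1.0 - self.voiced_weight)
        else:
            # Voiced space: space weight times a 1-D Gaussian on log F0.
            lp += math.log(self.voiced_weight) + log_gauss_diag([lf0], [self.lf0_mean], [self.lf0_var])
        return lp

@dataclass
class MSDHMM:
    states: List[MSDState]
    log_trans: List[List[float]]  # log a_ij; use -math.inf for forbidden transitions

    def log_likelihood(self, frames: Sequence[Frame]) -> float:
        """Forward algorithm in the log domain, starting in state 0 (assumption)."""
        if not frames:
            raise ValueError("need at least one frame")
        n = len(self.states)
        alpha = [-math.inf] * n
        alpha[0] = self.states[0].log_output(frames[0])
        for frame in frames[1:]:
            alpha = [logsumexp([alpha[i] + self.log_trans[i][j] for i in range(n)])
                     + self.states[j].log_output(frame) for j in range(n)]
        # Sums over all final states (no end-state constraint; a simplification).
        return logsumexp(alpha)

def classify(frames: Sequence[Frame], models: Dict[str, MSDHMM]) -> str:
    """Return the style whose model gives the highest log-likelihood."""
    return max(models, key=lambda style: models[style].log_likelihood(frames))

def toy_model(lf0_mean: float) -> MSDHMM:
    # Hypothetical 2-state left-to-right model; all parameter values are illustrative only.
    state = MSDState(spec_mean=[0.0, 0.0], spec_var=[1.0, 1.0],
                     voiced_weight=0.8, lf0_mean=lf0_mean, lf0_var=0.01)
    return MSDHMM(states=[state, state],
                  log_trans=[[math.log(0.5), math.log(0.5)], [-math.inf, 0.0]])

models = {"neutral": toy_model(5.0), "angry": toy_model(5.5)}
frames = [([0.0, 0.0], 5.5), ([0.0, 0.0], None), ([0.0, 0.0], 5.45)]
print(classify(frames, models))  # prints "angry"
```

Because the two toy models differ only in their log-F0 means, the decision here is driven entirely by the voiced frames; an utterance consisting only of unvoiced frames scores identically under both, which illustrates why F0 information in the MSD stream can help separate styles.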

Journal

  • IPSJ SIG Notes

    IPSJ SIG Notes, 2005(127) (2005-SLP-059), pp. 241-246, 2005-12-22

    Information Processing Society of Japan (IPSJ)

References:  10

Codes

  • NII Article ID (NAID)
    110003494763
  • NII NACSIS-CAT ID (NCID)
    AN10442647
  • Text Lang
    JPN
  • Article Type
    Technical Report
  • ISSN
    09196072
  • NDL Article ID
    7768404
  • NDL Source Classification
    ZM13 (Science and Technology -- General Science and Technology -- Data Processing / Computers)
  • NDL Call No.
    Z14-1121
  • Data Source
    CJP  NDL  NII-ELS  IPSJ 