Spectral Features for Perceptually Natural Phoneme Replacement by Another Speaker's Speech
- TAKOU Reiko, NHK (Japan Broadcasting Corp.) Science and Technology Research Laboratories
- SEGI Hiroyuki, NHK (Japan Broadcasting Corp.) Science and Technology Research Laboratories
- TAKAGI Tohru, NHK Engineering Services, Inc.
- SEIYAMA Nobumasa, NHK (Japan Broadcasting Corp.) Science and Technology Research Laboratories
Abstract
The frequency regions and spectral features that can be used to measure the perceived similarity and continuity of voice quality are reported here. A perceptual evaluation test was conducted to assess the naturalness of spoken sentences in which either a vowel or a long vowel of the original speaker was replaced by that of another speaker. Correlation analysis between the evaluation scores and the spectral feature distances was conducted to select the spectral features expected to be effective in measuring voice quality and to identify the appropriate speech segment of another speaker. The mel-frequency cepstrum coefficient (MFCC) and the spectral center of gravity (COG) in the low-, middle-, and high-frequency regions were selected. A perceptual paired comparison test was carried out to confirm the effectiveness of these spectral features. The results showed that the MFCC was effective for spectra across a wide range of frequency regions, the COG was effective in the low- and high-frequency regions, and the effective spectral features differed among the original speakers.
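The abstract does not give formulas for the selected features or for the distance and correlation measures. As a rough illustration only, the sketch below assumes the following: the COG of a band is the power-weighted mean frequency of the DFT bins inside it; the band edges are the hypothetical values in `BANDS`; a COG distance is the absolute difference between two frames' COGs; an MFCC distance is the Euclidean distance between precomputed MFCC vectors; and the correlation is Pearson's r. None of these choices, and none of the helper names (`power_spectrum`, `band_cog`, `cog_distance`, `mfcc_distance`, `pearson`), are taken from the paper.

```python
import math

# Hypothetical band edges in Hz (lower inclusive, upper exclusive).
# The abstract does not state where the low-, middle-, and
# high-frequency regions begin and end.
BANDS = {
    "low": (0.0, 2000.0),
    "middle": (2000.0, 5000.0),
    "high": (5000.0, 8000.0),
}


def power_spectrum(frame, sample_rate):
    """Return (frequencies, powers) for DFT bins 0..N/2 of one frame.

    A direct O(N^2) DFT keeps the sketch free of third-party packages.
    """
    n = len(frame)
    freqs, powers = [], []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * t / n) for t, x in enumerate(frame))
        im = -sum(x * math.sin(2 * math.pi * k * t / n) for t, x in enumerate(frame))
        freqs.append(k * sample_rate / n)
        powers.append(re * re + im * im)
    return freqs, powers


def band_cog(freqs, powers, lo, hi):
    """Spectral center of gravity: power-weighted mean frequency in [lo, hi)."""
    num = den = 0.0
    for f, p in zip(freqs, powers):
        if lo <= f < hi:
            num += f * p
            den += p
    return num / den if den > 0 else float("nan")


def cog_distance(frame_a, frame_b, sample_rate, band):
    """Absolute COG difference between two frames in one named band (assumed metric)."""
    lo, hi = BANDS[band]
    cog_a = band_cog(*power_spectrum(frame_a, sample_rate), lo, hi)
    cog_b = band_cog(*power_spectrum(frame_b, sample_rate), lo, hi)
    return abs(cog_a - cog_b)


def mfcc_distance(mfcc_a, mfcc_b):
    """Euclidean distance between two precomputed MFCC vectors (assumed metric)."""
    return math.dist(mfcc_a, mfcc_b)


def pearson(xs, ys):
    """Pearson correlation, e.g. between feature distances and naturalness scores."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)
```

Under these assumptions, one would compute a distance per evaluated sentence for each feature and band, then correlate those distances with the naturalness scores. A strongly negative r would suggest that a larger spectral distance goes with a less natural-sounding replacement.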
Published in
- IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences E95.A (4), 751-759, 2012
- The Institute of Electronics, Information and Communication Engineers (IEICE)
Details
- CRID: 1390001206309872896
- NII Article ID: 10030937884
- NII Book ID: AA10826239
- BIBCODE: 2012IEITF..95..751T
- ISSN: 1745-1337, 0916-8508
- Text language: en
- Data sources: JaLC, Crossref, CiNii Articles
- Abstract license flag: Disallowed