Contributions of temporal cue on the perception of speaker individuality and vocal emotion for noise-vocoded speech

Abstract

This paper investigates the importance of temporal cues in the perception of speaker individuality and vocal emotion. Speaker-recognition and vocal-emotion-recognition experiments were carried out using noise-vocoded speech (NVS), an analysis/synthesis method. The temporal resolution of NVS was controlled by varying the upper limit of the modulation frequency (0, 0.5, 1, 2, 4, 8, 16, 32, and 64 Hz). In addition, the role of temporal cues under different spectral resolutions was investigated by varying the number of channels (4, 8, and 16). The results demonstrated that temporal resolution contributes to the recognition of both speaker and vocal emotion. Temporal cues are therefore important for the perception not only of linguistic information but also of speaker individuality and vocal emotion. In contrast, speaker-recognition performance was less sensitive to spectral resolution, at least for the limited set of stimuli used in the present study. For vocal-emotion recognition, spectral resolution was shown to be important only for recognizing neutral, joy, and cold anger, and not for sadness or hot anger. The results suggest that the modulation frequency band important for the perception of nonlinguistic information is higher than that for linguistic information.
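
The two manipulations described in the abstract, the upper limit of the modulation frequency and the number of channels, can be illustrated with a minimal noise-vocoder sketch. This is not the authors' implementation. The log-spaced band edges, the 100 to 6000 Hz range, the brick-wall FFT filters, and all function names (`band_edges`, `fft_band_limit`, `temporal_envelope`, `noise_vocode`) are assumptions made for illustration, since the abstract does not specify the filter design.

```python
import numpy as np


def band_edges(n_channels, f_lo=100.0, f_hi=6000.0):
    """Log-spaced band edges in Hz (assumed layout; not given in the abstract)."""
    return np.geomspace(f_lo, f_hi, n_channels + 1)


def fft_band_limit(x, fs, f_min, f_max):
    """Brick-wall filter: keep only components with f_min <= f <= f_max."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    spectrum[(freqs < f_min) | (freqs > f_max)] = 0.0
    return np.fft.irfft(spectrum, n=len(x))


def temporal_envelope(x):
    """Magnitude of the analytic signal (FFT-based Hilbert transform)."""
    n = len(x)
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(np.fft.fft(x) * weights))


def noise_vocode(x, fs, n_channels=8, mod_cutoff_hz=16.0, seed=0):
    """Replace the fine structure of each band with envelope-modulated noise.

    mod_cutoff_hz is the upper limit of the modulation frequency; a value of
    0 keeps only the mean (DC) level of each band envelope.
    """
    rng = np.random.default_rng(seed)
    edges = band_edges(n_channels)
    out = np.zeros(len(x))
    for lo, hi in zip(edges[:-1], edges[1:]):
        band = fft_band_limit(x, fs, lo, hi)
        # Limit the temporal resolution by low-pass filtering the envelope.
        env = fft_band_limit(temporal_envelope(band), fs, 0.0, mod_cutoff_hz)
        env = np.maximum(env, 0.0)  # brick-wall smoothing can dip below zero
        carrier = fft_band_limit(rng.standard_normal(len(x)), fs, lo, hi)
        out += env * carrier
    return out
```

Under these assumptions, sweeping `mod_cutoff_hz` over the nine values listed above and `n_channels` over 4, 8, and 16 would produce one stimulus per condition from the same input signal `x`.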
