Spoken language processing

Bibliographic information

Spoken language processing

edited by Joseph Mariani

ISTE Ltd, John Wiley & Sons, 2009

Alternative title

Traitement automatique du langage parlé 1 et 2

University library holdings: 1

Notes

Includes bibliographical references and index

"First published in France in 2002 by Hermes Science/Lavoisier entitled Traitement automatique du language parlé 1 et 2" -- T.p. verso

Description and table of contents

Description

Speech processing addresses various scientific and technological areas. It includes speech analysis and variable rate coding, in order to store or transmit speech. It also covers speech synthesis, especially from text; speech recognition, including speaker and language identification; and spoken language understanding. This book covers the following topics: how to realize speech production and perception systems, and how to synthesize and understand speech using state-of-the-art methods in signal processing, pattern recognition, stochastic modelling, computational linguistics and human factor studies.

Table of contents

Preface xiii

Chapter 1. Speech Analysis 1
Christophe D'ALESSANDRO
1.1. Introduction 1
1.1.1. Source-filter model 1
1.1.2. Speech sounds 2
1.1.3. Sources 6
1.1.4. Vocal tract 12
1.1.5. Lip-radiation 18
1.2. Linear prediction 18
1.2.1. Source-filter model and linear prediction 18
1.2.2. Autocorrelation method: algorithm 21
1.2.3. Lattice filter 28
1.2.4. Models of the excitation 31
1.3. Short-term Fourier transform 35
1.3.1. Spectrogram 35
1.3.2. Interpretation in terms of filter bank 36
1.3.3. Block-wise interpretation 37
1.3.4. Modification and reconstruction 38
1.4. A few other representations 39
1.4.1. Bilinear time-frequency representations 39
1.4.2. Wavelets 41
1.4.3. Cepstrum 43
1.4.4. Sinusoidal and harmonic representations 46
1.5. Conclusion 49
1.6. References 50

Chapter 2. Principles of Speech Coding 55
Gang FENG and Laurent GIRIN
2.1. Introduction 55
2.1.1. Main characteristics of a speech coder 57
2.1.2. Key components of a speech coder 59
2.2. Telephone-bandwidth speech coders 63
2.2.1. From predictive coding to CELP 65
2.2.2. Improved CELP coders 69
2.2.3. Other coders for telephone speech 77
2.3. Wideband speech coding 79
2.3.1. Transform coding 81
2.3.2. Predictive transform coding 85
2.4. Audiovisual speech coding 86
2.4.1. A transmission channel for audiovisual speech 86
2.4.2. Joint coding of audio and video parameters 88
2.4.3. Prospects 93
2.5. References 93

Chapter 3. Speech Synthesis 99
Olivier BOEFFARD and Christophe D'ALESSANDRO
3.1. Introduction 99
3.2. Key goal: speaking for communicating 100
3.2.1. What acoustic content? 101
3.2.2. What melody? 102
3.2.3. Beyond the strict minimum 103
3.3. Synoptic presentation of the elementary modules in speech synthesis systems 104
3.3.1. Linguistic processing 105
3.3.2. Acoustic processing 105
3.3.3. Training models automatically 106
3.3.4. Operational constraints 107
3.4. Description of linguistic processing 107
3.4.1. Text pre-processing 107
3.4.2. Grapheme-to-phoneme conversion 108
3.4.3. Syntactic-prosodic analysis 110
3.4.4. Prosodic analysis 112
3.5. Acoustic processing methodology 114
3.5.1. Rule-based synthesis 114
3.5.2. Unit-based concatenative synthesis 115
3.6. Speech signal modeling 117
3.6.1. The source-filter assumption 118
3.6.2. Articulatory model 119
3.6.3. Formant-based modeling 119
3.6.4. Auto-regressive modeling 120
3.6.5. Harmonic plus noise model 120
3.7. Control of prosodic parameters: the PSOLA technique 122
3.7.1. Methodology background 124
3.7.2. The ancestors of the method 125
3.7.3. Descendants of the method 128
3.7.4. Evaluation 131
3.8. Towards variable-size acoustic units 131
3.8.1. Constitution of the acoustic database 134
3.8.2. Selection of sequences of units 138
3.9. Applications and standardization 142
3.10. Evaluation of speech synthesis 144
3.10.1. Introduction 144
3.10.2. Global evaluation 146
3.10.3. Analytical evaluation 151
3.10.4. Summary for speech synthesis evaluation 153
3.11. Conclusions 154
3.12. References 154

Chapter 4. Facial Animation for Visual Speech 169
Thierry GUIARD-MARIGNY
4.1. Introduction 169
4.2. Applications of facial animation for visual speech 170
4.2.1. Animation movies 170
4.2.2. Telecommunications 170
4.2.3. Human-machine interfaces 170
4.2.4. A tool for speech research 171
4.3. Speech as a bimodal process 171
4.3.1. The intelligibility of visible speech 172
4.3.2. Visemes for facial animation 174
4.3.3. Synchronization issues 175
4.3.4. Source consistency 176
4.3.5. Key constraints for the synthesis of visual speech 177
4.4. Synthesis of visual speech 178
4.4.1. The structure of an artificial talking head 178
4.4.2. Generating expressions 178
4.5. Animation 180
4.5.1. Analysis of the image of a face 180
4.5.2. The puppeteer 181
4.5.3. Automatic analysis of the speech signal 181
4.5.4. From the text to the phonetic string 181
4.6. Conclusion 182
4.7. References 182

Chapter 5. Computational Auditory Scene Analysis 189
Alain DE CHEVEIGNE
5.1. Introduction 189
5.2. Principles of auditory scene analysis 191
5.2.1. Fusion versus segregation: choosing a representation 191
5.2.2. Features for simultaneous fusion 191
5.2.3. Features for sequential fusion 192
5.2.4. Schemes 193
5.2.5. Illusion of continuity, phonemic restoration 193
5.3. CASA principles 193
5.3.1. Design of a representation 193
5.4. Critique of the CASA approach 200
5.4.1. Limitations of ASA 201
5.4.2. The conceptual limits of "separable representation" 202
5.4.3. Neither a model, nor a method? 203
5.5. Perspectives 203
5.5.1. Missing feature theory 203
5.5.2. The cancellation principle 204
5.5.3. Multimodal integration 205
5.5.4. Auditory scene synthesis: transparency measure 205
5.6. References 206

Chapter 6. Principles of Speech Recognition 213
Renato DE MORI and Brigitte BIGI
6.1. Problem definition and approaches to the solution 213
6.2. Hidden Markov models for acoustic modeling 216
6.2.1. Definition 216
6.2.2. Observation probability and model parameters 217
6.2.3. HMM as probabilistic automata 218
6.2.4. Forward and backward coefficients 219
6.3. Observation probabilities 222
6.4. Composition of speech unit models 223
6.5. The Viterbi algorithm 226
6.6. Language models 228
6.6.1. Perplexity as an evaluation measure for language models 230
6.6.2. Probability estimation in the language model 232
6.6.3. Maximum likelihood estimation 234
6.6.4. Bayesian estimation 235
6.7. Conclusion 236
6.8. References 237

Chapter 7. Speech Recognition Systems 239
Jean-Luc GAUVAIN and Lori LAMEL
7.1. Introduction 239
7.2. Linguistic model 241
7.3. Lexical representation 244
7.4. Acoustic modeling 247
7.4.1. Feature extraction 247
7.4.2. Acoustic-phonetic models 249
7.4.3. Adaptation techniques 253
7.5. Decoder 256
7.6. Applicative aspects 257
7.6.1. Efficiency: speed and memory 257
7.6.2. Portability: languages and applications 259
7.6.3. Confidence measures 260
7.6.4. Beyond words 261
7.7. Systems 261
7.7.1. Text dictation 262
7.7.2. Audio document indexing 263
7.7.3. Dialog systems 265
7.8. Perspectives 268
7.9. References 270

Chapter 8. Language Identification 279
Martine ADDA-DECKER
8.1. Introduction 279
8.2. Language characteristics 281
8.3. Language identification by humans 286
8.4. Language identification by machines 287
8.4.1. LId tasks 288
8.4.2. Performance measures 288
8.4.3. Evaluation 289
8.5. LId resources 290
8.6. LId formulation 295
8.7. LId modeling 298
8.7.1. Acoustic front-end 299
8.7.2. Acoustic language-specific modeling 300
8.7.3. Parallel phone recognition 302
8.7.4. Phonotactic modeling 304
8.7.5. Back-end optimization 309
8.8. Discussion 309
8.9. References 311

Chapter 9. Automatic Speaker Recognition 321
Frederic BIMBOT
9.1. Introduction 321
9.1.1. Voice variability and characterization 321
9.1.2. Speaker recognition 323
9.2. Typology and operation of speaker recognition systems 324
9.2.1. Speaker recognition tasks 324
9.2.2. Operation 325
9.2.3. Text-dependence 326
9.2.4. Types of errors 327
9.2.5. Influencing factors 328
9.3. Fundamentals 329
9.3.1. General structure of speaker recognition systems 329
9.3.2. Acoustic analysis 330
9.3.3. Probabilistic modeling 331
9.3.4. Identification and verification scores 335
9.3.5. Score compensation and decision 337
9.3.6. From theory to practice 342
9.4. Performance evaluation 343
9.4.1. Error rate 343
9.4.2. DET curve and EER 344
9.4.3. Cost function, weighted error rate and HTER 346
9.4.4. Distribution of errors 346
9.4.5. Orders of magnitude 347
9.5. Applications 348
9.5.1. Physical access control 348
9.5.2. Securing remote transactions 349
9.5.3. Audio information indexing 350
9.5.4. Education and entertainment 350
9.5.5. Forensic applications 351
9.5.6. Perspectives 352
9.6. Conclusions 352
9.7. Further reading 353

Chapter 10. Robust Recognition Methods 355
Jean-Paul HATON
10.1. Introduction 355
10.2. Signal pre-processing methods 357
10.2.1. Spectral subtraction 357
10.2.2. Adaptive noise cancellation 358
10.2.3. Space transformation 359
10.2.4. Channel equalization 359
10.2.5. Stochastic models 360
10.3. Robust parameters and distance measures 360
10.3.1. Spectral representations 361
10.3.2. Auditory models 364
10.3.3. Distance measure 365
10.4. Adaptation methods 366
10.4.1. Model composition 366
10.4.2. Statistical adaptation 367
10.5. Compensation of the Lombard effect 368
10.6. Missing data scheme 369
10.7. Conclusion 369
10.8. References 370

Chapter 11. Multimodal Speech: Two or Three senses are Better than One 377
Jean-Luc SCHWARTZ, Pierre ESCUDIER and Pascal TEISSIER
11.1. Introduction 377
11.2. Speech is a multimodal process 379
11.2.1. Seeing without hearing 379
11.2.2. Seeing for hearing better in noise 380
11.2.3. Seeing for better hearing... even in the absence of noise 382
11.2.4. Bimodal integration imposes itself to perception 383
11.2.5. Lip reading as taking part to the ontogenesis of speech 385
11.2.6. ...and to its phylogenesis? 386
11.3. Architectures for audio-visual fusion in speech perception 388
11.3.1. Three paths for sensory interactions in cognitive psychology 389
11.3.2. Three paths for sensor fusion in information processing 390
11.3.3. The four basic architectures for audiovisual fusion 391
11.3.4. Three questions for a taxonomy 392
11.3.5. Control of the fusion process 394
11.4. Audio-visual speech recognition systems 396
11.4.1. Architectural alternatives 397
11.4.2. Taking into account contextual information 401
11.4.3. Pre-processing 403
11.5. Conclusions 405
11.6. References 406

Chapter 12. Speech and Human-Computer Communication 417
Wolfgang MINKER & Francoise NEEL
12.1. Introduction 417
12.2. Context 418
12.2.1. The development of micro-electronics 419
12.2.2. The expansion of information and communication technologies and increasing interconnection of computer systems 420
12.2.3. The coordination of research efforts and the improvement of automatic speech processing systems 421
12.3. Specificities of speech 424
12.3.1. Advantages of speech as a communication mode 424
12.3.2. Limitations of speech as a communication mode 425
12.3.3. Multidimensional analysis of commercial speech recognition products 427
12.4. Application domains with voice-only interaction 430
12.4.1. Inspection, control and data acquisition 431
12.4.2. Home automation: electronic home assistant 432
12.4.3. Office automation: dictation and speech-to-text systems 432
12.4.4. Training 435
12.4.5. Automatic translation 438
12.5. Application domains with multimodal interaction 439
12.5.1. Interactive terminals 440
12.5.2. Computer-aided graphic design 441
12.5.3. On-board applications 442
12.5.4. Human-human communication facilitation 444
12.5.5. Automatic indexing of audio-visual documents 446
12.6. Conclusions 446
12.7. References 447

Chapter 13. Voice Services in the Telecom Sector 455
Laurent COURTOIS, Patrick BRISARD and Christian GAGNOULET
13.1. Introduction 455
13.2. Automatic speech processing and telecommunications 456
13.3. Speech coding in the telecommunication sector 456
13.4. Voice command in telecom services 457
13.4.1. Advantages and limitations of voice command 457
13.4.2. Major trends 459
13.4.3. Major voice command services 460
13.4.4. Call center automation (operator assistance) 460
13.4.5. Personal voice phonebook 462
13.4.6. Voice personal telephone assistants 463
13.4.7. Other services based on voice command 463
13.5. Speaker verification in telecom services 464
13.6. Text-to-speech synthesis in telecommunication systems 464
13.7. Conclusions 465
13.8. References 466

List of Authors 467
Index 471

Source: Nielsen BookData