Computational models of speech pattern processing

Author

    • NATO Advanced Study Institute on Computational Models of Speech Pattern Processing

Bibliographic information

Computational models of speech pattern processing

edited by Keith Ponting

(NATO ASI series, Series F, Computer and systems sciences ; no. 169)

Springer, 1999

University library holdings: 29

Notes

"Proceedings of the NATO Advanced Study Institute on Computational Models of Speech Pattern Processing, held in St. Helier, Jersey, U.K., July 7-18, 1997."--T.p. verso

Includes bibliographical references and index

Description and table of contents

Description

This high-level collection of invited tutorial papers and contributed papers is based on a NATO workshop held in 1997. It surveys and discusses the latest techniques in the field of speech science and technology with a view to working toward a unifying theory of speech pattern processing. The tutorials presenting significant leading-edge research are a valuable resource for researchers and others wishing to extend their knowledge of the field. Most of the papers are sorted into two groups, approaching respectively from the acoustic and the linguistic perspectives. The acoustic papers include reviews of work on human perception, the state of the art in very-large-vocabulary recognition, connectionist and hybrid models, robust approaches, and speaker characteristics. The linguistic papers include work on psycholinguistics, language modeling and adaptation, the use of natural language knowledge sources, multilingual systems, and systems using speech technology.

Table of contents

Speech Pattern Processing.- 1. The State-of-the-Art in Speech.- 2. Speech Patterning.- 3. Speech Pattern Processing.- 4. Whither a Unified Theory?.- 4.1 Towards a Theory.- 4.2 Practical Issues.- 5. What We Know.- 6. Some Things We Don't Know.- 7. The Way Forward.- References.

Psycho-acoustics and Speech Perception.- 1. Introduction.- 2. Psycho-acoustics.- 3. Speech Perception.- 3.1 Vowel Reduction and Schwa.- 3.2 Spectro-temporal Dynamics of Formant Transitions.- 3.3 Consonant Reduction.- 4. Discussion.- References.

Acoustic Modelling for Large Vocabulary Continuous Speech Recognition.- 1. Introduction.- 2. Overview of LVCSR Architecture.- 3. Front End Processing.- 4. Basic Phone Modelling.- 4.1 HMM Phone Models.- 4.2 HMM Parameter Estimation.- 4.3 Context-Dependent Phone Models.- 5. Adaptation for LVCSR.- 5.1 Maximum Likelihood Linear Regression.- 5.2 Estimating the MLLR Transforms.- 6. Progress in LVCSR.- 7. Discriminative Training for LVCSR.- 8. Conclusions.- References.

Tree-based Dependence Models for Speech Recognition.- 1. Introduction.- 2. Hidden Tree Framework.- 3. Hidden Dependence Trees.- 3.1 The Mathematical Framework.- 3.2 Application to Speech.- 3.3 Topology Design and Parameter Estimation.- 3.4 Experiments.- 4. Multiscale Tree Processes.- 4.1 The Mathematical Framework.- 4.2 Application to Speech.- 4.3 Topology Design and Parameter Estimation.- 4.4 Experiments.- 5. Discussion.- References.

Connectionist and Hybrid Models for Automatic Speech Recognition.- 1. Introduction.- 2. A Brief Overview of Neural Networks.- 2.1 Basic Principles.- 2.2 Main Models for ASR.- 3. Signal Processing and Feature Extraction using ANNs.- 4. Neural Networks as Static Pattern Classifiers.- 4.1 Speech Pattern Classification with Perceptrons.- 4.2 Feature Maps.- 5. Dynamic Aspects.- 5.1 Position of the Problem.- 5.2 Time Delays.- 5.3 Dynamic Classifiers.- 5.4 Recurrent NNs.- 6. Hybrid Models.- 6.1 Position of the Problem.- 6.2 Proposed Solutions.- 7. Conclusion.- References.

Computational Models for Auditory Speech Processing.- 1. Introduction.- 2. A nonlinear computational model for basilar membrane wave motions.- 3. Frequency-domain and time-domain computational solutions to the BM model.- 4. Interval analysis of auditory model's outputs for temporal information extraction.- 5. IPIH representation of clean and noisy speech sounds.- 6. Speech recognition experiments.- 7. Summary and discussions.- References.

Speaker Adaptation of CDHMMs Using Bayesian Learning.- 1. Introduction.- 2. Bayesian Estimation of CDHMMs.- 2.1 Prior Density Definition.- 2.2 Forgetting Mechanism.- 2.3 Prior Parameter Estimation and MAP Solution.- 3. Acoustic Normalization.- 4. Tasks, Corpus and System.- 5. Speaker Adaptation Experiments.- 6. Conclusions.- References.

Discriminative Improvement of the Representation Space for Continuous Speech Recognition.- 1. Introduction.- 2. Discriminative Feature Extraction.- 3. SGDFE Algorithm for CSR.- 4. Experimental Results.- 5. Conclusions.- References.

Dealing with Loss of Synchronism in Multi-Band Continuous Speech Recognition Systems.- 1. Introduction.- 2. Forcing Synchronism Between the Bands.- 2.1 First Approach.- 2.2 Experiments.- 3. Modeling Loss of Synchronism.- 3.1 Theoretical Approach.- 3.2 Experimental Approach.- 4. Conclusion.- References.

K-Nearest Neighbours Estimator in a HMM-Based Recognition System.- 1. Introduction.- 2. K-NN Assessment.- 3. K-NN estimator in HMM.- 3.1 Adaptation Principle.- 3.2 HMM Estimation Improvement.- 4. Evaluations.- 4.1 Recognition rates.- 4.2 SNALC Evaluation.- 5. Perspectives.- References.

Robust Speech Recognition.- 1. Mismatches between Training and Testing.- 1.1 Speech Variation.- 1.2 Inter-Speaker Variation.- 2. Reducing Mismatches to Improve Speech Recognition.- 2.1 Principles of Adaptive Speech Recognition.- 2.2 Three Principal Adaptation Methods for Reducing Mismatches.- 2.3 Important Practical Issues.- 2.4 N-Best-Based Unsupervised Adaptation.- 3. Conclusion.- References.

Channel Adaptation.- 1. Introduction.- 1.1 Matched condition training.- 1.2 Robust features.- 1.3 Model adaptation.- 1.4 Channel adaptation.- 1.5 Speech enhancement.- 2. Models of distortion.- 2.1 Minimum mean square error.- 2.2 Additive noise estimation.- 3. Methods for channel adaptation.- 3.1 Global transformations.- 3.2 Class-specific corrections.- 3.3 Empirical methods based on stereo data.- 3.4 Model-based compensation.- 4. Conclusion.- References.

Speaker Characterization, Speaker Adaptation and Voice Conversion.- 1. Introduction.- 2. Speaker-Characterization.- 3. Speaker Recognition.- 4. Speaker-Adaptation Techniques for Speech Recognition.- 4.1 Classification of Speaker-Adaptation/Normalization Methods.- 4.2 Speaker Cluster Selection Methods.- 4.3 Interpolated Re-Estimation Algorithm.- 4.4 Spectral Mapping Algorithm.- 5. Individuality Problems in Speech Synthesis and Coding.- 6. Conclusion.- References.

Speaker Recognition.- 1. Principles of Speaker Recognition.- 2. Text-Independent Speaker Recognition Methods.- 2.1 Long-Term-Statistics-Based Methods.- 2.2 VQ-Based Methods.- 2.3 Ergodic-HMM-Based Methods.- 2.4 Speech-Recognition-Based Methods.- 3. Text-prompted Speaker Recognition.- 4. Normalization and Adaptation Techniques.- 4.1 Parameter-Domain Normalization.- 4.2 Likelihood Normalization.- 4.3 HMM Adaptation for Noisy Conditions.- 4.4 Updating Models and A Priori Threshold for Speaker Verification...- 5. Open Questions and Concluding Remarks.- References.

Application of Acoustic Discriminative Training in an Ergodic HMM for Speaker Identification.- 1. Introduction.- 2. Experimental Conditions.- 3. System Architecture.- 3.1 Acoustic Segmentation.- 3.2 The PTE-HMM Model.- 4. Experimental Results.- 5. Conclusions.- References.

Comparison of Several Compensation Techniques for Robust Speaker Verification.- 1. Introduction.- 2. The HMM recognition system.- 3. Mismatch Compensation Techniques.- 3.1 CMS.- 3.2 SMI.- 3.3 SM2.- 4. Experiments and Results.- 5. Discussion and Conclusion.- References.

Segmental Acoustic Modeling for Speech Recognition.- 1. Introduction.- 2. Segmental and Hidden Markov Models.- 2.1 General Modeling Framework.- 2.2 Models of Feature Dynamics.- 3. Recognition and Training.- 3.1 Recognition Algorithms.- 3.2 Parameter Estimation Algorithms.- 4. Segmental Features.- 5. Summary.- References.

Trajectory Representations and Acoustic Descriptions for a Segment-Modelling Approach to Automatic Speech Recognition.- 1. Introduction.- 2. Modelling Trajectories in Speech.- 3. Representing an Unobserved Trajectory with Segmental HMMs.- 3.1 Calculating segment probabilities.- 3.2 Recognition experiment.- 4. HMM Recognition with Formant Features.- 5. Modelling trajectories of cepstrum and formant features.- 6. Conclusions.- References.

Suprasegmental Modelling.- 1. Introduction.- 2. The Verbmobil System.- 3. Computation of Prosodic Information.- 3.1 Extraction of Prosodic Features.- 3.2 Prosodic Classes.- 3.3 New Boundary Labels: The Syntactic-prosodic M-labels.- 3.4 Classification of Prosodic Events.- 3.5 Improving the Classification Results with Stochastic Language Models.- 3.6 Prosodic scoring of WHGs.- 4. The Use of Prosodic Information.- 4.1 Prosody and Syntax - Interaction with the TUG-Grammar.- 4.2 Prosody and the Other Linguistic Modules.- 5. Concluding Remarks.- 6. References.

Computational Models for Speech Production.- 1. Introduction.- 2. Speech production models in science/technology literatures.- 3. Derivation of discrete-time version of statistical task-dynamic model.- 4. Algorithms for learning task-dynamic model parameters and for likelihood computation.- 4.1 Model with deterministic, time-invariant parameters.- 4.2 Model with random, time-invariant parameters.- 4.3 Model with random, smoothly time-varying parameters.- 4.4 Discriminative learning of production models' parameters.- 5. Other types of computational models of speech production.- 6. Summary and discussions.- References.

Articulatory Features and Associated Production Models in Statistical Speech Recognition.- 1. Introduction.- 2. Functional description of human speech communication as an encoding-decoding process.- 3. Overview of theories of speech perception.- 4. A general framework of statistical speech recognition.- 5. Brief analysis of weaknesses of current speech recognition technology.- 6. Phonological model: Overlapping articulatory features and related HMMs.- 7. Task-dynamic model of speech production.- 8. Interfacing overlapping features to task-dynamic model and a general architecture for speech recognition.- 9. Discussions: Machine speech recognition.- References.

Talker Normalization with Articulatory Analysis-by-Synthesis.- 1. Introduction.- 2. Normalization Procedure.- 3. Experiments.- 4. Conclusion.- References.

The Psycholinguistics of Spoken Word Recognition.- 1. Introduction.- 2. Overview: Models of spoken word recognition.- 3. Currency of mapping: units and the nature of lexical representations.- 4. Temporal nature of speech: early vs delayed commitment.- 4.1 Delayed commitment.- 5. Multiple lexical hypotheses, lexical competition and graded activation.- 6. Language architecture: Lexical and segmental levels.- 7. Language architecture: Lexical and sentential.- 8. Contribution of attention.- References.

Issues in Using Models for Self Evaluation and Correction of Speech.- 1. Introduction.- 2. Using models.- 3. Norm building.- 4. Matching between the subject's world and the technical world.- 5. Settlement of the speech education program.- 6. Management of the education program.- 7. Conclusion.- References.

The Use of the Maximum Likelihood Criterion in Language Modelling.- 1. Introduction.- 2. Perplexity and Maximum Likelihood.- 3. Smoothing and Discounting for Sparse Data.- 3.1 Modelfree Discounting and Turing-Good Estimates.- 3.2 Absolute Discounting.- 4. Partitioning-Based Models.- 4.1 Equivalence Classes of Histories and Decision Trees.- 4.2 Two-Sided Partitionings and Word Classes.- 5. Word Trigger Pairs.- 6. Maximum Entropy Approach.- 7. Conclusions.- References.

Language Model Adaptation.- 1. Introduction.- 2. Background on Language Models.- 3. Adaptation paradigms.- 3.1 LM adaptation in dialogue systems.- 4. Basic statistical methods.- 4.1 Maximum a-posteriori estimation.- 4.2 Linear interpolation.- 4.3 Sublanguages mixture adaptation.- 4.4 Backing-off.- 4.5 Maximum Entropy.- 4.6 Minimum Discrimination Information.- 4.7 Generalized iterative scaling.- 4.8 Cache model and word triggers.- 5. Practical applications of adaptation paradigms.- 5.1 The 1993 ARPA evaluation method.- 5.2 Mixture based adaptation.- 5.3 Adaptation with a cache model.- 5.4 ME and MDI adaptation.- 5.5 LM adaptation in interactive systems.- 6. Conclusion.- References.

Using Natural-Language Knowledge Sources in Speech Recognition.- 1. Introduction.- 2. Issues in Language Modeling for Speech Recognition.- 3. Formal Models for Natural Language.- 3.1 Finite-State Grammars.- 3.2 Context-Free Grammars.- 3.3 Augmented Context-Free Grammars.- 3.4 Expressive Power of Grammar Formalisms and the Requirements of Natural Language.- 4. Search Architectures for Natural-Language-Based Language Models.- 4.1 Word Lattice Parsing.- 4.2 N-best Filtering or Rescoring.- 4.3 Dynamic Generation of Partial Grammar Networks.- 5. Compiling Unification Grammars into Context-Free Grammars.- 5.1 Instantiating Unification Grammars.- 5.2 Removing Left Recursion from Context-Free Grammars.- 6. Robust Natural-Language-Based Language Models.- 6.1 Combining Linguistics and Statistics in a Language Model.- 6.2 Fully Statistical Natural-Language Grammars.- 7. Summary.- References.

How May I Help You?.- 1. Introduction.- 2. A Spoken Dialog System.- 3. Database.- 4. Algorithms.- 4.1 Salient Fragment Acquisition.- 4.2 Recognizing Fragments in Speech.- 4.3 Call Classification.- 5. Experiment Results.- 6. Conclusions.- References.

of Rules into a Stochastic Approach for Language Modelling.- 1. Introduction.- 2. Stack Decoding Strategy.- 2.1 The Algorithm.- 2.2 The Evaluation Function.- 2.3 Peculiar Advantages of the Algorithm.- 3. Rules.- 3.1 Correction of Biases.- 3.2 Under-represented Structures and Long Span Dependencies.- 4. Multi Level Interactions.- 4.1 Linguistic and Syntactic.- 4.2 Phonology.- 5. Conclusion.- References.

History Integration into Semantic Classification.- 1. Introduction.- 2. Classifier.- 3. Data.- 4. Dialogue History Integration.- 5. Discussion.- References.

Multilingual Speech Recognition.- 1. Introduction.- 2. Architecture of the National SQEL Demonstrators.- 3. Language Identification with Different Amounts of Knowledge about the Training Data.- 3.1 A System with Explicit Language Identification.- 3.2 A System with Implicit Language Identification.- 3.3 Language Identification Based on Cepstral Feature Vectors.- 4. Results.- 5. Conclusions and Future Work.- References.

Toward ALISP: A proposal for Automatic Language Independent Speech Processing.- 1. Introduction.- 2. Practical benefit of ALISP.- 3. Issues specific to ALISP.- 3.1 Selecting features.- 3.2 Modeling speech units.- 3.3 Defining a derivation criterion.- 3.4 Building a lexicon.- 4. Some tools for ALISP.- 4.1 Temporal Decomposition.- 4.2 The multigram model.- 5. Experiments.- 5.1 Cross-Language Recognition.- 5.2 Very low bit rate speech coding.- 5.3 Mono-Speaker Continuous Speech Recognition.- 6. Conclusions.- References.

Interactive Translation of Conversational Speech.- 1. Introduction.- 2. Background.- 2.1 The Problem of Spoken Language Translation.- 2.2 Research Efforts on Speech Translation.- 3. JANUS-II - A Conversational Speech Translator.- 3.1 Task Domains and Data Collection.- 3.2 System Description.- 3.3 Performance Evaluation.- 4. Applications and Forms of Deployment.- 4.1 Interactive Dialog Translation.- 4.2 Portable Speech Translation Device.- 4.3 Passive Simultaneous Dialog Translation.- References.

Multimodal Speech Systems.- 1. Introduction.- 2. System Architecture: Knowledge Sources and Controllers.- 2.1 Environment Model.- 2.2 System Model.- 2.3 User Model.- 2.4 Task Model.- 2.5 Dialogue Model.- 2.6 Models Interdependency.- 2.7 Role of Speech in Multimodal Applications.- 3. Information Speech Systems.- 3.1 Spontaneous Language Characteristics.- 3.2 Case Grammar Formalism used for Task Modelling.- 3.3 Different Parsing Methods.- 3.4 Task and Dialogue Model Integration.- 4. Conclusion.- References.

Multimodal Interfaces for Multimedia Information Agents.- 1. Introduction.- 2. Interpretation of Multimodal Input.- 2.1 Multimodal Components.- 2.2 Joint Interpretation.- 3. Multimodal Error Correction.- 3.1 Multimodal Interactive Error Repair.- 3.2 Error Repair for Multimedia Information Agents.- 3.3 Evaluating Interactive Error Repair.- 4. Multimodal Information Agents.- 4.1 Information Access.- 4.2 Information Creation.- 4.3 Information Manipulation.- 4.4 Information Dissemination.- 4.5 Controlling the Interface.- 5. The QuickDoc Application.- 6. Conclusions.- References.

From "Nielsen BookData"
