The informational complexity of learning : perspectives on neural networks and generative grammar

Bibliographic Information


Partha Niyogi

Kluwer Academic Publishers, c1998

Held by 31 university libraries


Notes

Originally presented as the author's thesis (doctoral)--Massachusetts Institute of Technology

Includes bibliographical references (p. 213-222) and index

Description and Table of Contents

Description

The Informational Complexity of Learning: Perspectives on Neural Networks and Generative Grammar brings together two important but very different learning problems within the same analytical framework. The first is the problem of learning functional mappings using neural networks; the second is the problem of learning natural language grammars in the principles and parameters tradition of Chomsky. The two problems seem very different: neural networks are real-valued, infinite-dimensional, continuous mappings, whereas grammars are boolean-valued, finite-dimensional, discrete (symbolic) mappings. Furthermore, the research communities that work in the two areas almost never overlap. The book's objective is to bridge this gap. It uses the formal techniques developed in statistical learning theory and theoretical computer science over the last decade to analyze both kinds of learning problems. By asking the same question of both problems, namely how much information it takes to learn, it highlights their similarities and differences. Specific results include model selection in neural networks, active learning, language learning, and evolutionary models of language change. The book is highly interdisciplinary: anyone interested in the interaction of computer science and cognitive science should enjoy it, and researchers in artificial intelligence, neural networks, linguistics, theoretical computer science, and statistics will find it particularly relevant.
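
To make the question "how much information does it take to learn?" concrete, here is a standard sample-complexity bound from PAC learning theory; it is a textbook result of the kind the book builds on, not a bound quoted from the book itself. For a finite hypothesis class H, a learner that outputs any hypothesis consistent with the training examples achieves error at most epsilon with probability at least 1 - delta, provided the number of labelled examples m satisfies

\[
  m \;\ge\; \frac{1}{\epsilon}\left( \ln|\mathcal{H}| + \ln\frac{1}{\delta} \right).
\]

The required sample size grows with the richness of the hypothesis class and shrinks as the accuracy and confidence requirements are relaxed; the book asks the analogous question for classes of neural networks and for finite parametric families of grammars.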

Table of Contents

  • 1. Introduction.- 1.1 The Components of a Learning Paradigm.- 1.1.1 Concepts, Hypotheses, and Learners.- 1.1.2 Generalization, Learnability, Successful Learning.- 1.1.3 Informational Complexity.- 1.2 Parametric Hypothesis Spaces.- 1.3 Technical Contents and Major Contributions.- 1.3.1 A Final Word.
  • 2. Generalization Error For Neural Nets.- 2.1 Introduction.- 2.2 Definitions and Statement of the Problem.- 2.2.1 Random Variables and Probability Distributions.- 2.2.2 Learning from Examples and Estimators.- 2.2.3 The Expected Risk and the Regression Function.- 2.2.4 The Empirical Risk.- 2.2.5 The Problem.- 2.2.6 Bounding the Generalization Error.- 2.2.7 A Note on Models and Model Complexity.- 2.3 Stating the Problem for Radial Basis Functions.- 2.4 Main Result.- 2.5 Remarks.- 2.5.1 Observations on the Main Result.- 2.5.2 Extensions.- 2.5.3 Connections with Other Results.- 2.6 Implications of the Theorem in Practice: Putting In the Numbers.- 2.6.1 Rate of Growth of n for Guaranteed Convergence.- 2.6.2 Optimal Choice of n.- 2.6.3 Experiments.- 2.7 Conclusion.- 2-A Notations.- 2-B A Useful Decomposition of the Expected Risk.- 2-C A Useful Inequality.- 2-D Proof of the Main Theorem.- 2-D.1 Bounding the Approximation Error.- 2-D.2 Bounding the Estimation Error.- 2-D.3 Bounding the Generalization Error.
  • 3. Active Learning.- 3.1 A General Framework For Active Approximation.- 3.1.1 Preliminaries.- 3.1.2 The Problem of Collecting Examples.- 3.1.3 In Context.- 3.2 Example 1: A Class of Monotonically Increasing Bounded Functions.- 3.2.1 Lower Bound for Passive Learning.- 3.2.2 Active Learning Algorithms.- 3.2.2.1 Derivation of an Optimal Sampling Strategy.- 3.2.3 Empirical Simulations and Other Investigations.- 3.2.3.1 Distribution of Points Selected.- 3.2.3.2 Classical Optimal Recovery.- 3.2.3.3 Error Rates and Sample Complexities for Some Arbitrary Functions: Some Simulations.- 3.3 Example 2: A Class of Functions with Bounded First Derivative.- 3.3.1 Lower Bounds.- 3.3.2 Active Learning Algorithms.- 3.3.2.1 Derivation of an Optimal Sampling Strategy.- 3.3.3 Some Simulations.- 3.3.3.1 Distribution of Points Selected.- 3.3.3.2 Error Rates.- 3.4 Conclusions, Extensions, and Open Problems.- 3.5 A Simple Example.- 3.6 Generalizations.- 3.6.1 Localized Function Classes.- 3.6.2 The General ε-Focusing Strategy.- 3.6.3 Generalizations and Open Problems.
  • 4. Language Learning.- 4.1 Language Learning and The Poverty of Stimulus.- 4.2 Constrained Grammars-Principles and Parameters.- 4.2.1 Example: A 3-Parameter System from Syntax.- 4.2.2 Example: Parameterized Metrical Stress in Phonology.- 4.3 Learning in the Principles and Parameters Framework.- 4.4 Formal Analysis of the Triggering Learning Algorithm.- 4.4.1 Background.- 4.4.2 The Markov Formulation.- 4.4.2.1 Parameterized Grammars and Their Corresponding Markov Chains.- 4.4.2.2 Markov Chain Criteria for Learnability.- 4.4.2.3 The Markov Chain for the 3-Parameter Example.- 4.4.3 Derivation of the Transition Probabilities for the Markov TLA Structure.- 4.4.3.1 Formalization.- 4.4.3.2 Additional Properties of the Learning System.- 4.5 Characterizing Convergence Times for the Markov Chain Model.- 4.5.1 Some Transition Matrices and Their Convergence Curves.- 4.5.2 Absorption Times.- 4.5.3 Eigenvalue Rates of Convergence.- 4.5.3.1 Eigenvalues and Eigenvectors.- 4.5.3.2 Representation of T^k.- 4.5.3.3 Initial Conditions and Limiting Distributions.- 4.5.3.4 Rate of Convergence.- 4.5.3.5 Transition Matrix Recipes.- 4.6 Exploring Other Points.- 4.6.1 Changing the Algorithm.- 4.6.2 Distributional Assumptions.- 4.6.3 Natural Distributions-CHILDES Corpus.- 4.7 Batch Learning Upper and Lower Bounds: An Aside.- 4.8 Conclusions, Open Questions, and Future Directions.- 4-A Unembedded Sentences For Parametric Grammars.- 4-B Memoryless Algorithms and Markov Chains.- 4-C Proof of Learnability Theorem.- 4-C.1 Markov State Terminology.- 4-C.2 Canonical Decomposition.- 4-D Formal Proof.
  • 5. Language Change.- 5.1 Introduction.- 5.2 Language Change in Parametric Systems.- 5.3 Example 1: A Three Parameter System.- 5.3.1 Starting with Homogeneous Populations.- 5.3.1.1 A = TLA; Pi = Uniform; Finite Sample = 128.- 5.3.1.2 A = Greedy, No S.V.; Pi = Uniform; Finite Sample = 128.- 5.3.1.3 A = a) R.W. b) S.V. only; Pi = Uniform; Finite Sample = 128.- 5.3.1.4 Rates of Change.- 5.3.2 Non-homogeneous Populations: Phase-Space Plots.- 5.3.2.1 Phase-Space Plots: Grammatical Trajectories.- 5.3.2.2 Issues of Stability.- 5.4 Example 2: The Case of Modern French.- 5.4.1 The Parametric Subspace and Data.- 5.4.2 The Case of Diachronic Syntax Change in French.- 5.4.3 Some Dynamical System Simulations.- 5.4.3.1 Homogeneous Populations [Initial-Old French].- 5.4.3.2 Heterogeneous Populations (Mixtures).- 5.5 Conclusions.
  • 6. Conclusions.- 6.1 Emergent Themes.- 6.2 Extensions.- 6.3 A Concluding Note.- References.

From "Nielsen BookData"

