An information-theoretic approach to neural computing

Bibliographic Information

An information-theoretic approach to neural computing

Gustavo Deco, Dragan Obradovic

(Perspectives in neural computing)

Springer, c1996



Note

Includes bibliographical references (p. [243]-257) and index

Description and Table of Contents

Description

This book gives a detailed formulation of neural networks from the information-theoretic viewpoint. The authors show how this perspective provides new insights into the design theory of neural networks. In particular, they demonstrate how these methods may be applied to supervised and unsupervised learning, including feature extraction, linear and non-linear independent component analysis, and Boltzmann machines. Readers are assumed to have a basic understanding of neural networks, but all the relevant concepts from information theory are carefully introduced and explained. Consequently, readers from varied scientific disciplines, notably cognitive scientists, engineers, physicists, statisticians, and computer scientists, will find this a valuable introduction to the topic.

Table of Contents

1 Introduction.-
2 Preliminaries of Information Theory and Neural Networks.- 2.1 Elements of Information Theory.- 2.1.1 Entropy and Information.- 2.1.2 Joint Entropy and Conditional Entropy.- 2.1.3 Kullback-Leibler Entropy.- 2.1.4 Mutual Information.- 2.1.5 Differential Entropy, Relative Entropy and Mutual Information.- 2.1.6 Chain Rules.- 2.1.7 Fundamental Information Theory Inequalities.- 2.1.8 Coding Theory.- 2.2 Elements of the Theory of Neural Networks.- 2.2.1 Neural Network Modeling.- 2.2.2 Neural Architectures.- 2.2.3 Learning Paradigms.- 2.2.4 Feedforward Networks: Backpropagation.- 2.2.5 Stochastic Recurrent Networks: Boltzmann Machine.- 2.2.6 Unsupervised Competitive Learning.- 2.2.7 Biological Learning Rules.-
I: Unsupervised Learning.-
3 Linear Feature Extraction: Infomax Principle.- 3.1 Principal Component Analysis: Statistical Approach.- 3.1.1 PCA and Diagonalization of the Covariance Matrix.- 3.1.2 PCA and Optimal Reconstruction.- 3.1.3 Neural Network Algorithms and PCA.- 3.2 Information Theoretic Approach: Infomax.- 3.2.1 Minimization of Information Loss Principle and Infomax Principle.- 3.2.2 Upper Bound of Information Loss.- 3.2.3 Information Capacity as a Lyapunov Function of the General Stochastic Approximation.-
4 Independent Component Analysis: General Formulation and Linear Case.- 4.1 ICA-Definition.- 4.2 General Criteria for ICA.- 4.2.1 Cumulant Expansion Based Criterion for ICA.- 4.2.2 Mutual Information as Criterion for ICA.- 4.3 Linear ICA.- 4.4 Gaussian Input Distribution and Linear ICA.- 4.4.1 Networks With Anti-Symmetric Lateral Connections.- 4.4.2 Networks With Symmetric Lateral Connections.- 4.4.3 Examples of Learning with Symmetric and Anti-Symmetric Networks.- 4.5 Learning in Gaussian ICA with Rotation Matrices: PCA.- 4.5.1 Relationship Between PCA and ICA in Gaussian Input Case.- 4.5.2 Linear Gaussian ICA and the Output Dimension Reduction.- 4.6 Linear ICA in Arbitrary Input Distribution.- 4.6.1 Some Properties of Cumulants at the Output of a Linear Transformation.- 4.6.2 The Edgeworth Expansion Criteria and Theorem 4.6.2.- 4.6.3 Algorithms for Output Factorization in the Non-Gaussian Case.- 4.6.4 Experimental Results of Linear ICA Algorithms in the Non-Gaussian Case.-
5 Nonlinear Feature Extraction: Boolean Stochastic Networks.- 5.1 Infomax Principle for Boltzmann Machines.- 5.1.1 Learning Model.- 5.1.2 Examples of Infomax Principle in Boltzmann Machine.- 5.2 Redundancy Minimization and Infomax for the Boltzmann Machine.- 5.2.1 Learning Model.- 5.2.2 Numerical Complexity of the Learning Rule.- 5.2.3 Factorial Learning Experiments.- 5.2.4 Receptive Fields Formation from a Retina.- 5.3 Appendix.-
6 Nonlinear Feature Extraction: Deterministic Neural Networks.- 6.1 Redundancy Reduction by Triangular Volume Conserving Architectures.- 6.1.1 Networks with Linear, Sigmoidal and Higher Order Activation Functions.- 6.1.2 Simulations and Results.- 6.2 Unsupervised Modeling of Chaotic Time Series.- 6.2.1 Dynamical System Modeling.- 6.3 Redundancy Reduction by General Symplectic Architectures.- 6.3.1 General Entropy Preserving Nonlinear Maps.- 6.3.2 Optimizing a Parameterized Symplectic Map.- 6.3.3 Density Estimation and Novelty Detection.- 6.4 Example: Theory of Early Vision.- 6.4.1 Theoretical Background.- 6.4.2 Retina Model.-
II: Supervised Learning.-
7 Supervised Learning and Statistical Estimation.- 7.1 Statistical Parameter Estimation - Basic Definitions.- 7.1.1 Cramer-Rao Inequality for Unbiased Estimators.- 7.2 Maximum Likelihood Estimators.- 7.2.1 Maximum Likelihood and the Information Measure.- 7.3 Maximum A Posteriori Estimation.- 7.4 Extensions of MLE to Include Model Selection.- 7.4.1 Akaike's Information Theoretic Criterion (AIC).- 7.4.2 Minimal Description Length and Stochastic Complexity.- 7.5 Generalization and Learning on the Same Data Set.-
8 Statistical Physics Theory of Supervised Learning and Generalization.- 8.1 Statistical Mechanics Theory of Supervised Learning.- 8.1.1 Maximum Entropy Principle.- 8.1.2 Probability Inference with an Ensemble of Networks.- 8.1.3 Information Gain and Complexity Analysis.- 8.2 Learning with Higher Order Neural Networks.- 8.2.1 Partition Function Evaluation.- 8.2.2 Information Gain in Polynomial Networks.- 8.2.3 Numerical Experiments.- 8.3 Learning with General Feedforward Neural Networks.- 8.3.1 Partition Function Approximation.- 8.3.2 Numerical Experiments.- 8.4 Statistical Theory of Unsupervised and Supervised Factorial Learning.- 8.4.1 Statistical Theory of Unsupervised Factorial Learning.- 8.4.2 Duality Between Unsupervised and Maximum Likelihood Based Supervised Learning.-
9 Composite Networks.- 9.1 Cooperation and Specialization in Composite Networks.- 9.2 Composite Models as Gaussian Mixtures.-
10 Information Theory Based Regularizing Methods.- 10.1 Theoretical Framework.- 10.1.1 Network Complexity Regulation.- 10.1.2 Network Architecture and Learning Paradigm.- 10.1.3 Applications of the Mutual Information Based Penalty Term.- 10.2 Regularization in Stochastic Potts Neural Network.- 10.2.1 Neural Network Architecture.- 10.2.2 Simulations.-
References.

by "Nielsen BookData"
