Discovering Neural Nets with Low Kolmogorov Complexity and High Generalization Capability

Author(s)

 SCHMIDHUBER Jurgen
 IDSIA

Journal

 Neural Networks 10(5), 857-873, 1997-07-01

References: 86

1
 <no title>

ADLEMAN L.
Time, space and randomness, 1979
Cited by (1)

2
 Application of time-bounded Kolmogorov complexity in complexity theory

ALLENDER A.
Kolmogorov complexity and computational complexity, 6-22, 1992
Cited by (1)

3
 Complexity regularization with application to artificial neural networks

BARRON A. R.
Nonparametric functional estimation and related topics, 561-576, 1988
Cited by (1)

4
 <no title>

BARTO A. G.
Connectionist approaches for control, 1989
Cited by (1)

5
 Algorithmic information theory

BARZDIN Y. M.
Encyclopaedia of mathematics 1, 140-142, 1988
Cited by (1)

6
 Logical depth and physical complexity

BENNETT C. H.
The universal Turing machine: a half-century survey 1, 227-258, 1988
Cited by (1)

7
 <no title>

CHAITIN G.
Algorithmic Information Theory, 1987
Cited by (1)

8
 Elimination of overtraining by a mutual information network

DECO G.
Proceedings of the International Conference on Artificial Neural Networks, 744-749, 1993
Cited by (1)

9
 Limitations of inductive learning

DIETTERICH T. G.
Proceedings of the Sixth International Workshop on Machine Learning, 124-128, 1989
Cited by (1)

10
 On the symmetry of algorithmic information

GACS P.
Soviet Math. Dokl. 15, 1477-1480, 1974
Cited by (1)

11
 The minimum description length principle and its application to online learning of hand-printed characters

GAO Q.
Proceedings of the 11th IEEE International Joint Conference on Artificial Intelligence, 843-848, 1989
Cited by (1)

12
 Structural Risk Minimization for Character Recognition

GUYON I.
Advances in neural information processing systems 4, 471-479, 1992
Cited by (2)

13
 Generalized Kolmogorov complexity and the structure of feasible computations

HARTMANIS J.
Proceedings of the 24th IEEE Symposium on Foundations of Computer Science, 439-445, 1983
Cited by (1)

14
 Second order derivatives for network pruning: optimal brain surgeon

HASSIBI B.
Advances in Neural Information Processing Systems 5, 164-171, 1993
Cited by (6)

15
 Universelle Suche und inkrementelles Lernen

HEIL S.
Unpublished diploma thesis, Fakultät für Informatik, Lehrstuhl Prof. Brauer, Technische Universität München, 1995
Cited by (1)

16
 Keeping neural networks simple

HINTON G. E.
Proceedings of the International Conference on Artificial Neural Networks, Amsterdam, 11-18, 1993
Cited by (1)

17
 Simplifying neural nets by discovering flat minima

HOCHREITER S.
Advances in neural information processing systems 7, 529-536, 1995
Cited by (1)

18
 Three approaches to the quantitative definition of information

KOLMOGOROV A.
Problems of Information Transmission 1, 1-11, 1965
Cited by (1)

19
 <no title>

KOLMOGOROV A. N.
Grundbegriffe der Wahrscheinlichkeitsrechnung, 1933
Cited by (1)

20
 A simple weight decay can improve generalization

KROGH A.
Advances in Neural Information Processing Systems 4, 950-957, 1992
Cited by (5)

21
 Une procédure d'apprentissage pour réseau à seuil asymétrique

LECUN Y.
Proceedings of Cognitiva 85, 599-604, 1985
Cited by (1)

22
 Second order properties of error surfaces: learning time and generalization

LECUN Y.
Advances in neural information processing systems 3, 918-924, 1991
Cited by (1)

23
 On the notion of a random sequence

LEVIN L. A.
Soviet Math. Dokl. 14(5), 1413-1416, 1973
Cited by (1)

24
 Universal sequential search problems

LEVIN L. A.
Problems of Information Transmission 9(3), 265-266, 1973
Cited by (1)

25
 Laws of information (non-growth) and aspects of the foundation of probability theory

LEVIN L. A.
Problems of Information Transmission 10(3), 206-210, 1974
Cited by (1)

26
 Various measures of complexity for finite objects (axiomatic description)

LEVIN L. A.
Soviet Math. Dokl. 17(2), 522-526, 1976
Cited by (1)

27
 A theory of learning simple concepts under simple distributions and average case complexity for the universal distribution

LI M.
Proceedings of the 30th American IEEE Symposium on Foundations of Computer Science, 34-39, 1989
Cited by (1)

28
 <no title>

LI M.
An Introduction to Kolmogorov Complexity and Its Applications, 1993
Cited by (5)

29
 Perspectives of current research about the complexity of learning on neural nets

MAASS W.
Theoretical advances in neural computation and learning, 1994
Cited by (1)

30
 Discovery by minimal length encoding: a case study in molecular evolution

MILOSAVLJEVIC A.
Machine Learning 12, 87-96, 1993
Cited by (1)

31
 The effective number of parameters: an analysis of generalization and regularization in nonlinear learning systems

MOODY J. E.
Advances in neural information processing systems 4, 847-854, 1992
Cited by (1)

32
 Skeletonization: a technique for trimming the fat from a network via relevance assessment

MOZER M. C.
Advances in neural information processing systems 1, 107-115, 1989
Cited by (3)

33
 Simplifying neural networks by soft weight sharing

NOWLAN S. J.
Neural Computation 4, 173-193, 1992
Cited by (1)

34
 <no title>

PARKER D. B.
Learning-logic, 1985
Cited by (4)

35
 Chaitin-Kolmogorov complexity and generalization in neural networks

PEARLMUTTER B. A.
Advances in neural information processing systems 3, 925-931, 1991
Cited by (2)

36
 Some experiments in applying inductive inference principles to surface reconstruction

PEDNAULT E. P. D.
11th IJCAI, 1603-1609, 1989
Cited by (1)

37
 Learning internal representations by error propagation

RUMELHART D. E.
Parallel Distributed Processing 1, 318-362, 1986
Cited by (38)

38
 On decreasing the ratio between learning complexity and number of time-varying variables in fully recurrent nets

SCHMIDHUBER J.
Proceedings of the International Conference on Artificial Neural Networks, 460-463, 1993
Cited by (1)

39
 A self-referential weight matrix

SCHMIDHUBER J.
Proceedings of the International Conference on Artificial Neural Networks, 446-451, 1993
Cited by (1)

40
 <no title>

SCHMIDHUBER J.
Discovering problem solutions with low Kolmogorov complexity and high generalization capability, 1994
Cited by (1)

41
 <no title>

SCHMIDHUBER J.
Machine Learning: Proceedings of the Twelfth International Conference, 488-496, 1995
Cited by (1)

42
 A general method for incremental self-improvement and multi-agent learning in unrestricted environments

SCHMIDHUBER J.
Evolutionary computation: theory and applications, 1996
Cited by (1)

43
 <no title>

SCHMIDHUBER J.
Simple principles of metalearning, 1996
Cited by (1)

44
 A mathematical theory of communication (parts I and II)

SHANNON C. E.
Bell System Technical Journal 27, 379-423, 1948
Cited by (1)

45
 An application of algorithmic probability to problems in artificial intelligence

SOLOMONOFF R.
Uncertainty in artificial intelligence, 473-491, 1986
Cited by (1)

46
 Shift of bias for inductive concept learning

UTGOFF P.
Machine learning 2, 163-190, 1986
Cited by (1)

47
 Principles of risk minimization for learning theory

VAPNIK V.
Advances in Neural Information Processing Systems 4, 831-838, 1992
Cited by (1)

48
 An information theoretic measure for classification

WALLACE C. S.
Computer Journal 11(2), 185-194, 1968
Cited by (1)

49
 <no title>

WATANABE O.
Kolmogorov Complexity and Computational Complexity, 1992
Cited by (2)

50
 Learning from delayed rewards

WATKINS C.
PhD thesis, King's College London, 1989
Cited by (1)

51
 Predicting the future: a connectionist approach

WEIGEND A. S.
International Journal of Neural Systems 1, 193-209, 1990
Cited by (6)

52
 Beyond regression: new tools for prediction and analysis in the behavioral sciences

WERBOS P. J.
Ph.D. thesis, Harvard University, 1974
Cited by (2)

53
 Solving POMDPs with Levin search and EIRA

WIERING M.
Machine learning: proceedings of the thirteenth international conference, 534-542, 1996
Cited by (1)

54
 <no title>

WILLIAMS R. J.
Toward a theory of reinforcement learning connectionist systems, 1988
Cited by (2)

55
 <no title>

WOLPERT D. H.
Technical Report SFI TR 93-03-016, 1993
Cited by (1)

56
 Incremental self-improvement for lifetime multi-agent reinforcement learning

ZHAO J.
From Animals to Animats 4: Proceedings of the Fourth International Conference on Simulation of Adaptive Behavior, 516-525, 1996
Cited by (1)

57
 Statistical theory of learning curves under entropic loss

AMARI S. I.
Neural Computation 5, 140-153, 1993
Cited by (25)

58
 Understanding retinal color coding from first principles

ATICK J.
Neural Computation 4, 559-572, 1992
Cited by (2)

59
 Unsupervised learning

BARLOW H.
Neural Computation 1, 295-311, 1989
Cited by (20)

60
 Neuronlike adaptive elements that can solve difficult learning control problems

BARTO A. G.
IEEE Trans. Syst., Man & Cybern. 13, 834-846, 1983
Cited by (117)

61
 What size net gives valid generalization?

BAUM E. B.
Neural Computation 1(1), 151-160, 1989
Cited by (51)

62
 Unsupervised learning procedures for neural networks

BECKER S.
Int. J. Neural Systems 2(1&2), 17-33, 1991
Cited by (4)

63
 Occam's Razor

BLUMER A.
Information Processing Letters 24, 377-380, 1987
Cited by (7)

64
 On the length of programs for computing finite binary sequences

CHAITIN G.
J. ACM 13(4), 547-569, 1966
Cited by (5)

65
 On the length of programs for computing finite binary sequences: statistical considerations

CHAITIN G.
Journal of the ACM 16, 145-159, 1969
Cited by (1)

66
 Theory of program size formally identical to information theory

CHAITIN G.
J. ACM 22(3), 329-340, 1975
Cited by (7)

67
 Kolmogorov's contributions to information theory and algorithmic complexity

COVER T. M.
Annals of Probability 17, 840-865, 1989
Cited by (2)

68
 TD(λ) converges with probability 1

DAYAN P.
Machine Learning 14, 295-301, 1994
Cited by (3)

69
 Connectionist Learning Algorithm with Provable Generalization and Scaling Bounds

GALLANT S. I.
Neural Networks 3, 191-201, 1990
Cited by (4)

70
 Quantifying inductive bias: AI learning algorithms and Valiant's learning framework

HAUSSLER D.
Artif. Intell. 36(2), 177-222, 1988
Cited by (13)

71
 Randomness conservation inequalities: information and independence in mathematical theories

LEVIN L. A.
Information and Control 61, 15-37, 1984
Cited by (1)

72
 Self-organization in a perceptual network

LINSKER R.
Computer 21(3), 105-117, 1988
Cited by (48)

73
 A practical Bayesian framework for backpropagation networks

MACKAY D. J. C.
Neural Computation 4, 448-472, 1992
Cited by (37)

74
 The definition of random sequences

MARTIN-LOF P.
Information and Control 9, 602-619, 1966
Cited by (8)

75
 Inferring Decision Trees Using the Minimum Description Length Principle

QUINLAN J. R.
Information and Computation 80, 227-248, 1989
Cited by (42)

76
 Modeling by Shortest Data Description

RISSANEN J.
Automatica 14, 465-471, 1978
Cited by (123)

77
 A Universal Prior for Integers and Estimation by Minimum Description Length

RISSANEN J.
Annals of Statistics 11, 416-431, 1983
Cited by (61)

78
 Stochastic complexity and modeling

RISSANEN J.
Annals of Statistics 14, 1080-1100, 1986
Cited by (54)

79
 Overfitting avoidance as bias

SCHAFFER C.
Machine Learning 10, 153-178, 1993
Cited by (5)

80
 A local learning algorithm for dynamic feedforward and recurrent networks

SCHMIDHUBER J.
Connection Science 1(4), 403-412, 1989
Cited by (2)

81
 Learning complex, extended sequences using the principle of history compression

SCHMIDHUBER J.
Neural Computation 4(2), 234-242, 1992
Cited by (2)

82
 Learning factorial codes by predictability minimization

SCHMIDHUBER J.
Neural Computation 4, 863-879, 1992
Cited by (3)

83
 A unified approach to the definition of random sequences

SCHNORR C. P.
Mathematical Systems Theory 5, 246-258, 1971
Cited by (1)

84
 A Formal Theory of Inductive Inference

SOLOMONOFF R.
Information and Control 7(1), 1-22, 1964
Cited by (2)

85
 A theory of the learnable

VALIANT L. G.
Communications of the ACM 27, 1134-1142, 1984
Cited by (101)

86
 The complexity of finite objects and the algorithmic concepts of information and randomness

ZVONKIN A. K.
Russian Mathematical Surveys 25(6), 83-124, 1970
Cited by (1)
Cited by: 1

1
 An approach to guaranteeing generalisation in neural networks

POLHILL J. Gary, WEIR Michael K.
Neural Networks: the official journal of the International Neural Network Society 14(8), 1035-1048, 2001-10-01
References (41)