Strength or accuracy : credit assignment in learning classifier systems

Author

    • Kovacs, Tim

Bibliographic Information

Strength or accuracy : credit assignment in learning classifier systems

Tim Kovacs

(CPHC/BCS distinguished dissertation series)

Springer, c2004

Held by 3 university libraries


Notes

Includes bibliographical references (p. [283]-301) and index

Description and Table of Contents

Description

Classifier systems are an intriguing approach to a broad range of machine learning problems, based on the automated generation and evaluation of condition/action rules. In reinforcement learning tasks they simultaneously address the two major problems of learning a policy and generalising over it (and related objects, such as value functions). Despite over 20 years of research, however, classifier systems have met with mixed success, for reasons which were often unclear. Finally, in 1995 Stewart Wilson claimed a long-awaited breakthrough with his XCS system, which differs from earlier classifier systems in a number of respects, the most significant of which is the way in which it calculates the value of rules for use by the rule generation system. Specifically, XCS (like most classifier systems) employs a genetic algorithm for rule generation, and the way in which it calculates rule fitness differs from earlier systems. Wilson described XCS as an accuracy-based classifier system and earlier systems as strength-based. The two differ in that in strength-based systems the fitness of a rule is proportional to the return (reward/payoff) it receives, whereas in XCS it is a function of the accuracy with which return is predicted. The difference is thus one of credit assignment, that is, of how a rule's contribution to the system's performance is estimated. XCS is a Q-learning system; in fact, it is a proper generalisation of tabular Q-learning, in which rules aggregate states and actions. In XCS, as in other Q-learners, Q-values are used to weight action selection.
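
As a concrete illustration of the two credit-assignment schemes described above, the following minimal sketch (not drawn from the book; the function names and the parameters beta and epsilon_0 are illustrative assumptions) contrasts a strength-based update, where fitness tracks the return a rule receives, with an accuracy-based update in the spirit of XCS, where fitness derives from how accurately the rule predicts its return.

```python
# Minimal sketch contrasting strength-based and accuracy-based credit assignment.
# Illustrative only: the function names, parameters (beta, epsilon_0) and update
# forms are simplified assumptions, not the exact XCS equations from the book.

def strength_update(strength, reward, beta=0.2):
    """Strength-based fitness: track the return (payoff) the rule receives."""
    return strength + beta * (reward - strength)

def accuracy_update(prediction, error, reward, beta=0.2, epsilon_0=10.0):
    """Accuracy-based (XCS-like) fitness: derive it from prediction accuracy."""
    prediction += beta * (reward - prediction)          # reward prediction
    error += beta * (abs(reward - prediction) - error)  # running prediction error
    # A rule counts as accurate while its error stays below the tolerance epsilon_0.
    accuracy = 1.0 if error < epsilon_0 else epsilon_0 / error
    return prediction, error, accuracy

if __name__ == "__main__":
    # An overgeneral rule matching two niches that pay 1000 and 0 alternately:
    # its strength (average payoff) remains high, but its prediction error is
    # large, so its accuracy-based fitness is low.
    strength, prediction, error = 0.0, 0.0, 0.0
    for reward in [1000, 0] * 50:
        strength = strength_update(strength, reward)
        prediction, error, accuracy = accuracy_update(prediction, error, reward)
    print(f"strength={strength:.0f}  prediction={prediction:.0f}  "
          f"error={error:.0f}  accuracy={accuracy:.2f}")
```

Under these assumptions the overgeneral rule keeps a high strength but a low accuracy, which is the distinction at the heart of the book's analysis of strong and fit overgeneral rules (Chapter 5).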

Table of Contents

1 Introduction.- 1.1 Two Example Machine Learning Tasks.- 1.2 Types of Task.- 1.2.1 Supervised and Reinforcement Learning.- 1.2.2 Sequential and Non-sequential Decision Tasks.- 1.3 Two Challenges for Classifier Systems.- 1.3.1 Problem 1: Learning a Policy from Reinforcement.- 1.3.2 Problem 2: Generalisation.- 1.4 Solution Methods.- 1.4.1 Method 1: Reinforcement Learning Algorithms.- 1.4.2 Method 2: Evolutionary Algorithms.- 1.5 Learning Classifier Systems.- 1.5.1 The Tripartite LCS Structure.- 1.5.2 LCS = Policy Learning + Generalisation.- 1.5.3 Credit Assignment in Classifier Systems.- 1.5.4 Strength and Accuracy-based Classifier Systems.- 1.6 About the Book.- 1.6.1 Why Compare Strength and Accuracy.- 1.6.2 Are LCS EC- or RL-based.- 1.6.3 Moving in Design Space.- 1.7 Structure of the Book.- 2 Learning Classifier Systems.- 2.1 Types of Classifier Systems.- 2.1.1 Michigan and Pittsburgh LCS.- 2.1.2 XCS and Traditional LCS?.- 2.2 Representing Rules.- 2.2.1 The Standard Ternary Language.- 2.2.2 Other Representations.- 2.2.3 Summary of Rule Representation.- 2.2.4 Notation for Rules.- 2.3 XCS.- 2.3.1 Wilson's Motivation for XCS.- 2.3.2 Overview of XCS.- 2.3.3 Wilson's Explore/Exploit Framework.- 2.3.4 The Performance System.- 2.3.4.1 The XCS Performance System Algorithm.- 2.3.4.2 The Match Set and Prediction Array.- 2.3.4.3 Action Selection.- 2.3.4.4 Experience-weighting of System Prediction.- 2.3.5 The Credit Assignment System.- 2.3.5.1 The MAM Technique.- 2.3.5.2 The Credit Assignment Algorithm.- 2.3.5.3 Sequential and Non-sequential Updates.- 2.3.5.4 Parameter Update Order.- 2.3.5.5 XCS Parameter Updates.- 2.3.6 The Rule Discovery System.- 2.3.6.1 Random Initial Populations.- 2.3.6.2 Covering.- 2.3.6.3 The Niche Genetic Algorithm.- 2.3.6.4 Alternative Mutation Schemes.- 2.3.6.5 Triggering the Niche GA.- 2.3.6.6 Deletion of Rules.- 2.3.6.7 Classifier Parameter Initialisation.- 2.3.6.8 Subsumption Deletion.- 2.4 SB-XCS.- 2.4.1 Specification of SB-XCS.- 2.4.2 Comparison of SB-XCS and Other Strength LCS.- 2.5 Initial Tests of XCS and SB-XCS.- 2.5.1 The 6 Multiplexer.- 2.5.2 Woods2.- 2.6 Summary.- 3 How Strength and Accuracy Differ.- 3.1 Thinking about Complex Systems.- 3.2 Holland's Rationale for CS-1 and his Later LCS.- 3.2.1 Schema Theory.- 3.2.2 The Bucket Brigade.- 3.2.3 Schema Theory + Bucket Brigade = Adaptation.- 3.3 Wilson's Rationale for XCS.- 3.3.1 A Bias towards Accurate Rules.- 3.3.2 A Bias towards General Rules.- 3.3.3 Complete Maps.- 3.3.4 Summary.- 3.4 A Rationale for SB-XCS.- 3.5 Analysis of Populations Evolved by XCS and SB-XCS.- 3.5.1 SB-XCS.- 3.5.2 XCS.- 3.5.3 Learning Rate.- 3.6 Different Goals, Different Representations.- 3.6.1 Default Hierarchies.- 3.6.2 Partial and Best Action Maps.- 3.6.3 Complete Maps.- 3.6.4 What do XCS and SB-XCS Really Learn?.- 3.7 Complete and Partial Maps Compared.- 3.7.1 Advantages of Partial Maps.- 3.7.2 Disadvantages of Partial Maps.- 3.7.3 Complete Maps and Strength.- 3.7.4 Contrasting Complete and Partial Maps in RL Terminology.- 3.7.5 Summary of Comparison.- 3.8 Ability to Express Generalisations.- 3.8.1 Mapping Policies and Mapping Value Functions.- 3.8.2 Adapting the Accuracy Criterion.- 3.8.3 XCS-hard and SB-XCS-easy Functions.- 3.8.4 Summary of Generalisation and Efficiency.- 3.9 Summary.- 4 What Should a Classifier System Learn?.- 4.1 Representing Boolean Functions.- 4.1.1 Truth Tables.- 4.1.2 On-sets and Off-sets.- 4.1.3 Sigma Notation.- 4.1.4 Disjunctive Normal Form.- 4.1.5 Representing Functions with Sets of Rules.- 4.2 How Should a
Classifier System Represent a Solution?.- 4.2.1 The Value of a Single Rule.- 4.2.2 The Value of a Set of Rules.- 4.2.3 Complete and Correct Representations.- 4.2.4 Minimal Representations.- 4.2.5 Non-overlapping Representations.- 4.2.6 Why XCS Prefers Non-overlapping Populations.- 4.2.7 Should we Prefer Non-overlapping Populations?.- 4.2.8 Optimal Rule Sets: [O]s.- 4.2.9 Conflicting Rules.- 4.2.10 Representation in XCS.- 4.3 How Should We Measure Performance?.- 4.3.1 Measures of Performance.- 4.3.2 Measures of Population State.- 4.3.3 Measuring Performance and Measuring State.- 4.3.4 New Population State Metrics.- 4.3.5 Testing XCS with %[PI].- 4.3.6 Testing XCS with %[m-DNF].- 4.3.7 Summary of Metrics and Properties.- 4.4 Summary.- 5 Prospects for Adaptation.- 5.1 Known Problems with Strength LCS.- 5.2 Methodology for Rule Type Analysis.- 5.3 Analysis of Rule Types.- 5.3.1 Correct and Incorrect Actions.- 5.3.2 Overgeneral Rules.- 5.3.3 Strong Overgeneral Rules.- 5.3.4 Fit Overgeneral Rules.- 5.3.5 Parallel Definitions of Strength and Fitness.- 5.4 When are Strong and Fit Overgenerals Possible?.- 5.4.1 Biases in the Reward Function are Relevant.- 5.4.2 Competition for Action Selection.- 5.4.3 Competition for Reproduction.- 5.5 Strong Overgenerals in XCS.- 5.5.1 Biases between Actions do not Produce Strong Overgenerals.- 5.5.2 Some Properties of Accuracy-based Fitness.- 5.6 Strong Overgenerals in SB-XCS.- 5.6.1 When are Strong Overgenerals Impossible in SB-XCS?.- 5.6.2 What Makes Strong Overgenerals Possible in SB-XCS?.- 5.7 Fit Overgenerals and the Survival of Rules under the GA.- 5.7.1 Comparison on an Unbiased Reward Function.- 5.7.2 Comparison on a Biased Reward Function.- 5.7.3 Discussion.- 5.8 Designing Strong and Fit Overgenerals for XCS.- 5.8.1 Biased Variance Functions.- 5.8.2 Empirical Results.- 5.8.3 Avoiding Fit Overgenerals.- 5.8.4 SB-XCS and Biased Variance Functions.- 5.9 Strong and Fit Undergeneral Rules.- 5.10 Why Bias the Reward Function.- 5.10.1 Some State-actions are more Important than Others.- 5.10.2 A Rule Allocation Bias can Focus Resources.- 5.11 Rule Allocation Reconsidered.- 5.11.1 Knowing What Not to Do.- 5.11.2 Managing Exploration.- 5.11.3 Complete and Partial Maps Revisited.- 5.11.4 Alternatives to Biasing the Reward Function.- 5.11.5 Can SB-XCS Avoid Strong and Fit Overgenerals?.- 5.12 Sequential Tasks.- 5.12.1 The Need to Pass Values Back.- 5.12.2 The Need for Discounting.- 5.12.3 How Q-functions become Biased.- 5.12.4 Examples.- 5.12.5 Woods2 Revisited.- 5.12.6 When Will the Value Function be Unbiased?.- 5.13 What Tasks can we Solve with SB-XCS?.- 5.14 Extensions.- 5.14.1 Fitness Sharing.- 5.14.2 Other Factors Contributing to Strong Overgenerals.- 5.14.3 Qualitative and Quantitative Approaches.- 5.15 Summary.- 6 Classifier Systems and Q-learning.- 6.1 Classifier Systems and Q-learning.- 6.1.1 Q-learning in Classifier Systems.- 6.1.2 Is it Really Q-learning?.- 6.1.3 XCS is a Proper Generalisation of Tabular Q-learning.- 6.1.4 Summary.- 6.2 The GA-view and RL-view Revisited.- 6.2.1 How SB-XCS Determines Policies.- 6.2.2 How XCS Determines Policies.- 6.2.3 Three Approaches to Determining a Policy.- 6.2.4 The GA-view and the RL-view.- 6.2.5 Combining Evolution and Q-learning.- 6.3 XCS is Closer to Tabular Q-learning than to SB-XCS.- 6.4
Summary.- 7 Conclusion.- 7.1 The Capacities of Various Types of LCS.- 7.2 Contributions.- 7.3 The Take-home Message.- 7.4 Open Problems and Future Work.- 7.4.1 Fitness Sharing and Strength-based Fitness.- 7.4.2 Further Study of Accuracy-based Fitness.- 7.5 Concluding Remarks.- 7.5.1 The Moral of the Story: The Need for a Complex Systems Design Methodology.- 7.5.2 Classifier Systems and Reinforcement Learning.- 7.5.3 The Future.- Appendices.- A Evaluation of Macroclassifiers.- B Example XCS Cycle.- B.1 The Performance System Algorithm.- B.2 The Credit Assignment Algorithm.- B.3 The Rule Discovery Algorithm.- C Learning from Reinforcement.- C.1 Three Learning Paradigms.- C.1.1 Supervised Learning.- C.1.2 Reinforcement Learning.- C.1.3 Unsupervised Learning.- C.2 The Explore/Exploit Dilemma: a Feature of RL.- C.3 Sequential and Non-sequential Tasks.- C.3.1 Immediate Reward and Long-term Value.- C.3.2 Sequential Decisions Imply RL.- C.3.3 Episodic and Continuing Tasks.- C.4 The Agent's Goal: Maximising Return.- C.4.1 Return and Reward.- C.4.2 Sequential Formulations of Return.- C.5 Formalising RL Tasks.- C.5.1 Environment.- C.5.2 Learning Agent.- C.5.3 Agent-environment Interaction.- C.6 Summary.- D Generalisation Problems.- D.1 Why Generalise?.- D.1.1 The Curse of Dimensionality.- D.1.2 The Need for Generalisation.- D.2 Generalisation in RL.- D.2.1 Generalising Over Policies and Value Functions.- D.3 State Aggregation.- D.4 State Space and Generalisation Space.- D.5 Summary.- E Value Estimation Algorithms.- E.1 The Value of State-actions.- E.2 Non-sequential RL: Estimating Reward Functions.- E.2.1 The Value of State-actions in Non-sequential Tasks.- E.3 Estimating Expectations with Sample Averages.- E.3.1 Incremental Updates.- E.3.2 A General Form of Incremental Update.- E.3.3 Setting StepSize in Incremental Updates.- E.3.4 A Prediction Algorithm for Non-sequential RL.- E.4 Sequential RL: Estimating Long-term Value Functions.- E.4.1 Long-term Value Functions.- E.4.2 The Value of State-actions in Sequential Tasks.- E.4.3 The Value of a Policy.- E.4.4 Estimating Values with Monte Carlo Methods.- E.4.5 Estimating Values with Temporal Difference Methods.- E.4.6 Russell and Norvig's Maze: A Sequential RL Task.- E.4.7 Summary of Sequential RL.- E.5 State Aggregation.- E.5.1 Fixed and Adaptive Aggregation Schemes.- E.5.2 The Value of Aggregations I: Return.- E.5.3 The Value of Aggregations II: Predictive Utility.- E.6 Storing Value Estimates.- E.6.1 Storing Estimates of Aggregations.- E.6.2 Sparse Estimators, Models and Search.- E.6.3 Function Approximators.- E.7 Summary.- F Generalised Policy Iteration Algorithms.- F.1 Policy Improvement.- F.2 Optimal Policies.- F.3 Generalised Policy Iteration.- F.3.1 How Well must we Evaluate a Policy?.- F.3.2 Convergence Properties of GPI Control Algorithms.- F.3.3 Initialising Value Functions.- F.3.4 What Characterises GPI Algorithms?.- F.4 State-value Functions.- F.5 Summary.- G Evolutionary Algorithms.- G.1 Evolution.- G.2 Elements of EAs.- G.2.1 A Generic EA.- G.2.2 Population-based Search.- G.2.3 Fitness Functions.- G.2.4 Probabilistic Selection of Parents.- G.2.5 Genetic Operators.- G.2.6 Replacement.- G.3 EAs as Search.- G.3.1 Local and Global Optima.- G.4 The Generalisation Problem.- G.5 Niching and Mating Restriction.- G.5.1 Fitness Sharing.- G.5.2 Crowding.- G.5.3 Mating Restriction.- G.6 RL with EAs.- G.6.1 Non-associative RL with an EA.- G.6.2 Associative RL with an EA.- G.6.3 Sequential RL with an EA.- G.7 Comparing GPI and EA Methods for RL.- G.7.1 
Similarities between GPI and EA Methods.- G.8 Summary.- H The Origins of Sarsa.- H.1 Modified Connectionist Q-learning.- H.2 ZCS's Implicit Bucket Brigade.- H.3 Who Invented Sarsa?.- I Notation.- References.

From "Nielsen BookData"


Details

  • NII Bibliographic ID (NCID)
    BA67614771
  • ISBN
    1852337702
  • LCCN
    2003061884
  • Country of Publication Code
    uk
  • Title Language Code
    eng
  • Text Language Code
    eng
  • Place of Publication
    London ; New York
  • Pages/Volumes
    xvi, 307 p.
  • Size
    24 cm