Knowledge acquisition from databases

Author(s)

    • Wu, Xindong

Bibliographic Information

Knowledge acquisition from databases

Xindong Wu

Ablex Pub., c1995

  • pbk.

Note

Includes bibliographical references (p. 195-204) and index

Description and Table of Contents

Description

This is a textbook for undergraduate and postgraduate students taking courses in machine learning, expert systems, and artificial intelligence. It may also serve as a reference for researchers in machine learning, knowledge-based systems, genetic algorithms, and neural networks.

Table of Contents

List of Figures
List of Tables
About the Author
Preface
Glossary of Notation

I Knowledge Acquisition From Databases: An Overview
  1 Introduction
    1.1 Problem and Domain
    1.2 Outline of the Book
  2 Learning From Databases
    2.1 Does Database Technology Need Machine Learning?
    2.2 How Can Machine Learning Be Coupled With Databases?
    2.3 Intelligent Learning Database Systems: A Definition
    2.4 Pneumonia and Tuberculosis: A Simple Example

II Rule Induction From Examples
  3 Symbolic Rule Induction in Machine Learning
    3.1 Introduction
    3.2 Background in Knowledge-Based Systems
      3.2.1 Knowledge Acquisition in Knowledge-Based Systems
      3.2.2 Components in a Learning System
      3.2.3 Learning Strategies: Overview
    3.3 Attribute-Based Induction
    3.4 Incremental Generalization-Specialization
      3.4.1 ARCH
      3.4.2 Version Spaces
    3.5 Explanation-Based Learning
      3.5.1 The Paradigm
      3.5.2 An Example Run
      3.5.3 Discussion
    3.6 Conclusions
  4 Constructing Decision Trees With ID3 and C4.5
    4.1 Developer and Background
    4.2 The ID3 Algorithm
    4.3 An Example: Play and Don't Play
    4.4 Advantages and Disadvantages
    4.5 Recent Development of ID3
      4.5.1 Noise Handling During Induction
      4.5.2 Incremental Learning
      4.5.3 Constructive Learning
      4.5.4 Postpruning of Decision Trees
      4.5.5 Decompiling Decision Trees Into Production Rules
      4.5.6 Binarization of Decision Trees
      4.5.7 A New Selection Criterion for Decision Tree Construction
      4.5.8 Structured Induction
      4.5.9 Conclusions
  5 Generating Rules With HCV
    5.1 The Extension Matrix Approach
      5.1.1 Developers and Background
      5.1.2 Terminology and Notation
      5.1.3 Optimization Problems
      5.1.4 Heuristic Strategies in AE1
      5.1.5 Further Considerations
    5.2 The HFL Algorithm
      5.2.1 Four Strategies in HFL
      5.2.2 Algorithm Description
      5.2.3 An Example Run of HFL
    5.3 The HCV Algorithm
      5.3.1 Algorithm Description
      5.3.2 Two Example Runs of HCV
      5.3.3 A Comparison Between HCV and AE1
    5.4 Some Related Problems in Implementation
      5.4.1 Introduction
      5.4.2 Don't Cares in HCV
      5.4.3 Noise Handling and Real-Valued Attributes in HCV
      5.4.4 Size of Extension Matrices
  6 Dealing With Noise and Real-Valued Attributes
    6.1 Sources of Noise
    6.2 Noise Handling
      6.2.1 Preprocessing of Training Data
      6.2.2 Induction-Time Processing
      6.2.3 Postprocessing of Induction Results
      6.2.4 Deduction-Time Processing
    6.3 Dealing With Real-Valued Attributes
      6.3.1 The Simplest Class-Separating Method
      6.3.2 Bayesian Classifiers
      6.3.3 The Information Gain Heuristic
      6.3.4 Other Methods
    6.4 Fuzzy Interpretation of Discretized Intervals
  7 A Performance Comparison of HCV With ID3, C4.5, and NewID
    7.1 Introduction
    7.2 The MONK's Problems
    7.3 Performance Comparison on the MONK's Problems
    7.4 Summaries of More Experiments
      7.4.1 Data Sets Without Real-Valued Attributes
      7.4.2 Data Sets With Real-Valued Domains
    7.5 Conclusions

III KEshell2: An Intelligent Learning Database System
  8 KEshell: A Rule Schema + Rule Body-Based Knowledge Engineering Shell
    8.1 Introduction
    8.2 Problems in Production Systems
      8.2.1 Low Efficiency
      8.2.2 Lack of Flexibility in Expressing Procedural Knowledge
      8.2.3 Lack of Flexibility in Inexact Inference
    8.3 Rule Schema + Rule Body
      8.3.1 The Syntax
      8.3.2 Advantages of the Language
      8.3.3 A Working Example
    8.4 LFA: Linear Forward Chaining on Rule Schema + Rule Body
      8.4.1 Domain Reasoning Networks
      8.4.2 Sorting Knowledge in a Knowledge Base Into a Partial Order
      8.4.3 Linear Forward Chaining
      8.4.4 Restrictions on LFA
    8.5 KEshell: A Rule Schema + Rule Body-Based Knowledge Engineering Shell
      8.5.1 Overview
      8.5.2 Structured Interactive Knowledge Transfer (SIKT)
      8.5.3 The Editor
      8.5.4 The Tracing Engine
      8.5.5 The Interface
    8.6 Conclusions
  9 A Representation for Integrating Knowledge and Data
    9.1 Introduction
    9.2 Motivations
    9.3 The Representation
      9.3.1 Representation for the Relational Model
      9.3.2 Representation for More Semantic Information
    9.4 Discussions
    9.5 An Approach to Generation of Semantic Networks From Relational Database Schemata
  10 KEshell2: An Intelligent Learning Database System
    10.1 System Structure
    10.2 Monitor
    10.3 KBMS
    10.4 DBMS
    10.5 I/D Engine
    10.6 K.A. Engine
      10.6.1 Semantic Information
      10.6.2 Induction From Databases
    10.7 Conclusions
  11 Conclusions

Appendices
  A HCV (Version 2.0) User's Manual
    A.1 Introduction
      A.1.1 The HCV Algorithm
      A.1.2 The HCV (Version 2.0) Software
    A.2 Notational Conventions
    A.3 How to Run HCV (Version 2.0)
      A.3.1 Overview
      A.3.2 Getting Started
      A.3.3 The hcv Shell Script
      A.3.4 Files
      A.3.5 Cross-Validation: cvshell
    A.4 Experiments With Individual Modules
      A.4.1 Data Preprocessing
      A.4.2 Discretization
      A.4.3 Preprocessing of Noise
      A.4.4 Induction
      A.4.5 Postprocessing of Rules
      A.4.6 Deduction
    A.5 Installation
      A.5.1 Unpacking the HCV (Version 2.0) Distribution File
      A.5.2 Configuring HCV (Version 2.0)
      A.5.3 Building HCV (Version 2.0)
      A.5.4 Verified Platforms
      A.5.5 Availability and Distribution
  B Results Produced by HCV on the MONK's Problems
    B.1 The M1 Problem
    B.2 The M2 Problem
    B.3 The M3 Problem
  C An Example Run of SIKT in KEshell

References
Subject Index

by "Nielsen BookData"
