Data mining for business analytics : concepts, techniques, and applications with JMP Pro

Bibliographic Information

Data mining for business analytics : concepts, techniques, and applications with JMP Pro

Galit Shmueli ... [et al.]

Wiley, c2017

  • Binding: cloth

Held by university libraries: 3


Notes

Includes bibliographical references (p. 431-432) and index

Description and Table of Contents

Description

Data Mining for Business Analytics: Concepts, Techniques, and Applications with JMP Pro® presents an applied and interactive approach to data mining. Featuring hands-on applications with JMP Pro®, a statistical package from the SAS Institute, the book uses engaging, real-world examples to build a theoretical and practical understanding of key data mining methods, especially predictive models for classification and prediction. Topics include data visualization, dimension reduction techniques, clustering, linear and logistic regression, classification and regression trees, discriminant analysis, naive Bayes, neural networks, uplift modeling, ensemble models, and time series forecasting.

Data Mining for Business Analytics: Concepts, Techniques, and Applications with JMP Pro® also includes:

  • Detailed summaries that supply an outline of key topics at the beginning of each chapter
  • End-of-chapter examples and exercises that allow readers to expand their comprehension of the presented material
  • Data-rich case studies to illustrate various applications of data mining techniques
  • A companion website, www.dataminingbook.com, with over two dozen data sets, exercises and case study solutions, and slides for instructors

Data Mining for Business Analytics: Concepts, Techniques, and Applications with JMP Pro® is an excellent textbook for advanced undergraduate and graduate-level courses on data mining, predictive analytics, and business analytics. The book is also a one-of-a-kind resource for data scientists, analysts, researchers, and practitioners working with analytics in the fields of management, finance, marketing, information technology, healthcare, education, and any other data-rich field.

Table of Contents

Foreword
Preface
Acknowledgments

PART I PRELIMINARIES

1 Introduction
  1.1 What Is Business Analytics? (Who Uses Predictive Analytics?)
  1.2 What Is Data Mining?
  1.3 Data Mining and Related Terms
  1.4 Big Data
  1.5 Data Science
  1.6 Why Are There So Many Different Methods?
  1.7 Terminology and Notation
  1.8 Roadmap to This Book (Order of Topics; Using JMP Pro, Statistical Discovery Software from SAS)
2 Overview of the Data Mining Process
  2.1 Introduction
  2.2 Core Ideas in Data Mining (Classification; Prediction; Association Rules and Recommendation Systems; Predictive Analytics; Data Reduction and Dimension Reduction; Data Exploration and Visualization; Supervised and Unsupervised Learning)
  2.3 The Steps in Data Mining
  2.4 Preliminary Steps (Organization of Datasets; Sampling from a Database; Oversampling Rare Events in Classification Tasks; Preprocessing and Cleaning the Data; Changing Modeling Types in JMP; Standardizing Data in JMP)
  2.5 Predictive Power and Overfitting (Creation and Use of Data Partitions; Partitioning Data for Crossvalidation in JMP Pro; Overfitting)
  2.6 Building a Predictive Model with JMP Pro (Predicting Home Values in a Boston Neighborhood; Modeling Process; Setting the Random Seed in JMP)
  2.7 Using JMP Pro for Data Mining
  2.8 Automating Data Mining Solutions (Data Mining Software Tools: The State of the Market, by Herb Edelstein)
  Problems

PART II DATA EXPLORATION AND DIMENSION REDUCTION

3 Data Visualization
  3.1 Uses of Data Visualization
  3.2 Data Examples (Example 1: Boston Housing Data; Example 2: Ridership on Amtrak Trains)
  3.3 Basic Charts: Bar Charts, Line Graphs, and Scatterplots (Using the JMP Graph Builder; Distribution Plots: Boxplots and Histograms; Tools for Data Visualization in JMP; Heatmaps (Color Maps and Cell Plots): Visualizing Correlations and Missing Values)
  3.4 Multidimensional Visualization (Adding Variables: Color, Size, Shape, Multiple Panels, and Animation; Manipulations: Rescaling, Aggregation and Hierarchies, Zooming, Filtering; Reference: Trend Lines and Labels; Adding Trendlines in the Graph Builder; Scaling Up: Large Datasets; Multivariate Plot: Parallel Coordinates Plot; Interactive Visualization)
  3.5 Specialized Visualizations (Visualizing Networked Data; Visualizing Hierarchical Data: More on Treemaps; Visualizing Geographical Data: Maps)
  3.6 Summary of Major Visualizations and Operations, According to Data Mining Goal (Prediction; Classification; Time Series Forecasting; Unsupervised Learning)
  Problems
4 Dimension Reduction
  4.1 Introduction
  4.2 Curse of Dimensionality
  4.3 Practical Considerations (Example 1: House Prices in Boston)
  4.4 Data Summaries (Summary Statistics; Tabulating Data (Pivot Tables))
  4.5 Correlation Analysis
  4.6 Reducing the Number of Categories in Categorical Variables
  4.7 Converting a Categorical Variable to a Continuous Variable
  4.8 Principal Components Analysis (Example 2: Breakfast Cereals; Principal Components; Normalizing the Data; Using Principal Components for Classification and Prediction)
  4.9 Dimension Reduction Using Regression Models
  4.10 Dimension Reduction Using Classification and Regression Trees
  Problems

PART III PERFORMANCE EVALUATION

5 Evaluating Predictive Performance
  5.1 Introduction
  5.2 Evaluating Predictive Performance (Benchmark: The Average; Prediction Accuracy Measures; Comparing Training and Validation Performance)
  5.3 Judging Classifier Performance (Benchmark: The Naive Rule; Class Separation; The Classification Matrix; Using the Validation Data; Accuracy Measures; Propensities and Cutoff for Classification; Cutoff Values for Triage; Changing the Cutoff Values for a Confusion Matrix in JMP; Performance in Unequal Importance of Classes; False-Positive and False-Negative Rates; Asymmetric Misclassification Costs; Asymmetric Misclassification Costs in JMP; Generalization to More Than Two Classes)
  5.4 Judging Ranking Performance (Lift Curves; Beyond Two Classes; Lift Curves Incorporating Costs and Benefits)
  5.5 Oversampling (Oversampling the Training Set; Stratified Sampling and Oversampling in JMP; Evaluating Model Performance Using a Nonoversampled Validation Set; Evaluating Model Performance If Only Oversampled Validation Set Exists; Applying Sampling Weights in JMP)
  Problems

PART IV PREDICTION AND CLASSIFICATION METHODS

6 Multiple Linear Regression
  6.1 Introduction
  6.2 Explanatory versus Predictive Modeling
  6.3 Estimating the Regression Equation and Prediction (Example: Predicting the Price of Used Toyota Corolla Automobiles; Coding of Categorical Variables in Regression; Additional Options for Regression Models in JMP)
  6.4 Variable Selection in Linear Regression (Reducing the Number of Predictors; How to Reduce the Number of Predictors; Manual Variable Selection; Automated Variable Selection; Coding of Categorical Variables in Stepwise Regression; Working with the All Possible Models Output; When Using a Stopping Algorithm in JMP; Other Regression Procedures in JMP Pro: Generalized Regression)
  Problems
7 k-Nearest Neighbors (k-NN)
  7.1 The k-NN Classifier (Categorical Outcome) (Determining Neighbors; Classification Rule; Example: Riding Mowers; Choosing k; k Nearest Neighbors in JMP Pro; The Cutoff Value for Classification; k-NN Predictions and Prediction Formulas in JMP Pro; k-NN with More Than Two Classes)
  7.2 k-NN for a Numerical Response (Pandora)
  7.3 Advantages and Shortcomings of k-NN Algorithms
  Problems
8 The Naive Bayes Classifier
  8.1 Introduction (Naive Bayes Method; Cutoff Probability Method; Conditional Probability; Example 1: Predicting Fraudulent Financial Reporting)
  8.2 Applying the Full (Exact) Bayesian Classifier (Using the "Assign to the Most Probable Class" Method; Using the Cutoff Probability Method; Practical Difficulty with the Complete (Exact) Bayes Procedure; Solution: Naive Bayes; Example 2: Predicting Fraudulent Financial Reports, Two Predictors; Using the JMP Naive Bayes Add-in; Example 3: Predicting Delayed Flights)
  8.3 Advantages and Shortcomings of the Naive Bayes Classifier (Spam Filtering)
  Problems
9 Classification and Regression Trees
  9.1 Introduction
  9.2 Classification Trees (Recursive Partitioning; Example 1: Riding Mowers; Categorical Predictors)
  9.3 Growing a Tree (Growing a Tree Example; Classifying a New Observation; Fitting Classification Trees in JMP Pro; Growing a Tree with CART)
  9.4 Evaluating the Performance of a Classification Tree (Example 2: Acceptance of Personal Loan)
  9.5 Avoiding Overfitting (Stopping Tree Growth: CHAID; Growing a Full Tree and Pruning It Back; How JMP Limits Tree Size)
  9.6 Classification Rules from Trees
  9.7 Classification Trees for More Than Two Classes
  9.8 Regression Trees (Prediction; Evaluating Performance)
  9.9 Advantages and Weaknesses of a Tree
  9.10 Improving Prediction: Multiple Trees (Fitting Ensemble Tree Models in JMP Pro)
  9.11 CART and Measures of Impurity
  Problems
10 Logistic Regression
  10.1 Introduction (Logistic Regression and Consumer Choice Theory)
  10.2 The Logistic Regression Model (Example: Acceptance of Personal Loan (Universal Bank); Indicator (Dummy) Variables in JMP; Model with a Single Predictor; Fitting One Predictor Logistic Models in JMP; Estimating the Logistic Model from Data: Multiple Predictors; Fitting Logistic Models in JMP with More Than One Predictor)
  10.3 Evaluating Classification Performance (Variable Selection)
  10.4 Example of Complete Analysis: Predicting Delayed Flights (Data Preprocessing; Model Fitting, Estimation and Interpretation: A Simple Model; Model Fitting, Estimation and Interpretation: The Full Model; Model Performance; Variable Selection; Regrouping and Recoding Variables in JMP)
  10.5 Appendixes: Logistic Regression for Profiling (Appendix A: Why Linear Regression Is Problematic for a Categorical Response; Appendix B: Evaluating Explanatory Power; Appendix C: Logistic Regression for More Than Two Classes; Nominal Classes)
  Problems
11 Neural Nets
  11.1 Introduction
  11.2 Concept and Structure of a Neural Network
  11.3 Fitting a Network to Data (Example 1: Tiny Dataset; Computing Output of Nodes; Preprocessing the Data; Activation Functions and Data Processing Features in JMP Pro; Training the Model; Fitting a Neural Network in JMP Pro; Using the Output for Prediction and Classification; Example 2: Classifying Accident Severity; Avoiding Overfitting)
  11.4 User Input in JMP Pro (Unsupervised Feature Extraction and Deep Learning)
  11.5 Exploring the Relationship between Predictors and Response (Understanding Neural Models in JMP Pro)
  11.6 Advantages and Weaknesses of Neural Networks
  Problems
12 Discriminant Analysis
  12.1 Introduction (Example 1: Riding Mowers; Example 2: Personal Loan Acceptance (Universal Bank))
  12.2 Distance of an Observation from a Class
  12.3 From Distances to Propensities and Classifications (Linear Discriminant Analysis in JMP)
  12.4 Classification Performance of Discriminant Analysis
  12.5 Prior Probabilities
  12.6 Classifying More Than Two Classes (Example 3: Medical Dispatch to Accident Scenes; Using Categorical Predictors in Discriminant Analysis in JMP)
  12.7 Advantages and Weaknesses
  Problems
13 Combining Methods: Ensembles and Uplift Modeling
  13.1 Ensembles (Why Ensembles Can Improve Predictive Power; The Wisdom of Crowds; Simple Averaging; Bagging; Boosting; Creating Ensemble Models in JMP Pro; Advantages and Weaknesses of Ensembles)
  13.2 Uplift (Persuasion) Modeling (A-B Testing; Uplift; Gathering the Data; A Simple Model; Modeling Individual Uplift; Using the Results of an Uplift Model; Creating Uplift Models in JMP Pro; Using the Uplift Platform in JMP Pro)
  13.3 Summary
  Problems

PART V MINING RELATIONSHIPS AMONG RECORDS

14 Cluster Analysis
  14.1 Introduction (Example: Public Utilities)
  14.2 Measuring Distance between Two Observations (Euclidean Distance; Normalizing Numerical Measurements; Other Distance Measures for Numerical Data; Distance Measures for Categorical Data; Distance Measures for Mixed Data)
  14.3 Measuring Distance between Two Clusters (Minimum Distance; Maximum Distance; Average Distance; Centroid Distance)
  14.4 Hierarchical (Agglomerative) Clustering (Hierarchical Clustering in JMP and JMP Pro; Hierarchical Agglomerative Clustering Algorithm; Single Linkage; Complete Linkage; Average Linkage; Centroid Linkage; Ward's Method; Dendrograms: Displaying Clustering Process and Results; Validating Clusters; Two-Way Clustering; Limitations of Hierarchical Clustering)
  14.5 Nonhierarchical Clustering: The k-Means Algorithm (k-Means Clustering Algorithm; Initial Partition into K Clusters; K-Means Clustering in JMP)
  Problems

PART VI FORECASTING TIME SERIES

15 Handling Time Series
  15.1 Introduction
  15.2 Descriptive versus Predictive Modeling
  15.3 Popular Forecasting Methods in Business (Combining Methods)
  15.4 Time Series Components (Example: Ridership on Amtrak Trains)
  15.5 Data Partitioning and Performance Evaluation (Benchmark Performance: Naive Forecasts; Generating Future Forecasts; Partitioning Time Series Data in JMP and Validating Time Series Models)
  Problems
16 Regression-Based Forecasting
  16.1 A Model with Trend (Linear Trend; Fitting a Model with Linear Trend in JMP; Creating Actual versus Predicted Plots and Residual Plots in JMP; Exponential Trend; Computing Forecast Errors for Exponential Trend Models; Polynomial Trend; Fitting a Polynomial Trend in JMP)
  16.2 A Model with Seasonality
  16.3 A Model with Trend and Seasonality
  16.4 Autocorrelation and ARIMA Models (Computing Autocorrelation; Improving Forecasts by Integrating Autocorrelation Information; Fitting AR (Autoregression) Models in the JMP Time Series Platform; Fitting AR Models to Residuals; Evaluating Predictability; Summary: Fitting Regression-Based Time Series Models in JMP)
  Problems
17 Smoothing Methods
  17.1 Introduction
  17.2 Moving Average (Centered Moving Average for Visualization; Trailing Moving Average for Forecasting; Computing a Trailing Moving Average Forecast in JMP; Choosing Window Width (w))
  17.3 Simple Exponential Smoothing (Choosing Smoothing Parameter α; Fitting Simple Exponential Smoothing Models in JMP; Creating Plots for Actual versus Forecasted Series and Residuals Series Using the Graph Builder; Relation between Moving Average and Simple Exponential Smoothing)
  17.4 Advanced Exponential Smoothing (Series with a Trend; Series with a Trend and Seasonality)
  Problems

PART VII CASES

18 Cases
  18.1 Charles Book Club (The Book Industry; Database Marketing at Charles; Data Mining Techniques; Assignment)
  18.2 German Credit (Background; Data; Assignment)
  18.3 Tayko Software Cataloger (Background; The Mailing Experiment; Data; Assignment)
  18.4 Political Persuasion (Background; Predictive Analytics Arrives in US Politics; Political Targeting; Uplift; Data; Assignment)
  18.5 Taxi Cancellations (Business Situation; Assignment)
  18.6 Segmenting Consumers of Bath Soap (Business Situation; Key Problems; Data; Measuring Brand Loyalty; Assignment)
  18.7 Direct-Mail Fundraising (Background; Data; Assignment)
  18.8 Predicting Bankruptcy (Predicting Corporate Bankruptcy; Assignment)
  18.9 Time Series Case: Forecasting Public Transportation Demand (Background; Problem Description; Available Data; Assignment Goal; Assignment; Tips and Suggested Steps)

References
Data Files Used in the Book
Index

From "Nielsen BookData"
