Rank-based methods for shrinkage and selection : with application to machine learning
Author(s)
A. K. Md. Ehsanes Saleh
Bibliographic Information
Rank-based methods for shrinkage and selection : with application to machine learning
John Wiley & Sons, Inc., 2022
Available at 1 library
Note
Other authors: Mohammad Arashi, Resve A. Saleh, Mina Norouzirad
Bibliography: p. 433-441
Includes indexes
Description and Table of Contents
Description
Rank-Based Methods for Shrinkage and Selection: A practical and hands-on guide to the theory and methodology of statistical estimation based on rank.
Robust statistics is an important field in contemporary mathematics and applied statistical methods. Rank-Based Methods for Shrinkage and Selection: With Application to Machine Learning describes techniques for shrinkage and subset selection that produce higher-quality data analysis and parsimonious models with outlier-free prediction. This book is intended for statisticians, economists, biostatisticians, data scientists and graduate students.
Rank-Based Methods for Shrinkage and Selection elaborates on rank-based theory and application in machine learning to robustify the least squares methodology. It also includes:
Development of rank theory and its application to shrinkage and selection
Methodology for robust data science using penalized rank estimators
Theory and methods of penalized rank dispersion for ridge, LASSO and Enet
Topics include Liu regression, high-dimensional models, and AR(p) models
Novel rank-based logistic regression and neural networks
Problem sets with R code demonstrating the methods' use in machine learning (see the sketch following this list)
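A minimal, hedged sketch of what a rank-based (R-) regression fit looks like in R follows. It is not code from the book; the CRAN package Rfit (Jaeckel's rank dispersion with Wilcoxon scores), the simulated data, and the single-outlier scenario are assumptions chosen only to illustrate the robustness theme of Chapter 1.

library(Rfit)                      # rank-based regression via Jaeckel's dispersion (assumed package, not from the book)

set.seed(1)
x <- 1:20
y <- 2 + 0.5 * x + rnorm(20, sd = 0.3)
y[20] <- y[20] + 15                # inject a single gross outlier (cf. Section 1.5.1)

ls_fit   <- lm(y ~ x)              # ordinary least squares fit
rank_fit <- rfit(y ~ x)            # R-estimator minimizing the rank dispersion

coef(ls_fit)                       # LS slope is pulled toward the outlier
coef(rank_fit)                     # rank-based slope stays near the true value 0.5

In this illustrative setup, the least-squares slope is distorted by the one contaminated observation, while the rank-based estimate remains close to the generating slope, which is the motivation the book develops before introducing penalized (ridge, LASSO, Enet) rank estimators.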
Table of Contents
1 Introduction to Rank-based Regression 1
1.1 Introduction 1
1.2 Robustness of the Median 1
1.2.1 Mean vs. Median 1
1.2.2 Breakdown Point 4
1.2.3 Order and Rank Statistics 5
1.3 Simple Linear Regression 6
1.3.1 Least Squares Estimator (LSE) 6
1.3.2 Theil's Estimator 7
1.3.3 Belgium Telephone Data Set 7
1.3.4 Estimation and Standard Error Comparison 9
1.4 Outliers and their Detection 11
1.4.1 Outlier Detection 12
1.5 Motivation for Rank-based Methods 13
1.5.1 Effect of a Single Outlier 13
1.5.2 Using Rank for the Location Model 16
1.5.3 Using Rank for the Slope 19
1.6 The Rank Dispersion Function 20
1.6.1 Ranking and Scoring Details 23
1.6.2 Detailed Procedure for R-estimation 25
1.7 Shrinkage Estimation and Subset Selection 30
1.7.1 Multiple Linear Regression using Rank 30
1.7.2 Penalty Functions 32
1.7.3 Shrinkage Estimation 34
1.7.4 Subset Selection 36
1.7.5 Blended Approaches 39
1.8 Summary 39
1.9 Problems 41
2 Characteristics of Rank-based Penalty Estimators 47
2.1 Introduction 47
2.2 Motivation for Penalty Estimators 47
2.3 Multivariate Linear Regression 49
2.3.1 Multivariate Least Squares Estimation 49
2.3.2 Multivariate R-estimation 51
2.3.3 Multicollinearity 51
2.4 Ridge Regression 53
2.4.1 Ridge Applied to Least Squares Estimation 53
2.4.2 Ridge Applied to Rank Estimation 55
2.5 Example: Swiss Fertility Data Set 56
2.5.1 Estimation and Standard Errors 59
2.5.2 Parameter Variance using Bootstrap 60
2.5.3 Reducing Variance using Ridge 61
2.5.4 Ridge Traces 62
2.6 Selection of Ridge Parameter 𝜆2 65
2.6.1 Quadratic Risk 65
2.6.2 K-fold Cross-validation Scheme 68
2.7 LASSO and aLASSO 71
2.7.1 Subset Selection 71
2.7.2 Least Squares with LASSO 71
2.7.3 The Adaptive LASSO and its Geometric Interpretation 73
2.7.4 R-estimation with LASSO and aLASSO 77
2.7.5 Oracle Properties 78
2.8 Elastic Net (Enet) 82
2.8.1 Naive Enet 82
2.8.2 Standard Enet 83
2.8.3 Enet in Machine Learning 84
2.9 Example: Diabetes Data Set 85
2.9.1 Model Building with R-aEnet 85
2.9.2 MSE vs. MAE 88
2.9.3 Model Building with LS-Enet 91
2.10 Summary 94
2.11 Problems 95
3 Location and Simple Linear Models 101
3.1 Introduction 101
3.2 Location Estimators and Testing 104
3.2.1 Unrestricted R-estimator of 𝜃 104
3.2.2 Restricted R-estimator of 𝜃 107
3.3 Shrinkage R-estimators of Location 108
3.3.1 Overview of Shrinkage R-estimators of 𝜃 108
3.3.2 Derivation of the Ridge-type R-estimator 113
3.3.3 Derivation of the LASSO-type R-estimator 114
3.3.4 General Shrinkage R-estimators of 𝜃 114
3.4 Ridge-type R-estimator of 𝜃 117
3.5 Preliminary Test R-estimator of 𝜃 118
3.5.1 Optimum Level of Significance of PTRE 121
3.6 Saleh-type R-estimators 122
3.6.1 Hard-Threshold R-estimator of 𝜃 122
3.6.2 Saleh-type R-estimator of 𝜃 123
3.6.3 Positive-rule Saleh-type (LASSO-type) R-estimator of 𝜃 125
3.6.4 Elastic Net-type R-estimator of 𝜃 127
3.7 Comparative Study of the R-estimators of Location 129
3.8 Simple Linear Model 132
3.8.1 Restricted R-estimator of Slope 134
3.8.2 Shrinkage R-estimator of Slope 135
3.8.3 Ridge-type R-estimation of Slope 135
3.8.4 Hard-Threshold R-estimator of Slope 136
3.8.5 Saleh-type R-estimator of Slope 137
3.8.6 Positive-rule Saleh-type (LASSO-type) R-estimator of Slope 138
3.8.7 The Adaptive LASSO (aLASSO-type) R-estimator 138
3.8.8 nEnet-type R-estimator of Slope 139
3.8.9 Comparative Study of R-estimators of Slope 140
3.9 Summary 141
3.10 Problems 142
4 Analysis of Variance (ANOVA) 149
4.1 Introduction 149
4.2 Model, Estimation and Tests 149
4.3 Overview of Multiple Location Models 150
4.3.1 Example: Corn Fertilizers 151
4.3.2 One-way ANOVA 151
4.3.3 Effect of Variance on Shrinkage Estimators 153
4.3.4 Shrinkage Estimators for Multiple Location 156
4.4 Unrestricted R-estimator 158
4.5 Test of Significance 161
4.6 Restricted R-estimator 162
4.7 Shrinkage Estimators 163
4.7.1 Preliminary Test R-estimator 163
4.7.2 The Stein-Saleh-type R-estimator 164
4.7.3 The Positive-rule Stein-Saleh-type R-estimator 165
4.7.4 The Ridge-type R-estimator 167
4.8 Subset Selection Penalty R-estimators 169
4.8.1 Preliminary Test Subset Selector R-estimator 169
4.8.2 Saleh-type R-estimator 170
4.8.3 Positive-rule Saleh Subset Selector (PRSS) 171
4.8.4 The Adaptive LASSO (aLASSO) 173
4.8.5 Elastic-net-type R-estimator 177
4.9 Comparison of the R-estimators 178
4.9.1 Comparison of URE and RRE 179
4.9.2 Comparison of URE and Stein-Saleh-type R-estimators 179
4.9.3 Comparison of URE and Ridge-type R-estimators 179
4.9.4 Comparison of URE and PTSSRE 180
4.9.5 Comparison of LASSO-type and Ridge-type R-estimators 180
4.9.6 Comparison of URE, RRE and LASSO 181
4.9.7 Comparison of LASSO with PTRE 181
4.9.8 Comparison of LASSO with SSRE 182
4.9.9 Comparison of LASSO with PRSSRE 182
4.9.10 Comparison of nEnetRE with URE 183
4.9.11 Comparison of nEnetRE with RRE 183
4.9.12 Comparison of nEnetRE with HTRE 183
4.9.13 Comparison of nEnetRE with SSRE 184
4.9.14 Comparison of Ridge-type vs. nEnetRE 184
4.10 Summary 185
4.11 Problems 185
5 Seemingly Unrelated Simple Linear Models 191
5.1 Introduction 191
5.1.1 Problem Formulation 193
5.2 Signed and Signed Rank Estimators of Parameters 194
5.2.1 General Shrinkage R-estimator of 𝛽 198
5.2.2 Ridge-type R-estimator of 𝛽 199
5.2.3 Preliminary Test R-estimator of 𝛽 201
5.3 Stein-Saleh-type R-estimator of 𝛽 202
5.3.1 Positive-rule Stein-Saleh R-estimators of 𝛽 202
5.4 Saleh-type R-estimator of 𝛽 203
5.4.1 LASSO-type R-estimator of 𝛽 205
5.5 Elastic-net-type R-estimators 206
5.6 R-estimator of Intercept When Slope Has Sparse Subset 207
5.6.1 General Shrinkage R-estimator of Intercept 207
5.6.2 Ridge-type R-estimator of 𝜃 209
5.6.3 Preliminary Test R-estimators of 𝜃 209
5.7 Stein-Saleh-type R-estimator of 𝜃 210
5.7.1 Positive-rule Stein-Saleh-type R-estimator of 𝜃 211
5.7.2 LASSO-type R-estimator of 𝜃 213
5.8 Summary 213
5.8.1 Problems 214
6 Multiple Linear Regression Models 215
6.1 Introduction 215
6.2 Multiple Linear Model and R-estimation 215
6.3 Model Sparsity and Detection 218
6.4 General Shrinkage R-estimator of 𝛽 221
6.4.1 Preliminary Test R-estimator 222
6.4.2 Stein-Saleh-type R-estimator 224
6.4.3 Positive-rule Stein-Saleh-type R-estimator 225
6.5 Subset Selectors 226
6.5.1 Preliminary Test Subset Selector R-estimator 226
6.5.2 Stein-Saleh-type R-estimator 228
6.5.3 Positive-rule Stein-Saleh-type R-estimator (LASSO-type) 229
6.5.4 Ridge-type Subset Selector 231
6.5.5 Elastic Net-type R-estimator 231
6.6 Adaptive LASSO 232
6.6.1 Introduction 232
6.6.2 Asymptotics for LASSO-type R-estimator 233
6.6.3 Oracle Property of aLASSO 235
6.7 Summary 238
6.8 Problems 239
7 Partially Linear Multiple Regression Model 241
7.1 Introduction 241
7.2 Rank Estimation in the PLM 242
7.2.1 Penalty R-estimators 246
7.2.2 Preliminary Test and Stein-Saleh-type R-estimator 248
7.3 ADB and ADL2-risk 249
7.4 ADL2-risk Comparisons 253
7.5 Summary: L2-risk Efficiencies 260
7.6 Problems 262
8 Liu Regression Models 263
8.1 Introduction 263
8.2 Linear Unified (Liu) Estimator 263
8.2.1 Liu-type R-estimator 266
8.3 Shrinkage Liu-type R-estimators 268
8.4 Asymptotic Distributional Risk 269
8.5 Asymptotic Distributional Risk Comparisons 271
8.5.1 Comparison of SSLRE and PTLRE 272
8.5.2 Comparison of PRSLRE and PTLRE 274
8.5.3 Comparison of PRLRE and SSLRE 276
8.5.4 Comparison of Liu-Type Rank Estimators With Counterparts 277
8.6 Estimation of d 279
8.7 Diabetes Data Analysis 280
8.7.1 Penalty Estimators 281
8.7.2 Performance Analysis 284
8.8 Summary 288
8.9 Problems 288
9 Autoregressive Models 291
9.1 Introduction 291
9.2 R-estimation of 𝜌 for the AR(𝑝)-Model 292
9.3 LASSO, Ridge, Preliminary Test and Stein-Saleh-type R-estimators 294
9.4 Asymptotic Distributional L2-risk 296
9.5 Asymptotic Distributional L2-risk Analysis 299
9.5.1 Comparison of Unrestricted vs. Restricted R-estimators 300
9.5.2 Comparison of Unrestricted vs. Preliminary Test R-estimator 300
9.5.3 Comparison of Unrestricted vs. Stein-Saleh-type R-estimators 300
9.5.4 Comparison of the Preliminary Test vs. Stein-Saleh-type R-estimators 302
9.6 Summary 303
9.7 Problems 304
10 High-Dimensional Models 307
10.1 Introduction 307
10.2 Identifiability of 𝛽 and Projection 309
10.3 Parsimonious Model Selection 309
10.4 Some Notation and Separation 311
10.4.1 Special Matrices 311
10.4.2 Steps Towards Estimators 312
10.4.3 Post-selection Ridge Estimation of 𝛽_𝒮1 and 𝛽_𝒮2 312
10.4.4 Post-selection Ridge R-estimators for 𝛽_𝒮1 and 𝛽_𝒮2 313
10.5 Post-selection Shrinkage R-estimators 315
10.6 Asymptotic Properties of the Ridge R-estimators 316
10.7 Asymptotic Distributional L2-Risk Properties 321
10.8 Asymptotic Distributional Risk Efficiency 324
10.9 Summary 326
10.10 Problems 327
11 Rank-based Logistic Regression 329
11.1 Introduction 329
11.2 Data Science and Machine Learning 329
11.2.1 What is Robust Data Science? 329
11.2.2 What is Robust Machine Learning? 332
11.3 Logistic Regression 333
11.3.1 Log-likelihood Setup 334
11.3.2 Motivation for Rank-based Logistic Methods 338
11.3.3 Nonlinear Dispersion Function 341
11.4 Application to Machine Learning 342
11.4.1 Example: Motor Trend Cars 344
11.5 Penalized Logistic Regression 347
11.5.1 Log-likelihood Expressions 347
11.5.2 Rank-based Expressions 348
11.5.3 Support Vector Machines 349
11.5.4 Example: Circular Data 353
11.6 Example: Titanic Data Set 359
11.6.1 Exploratory Data Analysis 359
11.6.2 RLR vs. LLR vs. SVM 365
11.6.3 Shrinkage and Selection 367
11.7 Summary 370
11.8 Problems 371
12 Rank-based Neural Networks 377
12.1 Introduction 377
12.2 Set-up for Neural Networks 379
12.3 Implementing Neural Networks 381
12.3.1 Basic Computational Unit 382
12.3.2 Activation Functions 382
12.3.3 Four-layer Neural Network 384
12.4 Gradient Descent with Momentum 386
12.4.1 Gradient Descent 386
12.4.2 Momentum 388
12.5 Back Propagation Example 389
12.5.1 Forward Propagation 390
12.5.2 Back Propagation 392
12.5.3 Dispersion Function Gradients 394
12.5.4 RNN Algorithm 395
12.6 Accuracy Metrics 396
12.7 Example: Circular Data Set 400
12.8 Image Recognition: Cats vs. Dogs 405
12.8.1 Binary Image Classification 406
12.8.2 Image Preparation 406
12.8.3 Over-fitting and Under-fitting 409
12.8.4 Comparison of LNN vs. RNN 410
12.9 Image Recognition: MNIST Data Set 414
12.10 Summary 421
12.11 Problems 421
Bibliography 433
Author Index 443
Subject Index 445
by "Nielsen BookData"