Bibliographic Information

R for everyone : advanced analytics and graphics

Jared P. Lander

(Addison Wesley data & analytics series)

Addison-Wesley, c2014

Available at 11 libraries

Note

Includes bibliographical references and indexes

Description and Table of Contents

Description

Statistical Computation for Programmers, Scientists, Quants, Excel Users, and Other Professionals

Using the open source R language, you can build powerful statistical models to answer many of your most challenging questions. R has traditionally been difficult for non-statisticians to learn, and most R books assume far too much knowledge to be of help. R for Everyone is the solution. Drawing on his unsurpassed experience teaching new users, professional data scientist Jared P. Lander has written the perfect tutorial for anyone new to statistical programming and modeling.

Organized to make learning easy and intuitive, this guide focuses on the 20 percent of R functionality you'll need to accomplish 80 percent of modern data tasks. Lander's self-contained chapters start with the absolute basics, offering extensive hands-on practice and sample code. You'll download and install R; navigate and use the R environment; master basic program control, data import, and manipulation; and walk through several essential tests. Then, building on this foundation, you'll construct several complete models, both linear and nonlinear, and use some data mining techniques. By the time you're done, you won't just know how to write R programs; you'll be ready to tackle the statistical problems you care about most.

COVERAGE INCLUDES

* Exploring R, RStudio, and R packages
* Using R for math: variable types, vectors, calling functions, and more
* Exploiting data structures, including data.frames, matrices, and lists
* Creating attractive, intuitive statistical graphics
* Writing user-defined functions
* Controlling program flow with if, ifelse, and complex checks
* Improving program efficiency with group manipulations
* Combining and reshaping multiple datasets
* Manipulating strings using R's facilities and regular expressions
* Creating normal, binomial, and Poisson probability distributions
* Programming basic statistics: mean, standard deviation, and t-tests
* Building linear, generalized linear, and nonlinear models
* Assessing the quality of models and variable selection
* Preventing overfitting, using the Elastic Net and Bayesian methods
* Analyzing univariate and multivariate time series data
* Grouping data via K-means and hierarchical clustering
* Preparing reports, slideshows, and web pages with knitr
* Building reusable R packages with devtools and Rcpp
* Getting involved with the R global community

Table of Contents

Foreword
Preface
Acknowledgments
About the Author

Chapter 1: Getting R
  1.1 Downloading R
  1.2 R Version
  1.3 32-bit vs. 64-bit
  1.4 Installing
  1.5 Revolution R Community Edition
  1.6 Conclusion

Chapter 2: The R Environment
  2.1 Command Line Interface
  2.2 RStudio
  2.3 Revolution Analytics RPE
  2.4 Conclusion

Chapter 3: R Packages
  3.1 Installing Packages
  3.2 Loading Packages
  3.3 Building a Package
  3.4 Conclusion

Chapter 4: Basics of R
  4.1 Basic Math
  4.2 Variables
  4.3 Data Types
  4.4 Vectors
  4.5 Calling Functions
  4.6 Function Documentation
  4.7 Missing Data
  4.8 Conclusion

Chapter 5: Advanced Data Structures
  5.1 data.frames
  5.2 Lists
  5.3 Matrices
  5.4 Arrays
  5.5 Conclusion

Chapter 6: Reading Data into R
  6.1 Reading CSVs
  6.2 Excel Data
  6.3 Reading from Databases
  6.4 Data from Other Statistical Tools
  6.5 R Binary Files
  6.6 Data Included with R
  6.7 Extract Data from Web Sites
  6.8 Conclusion

Chapter 7: Statistical Graphics
  7.1 Base Graphics
  7.2 ggplot2
  7.3 Conclusion

Chapter 8: Writing R Functions
  8.1 Hello, World!
  8.2 Function Arguments
  8.3 Return Values
  8.4 do.call
  8.5 Conclusion

Chapter 9: Control Statements
  9.1 if and else
  9.2 switch
  9.3 ifelse
  9.4 Compound Tests
  9.5 Conclusion

Chapter 10: Loops, the Un-R Way to Iterate
  10.1 for Loops
  10.2 while Loops
  10.3 Controlling Loops
  10.4 Conclusion

Chapter 11: Group Manipulation
  11.1 Apply Family
  11.2 aggregate
  11.3 plyr
  11.4 data.table
  11.5 Conclusion

Chapter 12: Data Reshaping
  12.1 cbind and rbind
  12.2 Joins
  12.3 reshape2
  12.4 Conclusion

Chapter 13: Manipulating Strings
  13.1 paste
  13.2 sprintf
  13.3 Extracting Text
  13.4 Regular Expressions
  13.5 Conclusion

Chapter 14: Probability Distributions
  14.1 Normal Distribution
  14.2 Binomial Distribution
  14.3 Poisson Distribution
  14.4 Other Distributions
  14.5 Conclusion

Chapter 15: Basic Statistics
  15.1 Summary Statistics
  15.2 Correlation and Covariance
  15.3 T-Tests
  15.4 ANOVA
  15.5 Conclusion

Chapter 16: Linear Models
  16.1 Simple Linear Regression
  16.2 Multiple Regression
  16.3 Conclusion

Chapter 17: Generalized Linear Models
  17.1 Logistic Regression
  17.2 Poisson Regression
  17.3 Other Generalized Linear Models
  17.4 Survival Analysis
  17.5 Conclusion

Chapter 18: Model Diagnostics
  18.1 Residuals
  18.2 Comparing Models
  18.3 Cross-Validation
  18.4 Bootstrap
  18.5 Stepwise Variable Selection
  18.6 Conclusion

Chapter 19: Regularization and Shrinkage
  19.1 Elastic Net
  19.2 Bayesian Shrinkage
  19.3 Conclusion

Chapter 20: Nonlinear Models
  20.1 Nonlinear Least Squares
  20.2 Splines
  20.3 Generalized Additive Models
  20.4 Decision Trees
  20.5 Random Forests
  20.6 Conclusion

Chapter 21: Time Series and Autocorrelation
  21.1 Autoregressive Moving Average
  21.2 VAR
  21.3 GARCH
  21.4 Conclusion

Chapter 22: Clustering
  22.1 K-means
  22.2 PAM
  22.3 Hierarchical Clustering
  22.4 Conclusion

Chapter 23: Reproducibility, Reports and Slide Shows with knitr
  23.1 Installing a LaTeX Program
  23.2 LaTeX Primer
  23.3 Using knitr with LaTeX
  23.4 Markdown Tips
  23.5 Using knitr and Markdown
  23.6 pandoc
  23.7 Conclusion

Chapter 24: Building R Packages
  24.1 Folder Structure
  24.2 Package Files
  24.3 Package Documentation
  24.4 Checking, Building and Installing
  24.5 Submitting to CRAN
  24.6 C++ Code
  24.7 Conclusion

Appendix A: Real-Life Resources
  A.1 Meetups
  A.2 Stackoverflow
  A.3 Twitter
  A.4 Conferences
  A.5 Web Sites
  A.6 Documents
  A.7 Books
  A.8 Conclusion

Appendix B: Glossary

List of Figures
List of Tables
General Index
Index of Functions
Index of Packages
Index of People
Data Index

by "Nielsen BookData"
