R for everyone : advanced analytics and graphics

Bibliographic Information


Jared P. Lander

(Addison-Wesley data & analytics series)

Addison-Wesley, c2017

2nd ed.

University library holdings: 14


Notes

Includes bibliographical references and indexes

Description and Table of Contents

Description

Using the open source R language, you can build powerful statistical models to answer many of your most challenging questions. R has traditionally been difficult for non-statisticians to learn, and most R books assume far too much knowledge to be of help. R for Everyone is the solution. Drawing on his unsurpassed experience teaching new users, professional data scientist Jared P. Lander has written the perfect tutorial for anyone new to statistical programming and modeling. Organized to make learning easy and intuitive, this guide focuses on the 20 percent of R functionality you'll need to accomplish 80 percent of modern data tasks. Lander's self-contained chapters start with the absolute basics, offering extensive hands-on practice and sample code. You'll download and install R; navigate and use the R environment; master basic program control, data import, and manipulation; and walk through several essential tests. Then, building on this foundation, you'll construct several complete models, both linear and nonlinear, and use some data mining techniques. By the time you're done, you won't just know how to write R programs, you'll be ready to tackle the statistical problems you care about most. 
Coverage includes:
- Exploring R, RStudio, and R packages
- Using R for math: variable types, vectors, calling functions, and more
- Exploiting data structures, including data.frames, matrices, and lists
- Creating attractive, intuitive statistical graphics
- Writing user-defined functions
- Controlling program flow with if, ifelse, and complex checks
- Improving program efficiency with group manipulations
- Combining and reshaping multiple datasets
- Manipulating strings using R's facilities and regular expressions
- Creating normal, binomial, and Poisson probability distributions
- Programming basic statistics: mean, standard deviation, and t-tests
- Building linear, generalized linear, and nonlinear models
- Assessing the quality of models and variable selection
- Preventing overfitting, using the Elastic Net and Bayesian methods
- Analyzing univariate and multivariate time series data
- Grouping data via K-means and hierarchical clustering
- Preparing reports, slideshows, and web pages with knitr
- Building reusable R packages with devtools and Rcpp
- Getting involved with the R global community

Table of Contents

Foreword (p. xv)
Preface (p. xvii)
Acknowledgments (p. xxi)
About the Author (p. xxv)
Chapter 1: Getting R (p. 1)
  1.1 Downloading R (p. 1)
  1.2 R Version (p. 2)
  1.3 32-bit vs. 64-bit (p. 2)
  1.4 Installing (p. 2)
  1.5 Microsoft R Open (p. 14)
  1.6 Conclusion (p. 14)
Chapter 2: The R Environment (p. 15)
  2.1 Command Line Interface (p. 16)
  2.2 RStudio (p. 17)
  2.3 Microsoft Visual Studio (p. 31)
  2.4 Conclusion (p. 31)
Chapter 3: R Packages (p. 33)
  3.1 Installing Packages (p. 33)
  3.2 Loading Packages (p. 36)
  3.3 Building a Package (p. 37)
  3.4 Conclusion (p. 37)
Chapter 4: Basics of R (p. 39)
  4.1 Basic Math (p. 39)
  4.2 Variables (p. 40)
  4.3 Data Types (p. 42)
  4.4 Vectors (p. 47)
  4.5 Calling Functions (p. 52)
  4.6 Function Documentation (p. 52)
  4.7 Missing Data (p. 53)
  4.8 Pipes (p. 54)
  4.9 Conclusion (p. 55)
Chapter 5: Advanced Data Structures (p. 57)
  5.1 data.frames (p. 57)
  5.2 Lists (p. 64)
  5.3 Matrices (p. 70)
  5.4 Arrays (p. 73)
  5.5 Conclusion (p. 74)
Chapter 6: Reading Data into R (p. 75)
  6.1 Reading CSVs (p. 75)
  6.2 Excel Data (p. 79)
  6.3 Reading from Databases (p. 81)
  6.4 Data from Other Statistical Tools (p. 84)
  6.5 R Binary Files (p. 85)
  6.6 Data Included with R (p. 87)
  6.7 Extract Data from Web Sites (p. 88)
  6.8 Reading JSON Data (p. 90)
  6.9 Conclusion (p. 92)
Chapter 7: Statistical Graphics (p. 93)
  7.1 Base Graphics (p. 93)
  7.2 ggplot2 (p. 96)
  7.3 Conclusion (p. 110)
Chapter 8: Writing R functions (p. 111)
  8.1 Hello, World! (p. 111)
  8.2 Function Arguments (p. 112)
  8.3 Return Values (p. 114)
  8.4 do.call (p. 115)
  8.5 Conclusion (p. 116)
Chapter 9: Control Statements (p. 117)
  9.1 if and else (p. 117)
  9.2 switch (p. 120)
  9.3 ifelse (p. 121)
  9.4 Compound Tests (p. 123)
  9.5 Conclusion (p. 123)
Chapter 10: Loops, the Un-R Way to Iterate (p. 125)
  10.1 for Loops (p. 125)
  10.2 while Loops (p. 127)
  10.3 Controlling Loops (p. 127)
  10.4 Conclusion (p. 128)
Chapter 11: Group Manipulation (p. 129)
  11.1 Apply Family (p. 129)
  11.2 aggregate (p. 132)
  11.3 plyr (p. 136)
  11.4 data.table (p. 140)
  11.5 Conclusion (p. 150)
Chapter 12: Faster Group Manipulation with dplyr (p. 151)
  12.1 Pipes (p. 151)
  12.2 tbl (p. 152)
  12.3 select (p. 153)
  12.4 filter (p. 161)
  12.5 slice (p. 167)
  12.6 mutate (p. 168)
  12.7 summarize (p. 171)
  12.8 group_by (p. 172)
  12.9 arrange (p. 173)
  12.10 do (p. 174)
  12.11 dplyr with Databases (p. 176)
  12.12 Conclusion (p. 178)
Chapter 13: Iterating with purrr (p. 179)
  13.1 map (p. 179)
  13.2 map with Specified Types (p. 181)
  13.3 Iterating over a data.frame (p. 186)
  13.4 map with Multiple Inputs (p. 187)
  13.5 Conclusion (p. 188)
Chapter 14: Data Reshaping (p. 189)
  14.1 cbind and rbind (p. 189)
  14.2 Joins (p. 190)
  14.3 reshape2 (p. 197)
  14.4 Conclusion (p. 200)
Chapter 15: Reshaping Data in the Tidyverse (p. 201)
  15.1 Binding Rows and Columns (p. 201)
  15.2 Joins with dplyr (p. 202)
  15.3 Converting Data Formats (p. 207)
  15.4 Conclusion (p. 210)
Chapter 16: Manipulating Strings (p. 211)
  16.1 paste (p. 211)
  16.2 sprintf (p. 212)
  16.3 Extracting Text (p. 213)
  16.4 Regular Expressions (p. 217)
  16.5 Conclusion (p. 224)
Chapter 17: Probability Distributions (p. 225)
  17.1 Normal Distribution (p. 225)
  17.2 Binomial Distribution (p. 230)
  17.3 Poisson Distribution (p. 235)
  17.4 Other Distributions (p. 238)
  17.5 Conclusion (p. 240)
Chapter 18: Basic Statistics (p. 241)
  18.1 Summary Statistics (p. 241)
  18.2 Correlation and Covariance (p. 244)
  18.3 T-Tests (p. 252)
  18.4 ANOVA (p. 260)
  18.5 Conclusion (p. 263)
Chapter 19: Linear Models (p. 265)
  19.1 Simple Linear Regression (p. 265)
  19.2 Multiple Regression (p. 270)
  19.3 Conclusion (p. 287)
Chapter 20: Generalized Linear Models (p. 289)
  20.1 Logistic Regression (p. 289)
  20.2 Poisson Regression (p. 293)
  20.3 Other Generalized Linear Models (p. 297)
  20.4 Survival Analysis (p. 297)
  20.5 Conclusion (p. 302)
Chapter 21: Model Diagnostics (p. 303)
  21.1 Residuals (p. 303)
  21.2 Comparing Models (p. 309)
  21.3 Cross-Validation (p. 313)
  21.4 Bootstrap (p. 318)
  21.5 Stepwise Variable Selection (p. 321)
  21.6 Conclusion (p. 324)
Chapter 22: Regularization and Shrinkage (p. 325)
  22.1 Elastic Net (p. 325)
  22.2 Bayesian Shrinkage (p. 342)
  22.3 Conclusion (p. 346)
Chapter 23: Nonlinear Models (p. 347)
  23.1 Nonlinear Least Squares (p. 347)
  23.2 Splines (p. 350)
  23.3 Generalized Additive Models (p. 353)
  23.4 Decision Trees (p. 359)
  23.5 Boosted Trees (p. 361)
  23.6 Random Forests (p. 364)
  23.7 Conclusion (p. 366)
Chapter 24: Time Series and Autocorrelation (p. 367)
  24.1 Autoregressive Moving Average (p. 367)
  24.2 VAR (p. 374)
  24.3 GARCH (p. 379)
  24.4 Conclusion (p. 388)
Chapter 25: Clustering (p. 389)
  25.1 K-means (p. 389)
  25.2 PAM (p. 397)
  25.3 Hierarchical Clustering (p. 403)
  25.4 Conclusion (p. 407)
Chapter 26: Model Fitting with Caret (p. 409)
  26.1 Caret Basics (p. 409)
  26.2 Caret Options (p. 409)
  26.3 Tuning a Boosted Tree (p. 411)
  26.4 Conclusion (p. 415)
Chapter 27: Reproducibility and Reports with knitr (p. 417)
  27.1 Installing a LaTeX Program (p. 417)
  27.2 LaTeX Primer (p. 418)
  27.3 Using knitr with LaTeX (p. 420)
  27.4 Conclusion (p. 426)
Chapter 28: Rich Documents with RMarkdown (p. 427)
  28.1 Document Compilation (p. 427)
  28.2 Document Header (p. 427)
  28.3 Markdown Primer (p. 429)
  28.4 Markdown Code Chunks (p. 430)
  28.5 htmlwidgets (p. 432)
  28.6 RMarkdown Slideshows (p. 444)
  28.7 Conclusion (p. 446)
Chapter 29: Interactive Dashboards with Shiny (p. 447)
  29.1 Shiny in RMarkdown (p. 447)
  29.2 Reactive Expressions in Shiny (p. 452)
  29.3 Server and UI (p. 454)
  29.4 Conclusion (p. 463)
Chapter 30: Building R Packages (p. 465)
  30.1 Folder Structure (p. 465)
  30.2 Package Files (p. 465)
  30.3 Package Documentation (p. 472)
  30.4 Tests (p. 475)
  30.5 Checking, Building and Installing (p. 477)
  30.6 Submitting to CRAN (p. 479)
  30.7 C++ Code (p. 479)
  30.8 Conclusion (p. 484)
Appendix A: Real-Life Resources (p. 485)
  A.1 Meetups (p. 485)
  A.2 Stack Overflow (p. 486)
  A.3 Twitter (p. 487)
  A.4 Conferences (p. 487)
  A.5 Web Sites (p. 488)
  A.6 Documents (p. 488)
  A.7 Books (p. 488)
  A.8 Conclusion (p. 489)
Appendix B: Glossary (p. 491)
List of Figures (p. 507)
List of Tables (p. 513)
General Index (p. 515)
Index of Functions (p. 521)
Index of Packages (p. 527)
Index of People (p. 529)
Data Index (p. 531)

Source: Nielsen BookData


