Statistical applications for environmental analysis and risk assessment

Author(s)

    • Ofungwu, Joseph

Bibliographic Information

Statistical applications for environmental analysis and risk assessment

Joseph Ofungwu

(Statistics in practice)

Wiley, c2014

  • : hardback

Note

Includes bibliographical references (p. 609-612) and index

Description and Table of Contents

Description

Statistical Applications for Environmental Analysis and Risk Assessment guides readers through real-world situations and the best statistical methods used to determine the nature and extent of the problem, evaluate the potential human health and ecological risks, and design and implement remedial systems as necessary. Featuring numerous worked examples using actual data and "ready-made" software scripts, Statistical Applications for Environmental Analysis and Risk Assessment also includes:

    • Descriptions of basic statistical concepts and principles in an informal style that does not presume prior familiarity with the subject
    • Detailed illustrations of statistical applications in the environmental and related water resources fields, using real-world data in the contexts that practitioners would typically encounter
    • Software scripts using the high-powered statistical software system R, supplemented by USEPA's ProUCL and USDOE's VSP software packages, all of which are freely available
    • Coverage of frequent data sample issues such as non-detects, outliers, skewness, and sustained and cyclical trends that habitually plague environmental data samples
    • Clear demonstrations of the crucial, but often overlooked, role of statistics in environmental sampling design and subsequent exposure risk assessment

Table of Contents

Preface
Acknowledgments

1 Introduction
  1.1 Introduction and Overview
  1.2 The Aim of the Book: Get Involved!
  1.3 The Approach and Style: Clarity, Clarity, Clarity

Part I Basic Statistical Measures and Concepts

2 Introduction to Software Packages Used in This Book
  2.1 R
    2.1.1 Helpful R Tips
    2.1.2 Disadvantages of R
  2.2 ProUCL
    2.2.1 Helpful ProUCL Tips
    2.2.2 Potential Deficiencies of ProUCL
  2.3 Visual Sample Plan
  2.4 DATAPLOT
    2.4.1 Helpful Tips for Running DATAPLOT in Batch Mode
  2.5 Kendall-Theil Robust Line
  2.6 Minitab®
  2.7 Microsoft Excel

3 Laboratory Detection Limits, Nondetects, and Data Analysis
  3.1 Introduction and Overview
  3.2 Types of Laboratory Data Detection Limits
  3.3 Problems with Nondetects in Statistical Data Samples
  3.4 Options for Addressing Nondetects in Data Analysis
    3.4.1 Kaplan-Meier Estimation
    3.4.2 Robust Regression on Order Statistics
    3.4.3 Maximum Likelihood Estimation

4 Data Sample, Data Population, and Data Distribution
  4.1 Introduction and Overview
  4.2 Data Sample Versus Data Population or Universe
  4.3 The Concept of a Distribution
    4.3.1 The Concept of a Probability Distribution Function
    4.3.2 Cumulative Probability Distribution and Empirical Cumulative Distribution Functions
  4.4 Types of Distributions
    4.4.1 Normal Distribution
      4.4.1.1 Goodness-of-Fit (GOF) Tests for the Normal Distribution
      4.4.1.2 Central Limit Theorem
    4.4.2 Lognormal, Gamma, and Other Continuous Distributions
      4.4.2.1 Gamma Distribution
      4.4.2.2 Logistic Distribution
      4.4.2.3 Other Continuous Distributions
    4.4.3 Distributions Used in Inferential Statistics (Student's t, Chi-Square, F)
      4.4.3.1 Student's t Distribution
      4.4.3.2 Chi-Square Distribution
      4.4.3.3 F Distribution
    4.4.4 Discrete Distributions
      4.4.4.1 Binomial Distribution
      4.4.4.2 Poisson Distribution
  Exercises

5 Graphics for Data Analysis and Presentation
  5.1 Introduction and Overview
  5.2 Graphics for Single Univariate Data Samples
    5.2.1 Box and Whiskers Plot
    5.2.2 Probability Plots (i.e., Quantile-Quantile Plots for Comparing a Data Sample to a Theoretical Distribution)
    5.2.3 Quantile Plots
    5.2.4 Histograms and Kernel Density Plots
  5.3 Graphics for Two or More Univariate Data Samples
    5.3.1 Quantile-Quantile Plots for Comparing Two Univariate Data Samples
    5.3.2 Side-by-Side Box Plots
  5.4 Graphics for Bivariate and Multivariate Data Samples
    5.4.1 Graphical Data Analysis for Bivariate Data Samples
    5.4.2 Graphical Data Analysis for Multivariate Data Samples
  5.5 Graphics for Data Presentation
  5.6 Data Smoothing
    5.6.1 Moving Average and Moving Median Smoothing
    5.6.2 Locally Weighted Scatterplot Smoothing (LOWESS or LOESS)
      5.6.2.1 Smoothness Factor and the Degree of the Local Regression
      5.6.2.2 Basic and Robust LOWESS Weighting Functions
      5.6.2.3 LOESS Scatterplot Smoothing for Data with Multiple Variables
  Exercises

6 Basic Statistical Measures: Descriptive or Summary Statistics
  6.1 Introduction and Overview
  6.2 Arithmetic Mean and Weighted Mean
  6.3 Median and Other Robust Measures of Central Tendency
  6.4 Standard Deviation, Variance, and Other Measures of Dispersion or Spread
    6.4.1 Quantiles (Including Percentiles)
    6.4.2 Robust Measures of Spread: Interquartile Range and Median Absolute Deviation
  6.5 Skewness and Other Measures of Shape
  6.6 Outliers
    6.6.1 Tests for Outliers
  6.7 Data Transformations
  Exercises

Part II Statistical Procedures for Mostly Univariate Data

7 Statistical Intervals: Confidence, Tolerance, and Prediction Intervals
  7.1 Introduction and Overview
  7.2 Confidence Intervals
    7.2.1 Parametric Confidence Intervals
      7.2.1.1 Parametric Confidence Interval around the Arithmetic Mean or Median for Normally Distributed Data
      7.2.1.2 Lognormal and Other Parametric Confidence Intervals
    7.2.2 Nonparametric Confidence Intervals Around the Mean, Median, and Other Percentiles
    7.2.3 Parametric Confidence Band Around a Trend Line
    7.2.4 Nonparametric Confidence Band Around a Trend Line
  7.3 Tolerance Intervals
    7.3.1 Parametric Tolerance Intervals
    7.3.2 Nonparametric Tolerance Intervals
  7.4 Prediction Intervals
    7.4.1 Parametric Prediction Intervals for Future Individual Values and Future Means
    7.4.2 Nonparametric Prediction Intervals for Future Individual Values and Future Medians
  7.5 Control Charts
  Exercises

8 Tests of Hypothesis and Decision Making
  8.1 Introduction and Overview
  8.2 Basic Terminology and Procedures for Tests of Hypothesis
  8.3 Type I and Type II Decision Errors, Statistical Power, and Interrelationships
  8.4 The Problem with Multiple Tests or Comparisons: Site-Wide False Positive Error Rates
  8.5 Tests for Equality of Variance
  Exercises

9 Applications of Hypothesis Tests: Comparing Populations, Analysis of Variance
  9.1 Introduction and Overview
  9.2 Single Sample Tests
    9.2.1 Parametric Single-Sample Tests: One-Sample t-Test and One-Sample Proportion Test
    9.2.2 Nonparametric Single-Sample Tests: One-Sample Sign Test and One-Sample Wilcoxon Signed Rank Test
      9.2.2.1 Nonparametric One-Sample Sign Test
      9.2.2.2 Nonparametric One-Sample Wilcoxon Signed Rank Test
  9.3 Two-Sample Tests
    9.3.1 Parametric Two-Sample Tests
      9.3.1.1 Parametric Two-Sample t-Test for Independent Populations
      9.3.1.2 Parametric Two-Sample t-Test for Paired Populations
    9.3.2 Nonparametric Two-Sample Tests
      9.3.2.1 Nonparametric Wilcoxon Rank Sum Test for Two Independent Populations
      9.3.2.2 Nonparametric Gehan Test for Two Independent Populations
      9.3.2.3 Nonparametric Quantile Test for Two Independent Populations
      9.3.2.4 Nonparametric Two-Sample Paired Sign Test and Paired Wilcoxon Signed Rank Test
  9.4 Comparing Three or More Populations: Parametric ANOVA and Nonparametric Kruskal-Wallis Tests
    9.4.1 Parametric One-Way ANOVA
      9.4.1.1 Computation of Parametric One-Way ANOVA
    9.4.2 Nonparametric One-Way ANOVA (Kruskal-Wallis Test)
    9.4.3 Follow-Up or Post Hoc Comparisons After Parametric and Nonparametric One-Way ANOVA
    9.4.4 Parametric and Nonparametric Two-Way and Multifactor ANOVA
  Exercises

10 Trends, Autocorrelation, and Temporal Dependence
  10.1 Introduction and Overview
  10.2 Tests for Autocorrelation and Temporal Effects
    10.2.1 Test for Autocorrelation Using the Sample Autocorrelation Function
    10.2.2 Test for Autocorrelation Using the Rank Von Neumann Ratio Method
    10.2.3 An Example on Site-Wide Temporal Effects
  10.3 Tests for Trend
    10.3.1 Parametric Test for Trends: Simple Linear Regression
    10.3.2 Nonparametric Test for Trends: Mann-Kendall Test and Seasonal Mann-Kendall Test
    10.3.3 Nonparametric Test for Trends: Theil-Sen Trend Test
  10.4 Correcting Seasonality and Temporal Effects in the Data
    10.4.1 Correcting Seasonality for a Single Data Series
    10.4.2 Simultaneously Correcting Temporal Dependence for Multiple Data Sets
  10.5 Effects of Exogenous Variables on Trend Tests
  Exercises

Part III Statistical Procedures for Mostly Multivariate Data

11 Correlation, Covariance, Geostatistics
  11.1 Introduction and Overview
  11.2 Correlation and Covariance
    11.2.1 Pearson's Correlation Coefficient
    11.2.2 Spearman's and Kendall's Correlation Coefficients
  11.3 Introduction to Geostatistics
    11.3.1 The Variogram or Covariogram
    11.3.2 Kriging
    11.3.3 A Note on Data Sample Size and Lag Distance Requirements
  Exercises

12 Simple Linear Regression
  12.1 Introduction and Overview
  12.2 The Simple Linear Regression Model
    12.2.1 The True or Population X-Y Relationship
    12.2.2 The Estimated X-Y Relationship Based on a Data Sample
  12.3 Basic Applications of Simple Linear Regression
    12.3.1 Description and Graphical Review of the Data Sample for Regression
      12.3.1.1 Computing the Regression
      12.3.1.2 Interpreting the Regression Results
  12.4 Verify Compliance with the Assumptions of Conventional Linear Regression
    12.4.1 Assumptions of Linearity and Homoscedasticity
    12.4.2 Assumption of Independence
    12.4.3 Exogeneity Assumption, Normality of the Y Errors, and Absence of Outliers
  12.5 Check the Regression Diagnostics for the Presence of Influential Data Points
  12.6 Confidence Intervals for the Predicted Y Values
  12.7 Regression for Left-Censored Data (Non-detects)
  Exercises

13 Data Transformation Versus Generalized Linear Model
  13.1 Introduction and Overview
  13.2 Data Transformation
    13.2.1 General Approach for Data Transformations
    13.2.2 The Ladder of Powers
    13.2.3 The Bulging Rule and Data Transformations for Regression Analysis
    13.2.4 Facilitating Data Transformations Using Box-Cox Methods
    13.2.5 Back-Transformation Bias and Other Issues with Data Transformation
      13.2.5.1 Logarithmic Transformations
      13.2.5.2 Other Transformations
    13.2.6 Transformation Bias Correction
  13.3 The Generalized Linear Model (GLM) and Applications for Regression
    13.3.1 Components of the Generalized Linear Model and Inherent Limitations
    13.3.2 Estimation and Hypothesis Tests of Significance for GLM Parameters
    13.3.3 Deviance, Null Deviance, Residual Deviance, and Goodness of Fit
    13.3.4 Diagnostics for GLM
    13.3.5 Procedural Steps for Regression with GLM in R
  13.4 Extension of Data Transformation and Generalized Linear Model to Multiple Regression
    13.4.1 Data Transformation for Multiple Regression
    13.4.2 Generalized Linear Models for Multiple Regression
  Exercises

14 Robust Regression
  14.1 Introduction and Overview
  14.2 Kendall-Theil Robust Line
    14.2.1 Computation of the Kendall-Theil Robust Line Regression
    14.2.2 Test of Significance for the Kendall-Theil Robust Line
    14.2.3 Bias Correction for Y Predictions by the Kendall-Theil Robust Line
  14.3 Weighted Least Squares Regression
    14.3.1 Procedure for Weighted Least Squares Regression for Known Variances of the Observations
  14.4 Iteratively Reweighted Least Squares Regression
    14.4.1 The Iteratively Reweighted Least Squares Procedure
  14.5 Other Robust Regression Alternatives: Bounded Influence Methods
    14.5.1 Least Absolute Deviation or Least Absolute Values
    14.5.2 Quantile Regression
    14.5.3 Least Median of Squares
    14.5.4 Least Trimmed Squares
  14.6 Robust Regression Methods for Multiple-Variable Data
  Exercises

15 Multiple Linear Regression
  15.1 Introduction and Overview
  15.2 The Need for Multiple Regression
  15.3 The Multiple Linear Regression (MLR) Model
  15.4 The Estimated Multivariable X-Y Relationship Based on a Data Sample
  15.5 Assumptions of Multiple Linear Regression
    15.5.1 Linearity of the Relationship Between the Dependent and Explanatory Variables
    15.5.2 Absence of Multicollinearity Among the Explanatory Variables
      15.5.2.1 Potential Remedies for Multicollinearity
    15.5.3 Homoscedasticity or Constancy of Variance of the Y Population Errors
    15.5.4 Statistical Independence of the Y Population Errors
    15.5.5 Exogeneity Assumption, Normality of the Y Errors, and Absence of Outliers
    15.5.6 Absence of Variability or Errors in the Explanatory Variables
  15.6 Hypothesis Tests for Reliability of the MLR Model
    15.6.1 ANOVA F Test for Overall Significance of the Regression
      15.6.1.1 A Note on ANOVA Tables
    15.6.2 Partial t and Partial F Tests for Individual Regression Coefficients
    15.6.3 Complete and Reduced Models
  15.7 Confidence Intervals for the Regression Coefficients and Predicted Y Values
  15.8 Coefficient of Multiple Correlation (R), Multiple Determination (R²), Adjusted R², and Partial Correlation Coefficients
    15.8.1 Coefficient of Multiple Correlation (R)
    15.8.2 Coefficient of Multiple Determination (R²) and Adjusted R²
    15.8.3 Partial Correlations and Squared Partial Correlations
  15.9 Regression Diagnostics
  15.10 Model Interactions and Multiplicative Effects
    15.10.1 The Multiple Linear Regression Interaction Model
    15.10.2 Hypothesis Tests of the Interaction Terms for Significance
  Exercises

16 Categorical Data Analysis
  16.1 Introduction and Overview
  16.2 Types of Variables and Associated Data
    16.2.1 Quantitative Variables
    16.2.2 Qualitative Variables
  16.3 One-Way Analysis of Variance Regression Model
    16.3.1 Interpretation of the Regression Results and ANOVA F-Test for Overall Significance of the Regression Model
  16.4 Two-Way Analysis of Variance Regression Model with No Interactions
  16.5 Two-Way Analysis of Variance Regression Model with Interactions
  16.6 Analysis of Covariance Regression Model
  Exercises

17 Model Building: Stepwise Regression and Best Subsets Regression
  17.1 Introduction and Overview
  17.2 Consequences of Inappropriate Variable Selection
  17.3 Stepwise Regression Procedures
    17.3.1 Advantages and Disadvantages of Stepwise Procedures
  17.4 Subsets Regression
  Exercises

18 Nonlinear Regression
  18.1 Introduction and Overview
  18.2 The Nonlinear Regression Model
  18.3 Assumptions of Nonlinear Least Squares Regression
  Exercises

Part IV Statistics in Environmental Sampling Design and Risk Assessment

19 Data Quality Objectives and Environmental Sampling Design
  19.1 Introduction and Overview
  19.2 Sampling Design
  19.3 Sampling Plans
    19.3.1 Simple Random Sampling
    19.3.2 Systematic Sampling
    19.3.3 Other Sampling Designs
  19.4 Sample Size Determination
    19.4.1 Types I and II Decision Errors
    19.4.2 Variance and Gray Region
    19.4.3 Width of the Gray Region
    19.4.4 Computation of the Recommended Minimum Sample Size for Estimating the Population Mean or Median
      19.4.4.1 Minimum Sample Size for Computing UCL95 on the Mean for Normally Distributed Data
      19.4.4.2 Minimum Sample Size for Computing UCL95 on the Median for Nonnormally Distributed Data
    19.4.5 Computation of the Recommended Minimum Sample Size for Comparing a Population Mean or Median with a Fixed Threshold Value
    19.4.6 Computation of the Recommended Minimum Sample Size for Comparing the Population Means or Medians for Two Populations
  Exercises

20 Determination of Background and Applications in Risk Assessment
  20.1 Introduction and Overview
  20.2 When Background Sampling Is Required and When It Is Not
  20.3 Background Sampling Plans
  20.4 Graphical and Quantitative Data Analysis for Site Versus Background Data Comparisons
  20.5 Determination of Exposure Point Concentration and Contaminants of Potential Concern
  Exercises

21 Statistics in Conventional and Probabilistic Risk Assessment
  21.1 Introduction and Overview
  21.2 Conventional or Point Risk Estimation
  21.3 Probabilistic Risk Assessment Using Monte Carlo Simulation
  Exercises

Appendix A: Software Scripts
Appendix B: Datasets
References
Answers for Exercises
Index

by "Nielsen BookData"
