Test equating, scaling, and linking : methods and practices



Michael J. Kolen ; Robert L. Brennan

(Statistics for social science and public policy)

Springer, c2010

2nd ed

  • : pbk




Originally published in c2004

Bibliography: p. [477]-509

Includes index



This volume introduces test equating, discussing both the most frequently used equating methodologies and many of the practical issues involved. It expands on the coverage of the first edition with a new chapter on test scaling and a second new chapter on test linking.


Contents: Preface; Notation

1 Introduction and Concepts 1.1 Equating and Related Concepts 1.1.1 Test Forms and Test Specifications 1.1.2 Equating 1.1.3 Processes That Are Related to Equating 1.1.4 Equating and Score Scales 1.1.5 Equating and the Test Score Decline of the 1960s and 1970s 1.2 Equating and Scaling in Practice: A Brief Overview of This Book 1.3 Properties of Equating 1.3.1 Symmetry Property 1.3.2 Same Specifications Property 1.3.3 Equity Properties 1.3.4 Observed Score Equating Properties 1.3.5 Group Invariance Property 1.4 Equating Designs 1.4.1 Random Groups Design 1.4.2 Single Group Design 1.4.3 Single Group Design with Counterbalancing 1.4.4 ASVAB Problems with a Single Group Design 1.4.5 Common-Item Nonequivalent Groups Design 1.4.6 NAEP Reading Anomaly: Problems with Common Items 1.5 Error in Estimating Equating Relationships 1.6 Evaluating the Results of Equating 1.7 Testing Situations Considered 1.8 Preview 1.9 Exercises

2 Observed Score Equating Using the Random Groups Design 2.1 Mean Equating 2.2 Linear Equating 2.3 Properties of Mean and Linear Equating 2.4 Comparison of Mean and Linear Equating 2.5 Equipercentile Equating 2.5.1 Graphical Procedures 2.5.2 Analytic Procedures 2.5.3 Properties of Equated Scores in Equipercentile Equating 2.6 Estimating Observed Score Equating Relationships 2.7 Scale Scores 2.7.1 Linear Conversions 2.7.2 Truncation of Linear Conversions 2.7.3 Nonlinear Conversions 2.8 Equating Using Single Group Designs 2.9 Equating Using Alternate Scoring Schemes 2.10 Preview of What Follows 2.11 Exercises

3 Random Groups: Smoothing in Equipercentile Equating 3.1 A Conceptual Statistical Framework for Smoothing 3.2 Properties of Smoothing Methods 3.3 Presmoothing Methods 3.3.1 Polynomial Log-linear Method 3.3.2 Strong True Score Method 3.3.3 Illustrative Example 3.4 Postsmoothing Methods 3.4.1 Illustrative Example 3.5 Practical Issues in Equipercentile Equating 3.5.1 Summary of Smoothing Strategies 3.5.2 Equating Error and Sample Size 3.6 Exercises

4 Nonequivalent Groups: Linear Methods 4.1 Tucker Method 4.1.1 Linear Regression Assumptions 4.1.2 Conditional Variance Assumptions 4.1.3 Intermediate Results 4.1.4 Final Results 4.1.5 Special Cases 4.2 Levine Observed Score Method 4.2.1 Correlational Assumptions 4.2.2 Linear Regression Assumptions 4.2.3 Error Variance Assumptions 4.2.4 Intermediate Results 4.2.5 General Results 4.2.6 Classical Congeneric Model Results 4.3 Levine True Score Method 4.3.1 Results 4.3.2 First-Order Equity 4.4 Illustrative Example and Other Topics 4.4.1 Illustrative Example 4.4.2 Synthetic Population Weights 4.4.3 Mean Equating 4.4.4 Decomposing Observed Differences in Means and Variances 4.4.5 Relationships Among Tucker and Levine Equating Methods 4.4.6 Scale Scores 4.5 Appendix: Proof that $\sigma_s^2(T_X) = \gamma_1^2\,\sigma_s^2(T_V)$ Under the Classical Congeneric Model 4.6 Exercises

5 Nonequivalent Groups: Equipercentile Methods 5.1 Frequency Estimation Equipercentile Equating 5.1.1 Conditional Distributions 5.1.2 Frequency Estimation Method 5.1.3 Evaluating the Frequency Estimation Assumption 5.1.4 Numerical Example 5.1.5 Estimating the Distributions 5.2 Braun-Holland Linear Method 5.3 Chained Equipercentile Equating 5.4 Illustrative Example 5.4.1 Illustrative Results 5.4.2 Comparison Among Methods 5.4.3 Practical Issues in Equipercentile Equating with Common Items 5.5 Exercises

6 Item Response Theory Methods 6.1 Some Necessary IRT Concepts 6.1.1 Unidimensionality and Local Independence Assumptions 6.1.2 IRT Models 6.1.3 IRT Parameter Estimation 6.2 Transformations of IRT Scales 6.2.1 Transformation Equations 6.2.2 Demonstrating the Appropriateness of Scale Transformations 6.2.3 Expressing A and B Constants 6.2.4 Expressing A and B Constants in Terms of Groups of Items and/or Persons 6.3 Transforming IRT Scales When Parameters Are Estimated 6.3.1 Designs 6.3.2 Mean/Sigma and Mean/Mean Transformation Methods 6.3.3 Characteristic Curve Transformation Methods 6.3.4 Comparisons Among Scale Transformation Methods 6.4 Equating and Scaling 6.5 Equating True Scores 6.5.1 Test Characteristic Curves 6.5.2 True Score Equating Process 6.5.3 The Newton-Raphson Method 6.5.4 Using True Score Equating with Observed Scores 6.6 Equating Observed Scores 6.7 IRT True Score Versus IRT Observed Score Equating 6.8 Illustrative Example 6.8.1 Item Parameter Estimation and Scaling 6.8.2 IRT True Score Equating 6.8.3 IRT Observed Score Equating 6.8.4 Rasch Equating 6.9 Using IRT Calibrated Item Pools 6.9.1 Common-Item Equating to a Calibrated Pool 6.9.2 Item Preequating 6.9.3 Robustness to Violations of IRT Assumptions 6.10 Equating with Polytomous IRT 6.10.1 Polytomous IRT Models for Ordered Responses 6.10.2 Scoring Function, Item Response Function, and Test Characteristic Curve 6.10.3 Parameter Estimation and Scale Transformation with Polytomous IRT Models 6.10.4 True Score Equating 6.10.5 Observed Score Equating 6.10.6 Example Using the Graded Response Model 6.11 Practical Issues and Caveat 6.12 Exercises

7 Standard Errors of Equating 7.1 Definition of Standard Error of Equating 7.2 The Bootstrap 7.2.1 Standard Errors Using the Bootstrap 7.2.2 Standard Errors of Equating 7.2.3 Parametric Bootstrap 7.2.4 Standard Errors of Smoothed Equipercentile Equating 7.2.5 Standard Errors of Scale Scores 7.2.6 Standard Errors of Equating Chains 7.2.7 Mean Standard Error of Equating 7.2.8 Caveat 7.3 The Delta Method 7.3.1 Mean Equating Using Single Group and Random Groups Designs 7.3.2 Linear Equating Using the Random Groups Design 7.3.3 Equipercentile Equating Using the Random Groups Design 7.3.4 Standard Errors for Other Designs 7.3.5 Approximations 7.3.6 Standard Errors for Scale Scores 7.3.7 Standard Errors of Equating Chains 7.3.8 Using Delta Method Standard Errors 7.4 Using Standard Errors in Practice 7.5 Exercises

8 Practical Issues in Equating 8.1 Equating and the Test Development Process 8.1.1 Test Specifications 8.1.2 Characteristics of Common-Item Sets 8.1.3 Changes in Test Specifications 8.2 Data Collection: Design and Implementation 8.2.1 Choosing Among Equating Designs 8.2.2 Developing Equating Linkage Plans 8.2.3 Examinee Groups Used in Equating 8.2.4 Sample Size Requirements 8.3 Choosing From Among the Statistical Procedures 8.3.1 Equating Criteria in Research Studies 8.3.2 Characteristics of Equating Situations 8.4 Choosing From Among Equating Results 8.4.1 Equating Versus Not Equating 8.4.2 Use of Robustness Checks 8.4.3 Choosing From Among Results in the Random Groups Design 8.4.4 Choosing From Among Results in the Common-Item Nonequivalent Groups Design 8.4.5 Use of Consistency Checks 8.4.6 Equating and Score Scales 8.4.7 Assessing First- and Second-Order Equity for Scale Scores 8.5 Importance of Standardization Conditions and Quality Control 8.5.1 Test Development 8.5.2 Test Administration and Standardization Conditions 8.5.3 Quality Control 8.5.4 Reequating 8.6 Conditions Conducive to Satisfactory Equating 8.7 Comparability Issues in Special Circumstances 8.7.1 Comparability Issues with Computer-Based Tests 8.7.2 Comparability of Performance Assessments 8.7.3 Score Comparability with Optional Test Sections 8.8 Conclusion 8.9 Exercises

9 Score Scales 9.1 Scaling Perspectives 9.2 Score Transformations 9.3 Incorporating Normative Information 9.3.1 Linear Transformations 9.3.2 Nonlinear Transformations 9.3.3 Example: Normalized Scale Scores 9.3.4 Importance of Norm Group in Setting the Score Scale 9.4 Incorporating Score Precision Information 9.4.1 Rules of Thumb for Number of Distinct Score Points 9.4.2 Linearly Transformed Score Scales with a Given Standard Error of Measurement 9.4.3 Score Scales with Approximately Equal Conditional Standard Errors of Measurement 9.4.4 Example: Incorporating Score Precision 9.4.5 Evaluating Psychometric Properties of Scale Scores 9.4.6 The IRT θ-Scale as a Score Scale 9.5 Incorporating Content Information 9.5.1 Item Mapping 9.5.2 Scale Anchoring 9.5.3 Standard Setting 9.5.4 Numerical Example 9.5.5 Practical Usefulness 9.6 Maintaining Score Scales 9.7 Scales for Test Batteries and Composites 9.7.1 Test Batteries 9.7.2 Composite Scores 9.7.3 Maintaining Scales for Batteries and Composites 9.8 Vertical Scaling and Developmental Score Scales 9.8.1 Structure of Batteries 9.8.2 Type of Domain Being Measured 9.8.3 Definition of Growth 9.8.4 Designs for Data Collection for Vertical Scaling 9.8.5 Test Scoring 9.8.6 Hieronymus Statistical Methods 9.8.7 Thurstone Statistical Methods 9.8.8 IRT Statistical Methods 9.8.9 Thurstone Illustrative Example 9.8.10 IRT Illustrative Example 9.8.11 Statistics for Comparing Scaling Results 9.8.12 Some Limitations of Vertically Scaled Tests 9.8.13 Research on Vertical Scaling 9.9 Exercises

10 Linking 10.1 Linking Categorization Schemes and Criteria 10.1.1 Types of Linking 10.1.2 Mislevy/Linn Taxonomy 10.1.3 Degrees of Similarity 10.2 Group Invariance 10.2.1 Statistical Methods Using Observed Scores 10.2.2 Statistics for Overall Group Invariance 10.2.3 Statistics for Pairwise Group Invariance 10.2.4 Example: ACT and ITED Science Tests 10.3 Additional Examples 10.3.1 Extended Time 10.3.2 Test Adaptations and Translated Tests 10.4 Discussion 10.5 Exercises

11 Current and Future Challenges 11.1 Score Scales 11.2 Equating 11.3 Vertical Scaling 11.4 Linking 11.5 Summary

References; Appendix A: Answers to Exercises; Appendix B: Computer Programs; Index

From Nielsen BookData



  • ISBN
    • 9781441923042
  • Place of publication
    New York
  • Pages/volumes
    xxvi, 548 p.
  • Size
    24 cm