A Multifaceted Rasch Analysis of Rater Reliability of the Speaking Section of the GTEC CBT

Abstract

Second language (L2) speaking assessment can be affected by raters as well as by tasks and other factors. High-stakes speaking tests require that high rater reliability be assured and that this information be reported to the public. In Japan, investigations into rater reliability and the use of multifaceted Rasch analysis have been limited for L2 speaking assessment, in both high-stakes contexts and classroom situations. To fill this gap, this study examines the rater reliability of the Speaking Section of the Global Test of English Communication Computer Based Testing (GTEC CBT). This test has nine tasks for evaluation and 23 assessment criteria. We analyzed 648 test takers' responses using multifaceted Rasch analysis. The results showed that raters differed in severity to a small degree but demonstrated high rater agreement and rater self-consistency. The bias analysis indicated a small percentage of systematically biased patterns between raters and test takers and 25.78% between raters and criteria. Implications for improving assessment are discussed.

Journal

  • ARELE: Annual Review of English Language Education in Japan 28(0), 241-256, 2017

    The Japan Society of English Language Education
