Designing and evaluating language corpora : a practical framework for corpus representativeness

Bibliographic Information

Jesse Egbert, Douglas Biber, Bethany Gray

Cambridge University Press, 2022

  • hardback

Note

Includes bibliographical references (p. 271-279) and index

Description and Table of Contents

Description

Corpora are ubiquitous in linguistic research, yet to date, there has been no consensus on how to conceptualize corpus representativeness and collect corpus samples. This pioneering book bridges this gap by introducing a conceptual and methodological framework for corpus design and representativeness. Written by experts in the field, it shows how corpora can be designed and built in a way that is both optimally suited to specific research agendas, and adequately representative of the types of language use in question. It considers questions such as 'what types of texts should be included in the corpus?', and 'how many texts are required?' - highlighting that the degree of representativeness rests on the dual pillars of domain considerations and distribution considerations. The authors introduce, explain, and illustrate all aspects of this corpus representativeness framework in a step-by-step fashion, using examples and activities to help readers develop practical skills in corpus design and evaluation.

Table of Contents

  • 1. Introduction
  • 2. Approaches to representativeness in previous corpus linguistic research
  • 3. Corpus representativeness: a conceptual and methodological framework
  • 4. Domain considerations
  • 5. Distribution considerations
  • 6. The influence of domain and distribution considerations on corpus representativeness - bringing it all together
  • 7. Corpus design and representativeness in practice
  • Glossary
  • Appendix A. Example articles documenting existing corpora
  • Appendix B. Survey of corpus design and compilation practices.

by "Nielsen BookData"

Details

  • NCID
    BC11921260
  • ISBN
    • 9781107151383
  • Country Code
    uk
  • Title Language Code
    eng
  • Text Language Code
    eng
  • Place of Publication
    Cambridge
  • Pages/Volumes
    xiii, 284 p.
  • Size
    24 cm