NaDev: An Annotated Corpus to Support Information Extraction from Research Papers on Nanocrystal Devices

    • Dieb Thaer M.
    • Graduate School of Information Science and Technology, Hokkaido University
    • Hara Shinjiro
    • Research Center for Integrated Quantum Electronics, Hokkaido University


The process of nanocrystal device development is not well systematized. To support this process, analysis of the information produced by developmental experiments is required. In this study, we constructed an annotated corpus to support the extraction of experimental information from relevant publications. We designed the corpus-construction guidelines in cooperation with a domain expert. We evaluated these guidelines through corpus-construction experiments with graduate students from this domain, and then evaluated the corpus with the domain expert. In the corpus-construction experiments, we achieved a sufficient level of inter-annotator agreement by using a loose agreement measure that ignored the term-boundary mismatch problem, and we built an agreement corpus that excluded annotations based on misunderstandings of the guidelines. The domain expert evaluated this agreement corpus and modified the guidelines based on real examples. Using these guidelines, we finalized the corpus, called "NaDev" (<u>Na</u>nocrystal <u>Dev</u>ice development corpus). The NaDev corpus and its construction guidelines will be released via our website. The NaDev corpus aims to support automatic information extraction from publications relevant to nanocrystal device development. The extracted information can be used to solve problems in the nanotechnology domain by drawing on the large volume of newly published information. To the best of our knowledge, this is the first corpus constructed for the development of nanocrystal devices.
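
The loose agreement measure mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formula, so the version below is an assumption: two annotations count as agreeing when they carry the same label and their character spans overlap, and agreement is the fraction of all annotations that have a match from the other annotator. The function names, the span representation, and the labels `Material`, `Method`, and `Value` are hypothetical and are not taken from the NaDev guidelines.

```python
from typing import List, Tuple

# Hypothetical representation: (start, end, label), with end exclusive.
Span = Tuple[int, int, str]


def overlaps(a: Span, b: Span) -> bool:
    """Loose match: same label and at least one shared character."""
    return a[2] == b[2] and a[0] < b[1] and b[0] < a[1]


def exact(a: Span, b: Span) -> bool:
    """Strict match: same label and identical boundaries."""
    return a == b


def agreement(ann_a: List[Span], ann_b: List[Span], loose: bool = True) -> float:
    """Fraction of annotations (from both annotators) matched by the other.

    Assumed formula for illustration only, not the paper's definition.
    """
    if not ann_a and not ann_b:
        return 1.0
    match = overlaps if loose else exact
    matched_a = sum(any(match(a, b) for b in ann_b) for a in ann_a)
    matched_b = sum(any(match(b, a) for a in ann_a) for b in ann_b)
    return (matched_a + matched_b) / (len(ann_a) + len(ann_b))


# Two annotators mark the same sentence; the first term differs only in
# where its boundary falls.
annotator_a = [(0, 9, "Material"), (15, 22, "Method")]
annotator_b = [(0, 4, "Material"), (15, 22, "Method"), (30, 35, "Value")]

loose_score = agreement(annotator_a, annotator_b, loose=True)
strict_score = agreement(annotator_a, annotator_b, loose=False)
print(loose_score, strict_score)
```

In this toy example the boundary mismatch on the first term lowers the strict score but not the loose one, which is the effect the abstract describes when it says the loose measure ignores the term-boundary mismatch problem.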


  • Journal of Information Processing

    Journal of Information Processing 24(3), 554-564, 2016

    Information Processing Society of Japan


