Development of a Database of Health Insurance Claims: Standardization of Disease Classifications and Anonymous Record Linkage

    • SATO Toshihiko
    • Kitasato Clinical Research Center, Kitasato University School of Medicine
    • IKEDA Shunya
    • Department of Pharmaceutical Sciences, School of Pharmacy, International University of Health and Welfare
    • NODA Mitsuhiko
    • Department of Diabetes and Metabolic Medicine, National Center for Global Health and Medicine
    • NAKAYAMA Takeo
    • Department of Health Informatics, Kyoto University School of Public Health


Background: Health insurance claims (ie, receipts) record patient health care treatments and expenses and, although created for the health care payment system, are potentially useful for research. Combining different types of receipts generated for the same patient would dramatically increase the utility of these receipts. However, technical problems, including standardization of disease names and classifications, and anonymous linkage of individual receipts, must be addressed.

Methods: In collaboration with health insurance societies, all information from receipts (inpatient, outpatient, and pharmacy) was collected. To standardize disease names and classifications, we developed a computer-aided post-entry standardization method using a disease name dictionary based on International Classification of Diseases (ICD)-10 classifications. We also developed an anonymous linkage system by using an encryption code generated from a combination of hash values and stream ciphers. Using different sets of the original data (data set 1: insurance certificate number, name, and sex; data set 2: insurance certificate number, date of birth, and relationship status), we compared the percentage of successful record matches obtained by using data set 1 to generate key codes with the percentage obtained when both data sets were used.

Results: The dictionary’s automatic conversion of disease names successfully standardized 98.1% of approximately 2 million new receipts entered into the database. The percentage of anonymous matches was higher for the combined data sets (98.0%) than for data set 1 (88.5%).

Conclusions: The use of standardized disease classifications and anonymous record linkage substantially contributed to the construction of a large, chronologically organized database of receipts. This database is expected to aid in epidemiologic and health services research using receipt information.
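The two techniques named in the Methods above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the hash function, the stream cipher, or how the two data sets were combined. The sketch below substitutes a keyed hash (HMAC-SHA-256) for the hash-plus-stream-cipher scheme, and assumes that two receipts are linked when either of their two key codes agrees. The dictionary entries, field names (`cert`, `name`, `sex`, `dob`, `relation`), the secret, and all function names are hypothetical.

```python
import hashlib
import hmac
import unicodedata

# Hypothetical dictionary entries: free-text disease name -> ICD-10 code.
DISEASE_DICTIONARY = {
    "type 2 diabetes mellitus": "E11",
    "essential hypertension": "I10",
}

# Hypothetical secret held by the party that generates the key codes.
SECRET = b"example-secret-not-for-production"


def normalize(text):
    """NFKC-normalize, collapse whitespace, and lowercase a text field."""
    return " ".join(unicodedata.normalize("NFKC", text).split()).lower()


def standardize_disease_name(raw_name):
    """Return the ICD-10 code for a name, or None so it can be reviewed by hand."""
    return DISEASE_DICTIONARY.get(normalize(raw_name))


def linkage_key(secret, *fields):
    """Derive an anonymous key code from identifying fields.

    Assumption: HMAC-SHA-256 stands in for the paper's combination of
    hash values and stream ciphers, which the abstract does not detail.
    """
    message = "\x1f".join(normalize(f) for f in fields).encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def keys_for(record, secret=SECRET):
    """Build one key code per data set described in the abstract."""
    key1 = linkage_key(secret, record["cert"], record["name"], record["sex"])
    key2 = linkage_key(secret, record["cert"], record["dob"], record["relation"])
    return key1, key2


def matches(rec_a, rec_b, use_both, secret=SECRET):
    """Link on data set 1 alone, or on either key when use_both is True (assumed rule)."""
    a1, a2 = keys_for(rec_a, secret)
    b1, b2 = keys_for(rec_b, secret)
    return a1 == b1 or (use_both and a2 == b2)


# Two receipts for the same person whose name was transcribed differently.
outpatient = {"cert": "12345", "name": "SATO Taro", "sex": "M",
              "dob": "1970-01-01", "relation": "insured"}
pharmacy = {"cert": "12345", "name": "SATOU Taro", "sex": "M",
            "dob": "1970-01-01", "relation": "insured"}
```

In this toy case `matches(outpatient, pharmacy, use_both=False)` fails because the name differs, while `use_both=True` succeeds through the second key. That is one plausible reason why combining the data sets could raise the match rate, though the abstract itself does not state the cause.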


  • Journal of Epidemiology

    Journal of Epidemiology 20(5), 413-419, 2010-09-01

    Japan Epidemiological Association

References:  17

Cited by:  5


  • Article Type
    Journal Article
  • Data Source
    CJP  CJPref  J-STAGE 