Automatic Identification of Duplicate Records and "Works" in Japanese Union Catalogs : An Experiment on UNICANET Bibliographic Records

  • TANIGUCHI Shoichi
    Graduate School of Library, Information and Media Studies, University of Tsukuba

Bibliographic Information

Other Title
  • 総合目録データに対する機械的書誌同定と著作同定の試み : ゆにかねっとレコードによる実験
  • ソウゴウ モクロク データ ニ タイスル キカイテキ ショシ ドウテイ ト チョサク ドウテイ ノ ココロミ : ユニカネット レコード ニ ヨル ジッケン

Abstract

Automatic identification of duplicate records and "works" was attempted on bibliographic records in UNICANET, a union catalog operated by the National Diet Library. Identifying duplicates means grouping records that represent the same resource, while identifying "works" means grouping records that share the same work as defined in FRBR. This paper reports the extent to which records can be automatically identified as members of a particular resource and of a particular work, and which of the possible alternative methods are effective. The method used in this study is to extract data values from certain fields in records encoded in the DC-NDL schema, to normalize those values, and then to generate identification keys that are matched against a database in which the identified records are stored incrementally. Several alternatives were examined, including ways of choosing fields and values for title and author name and ways of combining the generated identification keys, and records were grouped under each alternative. The automatically built record groups were evaluated by comparing them with manually built sample sets of correct groups. The results of the experiment show that automatic identification of duplicates and works is fully achieved. They also show that it is effective (a) to use the proposed normalization, (b) for titles, to adopt titles and their transcriptions comprehensively, excluding series titles, and to apply decomposition and recombination of titles when generating the title identification keys, and (c) for authors, to adopt author names and their transcriptions comprehensively, and to use publishers when no author is found.
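
The extract, normalize, generate-keys, and match pipeline described in the abstract can be illustrated with a minimal Python sketch. This is not the paper's implementation: the field names (`titles`, `authors`, `publishers`), the NFKC-based normalization, the `title|author` key format, and the merging of groups when one record matches several of them are all assumptions made for illustration only.

```python
import unicodedata


def normalize(value: str) -> str:
    """Assumed normalization: NFKC folding, lowercasing, and
    removal of everything that is not a letter or a digit."""
    folded = unicodedata.normalize("NFKC", value).lower()
    return "".join(ch for ch in folded if ch.isalnum())


def identification_keys(record: dict) -> set[str]:
    """Combine every title form with every author form.
    Publishers are used only when no author is present."""
    titles = record.get("titles", [])
    names = record.get("authors") or record.get("publishers", [])
    return {f"{normalize(t)}|{normalize(n)}" for t in titles for n in names}


def group_records(records: list[dict]) -> dict[int, list[str]]:
    """Process records one at a time against an in-memory key store
    (a stand-in for the database of already identified records)."""
    key_to_group: dict[str, int] = {}
    groups: dict[int, list[str]] = {}
    next_id = 0
    for record in records:
        keys = identification_keys(record)
        matched = {key_to_group[k] for k in keys if k in key_to_group}
        if matched:
            group_id = min(matched)
            # Assumption: a record that bridges several groups merges them.
            for other in matched - {group_id}:
                groups[group_id].extend(groups.pop(other))
                for k in [k for k, g in key_to_group.items() if g == other]:
                    key_to_group[k] = group_id
        else:
            group_id = next_id
            next_id += 1
            groups[group_id] = []
        groups[group_id].append(record["id"])
        for k in keys:
            key_to_group[k] = group_id
    return groups


# Hypothetical sample records, not taken from UNICANET.
sample = [
    {"id": "a", "titles": ["総合目録 入門", "ソウゴウ モクロク ニュウモン"], "authors": ["山田 太郎"]},
    {"id": "b", "titles": ["総合目録　入門"], "authors": ["山田太郎"]},
    {"id": "c", "titles": ["Ｃａｔａｌｏｇ Basics"], "authors": [], "publishers": ["Example Press"]},
    {"id": "d", "titles": ["catalog basics!"], "publishers": ["example press"]},
]
result = group_records(sample)
```

In this sketch, records `a` and `b` fall into one group because their normalized title and author agree, and `c` and `d` fall into another through the publisher fallback. Supplying both a title and its transcription in `titles` mirrors the comprehensive use of titles and transcriptions that the abstract reports as effective.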
