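XMLを利用した日本古典史料の英日全文連携検索システムの構築-日米共同研究について- [Construction of an English-Japanese Full-Text Coordinated Retrieval System for Japanese Classical Historical Materials Using XML: On a Japan-US Joint Research Project]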

Bibliographic Information

Other Title
  • XML オ リヨウ シタ ニホン コテン シリョウ ノ エイニチ ゼンブン レンケイ ケンサク システム ノ コウチク ニチベイ キョウドウ ケンキュウ ニ ツイテ
  • XML ヲ リヨウ シタ ニホン コテン シリョウ ノ エイニチ ゼンブン レンケイ ケンサク システム ノ コウチク ニチベイ キョウドウ ケンキュウ ニ ツイテ

Abstract

P(論文)

This full-text coordinated retrieval system is intended to support the study of ancient Japanese culture, such as Shinto and Japanese spiritual life; the study of the historical development and structure of the ancient Japanese state; and ethnological and ethnographic research. It further aims to disseminate information about Japanese culture to the world and to promote international collaboration. The system will make a database of texts available not only to Japanese-speaking scholars but also to English-speaking scholars, producing synergistic research results and promoting collaboration between English-speaking and Japanese researchers.

The texts we have chosen to work with are 1) ancient Japanese chronicles such as Kojiki, Nihon Shoki, and Shoku Nihongi; 2) Engishiki, a collection of laws and regulations on shrines; 3) Izumo Fudoki, a record of a particular region; and 4) Gukansho, a representative historical text of the Middle Ages. We intend to digitize twenty-five volumes of these Japanese classical texts and to support Internet-based coordinated searching, browsing, and reuse across the texts in both English and Japanese.

The system is meant to assist historical research that draws on a large volume of data that has not undergone special processing, that is, Japanese historical documents without a fixed data format. Because such documents must be searched by a program, we used CGI (Common Gateway Interface), which lets the Web server return dynamically generated, interactive content in response to a browser request. As the number of stored historical documents increases, however, search efficiency will decline if we rely on simple character pattern matching. The difficulty of extracting particular subjects from texts that have sentence structures (logical structures) is a further significant limitation and exposes a weakness of such a system.

To solve these problems, it is important to develop a search method that focuses on the document structures and descriptive conventions of Japanese classical texts, and to design and implement a more robust system. XML (Extensible Markup Language), which can define the logical structure and attributes of a document in addition to its text, has been receiving attention and has been standardized, and we use it as our standard for managing, circulating, and providing documents on the Web. XML enables us to record information about each text, such as notes, titles, extracts, important insertions, and cross-references, independently of the text itself. To take full advantage of these features, we employed XML to define the logical structure of the texts and designed the full-text coordinated retrieval system around it. In this paper, we explain the process of developing the system and the problems we encountered, and describe our efforts to increase efficiency and to add various search functions.
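
The contrast the abstract draws between simple character pattern matching and a search that follows the logical structure of the document can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the element names (`text`, `passage`, `body`, `note`), the `lang` and `id` attributes, and the sample sentences are hypothetical placeholders, not the project's actual schema and not quotations from the source texts.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup. Element names, attributes, and sentences are
# placeholders invented for this sketch, not the project's schema or data.
SAMPLE = """
<text title="sample">
  <passage id="p1">
    <body lang="ja">神社で祭りを行う</body>
    <body lang="en">A festival is held at the shrine</body>
    <note lang="en">The place is not named in this placeholder.</note>
  </passage>
  <passage id="p2">
    <body lang="ja">川のほとりに村がある</body>
    <body lang="en">There is a village by the river</body>
    <note lang="en">Compare the shrine festival in passage p1.</note>
  </passage>
</text>
"""


def naive_search(root, keyword):
    """Pattern matching over all text in a passage, ignoring structure."""
    return [
        p.get("id")
        for p in root.iter("passage")
        if keyword in "".join(p.itertext())
    ]


def structured_search(root, keyword, element="body", lang=None):
    """Match only inside the named child element, optionally by language."""
    hits = []
    for passage in root.iter("passage"):
        for child in passage.findall(element):
            if lang is not None and child.get("lang") != lang:
                continue
            if keyword in (child.text or ""):
                hits.append(passage.get("id"))
                break  # one hit per passage is enough
    return hits


def parallel_bodies(root, passage_id):
    """Return {language: body text} for one passage, linking ja and en."""
    for passage in root.iter("passage"):
        if passage.get("id") == passage_id:
            return {b.get("lang"): b.text for b in passage.findall("body")}
    return {}


root = ET.fromstring(SAMPLE.strip())

naive_hits = naive_search(root, "shrine")
body_hits = structured_search(root, "shrine", element="body", lang="en")
ja_hits = structured_search(root, "神社", element="body", lang="ja")

print("naive:", naive_hits)
print("body only:", body_hits)
print("parallel text:", parallel_bodies(root, ja_hits[0]))
```

In this sketch the naive search also returns the passage whose note merely mentions the keyword, while the structured search can be limited to the body text in one language and then follow the shared passage `id` to the text in the other language. A real system would also need indexing to stay efficient as the collection grows, which is outside the scope of this example.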
