A study in resource federation for e-Science
This research seeks to seamlessly support distributed computing and storage infrastructure through the development and study of a software abstraction layer that interfaces to multiple Grid middleware systems and to new Cloud environments. Through this abstraction it is possible to sustain uninterrupted access to resources that is robust to their dynamic nature (compute nodes may fail, and storage resources may go offline while a computation is being performed). We studied the software abstraction layer and propose our Universal Grid User Interface (UGI) architecture, which supports end users and application engineers across multiple kinds of Grid and Cloud middleware. UGI is implemented on top of A Simple API for Grid Applications (SAGA) and provides supplemental and extended functions that are not included in SAGA.

We demonstrated that jobs can be submitted and executed on different Grid resources from the UGI-based user environment. We provided and verified a simple way to execute jobs based on High Energy and Nuclear Physics (HENP) libraries. For file manipulation, we demonstrated that an application can access the different file-system middleware in the Data Grids. The application can handle the pieces as a complete file, even when a large file is split up and the parts are stored on different Data Grids. We managed the files distributed across heterogeneous Data Grids by using a catalog service. The example demonstrated that an application can obtain the location information for the pieces of files distributed among different kinds of Data Grids, and then access the distributed files.

For applied tools and applications, we demonstrated a method to reliably manage files with the Resource Namespace Service (RNS), a UGI-based Web application for Particle Therapy Simulation (PTSim), and an approach inspired by Ant Colony Optimization (ACO). Our method for reliably managing large files works on different kinds of Data Grids using RNS.
The volume of digital data and the size of individual files are increasing due to the introduction of high-resolution images, high-definition audiovisual files, etc. Reliably storing such large files with whole-file replication is becoming problematic, because a failure in the integrity of a file is difficult to localize. Our method manages large files in Data Grids by splitting them into smaller units in a traceable manner and then managing the smaller units. The RNS catalog service holds an EPR (Endpoint Reference) and metadata that describe the original locations as well as the checksum values. Our example shows how our Grid application can retrieve the actual file locations and the checksum values from the RNS service.

Our second tool is a UGI-based Web application for PTSim. PTSim is a simulation system for particle therapy. Applying particle physics to medicine is one of the application areas with a direct benefit to mankind. PTSim makes use of the Geant4 toolkit to simulate the passage of particles through the human body. It includes a Web interface that can be used by several collaborating medical particle therapy centers. The Web interface allows a non-Grid environment to be easily ported to the Grid to take advantage of the additional resources.

Our last tool is for an approach inspired by swarm intelligence, ACO. Swarm intelligence is one of the approaches that provide a fault-tolerant and efficient means of transferring data in a dynamic environment. It is inspired primarily by observations of the collective behavior of social insects in addressing complex distributed problems. The basic idea is that each member of the swarm follows simple rules that govern its behavior, but the interaction among the members of the swarm can be used to tackle problems that are difficult to solve with complicated numerical methods. We investigate the problem of data distribution between a client and servers in a dynamic environment.
We regard each download from a server to the client as a single member of a swarm. The member’s behavior is simply to reliably download a data file. Members can communicate with each other, allowing the swarm to settle on the best set of servers to download the data from based on the current status of the environment. ACO is one of the swarm intelligence methods. We created a simulator following the ACO-based approach and showed that our approach works well, providing a fault-tolerant and efficient means of transferring data in a dynamic environment.

We can utilize the computing and storage resources with our implementation and solution. The challenges facing today’s researchers, who need to collaborate with geographically distributed colleagues using distributed computing and storage resources, can thus be overcome.
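The split-and-checksum scheme for large files described above can be sketched as follows. This is a minimal illustration, not the RNS interface or the implementation from this work: the `ChunkEntry` record, the in-memory `stores` dictionaries standing in for Data Grids, the `store/key` location strings standing in for EPRs, and the choice of SHA-256 are all assumptions made for the example.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class ChunkEntry:
    """Hypothetical catalog record: where a chunk lives and its checksum."""
    index: int
    location: str  # stand-in for an endpoint reference, written "store/key"
    checksum: str  # SHA-256 hex digest of the chunk's bytes


def split_and_register(data: bytes, chunk_size: int, stores: dict) -> list:
    """Cut data into chunks, place them round-robin across the stores, and
    return the catalog entries needed to locate and verify each chunk."""
    names = sorted(stores)  # store names are assumed not to contain "/"
    catalog = []
    for index, start in enumerate(range(0, len(data), chunk_size)):
        chunk = data[start:start + chunk_size]
        store = names[index % len(names)]
        key = f"chunk-{index:04d}"
        stores[store][key] = chunk
        digest = hashlib.sha256(chunk).hexdigest()
        catalog.append(ChunkEntry(index, f"{store}/{key}", digest))
    return catalog


def find_corrupt(catalog: list, stores: dict) -> list:
    """Return the indices of chunks that are missing or fail their checksum."""
    bad = []
    for entry in catalog:
        store, key = entry.location.split("/", 1)
        chunk = stores[store].get(key)
        if chunk is None or hashlib.sha256(chunk).hexdigest() != entry.checksum:
            bad.append(entry.index)
    return bad


def reassemble(catalog: list, stores: dict) -> bytes:
    """Rebuild the original file by reading the chunks in index order."""
    parts = []
    for entry in sorted(catalog, key=lambda e: e.index):
        store, key = entry.location.split("/", 1)
        parts.append(stores[store][key])
    return b"".join(parts)
```

Because each chunk carries its own checksum, a damaged chunk can be identified by its index, and only that chunk needs to be re-fetched or re-replicated rather than the whole file.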
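The ACO-inspired idea of letting downloads steer one another toward good servers can be sketched as a toy simulation. This is not the simulator built in this work. It is a simplified pheromone scheme under stated assumptions: each download picks a server with probability proportional to that server's pheromone level, deposits pheromone equal to the throughput it observed, and all trails evaporate every round. The function names, the evaporation rate, the pheromone floor, and the throughput figures are hypothetical.

```python
import random


def choose_server(pheromone: dict, rng: random.Random) -> str:
    """Pick a server with probability proportional to its pheromone level."""
    names = list(pheromone)
    weights = [pheromone[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


def simulate(throughput: dict, rounds: int, ants_per_round: int = 10,
             evaporation: float = 0.2, floor: float = 0.05,
             seed: int = 0, pheromone: dict = None) -> dict:
    """Run the swarm for a number of rounds and return the pheromone table.

    throughput maps a server name to its rate in arbitrary units; 0.0 means
    the server is down. Passing an existing pheromone table continues from
    it (the table is updated in place), which models a change in the
    environment partway through a transfer.
    """
    rng = random.Random(seed)
    if pheromone is None:
        pheromone = {name: 1.0 for name in throughput}
    for _ in range(rounds):
        deposits = {name: 0.0 for name in pheromone}
        for _ in range(ants_per_round):
            server = choose_server(pheromone, rng)
            # A download from a failed server delivers nothing,
            # so it deposits nothing.
            deposits[server] += throughput.get(server, 0.0)
        for name in pheromone:
            # Evaporate old trails and add the averaged new deposits. The
            # floor keeps every server selectable so that recoveries and
            # failures elsewhere can still be discovered.
            updated = ((1 - evaporation) * pheromone[name]
                       + deposits[name] / ants_per_round)
            pheromone[name] = max(floor, updated)
    return pheromone


if __name__ == "__main__":
    table = simulate({"fast": 10.0, "slow": 1.0, "down": 0.0}, rounds=200)
    print("preferred server:", max(table, key=table.get))
    # The fast server fails; the swarm continues from the same trails.
    table = simulate({"fast": 0.0, "slow": 1.0, "down": 0.0},
                     rounds=200, seed=1, pheromone=table)
    print("preferred server after failure:", max(table, key=table.get))
```

Evaporation is what makes the scheme tolerant of change: once a server stops delivering data its trail decays, and the remaining working servers attract the subsequent downloads.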