The GENIUS Grid Portal and robot certificates: a new tool for e-Science
- Roberto Barbera†1, 2,
- Giacinto Donvito†6,
- Alberto Falzone†4,
- Giuseppe La Rocca1,
- Luciano Milanesi†5,
- Giorgio Pietro Maggi†3, 6 and
- Saverio Vicario†7
© Barbera et al; licensee BioMed Central Ltd. 2009
Published: 16 June 2009
Grid technology is the computing model that allows users to share a wide plethora of distributed computational resources regardless of their geographical location. Up to now, the strict security policy required to access distributed computing resources has been a significant limiting factor in broadening the usage of Grids to a wide community of users. Grid security is based on the Public Key Infrastructure (PKI) of X.509 certificates, and the procedure to obtain and manage those certificates is unfortunately not straightforward. A first step towards making Grids more appealing to new users has recently been achieved with the adoption of robot certificates.
Robot certificates have recently been introduced to perform automated tasks on Grids on behalf of users. They are extremely useful, for instance, to automate grid service monitoring, data processing production and distributed data collection systems. Basically, these certificates can be used to identify the person responsible for an unattended service or process acting as client and/or server. Robot certificates can be installed on a smart card and used behind a portal by anyone interested in running the related applications in a Grid environment through a user-friendly graphical interface. In this work, the GENIUS Grid Portal, powered by EnginFrame, has been extended to support the new authentication based on these robot certificates.
The work carried out and reported in this manuscript is particularly relevant for all users who are not familiar with personal digital certificates and the technical aspects of the Grid Security Infrastructure (GSI). The valuable benefits introduced by robot certificates in e-Science can thus be extended to users belonging to several scientific domains, providing an asset in raising Grid awareness among a large number of potential users.
The adoption of Grid portals extended with robot certificates can contribute to creating transparent access to the computational resources of Grid Infrastructures, enhancing the spread of this new paradigm in researchers' working life to address new global scientific challenges. The evaluated solution can of course be extended to other portals, applications and scientific communities.
Unfortunately the scenario we have today is a bit different. Due to the ongoing evolution of this technology, no standard is available so far and there is an initial gap that scientists need to overcome before getting started. Moreover, each Grid user needs to apply for a personal X.509 certificate, join a specific VO and obtain an account on one of the trusted UIs (User Interfaces) of the project in which they are involved. All these steps may have caused fear and confusion amongst researchers and driven away potential new users. At the beginning of 2002, the Italian INFN and the Italian web technology company NICE Srl started to develop the GENIUS [4, 5] Grid Portal in order to provide end-users with transparent access to the Grid. Thanks to this work, researchers from different scientific domains can today access the Grid to run their own applications using a conventional web interface. All the complexity of the underlying gLite grid middleware is hidden from the end-user by the portal. In this manuscript we introduce the new feature designed in this portal to support robot certificates.
Starting from 28 February 2008, the Italian INFN CA (Certification Authority) modified its CP/CPS (Certification Policy and Certification Practice Statement) to permit users to apply for robot certificates, which are now officially recognized as a standard by the IGTF (International Grid Trust Federation). The UK and Netherlands CAs are already issuing robot certificates, and their adoption by other CAs is expected in the next few years. These new certificates have been introduced so that users who are not familiar with handling personal certificates and belonging to a VO can experience the Grid paradigm for their research activity with lower initial barriers. The robot certificate (also known as portal certificate), associated with a specific application that the user wants to share with the Grid community, can be installed on a smart card and used with a portal by anyone who is interested in running this application in a Grid environment through a user-friendly interface. For security reasons, in order to reduce the risk of the portal certificate being compromised, the INFN CA decided to issue these new certificates on the Aladdin eToken smart card. Each smart card can hold several robot certificates, one for each application to be shared with other users of the same VO. The user's PIN is requested every time the certificate stored on the smart card is read to generate a proxy. A proxy is a certificate that is derived from, and signed by, a normal X.509 public key certificate, and it is used to grant access within a PKI-based authentication system. Proxy credentials are a common technique in security systems for allowing an entity A to grant another entity B the right to be authorized with others as if it were A. In other words, entity B acts as a proxy on behalf of entity A.
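The delegation idea behind a proxy (entity B acting on behalf of entity A for a limited time) can be illustrated with a small Python model. This is only a conceptual sketch with no real cryptography: the `Credential` class, the helper functions and the subject names are hypothetical and are not part of GENIUS, gLite or the INFN CA tooling. The `/CN=proxy` suffix merely mimics a common naming habit for proxy subjects and is an assumption here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class Credential:
    """Toy stand-in for an X.509 credential (no signatures are computed)."""
    subject: str
    issuer: str
    not_after: datetime
    parent: Optional["Credential"] = None


def derive_proxy(signer: Credential, lifetime: timedelta, now: datetime) -> Credential:
    """Entity A (the signer) delegates to a short-lived proxy acting as A."""
    if now >= signer.not_after:
        raise ValueError("signing credential has expired")
    # A proxy can never outlive the credential that signed it.
    expiry = min(now + lifetime, signer.not_after)
    return Credential(
        subject=signer.subject + "/CN=proxy",  # assumed naming convention
        issuer=signer.subject,
        not_after=expiry,
        parent=signer,
    )


def effective_identity(cred: Credential) -> str:
    """Walk the chain back to the certificate the proxy acts for."""
    while cred.parent is not None:
        cred = cred.parent
    return cred.subject


def is_valid(cred: Optional[Credential], now: datetime) -> bool:
    """Every link of the chain must be unexpired and issued by its parent."""
    while cred is not None:
        if now >= cred.not_after:
            return False
        if cred.parent is not None and cred.issuer != cred.parent.subject:
            return False
        cred = cred.parent
    return True
```

In this model a robot credential valid for a year can produce a proxy valid for a few hours; services that check the proxy see the robot's identity through `effective_identity`, and the proxy stops being accepted once its own short lifetime ends.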
With the mkproxy script a proxy certificate is generated for the user. This proxy is used to access the Grid and run applications. In this work we have extended the architecture of the portal by adding the functionality provided by this script. Once the proxy certificate has been created, the user can start to access the Grid. Since the beginning, the need for a personal certificate to access Grid resources has been a limiting factor for the real spread of this paradigm. Many researchers would be interested in using the Grid as a tool to solve problems and speed up the production of scientific results, but the complexity of the PKI risks discouraging many of them. The benefits introduced by robot certificates in the Life Sciences are far-reaching because they can make Grid access transparent for biologists interested in running specific applications.
In the next sub-sections the architecture of the GENIUS Grid Portal, powered by EnginFrame, will be presented and the work carried out to extend its framework to support these new certificates will be described in detail.
The GENIUS Architecture
The client side (top left in the figure) is represented by a user's workstation running a web browser. Thanks to modern client-side technologies, many kinds of devices can be used in addition to the usual notebooks or workstations, such as palmtops or new-generation mobile phones;
The protocols (top right in the figure): users can access the presentation engine and its services through different protocols, with gateways exposed for portlets, web services and RSS; at present, these protocols can be used by third-party clients to access the virtualized services, but not vice versa;
The server side (right in the figure): a UI machine (equipped with the LCG/gLite middleware services needed to submit jobs and manage data on the Grid) which runs the Apache Web Server, the Java/XML portal framework EnginFrame, developed by NICE Srl, and GENIUS itself. The server block is composed of:
◦ the presentation engine, which renders layouts and XSL/XML streams, is based on leading web standards and provides access to the underlying services via HTTPS, including HTML, SOAP and RSS; in addition, the XML virtualization layer provides a set of XML processing functions that simplify the management of information coming from plug-in extensions;
◦ the Authentication and ACL (Access Control List) management layer, a core component offering many options to restrict the views of services to different user profiles, which influences the behaviour of the other services;
◦ the Data Management and Virtualization layer, which provides an abstraction of access to remote data and sources and supports a complete data life-cycle;
The Application kits (left centre in the figure) form the abstraction layer that hides the business logic of specific end-user applications; on the right-hand side (right centre in the figure) are other transversal services that provide VOMS proxy authentication with the user's X.509 certificate, secure access to interactive X11 applications using VNC over SSL, and monitoring. The applications are developed as plug-in extensions, and the GENIUS code itself is developed as a plug-in to the EnginFrame core;
The remote resources (bottom right in the figure): the Grid, computational resources and distributed data.
Briefly, thanks to the Agent-Server design of the EnginFrame core, the EF Server manages the end user's browsing by providing web pages via HTTPS, talks to the EF Agent and expects an XML response from it; on the other side, the EF Agent translates requests from the EF Server into actions on the computing resources (i.e. on the gLite User Interface), with the right credentials and the correct user-id on the machine, and translates the response from the UI into XML. Using the EnginFrame services the user can interact with files on the UI and, from there, send jobs to the Grid and manage the data of the Virtual Organization the user belongs to. The use of the web interface removes any dependency on a particular operating system and/or middleware running on the client, and on the locations of the client and the server: the user can interact with the grid from everywhere and with "everything". By making use of EnginFrame's service virtualization capability, GENIUS is transparently compliant with the latest versions of the LCG/gLite middleware, and can be easily installed on a variety of Linux flavours, ranging from RedHat 9 to Scientific Linux, on both 32-bit and 64-bit platforms. The multi-layered architecture of EnginFrame greatly simplifies the development of Web Portals exposing computing services that can run on a broad range of different computational Grid systems. In the last few years the architecture of the portal has been successfully customized to run applications from different scientific domains such as Life Science, Humanities, Earth Science, Astro-Particle Physics and HEP. Thanks to the modular architecture of the EnginFrame framework, the portal can be considered a Grid gateway.
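The Agent-Server exchange described above (a request becomes an action on the UI, and the outcome travels back as XML) can be pictured with the following Python sketch. It is not EnginFrame code: the function names `agent_run` and `server_render` and the XML element names are assumptions chosen for illustration, and the real framework handles credentials and user-ids that are omitted here.

```python
import html
import subprocess
import sys
import xml.etree.ElementTree as ET


def agent_run(action: list[str]) -> str:
    """Agent side: execute an action on the UI and wrap the outcome as XML."""
    done = subprocess.run(action, capture_output=True, text=True)
    root = ET.Element("response", {"exit-code": str(done.returncode)})
    ET.SubElement(root, "stdout").text = done.stdout
    ET.SubElement(root, "stderr").text = done.stderr
    return ET.tostring(root, encoding="unicode")


def server_render(xml_reply: str) -> str:
    """Server side: turn the agent's XML reply into a fragment for the browser."""
    root = ET.fromstring(xml_reply)
    status = "ok" if root.get("exit-code") == "0" else "failed"
    body = html.escape((root.findtext("stdout") or "").strip())
    return f"<p class='{status}'>{body}</p>"


if __name__ == "__main__":
    # A harmless stand-in for a middleware command run on the User Interface.
    reply = agent_run([sys.executable, "-c", "print('job submitted')"])
    print(server_render(reply))
```

Keeping XML as the only contract between the two sides is what lets the web layer stay unaware of which command-line tools the agent actually invokes.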
Accessing the Grid using a robot certificate and the GENIUS Grid Portal
The Service Definition Files (SDF™) are the core of the EnginFrame framework. Basically they are simple high-level XML files which describe how to link the existing command-line world to users' Web interface. Each SDF must have an .xml extension and, in order to be processed, must be included in the DOC_ROOT of the Web Server. Behind the Web Server, data is managed through the Spooler abstraction. A Spooler is a dedicated zone in the file system. It's used for hosting files provided by users (e.g., input files) or generated by other services (e.g., output or temporary files).
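As a rough illustration of the Spooler idea (a dedicated area of the file system that holds a service's input, output and temporary files), the Python sketch below creates one private directory per request. The `Spooler` class and its methods are hypothetical and do not reproduce the EnginFrame implementation.

```python
import shutil
import tempfile
from pathlib import Path


class Spooler:
    """Hypothetical per-request working area on the server's file system."""

    def __init__(self, base_dir: Path):
        base_dir.mkdir(parents=True, exist_ok=True)
        # Each spooler gets its own uniquely named directory under base_dir.
        self.path = Path(tempfile.mkdtemp(prefix="spooler-", dir=base_dir))

    def store_input(self, name: str, data: bytes) -> Path:
        # Keep only the final path component so uploads cannot escape the spooler.
        target = self.path / Path(name).name
        target.write_bytes(data)
        return target

    def list_files(self) -> list[str]:
        return sorted(p.name for p in self.path.iterdir())

    def destroy(self) -> None:
        # Remove the whole area once the service no longer needs its files.
        shutil.rmtree(self.path, ignore_errors=True)
```

A service would typically create a spooler when the user submits a form, write the uploaded input files into it, let the command-line tool produce its output there, and remove the directory when the results are no longer needed.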
If the smart card is available on the server, an automatic service deployed in the portal guides the user through the creation of a temporary proxy before the application associated with the robot certificate is run on the Grid. Hereafter follows the action invoked from the portal to generate the proxy using the robot certificate.
Users Tracking System (UTS)
Four different system views to inspect the accounting data produced in Grid by a robot certificate are available:
a "global view" which allows the administrator to retrieve a complete dump of all the information registered in the database;
a "session view" which reports only information about all the sessions started and closed by the users;
an "application view" which reports information about the application submission;
an "advanced query" which allows the administrator to perform advanced queries by entering a specific "where" clause in the dedicated text area.
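The four views listed above can be thought of as four queries over a single accounting table. The sketch below uses Python's `sqlite3` module with an invented schema: the table name `uts_log`, its columns and the event labels are assumptions, since the actual UTS database layout is not described here.

```python
import sqlite3

# Hypothetical schema: the real UTS tables are not described in the text.
SCHEMA = """
CREATE TABLE uts_log (
    id          INTEGER PRIMARY KEY,
    portal_user TEXT NOT NULL,
    event       TEXT NOT NULL,   -- 'session_start', 'session_end' or 'submit'
    application TEXT,
    logged_at   TEXT NOT NULL
);
"""

VIEWS = {
    "global": "SELECT * FROM uts_log ORDER BY id",
    "session": "SELECT * FROM uts_log "
               "WHERE event IN ('session_start', 'session_end') ORDER BY id",
    "application": "SELECT * FROM uts_log WHERE event = 'submit' ORDER BY id",
}


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def record(conn, portal_user, event, application, logged_at):
    # Values are bound as parameters, never pasted into the SQL text.
    conn.execute(
        "INSERT INTO uts_log (portal_user, event, application, logged_at) "
        "VALUES (?, ?, ?, ?)",
        (portal_user, event, application, logged_at),
    )


def run_view(conn, name):
    return conn.execute(VIEWS[name]).fetchall()


def advanced_query(conn, where_clause):
    # The clause is interpolated as raw SQL, so this must stay restricted
    # to the administrator and never be exposed to ordinary portal users.
    sql = f"SELECT * FROM uts_log WHERE {where_clause} ORDER BY id"
    return conn.execute(sql).fetchall()
```

Because a robot certificate is shared by many people, a log of this kind is what ties each Grid action back to the portal user who triggered it; the free-form "where" clause is convenient for the administrator but is also the reason that view should not be reachable by anyone else.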
The GENIUS Grid Portal, which transparently supports robot certificates, has been successfully used by non-grid users involved in the LIBI Italian Laboratory for Bioinformatics to run a bioinformatics application on a Grid Infrastructure. This section gives some details about the application and the workflow set up to run it on the EGEE Grid Infrastructure. The application MrBayes (Ronquist and Huelsenbeck 2003) performs Bayesian phylogenetic inference on a set of aligned biosequences. The inference identifies the distribution of the most likely genetic relationships among the chosen biosequences and, at the same time, the best set of values for the parameters of the postulated model of evolution. MrBayes offers a rich set of evolutionary models for DNA (both nucleotide and codon), RNA (a doublet model of nucleotide evolution that accounts for the secondary structure of an RNA molecule), proteins, and even arbitrary heritable discrete characters. Another peculiarity of the application is that it allows mixed models, that is, different models for different parts of each biosequence, with the possibility of sharing parameters among them.
The program uses Metropolis-coupled Markov chain Monte Carlo (MCMCMC) to perform the numerical integration needed to solve the Bayesian equation. The MCMCMC approach allowed the development of a parallel version of the algorithm (Altekar et al. 2004). The result of the numerical integration is a sample from the posterior distribution, which opens interesting prospects for future grid implementations: different samples of the posterior distribution could be merged to increase the reliability of the results and to check the convergence of the algorithm. It should be noted, however, that the program is not perfectly scalable: for moderately complex problems, the time needed to reach stationarity and produce useful samples is not small compared to the maximum time allowed on a single CPU of EGEE.
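To make the Metropolis-coupled scheme more concrete, the toy Python sketch below runs several chains on a one-dimensional bimodal target, heats all but the first one, and periodically proposes swaps between neighbouring chains. It is not MrBayes code and works on a simple numeric density rather than on trees; the target, the heating schedule (inverse temperature 1/(1 + heat·i)) and all parameter values are assumptions chosen for illustration.

```python
import math
import random


def log_target(x: float) -> float:
    """Unnormalised log-density of two unit Gaussians centred at -3 and +3."""
    a = -0.5 * (x - 3.0) ** 2
    b = -0.5 * (x + 3.0) ** 2
    m = max(a, b)  # log-sum-exp keeps the tails numerically safe
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def mc3(n_steps: int, n_chains: int = 4, heat: float = 0.5,
        step: float = 1.0, seed: int = 0) -> list[float]:
    """Return the states visited by the cold chain (inverse temperature 1)."""
    if n_chains < 2:
        raise ValueError("Metropolis coupling needs at least two chains")
    rng = random.Random(seed)
    # Chain i samples target**beta_i; chain 0 is the cold chain.
    betas = [1.0 / (1.0 + heat * i) for i in range(n_chains)]
    states = [0.0] * n_chains
    cold_samples = []
    for _ in range(n_steps):
        # Within-chain Metropolis update with a symmetric random-walk proposal.
        for i, beta in enumerate(betas):
            proposal = states[i] + rng.gauss(0.0, step)
            log_ratio = beta * (log_target(proposal) - log_target(states[i]))
            if math.log(1.0 - rng.random()) < log_ratio:
                states[i] = proposal
        # Propose exchanging the states of two neighbouring chains.
        i = rng.randrange(n_chains - 1)
        j = i + 1
        log_swap = (betas[i] - betas[j]) * (
            log_target(states[j]) - log_target(states[i]))
        if math.log(1.0 - rng.random()) < log_swap:
            states[i], states[j] = states[j], states[i]
        cold_samples.append(states[0])
    return cold_samples
```

The heated chains see a flattened landscape and cross between modes easily, and the swaps pass those moves to the cold chain. Since only the swap step needs communication, each chain can run on its own processor, which is the property the parallel version exploits; independent runs started with different seeds yield separate posterior samples that could later be merged and compared for convergence.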
The required input is a single NEXUS-formatted text file (Maddison et al. 1997), subdivided into a data block and a MrBayes block in which the models and the parameters of the Markov chain integration are defined. The output is composed of three kinds of large files (typically several hundred megabytes each) that describe, respectively, the posterior distribution of the numerical parameters, the posterior distribution of the topological parameters, and several diagnostic measures related to the mixing of the Markov chains and the convergence of the algorithm as a whole.
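The two-block layout of the input file can be illustrated with a short Python helper that writes a minimal NEXUS file from an alignment. The function name `build_nexus`, the taxon names and the chosen settings (`lset nst=6 rates=gamma`, the `mcmc` parameters) are illustrative assumptions and not the configuration used in the work described here.

```python
def build_nexus(alignment: dict[str, str], ngen: int = 10000,
                nchains: int = 4) -> str:
    """Return NEXUS text with a data block followed by a mrbayes block."""
    if not alignment:
        raise ValueError("the alignment is empty")
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    nchar = lengths.pop()
    width = max(len(name) for name in alignment)
    matrix = "\n".join(
        f"    {name.ljust(width)}  {seq}" for name, seq in alignment.items()
    )
    return (
        "#NEXUS\n"
        "begin data;\n"
        f"  dimensions ntax={len(alignment)} nchar={nchar};\n"
        "  format datatype=dna missing=? gap=-;\n"
        "  matrix\n"
        f"{matrix}\n"
        "  ;\n"
        "end;\n"
        "begin mrbayes;\n"
        # Illustrative model choice: six substitution types, gamma rates.
        "  lset nst=6 rates=gamma;\n"
        f"  mcmc ngen={ngen} nchains={nchains} samplefreq=100;\n"
        "end;\n"
    )


if __name__ == "__main__":
    print(build_nexus({"taxon_A": "ACGT-A", "taxon_B": "ACGTTA",
                       "taxon_C": "AC?TTA"}))
```

Keeping the model and chain settings inside the same file as the data is what allows a portal to submit a single self-describing input to the Grid job.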
The present work reports the activity carried out by the Italian INFN to adopt robot certificates in e-Science. It demonstrates how it is possible to access and exploit the massive potential of grid technology without worrying about the complexity of GSI authentication. The benefits are far-reaching for several user communities and applications, and the results can easily be extended to other scientific domains and different applications. The GENIUS Grid Portal is the official portal of the GILDA t-Infrastructure for Grid dissemination and training. It is managed by INFN in the context of the EGEE projects, but other regional Grid projects such as Trigrid and PI2S2 are adopting the GENIUS portal with success, porting to the web interface many applications running on their infrastructures and using it as a powerful gateway to grid resources with the required security. The solution evaluated and described in this manuscript is of course not restricted to the GENIUS Grid Portal and can easily be extended to other portals.
We gratefully acknowledge all the people who supported this work contributing with ideas, requirements and feedback. This work was supported in part by the MUR FIRB LIBI "Italian Laboratory for Bioinformatics", LITBIO (RBLA0332RH), ITALBIONET (RBPR05ZK2Z_001) Italian projects and by the EGEE-III and BIOINFOGRID (contract number: 026808) European projects. We would like to warmly thank Jan Just Keijser from NIKHEF for his technical support.
This article has been published as part of BMC Bioinformatics Volume 10 Supplement 6, 2009: European Molecular Biology Network (EMBnet) Conference 2008: 20th Anniversary Celebration. Leading applications and technologies in bioinformatics. The full contents of the supplement are available online at http://www.biomedcentral.com/1471-2105/10?issue=S6.
- Foster I, Kesselman C: The GRID: blueprint for a new computing infrastructure. San Francisco: Morgan Kaufmann; 1999.
- Andronico G, Barbera R, Falzone A, Lo Re G, Pulvirenti A, Rodolico A: GENIUS: a web portal for the grid. Nucl Instrum Methods Phys Res A 2003. Visit also the official GENIUS web site [https://genius.ct.infn.it/]
- Barbera R, Falzone A, Ardizzone V, Scardaci D: The GENIUS Grid Portal: Its Architecture, Improvements of Features, and New Implementations about Authentication and Authorization. In WETICE 2007. 16th IEEE International Workshops on Enabling Technologies: Infrastructure for Collaborative Enterprises; 2007.
- INFN CA [http://security.fi.infn.it/CA/]
- The Large Hadron Collider [http://www.cern.ch/lcg]
- The LIBI Italian Laboratory [http://www.libi.it/libi/ilprogettolibi]
- The EGEE Project [http://public.eu-egee.org/]
- Ronquist F, Huelsenbeck JP: MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics 2003, 19(12):1572–1574. 10.1093/bioinformatics/btg180
- Maddison DR, Swofford DL, Maddison WP: NEXUS: an extensible file format for systematic information. Syst Biol 1997, 46(4):590–621. 10.2307/2413497
- Altekar G, Dwarkadas S, Huelsenbeck JP, Ronquist F: Parallel Metropolis coupled Markov chain Monte Carlo for Bayesian phylogenetic inference. Bioinformatics 2004, 20(3):407–415. 10.1093/bioinformatics/btg427
- JST (Job Submission Tool). De Sario G, Gisel A, Tulipano A, Donvito G, Maggi GP: High-throughput GRID computing for Life Sciences. In Handbook of Research on Computational Grid Technologies for Life Sciences, Biomedicine and Healthcare. Edited by Cannataro M. IGI Global (to appear). Visit also the JST web site [http://webcms.ba.infn.it/cms-software/index.html/index.php/Main/JobSubmissionTool]
- Andronico G, Ardizzone V, Barbera R, Catania R, Falzone A, Giorgio E, La Rocca G, Monforte S, Pappalardo M, Passaro G, Platania G: GILDA: The Grid INFN Virtual Laboratory for Dissemination Activities. In Testbeds and Research Infrastructures for the Development of Networks and Communities, Tridentcom 2005; 2005. Visit also the official GILDA web site [https://gilda.ct.infn.it/]
- The TRIGRID Project [http://www.trigrid.it/]
- The PI2S2 Project [http://www.pi2s2.it/]
- The LITBIO Project [http://www.litbio.org/]
- The BIOINFOGRID Project [http://www.bioinfogrid.eu/]
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.