Volume 9 Supplement 2
SWS: accessing SRS sites contents through Web Services
© Romano and Marra; licensee BioMed Central Ltd. 2008
Published: 26 March 2008
Web Services and Workflow Management Systems can support creation and deployment of network systems, able to automate data analysis and retrieval processes in biomedical research. Web Services have been implemented at bioinformatics centres and workflow systems have been proposed for biological data analysis.
New databanks are often developed by taking into account these technologies, but many existing databases do not allow a programmatic access. Only a fraction of available databanks can thus be queried through programmatic interfaces. SRS is a well know indexing and search engine for biomedical databanks offering public access to many databanks and analysis tools. Unfortunately, these data are not easily and efficiently accessible through Web Services.
We have developed ‘SRS by WS’ (SWS), a tool that makes information available in SRS sites accessible through Web Services. Information on known sites is maintained in a database, srsdb. SWS consists in a suite of WS that can query both srsdb, for information on sites and databases, and SRS sites. SWS returns results in a text-only format and can be accessed through a WSDL compliant client. SWS enables interoperability between workflow systems and SRS implementations, by also managing access to alternative sites, in order to cope with network and maintenance problems, and selecting the most up-to-date among available systems.
Development and implementation of Web Services, allowing to make a programmatic access to an exhaustive set of biomedical databases can significantly improve automation of in-silico analysis. SWS supports this activity by making biological databanks that are managed in public SRS sites available through a programmatic interface.
Technologies for the automation of biological data analysis
Biological data is available in heterogeneous information systems that are distributed over the Internet. In this situation, data integration is needed in order to achieve a better and wider view of available information, to automatically carry out analysis and/or searches involving more databases and software and to perform analysis involving large data sets. In this frame, the need is felt for a system that is able to improve the information accessibility by raising it at an automatic level.
Among current ICT technologies, Workflow Management Systems (WMS), in connection with Web Services (WS), seem to be the most promising ones. A recent paper presented a methodology for the automation of in-silico data analysis processes through workflow management systems that takes into account the synergic use of XML schema, XML data storage, data and task ontologies, Web Services, workflows systems and enactment portals .
More specifically, reasons for the setting up of WS in bioinformatics have already been presented by many authors [2–4]. WS have already been implemented by many Institutes and service centres in the biomedical field. Partial lists of Web Services for bioinformatics are available at the myGrid Wiki site  and in the Taverna web site . Also, Web Services can be retrieved and accessed through the MOBY Central , a WS archive based on BioMOBY , an open source software that implements an architecture for the discovery and distribution of biological data through Web Services.
Workflows are defined as “computerized facilitations or automations of a business process, in whole or part” (Workflow Management Coalition, WfMC) . Their goal is the implementation of data analysis processes in standardized environments and their main advantages relate to effectiveness, reproducibility, reusability of procedures and of intermediate results and traceability.
Some WMS have also been proposed in bioinformatics [10–18], the Taverna Workbench [19, 20] from the European Bioinformatics Institute (EBI, ) probably being the best known among open source applications developed by public research institutes. It is able to build complex analysis workflows, to access both remote and local processors of various kinds, to launch execution of workflows and to display different types of results, including text, web pages and various kinds of images and diagrams. Processors that can be used through the Taverna Workbench include Web Services. One kind of Web Services that can be accessed by using Taverna Workbench, are those implemented by using Soaplab [22, 23], a tool for the rapid deployment of Web Services developed at EBI by Martin Senger.
Extending available contents
New information sources are often developed by taking into account above mentioned technologies. Otherwise, many databases, developed during previous years, do not offer a programmatic access to them: this reduces the information that is available for automated analysis.
Some ways to interact with SRS 7 and to compose queries in its query language.
Show list of databanks
Show databank's information
Show databank field's information
Query multiple databanks
Query databank (return text only)
Query databank (return some fields only)
SRS used to be free for academic and no-profit institutes. Through its many public sites, it offers access to more than 1,000 databanks and about 180 analysis tools. The most important databank, like GenBank, Interpro, and Gene Ontology, can be included in many sites. This partial redundancy can be used to overcome possible sites' crashes and network faults that can occur at any time. Although some SRS libraries are subsets or subsections of other, this does not mean that they have less relevance and interest for researchers. For the sake of the SRS system they effectively are different databases, each of them having its special interest. Advantages of considering such libraries as real databases include, for subsets, improvement of performances, and, for subsections and views, simplified data management. The latter topic is especially important when considering automated analysis. Unfortunately, SRS libraries are not accessible through WS. So, a tool that would allow to interact with these databanks through Web Services would be extremely useful.
Previous work for Web Services for SRS libraries
To our knowledge, the only attempt to develop a system that is able to interact with SRS through Web Services was made by the group of Prof. Douglas Kell at the University of Manchester . They developed srs2acd, a set of java classes that are able to create AJAX Command Definition (ACD)  files for accessing each databank that is available in an SRS site, given its main URL. These files can then be used for deploying Web Services by using the Soaplab tool. Each one of the resulting Web Services is then specially devoted to a unique implementation, i.e. to one database in one SRS site. Main limitations of this tool are that each implementation has a related Web Service, thus producing a high number of services, and that access to alternative, but equivalent, WS is not automatically managed.
The list of public SRS sites
A list of SRS public sites  is maintained by BioWisdom Ltd . This information is available as a simple, partially structured, HTML page. The list allows to identify all available copies of a given database and to compare them on the basis of their number of entries. It also provides the status of each site at a given date, although it is only updated once per day.
As previously said, the same library can be included in many SRS sites. This duplication and partial overlap can support fault tolerance and help overcome sites' crashes and network faults. An analysis of the status of sites (active and non-active) over a period of 65 consecutive days showed that less than the 50% of sites was always active in that period.
Transparent access to SRS contents
A system for accessing SRS contents, without the need of specifying which implementation should be queried, would be useful. This ‘transparent’ access to SRS would allow researchers using workflow management systems to abstract their workflow from the connection details and would offer them a way for avoiding network problems. Such system should be able to check which sites are available at the time of execution of the workflow and select the ‘best’ site, i.e. the site including the most up-to-date information, among active sites.
In this paper, we present a set of Web Services that makes information available in public SRS sites accessible as a whole, not singularly, through a programmatic access. It enables WMS to access all active SRS sites and to query needed libraries. It also manages access to alternative, but equivalent implementations, by also selecting the most up-to-date among available systems.
A suite of Web Services
We have developed SWS (SRS by WS), a suite of WS allowing to query biological databases available in public SRS sites, without specifying which one, and to return results in a simple text-only format. It allows to check sites, to query selected libraries and to retrieve essential information on sites, such as lists of included databases and tools, and on databases, such as sites where they are implemented and related sizes/versions.
SWS can be invoked by specifying the name of the databank to be queried and the query terms. It then automatically choose the best site, performs the query and returns complete results. Users can also specify the following information: the SRS site to be queried, the fields where the information must be searched, the desired output fields.
Available Web Services
SWS currently includes five Web Services. The following three WS allow to retrieve information on available databases. getDBs retrieves acronyms of all libraries that are available in a specified site, or in all known sites, if none is specified. Similarly, getSites retrieves acronyms of all SRS sites that include a specified library, or of all sites if no library is specified. Finally, getImplementations retrieves all implementations of a specified library. These WS are only available for informative reasons, they do not actually query any SRS site. Instead, results can be used to identify available libraries and to choose sites and libraries to be queried in following steps.
The fourth WS, querySWS, allows to actually perform queries on a specified library. The query (i.e. terms that must be searched in the database) is a mandatory parameter. The site, instead, can be omitted. In this case, SWS identifies the best one by selecting, among those that are active, the site where that specific library has the greatest number of entries and, when more sites have the same number, the most recent version of SRS (this function is currently limited to SRS versions 6 and 7). Further parameters of this WS allow to determine which parts (fields) of the library must be queried, and which parts of the entries (records) must be returned.
Finally, the fifth WS, testSites, allows to check for the availability of a site at a given time. Input parameters are the acronym of the site to be checked, the number of retries (in case of initial failure) and the time between retries. All these parameters are optional. When no site is specified, all sites are tested.
SWS Web Services and related inputs and output.
Web Service name and description
getDBs Retrieve data from srsdb about libraries.
lib: the acronym of the library. Default value: ALL
Output: list of libraries' acronyms
getSites Retrieve data from srsdb about SRS sites.
site: the acronym of the site. Default value: ALL
Output: list of sites and related info
getImplementations Retrieve data from srsdb about implementations of libraries in SRS public sites.
lib: the acronym of the library. Default value: ALL
site: the acronym of the site. Default value: ALL
Output: list of implementations and related info
testSites Check actual availability of sites.
site: the acronym of the site. Default value: ALL
retries: the number of retries after an initial failure. Default value: 1
time: time between retries. Default value: 5 secs
activity: status of sites in one of the forms: <site> is active<site> is not active
querySWS Query a specified library and return entries related to queried terms.
lib: the acronym of the library. No default value.
site: the acronym of the site. Default value: best site
query: the query (terms to be queried). No default value.
in_fields: list of library's fields to be searched. Default: AllText
out_fields: list of library's fields to be returned. Default: all
Entries: library's entries matching query
Availability of the tool
SWS is available on-line . Through this site, users and software agents can retrieve WSDL descriptions of Web Services included in SWS. Some WSDL compliant WMS, such as Taverna Workbench, can directly interact with SWS through this interface and execute the services.
A support site for SWS is also available on-line . In this site, a description of the software and usage information of single WS are provided, together with contacts data. We plan to allow, in the near future, downloading all files needed to implement a local version of SWS, including data structure, scripts for retrieving information on available sites, ACD definitions for a Soaplab implementation of Web Services, scripts that actually implement the services and installation instructions. SWS is being developed as an open source and it is under refinement and continuous development.
We believe automation of data analysis and retrieval processes will offer bioinformatics the possibility of implementing a really machine-oriented, distributed analysis environment. For this to happen, the development and implementation of WS that allow to make access to an exhaustive set of biomedical databases and analysis software is needed. Only a few of the many biological data sources that are currently available on-line can be queried through standard programmatic interfaces. Our tool SWS contributes to overcoming this limitation by supporting programmatic access to known SRS sites. Contrary to other similar tools, SWS offers a unique point of access for all sites because it leverages from a database of available sites, databanks and implementations that is derived by BioWisdom's list of public SRS sites and kept up-to-date by using own scripts. BioWisdom's list is a useful reference enabling SWS not to start from scratch. It is used both as a starting list and to check for new sites. An exhaustive list of public SRS sites is very difficult to achieve, we hope that in the near future we can set up an alternative list and receive information on new sites. By querying this database, SWS can determine which is the best site for each databank at any given time, and it therefore overcomes, at least partially, possible problems arising from sites' crashes and network faults. This is achieved b using the number of records included in the databank since, usually, this increases each time a new version of the databank is released. Information regarding databases' version is not usually available and, thus, it can only rarely be used. SWS also allows for the creation of workflows that are not dependent on one single site and can, therefore, work more consistently.
SWS currently presents some limitations that we plan to overcome in the near future. It is currently not able to query sites using SRS 8, due to the difficulty in building queries through wgetz with this version. This limitation can be considered a minor one, since the vast majority of public SRS sites is still based on SRS 7. We plan to overcome this limitation by collaborating with administrators of SRS sites with specific expertise on this version. Another limitation of SWS is that it can presently only be efficiently used when the SRS configuration of a databank is known. In this case, both searched and retrieved fields can be specified. This can be overcome by adding support for the retrieval of descriptions of databanks fields. Another limitation refers to the scarce information that SWS reports about the databanks and sites it used for a query. In fact, it's true that users are not informed about which data set was used. This problem is going to be faced by including detailed information on data provenance, mainly comprising date, time, site, database, db version, number of records. Finally, access to tools (analysis software) that are available in SRS sites is not possible. We don't see this point as a limitation, because of the availability of alternative Web Services offering access to this software, like, e.g., EMBOSS  related ones.
Web Services are the most promising among ICT tools, in view of the automation of network based data retrieval and analysis in biology. We developed SWS, a suite of Web Services that support query and retrieve of data from databases included in public SRS sites. These Web Services can increase the amount of data that currently is available for the setting up of complex workflows and can in fact improve automation of in-silico analysis by extending possible applications.
SWS is available for interested researchers through their workflow management systems, provided they are SOAP compliant and can use WSDL descriptions. The tool will soon be available for downloading from the SWS support site for local implementations.
SWS is currently being further developed in the sphere of the Laboratory of Interdisciplinary Technologies in Bioinformatics – LITBIO .
The list of public SRS sites
Information published in BioWisdom list of public SRS sites.
Information on the sites
Short acronym of the site, often the acronym of the institute where it resides
Long description of the site, usually the name and country of the Institute hosting the site
The URL for the home page of the SRS site (not the institute)
No of libraries and tools at the site
The number of libraries and the number of tools that are available at the site. Sites that are not currently active are labelled here with ‘Could not access site’
SRS version of the site
The number of the version of SRS that is available at the site. It ranges from 6 to 8. The most frequent version is 7.1.x .
Information on the library
Short acronym of the library, usually an internationally known acronym
Information on the implementation of a library in a site
Acronym of site, same as above
The URL of the CGI program (either wgetz or srs, depending on SRS version). Specific parameters and options are appended to this URL to compose the query.
No of entries
Number of entries of the library in the current site
Information on the implementation of a tool in a site
Acronym of the site, same as above
The URL of the CGI program, same as above for libraries
The list allows to identify all available copies of a given database. Unluckily, not all SRS sites include information on databases' version, so this data cannot be regularly used. In order to compare databanks, the number of their entries can be used instead. Usually, this number depends on the version of the database and it is higher for later releases. So, one can assume that the higher is the number of entries of a database, the most recent is the release.
The local database srsdb
Web Services were implemented by means of perl scripts. These can also be run through command lines. Two types of services were implemented, one for querying srsdb and one for querying and checking SRS sites. Services and command lines usage information are available in additional file 1.
The first three scripts, i.e. getDBs, getSites and getImplementations, just interpret line arguments, query the srsdb database and return results to standard output. The fourth script, querySWS, initially queries the local database for retrieving information on available sites and databases that is needed to perform the remote query and then it carries out the actual request by searching the chosen site. Results are retrieved by using the GNU Wget non-interactive network downloader and returned to standard output, after checking for errors. The last script, testSites, retrieves from srsdb information on sites to be checked and then check if they are active, returning this information to the user.
Web Services deployment
Web Services have been deployed by using Soaplab [22, 23], a SOAP-based Analysis Web Service tool providing a programmatic access to local, command-line applications and to the contents of ordinary web pages. The only requirements of Soaplab are the Apache Tomcat servlet engine  with the Axis SOAP toolkit , a Java Virtual Machine  and, optionally, perl  and MySQL. Once the server has been installed, new Web Services are deployed (added to the system) by defining simple descriptions of related execution commands. Definitions are written in the AJAX Command Definition (ACD) language  and are then converted into XML before they can be used by remote users.
List of abbreviations used
AJAX Command Definition
European Bioinformatics Institute
European Molecular Biology Open Software Suite
HyperText Markup Language
Information and Communication Technology
Laboratory of Interdisciplinary Technologies in Bioinformatics
Simple Object Access Protocol
Sequence Retrieval System
Structured Query Language
SRS by Web Services
Uniform Resource Locator
World Wide Web Consortium
Workflow Management Coalition
Workflow Management System
Web Services Description Language
Extensible Markup Language
This work was partially supported by the Italian Ministry of Education, University and Research (MIUR), project “Laboratory of Interdisciplinary Technologies in Bioinformatics - LITBIO”. Our system is partially based on open source and is available under GNU Lesser General Public Licence (LGPL).
This article has been published as part of BMC Bioinformatics Volume 9 Supplement 2, 2008: Italian Society of Bioinformatics (BITS): Annual Meeting 2007. The full contents of the supplement are available online at http://www.biomedcentral.com/1471-2105/9?issue=S2
- Romano P: Automation of in-silico data analysis processes through workflow management systems. Briefings in Bioinformatics 2008, 9(1):57–68. doi: 10.1093/bib/bbm056View ArticlePubMedGoogle Scholar
- Stein L: Creating a bioinformatics nation. Nature 2002, 417: 119–120.View ArticlePubMedGoogle Scholar
- Jamison DC: Open Bioinformatics. Bioinformatics 2003, 19(6):679–680. editorialView ArticlePubMedGoogle Scholar
- Neerincx P, Leunissen J: Evolution of web services in bioinformatics. Briefings in Bioinformatics 2005, 6(2):178–188.View ArticlePubMedGoogle Scholar
- List of Web Services at myGrid Wiki site [http://www.mygrid.org.uk/wiki/Mygrid/BiologicalWebServices]
- List of Web Services at Taverna Workbench web site [http://taverna.sourceforge.net/index.php?doc=services.html]
- MOBY Central[http://moby.ucalgary.ca/moby/MOBY-Central.pl]
- Wilkinson MD, Links M: BioMOBY: an open-source biological web services proposal. Briefings in Bioinformatics 2002, 3(4):331–341.View ArticlePubMedGoogle Scholar
- Workflow Management Coalition (WfMC) [http://www.wfmc.org/]
- Altintas I, Berkley C, Jaeger E, Jones M, Ludascher B, Mock S: Kepler: an extensible system for design and execution of scientific workflows, In Proceedings of 16th International Conference on Scientific and Statistical Database Management (SSDBM'04), 21–23 June 2004, Thira (Santorini), Greece 2004. IEEE Computer Society Digital Library 2004, 423–424.Google Scholar
- Tang F, Chua CL, Ho LY, Lim YP, Issac P, Krishnan A: Wildfire: distributed, Grid-enabled workflow construction and execution. BMC Bioinformatics 2005, 6(1):69.PubMed CentralView ArticlePubMedGoogle Scholar
- Leo P, Marinelli C, Pappadà G, Scioscia G, Zanchetta L: BioWBI: an Integrated Tool for building and executing Bioinformatic Analysis Workflows. In In Proceedings of the 1st Annual Meeting of the Bioinformatics Italian Society (BITS2004). Padua, Italy; 2004:25. 26–27 March 2004Google Scholar
- Taylor I, Wang I, Shields M, Majithia S: Distributed computing with Triana on the Grid. Concurrency and Computation: Practice and Experience 2005, 17(9):1197–1214.View ArticleGoogle Scholar
- Churches D, Gombas G, Harrison A, Maassen J, Robinson C, Shields M, Taylor I, Wang I: Programming Scientific and Distributed Workflow with Triana Services. In Concurrency and Computation: Practice and Experience (Special Issue: Workflow in Grid Systems) 2006, 18(10):1021–1037.View ArticleGoogle Scholar
- Ben Miled Z, Gao N, Bukhres O, Lu L, Li N, He Y, Mahoui M, Chen J: SIBIOS: A System for the Integration of Bioinformatics Services. In In Proceedings of the Second International Workshop on Challenges of Large Applications in Distributed Environments (CLADE). Honolulu, Hawaii. IEEE Xplore; 2004:74–83. 2004, 7 June 2004. doi: 10.1109/CLADE.2004.1309094Google Scholar
- Deelman E, Singha G, Sua M-H, Blythe J, Gil Y, Kesselman C, Mehta G, Vahia K, Berriman GB, Good J, Laity A, Jacob JC, Katzc DS: Pegasus: A framework for mapping complex scientific workflows onto distributed systems. Scientific Programming 2005, (13):219–237.
- Shah SP, He DY, Sawkins JN, Druce JC, Quon G, Lett D, Zheng GX, Xu T, Ouellette BF: Pegasys: software for executing and integrating analyses of biological sequences. BMC Bioinformatics 2004, 5: 40.PubMed CentralView ArticlePubMedGoogle Scholar
- Bartocci E, Corradini F, Merelli E, Scortichini L: BioWMS: a web based Workflow Management System for Bioinformatics. BMC Bioinformatics 2007, 8(Suppl 1):S2.PubMed CentralView ArticlePubMedGoogle Scholar
- Oinn T, Addis M, Ferris J, Marvin D, Senger M, Greenwood M, Carver T, Glover K, Pocock M, Wipat A, Li P: Taverna: a tool for the composition and enactment of bioinformatics workflows. Bioinformatics 2004, 20(17):3045–3054.View ArticlePubMedGoogle Scholar
- Hull D, Wolstencroft K, Stevens R, Goble C, Pocock MR, Li P, Oinn T: Taverna: a tool for building and running workflows of services. Nucleic Acids Research 2006, W729-W732.Google Scholar
- European Bioinformatics Institute [http://www.ebi.ac.uk/]
- Soaplab [http://www.ebi.ac.uk/Tools/webservices/soaplab/overview]
- Senger M, Rice P, Oinn T: Soaplab - a unified Sesame door to analysis tools. In In Proceedings of UK e-Science, All Hands Meeting Edited by: Edited by Simon J Cox. 2003, 509–513. 11–9, September 2003Google Scholar
- Etzold T, Ulyanov A, Argos P: SRS: information retrieval system for molecular biology data banks. Meth Enzymol 1996, 266: 114–128.View ArticlePubMedGoogle Scholar
- SRS at BioWisdom[http://www.biowisdom.com/navigation/srs/srs]
- Prof. Kell Group at the University of Manchester [http://dbkgroup.org/soaplab/]
- AJAX Command Definition [http://emboss.sourceforge.net/developers/acd/]
- List of public SRS sites [http://downloads.biowisdomsrs.com/publicsrs.html]
- BioWisdom [http://www.biowisdom.com/]
- SWS services [http://bioinformatics.istge.it:8080/axis/services]
- SWS support site [http://bioinformatics.istge.it/sws/]
- Rice P, Longden I, Bleasby A: EMBOSS: The European Molecular Biology Open Software Suite. Trends in Genetics 2000, 16(6):276–277.View ArticlePubMedGoogle Scholar
- Laboratory of Interdisciplinary Technologies for Bioinformatics (LITBIO) [http://www.litbio.org/]
- MySQL site [http://www.mysql.com/]
- Tomcat Apache servlet site [http://tomcat.apache.org/]
- Axis toolkit site [http://ws.apache.org/axis/]
- Java programming language [http://java.sun.com/]
- perl programming language [http://www.perl.org/]
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.