Seahawk: moving beyond HTML in Web-based bioinformatics analysis
© Gordon and Sensen; licensee BioMed Central Ltd. 2007
Received: 03 April 2007
Accepted: 18 June 2007
Published: 18 June 2007
Traditional HTML interfaces for input to and output from Bioinformatics analysis on the Web are highly variable in style, content and data formats. Combining multiple analyses can therefore be an onerous task for biologists. Semantic Web Services allow automated discovery of conceptual links between remote data analysis servers. A shared data ontology and service discovery/execution framework is particularly attractive in Bioinformatics, where data and services are often both disparate and distributed. Instead of biologists copying, pasting and reformatting data between various Web sites, Semantic Web Service protocols such as MOBY-S hold out the promise of seamlessly integrating multi-step analysis.
We have developed a program (Seahawk) that allows biologists to intuitively and seamlessly chain together Web Services using a data-centric, rather than the customary service-centric approach. The approach is illustrated with a ferredoxin mutation analysis. Seahawk concentrates on lowering entry barriers for biologists: no prior knowledge of the data ontology, or relevant services is required. In stark contrast to other MOBY-S clients, in Seahawk users simply load Web pages and text files they already work with. Underlying the familiar Web-browser interaction is an XML data engine based on extensible XSLT style sheets, regular expressions, and XPath statements which import existing user data into the MOBY-S format.
As an easily accessible applet, Seahawk moves beyond standard Web browser interaction, providing mechanisms for the biologist to concentrate on the analytical task rather than on the technical details of data formats and Web forms. As the MOBY-S protocol nears a 1.0 specification, we expect more biologists to adopt these new semantic-oriented ways of doing Web-based analysis, which empower them to do more complicated, ad hoc analysis workflow creation without the assistance of a programmer.
The MOBY-S protocol
A key aspect of chaining together services is the ability to directly use output from one service as input to another. In the past, in order to achieve data compatibility between programs, developers would modify existing analysis software and repackage it, or develop completely new program suites. The most prominent examples of these two approaches were the command-line suites GCG and EMBOSS, respectively. In a Semantic Web approach, MOBY-S defines a centralized, world-writable data-type ontology to promote a comprehensive common semantic for biological data. Actual data instances passed on the Web have a standardized XML representation. Several graphical utility programs exist to allow developers to easily browse and edit the service and data-type ontologies, and to register new services. The four MOBY ontologies (see Figure 1) are represented using a combination of OWL and RDF, the foundations for the W3C's vision of the Semantic Web. For processing simplicity MOBY uses only a small subset of OWL's expressive power.
MOBY-S clients and their audiences
MOBY-S, as it approaches a stable 1.0 specification, has the potential to unify analysis in ways other Semantic Web efforts in the Life Sciences to date have not. A large amount of effort is thus being spent on making accessible clients. Currently, there are at least 10 different MOBY-S client programs. They serve a diverse range of niche audiences, from programmers through to average computer users. Programs can be subdivided into three categories based on the user skill they assume, from most to least:
Construct a workflow before any data instances are created (visual programming)
Dynamically build service options based on an entered data instance (standalone browsers)
Execute MOBY-S Services from within another application (embedded browsers)
In the category of visual programming tools are Taverna and REMORA. With its MOBY-S plug-in, Taverna is a Java application which allows the user to build MOBY-S workflows, and then execute the workflows on data loaded from a file, or entered manually. The development of workflows requires a degree of patience and visual programming skills, as the user is not logically guided from one action to the next. Taverna's popularity stems not from simplicity of use, but from its flexibility, robustness, and support for invoking virtually any WSDL-described Web Service. Taverna provides the ability to execute the workflow over large lists of input, making it ideal for "power-users" who want to process large datasets, but lack traditional programming skills. Settings, such as the time between successive calls to a service, can be set to prevent overloading service providers. REMORA, on the other hand, is HTML-based and acts somewhat more like a browser than Taverna: services are added sequentially, with a list of valid service options presented automatically for every input/output in turn. The user selects the MOBY-S data ontology type and namespace from a list, and object details are filled in after the workflow construction is complete. The workflow is executed, and the user is notified by e-mail. In the results page, each octagonal shape in the workflow is hyperlinked to a simple HTML representation of the data at that stage.
In the category of standalone browsers, from most extensive to simplest user interface, are Dashboard, MOWServ, Ahab and Gbrowse_moby. Dashboard is a Java application to help MOBY-S service providers register and deploy their services. It includes an interface to create and display MOBY-S Objects, primarily for service testing purposes. Dashboard is a developer-centric interface, as it exposes many of the details of the MOBY-S protocol (including the underlying XML), is service-oriented, and does not choreograph multiple chained invocations. MOWServ is an HTML-based browser where users select data-types from the MOBY-S Data Type Ontology, then fill in the required fields in a form. They proceed to the "Objects" tab, and click on the data to display a list of available services. Service executions are performed asynchronously: they are stored in the "Tasks" tab, and the user checks the status of submitted jobs periodically. While MOWServ provides significant guidance to a user, a drawback of the task queue and tab organization is that the analysis neither has a direct workflow representation for programmers, nor sequentially invokes a chain of services as a biologist might expect.
Turning to more biologist-oriented clients, Ahab is an HTML interface, where available services are shown in a hierarchy and interactive tips about namespaces, data-types and services are displayed. Data entry is simplified by Ahab's exclusive use of basic MOBY-S Objects (database namespace + ID only) to seed the analysis, even if it does somewhat limit the type of analysis available. It also provides an intuitive service-selection hierarchy and logical service chaining. Unfortunately, both the directed graph (default) and text views are cluttered by data structures and relationships only intelligible to those familiar with MOBY-S's RDF technical details. Compare this with Gbrowse_moby, the original MOBY-S client: it provides the simplest interface of all the clients, using hyperlinks on data to chain services together in a Web browser. Like Ahab, it is restricted to basic object input, but unlike Ahab it does not display any ontology hierarchies. The textual and sometimes graphical representation of Gbrowse_moby's output is both succinct and sufficient for the vast majority of Bioinformatics data (e.g. sequences, their alignments and annotations).
In the embedded browser category, several applications directly use the Java, Python or Perl MOBY-S libraries to find and/or execute MOBY-S Services. They internally create MOBY-S XML object representations for submission, and parse the service results back into some native display of the application. Such applications include BioTrawler, which visualizes protein interaction networks, and both BioFloWeb and AtiDB Client, which implicitly use the European Plant Network's MOBY-S services. Because the use of MOBY-S is programmatic, these types of applications do not have data-type and service selection interfaces, nor a MOBY-specific display interface. An exception to this rule is Seahawk: it is a standalone browser, but it can be easily embedded in existing Java applications as a pop-up menu, as in the genome browser Bluejay. This functionality is described in more detail in the Methods section.
Defining the biologist's needs
Each of the clients described above serves a niche user type, but how can we get even more biologists to adopt Semantic Web Services? Based on the strengths and weaknesses of those clients, a need was identified for an improved way for biologists to access MOBY-S services. The salient observations about existing software are:
All of the interfaces either accept only simple objects (namespace and id, in Gbrowse), or require a user to build composite objects piece-by-piece. This somewhat limits the type of analysis possible in the former case, and requires an intimate knowledge of MOBY-S's ontologies and data structures in the latter.
All of the interfaces require a user to go to a particular Web page (or a CVS download in the case of Dashboard), and manually input data. This manual effort requires the user to already be familiar with MOBY-S's object and namespace ontologies, in order to formulate the data. Users are also required to break away from their other applications to use MOBY-S.
As most data in Bioinformatics is textually represented, hypertext (HTML) interfaces are the most natural fit for displaying data (and hence its popularity as a presentation medium for MOBY client software so far). While HTML pages are easy for biologists to work with, for any given hypertext client described here, there are different pages associated with 1) MOBY data input, 2) MOBY data display and 3) MOBY service selection. Users must constantly flip between service and data page "modes" to chain together an analysis.
Using visual programming tools creates reusable workflows, but these tools are relatively difficult for biologists to use, compared to browsing in the other clients.
To address these issues, Seahawk attempts to provide:
Creating Input: The ability to modify and extend the automated linking of existing Bioinformatics data to MOBY-S Services (and to seed analysis with composite MOBY-S Objects).
Embedding: The ability to easily link MOBY-S Services into existing Bioinformatics software.
Browser Interface: More interactivity versus the HTML interfaces previously described, and improved usability versus the visual programming tools for the most common types of analysis.
Output: The ability to create workflows more easily than the visual programming clients.
With the exception of MOWServ (where objects with many fields can be built manually), the HTML-based interfaces for MOBY-S are all seeded with basic MOBY-S Objects having a (namespace, id) tuple. This assumes first of all that the user is accessing a piece of data already in existence in a database, and that the database is connected to MOBY. Unfortunately, both assumptions are often false. Users may want to analyze a new sequence that they have just elucidated in the lab and have yet to submit to a public database (pre-publication), or the data may be missing from public databases for any one of many other reasons. Even if they are accessing a published piece of data, it is quite possible that the database they are using has not yet been "hooked into" MOBY-S by any developers.
Plain text (e.g. a FastA formatted file)
HTML (e.g. an NCBI Entrez Web Page)
Rich Text (e.g. a conference proceedings)
MOBY-S object XML representation (e.g. output from a MOBY-S Service)
Data can be loaded from file:, ftp:, or http: URLs using the disk icon in Seahawk, or by simply using cut and paste, or drag and drop facilities of the operating system. This input flexibility means that the user's existing desktop files, Web links, and highlighted parts of Web pages (e.g. an NCBI Genbank entry page) can be directly manipulated and used as Seahawk analysis input.
Highlighted text is automatically turned into a MOBY-S String
Seahawk will create a MOBY-S DNASequence, RNASequence or AminoAcidSequence if 95% of the text characters are valid for that sequence type (the 5% exception is meant to deal with formatting characters, such as position numbers in the leading columns of GenBank records). The invalid characters are stripped from the data.
The text is tested against a set of regular expression rules
The regular expression and XPath rules are specified in a special rules file described in the Methods section. The three sequence object types described above are the only ontology terms hardcoded into Bluejay, but could be overridden by new regex rules if these terms change.
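The 95% validity test described above can be illustrated with a minimal Java sketch. It is not Seahawk's implementation: the class name, the alphabets, the order in which the types are tried, and the decision to ignore whitespace when computing the ratio are all assumptions made for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class SequenceGuesser {
    // Assumed alphabets, for illustration only; the text does not list the valid characters.
    private static final Map<String, String> ALPHABETS = new LinkedHashMap<>();
    static {
        // Tried in this order because the DNA letters are a subset of the amino acid letters.
        ALPHABETS.put("DNASequence", "ACGTN");
        ALPHABETS.put("RNASequence", "ACGUN");
        ALPHABETS.put("AminoAcidSequence", "ACDEFGHIKLMNPQRSTVWYX*");
    }
    private static final double THRESHOLD = 0.95;

    /** Returns the first sequence type whose alphabet covers at least 95% of the text, else "String". */
    static String guessType(String text) {
        String compact = text.replaceAll("\\s+", ""); // assumption: whitespace is not counted
        if (compact.isEmpty()) return "String";
        for (Map.Entry<String, String> rule : ALPHABETS.entrySet()) {
            int valid = stripInvalid(compact, rule.getKey()).length();
            if ((double) valid / compact.length() >= THRESHOLD) return rule.getKey();
        }
        return "String";
    }

    /** Drops every character that is not in the alphabet of the given type (case-insensitive). */
    static String stripInvalid(String text, String type) {
        String alphabet = ALPHABETS.get(type);
        if (alphabet == null) return text; // unknown type: leave the text untouched
        StringBuilder kept = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (alphabet.indexOf(Character.toUpperCase(c)) >= 0) kept.append(c);
        }
        return kept.toString();
    }
}
```

With these assumed alphabets, a GenBank-style line such as "1 acgtacgtac gtacgtacgt acgtacgtac gtacgtacgt" would be classified as a DNASequence, and the leading position number would be stripped from the resulting data.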
Embedding Seahawk in other applications
Using and improving the Web browser paradigm
This interface makes the inquiry task- and data-centric, rather than service-centric: users ask themselves what they should do next with the data, not what data they need in order to run a particular service. Seahawk improves upon the other HTML-based MOBY-S clients by avoiding constant browser-page changes. This is accomplished by displaying service choices and input parameters as pop-up menus and dialog windows respectively.
By launching a dialog rather than loading an HTML-form in the browser, Seahawk maintains its browser-display-equals-service-results philosophy. Secondary input does not enter the browser history, and because it is non-modal, the user may defer service execution while they explore other analysis choices. The user can avoid the dialog altogether, and simply use the default values, by holding down the Control key while selecting the service. This feature makes service navigation even simpler.
While Seahawk has no facility to edit the MOBY-S Objects displayed (as the edits might break logical or biological constraints of the object), the user may put object collections, individual objects, or object members on the clipboard using the "Add to clipboard" option available from every service selection menu. The clipboard allows the user to pick salient data from any step of the analysis for use later on, providing a way to arbitrarily combine information from multiple services (pages) or analysis branches (tabs).
The Seahawk software described here consists of approximately 15,000 lines of Java code. Seahawk also uses some existing Java code from the BioMOBY public code repository (hosted at cvs.open-bio.org). A theme throughout the implementation of Seahawk is to lower the entry barrier to the Semantic Web for users and developers. This is achieved in practice via the use of several XML technologies (DOM, XSLT, XPath) so that customization of Seahawk can be done without Java coding. The advantages of declarative programming (e.g. XSLT, XPath) for customization are numerous in the context of semantic data manipulation (paper in preparation), and include modularity, security, and low developer buy-in. The reader is directed to the BioMOBY Website for complete, concrete examples of the customization methodologies described in this section.
Seahawk converts raw MOBY-S XML returned from services into HTML suitable for display in a javax.swing.JEditorPane. This conversion is done using an XSLT processor (discovered at run-time using the JAX-T API), and an XSLT style sheet. The conventions used for the transformation and subsequent display are:
Seahawk will interpret any URL with a numeric XPointer (e.g. file:///foo.xml#/1/2/3/4/4) as a link to part of a MOBY-S XML document, and hence will automatically provide MOBY-S Service links when clicked, by parsing the data at that XPointer
Hyperlinks of the form http://moby/namespace_value?id=id_value will be used by Seahawk to construct basic MOBY-S database identifier objects, for linking out to relevant MOBY-S Services.
All other hyperlinks will be launched into an external browser (e.g. Firefox or Internet Explorer)
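The three hyperlink conventions listed above can be sketched as a small classifier. This is an illustration only, not Seahawk's code: the class, enum and method names are hypothetical, and it assumes that the namespace occupies the URL path and that the identifier is the sole "id" query parameter.

```java
import java.net.URI;
import java.util.regex.Pattern;

class LinkKinds {
    enum Kind { MOBY_XML_PART, MOBY_DB_IDENTIFIER, EXTERNAL }

    // A numeric XPointer such as /1/2/3/4/4: one or more "/<digits>" steps.
    private static final Pattern NUMERIC_XPOINTER = Pattern.compile("(/\\d+)+");

    static Kind classify(String href) {
        URI uri = URI.create(href);
        String fragment = uri.getFragment();
        if (fragment != null && NUMERIC_XPOINTER.matcher(fragment).matches()) {
            return Kind.MOBY_XML_PART;      // link into part of a MOBY-S XML document
        }
        String query = uri.getQuery();
        if ("http".equals(uri.getScheme()) && "moby".equals(uri.getHost())
                && query != null && query.startsWith("id=")) {
            return Kind.MOBY_DB_IDENTIFIER; // basic (namespace, id) object
        }
        return Kind.EXTERNAL;               // hand over to an external browser
    }

    /** For MOBY_DB_IDENTIFIER links only: returns {namespace, id}. */
    static String[] namespaceAndId(String href) {
        URI uri = URI.create(href);
        return new String[] { uri.getPath().substring(1), uri.getQuery().substring(3) };
    }
}
```

For example, under these assumptions the link http://moby/NCBI_gi?id=122354 (the namespace name is used here purely for illustration) would yield the pair (NCBI_gi, 122354), while an ordinary Web address would fall through to the external browser case.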
The underlying XML representation of the semantic data is always retained, even if the HTML interface is changed via new style sheets. This means that there is no risk of Seahawk inadvertently changing the data as it brokers the passage of messages from one MOBY-S Service to the next.
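As a rough illustration of the XML-to-HTML step, the following sketch applies a style sheet with the standard javax.xml.transform API, whose TransformerFactory.newInstance() call locates an available XSLT processor at run time. The class name, the toy style sheet and the simplified input document are hypothetical; they are not Seahawk's style sheet or the MOBY-S XML schema.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XmlToHtml {
    // Toy style sheet (hypothetical): renders the id attribute of a <record> element in bold.
    static final String STYLE_SHEET =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='html'/>"
      + "<xsl:template match='/record'>"
      + "<html><body><b><xsl:value-of select='@id'/></b></body></html>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    /** Transforms the XML text with the given style sheet; the input string itself is never modified. */
    static String toHtml(String xml, String styleSheet) {
        try {
            TransformerFactory factory = TransformerFactory.newInstance(); // processor found at run time
            Transformer transformer =
                factory.newTransformer(new StreamSource(new StringReader(styleSheet)));
            StringWriter html = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(html));
            return html.toString();
        } catch (TransformerException e) {
            throw new IllegalArgumentException("Transformation failed", e);
        }
    }
}
```

Because the HTML is derived output and the source XML is kept as-is, swapping in a different style sheet changes only the presentation, which is the property the paragraph above relies on.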
Creating MOBY-S data with a rules file
Seahawk provides the ability to map unstructured text, or any XML document data, into MOBY-S semantic data via a rules file. The rule set can be easily augmented as developers adopt Seahawk for their data. The rules file is written in XML, with a base element called mappings, which holds any number of object children.
Seahawk data rule format: regular expressions. Complete rules file, containing one regular expression rule for creating a basic MOBY-S NCBI Global Identifier record from a string, such as "gi|122354" or "GI:636353". Captured groups from the regex can be used to populate the MOBY-S Object fields using the standard Perl and Java syntax ($1, $2, etc.).
<regex>(?:GI|gi)[:|](\d+)</regex>
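To show how such a rule behaves, here is a minimal sketch that applies the GI pattern with java.util.regex and expands a "$1"-style template from the captured groups. The class and method names, the namespace label "NCBI_gi", and the use of find() rather than a whole-string match are assumptions for the example, not details taken from Seahawk.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GiRule {
    // The pattern from the rule: "GI" or "gi", then ':' or '|', then the digits (group 1).
    static final Pattern GI = Pattern.compile("(?:GI|gi)[:|](\\d+)");

    /** Replaces $1, $2, ... in the template with the corresponding captured groups. */
    static String expand(String template, Matcher matched) {
        String result = template;
        // Highest group first, so that "$1" does not clobber the start of "$10".
        for (int group = matched.groupCount(); group >= 1; group--) {
            result = result.replace("$" + group, matched.group(group));
        }
        return result;
    }

    /** Returns {namespace, id} for the first GI found in the text, or null if there is none. */
    static String[] apply(String text) {
        Matcher matcher = GI.matcher(text);
        if (!matcher.find()) return null;
        return new String[] { "NCBI_gi", expand("$1", matcher) }; // namespace label is assumed
    }
}
```

Under these assumptions both "gi|122354" and "GI:636353" produce an identifier object, with ids 122354 and 636353 respectively, and text without a GI produces none.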
Seahawk data rule format: XPath expressions. XPath-based rule for creating a basic MOBY-S Gene Ontology record from a DOM data source, in this case an AGAVE XML document. The DOM context node for the XPath evaluation is determined by the application.
<!-- Build a MOBY Object in the Gene Ontology namespace-->
<!-- Find gene elements w/GO classification children-->
<!-- Find the ID attribute of the above xpath result-->
Note that the "./@id" in the namespace rule is another XPath statement. Its context is the results of the xpath rule, and fetches the id attribute of the AGAVE classification element.
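The two-stage evaluation described here (a rule-level XPath, then "./@id" evaluated relative to each result) can be sketched with the standard javax.xml.xpath API. The class name, the sample document and the rule expression passed in are hypothetical stand-ins: only the classification element and its id attribute come from the text, while the gene/classification nesting and the system attribute are assumed for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathRule {
    /** Evaluates xpathRule from contextNode, then idRule relative to each matching node. */
    static List<String> ids(Node contextNode, String xpathRule, String idRule) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList hits = (NodeList) xpath.evaluate(xpathRule, contextNode, XPathConstants.NODESET);
            List<String> ids = new ArrayList<>();
            for (int i = 0; i < hits.getLength(); i++) {
                // The context of the second expression is the result of the first.
                ids.add(xpath.evaluate(idRule, hits.item(i)));
            }
            return ids;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath rule", e);
        }
    }

    /** Helper for the example: parses an XML string into a DOM document. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse XML", e);
        }
    }
}
```

For instance, calling ids(doc, ".//gene/classification[@system='GO']", "./@id") on a document containing one such classification element with id "GO:0005506" would return that single identifier, which could then populate a Gene Ontology namespace object.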
Embedding Seahawk in other applications
To simplify the use of Seahawk as a component inside another program (such as the Bluejay example given earlier), a specialized java.lang.ClassLoader was written. This ClassLoader captures all of the classes required to run Seahawk (determined by running a series of automated tests), and puts them in one JAR file. This minimalist JAR builder allows any developer to include a single JAR file in their program to access Seahawk. It is also used to minimize the applet download size. The file contains just the relevant classes to run Seahawk, such as those from the BioMOBY CVS, and from The Apache Foundation's Axis (SOAP), Xalan (XSLT), Xerces (XML parsing) and XPath packages. Whereas these packages in their totality constitute about 20 MB in JAR files, the minimized package provides a standalone, fully functional MOBY-S Services browser in less than 3 MB.
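The general idea of a class-capturing loader can be sketched as follows. This is a simplified illustration with hypothetical names, not the Seahawk ClassLoader: it only records classes requested directly through it and delegates the actual loading to its parent, whereas a real JAR builder would need to define the classes itself in order to observe their transitive dependencies and would then copy the recorded entries into a JAR file.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class RecordingClassLoader extends ClassLoader {
    private final Set<String> requested = new LinkedHashSet<>();

    RecordingClassLoader(ClassLoader parent) {
        super(parent);
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        Class<?> loaded = super.loadClass(name, resolve); // normal parent-first delegation
        synchronized (requested) {
            requested.add(name); // remember only classes that were actually found
        }
        return loaded;
    }

    /** Convenience for the example: returns false instead of throwing when a class is missing. */
    boolean tryLoad(String className) {
        try {
            loadClass(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    /** The recorded classes as JAR entry names, e.g. java/util/ArrayList.class. */
    List<String> jarEntryNames() {
        List<String> entries = new ArrayList<>();
        synchronized (requested) {
            for (String name : requested) {
                entries.add(name.replace('.', '/') + ".class");
            }
        }
        return entries;
    }
}
```

Running a test suite through such a loader gives the list of entries that a minimal JAR would have to contain, which is the principle behind the size reduction described above.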
Integrating Seahawk into other Java code. Complete Java code required to integrate Seahawk into a DOM-based application. The developer might also want to add rules specific to their application.
MobyContentGUI mGUI = MobyUtils.getMobyContentGUI(null);
// Pick the W3C DOM node 'contextNode' for rules evaluation, then...
JPopupMenu popup = new JPopupMenu();
// Evaluate Seahawk's XPath rules, starting from that node
mGUI.addPopupOptions(contextNode, popup, true); // true = asynchronous
Seahawk improves the user experience over existing MOBY-S clients with two main features: pop-up menus from hyperlinks and clipboard functionality.
Introducing hyperlinked pop-up menus to display service options has several advantages. First, the user is not sent to a new page to select the service (as happens in other clients). Treating services as hyperlinks between input and output data maintains a data-centric browsing experience for end-users. Second, the pop-up menu does not occupy any screen real-estate when not in use, but still provides a detailed (tool-tips) logical (ontology-based) hierarchy when in use. Third, the hyperlinks allow for easy object decomposition because they can be inserted for each object member without affecting the display's readability.
The clipboard helps Seahawk cross over from purely a browser to a browser/editor hybrid. The clipboard acts as a collator to MOBY-S Object Collections, allowing users to combine objects as they see fit. It also allows a user to temporarily keep data from various steps of the analysis, without keeping many tabs open. Individual members of a composite object can be chosen and added to the clipboard too, facilitating MOBY-S Object decomposition. The clipboard, like any tab, can be saved to disk, and reopened in another Seahawk session in the future.
Seahawk introduces three novel features to Web Services clients in general: data-creation-by-highlight, rule-based systems for data mapping, and service-interface-as-component for application integration. Data import and data-creation-by-highlighting together provide an important facility to the biologist: creating MOBY-S Objects with semantic meaning out of plain text. This allows the user to import an array of existing text-based data into a Semantic Web Service system, including the many standard Web resources the user is familiar with already. Such a bridge from the existing Web to the Semantic Web is essential to user adoption. Highlighting is also especially important to biologists because it allows them to easily select subsequences of DNA and protein that they deem biologically meaningful.
The unique regular expression and XPath based rule system for creating MOBY-S Objects improves the user experience both directly and indirectly. In addition to being the mechanism by which highlighting text generates structured data objects, it allows Seahawk to directly "hook into" XML-based third-party applications. Users indirectly benefit too: the rules system allows developers to easily add new data mappings, and hence new analysis possibilities.
The visual simplicity of pop-up menu service selection helps make Seahawk blend in with external applications that use it as a helper component. The focus on making Seahawk a small JAR, with an easy to use API, is meant to encourage the embedding of Seahawk within Java applications. By integrating Seahawk into their existing applications, Bioinformatics developers can provide the power of Semantic Web Services to the end-user without making them go to a separate application, and manually transfer the relevant data.
Traditionally, Web Services have been oriented towards developers, who predetermined the service to be called, then wrapped the service execution and response within another program. The real key to empowering the biologist is to have domain-specific ontologies that can help the user, rather than the programmer, select appropriate data and analysis options. The MOBY-S system provides such ontologies for Bioinformatics.
Seahawk is a MOBY-S client built on the foundation of the Web-browser interface, familiar to virtually all potential users, not just developers. Seahawk hides all of the underlying implementation details of MOBY-S from the user, lowering the barrier to using Semantic Web Services. Many features of Seahawk can be classified as either "improved" or "new" based on their degree of novelty compared to other Web Services software and especially other MOBY-S clients. To improve the end-user experience, the key on the front-end is the incorporation of UI elements that keep the experience data-centric, treating services as links between data. The key on the back-end is making it as easy as possible to create semantic data from data the user is already familiar with (primarily Web pages and flat-file records), addressed with a novel regular expression/XPath rule system, and application embedding.
Much Bioinformatics analysis happens on the Web because information and resources are scattered amongst many labs. There are three key actors in the Semantic Web for Life Sciences: users (biologists), application developers, and service providers. Seahawk lowers the barriers for user and developer adoption. Adoption of MOBY-S by service providers is gaining momentum as the protocol approaches version 1.0. A critical mass of all three actors will allow us to empower the biologist to seamlessly perform multi-step analysis in this largely Web-based field.
Availability and requirements
Project home page
Java 1.5 or higher
GNU Lesser General Public License (LGPL)
Any restrictions to use by non-academics
This work was supported by Genome Canada through Genome Alberta's Integrated and Distributed Bioinformatics Platform Project, as well as by The Alberta Science and Research Authority, Western Economic Diversification, The Alberta Network for Proteomics Innovation and the Canada Foundation for Innovation. CWS is the iCORE/Sun Microsystems Industrial Chair for Applied Bioinformatics.
Seahawk makes use of some jMOBY API code written by Martin Senger (International Rice Research Institute) and Eddie Kawas (University of British Columbia). We would also like to thank Dr. Michael Shepherd of Dalhousie University for his critical review of an earlier version of this manuscript.
- Wilkinson M, Links M: BioMOBY: an open source biological web services proposal. Briefings in Bioinformatics 2002, 3(4):331–341. 10.1093/bib/3.4.331
- Web services description language [http://www.w3.org/TR/wsdl]
- Womble D: GCG: The Wisconsin Package of sequence analysis programs. Methods in Molecular Biology 2000, 132: 3–22.
- Rice P, Longden I, Bleasby A: EMBOSS: the European Molecular Biology Open Software Suite. Trends in Genetics 2000, 16(6):276–277. 10.1016/S0168-9525(00)02024-2
- Moby registry clients [http://biomoby.open-bio.org/index.php/moby-clients/registry_clients]
- Good B, Wilkinson M, Links M: The Life Sciences Semantic Web is full of creeps! Briefings in Bioinformatics 2006, 7(3):275–286. 10.1093/bib/bbl025
- Oinn T, Addis M, Ferris J, Marvin D, Senger M, Greenwood M, Carver T, Glover K, Pocock M, Wipat A, Li P: Taverna: a tool for the composition and enactment of bioinformatics workflows. Bioinformatics 2004, 20(17):3045–54. 10.1093/bioinformatics/bth361
- Carrere S, Gouzy J: REMORA: a pilot in the ocean of BioMoby web-services. Bioinformatics 2006, 22(7):900–1. 10.1093/bioinformatics/btl001
- Kawas E, Senger M, Wilkinson MD: BioMoby extensions to the Taverna workflow management and enactment software. BMC Bioinformatics 2006, 30(7):523. 10.1186/1471-2105-7-523
- BioMoby Dashboard [http://biomoby.open-bio.org/CVS_CONTENT/moby-live/Java/docs/Dashboard.html]
- Navas-Delgado I, del Mar Rojano-Munoz M, Ramirez S, Perez A, Leon E, Aldana-Montes JF, Trelles O: Intelligent client for integrating bioinformatics services. Bioinformatics 2006, 22: 106–11. 10.1093/bioinformatics/bti740
- Ahab, a tool for surfing the BioMoby sea [http://bioinfo.icapture.ubc.ca/bgood/Ahab.html]
- Wilkinson M: Gbrowse Moby: a Web-based browser for BioMoby Services. Source Code Biol Med 2006, 1: 4. 10.1186/1751-0473-1-4
- Resource Description Framework [http://www.w3.org/RDF/]
- Wilkinson M, Schoof H, Ernst R, Haase D: BioMOBY successfully integrates distributed heterogeneous bioinformatics Web Services. The PlaNet exemplar case. Plant Physiology 2005, 138: 5–17. 10.1104/pp.104.059170
- Turinsky A, Ah-Seng A, Gordon P, Stromer J, Taschuk M, Xu E, Sensen C: Bioinformatics visualization and integration with open standards: the Bluejay genomic browser. In Silico Biology 2005, 5(2):187–198.
- States D: Bioinformatics code must enforce citation. Nature 2002, 417: 588. 10.1038/417588b
- Extensible Markup Language [http://www.w3.org/xml]
- The Seahawk MOBY End-User Applet [http://biomoby.open-bio.org/CVS_CONTENT/moby-live/Java/docs/Seahawk.html]
- The Apache Software Foundation [http://www.apache.org/]
- Ferrer-Costa C, Gelpi J, Zamakola L, Parraga I, de la Cruz X, Orozco M: PMUT: a web-based tool for the annotation of pathological mutations on proteins. Bioinformatics 2005, 21: 3176–8. 10.1093/bioinformatics/bti486
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.