Workflow-driven software design
The design of a software platform that supports the development of protocols and data management in an experimental context has to be based on and directed by the laboratory workflow. The laboratory workflow can be divided into four principal steps: 1) project definition phase, 2) experimental design and data acquisition phase, 3) data analysis and processing phase and 4) data retrieval phase (Figure 1).
Project definition phase
A scientific project starts with a hypothesis and the choice of methods required to address a specific biological question. Already during this initial phase it is crucial to define the question as specifically as possible and to capture the information in a digital form. Documents collected during the literature research should be collated with the evolving project definition for later review or for sharing with other researchers. All files collected in this period should be attached to the defined projects and experiments in the software.
Experimental design and data acquisition
Following the establishment of a hypothesis and based on preliminary experiments, the detailed design of the biological experiments is then initiated. Usually, the experimental work follows already established standard operating procedures, which have to be modified and optimized for the specific biological experiment. These protocols are defined as a sequence of protocol steps. However, well-established protocols must be kept flexible in a way that particular conditions can be changed. The typically changing parameters of standard protocol steps (e.g. fixation times, temperature changes etc.) are important to record as they are used to improve the experimental reproducibility.
Equipped with a collection of standard operating procedures, an experiment can be initiated and the data generated. In general, data acquisition comprises not only files but also observations of interest, which might be relevant for the interpretation of the results. Most often these observations disappear in paper notebooks and are not accessible in a digital form. Hence, these experimental notes should be stored and attached to the originating protocol step, experiment or project.
Data analysis and processing
After storing the raw result files, additional analysis and post-processing steps must be performed to obtain processed data for subsequent analysis. In order to extract information and to combine it in a statistically meaningful manner, multiple data sets have to be acquired. The software workflow should enable also the inclusion of external analytical steps, so that files resulting from external analysis software can be assigned to their original raw data files. Finally, the data files generated at the analysis stage should be connected to the raw data, allowing connection of the data files with the originating experimental context.
Data retrieval
By following the experimental workflow, all experimental data e.g. different files, protocols, notes etc. should be organized in a chronological and project-oriented way and continuously registered during their acquisition. An additional advantage should be the ability to search and retrieve the data. Researchers frequently have to search through notebooks to find previously uninterpretable observations. Subsequently, as the project develops, the researchers gain a different perspective and recognize that prior observations could lead to new discoveries. Therefore, the software should offer easy to use interfaces that allow searches through observation notes, projects- and experiment descriptions.
Software Architecture
iLAP is a multi-tier client-server application and can be subdivided into different functional modules which interact as self-contained units according to their defined responsibilities (see Figure 2).
Presentation tier
The presentation tier within iLAP is formed by a Web interface, using Tapestry [20] as the model view controller and an Axis Web service [21], which allows programming access to parts of the application logic. Thus, on the client side, a user requires an Internet connection and a recent Web browser with Java Applet support, available for almost every platform. In order to provide a simple, consistent but also attractive Web interface, iLAP follows usability guidelines described in [22, 23] and uses Web 2.0 technologies for dynamic content generation.
Business tier and runtime environment
The business tier is realized as view-independent application logic, which stores and retrieves datasets by communicating with the persistence layer. The internal management of files is also handled from a central service component, which persists the meta-information for acquired files to the database, and stores the file content in a file-system-based data hierarchy. The business layer also holds asynchronous services for application-internal JMS messaging and for integration of external computing resources like high-performance computing clusters. All services of this layer are implemented as Spring [24] beans, for which the Spring-internal interceptor classes provide transactional integrity.
The business tier and the persistence tier are bound by the Spring J2EE lightweight container, which manages the component-object life cycle. Furthermore, the Spring context is transparently integrated into the Servlet context of Tapestry using the HiveMind [25] container backend. This is realized by using the automatic dependency injection functionality of HiveMind which avoids integrative glue code for lookups into the Spring container. Since iLAP uses Spring instead of EJB related components, the deployment of the application only requires a standard conformed Servlet container. Therefore, the Servlet container Tomcat [26] is used, which offers not only Servlet functionality but J2EE infrastructure services [27] such as centrally configured data-sources and transaction management realized with the open source library JOTM [28]. This makes the deployment of iLAP on different servers easier, because machine-specific settings for different production environments are kept outside the application configuration.
External programming interfaces
The SOAP Web service interface for external programmatic access is realized by combining the Web service framework Axis with corresponding iLAP components. The Web service operates as an external access point for Java Applets within the Web application, as well as for external analysis and processing applications such as ImageJ.
Model driven development
In order to reduce coding and to increase the long term maintainability, the model driven development environment AndroMDA [29] is used to generate components of the persistence layer and recurrent parts from the above mentioned business layer. AndroMDA accomplishes this by translating an annotated UML-model into a JEE-platform-specific implementation using Hibernate and Spring as base technology. Due to the flexibility of AndroMDA, application external services, such as the user management system, have a clean integration in the model. Dependencies of internal service components on such externally defined services are cleanly managed by its build system.
By changing the build parameters in the AndroMDA configuration, it is also possible to support different relational database management systems. This is because platform specific code with the same functionality is generated for data retrieval. Furthermore, technology lock-in regarding the implementation of the service layers was also addressed by using AndroMDA, as the implementation of the service facade can be switched during the build process from Spring based components to distributed Enterprise Java Beans. At present, iLAP is operating on one local machine and, providing the usage scenarios do not demand it, this architectural configuration will remain. However, chosen technologies are known to work on Web server farms and crucial distribution of the application among server nodes is transparently performed by the chosen technologies.
Asynchronous data processing
The asynchronous handling of business processes is realized in iLAP with message-driven Plain Old Java Objects (POJOs). Hence, application tasks, such as the generation of image previews, can be performed asynchronously. If performed immediately, these would unnecessarily block the responsiveness of the Web front-end. iLAP delegates tasks via JMS messages to back-end services, which perform the necessary processing actions in the background.
These back-end services are also UML-modelled components and receive messages handled by the JMS provider ActiveMQ. If back-end tasks consume too many calculation resources, the separation of Web front-end and JMS message receiving services can be realized by copying the applications onto two different servers and changing the Spring JMS configuration.
For the smooth integration of external computing resources like the high-performance computing cluster or special compute nodes with limited software licenses the JClusterService is used. JClusterService is a separately developed J2EE application which enables a programmer to run generic applications on a remote execution host or high-performance computing cluster. Every application which offers a command line interface can be easily integrated by defining a service definition in XML format and accessing it via a SOAP-based programming interface from any Java-application. The execution of the integrated application is carried out either by using the internal JMS-queuing system for single host installations or by using the open source queuing systems like Sun Grid Engine (Sun Microsystems) or OpenPBS/Torque.