Volume 10 Supplement 10
Service-based analysis of biological pathways
© Zheng and Bouguettaya; licensee BioMed Central Ltd. 2009
Published: 01 October 2009
Computer-based pathway discovery is concerned with two important objectives: pathway identification and analysis. Conventional mining and modeling approaches aimed at pathway discovery are often effective at achieving either objective, but not both. Such limitations can be effectively tackled by leveraging a Web service-based modeling and mining approach.
Inspired by molecular recognitions and drug discovery processes, we developed a Web service mining tool, named PathExplorer, to discover potentially interesting biological pathways linking service models of biological processes. The tool uses an innovative approach to identify useful pathways based on graph-based hints and service-based simulation that verifies the user's hypotheses.
Web service modeling of biological processes allows the easy access and invocation of these processes on the Web. Web service mining techniques described in this paper enable the discovery of biological pathways linking these process service models. Algorithms presented in this paper for automatically highlighting interesting subgraphs within an identified pathway network enable the user to formulate hypotheses, which can be tested using our simulation algorithm, which is also described in this paper.
Biological pathways are represented as networks of interactions among biological entities such as cells, DNA, RNA and enzymes. The exposure of biological pathways is expected to deepen our understanding of how diseases come about and help expedite drug discovery for treating them. Computer-based pathway study currently relies on two main approaches to entity/process representation: free-text descriptions and computer models. Free-text based approaches used in GenBank, DIP, KEGG [3, 4], Swiss-Prot, and COPE rely on free-text annotations and narratives [7, 8] targeted at human comprehension. One major disadvantage of these approaches is their inherent lack of support for computer-based simulation of these processes. Computer models (e.g., [9–14]) of biological processes, on the other hand, while enabling computer-based simulations of biological processes, are often constructed in isolated environments, limited to the study of known pathways, and lack the ability to facilitate the discovery of new pathways. We propose to use a Web service modeling strategy to bridge the gap between the two representation approaches. Using this strategy, biological processes are modeled as Web service operations, which can first be described and published by one organization, and later discovered and invoked by independently developed applications from other organizations. A service operation may consume some input substance meeting a set of preconditions and then produce some output substance as a result of its invocation. Some of these input and output substances may themselves carry processes that are known to us and thus can also be modeled and deployed as Web services. Domain ontologies containing definitions of various entity types would be used by these Web services for describing their operation inputs and outputs.
This service oriented process modeling and deployment strategy not only allows for the identification of pathways linking processes of biological entities, as do existing natural language processing approaches (e.g., [16, 17]), but would more importantly bring about unprecedented opportunities for analyzing such pathways right on the Web through direct invocation of involved services. When enough details are captured in these process models, this in-place invocation capability presents an inexpensive and accessible alternative to existing in vitro and/or in vivo exploratory mechanisms.
The second key contribution of our work is the development of our service mining tool, named PathExplorer, used to discover potentially interesting biological pathways (i.e., composition networks) linking service models of biological processes. Unlike traditional top-down service composition approaches that are driven by specific user goals, Web service mining, which aims at the discovery of any interesting and useful compositions of Web services, is carried out in a bottom-up fashion with no such goals to guide the search process. As a result, it faces the challenge of combinatorial explosion as the number of service models increases. In search of efficient mining algorithms and a suitable framework, we drew inspiration from molecular recognitions and drug discovery methodologies and developed several key mining algorithms whose performance is linear in the number of service models involved.
In earlier work, we applied our Web service mining framework to service models of biological processes that are deployed using the Web Service Modeling eXecution environment (WSMX) for the discovery of biological pathways. These service models are expressed using both the Web Service Modeling Language (WSML) and the Web Services Description Language (WSDL). We then explored the opportunity of evaluating such pathways on the Web through direct invocation of the involved services. In a follow-up study, we extended our approach to also provide graph-based hints on discovered pathways to help the user formulate hypotheses, which can then be either confirmed or rejected based on simulation results, leading to the identification of useful pathways. In this paper, we establish the analogies between molecules and Web services, paving the way for future interdisciplinary exploration of these two seemingly unrelated subjects. We also describe in detail our graph expansion algorithms, which were not covered in that earlier work. The algorithms are used to identify subgraphs linking interesting edges and user-selected nodes within an existing pathway network. These subgraphs provide the basis for hypothesis formulation and simulation-based evaluation.
Bottom-up Web service mining inevitably exposes itself to the problem of combinatorial explosion, which, if left unaddressed, renders the mining process unscalable as the number of services involved increases. Nature, however, has provided us with ample examples of how composition takes place in a bottom-up fashion. In this section, we first establish analogies between the molecular world and the Web services world. We then draw inspiration from molecular recognitions and drug discovery processes and present our Web service recognition mechanisms and mining framework.
Analogies between molecules and Web services
The analogy between the molecular and Web service worlds continues at a more complex process level. In the chemical world, the DNA inside our cells provides a complete genetic blueprint that carries the information required to manufacture the enzyme proteins, which in turn are responsible for orchestrating our body's chemistry. The progression from DNA to mRNA to protein involves a molecular assembly line that follows a remarkable process (Figure 1(c)). Likewise, Web service composition can also involve a complex process (Figure 1(d)) using a process template as the blueprint for process flow instances. The process template is analogous to the blueprint carried by the mRNA and the process flow instance is analogous to the protein chain.
The similarities between Web services and molecules offer some interesting insights. They suggest that, like molecules that compose from the bottom up as if they were living beings, Web services can also be treated as living beings that recognize each other under the right conditions. The process analogy indicates that recognition-triggered service composition may extend to a flow network. Such a flow network can either be designed from the top down or emerge from the bottom up. As a result, instead of having to search exhaustively for interesting and useful Web service compositions and composition networks, the compositions and composition networks could form "naturally" from the bottom up, similar to what happens in the natural world.
Web service/operation recognitions
Similar to the molecular world, the natural formation of service compositions is based on automatic recognitions among Web services and their corresponding operations. We have identified the following three service/operation recognition mechanisms that are applicable to Web service models of biological processes:
When operation op1 of service s_a consumes an entity (i.e., input parameter) that in turn provides service s_b, we say that s_a:op1 inhibits s_b, as shown in Figure 2(b).
A target operation op_t indirectly recognizes a source operation op_s if op_s generates some or all input parameters of op_t, as shown in Figure 2(c). Indirect recognition is in contrast to the concept of direct recognition, where an operation can be directly invoked by another. Direct recognition is applicable to fields such as e-commerce but not pathway discovery and is thus not included here. These recognition mechanisms form the basis of the filtering algorithms in our mining framework.
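The inhibition and indirect recognition mechanisms described above can be sketched as simple set tests. This is a minimal illustration only, not the WSML-based implementation used in PathExplorer: the `Operation` and `Service` classes, the plain-string entity types, and the inputs and outputs assigned to `sensePain` and `processPain` are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operation:
    name: str
    inputs: frozenset   # ontology types consumed by the operation (assumed strings)
    outputs: frozenset  # ontology types produced by the operation (assumed strings)


@dataclass(frozen=True)
class Service:
    name: str
    provider: str       # ontology type of the entity that provides this service
    operations: tuple = ()


def inhibits(op, target):
    """op inhibits the target service if it consumes the entity providing it."""
    return target.provider in op.inputs


def indirectly_recognizes(op_t, op_s):
    """op_t indirectly recognizes op_s if op_s generates some or all inputs of op_t."""
    return bool(op_s.outputs & op_t.inputs)


# Abstract inhibition example mirroring the s_a:op1 / s_b notation in the text.
op1 = Operation("op1", frozenset({"EntityB"}), frozenset())
s_b = Service("s_b", provider="EntityB")

# Indirect recognition example; the input/output sets are assumed for illustration.
sense_pain = Operation("sensePain", frozenset({"Nociceptor"}), frozenset({"PainSignal"}))
process_pain = Operation("processPain", frozenset({"PainSignal"}), frozenset({"ReliefSignal"}))

print(inhibits(op1, s_b))                               # True
print(indirectly_recognizes(process_pain, sense_pain))  # True
print(indirectly_recognizes(sense_pain, process_pain))  # False
```

Both tests reduce to set membership or intersection on ontology types, which is what makes them cheap enough to use as a coarse filter.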
Lead compound identification and screening,
Clinical trial, and
There are several interesting observations about the process described above. First, the drug discovery process has adopted the strategy of screening molecules (step 3) using a "coarse-grained" filtering approach to quickly reduce the search space from the focused library of potential ligands to one that contains those most likely to bind to a protein target with high affinity. It then increases the computational complexity, with better accuracy, on a reduced search space for lead optimization (step 4). With a much smaller remaining space, the discovery process finally conducts the more expensive clinical study for drug evaluation. This is a powerful strategy that also applies well to the field of Web service mining.
Web service screening could take advantage of some "mining context" to narrow down the search space and identify potentially composable Web services at an early stage. The identification of composability can be achieved using a "coarse-grained" ontology-based filtering mechanism. Automatic verification and objective analysis can be applied next to a reduced pool of candidate services. A more elaborate runtime simulation mechanism can then be applied to composed Web service leads in a much smaller search space to investigate the relationships among the various composition leads involved in the composition network. Finally, expensive subjective usefulness analysis involving a human in the loop can be conducted in an even smaller search space to distinguish those that are truly useful.
Figure 2 shows the architecture of PathExplorer, which starts with scope specification, a manual phase involving a domain expert defining mining context including functional areas (e.g., cell enzyme, drug functions) and/or locales (e.g., heart, brain) where these functions reside. Based on such mining context, PathExplorer establishes a hierarchy of domain ontology indices to speed up later phases in the mining process. Scope specification is followed by several automatic phases. The first of these is search space determination, where the mining context is used to define a focused library of existing Web services as the initial pool for further mining. The next is the screening phase, where Web services in the focused library would go through filtering algorithms for the purpose of identifying potentially interesting leads of service compositions or pathway segments. The filtering algorithms are based on the three service/operation recognition mechanisms described earlier.
Based on these recognition mechanisms coupled with a publication/subscription-based algorithm, linkages between Web services and their operations in the focused library can be quickly established from the bottom up. These pathway segment leads are then semantically verified based on a subset of operation pre- and post-conditions involving binary variables (e.g., whether the input to an operation is activated) and enumerated properties (e.g., the locale of an operation input). Finally, verified pathway segment leads are linked together using our link algorithms to establish a more comprehensive pathway network.
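One way to picture a publication/subscription-based linking step is shown below. It is a hedged sketch rather than the authors' algorithm: operations "subscribe" to the ontology types they consume, and each produced type is then "published" to its subscribers, so no pair of operations is compared directly. The function name `link_operations`, the tuple layout, and the example inputs and outputs are assumptions.

```python
from collections import defaultdict


def link_operations(operations):
    """Return (source_op, entity_type, target_op) edges.

    operations: list of (name, input_types, output_types) tuples.
    """
    # Subscription pass: index every operation by the entity types it consumes.
    subscribers = defaultdict(list)
    for name, inputs, _ in operations:
        for entity_type in inputs:
            subscribers[entity_type].append(name)

    # Publication pass: each produced type is delivered to its subscribers.
    edges = []
    for name, _, outputs in operations:
        for entity_type in outputs:
            for target in subscribers.get(entity_type, ()):
                edges.append((name, entity_type, target))
    return edges


# Illustrative operations; their inputs and outputs are assumed, not taken from the paper.
ops = [
    ("sensePain", {"Nociceptor"}, {"PainSignal"}),
    ("processPain", {"PainSignal"}, {"ReliefSignal"}),
    ("senseRelief", {"ReliefSignal"}, set()),
]
print(link_operations(ops))
# [('sensePain', 'PainSignal', 'processPain'), ('processPain', 'ReliefSignal', 'senseRelief')]
```

Each operation is visited once per pass, so the indexing cost grows with the number of operations and parameters; the number of edges emitted still depends on how many subscribers share a type.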
Discovered pathways from the screening phase are input to the evaluation phase, which consists of four sub-phases. Objective evaluation identifies and highlights interesting segments of a pathway by checking whether such linkages are novel (i.e., previously unknown). An interactive session follows next with the user taking hints from these highlighted interesting segments within the pathway network and picking a handful of nodes representing services, operations and parameters to pursue further. PathExplorer then attempts to link these nodes into a connected graph using a subset of nodes and edges in the original graph. This subgraph provides the user the basis to formulate hypotheses. As an example, such a hypothesis may state that an increase in the dosage amount of Aspirin will lead to the relief of pain, but may inadvertently increase the risk of ulcer in the stomach. These hypotheses can be tested out via simulation, which involves PathExplorer invoking the relevant service operations, changing the quantity/attribute value of various entities involved. Simulation results showing the dynamic relationships between these biological entities are then presented to the user, whose subjective evaluation finally determines whether the pathway in pursuit is actually useful.
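The step of linking user-selected nodes into a connected subgraph can be illustrated with a simple stand-in. The sketch below is not the graph expansion algorithm of this paper; it merely unions breadth-first shortest paths from the first selected node to every other selected node, treating edges as undirected. The function names and the toy edge list are hypothetical and do not reproduce the pathway network in the figures.

```python
from collections import deque


def shortest_path(adj, start, goal):
    """Breadth-first search returning one shortest path, or None if unreachable."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in adj.get(node, ()):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None


def connect_selected(edges, selected):
    """Return (nodes, edges) of a connected subgraph covering the selected nodes."""
    selected = list(selected)
    if not selected:
        return set(), set()
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    nodes, sub_edges = {selected[0]}, set()
    for target in selected[1:]:
        path = shortest_path(adj, selected[0], target)
        if path is None:
            return None  # the selected nodes cannot be connected
        nodes.update(path)
        sub_edges.update(frozenset(pair) for pair in zip(path, path[1:]))
    return nodes, sub_edges


# Toy network with made-up edges, used only to exercise the function.
toy_edges = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "E")]
nodes, sub_edges = connect_selected(toy_edges, ["A", "C", "E"])
print(sorted(nodes))  # ['A', 'B', 'C', 'E']
```

The result uses only nodes and edges from the original graph, which matches the constraint stated above; a production algorithm would likely also weigh the highlighted interesting edges when choosing paths.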
Service-based modeling of biological processes
Pathway visualization and establishment of interesting subgraphs
When an operation is to be invoked, the algorithm checks two factors. First, it examines whether all the preconditions of the operation are met. An operation that does not have available input entities meeting its preconditions should simply not be invoked. Second, it determines how many instances are available for providing the corresponding service. This factor is needed because biological entities of the same type each have a discrete service process that deals with input and output of a finite proportion. The available instances of a particular service-providing entity will drive the amount of various other entities they may consume and/or produce. For this reason, the algorithm treats each entity node in a pathway network, such as the one shown in Figure 11, as a container of entity instances of the noted ontology type. In some cases, the service provider is also used as an input parameter. For example, the sensePain operation from the NociceptorService in Figure 3(f) has a precondition stating that the Nociceptor itself should be bound in order to provide this service. In order to express this precondition, we decided to include the service-providing entity also as an input parameter. In cases such as this, the number of service-providing instances will be determined by checking further whether each of the service-providing entity instances also meets the precondition of the corresponding operation.
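The two checks described above can be sketched as a small counting function. This is an illustrative sketch under stated assumptions, not the simulation algorithm itself: entity instances are modeled as dictionaries, preconditions as predicates, and each provider instance is assumed to handle `per_provider` inputs per step (a hypothetical parameter standing in for the "finite proportion" mentioned in the text).

```python
def invocations_possible(providers, inputs,
                         provider_ok=lambda e: True,
                         input_ok=lambda e: True,
                         per_provider=1):
    """Number of times an operation can be invoked in one simulation step.

    Returns 0 when no input instance meets the preconditions, so the
    operation is simply not invoked.
    """
    ready_providers = sum(1 for p in providers if provider_ok(p))
    ready_inputs = sum(1 for i in inputs if input_ok(i))
    return min(ready_providers * per_provider, ready_inputs)


def is_bound(entity):
    """Assumed encoding of the 'Nociceptor should be bound' precondition."""
    return entity.get("bound", False)


# sensePain: the provider (Nociceptor) doubles as the input parameter.
nociceptors = [{"bound": True}, {"bound": False}, {"bound": True}]
print(invocations_possible(nociceptors, nociceptors, is_bound, is_bound))  # 2

# A single provider instance facing five input instances (illustrative numbers).
brains = [{}]
pain_signals = [{} for _ in range(5)]
print(invocations_possible(brains, pain_signals))  # 1
```

Treating each node as a container of instances, as the text describes, is what lets the count of ready providers cap how much input is consumed in a step.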
Results and discussion
While Figures 9(a) to 9(d) clearly illustrate the relationships between Aspirin and Stomach_Cell, the relationship between the dosage amount of Aspirin and the sensation of pain is less obvious in these figures. Except for Figure 13(a), which shows some accumulation of PainSignal when the quantity of Aspirin is 10, the rest of the plots show no pattern of such accumulation or variation thereof. A closer look at the highlighted pathway in Figure 11 reveals that this is actually consistent with the way the simulation is set up. Since PainSignal is created and then converted by the Brain to ReliefSignal, which disappears after it is sensed by Nociceptor, this whole path at the bottom actually acts as a 'leaky bucket'. To examine exactly what is going on along that path, we decided to make two changes in the simulation setting. First, we reduce the maximum frequency of invoking the Brain service to half that of Nociceptor. This creates a potential imbalance between the production rates of PainSignal and ReliefSignal, since the processPain operation from the BrainService will consequently be invoked less frequently than the sensePain operation from the NociceptorService. Second, we disable the senseRelief operation of the NociceptorService. This essentially stops the leaking of the ReliefSignal that is generated as a result of the PainSignal. When we apply only the first change to the simulation, the imbalance of the processing rates for PainSignal and ReliefSignal results in a net accumulation of PainSignal when the quantity of Aspirin is 10 (Figure 13(e)). When the quantity is increased to 40 (Figure 13(f)), we see some occasional and temporary accumulation of PainSignal. Finally, we apply the second change along with the first one. Consequently, we notice that while the pattern of PainSignal's accumulation has not changed much, there is a consistent accumulation of ReliefSignal.
Since each PainSignal is eventually converted to a ReliefSignal by the Brain according to the highlighted pathway in Figure 11, the rate of ReliefSignal's accumulation actually provides a much better picture of how fast PainSignal has been generated. We see that as the dosage amount of Aspirin increases, less ReliefSignal is generated, an indication that less PainSignal has been generated. Thus, an increase in the dosage amount of Aspirin has a positive effect on the suppression of PainSignal's generation. This confirms the other half of the user's original hypothesis.
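The 'leaky bucket' behavior and the effect of the two setting changes can be reproduced qualitatively with a toy loop. The sketch below is an assumption-laden simplification, not the service-based simulation used to produce Figure 13: it ignores Aspirin entirely, uses arbitrary unit rates, and the parameter names (`brain_period`, `brain_capacity`, `sense_relief`) are hypothetical.

```python
def simulate(ticks, pain_per_tick=1, brain_period=1, brain_capacity=1,
             sense_relief=True):
    """Return a list of (PainSignal, ReliefSignal) counts after each tick."""
    pain = relief = 0
    history = []
    for t in range(ticks):
        pain += pain_per_tick              # sensePain produces PainSignal
        if t % brain_period == 0:          # processPain runs every brain_period ticks
            converted = min(pain, brain_capacity)
            pain -= converted
            relief += converted
        if sense_relief:                   # senseRelief drains ReliefSignal (the leak)
            relief = 0
        history.append((pain, relief))
    return history


print(simulate(10)[-1])                                     # (0, 0)
print(simulate(10, brain_period=2)[-1])                     # (5, 0)
print(simulate(10, brain_period=2, sense_relief=False)[-1]) # (5, 5)
```

With balanced rates nothing accumulates; halving the Brain's invocation frequency lets PainSignal build up; additionally disabling senseRelief makes ReliefSignal accumulate, so its growth tracks how much PainSignal has been processed.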
Simulation results such as those presented in Figure 13 provide useful information to a pathway analyst. They can be used to determine whether further, more expensive in vitro and/or in vivo experiments are needed. If enough details are captured in the process models that the simulation is based on, then the simulation itself would present an inexpensive and accessible alternative to existing in vitro and/or in vivo exploratory mechanisms. Using the service-oriented simulation environment, the interrelationships among the various entities involved in the pathway network can now be exposed in a more holistic fashion than with traditional text-based pathway discovery mechanisms, which inherently lack simulation capability.
We proposed to model biological processes as Web services to bridge the gap between free-text descriptions and traditional computer models of these processes. We presented our service mining tool, named PathExplorer, and demonstrated the feasibility of applying our service mining strategy to the discovery of pathways linking service models of biological processes. We described how PathExplorer identifies interesting segments in a pathway graph and automatically establishes a connected graph linking nodes that the user is interested in exploring. The graph, which is highlighted inside the discovered pathway network, provides the user with a basis for formulating hypotheses, which can then be tested through simulation.
List of abbreviations used
(Graph Markup Language)
OWL-S (Web Ontology Language based Web service ontology)
SOAP (Simple Object Access Protocol)
WSDL (Web Services Description Language)
WSDL-S (WSDL with Semantics)
WSML (Web Service Modeling Language)
WSMT (Web Service Modeling Toolkit)
WSMX (Web Service Modeling eXecution environment)
XML (eXtensible Markup Language).
We would like to thank Maciej Zaremba from the National University of Ireland for his help on WSMX related issues.
This article has been published as part of BMC Bioinformatics Volume 10 Supplement 10, 2009: Semantic Web Applications and Tools for Life Sciences, 2008. The full contents of the supplement are available online at http://www.biomedcentral.com/1471-2105/10?issue=S10.
- Database of Interacting Proteins [http://dip.doe-mbi.ucla.edu/]
- Kyoto Encyclopedia of Genes and Genomes [http://www.genome.jp/kegg/]
- Kanehisa M, Goto S, Hattori M, Aoki-Kinoshita KF, Itoh M, Kawashima S, Katayama T, Araki M, Hirakawa M: From genomics to chemical genomics: new developments in KEGG. Nucleic Acids Research 2006, 34: 354–357.
- COPE – Cytokines Online Pathfinder Encyclopaedia [http://www.copewithcytokines.de/]
- Cohen J: Bioinformatics: An Introduction for Computer Scientists. ACM Computing Surveys 2004, 36(2):122–158.
- Brent R, Bruck J: Can computers help to explain biology? Nature 2006, 440(23):416–417.
- Karp PD, Paley S, Romero P: The Pathway Tools Software. Bioinformatics 2002, 18(Suppl 1):S225–32.
- de Jong H, Page M: Qualitative Simulation of Large and Complex Genetic Regulation Systems. Proceedings of the 14th European Conference on Artificial Intelligence, Berlin, Germany 2000, 141–145.
- Tomita M, Hashimoto K, Takahashi K, Shimizu TS, Matsuzaki Y, Miyoshi F, Saito K, Tanida S, Yugi K, Venter JC, Hutchison CA III: E-CELL: software environment for whole-cell simulation. Bioinformatics 1999, 15: 72–84.
- Biochemical Pathway Simulator [http://www.brc.dcs.gla.ac.uk/projects/bps/]
- Cardelli L: Abstract Machines of Systems Biology. Transactions on Computational Systems Biology III 1999, 3737: 145–168.
- Zheng G, Bouguettaya A: Web Service Mining for Biological Pathway Discovery. In Proceedings of the 2nd International Workshop on Data Integration in the Life Sciences (DILS 2005), of Lecture Notes in Computer Science. Volume 3615. Edited by: Ludäscher B, Raschid L. San Diego, CA: Springer; 2005:292–295.
- Ng SK, Wong M: Toward Routine Automatic Pathway Discovery From On-line Scientific Text Abstracts. Genome Informatics 1999, 10: 104–112.
- Yao D, Wang J, Lu Y, Noble N, Sun H, Zhu X, Lin N, Payan DG, Li M, Qu K: PathwayFinder: Paving The Way Toward Automatic Pathway Extraction. In APBC '04: Proceedings of the Second Conference on Asia-Pacific Bioinformatics. Dunedin, New Zealand: Australian Computer Society, Inc; 2004:53–62.
- Zheng G, Bouguettaya A: Service Mining on the Web. IEEE Transactions on Services Computing 2009, 2: 65–78.
- Zheng G, Bouguettaya A: Discovering Pathways of Service Oriented Biological Processes. In Proceedings of the 9th International Conference on Web Information Systems Engineering (WISE), of Lecture Notes in Computer Science. Volume 5175. Edited by: Bailey J, Maier D, Schewe KD, Thalheim B, Wang XS. Auckland, New Zealand: Springer; 2008:189–205.
- Zheng G, Bouguettaya A: A Web Service Mining Framework. In Proceedings of IEEE International Conference on Web Services (ICWS). Salt Lake City, Utah, USA: IEEE Computer Society; 2007:1096–1103.
- Web Services Execution Environment [http://sourceforge.net/projects/wsmx]
- Zheng G, Bouguettaya A: PathExplorer: Service Mining for Biological Pathways on the Web. In Proceedings of the Workshop on Semantic Web Applications and Tools for Life Sciences (SWAT4LS), of CEUR Workshop Proceedings. Volume 435. Edited by: Burger A, Paschke A, Romano P, Splendiani A. Edinburgh, UK; 2009. [http://ftp.informatik.rwth-aachen.de/Publications/CEUR-WS/Vol-435/]
- Ball P: Designing the Molecular World – Chemistry at the Frontier. Princeton, New Jersey: Princeton University Press; 1994.
- Augen J: The evolving role of information technology in the drug discovery process. Drug Discovery Today 2002, 7: 315–323.
- NF-kappaB Pathway [http://www.cellsignal.com/reference/pathway/NF_kappaB.html]
- Auyang SY: From experience to design – The science behind Aspirin. [http://www.creatingtechnology.org/biomed/aspirin.htm]
- Landau M: Inflammatory Villain Turns Do-Gooder. [http://focus.hms.harvard.edu/2001/Aug10_2001/immunology.html]
- Apache Axis2/Java – Next Generation Web Services [http://ws.apache.org/axis2/]
- Web Services Description Language (WSDL) 1.1 [http://www.w3.org/TR/wsdl]
- The Web Service Modeling Language WSML [http://www.wsmo.org/wsml/wsml-syntax]
- OWL-S: Semantic Markup for Web Services [http://www.w3.org/Submission/OWL-S/]
- Web Services Semantics – WSDL-S [http://www.w3.org/Submission/WSDL-S/]
- The Web Service Modeling Toolkit (WSMT) [http://sourceforge.net/projects/wsmt]
- yEd – Java Graph Editor [http://www.yworks.com/en/products_yed_about.htm]
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.