- Methodology article
- Open Access
Extending pathways and processes using molecular interaction networks to analyse cancer genome data
© Glaab et al; licensee BioMed Central Ltd. 2010
- Received: 24 August 2010
- Accepted: 13 December 2010
- Published: 13 December 2010
Cellular processes and pathways, whose deregulation may contribute to the development of cancers, are often represented as cascades of proteins transmitting a signal from the cell surface to the nucleus. However, recent functional genomic experiments have identified thousands of interactions for the canonical signalling proteins, challenging the traditional view of pathways as independent functional entities. Combining information from pathway databases and interaction networks obtained from functional genomic experiments is therefore a promising strategy to obtain more robust pathway and process representations, facilitating the study of cancer-related pathways.
We present a methodology for extending pre-defined protein sets representing cellular pathways and processes by mapping them onto a protein-protein interaction network, and extending them to include densely interconnected interaction partners. The added proteins display distinctive network topological features and molecular function annotations, and can be proposed as putative new components, and/or as regulators of the communication between the different cellular processes. Finally, these extended pathways and processes are used to analyse their enrichment in pancreatic mutated genes. Significant associations between mutated genes and certain processes are identified, enabling an analysis of the influence of previously non-annotated cancer mutated genes.
The proposed method for extending cellular pathways helps to explain the functions of cancer mutated genes by exploiting the synergies of canonical knowledge and large-scale interaction data.
- Cellular Pathway
- Protein Interaction Network
- Candidate Node
- Extension Procedure
- Molecular Function Annotation
Processes and pathways, whose deregulation may contribute to the development of cancers, are often represented as cascades of proteins transmitting a signal from the cell surface to the nucleus. However, the delineation of the canonical members of these cellular pathways is based on a multitude of experimental methods, and some inconsistencies exist across databases. Indeed, the assignment of a protein to a pathway often relies on the experimental procedure and on a subjective assessment of the protein's importance for the process. Many closely associated regulators, effectors or targets of cellular pathways may therefore have been overlooked by these classical approaches. Additionally, recent functional genomics high-throughput initiatives have identified a large number of interaction partners for signalling proteins, suggesting more complex relationships between cellular pathways than in their traditional representations. In this context, the analysis of cancer mutated genes at the level of canonical cellular processes and pathways may previously have missed potentially interesting findings.
This paper introduces a new methodology to combine the information from cellular process and pathway databases with large-scale protein-protein interaction data. Previous approaches for in-silico generation of cellular processes based on molecular interaction data have constructed pathways from scratch (see [4–7]), and related approaches for disease candidate gene prioritisation also rely on interaction network data to identify molecules which are associated with a gene set [8–10]. However, to the best of the authors' knowledge, an extension approach which preserves existing process definitions has not yet been investigated.
Here, we present a procedure for extending cellular pathways and processes by mapping them onto a protein-protein interaction network and identifying densely interconnected interaction partners. Briefly, we map proteins annotated for different cellular processes onto a large protein-protein interaction network, and extend each of these processes by adding their most densely interconnected network partners (using various graph-theoretic criteria). These added proteins display distinctive network topological features and molecular function annotations and can be proposed as putative new components of the corresponding cellular process, and/or as regulators of the communication between different cellular processes. This is illustrated by the prediction of new Alzheimer disease candidate genes and the identification of proteins with potential involvement in the crosstalk between several interleukin signalling pathways.
Finally, we employ the extension procedure to investigate mutated genes from a large-scale resequencing study of pancreatic tumours. We identified many pathways and processes enriched in mutated genes, as well as cancer mutated genes predicted to be involved in specific pathway deregulations.
All data processing and analysis steps were implemented in the programming language R. The web-based pathway visualisation at http://www.infobiotics.net/pathexpand was implemented in PHP.
Interaction network construction
The human protein-protein interactions were combined from five public databases (as of July 2009): MIPS, DIP, MINT, HPRD and IntAct. We considered only experimental methods dedicated to the identification of direct binary protein interactions (see the datasets section on the webpage http://www.infobiotics.net/pathexpand). The final protein interaction network contained 9392 proteins (nodes) and 38857 interactions (edges).
The gene/protein sets corresponding to cellular pathways/processes were extracted from the public databases KEGG, BioCarta and Reactome and were mapped onto the protein interaction network. Since the interaction data do not represent the entire proteome, on average about 60% of the pathway proteins could be mapped onto the network.
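The network construction and mapping steps described above can be illustrated with a minimal Python sketch. It is not the authors' R implementation: the function names, the adjacency-dictionary representation, the dropping of self-interactions and the protein identifiers are all assumptions made for illustration, and identifier harmonisation across databases is assumed to have been done beforehand.

```python
def build_network(interaction_sources):
    """Merge binary interactions from several sources into one undirected network.

    interaction_sources: dict mapping a source name to an iterable of
    (protein_a, protein_b) pairs. Returns a dict mapping each protein to the
    set of its direct interaction partners, so duplicates collapse naturally.
    """
    adj = {}
    for pairs in interaction_sources.values():
        for a, b in pairs:
            if a == b:
                continue  # assumption: self-interactions are dropped
            adj.setdefault(a, set()).add(b)
            adj.setdefault(b, set()).add(a)
    return adj


def count_edges(adj):
    """Number of undirected edges (each edge is stored twice in adj)."""
    return sum(len(partners) for partners in adj.values()) // 2


def map_pathway(pathway_proteins, adj):
    """Return the pathway members present in the network and the mapped fraction."""
    pathway = set(pathway_proteins)
    mapped = {protein for protein in pathway if protein in adj}
    fraction = len(mapped) / len(pathway) if pathway else 0.0
    return mapped, fraction
```

For a pathway in which 6 of 10 annotated proteins occur in the network, `map_pathway` would report a mapped fraction of 0.6, which corresponds to the average coverage reported above.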
Process extension procedure
In the extension criteria, degree(v) is the number of direct links of node v, process_links(v, p) is the number of direct links from v to a node in the process p, and outside_links(v, p) is the number of direct links from v to a node outside of process p. The quantity triangle_links(v, p) is the number of triangles in which v occurs together with a node in p and another candidate node, and possible_triangles(v, p) is the number of such triangles that could potentially be formed if every other candidate node were part of a triangle formed together with v and p. The thresholds are set to T1 = 1.0, T2 = 0.1 and T3 = 0.3; this selection provided a reasonable trade-off between the number of extended pathways and the average size of the extension. For T1 = 1.0, equation 2 corresponds to a well-known condition in graph theory introduced to define "strong communities" in networks (stating that the number of connections to the pathway/community must exceed the number of connections to the rest of the graph, see ). Because a candidate node can at most be connected to all of the original pathway nodes, the threshold T3 always has to be smaller than 1 (i.e. the maximum pathway node coverage is 1).
Thus, the proteins that are added to a pathway by the procedure are both strongly associated with the original pathway members and provide an extended pathway with a compact network representation. Specifically, we expect that added proteins which increase the compactness by connecting disconnected proteins in the original pathway are more likely to be functionally similar to the pathway members. The order in which proteins are added to extend a pathway is given by a greedy strategy, i.e. the protein that increases the compactness the most is always added first.
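The link counts defined above, and two of the threshold tests, can be sketched in Python as follows. This is a hedged illustration, not the authors' R code. The function names and the adjacency-dictionary representation are assumptions, as is the definition of candidate nodes as non-process nodes with at least one direct link to the process. The coverage test (process_links divided by the process size, compared with T3) is inferred from the remark that the maximum pathway node coverage is 1. possible_triangles, the compactness score and the way the individual criteria are combined are not fully specified in this section and are therefore left out.

```python
T1 = 1.0  # threshold for the "strong community" test (value from the text)
T3 = 0.3  # threshold for the pathway node coverage test (value from the text)


def candidate_nodes(adj, process):
    """Assumed definition: nodes outside the process with a direct link into it."""
    return {v for v in adj if v not in process and adj[v] & process}


def link_counts(adj, process, v):
    """Return (degree, process_links, outside_links) for a node v outside the process."""
    degree = len(adj[v])
    process_links = len(adj[v] & process)
    outside_links = degree - process_links
    return degree, process_links, outside_links


def triangle_links(adj, process, v):
    """Count triangles formed by v, a process node and another candidate node."""
    others = candidate_nodes(adj, process) - {v}
    count = 0
    for u in adj[v] & process:
        for w in adj[v] & others:
            if w in adj[u]:
                count += 1
    return count


def strong_community_test(process_links, outside_links, t1=T1):
    """Links into the process must exceed t1 times the links to the rest of the graph."""
    # Written without a division so that outside_links == 0 needs no special case.
    return process_links > t1 * outside_links


def coverage_test(process_links, process_size, t3=T3):
    """Inferred form: fraction of process nodes directly linked to v must exceed t3."""
    return process_links / process_size > t3
```

With T1 = 1.0, `strong_community_test` accepts a candidate only if it has strictly more links into the process than to the rest of the network, which is the condition quoted in the text.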
Topological network analysis
To quantify local and global topological properties of proteins in the network, we used the web-application TopoGSA to compute five topological descriptors: the number of connections to other nodes (degree), the tendency of nodes to form clusters (clustering coefficient), their centrality in the network (betweenness and eigenvector centrality) and the distances between them (shortest path length). For a detailed explanation of these topological characteristics, see .
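Three of the five descriptors can be illustrated with a short Python sketch. It does not reproduce TopoGSA: the function names and the adjacency-dictionary input are assumptions, the clustering coefficient uses the standard local definition for undirected graphs, and betweenness and eigenvector centrality are omitted for brevity.

```python
from collections import deque


def degree(adj, v):
    """Number of direct connections of node v."""
    return len(adj[v])


def clustering_coefficient(adj, v):
    """Fraction of pairs of neighbours of v that are themselves connected."""
    neighbours = list(adj[v])
    k = len(neighbours)
    if k < 2:
        return 0.0  # convention: undefined cases are reported as 0
    links = sum(
        1
        for i in range(k)
        for j in range(i + 1, k)
        if neighbours[j] in adj[neighbours[i]]
    )
    return 2.0 * links / (k * (k - 1))


def shortest_path_length(adj, source, target):
    """Breadth-first search distance in an unweighted network (None if unreachable)."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in adj[node]:
            if neighbour == target:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None
```

Averaging such values over the proteins added to a pathway, and over a random sample of candidate proteins of the same size, gives the kind of comparison summarised in the topological properties table.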
For the cross-validation, 10% of the proteins from each pathway were removed at random from among those proteins that are connected to at least one other protein in the pathway. If the set of proteins that are connected to other pathway members covers less than 10% of the total number of proteins, we iteratively remove random proteins from this set, recomputing the set after each removal, until it is empty.
To each reduced pathway the proposed extension procedure was applied as well as 100 alternative random extensions, computed by sampling randomly the same number of proteins from the candidate proteins of the reduced pathway (see definition of candidates in the process extension section above).
P-value significance scores are estimated as the relative frequency of cases where more proteins were correctly recovered by a random extension than by the proposed extension procedure across all pathways in a database.
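The cross-validation steps above can be sketched as follows. This is an illustrative Python outline under stated assumptions, not the authors' code: the function names are hypothetical, the number of proteins to remove is rounded with a minimum of one, and proteins are removed one at a time with the eligible set recomputed after each removal, which is a simplification that covers both cases described in the text.

```python
import random


def connected_members(adj, members):
    """Members with at least one direct link to another member."""
    return {
        v for v in members
        if any(n in members and n != v for n in adj.get(v, ()))
    }


def reduce_pathway(adj, pathway, fraction=0.1, seed=0):
    """Randomly delete a fraction of the connected pathway proteins."""
    rng = random.Random(seed)
    target = max(1, round(fraction * len(pathway)))  # assumption: at least one
    remaining, removed = set(pathway), set()
    while len(removed) < target:
        eligible = sorted(connected_members(adj, remaining))
        if not eligible:
            break  # no connected protein left to delete
        victim = rng.choice(eligible)
        remaining.remove(victim)
        removed.add(victim)
    return remaining, removed


def random_extension(candidates, size, rng):
    """Random model: sample `size` proteins from the candidate proteins."""
    pool = sorted(candidates)
    return set(rng.sample(pool, min(size, len(pool))))


def count_recovered(extension, removed):
    """Number of deleted proteins that an extension adds back."""
    return len(set(extension) & set(removed))


def empirical_p_value(proposed_counts, random_counts):
    """Relative frequency of random extensions that recover more proteins.

    proposed_counts: one recovery count per pathway for the proposed extension.
    random_counts: per pathway, a list of recovery counts of the random extensions.
    """
    cases = 0
    better = 0
    for observed, randoms in zip(proposed_counts, random_counts):
        for value in randoms:
            cases += 1
            if value > observed:
                better += 1
    return better / cases
```

With 100 random extensions per pathway, an estimate of 0 means that no random extension recovered more deleted proteins than the proposed procedure.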
Semantic similarity analysis of Gene Ontology terms
We quantified pairwise similarities between protein annotations based on Jiang and Conrath's semantic similarity measure for GO terms. Using this similarity score, we computed the average GO-term similarities between all pairwise combinations of GO biological process (BP) terms for the original proteins in the cellular pathway and the added proteins. A random extension model was created by randomly sampling the same number of proteins from the candidate proteins of the pathway (see definition of candidates in the pathway extension section) as in the real extension, excluding the proteins from the extended cellular pathway under consideration. The reader should note that it is not possible to compare the extensions of real pathways to extensions of random gene/protein sets with similar connectivity in the network, because in most cases these sets would largely overlap with other pathways.
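As an illustration of the averaging step, the following Python sketch computes the Jiang and Conrath distance, IC(a) + IC(b) - 2·IC(most informative common ancestor), on a toy ontology and averages the resulting similarities over all term pairs. The toy terms, their information content (IC) values, the function names and the conversion of distance to similarity as 1 / (1 + distance) are assumptions made for this example; the section does not state which conversion or software the authors used.

```python
from itertools import product


def ancestors(parents, term):
    """Return the term together with all of its ancestors in the ontology DAG."""
    seen = {term}
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def jc_distance(parents, ic, a, b):
    """Jiang-Conrath distance based on the most informative common ancestor."""
    common = ancestors(parents, a) & ancestors(parents, b)
    ic_shared = max((ic[t] for t in common), default=0.0)
    return ic[a] + ic[b] - 2.0 * ic_shared


def jc_similarity(parents, ic, a, b):
    """Assumed conversion of the distance into a similarity in (0, 1]."""
    return 1.0 / (1.0 + jc_distance(parents, ic, a, b))


def average_similarity(parents, ic, terms_original, terms_added):
    """Mean similarity over all pairs of original and added annotation terms."""
    pairs = list(product(terms_original, terms_added))
    return sum(jc_similarity(parents, ic, a, b) for a, b in pairs) / len(pairs)


# Hypothetical toy ontology: child term -> list of parent terms.
PARENTS = {
    "signalling": ["root"],
    "kinase_signalling": ["signalling"],
    "phosphatase_signalling": ["signalling"],
    "metabolism": ["root"],
}
# Hypothetical information content values (more specific terms score higher).
IC = {
    "root": 0.0,
    "signalling": 1.0,
    "kinase_signalling": 2.0,
    "phosphatase_signalling": 3.0,
    "metabolism": 1.5,
}
```

The same `average_similarity` call applied to a random extension of equal size gives the baseline against which the real extension is compared.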
The enrichment of molecular functions among the proteins added to the cellular pathways/processes by the extension procedure was tested for all databases independently using the DAVID functional annotation clustering tool (Gene Ontology Molecular Functions and InterPro protein domains), with the proteins from the interaction network. Functional annotation clusters with more than 2-fold enrichment were selected and manually labelled.
To estimate the probability of observing certain overlaps between extended or original protein sets representing pathways and other protein sets of interest, e.g. cancer-related proteins, we used a classical over-representation analysis (ORA) based on the one-tailed Fisher exact test. To adjust for multiple testing, we employed the approach by Benjamini and Hochberg.
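A minimal Python sketch of this analysis is given below. The one-tailed Fisher exact test for over-representation is computed as the upper tail of the hypergeometric distribution, and the adjustment follows the Benjamini-Hochberg step-up procedure. The function and parameter names are hypothetical, and the sketch is not the authors' R implementation.

```python
from math import comb


def ora_p_value(n_universe, n_pathway, n_interest, n_overlap):
    """One-tailed Fisher exact test p-value for over-representation.

    Probability of observing at least n_overlap pathway members when
    n_interest proteins are drawn without replacement from n_universe
    proteins, of which n_pathway belong to the pathway.
    """
    upper = min(n_pathway, n_interest)
    total = comb(n_universe, n_interest)
    tail = sum(
        comb(n_pathway, k) * comb(n_universe - n_pathway, n_interest - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted
```

In this setting, `n_universe` would be the number of proteins in the background set, `n_pathway` the size of the original or extended pathway, `n_interest` the number of mutated genes in the background, and `n_overlap` the number of mutated genes in the pathway.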
In the following we discuss the results obtained by applying our pathway extension approach to cellular pathway and process datasets from the databases KEGG, BioCarta and Reactome. Across all databases, 1859 different processes were considered (with a minimum size of 10 proteins) and mapped onto a network containing 38857 interactions (see Methods).
Extension of cellular pathways/processes with protein interaction data predicts new putative components
Statistics on added proteins across different databases
Table rows: no. of examined pathways; no. of extended pathways; avg. pathway size; avg. size after extension; total no. of added proteins; no. of unique added proteins; molecular function categories of proteins added by the extension method (2-fold enrichment, see Methods)
Molecular function categories listed:
- Phosphatase activity, Regulator activity, Binding, Kinase inhibitor/regulator, Cytokine binding/TNF receptor
- Phosphatase activity, Regulator activity, Cytokine binding/TNF receptor
Network properties of the proteins added to the cellular pathways/processes
Topological properties of BioCarta pathway/process extensions 17
Table columns: Proposed extension, added proteins only (mean); Random model, added proteins only (mean/stddev.); Original cellular processes (mean/stddev.); All network proteins (mean/stddev.)
Table rows: Shortest path length
Functional annotations of the proteins added to the cellular pathways/processes
The extension procedure can recover known pathway proteins after deletion
A cross-validation procedure (Methods) showed that the cellular pathway extension recovers a significantly larger number of randomly deleted pathway nodes in the network than a simplistic extension based on random selection among the candidate nodes (p-values smaller than 0.01 for all databases). Specifically, none of the 100 random model extensions recovered more deleted nodes than the proposed extension method.
Prediction of new components
Based on the observations that 1) the proteins added by our method are well connected and central in the protein interaction network, 2) the added proteins display Gene Ontology annotations that match the original cellular pathway/process annotations better than those of random proteins, and are enriched in processes known to be related to cellular signalling, and 3) our method is able to recover known cellular pathway/process proteins in a cross-validation experiment, we propose to consider the proteins added by the extension procedure as new candidate components with a functional role in the corresponding cellular processes.
To illustrate the utility of our extension procedure for the prediction of new components, we analysed a cellular map modelling the process likely to be deregulated by the most penetrant Alzheimer susceptibility genes (created manually from the literature and available in the KEGG database). Our extension method added 5 different proteins to this cellular map (see http://www.infobiotics.net/pathexpand). Interestingly, three of them have previously been implicated in Alzheimer disease (TMED10, APH1B and PITX3). The two other proteins added to the Alzheimer cellular map by our method, METTL2B and MMP17, have not been linked to the disease so far, to the best of our knowledge. MMP17 is a member of the metallopeptidase protein family involved in the breakdown of the extracellular matrix. According to the HuGE Navigator, 6 other members of this protein family have been associated with Alzheimer disease. The other candidate, METTL2B, is a methyltransferase-like protein. Another member of this family, METTL10, has been associated with Alzheimer disease in a case-control study. Thus, using the Alzheimer disease pathway as a first test case of our method, we can propose MMP17 and METTL2B as new candidate disease genes.
The extension of cellular processes points to extensive communication
The involvement of some proteins in multiple processes suggests that extensive communication exists between different cellular processes. Indeed, before applying the extension procedure, about 50% of the cellular process proteins are annotated for more than one cellular process. Interestingly, after the extension procedure, the percentage of unique proteins among all proteins added to the cellular processes ranged from 30% (BioCarta) to 66% (Reactome), revealing that many proteins are added to more than one cellular process. In agreement with our observations for the original process proteins, again about 50% of the added proteins belong to more than one cellular process. Accordingly, many proteins in the protein interaction network are well connected with different cellular processes, and might therefore be expected to have a functional role in the communication between the cellular processes.
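The two percentages discussed above can be computed with a few lines of Python. The sketch below is one possible reading of these statistics, with hypothetical function and key names: the percentage of unique proteins is taken as the number of distinct added proteins divided by the total number of additions over all processes, and the multi-process percentage as the share of distinct added proteins that were added to more than one process.

```python
from collections import Counter


def sharing_statistics(added_by_process):
    """Summarise how often added proteins are shared between processes.

    added_by_process: dict mapping a process name to the collection of
    proteins added to it by the extension procedure.
    """
    counts = Counter(
        protein
        for added in added_by_process.values()
        for protein in set(added)
    )
    total_additions = sum(counts.values())
    unique = len(counts)
    multi = sum(1 for c in counts.values() if c > 1)
    return {
        "percent_unique": 100.0 * unique / total_additions,
        "percent_multi_process": 100.0 * multi / unique,
    }
```

A low `percent_unique` indicates that the same proteins are added to many processes, which is the pattern interpreted in the text as a sign of communication between processes.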
Functional enrichment of tumour mutated genes in extended cellular pathways reveals new putative regulators of cancer pathways
Large-scale tumour resequencing projects have revealed a large number of genes mutated in different cancer types [27–29]. To understand the biological significance of these mutated genes, those cellular processes containing more mutated genes than expected by chance have been identified (see for instance ).
We applied an enrichment analysis to cancer mutated genes extracted from a large-scale pancreatic resequencing study, using the extended cellular processes from BioCarta, KEGG and Reactome, and identified significant associations between different cancer types and the extended pathways (Methods).
Cellular processes enriched in pancreatic mutated genes
Table columns: Cellular process (database); ORA Q-value before/after extension; Pathway size before/after extension; Number of mutated genes in new pathway; Number of mutated genes among added genes; Mutated genes among added genes
LRP1B, TFPI2 PON1, SIGLEC11
RASIP1, RASGRP3, PLEKHG2
MAPK signaling pathway
DOCK2, MAPKBP1, SLC9A5 RASIP1, DUSP19, PLEKHG2
Cell adhesion molecules
Wnt signaling pathway
MAPKBP1, PLEKHG2, ANKRD6
Neuroactive ligand-receptor interaction
MAPKinase Signaling Pathway
Signaling by PDGF
VPS13A, LIG3 FMR2
Cell Cycle G1/S Check Point
Agrin Postsynaptic Differentiation
p38 MAPK Signaling Pathway
ALK in cardiac myocytes
Fc epsilon RI signaling pathway
DOCK2, MAPKBP1, DUSP19, ATF2, RASGRP3
ErbB signaling pathway
VPS13A, MAPKBP1, NEK8, LIG3, DUSP19, AFF2, GLTSCR1
Regulation of actin cytoskeleton
RASIP1, CDC42BPA, PLEKHG2, CYFIP1
HIV-I Nef negative effector of Fas and TNF
p53 signaling pathway
Signaling in Immune system
In conclusion, the extensions of the cell cycle G1/S and other processes provide useful explanatory information for the cancer association of these pathways/processes by adding new regulators that increase the connectivity between cancer mutated genes and other process members in the interaction network. For instance, in the G1/S process, SMAD3 is connected to other process members by adding the proteins TGIF2, GRB2 and PLAGL1, and SMAD4 is connected to the process member CDK2 by adding UHRF2. Thus, the overall coherence of the processes is increased and an expanded view of the influence of different cancer genes in these processes is obtained.
The extension of known cellular pathways and processes with densely interconnected interaction partners in a protein-protein interaction network leads to the proposal of new putative components and to the identification of mediators of the communication between the processes. Thus, by taking into account canonical knowledge as well as large-scale interaction data, the extended pathways help to explain the functions of cancer mutated genes.
The web-based pathway visualisation, details about the generation of the human protein-protein interaction network and the complete enrichment analysis results are freely available at http://www.infobiotics.net/pathexpand.
Funding: We acknowledge support by the Marie-Curie Early Stage-Training programme (grant MEST-CT-2004-007597), the Biotechnology and Biological Sciences Research Council (BB/F01855X/1), the Spanish Science and Innovation (MICINN) grant (BIO2007-66855, "functions for gene sets") and Instituto de Salud Carlos III RTIC COMBIOMED network (RD07/0067/0014). AB is supported by the Juan de la Cierva postdoctoral fellowship.
- Vogelstein B, Kinzler K: Cancer genes and the pathways they control. Nat Med 2004, 10(8):789–799. doi:10.1038/nm1087
- Lu L, Sboner A, Huang Y, Lu H, Gianoulis T, Yip K, Kim P, Montelione G, Gerstein M: Comparing classical pathways and modern networks: towards the development of an edge ontology. Trends Biochem Sci 2007, 32(7):310–321. doi:10.1016/j.tibs.2007.06.003
- Natarajan M, Lin K, Hsueh R, Sternweis P, Ranganathan R: A global analysis of cross-talk in a mammalian cellular signalling network. Nat Cell Biol 2006, 8(6):571–580. doi:10.1038/ncb1418
- Kelley R, Ideker T: Systematic interpretation of genetic interactions using protein networks. Nat Biotechnol 2005, 23(5):561–566. doi:10.1038/nbt1096
- Ulitsky I, Shamir R: Pathway redundancy and protein essentiality revealed in the Saccharomyces cerevisiae interaction networks. Mol Syst Biol 2007, 3: 104. doi:10.1038/msb4100144
- Ma X, Tarone A, Li W: Mapping genetically compensatory pathways from synthetic lethal interactions in yeast. PLoS ONE 2008, 3(4):e1922. doi:10.1371/journal.pone.0001922
- Brady A, Maxwell K, Daniels N, Cowen L: Fault tolerance in protein interaction networks: Stable bipartite subgraphs and redundant pathways. PLoS ONE 2009, 4(4):e5364. doi:10.1371/journal.pone.0005364
- Cerami E, Demir E, Schultz N, Taylor BS, Sander C: Automated Network Analysis Identifies Core Pathways in Glioblastoma. PLoS ONE 2010, 5(2):e8918. doi:10.1371/journal.pone.0008918
- Nitsch D, Tranchevent L, Thienpont B, Thorrez L, Van Esch H, Devriendt K, Moreau Y: Network analysis of differential expression for the identification of disease-causing genes. PLoS ONE 2009, 4(5):e5526. doi:10.1371/journal.pone.0005526
- Aerts S, Lambrechts D, Maity S, Van Loo P, Coessens B, De Smet F, Tranchevent L, De Moor B, Marynen P, Hassan B, Carmeliet P, Moreau Y: Gene prioritization through genomic data fusion. Nat Biotechnol 2006, 24(5):537–544. doi:10.1038/nbt1203
- Mewes H, Heumann K, Kaps A, Mayer K, Pfeiffer F, Stocker S, Frishman D: MIPS: a database for genomes and protein sequences. Nucleic Acids Res 1999, 27: 44–48. doi:10.1093/nar/27.1.44
- Xenarios I, Salwinski L, Duan X, Higney P, Kim S, Eisenberg D: DIP, the Database of Interacting Proteins: a research tool for studying cellular networks of protein interactions. Nucleic Acids Res 2002, 30: 303–305. doi:10.1093/nar/30.1.303
- Chatr-Aryamontri A, Ceol A, Palazzi L, Nardelli G, Schneider M, Castagnoli L, Cesareni G: MINT: a Molecular INTeraction database. Nucleic Acids Res 2007, (35 Database):D572–574. doi:10.1093/nar/gkl950
- Peri S, Navarro J, Amanchy R, Kristiansen T, Jonnalagadda C, Surendranath V, Niranjan V, Muthusamy B, Gandhi T, Gronborg M, Ibarrola N, Deshpande N, Shanker K, Shivashankar HN, Rashmi BP, Ramya MA, Zhao Z, Chandrika K, Padma N, Harsha HC, Yatish AJ, Kavitha MP, Menezes M, Choudhury D, Suresh S, Ghosh N, Saravana R, Chandran S, Krishna S, Joy M, et al.: Development of human protein reference database as an initial platform for approaching systems biology in humans. Genome Res 2003, 13(10):2363–2371. doi:10.1101/gr.1680803
- Hermjakob H, Montecchi-Palazzi L, Lewington C, Mudali S, Kerrien S, Orchard S, Vingron M, Roechert B, Roepstorff P, Valencia A, Margalit H, Armstrong J, Bairoch A, Cesareni G, Sherman D, Apweiler R: IntAct: an open source molecular interaction database. Nucleic Acids Res 2004, (32 Database):D452. doi:10.1093/nar/gkh052
- Kanehisa M, Goto S: KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res 2000, 28: 27–30. doi:10.1093/nar/28.1.27
- Nishimura D: BioCarta. Biotech Software & Internet Report 2001, 2(3):117–120. doi:10.1089/152791601750294344
- Joshi-Tope G, Gillespie M, Vastrik I, D'Eustachio P, Schmidt E, de Bono B, Jassal B, Gopinath GR, Wu GR, Matthews L, Lewis S, Birney E, Stein L: Reactome: a knowledgebase of biological pathways. Nucleic Acids Res 2005, (33 Database):D428.
- Radicchi F, Castellano C, Cecconi F, Loreto V, Parisi D: Defining and identifying communities in networks. Proc Natl Acad Sci USA 2004, 101(9):2658–2663. doi:10.1073/pnas.0400054101
- Glaab E, Baudot A, Krasnogor N, Valencia A: TopoGSA: network topological gene set analysis. Bioinformatics 2010, 26(9):1271–1272. doi:10.1093/bioinformatics/btq131
- Junker B, Schreiber F: Analysis of biological networks. John Wiley & Sons, Hoboken, New Jersey, USA; 2008.
- Jiang J, Conrath D: Semantic similarity based on corpus statistics and lexical taxonomy. Proc Int Conf Comp Ling 1997, 19–35.
- Dennis G Jr, Sherman B, Hosack D, Yang J, Gao W, Lane H, Lempicki R: DAVID: Database for Annotation, Visualization, and Integrated Discovery. Genome Biol 2003, 4(9):R60. doi:10.1186/gb-2003-4-9-r60
- Benjamini Y, Hochberg Y: Controlling the false discovery rate: a practical and powerful approach to multiple testing. J Roy Statist Soc Ser B (Methodological) 1995, 57: 289–300.
- Limviphuvadh V, Tanaka S, Goto S, Ueda K, Kanehisa M: The commonality of protein interaction networks determined in neurodegenerative disorders (NDDs). Bioinformatics 2007, 23(16):2129–2138. doi:10.1093/bioinformatics/btm307
- Yu W, Clyne M, Khoury M, Gwinn M: Phenopedia and Genopedia: disease-centered and gene-centered views of the evolving knowledge of human genetic associations. Bioinformatics 2010, 26: 145–146. doi:10.1093/bioinformatics/btp618
- Wood L, Parsons D, Jones S, Lin J, Sjoblom T, Leary R, Shen D, Boca S, Barber T, Ptak J, Silliman N, Szabo S, Dezso Z, Ustyanksky V, Nikolskaya T, Nikolsky Y, Karchin R, Wilson P, Kaminker J, Zhang Z, Croshaw R, Willis J, Dawson D, Shipitsin M, Willson J, Sukumar S, Polyak K, Park B, Pethiyagoda C, Pant P, et al.: The Genomic Landscapes of Human Breast and Colorectal Cancers. Science 2007, 318(5853):1108–1113. doi:10.1126/science.1145720
- Jones S, Zhang X, Parsons D, Lin J, Leary R, Angenendt P, Mankoo P, Carter H, Kamiyama H, Jimeno A, Hong S, Fu B, Lin M, Calhoun E, Kamiyama M, Walter K, Nikolskaya T, Nikolsky Y, Hartigan J, Smith DR, Hidalgo M, Leach SD, Klein A, Jaffee E, Goggins M, Maitra A, Iacobuzio-Donahue C, Eshleman J, Kern S, Hruban R, et al.: Core signaling pathways in human pancreatic cancers revealed by global genomic analyses. Science 2008, 321(5897):1801–1806. doi:10.1126/science.1164368
- Parsons D, Jones S, Zhang X, Lin J, Leary R, Angenendt P, Mankoo P, Carter H, Siu I, Gallia G, Olivi A, McLendon R, Rasheed B, Keir S, Nikolskaya T, Nikolsky Y, Busam D, Tekleab H, Diaz L, Hartigan J, Smith D, Strausberg R, Marie S, Shinjo S, Yan H, Riggins G, Bigner D, Karchin R, Papadopoulos N, Parmigiani G, et al.: An integrated genomic analysis of human glioblastoma multiforme. Science 2008, 321(5897):1807–1812. doi:10.1126/science.1164382
- Cheng H, Gao Q, Jiang M, Ma Y, Ni X, Guo L, Jin W, Cao G, Ji C, Ying K, Xu W, Gu S, Ma Y, Xie Y, Mao Y: Molecular cloning and characterization of a novel human protein phosphatase, LMW-DSP3. Int J Biochem Cell Biol 2003, 35(2):226–234. doi:10.1016/S1357-2725(02)00127-9
- Melhuish T, Gallo C, Wotton D: TGIF2 interacts with histone deacetylase 1 and represses transcription. J Biol Chem 2001, 276(34):32109–32114. doi:10.1074/jbc.M103377200
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.