Analysis of X-ray Structures of Matrix Metalloproteinases via Chaotic Map Clustering
© Giangreco et al; licensee BioMed Central Ltd. 2010
Received: 8 March 2010
Accepted: 8 October 2010
Published: 8 October 2010
Matrix metalloproteinases (MMPs) are well-known biological targets implicated in tumour progression, homeostatic regulation, innate immunity, impaired delivery of pro-apoptotic ligands, and the release and cleavage of cell-surface receptors. A clear understanding of the relationships among diverse MMPs could therefore provide a solid basis for the design of new selective MMP inhibitors. In this regard, uncovering the molecular determinants of similarity among MMPs is a key challenge.
We describe a pairwise variant of the non-parametric chaotic map clustering (CMC) algorithm and its application to 104 X-ray MMP structures. In this analysis, electrostatic potentials are computed and used as input for the CMC algorithm. The results show that differences between proteins reflect genuine variation of their electrostatic potentials. The analysis was also extended to the protein primary structures and to the molecular shapes of the MMP co-crystallised ligands.
The CMC algorithm proved to be a valuable tool for knowledge acquisition and transfer from MMP structures. Based on the variation of electrostatic potentials, CMC successfully analysed both the MMP target family landscape and the individual subsites. The first analysis yielded a rational interpretation of both the domain-organization and the substrate-specificity classifications. The second made it possible to distinguish the MMP classes, demonstrating the high specificity of the S1' pocket, and to detect both point mutations of ionisable residues and different side-chain conformations that likely account for induced-fit phenomena. In addition, CMC performed comparably to the widely used UPGMA (Unweighted Pair Group Method with Arithmetic mean) method, which is currently a standard clustering approach in bioinformatics. CMC and UPGMA gave closely comparable outcomes, but CMC often produced more informative and more easily interpretable dendrograms. Finally, CMC was successful in the standard pairwise analysis (i.e., Smith-Waterman algorithm) of protein sequences and was used to explain convincingly the complementarity between the molecular shapes of the co-crystallised ligands and the accessible MMP void volumes.
Matrix metalloproteinases (MMPs) are members of the large family of zinc-containing endopeptidases and are biologically attractive drug targets owing to their involvement in tissue remodelling and degradation of the extracellular matrix. Interest in MMPs was recently prompted by evidence that a number of synthetic inhibitors used for the treatment of various pathological states, such as inflammation, arthritis, and cancer, triggered unbalanced and, to some extent, unexpected responses of certain MMPs; in this respect, MMPs have been classified as targets, anti-targets and counter-targets. MMP catalytic domains share high sequence similarity (56-64%) and a common residue motif, HExGHxxGxxH, containing three histidines that coordinate the catalytic zinc ion. All the protein structures exhibit the characteristic fold of zinc-dependent endopeptidases, consisting of a five-stranded beta sheet (one anti-parallel and four parallel strands) and three alpha helices. The active site, shaped as a cavity crossing the entire enzyme, comprises a number of subsites directly involved in the interaction with physiological substrates and with natural or synthetic inhibitors. The human genome sequence has made it possible to characterize the entire MMP family, a gallery of proteases encoded by 26 distinct genes. This family includes the archetypal MMPs, the matrilysins, the gelatinases and the convertase-activatable MMPs. To date, at least 26 human MMPs are known, and diverse efforts have been made to classify them. The development of new analytical strategies for decoding and properly interpreting the information encoded in protein structures therefore remains an open challenge. Among others, cluster analysis is a valuable approach to this end. Clustering deals with the partitioning of a set of N elements into K groups based on a suitable similarity criterion, and it is generally performed through either parametric or non-parametric methods.
Parametric algorithms require prior knowledge of the data structure, which allows assumptions to be made, such as fixing the number of clusters to be found. The clustering problem is thus converted into an optimization task in which the best partition of the data corresponds to the minimum of a cost function: typical examples are K-means and deterministic annealing. Non-parametric methods are the optimal strategy when no prior knowledge of potential clusters is available, since they make few assumptions about the structure of the data. Examples of non-parametric methods are linkage (agglomerative and divisive) algorithms, whose output is a dendrogram displaying the complete hierarchy of clustering solutions at different scales. A recently proposed non-parametric method is chaotic map clustering (CMC). This algorithm was inspired by a study of the statistical properties of chaotic physical systems, which are exploited to obtain an optimal partition of the data. CMC has already been applied successfully to cluster data in fields ranging from medicine to engineering and finance; examples are the detection of buried land mines using dynamic infrared imaging, the study of human evolution by clustering mitochondrial DNA sequences, the analysis of electroencephalographic signals to recognize Huntington's disease, and the clustering of Dow Jones stock market companies for portfolio optimization strategies.
In the present investigation, CMC was used for the first time to analyse protein structures. The MMP family, recently analysed through different chemometric approaches aimed at studying structural differences [3, 11–13], was chosen as a case study. It represents a good benchmark, having a high number of entries in the Worldwide Protein Data Bank (wwPDB). In this regard, it is worth noting that the CMC algorithm becomes more accurate as the amount of data increases. Based mainly on electrostatic potential similarity, the present study considered more MMPs than previous investigations, greatly widening the structural boundaries of the so-called MMP target family landscape. More specifically, previous analyses were performed on a small number (i.e., 10) of MMPs to evaluate their selectivity on the basis of GRID molecular interaction fields and consensus principal component analysis (CPCA). Other studies, addressing a higher number of MMPs (i.e., 24, including 15 structures from homology modelling), estimated the similarity within the MMP subsites by taking into account ligand interaction energies. Based again on GRID/CPCA, a further analysis evaluated MMP selectivity on a larger number of proteins (i.e., 56 MMPs and 1 TACE). Finally, some of us screened all the MMP structures available in the PDB and demonstrated that the analysis of the protein sequences reproduced the MMP classification based on structural domain organization. The present analysis of protein electrostatic potential similarities proved effective in providing insight into molecular recognition and substrate specificity. CMC analysis was a successful strategy both for landscaping the entire MMP target family and for investigating the subsites responsible for molecular selectivity.
Despite their different fundamentals, the analysis of MMPs via CMC provided satisfactory results that generally match, or even outperform, those obtained with a standard approach such as the Unweighted Pair Group Method with Arithmetic mean (UPGMA) algorithm. CMC was also applied to the analysis of MMP primary structures. Finally, CMC made it possible to relate the molecular shape similarity of the co-crystallised ligands to the void volumes available in the X-ray MMP complexes.
Results and Discussion
MMP target family landscape
It was observed that MMPs of the same class were aggregated into highly homogeneous groups, except for two singletons corresponding to two stromelysins whose X-ray structures (PDB:1QIA and PDB:1QIC) lack a stretch of six residues (i.e., Phe83-Arg-Thr-Phe-Pro-Gly88).
The comparison of CMC and UPGMA (see additional file 1) revealed that congruent results were obtained. However, the dendrogram generated via CMC was more easily interpretable and, to some extent, more informative. Unlike CMC, UPGMA was unable to reproduce the classification based on domain organization, which is known as the highest level of MMP classification, and it also failed to cluster MMP-8 and MMP-1 properly (1HFC joined first the single MMP-2 structure and then the MMP-3 group). As with CMC, the UPGMA analysis confirmed that 1QIA and 1QIC were effectively different from the other members of the same class (i.e., MMP-3), and it produced a cladogram (see additional file 1) with the longest branches for these two proteins.
CMC analyses of MMP binding sites
The second stage of our investigation focused on the MMP active sites, and a number of independent CMC analyses were carried out to study: a) the significant role of the S1' pocket in determining enzyme specificity; b) the residues of the S2-S2' stretch embedding the catalytic domain, which is responsible for regulating protein function; c) the S3-S1-S3' region, a shallow area containing β-strand IV and two loops that vary slightly among MMP isoforms.
a) Analysis of the S1' subsite
Here SI is the Hodgkin index computed for every pairwise combination (i, j) of proteins, and N is the total number of protein structures. Since SI is commutative, CSIM requires only N(N-1)/2 calculations, which avoids double counting.
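The CSIM expression itself is not reproduced in this text. Assuming, from the description above, that CSIM is the mean Hodgkin index over the N(N-1)/2 unordered protein pairs, the calculation can be sketched in Python as follows; the function name `csim` and the toy matrix are illustrative only.

```python
from itertools import combinations


def csim(si):
    """Cumulative similarity: mean of SI[i][j] over all unordered pairs i < j.

    Assumption: CSIM is the average pairwise Hodgkin index. Because SI is
    commutative, only the N(N-1)/2 upper-triangle entries are visited.
    """
    n = len(si)
    pairs = list(combinations(range(n), 2))  # N(N-1)/2 pairs, no double counts
    return sum(si[i][j] for i, j in pairs) / len(pairs)


# Toy 3 x 3 similarity matrix (hypothetical values, not from the study).
toy_si = [
    [1.0, 1.0, 0.5],
    [1.0, 1.0, 0.0],
    [0.5, 0.0, 1.0],
]
print(csim(toy_si))  # → 0.5
```

With N = 104 structures, the average runs over 5,356 pairs.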
b) Analysis of the S2-S2' subsites
The S2-S2' protein regions, representing the catalytic domain, displayed a more pronounced similarity (CSIM = 0.949, N = 104) than the S1' subsite. As expected, this remarkable electrostatic similarity was directly related to the high percentage of conserved residues.
Although the detection of electrostatic differences proved even more difficult here, CMC was able to perceive the variation of electrostatic potential due to diverse polar residues. Encompassing residues 197 to 207 (MMP-8 numbering), the stretch under investigation corresponds to the well-known sequence motif HExGHxxGxxH, which is common to all MMPs. As expected, charged residues at the variable positions were immediately detected by the CMC algorithm. For instance, the presence of negatively charged residues (i.e., Glu, Asp) at position 206 caused a marked variation of the electrostatic potential, which CMC immediately perceived and which resulted in clearly distinct groups including the gelatinases, the MT-MMPs and MMP-13. In addition, CMC was able to detect different side-chain conformations and even point mutations. For instance, the gelatinases were split into two groups: the first collected the MMP-9 structures incorporating the E402Q mutation (MMP-9 numbering), while the second contained the only wild-type MMP-9 structure (PDB:1GKC) and the only X-ray MMP-2 structure (PDB:1QIB). Moreover, the MMP-14 group included one member (PDB:456C) of the MMP-13 class. This crystal structure differs from the other MMP-13 proteins because it lacks residues 104 to 109 (MMP-13 numbering), and the space left vacant is occupied by the Asp421 side chain, which adopts a different conformation. Furthermore, the MMP-12 group did not include two structures (PDB:2W0D and PDB:1JK3) owing to the E219A mutation (MMP-12 numbering). Interestingly, the UPGMA method gave comparable results: the resulting cladogram (see additional file 3) associated these elements with clearly distinguishable longer branches emerging from a fairly flat tree.
c) Analysis of the S3-S1-S3' subsites
CMC analysis of MMP primary structures
Ligand analysis via molecular shape similarity
The CMC algorithm was finally used to analyse the molecular shapes of the 84 co-crystallised ligands extracted from the 104 examined X-ray MMP structures. This analysis aimed to evaluate binding specificity towards MMPs on the basis of the complementarity between void volumes within the MMP binding sites and the molecular shapes of the co-crystallised inhibitors. Specifically, CMC made it possible to relate molecular shape similarity to even subtle differences in the MMP physicochemical environments, based on the fundamental assumption that two ligand molecules have the same shape if their volumes match exactly.
Thirdly, low molecular selectivity was immediately inferred from the high concentration of replicates (i.e., batimastat and NNGH analogues) within the same cluster, although these ligands were co-crystallised with different MMPs. Finally, the analysis of ligand molecular shapes via the CMC algorithm provided insight into selectivity beyond the backbone on the basis of the size (i.e., shallow or deep) of the S1' pocket. Inhibitors with bulkier moieties (e.g., 4-(4-phenyl-piperidin-1-yl)-benzenesulfonylamino and 4'-[(benzofuran-2-carbonyl)-amino]-biphenyl-4-sulfonylamino substituents) were grouped in the same cluster, which collected the deep-pocket MMP complexes (i.e., PDB:1B8Y, PDB:1CIZ and PDB:1CAQ as MMP-3; PDB:1ZTQ and PDB:1ROS as MMP-12).
The main objective of the present investigation was to apply the chaotic map algorithm to the clustering of MMP structures. Based on electrostatic potential values, CMC analyses afforded a comprehensive representation of the relationships among MMPs, showing that structural differences between proteins reflect genuine variation of their electrostatic potentials. In particular, CMC analysis of entire MMP structures accurately reproduced the canonical classification of MMPs based on domain organization, a result that was not attained when the analysis was repeated with UPGMA. In addition, CMC proved highly sensitive in discriminating even short protein stretches and in defining relevant areas close to the binding site. More importantly, CMC properly detected the variation of electrostatic potential caused by even single point mutations of ionisable residues. Furthermore, CMC showed a remarkable ability to capture local distortions of the electrostatic potential, probably related to the binding of small ligands that induce minor structural rearrangements of the protein. Interestingly, CMC was a valid strategy even for standard analyses such as those involving only MMP sequences. Similarly, CMC correctly related the molecular shapes of the ligands to the void volumes available within the MMP binding sites.
In this light, CMC could be a valuable alternative or complement to other clustering methods for assessing structural similarity within protein families. CMC performed comparably to UPGMA, while leading to more easily interpretable results. Incidentally, CMC is tailored to handle large amounts of data and could also have potential in database mining.
List of X-ray solved MMP structures retrieved from the wwPDB.
1CGE, 966C, 1HFC, 2TCL, 1CGL, 1CGF, 2J0T
1MNC, 1ZS0, 1ZP5, 1MMB, 1JAO, 1JAP, 1JAQ, 1JJ9, 1I76, 1I73, 1ZVX, 1BZS, 1KBC, 1JAN, 1A86, 1A85, 1JH1, 3DNG, 3DPE, 3DPF, 2OY2, 2OY4
1XUC, 1XUR, 1XUD, 1YOU, 830C, 456C, 1ZTQ, 2D1N, 1CXV, 2PJT, 2OW9, 2E2D, 2OZR
1Y93, 1RMZ, 1OS9, 1OS2, 1UTZ, 1UTT, 1JIZ, 1ROS, 1JK3, 3F15, 3F16, 3F17, 3F18, 3F19, 3F1A, 2W0D, 2HU6, 2OXU, 2OXW, 2OXZ
1B8Y, 1CIZ, 1CAQ, 1G4K, 2USN, 1USN, 1SLM, 1UEA, 1HFS, 1QIC, 1C3I, 1CQR, 1BQO, 1BIW, 1SLN, 1HY7, 1G05, 1G49, 1D5J, 1D8F, 1D7X, 1D8M, 2D1O, 1B3D, 1QIA, 1C8T
1MMP, 1MMQ, 1MMR
1GKC, 1GKD, 2OVX, 2OVZ, 2OW0, 2OW1, 2OW2
After removing water molecules and co-crystallised inhibitors, the MMPs were superimposed onto the Cartesian coordinates of the C-alpha atoms and of the side chains of the three catalytic histidines of 1ZS0, which was selected as the template.
In addition, the ligand data set comprised 84 co-crystallised ligands extracted from the pool of X-ray MMP structures.
Protein electrostatic potential similarity
Following a previous work, the electrostatic potential of each protein structure was calculated with the Adaptive Poisson-Boltzmann Solver (APBS) program. A grid of dimensions 65 × 65 × 65 Å³ with a grid spacing of 1.5 Å was used to compute the electrostatic potential via a finite-difference solution of the linearised Poisson-Boltzmann equation. The grid was centred on the global centre of mass of the superimposed structures. The dielectric constants of the solvent and the protein were set to 78 and 1, respectively. Charges were assigned using the AMBER99 force field, and hydrogen-bond network optimization was disabled so that the protonation states of all polar residues remained unchanged.
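Centring the grid on the global centre of mass of the superimposed structures amounts to a mass-weighted average of atomic coordinates. The sketch below assumes that the atoms of all structures are simply pooled together and uses standard atomic masses for a few elements common in MMP structures; the record layout `(element, x, y, z)` and the names `ATOMIC_MASS` and `global_centre_of_mass` are hypothetical.

```python
# Standard atomic masses (g/mol) for elements typical of MMP structures.
# "CA" is the element symbol of calcium, not the C-alpha atom name.
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999,
               "S": 32.06, "ZN": 65.38, "CA": 40.078}


def global_centre_of_mass(structures):
    """Mass-weighted centre of all atoms pooled across superimposed structures.

    `structures` is a list of structures, each a list of (element, x, y, z)
    records. Pooling every atom is an assumption about "global".
    """
    total_mass = 0.0
    weighted = [0.0, 0.0, 0.0]
    for atoms in structures:
        for element, x, y, z in atoms:
            m = ATOMIC_MASS[element.upper()]
            total_mass += m
            weighted[0] += m * x
            weighted[1] += m * y
            weighted[2] += m * z
    return tuple(w / total_mass for w in weighted)
```

The returned point would then serve as the grid centre for every structure, so that all potentials are sampled on the same grid points.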
The Protein Interaction Property Similarity Analysis (PIPSA) software was run with default parameters to obtain the distance matrix. Electrostatic potentials (M) were computed at points (x, y, z) on a three-dimensional grid surrounding each entire protein structure. From these potential values, Similarity Indices (SI) were computed for the grid points lying within the intersection of the "skin" regions of the two compared structures; each skin surrounds an MMP structure, starting 3 Å from its van der Waals surface and having a thickness of 4 Å. As previously documented, using the skin region allows protein shape similarity to be better taken into account.
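$$SI_{i,j} = \frac{2\,(M_i, M_j)}{(M_i, M_i) + (M_j, M_j)}$$
where $(M_i, M_j)$, $(M_i, M_i)$ and $(M_j, M_j)$ are scalar products of the potentials over the grid points in the skin intersection, e.g. $(M_i, M_j) = \sum_{x,y,z} M_i(x,y,z)\,M_j(x,y,z)$.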
It is easy to show that $SI_{i,j} = 1$ when the two potentials are identical, $SI_{i,j} = 0$ when they are uncorrelated, and $SI_{i,j} = -1$ when they are anti-correlated.
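PIPSA performs this computation internally; purely as an illustration, the Hodgkin index $SI = 2(M_i, M_j) / [(M_i, M_i) + (M_j, M_j)]$ restricted to the skin intersection can be sketched in Python as below. We read the skin as the shell lying between 3 Å and 7 Å from the van der Waals surface, which is our interpretation of the description above. The data layouts (atoms as centre/radius pairs, potentials as dictionaries keyed by grid point) and all function names are hypothetical, and this is not PIPSA's implementation.

```python
import math


def surface_distance(point, atoms):
    """Distance from a grid point to the van der Waals surface.

    `atoms` is a list of ((x, y, z), radius); negative values lie inside.
    """
    return min(math.dist(point, centre) - radius for centre, radius in atoms)


def in_skin(point, atoms, offset=3.0, thickness=4.0):
    """True if the point lies 3-7 A from the vdW surface (assumed skin bounds)."""
    d = surface_distance(point, atoms)
    return offset <= d <= offset + thickness


def hodgkin_si(pot_i, pot_j, points):
    """Hodgkin index 2(Mi,Mj) / ((Mi,Mi) + (Mj,Mj)) over the given grid points.

    `pot_i` and `pot_j` map each grid point to the potential of protein i or j.
    """
    mij = sum(pot_i[p] * pot_j[p] for p in points)
    mii = sum(pot_i[p] ** 2 for p in points)
    mjj = sum(pot_j[p] ** 2 for p in points)
    return 2.0 * mij / (mii + mjj)


def skin_similarity(pot_i, pot_j, atoms_i, atoms_j, grid):
    """SI restricted to the intersection of the two proteins' skins."""
    shared = [p for p in grid if in_skin(p, atoms_i) and in_skin(p, atoms_j)]
    return hodgkin_si(pot_i, pot_j, shared)
```

Because the structures were superimposed and share one grid, the same grid points can be looked up in both potential maps; an empty intersection or an all-zero potential would make the denominator vanish and would need separate handling.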
For each pair of proteins, the similarity value $SI_{i,j}$ was then converted into the distance value $D_{i,j}$ through equation 3, and these distances were used as input for the CMC algorithm.
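Equation 3 is not reproduced in this text. To our understanding, PIPSA conventionally converts the Hodgkin index into the distance $D_{i,j} = \sqrt{2 - 2\,SI_{i,j}}$, which maps identical potentials to 0 and anti-correlated ones to 2. Assuming that is the form of equation 3, the conversion of a similarity matrix into the CMC input can be sketched as follows (function names are hypothetical).

```python
import math


def si_to_distance(si):
    """Assumed form of equation 3: D = sqrt(2 - 2*SI), so D lies in [0, 2]."""
    # Clamp tiny negative values caused by rounding when SI is marginally above 1.
    return math.sqrt(max(0.0, 2.0 - 2.0 * si))


def similarity_to_distance_matrix(si):
    """Convert a symmetric N x N similarity matrix into a distance matrix."""
    n = len(si)
    return [[0.0 if i == j else si_to_distance(si[i][j]) for j in range(n)]
            for i in range(n)]
```

Uncorrelated potentials ($SI = 0$) end up at distance $\sqrt{2}$, halfway in the sense of squared distance between identity and anti-correlation.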
The Smith-Waterman algorithm was used to align the primary structures, with the PAM250 scoring matrix and gap-open, gap-extend and scale values of 10.0, 0.5 and 3.0, respectively. As already done for the electrostatic potential similarity values, the resulting matrix was converted into the corresponding distance matrix through equation 3.
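For readers unfamiliar with the method, a minimal Smith-Waterman local alignment with affine gap penalties (Gotoh recurrences) is sketched below; it returns only the optimal local score. To stay self-contained it uses a hypothetical match/mismatch score (`toy_score`) in place of the PAM250 matrix, it assumes that a gap of length k costs gap-open + (k - 1) × gap-extend, and it does not model the scale value. It is not the implementation used in the study.

```python
def toy_score(x, y):
    """Hypothetical stand-in for PAM250: +5 for identity, -4 otherwise."""
    return 5 if x == y else -4


def smith_waterman(a, b, score, gap_open=10.0, gap_extend=0.5):
    """Best local-alignment score of sequences a and b with affine gaps."""
    n, m = len(a), len(b)
    neg_inf = float("-inf")
    H = [[0.0] * (m + 1) for _ in range(n + 1)]      # best score ending at (i, j)
    E = [[neg_inf] * (m + 1) for _ in range(n + 1)]  # ends with a gap in a
    F = [[neg_inf] * (m + 1) for _ in range(n + 1)]  # ends with a gap in b
    best = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Either open a new gap from H or extend an existing one.
            E[i][j] = max(H[i][j - 1] - gap_open, E[i][j - 1] - gap_extend)
            F[i][j] = max(H[i - 1][j] - gap_open, F[i - 1][j] - gap_extend)
            # The 0.0 floor is what makes the alignment local.
            H[i][j] = max(0.0, H[i - 1][j - 1] + score(a[i - 1], b[j - 1]),
                          E[i][j], F[i][j])
            best = max(best, H[i][j])
    return best


print(smith_waterman("ABCDEFGH", "ABCDXEFGH", toy_score))  # → 30.0
```

In the example, eight identities (40) minus one opened gap (10) beat any ungapped local match, which is the behaviour the gap-open/gap-extend pair is meant to control.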
Ligand molecular shape similarity
The molecular shape and pairwise similarity analysis of the ligand data set was carried out with the program ROCS (Rapid Overlay of Chemical Structures, OpenEye Scientific Software), with rigid-body optimization disabled so that the protein-ligand positions remained unchanged. The Tanimoto indices calculated for all n(n - 1)/2 pairwise combinations of MMP ligands were then converted into distance values by applying equation 3, and the resulting data were stored in a square matrix for running the CMC algorithm.
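ROCS reports a shape Tanimoto computed from overlap volumes, $T_{A,B} = V_{AB} / (V_A + V_B - V_{AB})$, where $V_A$ and $V_B$ are self-overlap volumes and $V_{AB}$ is the overlap of the two molecules. The sketch below shows that formula and how the n(n - 1)/2 pairwise scores can be stored in a symmetric square matrix for CMC. The volumes are taken as given (ROCS computes them from Gaussian atomic densities), and `to_distance` is a hypothetical placeholder for equation 3, which is not reproduced here.

```python
def shape_tanimoto(v_a, v_b, v_ab):
    """Shape Tanimoto from self-overlap volumes V_A, V_B and overlap V_AB."""
    return v_ab / (v_a + v_b - v_ab)


def pairs_to_square_matrix(n, pair_scores, to_distance):
    """Fill a symmetric n x n distance matrix from scores keyed by (i, j), i < j.

    `to_distance` stands in for the similarity-to-distance conversion
    (equation 3 in the text); the diagonal is left at zero.
    """
    if len(pair_scores) != n * (n - 1) // 2:
        raise ValueError("expected one score per unordered ligand pair")
    matrix = [[0.0] * n for _ in range(n)]
    for (i, j), score in pair_scores.items():
        matrix[i][j] = matrix[j][i] = to_distance(score)
    return matrix


# Illustrative call with a placeholder conversion (1 - T), not equation 3.
example = pairs_to_square_matrix(
    3, {(0, 1): 1.0, (0, 2): 0.5, (1, 2): 0.25}, lambda t: 1.0 - t)
```

Storing each pair once and mirroring it keeps the matrix symmetric by construction, which is what a pairwise clustering input requires.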
Chaotic map clustering algorithm
Implemented in MATLAB (The MathWorks, Inc.), the CMC algorithm was originally introduced as a central clustering algorithm, in which the elements to be clustered are embedded in a D-dimensional feature space. In this picture, the data points are viewed as sites of a lattice hosting chaotic map dynamics. Depending on the analysis, the entire protein structures, the protein subsites, the sequences or the ligand structures are used as input data points, distributed in a vector space so that a map variable $x_i \in [-1, 1]$, $i = 1, \dots, N$, can be assigned to each element. Initially, the assignment is purely random. The system then evolves according to short-range interactions between neighbouring maps. In this respect, the distance $D_{ij}$ specific to each analysis is used to compute the corresponding coupling $J_{ij} = \exp[-D_{ij}^2/(2\alpha^2)]$, where $\alpha$ is the local length scale, set to the average distance of the K nearest neighbours. Since $J_{ij}$ is an exponentially decreasing function of the distance $D_{ij}$, a large distance corresponds to a weak coupling. In the present study, a pairwise version of the algorithm was implemented simply by using the distance matrix described above in the equation of the couplings $J_{ij}$. The parameter K is set to a value such that changing it does not substantially affect the clustering results. This value is independent of the size of the dataset and depends instead on the particular distribution of the data at hand.
where Ni(ϑ) is the number of elements in the i-th cluster at threshold ϑ and N is the total number of elements.
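The pairwise CMC procedure can be sketched in Python as below. This is our reading of the original method (Angelini et al., Phys Rev Lett 2000), not the authors' MATLAB code, and several details are assumptions: each map is updated as $x_i(t+1) = \frac{1}{C_i}\sum_{j \neq i} J_{ij}\, f(x_j(t))$ with $f(x) = 1 - 2x^2$ and $C_i = \sum_{j \neq i} J_{ij}$; each trajectory is reduced to the binary symbol $x_i > 0$; synchronisation between two maps is measured by the mutual information of their symbol sequences; and at threshold $\vartheta$ the clusters are the connected components of the pairs whose mutual information exceeds $\vartheta$. We also assume that the quantity defined through $N_i(\vartheta)$ and $N$ above is the cluster entropy $S(\vartheta) = -\sum_i \frac{N_i(\vartheta)}{N} \ln \frac{N_i(\vartheta)}{N}$. Step counts, the random seed and all function names are hypothetical.

```python
import math
import random


def couplings(D, K):
    """J_ij = exp(-D_ij^2 / (2 alpha^2)), alpha = mean distance to the K nearest neighbours."""
    n = len(D)
    knn = []
    for i in range(n):
        knn.extend(sorted(D[i][j] for j in range(n) if j != i)[:K])
    alpha = sum(knn) / len(knn)
    return [[0.0 if i == j else math.exp(-D[i][j] ** 2 / (2 * alpha ** 2))
             for j in range(n)] for i in range(n)]


def symbol_sequences(J, steps=2000, transient=200, seed=0):
    """Iterate the coupled maps from random x_i in [-1, 1]; record sign symbols."""
    rng = random.Random(seed)
    n = len(J)
    x = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    norm = [sum(row) or 1.0 for row in J]  # C_i; guard against isolated points
    symbols = [[] for _ in range(n)]
    for t in range(steps):
        fx = [1.0 - 2.0 * v * v for v in x]  # chaotic map f(x) = 1 - 2x^2
        # J_ii = 0, so summing over all j is the same as summing over j != i.
        x = [sum(J[i][j] * fx[j] for j in range(n)) / norm[i] for i in range(n)]
        if t >= transient:  # discard the transient before measuring
            for i in range(n):
                symbols[i].append(1 if x[i] > 0 else 0)
    return symbols


def mutual_information(a, b):
    """Mutual information (bits) between two equal-length symbol sequences."""
    n = len(a)
    joint, pa, pb = {}, {}, {}
    for u, v in zip(a, b):
        joint[(u, v)] = joint.get((u, v), 0) + 1
        pa[u] = pa.get(u, 0) + 1
        pb[v] = pb.get(v, 0) + 1
    return sum(c / n * math.log2(c * n / (pa[u] * pb[v]))
               for (u, v), c in joint.items())


def clusters_at(symbols, theta):
    """Connected components of pairs whose mutual information exceeds theta."""
    n = len(symbols)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if mutual_information(symbols[i], symbols[j]) > theta:
                parent[find(i)] = find(j)
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def cluster_entropy(clusters, n):
    """S = -sum_i (N_i/N) ln(N_i/N) for the partition at a given threshold."""
    return -sum(len(c) / n * math.log(len(c) / n) for c in clusters)


# Toy pairwise distances: two tight groups of five elements, far apart.
D = [[0.0 if i == j else (0.1 if i // 5 == j // 5 else 5.0) for j in range(10)]
     for i in range(10)]
S = symbol_sequences(couplings(D, K=4))
parts = clusters_at(S, theta=0.5)
print(parts, cluster_entropy(parts, 10))
```

Strongly coupled maps synchronise and share nearly one bit of mutual information, whereas uncoupled maps evolve independently. Scanning $\vartheta$ and looking for ranges where the partition, and hence $S(\vartheta)$, stays constant is one plausible way to select stable partitions; the toy matrix only checks that two well-separated groups come out as two clusters.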
Unweighted Pair Group Method with Arithmetic mean
To compare CMC performance with that of other clustering methods, UPGMA was used, as it is a standard clustering approach in bioinformatics. UPGMA trees were generated as PHYLIP representations from the distance matrix within the PIPSA package. UPGMA builds a rooted tree using average linkage as the clustering criterion: at each step, the two nearest clusters are merged into a higher-level cluster, the distance between any two clusters A and B being the average of all distances between pairs of objects x in A and y in B, that is, the mean distance between the elements of the two clusters.
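The average-linkage rule above can be sketched in a few lines of Python. This is a minimal, unoptimised illustration (roughly cubic in the number of elements), not the PIPSA/PHYLIP implementation: it returns the tree as nested tuples `(left, right, height)`, with the height set to half the merge distance, as is usual for UPGMA's ultrametric trees. The function name and the tuple layout are hypothetical.

```python
def upgma(D, labels):
    """Build a UPGMA tree from a symmetric distance matrix D.

    Returns nested tuples (left, right, height), with leaves given by labels.
    """
    n = len(labels)
    clusters = {i: (labels[i], 1) for i in range(n)}  # id -> (subtree, size)
    dist = {(i, j): D[i][j] for i in range(n) for j in range(i + 1, n)}
    next_id = n
    while len(clusters) > 1:
        (a, b), d_ab = min(dist.items(), key=lambda kv: kv[1])  # nearest pair
        tree_a, size_a = clusters.pop(a)
        tree_b, size_b = clusters.pop(b)
        for c in clusters:
            d_ac = dist.pop((min(a, c), max(a, c)))
            d_bc = dist.pop((min(b, c), max(b, c)))
            # Size-weighted mean equals the average over all cross-cluster pairs.
            dist[(c, next_id)] = (size_a * d_ac + size_b * d_bc) / (size_a + size_b)
        del dist[(a, b)]
        clusters[next_id] = ((tree_a, tree_b, d_ab / 2), size_a + size_b)
        next_id += 1
    return next(iter(clusters.values()))[0]


D4 = [[0, 1, 10, 10], [1, 0, 10, 10], [10, 10, 0, 2], [10, 10, 2, 0]]
print(upgma(D4, ["A", "B", "C", "D"]))  # → (('A', 'B', 0.5), ('C', 'D', 1.0), 5.0)
```

Weighting by cluster size is what distinguishes UPGMA from its unweighted variant (WPGMA), which would simply average the two parent distances.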
The authors thank Regione Puglia ("Progetto Strategico Neurobiotech", PS 126) and the European Commission (CancerGrid STREP project, FP VI, Contract LSHC-CT-2006-03755) for their financial support.
- Burzlaff N: From Model Complexes for Zinc-Containing Enzymes. In Concepts and Models in Bioinorganic Chemistry. Wiley-VCH; 2006:397–429.
- Overall CM, Kleifeld O: Tumor Microenvironment - Opinion: Validating Matrix Metalloproteinases as Drug Targets and Anti-Targets for Cancer Therapy. Nat Rev Cancer 2006, 6: 227–239. 10.1038/nrc1821
- Terp GE, Cruciani G, Christensen IT, Jorgensen FS: Structural Differences of Matrix Metalloproteinases with Potential Implication for Selectivity Examined by the GRID/CPCA Approach. J Med Chem 2002, 45: 2675–2684. 10.1021/jm0109053
- Overall CM, Lopez-Otin C: Strategies for MMP Inhibition in Cancer: Innovations for the Post-Trial Era. Nat Rev Cancer 2002, 2: 657–672. 10.1038/nrc884
- Willett P: Similarity and Clustering in Chemical Information Systems. New York: John Wiley & Sons; 1987.
- Angelini L, De Carlo F, Marangi C, Pellicoro M, Stramaglia S: Clustering Data by Inhomogeneous Chaotic Map Lattices. Phys Rev Lett 2000, 85: 554–557. 10.1103/PhysRevLett.85.554
- Angelini L, De Carlo F, Marangi C, Mannarelli M, Nardulli G, Pellicoro M, Satalino G, Stramaglia S: Chaotic neural networks clustering: an application to antipersonnel mines detection by dynamical IR imaging. Opt Eng 2001, 40: 2878–2884. 10.1117/1.1412623
- Marangi C, Angelini L, Mannarelli M, Pellicoro M, Stramaglia S, Attimonelli M, De Robertis M, Nitti L, Pesole G, Saccone C, Tommaseo M: Proceedings of the International Workshop on Modelling Biomedical Signals, Bari, 2001. Edited by: Nardulli G, Stramaglia S. World Scientific, Singapore; 2002:196–208.
- Bellotti R, De Carlo F, Stramaglia S: Chaotic map clustering algorithm for EEG analysis. Physica A 2004, 334: 222–232. 10.1016/j.physa.2003.10.074
- Basalto N, Bellotti R, De Carlo F, Facchi P, Pascazio S: Clustering stock market companies via chaotic map synchronization. Physica A 2005, 345: 196–206.
- Lukacova V, Zhang Y, Mackov M, Baricic P, Raha S, Calva JA, Balaz S: Similarity of Binding Sites of Human Matrix Metalloproteinases. J Biol Chem 2004, 279: 14194–14200. 10.1074/jbc.M313474200
- Pirard B, Matter H: Matrix Metalloproteinase Target Family Landscape: a Chemometrical Approach to Ligand Selectivity Based on Protein Binding Site Analysis. J Med Chem 2006, 49: 51–69. 10.1021/jm050363f
- Nicolotti O, Miscioscia TF, Leonetti F, Muncipinto G, Carotti A: Screening of matrix metalloproteinases available from the protein data bank: insights into biological functions, domain organization, and zinc binding groups. J Chem Inf Model 2007, 47: 2439–2448. 10.1021/ci700119r
- The Protein Data Bank [http://www.rcsb.org/pdb/home/home.do]
- Gillet VJ, Nicolotti O: Evaluation of reactant-based and product-based approaches to the design of combinatorial libraries. Perspect Drug Discov Des 2000, 20: 265–287. 10.1023/A:1008797526431
- Overall CM, Kleifeld O: Towards third generation matrix metalloproteinase inhibitors for cancer therapy. British Journal of Cancer 2006, 94: 941–946. 10.1038/sj.bjc.6603043
- Agrawal A, Romero-Perez D, Jacobsen JA, Villarreal FJ, Cohen SM: Zinc-Binding Groups Modulate Selective Inhibition of MMPs. ChemMedChem 2008, 3: 812–820.
- Nicolotti O, Giangreco I, Miscioscia TF, Carotti A: Improving Quantitative Structure-Activity Relationships through Multi-Objective Optimization. J Chem Inf Model 2009, 49: 2290–2302. 10.1021/ci9002409
- Henrich S, Richter S, Wade RC: On the use of PIPSA to Guide Target-Selective Drug Design. ChemMedChem 2008, 3: 413–417.
- Baker NA, Sept D, Joseph S, Holst MJ, McCammon JA: Electrostatics of nanosystems: application to microtubules and the ribosome. Proc Natl Acad Sci USA 2001, 98: 10037–10041. 10.1073/pnas.181342398
- Dolinsky TJ, Nielsen JE, McCammon JA, Baker NA: PDB2PQR: an automated pipeline for the setup, execution, and analysis of Poisson-Boltzmann electrostatics calculations. Nucleic Acids Research 2004, 32: W665–W667. 10.1093/nar/gkh381
- Wade RC, Gabdoulline RR, De Rienzo F: Protein Interaction Property Similarity Analysis. Intl J Quant Chem 2001, 83: 122–127. 10.1002/qua.1204
- Hodgkin EE, Richards WG: Molecular similarity based on electrostatic potential and electric field. Int J Quant Chem Quant Biol Symp 1987, 14: 105–110. 10.1002/qua.560320814
- Smith TF, Waterman MS: Identification of Common Molecular Subsequences. J Mol Biol 1981, 147: 195–197. 10.1016/0022-2836(81)90087-5
- ROCS, version 2.4.2. OpenEye Scientific Software, Inc., Santa Fe, NM, USA; 2005. [http://www.eyesopen.com]
- MATLAB The Language Of Technical Computing Version 7.3 The Mathworks; Natick, MA 2006.
- Felsenstein J: Evolutionary trees from DNA sequences: a maximum likelihood approach. J Mol Evol 1981, 17: 368–376. 10.1007/BF01734359
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.