- Research article
- Open Access
Recruitment of rare 3-grams at functional sites: Is this a mechanism for increasing enzyme specificity?
© Tobi and Bahar; licensee BioMed Central Ltd. 2007
- Received: 16 February 2007
- Accepted: 28 June 2007
- Published: 28 June 2007
A wealth of unannotated and functionally unknown protein sequences has accumulated in recent years with rapid progresses in sequence genomics, giving rise to ever increasing demands for developing methods to efficiently assess functional sites. Sequence and structure conservations have traditionally been the major criteria adopted in various algorithms to identify functional sites. Here, we focus on the distributions of the 203 different types of 3-grams (or triplets of sequentially contiguous amino acid) in the entire space of sequences accumulated to date in the UniProt database, and focus in particular on the rare 3-grams distinguished by their high entropy-based information content.
Comparison of the UniProt distributions with those observed near/at the active sites on a non-redundant dataset of 59 enzyme/ligand complexes shows that the active sites preferentially recruit 3-grams distinguished by their low frequency in the UniProt. Three cases, Src kinase, hemoglobin, and tyrosyl-tRNA synthetase, are discussed in details to illustrate the biological significance of the results.
The results suggest that recruitment of rare 3-grams may be an efficient mechanism for increasing specificity at functional sites. Rareness/scarcity emerges as a feature that may assist in identifying key sites for proteins function, providing information complementary to that derived from sequence alignments. In addition it provides us (for the first time) with a means of identifying potentially functional sites from sequence information alone, when sequence conservation properties are not available.
- Functional Site
- Inactive Conformation
- Catalytic Loop
- Gaussian Network Model
- Rare Amino Acid
Proteins perform a variety of biological functions, the efficiency and specificity of which are determined to a large extent by their intrinsic sequence-structure properties. Many biophysical and biochemical activities such as binding specific substrates, catalysis, or allosteric responses involve particular sequence motifs at functional sites. Identification of functional sites, or understanding sequence-to-function mapping, in general, has been a major goal in computational molecular biology and bioinformatics. With rapid accumulation of genome scale sequence data there is an ever increasing need for efficient assessment of potential functional sites.
Among different criteria adopted to identify functional sites, sequence conservation has probably been the most widely used, based on the fact that functional residues are often conserved across all or the majority of members in a given family of proteins. Algorithms developed for detecting conservation patterns are traditionally based on multiple sequence alignments [1–6]. Other algorithms use a combination of sequence and structure comparisons provided that structural information is available, which usually yield relatively higher predictive abilities [7–11]. A set of criteria was proposed by Thornton and coworkers for detecting catalytic residues, which was successfully implemented in an algorithm for identifying active sites . Other strategies include pattern matching such as TESS , FFF  and SPASM  that locate functional sites by detecting small three-dimensional motifs. Proteins dynamics has been shown to be another property that can be advantageously examined to detect functional sites . We have recently shown that the global hinge centers of enzymes co-localize with their catalytic sites, pointing to the importance of coupling between mechanics and chemistry for enzymatic activity . In addition, analyzing patterns of communication and shortest paths in protein structures modeled as networks has proven useful in identifying allosteric sites [17, 18].
In the present study, we propose another property that can serve as a criterion for detecting functionally important sites: scarcity or rareness in the space of sequence motifs. Motifs composed of three sequential amino acids (3-grams) are considered here as the shortest, yet distinctive, sequence fragments that can provide statistically significant information. The choice of 3-grams in the following work is based on preliminary studies as well as our previous work  that showed that 1-grams and 2-grams do not confer enough specificity while there is no major difference in the performance of 3-grams and 4-grams (due to the fact that some strong signals characteristic of 3-grams may be overlooked upon examination of 4-grams), while 3-grams lend themselves to more accurate statistics. A given 3-gram will be termed 'rare' if its probabilistic occurrence in the UniProt (Universal Protein Resource)  is significantly lower than that expected from an unbiased distribution, as will be explained in quantitative details below.
A notable feature emerging from this study is the propensity of proteins' active sites to populate rare sequence motif, akin to the notion that rare structural motifs co-localize with functionally important sites [21–23]. This selectivity suggests that enzymes tend to recruit such distinctive/rare sequences at their active sites to increase their specificity, consistent with the higher information content associated with rare events. We illustrate the functional importance of such rare 3-grams in three proteins, C-Src kinase, hemoglobin, and tyrosyl-tRNA synthetase.
Distributions of 3-grams
The unbiased probability of a given 3-gram is p0 = 1/203 = 0.000125, and the corresponding count is C0 = C tot /203 = 7782 in the UniProt. The 3-grams whose count is lower than this value by a factor of f = 2 compose 10% of all the counts C tot in UniProt, and those lower by f = 3 amount to 4% of C tot .
Scarcity scores and enhancement factors
where p0 X is the natural frequency of amino acid of type X (for details see Table S1 in Additional file 1). The enrichment in a given 3-gram count will be described by the enhancement factor
where the summation is performed over the homologous sequences (1 ≤ k ≤ m) in the alignment.
Proteins tend recruit rare 3-grams at their active sites
GT (Gutteridge-Thornton, 2005)24 dataset of 59 enzymes*
Comparison of the two histograms immediately reveals the tendency of the active site 3-grams to populate the lower UniProt counts. For example, let us consider the rare 3-grams (with C i < 1,140 or > 10,908; see Methods and Figure 2). 11.67% of active site 3-grams fall in this range, whereas the percentage drops to 4.35% when 3-grams at other regions are considered. Therefore, rare 3-grams are over-represented in the active sites by a factor of 2.67. The difference between the two distributions shown in panel A is statistically significant (χ2, P << 0.001).
Similarly, we compared the histogram obtained for the subset of active site 3-grams with that calculated for solvent-exposed 3-grams (see Methods). Figure 4B shows the histograms for active site (gray) and exposed (black) residues. The active sites are again distinguished by the propensity of rare 3-grams. Rare 3-grams are 5.4 times more frequent at active sites compared to exposed regions (χ2, P << 0.001) . The difference between the two histograms is more pronounced than that observed in panel A.
Cys and His, two residues that have the highest active site propensity (both in terms of their ligand-binding and catalytic activities) are among the least frequent amino acids in terms of their natural occurrence. Figure 5A shows the relation between the frequencies of amino acids in UniProt and their active site propensities. While the data are scattered, there is a discernible anti-correlation between the two sets of data, consistent with Figure 4; rare amino acids have a higher tendency to be in active sites. Clearly this preference becomes more pronounced when 3-grams, as opposed to monograms are examined, as a can be seen from the skewed distribution of the frequency of active site residues towards lower counts in Figure 4.
Rare 3-grams may arise from three possibilities: (1) these may be words in which the individual letters have low natural frequencies themselves (i.e. a i ≈ 1), (2) these may be rare 'words', i.e. rare combinations of particular amino acids (a i << 1); or (3) these may be rare despite their enhancement (a i >>1) due to the extremely low natural frequencies of the individual amino acids that compose them. Counterparts of Figure 3 plotted for the active site 3-grams and other 3-grams showed that both groups fall in the first category (a i ≈ 1). Their observed counts plotted against the expected ones yield a slope of 1.075 ± 0.008. The rare 3-grams belonging to the two groups, on the other hand, exhibited a slight enhancement of a i = 1.11 and 1.07, for active site and other residues, respectively. Thus, 3-grams recruited at active sites are usually combinations of residues which already have relatively low natural frequencies of occurrence. Yet, particular 3-grams exhibit significant departure from their expected occurrence probabilities, e.g. SSS, GGG, RRR, GSG (over-represented), and LEL (under-represented). See Table S2 and S3 in Additional files for details. Among rare 3-grams at active sites, also exhibiting a large enhancement in the UniProt (case 3, above), we distinguish WWH and HCW.
Scarce 3-grams in the GT dataset
Results for the GT dataset
Scarcity Score Threshold
Illustrative results for three cases
The functional role of rare 3-grams at proteins' functional sites is illustrated here for three test cases, a Src tyrosine kinase, hemoglobin and a tyrosyl-tRNA synthetase. For each case, a set of homologous proteins has been compiled and aligned, and the scarcity score <> of amino acid(s) at each position has been calculated. The analysis reveals that stretches of rare amino acids are involved in functional roles even beyond ligand binding sites.
Figure 6B shows the scarcity scores <> as a function of residue number along the c-Src sequence, based on UniProt counts of 3-grams, averaged over the set of m aligned homologous sequences (eq 5). Six n-grams (3 ≤ n ≤ 6) composed of contiguous 3-grams are distinguished by their high scores, D117WWL120, E146EWYF150, Y382VH384, I426KWTA430, D444VWSF448, and C496QCWRK501, in addition to individual residues H319, K200, C478 and P485.
D117WWL120 is located in the SH3 domain, and interacts with the linker that connects the SH2 and kinase domains when the kinase adopts its inactive conformation (Figure 6A). The SH3-linker interactions have been pointed out to serve as an independent mode of regulation for Hck, a Src family member . In addition, Trp118, at the ligand-binding site of SH3 , was found to be essential to the stability of the SH3-VSL12 complex .
The second stretch of amino acids, E146EWYF150, coincides with the SH2-SH3 linker which was shown by Kuriyan and coworkers  to work as an inducible "snap-lock" that clamps the SH2 and SH3 domains upon phosphorylation of Y572. Mutations of residues in the linker have been shown to lead to a constitutive activation of c-Src .
The third stretch, Y382VH384, forms the Src catalytic loop. Y382 takes part in key interactions that stabilize the kinase domain in the inactive conformation  and H384 belongs to the conserved HRD motif of the catalytic loop. Lys200 at SH2 domain interface interacts with the C-terminal regulatory tail.
The other three stretches of amino acids I426KWTA430, D444VWSF448, and C496QCWRK501 form a cluster located at and beneath the substrate binding region (Figure 6A). The portion Q497CWR500 is composed of two overlapping 3-grams that have the highest scarcity scores in the protein. Notably, the central residues in these stretches, W499, C498, W428, W446, are highly or fully (bold) conserved, lending support to their critical role.
A systematic analysis of amino acid conservation conducted for c-Src protein family members demonstrated that amino acids distinguished by their high scarcity scores tend to be conserved among the family members (see Conclusion), despite their being rare in the UniProt. Conserved sites distinguished by their scarcity include, in addition to the four residues listed above, W148, K427, T429, F448, S447, A430, H319, H384, C487, D444 and V445. See Additional files Table S5 for details.
Figure 6C shows the two motifs derived for Src kinase from PROSITE database . The first is the stretch of twenty two amino acids L273GQGCFG ... GTTRVAIK295 corresponding to the ATP binding region and the second is the stretch Y382VHRDLRAANILV394 corresponding to the catalytic loop. Except for the 3-gram Y382VH384 identified in both panels A and C, the 3-grams deduced from scarcity score and PROSITE analyses appear to provide complementary information.
The <> profiles for subunits α and β are presented in Figure 7B and 7C, respectively. Five stretches of residues are distinguished by their high scarcity scores in subunit α : α A12AWGK16, α R31MF33, α Y42FPHF47, α H87AH89, and α S102HCL105, colored magenta in panel A. In subunit β, β W15GK17, β Y35PWTQ39, β L91HCD94 and β H143KYH146 (cyan) emerge as rare sequences.
Notably, all of these stretches of amino acids distinguished by their high scarcity score assume functional roles in Hb. α R31MF35 and α Y42FPHFD47 are directly involved in binding the heme group. The mutant R31S results in abnormal hemoglobin – Hb Prato . α Tyr42 forms an inter-subunit hydrogen bond with β Asp99 across the dimer-dimer interface in the T state of hemoglobin but not on the R state. Kavanaugh et al  showed that this hydrogen bond and other interactions associated with the side chain of α Tyr42 make a major contribution to the stability of the T state. α H87AH89 corresponds to the heme binding site, with α His87 coordinating the Fe atom. The Fe-Nε bonds between the heme group and α His87 or β His92 are essential for the cooperativity of hemoglobin . α His103 is located inside a central cavity lined with excess of positively charged ionizable groups. It has been suggested  and experimentally validated  that the mutual repulsion of these ionizable groups increases the oxygen affinity by raising the free energy of the T state. Using network analysis, del Sol et al. confirmed that α His103 plays an important role in allosteric communication . We also note the critical position of α His103 at the α1β1 interface with α Ser102 pointing toward the heme binding pocket. Among the residues participating in rare sequences in subunit β, in addition to the heme-binding β H92, we note β K17, the substitution of which by Asn has been reported to lead to abnormal Hb J Amiens . The segment β Y35PWTQ39 is located at the convergence of the α1β1 and α2β2 interfaces and is important for allostery. Mutations of β 37W (e.g. β W37E and β W37G) have been shown to lead to tetrameric Hb with functional properties similar to those of the αβ dimers; that is, high oxygen affinity with no cooperativity [42, 43]. Finally, the 4-gram β H143KYH146 at the C-terminal end of subunit β and in particular H143 is a binding site for allosteric effectors [44, 45].
In addition to the above experimental evidence, we note that using the Gaussian network model we have shown that two segments, α F36-H45 and β T87-β N102, play a key role in the allosteric transition of Hb from constrained (T form) to flexible (or relaxed) (R2 form) state ; these two regions include the so-called switch region . The first stretch partially overlaps with α Y42FPHF46 and the second contains β L91HCD94. These results lend further support to the critical role of the rare n-grams, and in particular β His92, in propagating allosteric signals. Finally, we note that out of 37 rare residues identified in Hb, 14 are fully conserved (entropy S = 0), and 17 highly conserved (S < 0.75; see Additional files Table S6 for more details).
Upon scanning the hemoglobin sequence against the PROSITE database (Figure 7D) four histidines are retrieved, two (α His58 and α His87) on α-subunits, and two (β His63 and β His92) on β-subunits. As mentioned above, α His87 and β His92, also detected among the rare 3-grams, coordinate the Fe atom of the heme group, while α His58 and β His63 are located at the same region across the heme plane.
Results in the absence of sequence alignment
In the above three examples the scarcity score was calculated by averaging p uni | jk over a set of homologous proteins. However, in principle, scarcity score calculations can be equally performed for a single sequence as the ingredient is the UniProt frequencies for triplets along the sequence. The alignments are simply used to magnify (increase the accuracy of) the signals that could, otherwise, be extracted from single sequence analysis. As shown in the figures S1–S3 in the Additional files, the profiles based on single sequences closely approximate those derived from multiple sequence alignments. For hemoglobin subunit α, for example, six stretches of amino acids are distinguished by their high scarcity scores: W14GK16, R31MF33, F43PHF46, D75MPN78, H87AH89, S102HCL105, in accord with the results presented above. The only stretch of amino acids that differs from the above analysis is D75MPN78. Thus, when sequence conservation algorithms fail, the 3-gram analysis of single chains can be advantageously resorted to for identifying potentially functional sites.
The present study shows that relatively rare sequence motifs are preferentially recruited at enzymes' active sites. The probabilistic occurrence of rare 3-grams at active sites is higher than that of other 3-grams by a factor of more than 2.6 and higher that of exposed 3-grams by a factor of 5.4 according to the results obtained for the GT dataset of 59 enzymes (Figure 4). Detailed analysis of each enzyme in this dataset demonstrated that a scarcity score threshold of 11.0, permits us to identify 3-grams located at/near active sites, with a TP rate more than three times larger than the FP rate. These results were obtained from the sequence of the individual enzymes without recourse to sequence alignment or any amino acid conservation information.
The preferential selection of rare 3-grams at functional sites may not be a property of enzymes, exclusively, but that of proteins, in general, as the current application to hemoglobin suggests. We have illustrated and discussed here the functional importance of rare 3-gram motifs for three proteins: c-Src, hemoglobin and tyrosyl-tRNA synthetase.
While our analysis is purely sequence-based, it appears to be conceptually in accord with two structure-based studies that indicate that rare structural motifs correlate with proteins functional sites. Petock et al  showed that 9-residue fragments of rare backbone conformations often form parts of ligand-binding sites, protein-protein interactions, and domain-domain contacts. Likewise, Novotny and Kleywegt  found that, even thought left-handed helices are rare, when they do occur, they are structurally or functionally significant. The present analysis shows that not only structural motifs, but sequence motifs as well, tend to be highly distinctive at active sites. This selection of rare sequence motifs seems to be driven by specificity requirements. Notably, the rarest amino acids (Cys, His, Trp and Met; Figure 5B) have unique functional properties: Cys and Met are sulfur-containing amino acids and are sensitive to oxidation. Likewise, His is unique as its protonation depends on pH, and Trp is distinguished by its bulky size and significant contribution to stabilization of hydrophobic cores or interfaces. It has been suggested that proteins may even sustain a decrease in their stability in order to form effective functional sites [21, 22]. In accord with the above notion, these highly specific residues, Trp, His, Met, and Cys, may be used by enzymes for precise functional purposes, their specificity being amplified in 3-grams where two or more of these amino acids are juxtaposed.
Identification of proteins' functional sites using computational tools is a major challenge in computational biology in the post-genomic era [52, 53]. Sequence and structure conservation are the major criteria used by current algorithms to identify functional important sites. Identification of functional sites for orphan proteins (for which no homologous protein exists) is beyond the scope of algorithms that use conservation as major criterion for identifying these sites. Notably, we repeated the scarcity score analysis by using only a single chain and we were still able to identify functionally important sites (see Additional files), demonstrating that the potentially functional sites of orphan proteins can also be assessed by our methodology, and that conservation and scarcity are two different features. We note that the average scarcity score of scarce and conserved residues will be higher than that of scarce and non-conserved residues, as the scarcity score is evaluated as an average over all aligned sequences. However, calculations repeated for single query sequences, as opposed to those averaged over multiple aligned sequences showed that such effects are negligibly small and the profile of scarcity scores for examined proteins are robust. In addition, we have compared the scarcity scores (based on UniProt frequencies) and the Shannon entropies at the corresponding amino acid positions. Computations for the three test cases revealed that residues participating in rare 3-grams (Tables S5–S7) are not necessarily conserved or vice versa. However, they do exhibit a higher tendency to be conserved compared to other amino acids, as illustrated in Figure 9. Therefore, it is conceivable that a combined approach exploiting both conservation and scarcity may enhance the score/signal associated with particular sites, thus increasing the ability to accurately identify functional motifs.
Present calculations are based on the data extracted from UniProt  version 4.2. This database was chosen because it is manually curated with minimal level of redundancy and still comprehensive. Furthermore, UniProt contains entries from multiple organisms and it is therefore less susceptible to biases in 3-gram distributions of specific species as noted by Ganpathiraju et al .
The biological implications of the 3-grams distinguished by high scarcity scores were examined using the dataset of 59 enzyme/ligand complexes compiled by Gutteridge and Thornton  (GT dataset; see Table 1). We performed three case studies: C-Src kinase, hemoglobin subunits α and β, and tyrosyl-tRNA synthetase. Homologous sequences were retrieved in each case by a PBLAST search  of the SwissProt database. Sequences with E-values lower than 10-57 were extracted, leading to m = 111 homologous sequences for c-Src, 138 and 224 sequences for hemoglobin α- and β-subunits, respectively, and 24 sequences for tyrosyl tRNA synthetase. The sequences were aligned with ClustalX using default parameters .
The Accessible surface area (ASA) of each amino acid in the GT database was calculated using the program NACCESS . Amino acids were defined as exposed if their relative exposed surface are was > 20%. 3-grams with two or more exposed amino acids were defined as exposed ones.
We thank Thomas E. Smithgall for helpful discussions on the structure-function of Src proteins family. Support by NIH R01 grant # LM007994 is gratefully acknowledged.
- Casari G, Sander C, Valencia A: A method to predict functional residues in proteins. Nat Struct Biol 1995, 2: 171–178. 10.1038/nsb0295-171View ArticlePubMedGoogle Scholar
- Lichtarge O, Bourne HR, Cohen FE: An evolutionary trace method defines binding surfaces common to protein families. J Mol Biol 1996, 257: 342–358. 10.1006/jmbi.1996.0167View ArticlePubMedGoogle Scholar
- Hannenhalli SS, Russell RB: Analysis and prediction of functional sub-types from protein sequence alignments. J Mol Biol 2000, 303: 61–76. 10.1006/jmbi.2000.4036View ArticlePubMedGoogle Scholar
- Armon A, Graur D, Ben-Tal N: ConSurf: an algorithmic tool for the identification of functional regions in proteins by surface mapping of phylogenetic information. J Mol Biol 2001, 307: 447–463. 10.1006/jmbi.2000.4474View ArticlePubMedGoogle Scholar
- Pupko T, Bell RE, Mayrose I, Glaser F, Ben-Tal N: Rate4Site: an algorithmic tool for the identification of functional regions in proteins by surface mapping of evolutionary determinants within their homologues. Bioinformatics 2002, 18 Suppl 1: S71-S77.View ArticlePubMedGoogle Scholar
- Andrade MA, Casari G, Sander C, Valencia A: Classification of protein families and detection of the determinant residues with an improved self-organizing map. Biol Cybern 1997, 76: 441–450. 10.1007/s004220050357View ArticlePubMedGoogle Scholar
- Aloy P, Querol E, Aviles FX, Sternberg MJ: Automated structure-based prediction of functional sites in proteins: applications to assessing the validity of inheriting protein function from homology in genome annotation and to protein docking. J Mol Biol 2001, 311: 395–408. 10.1006/jmbi.2001.4870View ArticlePubMedGoogle Scholar
- Madabushi S, Yao H, Marsh M, Kristensen DM, Philippi A, Sowa ME, Lichtarge O: Structural clusters of evolutionary trace residues are statistically significant and common in proteins. J Mol Biol 2002, 316: 139–154. 10.1006/jmbi.2001.5327View ArticlePubMedGoogle Scholar
- Gutteridge A, Bartlett GJ, Thornton JM: Using a neural network and spatial clustering to predict the location of active sites in enzymes. J Mol Biol 2003, 330: 719–734. 10.1016/S0022-2836(03)00515-1View ArticlePubMedGoogle Scholar
- Panchenko AR, Kondrashov F, Bryant S: Prediction of functional sites by analysis of sequence and structure conservation. Protein Sci 2004, 13: 884–892. 10.1110/ps.03465504PubMed CentralView ArticlePubMedGoogle Scholar
- Landgraf R, Xenarios I, Eisenberg D: Three-dimensional cluster analysis identifies interfaces and functional residue clusters in proteins. J Mol Biol 2001, 307: 1487–1502. 10.1006/jmbi.2001.4540View ArticlePubMedGoogle Scholar
- Wallace AC, Borkakoti N, Thornton JM: TESS: a geometric hashing algorithm for deriving 3D coordinate templates for searching structural databases. Application to enzyme active sites. Protein Sci 1997, 6: 2308–2323.PubMed CentralView ArticlePubMedGoogle Scholar
- Fetrow JS, Godzik A, Skolnick J: Functional analysis of the Escherichia coli genome using the sequence-to-structure-to-function paradigm: identification of proteins exhibiting the glutaredoxin/thioredoxin disulfide oxidoreductase activity. J Mol Biol 1998, 282: 703–711. 10.1006/jmbi.1998.2061View ArticlePubMedGoogle Scholar
- Kleywegt GJ: Recognition of spatial motifs in protein structures. J Mol Biol 1999, 285: 1887–1897. 10.1006/jmbi.1998.2393View ArticlePubMedGoogle Scholar
- Chen SC, Bahar I: Mining frequent patterns in protein structures: a study of protease families. Bioinformatics 2004, 20 Suppl 1: I77-I85. 10.1093/bioinformatics/bth912View ArticlePubMedGoogle Scholar
- Yang LW, Bahar I: Coupling between catalytic site and collective dynamics: a requirement for mechanochemical activity of enzymes. Structure 2005, 13: 893–904. 10.1016/j.str.2005.03.015PubMed CentralView ArticlePubMedGoogle Scholar
- del Sol A, Fujihashi H, Amoros D, Nussinov R: Residues crucial for maintaining short paths in network communication mediate signaling in proteins. Mol Syst Biol 2006, 2: 2006. 10.1038/msb4100063PubMed CentralView ArticlePubMedGoogle Scholar
- Chennubhotla C, Bahar I: Markov propagation of allosteric effects in biomolecular systems: application to GroEL-GroES. Mol Syst Biol 2006, 2: 36. 10.1038/msb4100075PubMed CentralView ArticlePubMedGoogle Scholar
- Vries JK, Munshi R, Tobi D, Klein-Seetharaman J, Benos PV, Bahar I: A sequence alignment-independent method for protein classification. Appl Bioinformatics 2004, 3: 137–148. 10.2165/00822942-200403020-00008View ArticlePubMedGoogle Scholar
- Bairoch A, Apweiler R, Wu CH, Barker WC, Boeckmann B, Ferro S, Gasteiger E, Huang H, Lopez R, Magrane M, Martin MJ, Natale DA, O'Donovan C, Redaschi N, Yeh LS: The Universal Protein Resource (UniProt). Nucleic Acids Res 2005, 33: D154-D159. 10.1093/nar/gki070PubMed CentralView ArticlePubMedGoogle Scholar
- Herzberg O, Moult J: Analysis of the steric strain in the polypeptide backbone of protein molecules. Proteins 1991, 11: 223–229. 10.1002/prot.340110307View ArticlePubMedGoogle Scholar
- Petock JM, Torshin IY, Weber IT, Harrison RW: Analysis of protein structures reveals regions of rare backbone conformation at functional sites. Proteins 2003, 53: 872–879. 10.1002/prot.10484View ArticlePubMedGoogle Scholar
- Novotny M, Kleywegt GJ: A survey of left-handed helices in protein structures. J Mol Biol 2005, 347: 231–241. 10.1016/j.jmb.2005.01.037View ArticlePubMedGoogle Scholar
- Gutteridge A, Thornton J: Conformational changes observed in enzyme crystal structures upon substrate binding. J Mol Biol 2005, 346: 21–28. 10.1016/j.jmb.2004.11.013View ArticlePubMedGoogle Scholar
- Bartlett GJ, Porter CT, Borkakoti N, Thornton JM: Analysis of catalytic residues in enzyme active sites. J Mol Biol 2002, 324: 105–121. 10.1016/S0022-2836(02)01036-7View ArticlePubMedGoogle Scholar
- Brown MT, Cooper JA: Regulation, substrates and functions of src. Biochim Biophys Acta 1996, 1287: 121–149.PubMedGoogle Scholar
- Roskoski R Jr.: Src protein-tyrosine kinase structure and regulation. Biochem Biophys Res Commun 2004, 324: 1155–1164. 10.1016/j.bbrc.2004.09.171View ArticlePubMedGoogle Scholar
- Lerner EC, Smithgall TE: SH3-dependent stimulation of Src-family kinase autophosphorylation without tail release from the SH2 domain in vivo. Nat Struct Biol 2002, 9: 365–369.PubMedGoogle Scholar
- Feng S, Kasahara C, Rickles RJ, Schreiber SL: Specific interactions outside the proline-rich core of two classes of Src homology 3 ligands. Proc Natl Acad Sci U S A 1995, 92: 12408–12415. 10.1073/pnas.92.26.12408PubMed CentralView ArticlePubMedGoogle Scholar
- Okishio N, Tanaka T, Fukuda R, Nagai M: Differential ligand recognition by the Src and phosphatidylinositol 3-kinase Src homology 3 domains: circular dichroism and ultraviolet resonance Raman studies. Biochemistry 2003, 42: 208–216. 10.1021/bi020475yView ArticlePubMedGoogle Scholar
- Young MA, Gonfloni S, Superti-Furga G, Roux B, Kuriyan J: Dynamic coupling between the SH2 and SH3 domains of c-Src and Hck underlies their inactivation by C-terminal tyrosine phosphorylation. Cell 2001, 105: 115–126. 10.1016/S0092-8674(01)00301-4View ArticlePubMedGoogle Scholar
- Ozkirimli E, Post CB: Src kinase activation: A switched electrostatic network. Protein Sci 2006, 15: 1051–1062. 10.1110/ps.051999206PubMed CentralView ArticlePubMedGoogle Scholar
- Hulo N, Bairoch A, Bulliard V, Cerutti L, de CE, Langendijk-Genevaux PS, Pagni M, Sigrist CJ: The PROSITE database. Nucleic Acids Res 2006, 34: D227-D230. 10.1093/nar/gkj063PubMed CentralView ArticlePubMedGoogle Scholar
- Changeux JP, Edelstein SJ: Allosteric mechanisms of signal transduction. Science 2005, 308: 1424–1428. 10.1126/science.1108595View ArticlePubMedGoogle Scholar
- Mills FC, Johnson ML, Ackers GK: Oxygenation-linked subunit interactions in human hemoglobin: experimental studies on the concentration dependence of oxygenation curves. Biochemistry 1976, 15: 5350–5362. 10.1021/bi00669a023View ArticlePubMedGoogle Scholar
- Marinucci M, Mavilio F, Massa A, Gabbianelli M, Fontanarosa PP, Camagna A, Ignesti C, Tentori L: A new abnormal human hemoglobin: Hb Prato (alpha 2 31 (B12) Arg leads to Ser beta 2). Biochim Biophys Acta 1979, 578: 534–540.View ArticlePubMedGoogle Scholar
- Kavanaugh JS, Rogers PH, Arnone A, Hui HL, Wierzba A, DeYoung A, Kwiatkowski LD, Noble RW, Juszczak LJ, Peterson ES, Friedman JM: Intersubunit interactions associated with Tyr42 alpha stabilize the quaternary-T tetramer but are not major quaternary constraints in deoxyhemoglobin. Biochemistry 2005, 44: 3806–3820. 10.1021/bi0484670View ArticlePubMedGoogle Scholar
- Barrick D, Ho NT, Simplaceanu V, Dahlquist FW, Ho C: A test of the role of the proximal histidines in the Perutz model for cooperativity in haemoglobin. Nat Struct Biol 1997, 4: 78–83. 10.1038/nsb0197-78View ArticlePubMedGoogle Scholar
- Bonaventura C, Bonaventura J: Anionic control of hemoglobin function. In Biochemical and Clinical Aspects of Hemoglobin Abnormalities. Edited by: Caughey WS. New York: Academic; 1978:647–663.Google Scholar
- Perutz MF, Wilkinson AJ, Paoli M, Dodson GG: The stereochemical mechanism of the cooperative effects in hemoglobin revisited. Annu Rev Biophys Biomol Struct 1998, 27: 1–34. 10.1146/annurev.biophys.27.1.1View ArticlePubMedGoogle Scholar
- Elion J, Wajcman H, Belkhodja-Dunda O, Lapoumeroulie C, Labie D, Messerschmitt J, Staal AM, Desablens B: [Hemoglobin J Amiens beta 17 (A 14) Lys replaced by Asn. Coincidence of a functionally silent new abnormal hemoglobin and a polycythemia vera (author's transl)]. Nouv Rev Fr Hematol 1979, 21: 347–352.PubMedGoogle Scholar
- Kiger L, Klinger AL, Kwiatkowski LD, De YA, Doyle ML, Holt JM, Noble RW, Ackers GK: Thermodynamic studies on the equilibrium properties of a series of recombinant betaW37 hemoglobin mutants. Biochemistry 1998, 37: 4336–4345. 10.1021/bi970868aView ArticlePubMedGoogle Scholar
- Kavanaugh JS, Weydert JA, Rogers PH, Arnone A, Hui HL, Wierzba AM, Kwiatkowski LD, Paily P, Noble RW, Bruno S, Mozzarelli A: Site-directed mutations of human hemoglobin at residue 35beta: a residue at the intersection of the alpha1beta1, alpha1beta2, and alpha1alpha2 interfaces. Protein Sci 2001, 10: 1847–1855. 10.1110/ps.16401PubMed CentralView ArticlePubMedGoogle Scholar
- Richard V, Dodson GG, Mauguen Y: Human deoxyhaemoglobin-2,3-diphosphoglycerate complex low-salt structure at 2.5 A resolution. J Mol Biol 1993, 233: 270–274. 10.1006/jmbi.1993.1505View ArticlePubMedGoogle Scholar
- Aime S, Digilio G, Fasano M, Paoletti S, Arnelli A, Ascenzi P: Metal complexes as allosteric effectors of human hemoglobin: an NMR study of the interaction of the gadolinium(III) bis(m-boroxyphenylamide)diethylenetriaminepentaacetic acid complex with human oxygenated and deoxygenated hemoglobin. Biophys J 1999, 76: 2735–2743.PubMed CentralView ArticlePubMedGoogle Scholar
- Xu C, Tobi D, Bahar I: Allosteric changes in protein structure computed by a simple mechanical model: hemoglobin T<-->R2 transition. J Mol Biol 2003, 333: 153–168. 10.1016/j.jmb.2003.08.027View ArticlePubMedGoogle Scholar
- Baldwin J, Chothia C: Haemoglobin: the structural changes related to ligand binding and its allosteric mechanism. J Mol Biol 1979, 129: 175–220. 10.1016/0022-2836(79)90277-8View ArticlePubMedGoogle Scholar
- Brick P, Bhat TN, Blow DM: Structure of tyrosyl-tRNA synthetase refined at 2.3 A resolution. Interaction of the enzyme with the tyrosyl adenylate intermediate. J Mol Biol 1989, 208: 83–98. 10.1016/0022-2836(89)90090-9View ArticlePubMedGoogle Scholar
- Fersht AR, Shi JP, Wilkinson AJ, Blow DM, Carter P, Waye MMY, Winter GP: Analysis of Enzyme Structure and activity by protein engineering. Angewandte Chemie 1984, 23: 235–238.View ArticleGoogle Scholar
- First EA, Fersht AR: Analysis of the role of the KMSKS loop in the catalytic mechanism of the tyrosyl-tRNA synthetase using multimutant cycles. Biochemistry 1995, 34: 5030–5043. 10.1021/bi00015a014View ArticlePubMedGoogle Scholar
- Shannon CE: The mathematical theory of communication. The Bell system Technical Journal 1948, 27: 379–423 and 623–656.View ArticleGoogle Scholar
- Venter JC, Adams MD, Myers EW, Li PW, Mural RJ, Sutton GG, Smith HO, Yandell M, Evans CA, Holt RA, Gocayne JD, Amanatides P, Ballew RM, Huson DH, Wortman JR, Zhang Q, Kodira CD, Zheng XH, Chen L, Skupski M, Subramanian G, Thomas PD, Zhang J, Gabor Miklos GL, Nelson C, Broder S, Clark AG, Nadeau J, McKusick VA, Zinder N, Levine AJ, Roberts RJ, Simon M, Slayman C, Hunkapiller M, Bolanos R, Delcher A, Dew I, Fasulo D, Flanigan M, Florea L, Halpern A, Hannenhalli S, Kravitz S, Levy S, Mobarry C, Reinert K, Remington K, bu-Threideh J, Beasley E, Biddick K, Bonazzi V, Brandon R, Cargill M, Chandramouliswaran I, Charlab R, Chaturvedi K, Deng Z, Di F V, Dunn P, Eilbeck K, Evangelista C, Gabrielian AE, Gan W, Ge W, Gong F, Gu Z, Guan P, Heiman TJ, Higgins ME, Ji RR, Ke Z, Ketchum KA, Lai Z, Lei Y, Li Z, Li J, Liang Y, Lin X, Lu F, Merkulov GV, Milshina N, Moore HM, Naik AK, Narayan VA, Neelam B, Nusskern D, Rusch DB, Salzberg S, Shao W, Shue B, Sun J, Wang Z, Wang A, Wang X, Wang J, Wei M, Wides R, Xiao C, Yan C, Yao A, Ye J, Zhan M, Zhang W, Zhang H, Zhao Q, Zheng L, Zhong F, Zhong W, Zhu S, Zhao S, Gilbert D, Baumhueter S, Spier G, Carter C, Cravchik A, Woodage T, Ali F, An H, Awe A, Baldwin D, Baden H, Barnstead M, Barrow I, Beeson K, Busam D, Carver A, Center A, Cheng ML, Curry L, Danaher S, Davenport L, Desilets R, Dietz S, Dodson K, Doup L, Ferriera S, Garg N, Gluecksmann A, Hart B, Haynes J, Haynes C, Heiner C, Hladun S, Hostin D, Houck J, Howland T, Ibegwam C, Johnson J, Kalush F, Kline L, Koduru S, Love A, Mann F, May D, McCawley S, McIntosh T, McMullen I, Moy M, Moy L, Murphy B, Nelson K, Pfannkoch C, Pratts E, Puri V, Qureshi H, Reardon M, Rodriguez R, Rogers YH, Romblad D, Ruhfel B, Scott R, Sitter C, Smallwood M, Stewart E, Strong R, Suh E, Thomas R, Tint NN, Tse S, Vech C, Wang G, Wetter J, Williams S, Williams M, Windsor S, Winn-Deen E, Wolfe K, Zaveri J, Zaveri K, Abril JF, Guigo R, Campbell MJ, Sjolander KV, Karlak B, Kejariwal A, Mi H, Lazareva B, Hatton T, Narechania A, Diemer K, Muruganujan A, Guo N, Sato S, Bafna V, Istrail S, Lippert R, Schwartz R, Walenz B, Yooseph S, Allen D, Basu A, Baxendale J, Blick L, Caminha M, Carnes-Stine J, Caulk P, Chiang YH, Coyne M, Dahlke C, Mays A, Dombroski M, Donnelly M, Ely D, Esparham S, Fosler C, Gire H, Glanowski S, Glasser K, Glodek A, Gorokhov M, Graham K, Gropman B, Harris M, Heil J, Henderson S, Hoover J, Jennings D, Jordan C, Jordan J, Kasha J, Kagan L, Kraft C, Levitsky A, Lewis M, Liu X, Lopez J, Ma D, Majoros W, McDaniel J, Murphy S, Newman M, Nguyen T, Nguyen N, Nodell M: The sequence of the human genome. Science 2001, 291: 1304–1351. 10.1126/science.1058040View ArticlePubMedGoogle Scholar
- Brenner SE: A tour of structural genomics. Nat Rev Genet 2001, 2: 801–809. 10.1038/35093574View ArticlePubMedGoogle Scholar
- Ganapathiraju G, Weisser D, Klein-Seetharaman J, Rosenfeld R, Carbonell J, Reddy R: Comparative n-gram analysis of whole-genome sequences: ; San Diego. 2002.View ArticleGoogle Scholar
- Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 1997, 25: 3389–3402. 10.1093/nar/25.17.3389PubMed CentralView ArticlePubMedGoogle Scholar
- Thompson JD, Gibson TJ, Plewniak F, Jeanmougin F, Higgins DG: The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res 1997, 25: 4876–4882. 10.1093/nar/25.24.4876PubMed CentralView ArticlePubMedGoogle Scholar
- Hubbard SJ, Thornton JM: NACCESS, Computer Program,. Department of Biochemistry and Molecular Biology, University College, London 1993.Google Scholar
- Xu W, Doshi A, Lei M, Eck MJ, Harrison SC: Crystal structures of c-Src reveal features of its autoinhibitory mechanism. Mol Cell 1999, 3: 629–638. 10.1016/S1097-2765(00)80356-1View ArticlePubMedGoogle Scholar
- DeLano WL: DeLano, W.L. The PyMOL Molecular Graphics System (2002) DeLano Scientific, Palo Alto, CA, USA. 2005.Google Scholar
- Tame JR, Vallone B: The structures of deoxy human haemoglobin and the mutant Hb Tyralpha42His at 120 K. Acta Crystallogr D Biol Crystallogr 2000, 56: 805–811. 10.1107/S0907444900006387View ArticlePubMedGoogle Scholar
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.