ProfileGrids as a new visual representation of large multiple sequence alignments: a case study of the RecA protein family
© Roca et al; licensee BioMed Central Ltd. 2008
Received: 30 July 2008
Accepted: 22 December 2008
Published: 22 December 2008
Multiple sequence alignments are a fundamental tool for the comparative analysis of proteins and nucleic acids. However, large data sets are no longer manageable for visualization and investigation using the traditional stacked sequence alignment representation.
We introduce ProfileGrids that represent a multiple sequence alignment as a matrix color-coded according to the residue frequency occurring at each column position. JProfileGrid is a Java application for computing and analyzing ProfileGrids. A dynamic interaction with the alignment information is achieved by changing the ProfileGrid color scheme, by extracting sequence subsets at selected residues of interest, and by relating alignment information to residue physical properties. Conserved family motifs can be identified by the overlay of similarity plot calculations on a ProfileGrid. Figures suitable for publication can be generated from the saved spreadsheet output of the colored matrices as well as by the export of conservation information for use in the PyMOL molecular visualization program.
We demonstrate the utility of ProfileGrids on 300 bacterial homologs of the RecA family – a universally conserved protein involved in DNA recombination and repair. Careful attention was paid to curating the collected RecA sequences since ProfileGrids allow the easy identification of rare residues in an alignment. We relate the RecA alignment sequence conservation to the following three topics: the recently identified DNA binding residues, the unexplored MAW motif, and a unique Bacillus subtilis RecA homolog sequence feature.
ProfileGrids allow large protein families to be visualized more effectively than the traditional stacked sequence alignment form. This new graphical representation facilitates the determination of the sequence conservation at residue positions of interest, enables the examination of structural patterns by using residue physical properties, and permits the display of rare sequence features within the context of an entire alignment. JProfileGrid is free for non-commercial use and is available from http://www.profilegrid.org. Furthermore, we present a curated RecA protein collection that is more diverse than previous data sets; the resulting RecA ProfileGrid is therefore a rich source of information for nanoanatomy analysis.
Comparative nanoanatomy and phylogenetic studies of macromolecules depend upon multiple sequence alignments (MSAs). However, the traditional stacked sequence representation of an alignment proves cumbersome for the large numbers of homologs that are now common with the proliferation of genome sequences. Early MSA formatting programs facilitated analysis by emphasizing residues with boxes, colors, and shading [1–3]. However, these programs (and many subsequent implementations) still represent an MSA as stacked sequences. Regular expressions, major components [4], and sequence logos [5] compress the sequence alignment information of motifs into a consensus format, as reviewed in 2005 [6]. In addition, a graphical view of MSA conservation can be achieved with an "overview" mode [7, 8] or with plots of similarity values [9]. However, none of these representations conveys the frequency distribution of every character at each homologous position across the entire alignment. Thus, potentially valuable information for the interpretation of macromolecular structure and function is lost. Clearly there is a need for a new visual representation paradigm for MSAs.
Here we introduce the JProfileGrid Java software for generating ProfileGrids, a new graphical, tabular representation of alignments. Historically, profiles scored by a distance matrix were used for database searches [10], although simple frequency profiles have been used to tabulate the amino acid content of linear motifs [11]. By contrast, ProfileGrids are color-coded tables of the residue frequency occurring at every homologous position across the entire length of an MSA. Therefore, all MSA information is represented, including variable regions and rare residues that may yield clues about function. As in ColorGrids [12], the frequency determines the color shading, but ProfileGrids are specific to MSAs. In particular, our JProfileGrid software enables a dynamic visualization of structural patterns by analyzing protein alignments with respect to amino acid physical properties. Notably, JProfileGrid provides a unique method for generating publishable figures of the entire sequence content of an alignment with many homologs. A ProfileGrid facilitates the inspection of large MSAs and thus solves the problem of text legibility in traditional MSA figures [13]. Below we describe the features of the JProfileGrid software and demonstrate the usefulness of a ProfileGrid by examining the bacterial RecA protein family, which we introduce next.
The RecA protein is the premier genomic sentinel of Escherichia coli because of its crucial protective roles in both recombinational DNA repair [14] and the SOS response [15]. RecA homologs are present in all domains of life [16, 17] and well distributed among bacteria [18–21]. As the vanguard of bacterial RecA homologs, the E. coli RecA protein (352 residues; [GenBank:AAC75741.1]) has been intensively studied, starting with its discovery [22] and the subsequent sequencing of its gene [23, 24]. Later, many RecA sequences became available as microbiologists cloned recA genes from different culturable bacteria to construct knockout derivatives [25]. Furthermore, the ubiquity of RecA homologs made them common markers for phylogenetic studies  using the most conserved parts of the RecA protein: the adjacent MAW and P-loop motifs. The precise function of the former is unknown , while the latter motif is the well-characterized ATP-binding site [26].
RecA MSAs have been analyzed from a structural perspective to understand RecA function [17, 27]. For example, molecular genetic approaches have generated over 1400 E. coli RecA missense mutations [28], and their phenotypes are discussed within the context of the sequence conservation at the mutation location. Furthermore, conserved residues often have functional roles such as ligand binding, so such positions are targets for inspection when studying protein structure. The recent determination of a RecA-DNA cocrystal structure [29], with the first clear identification of a DNA binding site, provides new motivation for analyzing RecA MSAs.
As the number of RecA homologs has increased, however, the visualization and analysis of an MSA have become unwieldy using the traditional stacked sequence representation. In fact, the last complete RecA MSAs available as published figures come from the mid-1990s, when there were only about 60 homologs [17, 19, 30]. More recently, no MSA figures were included with the data sets of 144  and 113  RecA homologs. Since many more RecA sequences are now available, this family makes an excellent case study for showing how ProfileGrids succinctly display the information content of a large MSA. The present work describes a curated data set of 300 RecA protein sequences from a greater diversity of bacterial species than previously reported alignments. The breadth of this sequence collection creates a robust description of the conserved sequence motifs of the RecA protein family and, therefore, may shed light on unexplored regions of this protein such as the aforementioned MAW motif.
JProfileGrid is a Java program that combines the tasks of examining amino acid frequencies across an entire MSA, identifying conserved motif regions, and comparing species-specific residues against a sequence family. Both a command-line and a graphical user interface are available, with the latter allowing interactive ProfileGrid analysis. The program accepts protein and nucleic acid MSAs in either MSF or FASTA format. The former is preferred because the MSF file header includes sequence weight values. The similarity plot calculations are based on the plotcon algorithm [9], with the modification that the values are normalized to the range 0 to 1. The program saves matrix output as a spreadsheet file using the JExcel API . The color-formatted ProfileGrid and the similarity values are stored in separate worksheets. A third worksheet lists outlier characters (such as "X") in the MSA that the program flags for verification. JProfileGrid can also write PyMOL scripts  that identify the conserved regions of the MSA on a protein structure.
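The normalization step can be illustrated with a short Python sketch. It is not JProfileGrid's code: plotcon scores each column with a substitution matrix, whereas this sketch substitutes plain identity scoring, and the window size and gap handling are assumptions. It computes a mean pairwise score per column, averages it over a sliding window, and min-max rescales the result to the range 0 to 1.

```python
from itertools import combinations

GAP_CODES = "-."

def column_similarity(column):
    """Mean pairwise identity over all sequence pairs in one alignment column.

    Assumption: identity scoring (1 for a match, 0 otherwise) stands in for the
    substitution matrix used by plotcon; any pair involving a gap scores 0.
    """
    pairs = list(combinations(column, 2))
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b and a not in GAP_CODES)
    return matches / len(pairs)

def windowed_similarity(alignment, window=5):
    """Average the per-column scores over a sliding window centred on each column."""
    per_column = [column_similarity(col) for col in zip(*alignment)]
    half = window // 2
    smoothed = []
    for i in range(len(per_column)):
        lo, hi = max(0, i - half), min(len(per_column), i + half + 1)
        smoothed.append(sum(per_column[lo:hi]) / (hi - lo))
    return smoothed

def normalize_01(values):
    """Min-max rescale: the least similar window maps to 0, the most similar to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)  # flat profile; choosing 0.0 here is arbitrary
    return [(v - lo) / (hi - lo) for v in values]

alignment = ["MAWG", "MAWG", "MSWA"]
print(normalize_01(windowed_similarity(alignment, window=1)))  # [1.0, 0.0, 1.0, 0.0]
```

Min-max rescaling makes plots from alignments of different depth directly comparable, at the cost of hiding the absolute similarity level of the most conserved window.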
Sequence data set
RecA protein sequences were collected from the following databases: the National Center for Biotechnology Information GenBank database , The Institute for Genomic Research Comprehensive Microbial Resource , the DNA Data Bank of Japan , the European Molecular Biology Laboratory Sequence Database , and UniProt . Keyword searches were used at the aforementioned database websites, especially for annotated genomes in which RecA orthologs had already been identified. In addition, sequence similarity searches were performed using the E. coli RecA homolog as the query sequence in BLASTp and TBLASTN searches  with default parameters. After manually verifying the presence of conserved RecA family motifs, we added the protein sequences from the keyword search results and significant BLAST search hits (E-value < 10⁻⁷⁰) to our previous collection of validated bacterial RecA orthologs . Since we focused on fully sequenced homologs from known bacterial species, no explicit attempt was made to collect RecA homologs from environmental sequencing projects such as the Sargasso Sea collection . In a previous analysis of 64 RecA homologs, 12 sequences were found to contain errors [17, 40, 41]. Although some of those have not yet been updated in GenBank, we used the corrected versions in all cases. Finally, we limited the RecA data set to unique sequences for each bacterial species. Specifically, we eliminated redundant sequences from duplicate sequencing efforts (genome versus individual gene projects) and from strains of the same bacterial species (E. coli CFT073 versus K12). While these sequences do not appear in our RecA MSA and ProfileGrid, the redundant sequences serve to verify any rare residue observations that could be the result of errors. This underscores the importance of curating individual sequences, as described in more detail below.
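The selection rules above (E-value cutoff, one sequence per species, redundant copies kept aside for verification) can be sketched in Python. The `Hit` record, its field names, the accession strings, and the tie-breaking rule of keeping the lowest E-value hit are all hypothetical; the sketch also assumes species names have already been normalized so that strains such as CFT073 and K12 map to the same key.

```python
from dataclasses import dataclass

E_VALUE_CUTOFF = 1e-70  # significance threshold quoted in the text

@dataclass(frozen=True)
class Hit:
    """Hypothetical record for one keyword or BLAST hit."""
    accession: str
    species: str   # assumed already normalized to genus + species (strain dropped)
    evalue: float
    sequence: str

def select_representatives(hits):
    """Keep one sequence per species among significant hits.

    Returns (kept, redundant). Redundant hits (other strains, duplicate
    sequencing projects) are set aside rather than discarded so they can later
    be used to double-check rare residues. Keeping the lowest E-value hit per
    species is an assumption made for this sketch.
    """
    kept, redundant = {}, []
    for hit in sorted(hits, key=lambda h: h.evalue):  # stable sort
        if hit.evalue >= E_VALUE_CUTOFF:
            continue  # not significant enough
        if hit.species in kept:
            redundant.append(hit)
        else:
            kept[hit.species] = hit
    return list(kept.values()), redundant

hits = [
    Hit("A1", "Escherichia coli", 1e-150, "MAIDENKQ"),
    Hit("A2", "Escherichia coli", 1e-150, "MAIDENKQ"),
    Hit("B1", "Bacillus subtilis", 1e-120, "MSDRQAAL"),
    Hit("C1", "Unknown sp.", 1e-20, "MKKL"),
]
kept, redundant = select_representatives(hits)
print([h.accession for h in kept], [h.accession for h in redundant])  # ['A1', 'B1'] ['A2']
```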
The multiple sequence alignments were calculated using the DNASTAR MegAlign program  that implements the ClustalW algorithm . Default parameters were used except that the gap penalty was increased to 30 to minimize the introduction of gaps. The resulting alignment was manually curated by visual inspection to optimize the position of small gaps. Weight values were assigned to each protein sequence using the ClustalX program  to remove any bias from similar sequences potentially overrepresented in the alignment. The MegAlign program was also used to identify alignment positions that were either invariant or chemically similar (Additional file 1) according to previously described amino acid classes .
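How sequence weights can enter the residue frequencies is sketched below, assuming each sequence adds its weight (rather than a count of 1) to the tally of its residue in every column and that frequencies are divided by the total weight. The function name and the weight values are hypothetical; JProfileGrid reads weights from the MSF header, but its exact formula is not described here.

```python
from collections import defaultdict

def weighted_column_frequencies(alignment, weights=None):
    """Return one {residue: frequency} dict per alignment column.

    Each sequence contributes its weight (for example, a ClustalX-style
    sequence weight) instead of a count of 1, so clusters of near-identical
    sequences do not dominate the frequencies. Gaps are tallied as '-'.
    """
    if weights is None:
        weights = [1.0] * len(alignment)
    if len(weights) != len(alignment):
        raise ValueError("one weight per sequence is required")
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    total = sum(weights)
    profile = []
    for column in zip(*alignment):
        tally = defaultdict(float)
        for residue, weight in zip(column, weights):
            tally["-" if residue in "-." else residue.upper()] += weight
        profile.append({res: w / total for res, w in tally.items()})
    return profile

# Two near-identical sequences share weight; the weights here are made up.
profile = weighted_column_frequencies(["MG", "MG", "MS"], weights=[0.25, 0.25, 0.5])
print(profile[1])  # {'G': 0.5, 'S': 0.5}
```

Without weights, the same column would report glycine at 2/3 and serine at 1/3.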
In the genomic era, database web interfaces make it easy for the novice user to find and align many RecA sequences. However, the quality of the sequence data sets and their subsequent alignment cannot be taken for granted. Instead, it is imperative that bioinformatic data be curated so that researchers can be confident of the conclusions they draw . This can be particularly important in the conserved motifs of a protein sequence alignment. Below, we belabor this point as a caution about the interpretation of rare residues in MSAs.
Inspection of the MSA (Additional file 1) and ProfileGrid (Additional file 2) shows that the family motifs are very well conserved among the 300 RecA homologs. However, there are exceptions where residues occur that do not follow the consensus patterns for the motifs. These rare residues are readily visible in ProfileGrid representations. Such rare amino acids may be interesting exceptions or just noise in the bioinformatic data. We paid particular attention to the MAW and P-loop motifs, which are the most conserved parts of the RecA family. For example, a single serine is observed in the MAW motif at E. coli position 52, where 298 other RecA sequences have glycine (Additional file 2). This is not considered a conservative substitution. By contrast, a single serine in the P-loop at position 73 could be a conservative substitution when compared to the 299 other threonine residues. Structure and function inferences drawn from exceptions to conserved motifs would be a waste of effort if such exceptions were based upon faulty data. We also note that phylogenetic analyses are greatly affected by sequence errors .
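Spotting cases like the lone serine at position 52 amounts to scanning each column for residues that occur very few times where one residue dominates. A minimal Python sketch follows; the function name and both thresholds (at most one occurrence, a majority of at least 90%) are illustrative assumptions rather than JProfileGrid settings.

```python
from collections import Counter

GAP_CODES = set("-.")

def rare_residues(alignment, max_count=1, min_majority=0.9):
    """Find residues seen in at most `max_count` sequences at columns that are
    otherwise dominated (>= `min_majority` of all sequences) by one residue.

    Yields (column, rare_residue, rare_count, majority_residue, majority_count),
    with 1-based alignment column numbers. Both thresholds are illustrative.
    """
    n = len(alignment)
    for col, column in enumerate(zip(*alignment), start=1):
        counts = Counter(ch for ch in column if ch not in GAP_CODES)
        if not counts:
            continue
        majority, majority_count = counts.most_common(1)[0]
        if majority_count / n < min_majority:
            continue  # variable column: rare residues are not unexpected here
        for residue, count in sorted(counts.items()):
            if residue != majority and count <= max_count:
                yield col, residue, count, majority, majority_count

# Ten toy sequences: a single serine among glycines in column 2.
toy = ["MG"] * 9 + ["MS"]
print(list(rare_residues(toy)))  # [(2, 'S', 1, 'G', 9)]
```

Each hit is a candidate for re-checking against the redundant sequences set aside during curation, not a finding in itself.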
Problems in sequence data sets can result from experimental artifacts or data handling mistakes. These issues are diminishing in the genomic era, but anomalies still occur. As mentioned above, we have identified errors in recA gene sequences determined using traditional gel techniques . More importantly, genome projects are introducing a new problem: the complete determination of an organism's DNA content yields sequences that may not be true chromosomal RecA orthologs. For example, the Salmonella enterica genome project  uncovered both plasmid-encoded [GenBank:CAD09875.1] and chromosome-encoded [GenBank:CAD05935.1] RecA proteins. Only the latter was included in the work presented here. In addition, JProfileGrid flags outlier one-letter characters that do not represent the common amino acids or gap codes. For example, in the RecA protein alignment reported here, we unexpectedly identified "X" characters in two sequences [GenBank:CAD79373.1, GenBank:AAN06665.1].
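A minimal Python sketch of this kind of outlier check, assuming the allowed alphabet is the 20 standard amino acids plus '-' and '.' as gap codes; the exact character set JProfileGrid accepts is not stated here, and the function name and sequence identifiers are hypothetical.

```python
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")
GAP_CODES = set("-.")

def flag_outliers(records, allowed=STANDARD_AMINO_ACIDS | GAP_CODES):
    """Report characters outside the allowed alphabet for manual verification.

    records: iterable of (sequence_id, aligned_sequence) pairs.
    Returns a list of (sequence_id, 1-based column, character) tuples.
    """
    flagged = []
    for seq_id, sequence in records:
        for col, ch in enumerate(sequence, start=1):
            if ch.upper() not in allowed:
                flagged.append((seq_id, col, ch))
    return flagged

print(flag_outliers([("seq1", "MAIDX-NK"), ("seq2", "MAIDENNK")]))
# [('seq1', 5, 'X')]
```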
Significantly, this point about data curation is not just a hypothetical cautionary comment. Attention  was drawn to the observation of a rare tyrosine residue in the Proteus vulgaris RecA protein , where the vast majority of RecA homologs have serine at E. coli position 70 (Additional file 2). However, the discrepancy was resolved  when it was determined that the tyrosine observation was actually a simple typographical error in the published figure. Compounding this problem, though, was a data handling error affecting the P. vulgaris [GenBank:CAB56804.1] and Pectobacterium carotovorum (formerly Erwinia carotovora) [GenBank:CAB56783.1] RecA protein sequences, both determined by the same group . The sequence database records for these homologs were apparently mixed together such that the sequences do not agree with the protein sequences reported in the reference publication. The corrected sequences are used in this work. Thus, we encourage users of ProfileGrids to be cautious of overinterpreting rare residues identified in motifs. Currently, accurate biocuration of sequence and alignment data sets can only be achieved by slow, tedious, manual efforts by protein family experts .
Results and Discussion
The parameter settings window (Figure 1) allows the user to change the template sequence, the position ruler numbering, the majority consensus sequence threshold cutoff (default 70%), and the residue sort order. By default, the template is the first sequence of the alignment, and the amino acids are alphabetized by one-letter code to facilitate looking up a residue of interest. JProfileGrid provides a menu of the following amino acid physical constants for analysis: age , flexibility , frequency among E. coli proteins , hydropathy , hydrophobicity , helix propensity , mutability [57, 58], surface area , and volume . Many more constants are available for those coding their own ProfileGrid implementations . The "Frequency Colors" button opens a window listing the six default frequency color bins (Figure 3). A ProfileGrid cell is colored according to the bin with the largest threshold that is less than or equal to the cell's residue frequency: <10% (white), ≥10% (gray), ≥25% (yellow), ≥50% (orange), ≥70% (green), and ≥90% (red). For example, a residue occurring at 60% frequency falls in the orange (≥50%) bin. This color scheme was chosen to maximize the visual differences between bins for the inspection of ProfileGrids for patterns (see below). By contrast, a color ramp (i.e., shades of one color) would not facilitate such analysis. However, users can define their own frequency color scheme by choosing the number, size, and color of the bins. To assist the inspection of ProfileGrids, the frequency values can be hidden; the same menu also allows the values to be reported as percentages.
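The bin rule and the majority consensus cutoff can be expressed in a few lines of Python. This is a sketch using the default thresholds and colors listed above, not JProfileGrid's implementation; the function names and the symbol 'x' for columns below the consensus cutoff are assumptions.

```python
import bisect
from collections import Counter

# Default frequency bins as listed above: (lower threshold, color).
DEFAULT_BINS = [(0.00, "white"), (0.10, "gray"), (0.25, "yellow"),
                (0.50, "orange"), (0.70, "green"), (0.90, "red")]

def bin_color(freq, bins=DEFAULT_BINS):
    """Return the color of the bin with the largest threshold <= freq."""
    thresholds = [threshold for threshold, _ in bins]  # must be ascending
    index = bisect.bisect_right(thresholds, freq) - 1
    return bins[max(index, 0)][1]

def majority_consensus(alignment, threshold=0.70, ambiguous="x"):
    """Consensus residue per column if it reaches `threshold`, else `ambiguous`.

    The 70% default mirrors the JProfileGrid default cutoff; using 'x' for
    columns below the cutoff is an assumption of this sketch.
    """
    n = len(alignment)
    consensus = []
    for column in zip(*alignment):
        residue, count = Counter(column).most_common(1)[0]
        consensus.append(residue if count / n >= threshold else ambiguous)
    return "".join(consensus)

print([bin_color(f) for f in (0.05, 0.10, 0.60, 0.70, 0.98)])
# ['white', 'gray', 'orange', 'green', 'red']
print(majority_consensus(["MAG", "MAS", "MSG", "MAG"]))  # MAG
```

Using `bisect_right` makes a frequency exactly on a threshold (for example 10%) fall into the higher bin, matching the "≥" boundaries in the text.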
RecA family data set
Bacterial RecA Homologs
The data sets from the mid-1990s [17, 19, 30] were biased toward RecA homologs from the Proteobacteria phylum (60% of sequences). In the current work, the purple bacteria represent only 44% of the sequences (Table 1). Furthermore, we now include homologs from several newly sequenced bacterial phyla, including the Chloroflexi and the Fusobacteria. The diversity of the current data set permits a robust description of the motifs of the RecA protein family. Additional file 1 shows a summary of the information from the RecA MSA.
RecA family ProfileGrid applications
An alignment of 300 bacterial RecA homologs is graphically represented by a ProfileGrid (Figure 7). This visualization gives a succinct overview of MSA information especially when the frequency values are hidden to reduce clutter. The details of the residue frequency for all columns of the RecA MSA are found in Additional file 2. We used the sequence conservation denoted by the similarity boxes to define RecA motifs to serve as a nomenclature across the full length of the RecA protein family (see Additional files 1 and 2). The labeling (and subsequent analysis) of every part of the RecA protein is a fundamental technique adapted from traditional anatomy  and applied to macromolecules, i.e., nanoanatomy.
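One way to turn normalized similarity values into motif boxes, and then into a PyMOL script for display on a structure, is sketched below. The threshold, minimum length, object name `recA`, and selection names are hypothetical, and the script is not JProfileGrid's actual output format; it only uses the standard PyMOL `color` and `select` commands and could be saved as a .pml file and run from PyMOL.

```python
def conserved_segments(similarity, threshold=0.8, min_length=5, first_residue=1):
    """Return inclusive (start, end) residue ranges where normalized similarity
    stays >= threshold for at least min_length consecutive positions.

    Assumes `similarity` holds one value per template residue (template gap
    columns already removed); threshold and min_length are illustrative.
    """
    segments, start = [], None
    # A trailing sentinel closes any run that reaches the end of the protein.
    for i, value in enumerate(list(similarity) + [float("-inf")]):
        if value >= threshold and start is None:
            start = i
        elif value < threshold and start is not None:
            if i - start >= min_length:
                segments.append((start + first_residue, i - 1 + first_residue))
            start = None
    return segments

def pymol_script(segments, object_name="recA"):
    """Build PyMOL commands that color the whole object white and segments red."""
    lines = [f"color white, {object_name}"]
    for k, (start, end) in enumerate(segments, start=1):
        selection = f"conserved{k}"
        lines.append(f"select {selection}, {object_name} and resi {start}-{end}")
        lines.append(f"color red, {selection}")
    return "\n".join(lines) + "\n"

scores = [0.9] * 6 + [0.1] + [0.95] * 3 + [0.85] * 5
segments = conserved_segments(scores)
print(segments)  # [(1, 6), (8, 15)]
print(pymol_script(segments))
```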
The detailed RecA ProfileGrid information will allow researchers to examine conservation at RecA positions of interest. For example, a new suppressor mutation was recently  reported that ameliorates the effects of an impaired [KR]x[KR] motif . The suppressor maps to E. coli RecA position 11 and changes alanine to valine, a residue that is not observed at this position in any of the 300 sequences in the MSA (Figure 2, Additional file 2). Since the current sequence data set is larger and more diverse than previous RecA homolog collections, one can have more confidence that the absence of this residue change is meaningful.
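Checking whether a residue is observed at a template-numbered position requires mapping template residue numbers to alignment columns, skipping columns where the template has a gap. A minimal Python sketch with toy sequences and hypothetical function names; the `first_residue` argument allows an offset if a numbering convention requires one.

```python
from collections import Counter

GAP_CODES = set("-.")

def template_column_map(template_aligned, first_residue=1):
    """Map template residue numbers to 0-based alignment columns.

    Columns where the template has a gap get no residue number, which is how a
    template-based position ruler skips insertions in other sequences.
    """
    mapping, number = {}, first_residue
    for col, ch in enumerate(template_aligned):
        if ch not in GAP_CODES:
            mapping[number] = col
            number += 1
    return mapping

def residues_at(alignment, position, template_index=0, first_residue=1):
    """Count the characters in every sequence at one template-numbered position."""
    column = template_column_map(alignment[template_index], first_residue)[position]
    return Counter(seq[column] for seq in alignment)

toy = ["MA-IDE", "MSGIDE", "MAGLDE"]   # first sequence acts as the template
counts = residues_at(toy, position=2)
print(counts)        # Counter({'A': 2, 'S': 1})
print(counts["V"])   # 0: valine is never observed at template position 2
```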
Conservation of DNA binding residues
Ala 14%, Gln 12%
Glu 32%, Asp 16%
Gly 36%, Asp 29%
ProfileGrid structural pattern analysis of the MAW motif
When combined with different amino acid properties , ProfileGrids are a useful tool for visualizing structural patterns across the interspecies diversity of a protein family. We illustrate this on two adjacent motifs (MAW and P-loop) that comprise the most conserved part of RecA homologs of bacteria, eukaryotes, and archaea . Of the two, only the function of the P-loop (the cofactor binding site) has been determined . By contrast, little  is known about the MAW motif (residues 40–65). In the RecA crystal structures, the MAW motif (or "motif 1a"; see Additional file 1 for motif and variable names) consists of a loop, α-helix B, a tight turn, and β-strand 1 at its end. This glycine-rich motif threads through the RecA hydrophobic core and interacts with motifs (1b, 4a, and 5b) that form part of the ATP binding site, but the MAW region itself has not been shown to contact the cofactor ligand. The MAW motif also connects the P-loop to a hinge (variable 1) that undergoes a dramatic change in the transition from the inactive to the active RecA conformation . We note that, aside from the protein termini, this hinge region is one of the least conserved parts of the RecA protein (Figure 6, Additional files 1 and 2).
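As an illustration of combining residue frequencies with a physical property, the Python sketch below orders grid rows by the Kyte-Doolittle hydropathy scale (assumed here to correspond to the hydropathy constant in JProfileGrid's menu) and computes the mean hydropathy of one alignment column. The function names are hypothetical; this is a sketch of the idea, not JProfileGrid's sort routine.

```python
# Kyte-Doolittle hydropathy values (Kyte & Doolittle, 1982).
KYTE_DOOLITTLE = {
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5, "M": 1.9, "A": 1.8,
    "G": -0.4, "T": -0.7, "S": -0.8, "W": -0.9, "Y": -1.3, "P": -1.6,
    "H": -3.2, "E": -3.5, "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9, "R": -4.5,
}

def rows_by_property(scale=KYTE_DOOLITTLE):
    """Amino acid row order for a grid: highest value first, ties alphabetical."""
    return sorted(scale, key=lambda aa: (-scale[aa], aa))

def mean_property(column, scale=KYTE_DOOLITTLE):
    """Average property value of the residues in one column (a string or tuple).

    Gaps and non-standard characters are ignored; returns None if nothing in
    the column can be scored.
    """
    values = [scale[ch.upper()] for ch in column if ch.upper() in scale]
    return sum(values) / len(values) if values else None

print(rows_by_property()[:4])   # ['I', 'V', 'L', 'F']
print(mean_property("IL-V"))    # approximately 4.17
```

With rows in this order, a column rich in isoleucine, leucine, and valine, as reported for the MAW motif, shows its colored cells clustered at the top of the grid.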
When considering distant RecA homologs from all domains of life, the MAW motif is better conserved than the recently defined DNA interacting residues (Table 2). It is curious, then, that no clear function has been attributed to the MAW motif, so here we speculate on possible roles. Universally conserved residues can be involved in ligand interactions or in protein folding [72–74]. While a ligand-interacting role is a formal possibility for the MAW motif, this region of the protein forms part of the RecA hydrophobic core. However, one or more residues in the segment spanning positions 61–72 can be crosslinked to bound single-stranded DNA . This suggests that parts of the MAW motif may not remain buried in the protein core at all times and that the motif may be involved in DNA binding. With respect to a protein folding role, the RecA ProfileGrid shows a high prevalence of isoleucine, leucine, and valine residues among bacterial RecA MAW motifs (Additional file 2). Specifically, two conserved leucines are on the same face of helix B (positions 47 and 51). Two properties of leucine may be relevant to this observation. First, in a study of crystal structures, leucine was found to have the largest amount of side-chain flexibility when buried . Second, leucine is known to stabilize helices , which agrees with a theoretical study of RecA family helices: residues 44 to 51 of helix B have a near-optimal sequence for thermostability when compared to other central domain helices . Also, mutation of position 51 from leucine to phenylalanine results in a RecA mutant that is inactive both in vivo and in vitro [78, 79]. Thus, a role for the MAW motif may be to initiate protein folding or to stabilize the RecA protein core through the structural features described above. Such a protein folding role may be significant for a motif that connects an ATP binding site to the hinge region that undergoes conformational changes upon cofactor binding.
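The statement that positions 47 and 51 lie on the same face of helix B follows from ideal α-helix geometry, in which each residue rotates about 100° around the helix axis (3.6 residues per turn), so residues i and i+4 differ by 400°, i.e. 40°. A small Python helper makes the arithmetic explicit; the function name is hypothetical and it assumes an ideal helix, ignoring real distortions.

```python
def helical_wheel_separation(i, j, degrees_per_residue=100.0):
    """Angular separation (0-180 degrees) of residues i and j on an ideal
    alpha-helical wheel, which turns about 100 degrees per residue."""
    delta = ((j - i) * degrees_per_residue) % 360.0
    return min(delta, 360.0 - delta)

print(helical_wheel_separation(47, 51))  # 40.0: same face of helix B
print(helical_wheel_separation(47, 49))  # 160.0: roughly opposite faces
```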
Highlighting unique B. subtilis RecA residues
ProfileGrids serve as a new visual representation of large sequence alignments where the entire information content is presented in a concise form. The JProfileGrid Java software facilitates the creation and analysis of this alignment depiction. With the advent of sequence databases and software programs adopting MSA viewers, the traditional stacked sequence presentation is burdensome for large alignments, especially for the interactive analysis of structural patterns and rare features. Thus, we anticipate that the ProfileGrid paradigm will have widespread application in bioinformatics. Finally, we describe and analyze a curated RecA protein data set whose representation as a ProfileGrid will serve as a valuable resource for researchers studying this ubiquitous protein.
Availability and requirements
Project name: JProfileGrid version 1.1.1
Project home page: http://www.profilegrid.org
Operating systems: Platform independent
Programming language: Java 1.5 or higher
License: University of California license; see http://www.profilegrid.org/downloads.shtml#license
Any restrictions to use by non-academics: license required for commercial use
Abbreviations: MSA, multiple sequence alignment
We thank Marcin Joachimiak (LBNL), Markus Kaufman (UCLA; CPS), Juan Alonso (CNB, Spain) and Michael Cox (UW-Madison) for insightful discussions. AIR was supported by a University of California President's Postdoctoral Fellowship, the Erasmo Foundation (grant TSC13702), and a National Institutes of Health Diversity Supplement (parent grant GM058868 to Alexander McPherson). AEA was supported by NIH MBRS grant GM55246 awarded to the UC-Irvine Minority Science Undergraduate Program. ACA was supported by the UC-Irvine Undergraduate Research Opportunities Program.
- Devereux J, Haeberli PH, Smithies OS: A comprehensive set of sequence analysis programs for the VAX. Nucleic Acids Res 1984, 12(1):387–395.
- Parry-Smith DJ, Attwood TK: SOMAP: a novel interactive approach to multiple protein sequences alignment. Comput Appl Biosci 1991, 7(2):233–235.
- Barton GJ: ALSCRIPT: a tool to format multiple sequence alignments. Protein Eng 1993, 6(1):37–40.
- Smith DK, Xue H: A major component approach to presenting consensus sequences. Bioinformatics 1998, 14(2):151–156.
- Schneider TD, Stephens RM: Sequence logos: a new way to display consensus sequences. Nucleic Acids Res 1990, 18(20):6097–6100.
- Puntervoll P, Aasland R: Nomenclature for protein modules and their cognate motifs. In Modular Protein Domains. Edited by: Cesareni G, Gimona M, Sudol M, Yaffe M. Weinheim, Germany: Wiley-VCH; 2005:477–486.
- Parry-Smith DJ, Payne AW, Michie AD, Attwood TK: CINEMA – a novel colour INteractive editor for multiple alignments. Gene 1998, 221(1):GC57–63.
- Clamp M, Cuff J, Searle SM, Barton GJ: The Jalview Java alignment editor. Bioinformatics 2004, 20(3):426–427.
- Rice P, Longden I, Bleasby A: EMBOSS: the European Molecular Biology Open Software Suite. Trends Genet 2000, 16(6):276–277.
- Gribskov M, McLachlan AD, Eisenberg D: Profile analysis: detection of distantly related proteins. Proc Natl Acad Sci USA 1987, 84: 4355–4358.
- Pellegrini L, Yu DS, Lo T, Anand S, Lee M, Blundell TL, Venkitaraman AR: Insights into DNA recombination from the structure of a RAD51-BRCA2 complex. Nature 2002, 420(6913):287–293.
- Joachimiak MP, Weisman JL, May BCH: JColorGrid: software for the visualization of biological measurements. BMC Bioinformatics 2006, 7: 225.
- BMC Author instructions: Sequence alignments [http://www.biomedcentral.com/info/ifora/figuretypes#sequence]
- Cox MM: Recombinational DNA repair in bacteria and the RecA protein. Prog Nucleic Acid Res Mol Biol 1999, 63: 311–366.
- Friedberg EC, Walker GC, Siede W: SOS responses and DNA damage tolerance in prokaryotes. In DNA Repair and Mutagenesis. Washington, D.C.: ASM Press; 1995:407–464.
- Brendel V, Brocchieri L, Sandler SJ, Clark AJ, Karlin S: Evolutionary comparisons of RecA-like proteins across all major kingdoms of living organisms. J Mol Evol 1997, 44(5):528–541.
- Roca AI, Cox MM: RecA protein: structure, function, and role in recombinational DNA repair. Prog Nucleic Acid Res Mol Biol 1997, 56: 129–223.
- Lloyd AT, Sharp PM: Evolution of the recA gene and the molecular phylogeny of bacteria. J Mol Evol 1993, 37(4):399–407.
- Eisen JA: The RecA protein as a model molecule for molecular systematic studies of bacteria: comparison of trees of RecAs and 16S rRNAs from the same species. J Mol Evol 1995, 41: 1105–1123.
- Santos SR, Ochman H: Identification and phylogenetic sorting of bacterial lineages with universally conserved genes and proteins. Environ Microbiol 2004, 6(7):754–759.
- Rocha EP, Cornet E, Michel B: Comparative and evolutionary analysis of the bacterial homologous recombination systems. PLoS Genet 2005, 1(2):e15.
- Clark AJ, Margulies AD: Isolation and characterization of recombination-deficient mutants of Escherichia coli K-12. Proc Natl Acad Sci USA 1965, 53: 451–459.
- Sancar A, Stachelek C, Konigsberg W, Rupp WD: Sequences of the recA gene and protein. Proc Natl Acad Sci USA 1980, 77: 2611–2615.
- Horii T, Ogawa T, Ogawa H: Organization of the recA gene of Escherichia coli. Proc Natl Acad Sci USA 1980, 77(1):313–317.
- Miller RV, Kokjohn TA: General microbiology of recA: environmental and evolutionary significance. Annu Rev Microbiol 1990, 44: 365–394.
- Saraste M, Sibbald PR, Wittinghofer A: The P-loop: a common motif in ATP- and GTP-binding proteins. Trends Biochem Sci 1990, 15: 430–434.
- Leipe DD, Aravind L, Grishin NV, Koonin EV: The bacterial replicative helicase DnaB evolved from a RecA duplication. Genome Res 2000, 10: 5–16.
- McGrew DA, Knight KL: Molecular design and functional organization of the RecA protein. Crit Rev Biochem Mol Biol 2003, 38(5):385–432.
- Chen Z, Yang H, Pavletich NP: Mechanism of homologous recombination from the RecA-ssDNA/dsDNA structures. Nature 2008, 453(7194):489–494.
- Karlin S, Weinstock GM, Brendel V: Bacterial classifications derived from RecA protein sequence comparisons. J Bacteriol 1995, 177(23):6881–6893.PubMed CentralPubMed
- The PyMOL Molecular Graphics System[http://pymol.sourceforge.net]
- Wheeler DL, Barrett T, Benson DA, Bryant SH, Canese K, Church DM, DiCuccio M, Edgar R, Federhen S, Helmberg W, Kenton DL, Khovayko O, Lipman DJ, Madden TL, Maglott DR, Ostell J, Pontius JU, Pruitt KD, Schuler GD, Schriml LM, Sequeira E, Sherry ST, Sirotkin K, Starchenko G, Suzek TO, Tatusov R, Tatusova TA, Wagner L, Yaschenko E: Database resources of the National Center for Biotechnology Information. Nucleic Acids Res 2005, 33: D39–45.PubMed CentralView ArticlePubMed
- Peterson JD, Umayam LA, Dickinson T, Hickey EK, White O: The Comprehensive Microbial Resource. Nucleic Acids Res 2001, 29: 123–125.PubMed CentralView ArticlePubMed
- Tateno Y, Saitou N, Okubo K, Sugawara H, Gojobori T: DDBJ in collaboration with mass-sequencing teams on annotation. Nucleic Acids Res 2005, 33: D25–28.PubMed CentralView ArticlePubMed
- Kanz C, Aldebert P, Althorpe N, Baker W, Baldwin A, Bates K, Browne P, Broek A, Castro M, Cochrane G, Duggan K, Eberhardt R, Faruque N, Gamble J, Diez FG, Harte N, Kulikova T, Lin Q, Lombard V, Lopez R, Mancuso R, McHale M, Nardone F, Silventoinen V, Sobhany S, Stoehr P, Tuli MA, Tzouvara K, Vaughan R, Wu D, Zhu W, Apweiler R: The EMBL Nucleotide Sequence Database. Nucleic Acids Res 2005, 33: D29–33.PubMed CentralView ArticlePubMed
- Bairoch A, Apweiler R, Wu CH, Barker WC, Boeckmann B, Ferro S, Gasteiger E, Huang H, Lopez R, Magrane M, Martin MJ, Natale DA, O'Donovan C, Redaschi N, Yeh LS: The Universal Protein Resource (UniProt). Nucleic Acids Res 2005, 33: D154–159.PubMed CentralView ArticlePubMed
- Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ: Basic local alignment search tool. J Mol Biol 1990, 215: 403–410.View ArticlePubMed
- Venter JC, Remington K, Heidelberg JF, Halpern AL, Rusch D, Eisen JA, Wu D, Paulsen I, Nelson KE, Nelson W, Fouts DE, Levy S, Knap AH, Lomas MW, Nealson K, White O, Peterson J, Hoffman J, Parsons R, Baden-Tillson H, Pfannkoch C, Rogers YH, Smith HO: Environmental genome shotgun sequencing of the Sargasso Sea. Science 2004, 304(5667):66–74.View ArticlePubMed
- Margraf RL, Roca AI, Cox MM: The deduced Vibrio cholerae RecA amino acid sequence. Gene 1995, 152(1):135–136.View ArticlePubMed
- Roca AI: Initial characterization of mutants in a universally conserved RecA structural motif. PhD thesis. Madison: University of Wisconsin-Madison; 1997.
- Burland TG: DNASTAR's Lasergene sequence analysis software. Methods Mol Biol 2000, 132: 71–91.PubMed
- Thompson JD, Higgins DG, Gibson TJ: CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res 1994, 22(22):4673–4680.PubMed CentralView ArticlePubMed
- Thompson JD, Gibson TJ, Plewniak F, Jeanmougin F, Higgins DG: The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res 1997, 25(24):4876–4882.PubMed CentralView ArticlePubMed
- Pool R, Esnayra J: Bioinformatics: Converting Data to Knowledge: A Workshop Summary. Washington, D.C.: National Academy Press; 2000.
- Clark AG, Whittam TS: Sequencing errors and molecular evolutionary analysis. Mol Biol Evol 1992, 9(4):744–752.
- Parkhill J, Dougan G, James KD, Thomson NR, Pickard D, Wain J, Churcher C, Mungall KL, Bentley SD, Holden MT, Sebaihia M, Baker S, Basham D, Brooks K, Chillingworth T, Connerton P, Cronin A, Davis P, Davies RM, Dowd L, White N, Farrar J, Feltwell T, Hamlin N, Haque A, Hien TT, Holroyd S, Jagels K, Krogh A, Larsen TS, Leather S, Moule S, O'Gaora P, Parry C, Quail M, Rutherford K, Simmonds M, Skelton J, Stevens K, Whitehead S, Barrell BG: Complete genome sequence of a multiple drug resistant Salmonella enterica serovar Typhi CT18. Nature 2001, 413(6858):848–852.
- Konola JT, Logan KM, Knight KL: Functional characterization of residues in the P-loop motif of the RecA protein ATP binding site. J Mol Biol 1994, 237(1):20–34.
- Zhao XJ, McEntee K: DNA sequence analysis of the recA genes from Proteus vulgaris, Erwinia carotovora, Shigella flexneri and Escherichia coli B/r. Mol Gen Genet 1990, 222(2–3):369–376.
- Bourne PE, McEntyre J: Biocurators: contributors to the world of science. PLoS Comput Biol 2006, 2(10):e142.
- Trifonov EN: The triplet code from first principles. J Biomol Struct Dyn 2004, 22(1):1–11.
- Zhao S, Goodsell DS, Olson AJ: Analysis of a data set of paired uncomplexed protein structures: new metrics for side-chain flexibility and model evaluation. Proteins 2001, 43(3):271–279.
- Nakamura Y, Gojobori T, Ikemura T: Codon usage tabulated from international DNA sequence databases: status for the year 2000. Nucleic Acids Res 2000, 28(1):292.
- Kyte J, Doolittle RF: A simple method for displaying the hydropathic character of a protein. J Mol Biol 1982, 157(1):105–132.
- Sweet RM, Eisenberg D: Correlation of sequence hydrophobicities measures similarity in three-dimensional protein structure. J Mol Biol 1983, 171(4):479–488.
- Rohl CA, Chakrabartty A, Baldwin RL: Helix propagation and N-cap propensities of the amino acids measured in alanine-based peptides in 40 volume percent trifluoroethanol. Protein Sci 1996, 5(12):2623–2637.
- Grantham R: Amino acid difference formula to help explain protein evolution. Science 1974, 185(4154):862–864.
- Schwartz RM, Dayhoff MO: Matrices for detecting distant relationships. In Atlas of Protein Sequence & Structure. Volume 5. Edited by: Dayhoff MO. Washington, D.C.: Natl Biomed Res Found; 1978:353–358.
- Chothia C: The nature of the accessible and buried surfaces in proteins. J Mol Biol 1976, 105(1):1–12.
- Richards FM: The interpretation of protein structures: total volume, group volume distributions and packing density. J Mol Biol 1974, 82(1):1–14.
- Kawashima S, Kanehisa M: AAindex: amino acid index database. Nucleic Acids Res 2000, 28(1):374.
- Story RM, Weber IT, Steitz TA: The structure of the E. coli RecA protein monomer and polymer. Nature 1992, 355(6358):318–325.
- Saves I, Laneelle MA, Daffe M, Masson JM: Inteins invading mycobacterial RecA proteins. FEBS Lett 2000, 480(2–3):221–225.
- Dullemeijer P: Concepts and Approaches in Animal Morphology. Assen, The Netherlands: Van Gorcum & Comp; 1974.
- Cox JM, Li H, Wood EA, Chitteni-Pattu S, Inman RB, Cox MM: Defective dissociation of a "slow" RecA mutant protein imparts an Escherichia coli growth defect. J Biol Chem 2008, 283(36):24909–24921.
- Cox JM, Abbott SN, Chitteni-Pattu S, Inman RB, Cox MM: Complementation of one RecA protein point mutation by another. Evidence for trans catalysis of ATP hydrolysis. J Biol Chem 2006, 281(18):12968–12975.
- Hörtnagel K, Voloshin ON, Kinal HH, Ma N, Schaffer-Judge C, Camerini-Otero RD: Saturation mutagenesis of the E. coli RecA loop L2 homologous DNA pairing region reveals residues essential for recombination and recombinational repair. J Mol Biol 1999, 286: 1097–1106.
- Cazaux C, Larminat F, Defais M: Site-directed mutagenesis in the Escherichia coli recA gene. Biochimie 1991, 73(2–3):281–284.
- Wu Y, He Y, Moya IA, Qian X, Luo Y: Crystal structure of archaeal recombinase RADA: a snapshot of its extended conformation. Mol Cell 2004, 15(3):423–435.
- Story RM, Bishop DK, Kleckner N, Steitz TA: Structural relationship of bacterial RecA proteins to recombination proteins from bacteriophage T4 and yeast. Science 1993, 259(5103):1892–1896.
- Sommer S, Boudsocq F, Devoret R, Bailone A: Specific RecA amino acid changes affect RecA-UmuD'C interaction. Mol Microbiol 1998, 28(2):281–291.
- Mirny LA, Shakhnovich EI: Universally conserved positions in protein folds: reading evolutionary signals about stability, folding kinetics and function. J Mol Biol 1999, 291(1):177–196.
- Reddy BV, Li WW, Shindyalov IN, Bourne PE: Conserved key amino acid positions (CKAAPs) derived from the analysis of common substructures in proteins. Proteins 2001, 42(2):148–163.
- Diella F, Haslam N, Chica C, Budd A, Michael S, Brown NP, Trave G, Gibson TJ: Understanding eukaryotic linear motifs and their role in cell signaling and regulation. Front Biosci 2008, 13: 6580–6603.
- Rehrauer WM, Kowalczykowski SC: The DNA binding site(s) of the Escherichia coli RecA protein. J Biol Chem 1996, 271: 11996–12002.
- Chakrabartty A, Kortemme T, Baldwin RL: Helix propensities of the amino acids measured in alanine-based peptides without helix-stabilizing side-chain interactions. Protein Sci 1994, 3: 843–852.
- Petukhov M, Kil Y, Kuramitsu S, Lanzov V: Insights into thermal resistance of proteins from the intrinsic stability of their alpha-helices. Proteins 1997, 29(3):309–320.
- Howard-Flanders P, Theriot L: Mutants of Escherichia coli K-12 defective in DNA repair and in genetic recombination. Genetics 1966, 53: 1137–1150.
- Lauder SD, Kowalczykowski SC: Negative co-dominant inhibition of RecA protein function: biochemical properties of the RecA1, RecA13 and RecA56 proteins and the effect of RecA56 protein on the activities of the wild-type RecA protein function in vitro. J Mol Biol 1993, 234(1):72–86.
- Lovett CM, Roberts JW: Purification of a RecA protein analogue from Bacillus subtilis. J Biol Chem 1985, 260(6):3305–3313.
- Steffen SE, Bryant FR: Reevaluation of the nucleotide cofactor specificity of the RecA protein from Bacillus subtilis. J Biol Chem 1999, 274: 25990–25994.
- Carrasco B, Manfredi C, Ayora S, Alonso JC: Bacillus subtilis SsbA and dATP regulate RecA nucleation onto single-stranded DNA. DNA Repair 2008, 7(6):990–996.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.