  • Research article
  • Open access

Total sequence decomposition distinguishes functional modules, "molegos" in apurinic/apyrimidinic endonucleases



Total sequence decomposition, using the web-based MASIA tool, identifies areas of conservation in aligned protein sequences. By structurally annotating these motifs, the sequence can be parsed into individual building blocks, molecular legos ("molegos"), that can eventually be related to function. Here, the approach is applied to the apurinic/apyrimidinic endonuclease (APE) DNA repair proteins, essential enzymes that have been highly conserved throughout evolution. The APEs, DNase-1 and inositol 5'-polyphosphate phosphatases (IPP) form a superfamily whose members catalyze metal ion based phosphorolysis but recognize different substrates.


MASIA decomposition of APE yielded 12 sequence motifs, 10 of which are also structurally conserved within the family and are designated as molegos. The 12 motifs include all the residues known to be essential for DNA cleavage by APE. Five of these molegos are sequentially and structurally conserved in DNase-1 and the IPP family. Correcting the sequence alignment to match the residues at the ends of two of the molegos that are absolutely conserved in each of the three families greatly improved the local structural alignment of APEs, DNase-1 and synaptojanin. Comparing substrate/product binding of molegos common to DNase-1 showed that those distinctive for APEs are not directly involved in cleavage, but establish protein-DNA interactions 3' to the abasic site. These additional bonds enhance both specific binding to damaged DNA and the processivity of APE1.


A modular approach can improve structurally predictive alignments of homologous proteins with low sequence identity and reveal residues peripheral to the traditional "active site" that control the specificity of enzymatic activity.


Genomic cloning has revealed that most of the enzyme families essential for maintaining cell growth have been conserved throughout evolution [1]. However, mammalian enzymes with different functional activity may have evolved by combining elements from several bacterial ancestral genes. Even small proteins may contain several individual domains that link them to different superfamilies [2]. While many endonucleases share a common active site that is highly conserved across many subfamilies, identifying residues that control substrate specificity requires sophisticated analysis that combines both sequence conservation and structural data [3–5].

In this paper we distinguish, using a word-based "molego" approach, structural elements that control substrate specificity. We postulate here that elements conserved in all the members of related protein families dictate common structures and also common "functions", i.e., individual steps in a complex reaction. Areas that affect substrate specificity will be less conserved in the superfamily than they are in subfamilies of enzymes that catalyze specific activities. We have chosen to illustrate this approach using the multifunctional family of DNA repair proteins, the apurinic/apyrimidinic endonucleases (APEs), which have a clearly defined bacterial ancestor, E. coli exonuclease III (ExoIII), and are distantly related to several enzymes with varying substrate specificity.

APEs are essential for mammalian cell growth and bacterial survival in the presence of ionizing radiation and DNA mutagens [6]. They initiate repair of an abasic DNA site by cleaving the phosphodiester backbone 5' of the phosphodeoxyribose. This generates the necessary 3' hydroxyl group for DNA polymerases (pol β, δ or ε, in eukaryotes) to insert the correct nucleotide in later steps in the base excision repair pathway (BER-pathway) [7, 8]. Recent crystal structures of huAPE1 complexed with DNA containing an abasic site [9–11], combined with sequence analysis and site-directed mutagenesis, have defined the residues that participate in metal ion based cleavage of the phosphate backbone of the DNA [12–17].

Mutations that greatly diminish the enzymatic activity of huAPE1 do not affect, and may even increase binding to damaged DNA, while non-specific DNA binding remains low [16, 18]. Further, mutations that have little effect on APE activity in vitro prevent complementation of DNA repair deficient E. coli. As seen with other DNA repair enzymes [19], specificity determining residues, as yet unidentified for APEs, must be distinct from those involved in phosphorolysis.

To better assess which residues determine specificity, we assume that functions unique to APEs will be determined by motifs that are not conserved in a similar fashion in families with a different spectrum of activities. Besides cleaving the phosphate backbone, to achieve specificity APEs must coordinate a series of functions, including: interaction with target DNA in a series of small, possibly repetitive steps (scanning), locating damage sites, establishing the transition state complex, completing the cleavage, re-adjusting the charge status within the active site, and regulating release of product after interaction with the next enzyme in the BER pathway [20–25]. A finer breakdown of these functions can be achieved at the molecular level once all the residues in the reaction mechanism are known. APEs also have RNase H, 3'-exonuclease, and 3'-phosphodiester activities that are particularly high in the bacterial members of the family [26, 13].

Our web-based MASIA program [27] was used to rapidly decompose the sequences of APEs and related protein families into motifs, areas of significant conservation in members of identical function, which could then be correlated structurally using data from crystal structures. Having determined that 12 motifs were common to all APE1's, we compared the structure of the subset of these that occurred in both DNase 1 and synaptojanin, a member of the IPP family. These shared motifs had a similar 3D structure in representatives of these functionally diverse families, and we therefore called these motifs "molegos" (molecular legos). We then demonstrated that the shared molegos served a similar role in substrate binding by comparing the DNA binding profile of huAPE1 with that for the less specific enzyme DNase 1. The molegos present in both enzymes interact with target DNA in a similar fashion, while residues in molegos distinctive for APE1 control specificity by binding primarily to the bases around the apurinic site. Matching of molegos, guided by the degree of conservation of individual residues across the three families, allowed a better alignment of the individual secondary structure elements among the proteins than DALI achieved. This word based, sequence (motif) to structure (molego) to function method has clear implications for genomic analysis and template based homology modeling, as well as immediate application in recognizing specificity determinants in proteins that share active sites common to many enzymes [28].


Total sequence decomposition of human Ape1 with MASIA

MASIA identified 12 motifs as conserved in all members of the APE family (Figure 1 and Table 1). As the last column of Table 1 illustrates, these motifs include all the residues known to be essential for DNA cleavage. Most of the highly conserved (greater than 90%) residues have been shown by previous mutagenesis studies to affect activity. The 12 motifs are also structurally conserved, as demonstrated by the low RMSD values between segments in the crystal structures of bacterial ExoIII and of huAPE1. These two proteins are only 26% identical (based on a structure-based DALI alignment) and most of the similar segments are contained in the molegos. As the third column of the table demonstrates, the backbone deviation of the segments is overall <1 Å and, for 5 of the motifs, <0.5 Å. We have chosen the name "molegos" for the structural units associated with motifs, which are presented pictorially in Figures 2 and 3. Most of the DNA and metal ion binding molegos form individual β-strands at the core of the protein that orient the absolutely conserved residues toward the substrate, but several have a helical or hydrogen bonded coil structure.

Figure 1

MASIA converts complex sequence alignments into easily readable blocks of conserved sequences. A part of the CLUSTALW alignment of the APE family used for the results in Table 1 is shown, with a section of the corresponding MASIA output that includes motifs 1 and 2.

Figure 2

APE1 Molegos near the DNA. The structures for the motifs of Table 1 are taken from a minimized 1DE8 crystal structure of APE1 bound to an uncleaved 11mer DNA with an abasic site. One Mg2+ ion was inserted at the position seen in the 1DE9 minimized structure based on 1DE8 (huAPE1/Mg2+/11 bp (cleaved) AP-containing oligonucleotide). The molegos contain residues that bind to DNA and have corresponding molegos in other proteins identified in the PSIBLAST search.

Figure 3

Other APE1 Molegos. These molegos either contain no residues that bind DNA (molegos 4 and 9) or differ significantly (5 and 7a) between the mammalian and bacterial APEs.

Table 1 Motifs characteristic of the APE family.

The 12 motifs, which account for about half of the protein, are bridged by areas that vary in the different members of the APE family. These connecting regions may account for the differing activities of the bacterial and mammalian proteins. The longest molego, 7, was broken down into two areas, with the contiguous region labeled 7a. The first 7 residues of the 7a area molego are quite similar in the bacterial and mammalian APE. However, the end is differently conserved in eukaryotes. The endonuclease activity of DNase 1 is reduced many fold by integrating this loop from E. coli exonuclease III, but the mutant cleaves at abasic sites in DNA with low efficiency [29]. Thus additional residues in the APEs control specificity while still allowing a reasonable rate of phosphorolytic cleavage.

Finding APE molegos in the DNase 1 superfamily

In an effort to functionally annotate the molegos of APE1, we next sought to find them in other proteins that shared some structural similarity to APE. The APEs, DNase-1 and inositol 5'-polyphosphate phosphatases (IPP) have been grouped in the SCOP database [30] as the DNase-1 like superfamily. Although DNase 1 has only 18% overall sequence identity to APE1, and the IPP domain of synaptojanin only 14%, we could show that most of the areas of identity were in molegos common to all three proteins. Motifs in other protein families were identified by genomic cross-networking with PSIBLAST (see Methods for details). Our analysis identified 5 molegos that are common to the DNase 1, IPP and APE families, which roughly correspond to areas of sequence similarity identified previously [31, 32]. The structural similarities of molegos 1, 2, 7, 11 and 12 (i.e., the segmental RMSDs) between APE1 and representatives of the distantly related DNase 1 and IPP families are comparable to those found between members of the APE family (Figure 4 and Tables 1 and 2).

Figure 4

APE Motifs common to DNase-1 and Synaptojanin have similar structure. Conserved areas of huAPE1 (from PDB file 1DE9) form the β-barrel core (left side). The motifs common to DNase1 (1DNK) and Synaptojanin (1I9Z) form a similar catalytic site, with the substrate (from the appropriate structure file) directed to the top of the β-strands and the metal ion binding site. The strands in the middle and right figures are colored according to their identity (green) or similarity (blue) to APE1. Residues that differ are red, the insertions needed for correct alignment in synaptojanin are yellow.

Table 2 Aligning sequences according to MASIA motif analysis improves the correlation between structures.

Common molegos form a similar active site in two distant relatives

The 12 conserved molegos form the β-barrel core of huAPE1. The completely conserved residues of huAPE1 concentrate, for the most part, at one end of this framework to form the metal ion binding active site (Figure 4). This core is also common to DNase 1 and synaptojanin (an IPP family member), which share the function of metal ion based cleavage of a phosphate backbone. The shared molegos define an active site architecture conserved in all three proteins, including the orientation of the substrate toward the metal binding site.

Molegos define functional areas common to DNase 1 and APE1

A contact plot of huAPE1 with the DNA in the 1DE8 crystal structure (Figure 5) shows that motifs 1–3, 5–8, and 10–12 all have residues close to the substrate, an oligonucleotide containing an abasic site (AP-DNA). The N-terminal motifs 1–3 and 5 bind primarily 5' to the apurinic site and to the 3' end of the undamaged strand. The other motifs bind more to the area 3' of the damage site. Motifs 10 and 12 span both strands of the DNA. Although motif 12 contains several highly conserved residues that, according to mutagenesis results (Table 1), contribute to APE1 activity, only His309 is very close to the abasic site in the DNA. Molegos 4 and 9 contain no residues in contact with the DNA or metal ion. Comparing the binding of APE in the substrate complex (Figure 5) with that in the product complex (Figure 6, left) suggests that APE's binding to the 5' end of the DNA after cleavage, especially that mediated by molego 3, is stronger, while the distance from the protein to the DNA 3' of the cleavage site increases.

Figure 5

Protein DNA contact plots for huAPE1. Protein/DNA contact plots for huAPE1 binding to substrate DNA taken from the minimized 1DE8 structure described in Figure 2. Blocks, black at 1.5 Å, lighten with increasing distance of residues from the DNA up to 7.5 Å. Most of the MASIA motifs (Table 1) for the APE family are near the DNA interface. The motifs are indicated by numbers; the abasic site at position 6 is indicated by arrows.
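The shading rule described in this caption can be illustrated with a short Python sketch. It is not the MOLMOL procedure used for the figure: the linear grey scale between 1.5 Å and 7.5 Å, the function names, and the toy coordinates are all assumptions made for illustration only.

```python
import math

def min_distance(atoms_a, atoms_b):
    """Smallest atom-atom distance (in Å) between two groups of (x, y, z) coordinates."""
    return min(math.dist(a, b) for a in atoms_a for b in atoms_b)

def grey_level(distance, near=1.5, far=7.5):
    """Map a distance to a shade: 0.0 is black (<= near), 1.0 is blank (>= far).

    A linear ramp between the two limits is an assumption of this sketch.
    """
    if distance <= near:
        return 0.0
    if distance >= far:
        return 1.0
    return (distance - near) / (far - near)

def contact_plot(protein, dna, far=7.5):
    """Return {(residue, nucleotide): shade} for every pair within `far` Å."""
    plot = {}
    for residue, residue_atoms in protein.items():
        for nucleotide, nucleotide_atoms in dna.items():
            d = min_distance(residue_atoms, nucleotide_atoms)
            if d <= far:
                plot[(residue, nucleotide)] = grey_level(d)
    return plot

# Toy input with made-up coordinates (not taken from 1DE8).
protein = {"H309": [(0.0, 0.0, 0.0)], "K98": [(20.0, 0.0, 0.0)]}
dna = {"AP6": [(4.5, 0.0, 0.0)]}
print(contact_plot(protein, dna))
```

Only residue/nucleotide pairs within the outer cutoff are kept, so residues far from the DNA simply leave the corresponding cell of the plot empty.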

Figure 6

Comparison of DNA contacts in APE1 and DNase 1. Comparison of the DNA-contact plots of huAPE1 with product (cleaved DNA, from the 1DE9 structure) and DNase 1 (uncleaved substrate, from 1DNK) illustrates the similar mode of binding by the areas conserved between the two proteins. The scissile bonds in both DNAs, 5' of the abasic site in APE1, C315 in 1DNK [53], are indicated by arrows.

The contact plots of APE1 and DNase1 with their respective substrates (Figure 6) document that the similar molegos in the proteins serve similar functions. The N-terminal 100 residues of both proteins, including molegos 1 and 2, bind 5' of the cleavage site and to the 3' end of the opposite DNA strand. Molegos 7, 11 and 12 bind to one base 5' and the next base 3' of the cleavage site in both proteins. Overall, the pattern of protein contacts to the cleavage site, the area 5' of the cleavage site, and the 3' end of the opposite strand is common to both proteins, suggesting that the functions of forming the substrate complex and the actual phosphorolysis are similar in both proteins.

While the length of the DNA in both cases is similar, DNase 1 clearly has less binding to bases opposite and 3' of the cleavage site. The extensive contacts that APE1 makes to these positions are mediated by molegos it does not share with DNase 1. Molegos 6, 7a and 10 all have residues within hydrogen bonding distance of the three basepairs 3' of the AP-site. This redundancy of binding to the 3' side is unique to APE, as is its strong binding to the DNA opposite the abasic site.

The importance of such bonds for activity was shown in other work, where huAPE1's binding to the DNA backbone was inhibited only by ethylation of the phosphates two and three positions 3' to an abasic site [18]. The R177A mutant, which alters a residue at the end of molego 6 that binds to this region and to the bases opposite the AP-site, had enhanced activity [11], while mutations at W280 (molego 11) and F266 (molego 10) [33] reduce activity and, in the latter case, substrate selectivity.

In work from this group that will be described separately, we used this analysis to generate mutants of APE1 with altered activity. An alanine substitution mutant, N226A, of a conserved residue at the end of molego 7a that forms a hydrogen bond with the second phosphate group downstream of the abasic site, had enhanced APE activity but increased Km and Kd values, similar to an alanine mutant of R177, which binds to the same site, reported previously [34]. A combination of the two mutants, N226A and R177A, substantially reduced the ability of APE1 to bind to DNA containing an abasic site (Izumi et al., in preparation). Thus, molegos can effectively guide the redesign of enzymes to alter specificity.

Molegos to improve structural alignment

Using molegos may also help in aligning proteins for template based modeling, by determining the end points of secondary structure elements in alignments with many gaps and insertions. According to MASIA analysis, the residues K/R and DI at the N-termini of motifs 1 and 2 are absolutely conserved in the three families, APEs, DNase1s and IPPs. However, matching these conserved residues between synaptojanin and DNase1 or APE requires a gapping that would not be consistent with CLUSTALW or a structural (DALI [35]) alignment of these proteins (Table 2). If the local alignment with synaptojanin is gapped to align these residues in the three proteins (Table 2, gapped), the RMSD for the two sections separated by the gap is much lower than that if one tries to align the whole ungapped segment. As Fig. 7 illustrates, the local environments of both conserved residue pairs DI and QE are structurally equivalent in all three proteins, indicating that a motif based alignment with a two residue gap is correct. The first two β-strand molegos in synaptojanin are 2 residues longer than in APE or DNase 1. By regarding these elements as simple lego style blocks, and recognizing the connectivity, molego based alignment correctly defined the changing length of the secondary structure elements.

Figure 7

Molegos can improve structural alignments. Molegos 1 and 2 have a similar structure in all three proteins, but the β-strand is 2 amino acids longer in synaptojanin. The amino acids around the N-terminal ends of the molegos that form hydrogen bonds to the conserved aspartate (D90 in huAPE1) are shown.


Previously, a "lego® block" approach described for organic synthesis [36–38] was used to describe the reshuffling of large sections of plant genomes [39] and as a rational method to build novel protein structures in the lab [36, 40]. Here we demonstrate that the concept is also useful to define the structural and functional role of conserved amino acids. Combining the MASIA decomposition approach to sequence analysis (Figure 1 and Tables 1 and 2) with data from crystal structures and site directed mutagenesis (Figures 2–7) showed that molegos can indicate areas of the protein that control individual functions that contribute to enzymatic activity and improve alignments for template based modeling of homologues with low identity (Table 2 and Figure 7). Most of the MASIA-motifs were near the DNA, metal ions, or both in the co-crystal structures of huAPE1.

Further, functional roles could be assigned to motifs based on their occurrence in related protein families. The similarity of the 3D structures of these motifs in three distantly related proteins and even their modes of binding substrate (Table 2, Figs. 2–7) imply that these molegos will be found in even more distantly related proteins. Several molegos can contribute to the same interacting surface and can thus define domains that are not linearly located in the protein sequence. The combined structural/sequence definition allows much more flexibility in defining a functional element than is possible with purely sequence based approaches such as PROSITE [41].

Is the specificity of APE determined by binding 3' to an abasic site?

Crystal structure data, coupled with molego analysis, outlined the areas of APE1 that distinguish its mode of DNA binding from the less specific DNase 1. Contact maps (Figure 6) illustrate how the conserved motifs direct DNA binding in the distributive (i.e., rapidly releasing substrate/product), relatively non-specific DNase1 as opposed to the processive, highly specific huAPE1. Both enzymes cleave only one DNA strand in a duplex and bacterial Xth cleaves ssDNA containing an abasic site [42]. The additional contacts that huAPE1, compared to DNase 1 (Figure 6), makes 3' to the damage site and to the opposite strand lower its turnover rate and its potential to cleave normal DNA. The residues contacting the region 3' to the abasic site come from three different uniquely conserved areas of APEs (molegos 6, 7a and 10) as well as 11, a molego that is similar to that in DNase-1. These observations, coupled with DNA ethylation data [18], indicate that 3' binding is a key element in specific recognition by APEs.

This is confirmed by site directed mutagenesis studies. Of the four protein areas that bind to the DNA 3' of the abasic site, mutating F266 (molego 10) or W280 (middle of molego 11) decreases APE activity [33]. The F266 mutation is particularly interesting, as the mutants at this position had reduced substrate specificity and enhanced 3'-exonuclease activity. However, an R177A mutant had enhanced APE activity [11], as do mutations at N226 (Izumi et al., in preparation). Combining these mutations, however, greatly decreases substrate binding (Izumi et al., in preparation). The 3' approach to the DNA [34] and the wide area covered by the protein on both sides of the abasic site [14] are both consistent with the need to hold the product until the correct polymerase moves in 5' to 3' to complete the repair [25]. This implies that the mammalian enzyme has evolved to be processive, to facilitate more efficient functioning of the overall BER pathway, and may not be optimized for simple catalysis. Processivity is an important facet of the activity of enzymes that function in complex pathways [43]. Reduced processivity may explain, for example, the repair deficits in Xeroderma pigmentosum (XPA) cells [44]. Our molego approach provides a basis for exploring the role of segments of the protein in its functions, rather than relying only on data from missense mutations.

Using molegos to detect structural and functional homologues

We have demonstrated here the derivation and uses of molegos for analyzing the specificity of enzymes, based on those derived from the APE family. The methodology can be used to complement searches with programs such as PSIBLAST and PROSITE [41] to detect distantly related functional or structural homologues in sequences revealed by genome sequencing. PSIBLAST searches often reveal areas of local similarity in proteins that have no significant overall sequence identity. Molego analysis could be useful to analyze the significance of such findings. The combined sequence and structure definition makes molegos more flexible for defining shared protein elements than methods such as PROSITE that require a strict one-dimensional definition. An improved motif definition method, based on physical property similarity [42] (which has been incorporated into our MASIA tool), also promises to enhance the usefulness of the method. This may eventually lead to a method to find functional relationships between proteins with even lower overall sequence similarity.

Another potential area for applying the molego approach is in homology modeling. Molegos may prove useful to check alignments for template based modeling of homologues with low identity (Table 2 and Figure 7), if the "anchoring" residues are conserved in sequence or property across the members of both subfamilies. Our molego approach is closest in principle to that of the ROSETTA program [45], although the latter seeks only to connect structure, not function, to a sequence element. We are currently testing the usefulness of the molego approach in modeling in the CASP5 competition.


The MASIA program can parse sequences into discrete blocks of significant conservation. The motifs identified in the APE family could be structurally annotated using crystal data to derive molegos, words in the protein sequence that correlate with structural elements. These molegos could in turn be functionally annotated by comparing the DNA binding profile of APE1 with that of the less specific nuclease DNase 1. This analysis indicated that residues binding 3' to the site of phosphorolytic cleavage control the substrate specificity of APE1. These results indicate that molegos can provide a useful basis for identifying specificity determining regions in enzymes with similar active sites but different activity spectra [46, 28, 3]. Site directed mutagenesis based on these results can define the function of the unique elements of the APEs, and aid in the design of enzymes with altered specificity.

Materials and Methods

Sequence alignment

A BLAST [47] search of the "non-redundant" protein database using the whole sequence of human APE1 yielded over 100 related sequences. Some sequence entries represented the same protein, called by different names or isolated in different screens, including many entries for huAPE1, Drosophila Rrp 1 protein (~40% identical to the mammalian APE1 in the C-terminal third of the protein), Xth from E. coli, and counterparts of this and exodeoxyribonuclease (exo A) sequences from many bacteria, which are about 25% identical to mammalian APE1. The mammalian sequences are highly conserved, with only 6 non-conservative residue variations between the human and murine sequences, 5 of which occur in the apparently unstructured N-terminus. Several proteins with a more distant relationship to APE1, such as mammalian and yeast APEIIs, and the CRC protein from Pseudomonas, which has no APE activity [48], were in the BLAST list but were not used for this analysis. To derive functional motifs, the BLAST list was culled to 37 unique sequences with identity ranging from 25% to 98%. These were aligned using the default parameters of CLUSTALW [49, 50]. Sequences were the APE1 protein from human, bovine, monkey, rat, murine, Arabidopsis thaliana (mouse-ear cress), Dictyostelium discoideum (slime mold), Schizosaccharomyces pombe (fission yeast), Caenorhabditis elegans, Saccharomyces cerevisiae (baker's yeast), Thermoplasma acidophilum, Neisseria meningitidis, Methanobacterium thermoautotrophicum, Leishmania major, Trypanosoma cruzi, Coxiella burnetii; the Rrp1 protein of Drosophila; exonuclease III from E. coli, Bacillus subtilis, Mycobacterium tuberculosis, Haemophilus influenzae, Salmonella typhimurium, Helicobacter pylori, Rickettsia prowazekii, Archaeoglobus fulgidus, Actinobacillus actinomycetemcomitans, Streptomyces coelicolor, Synechocystis sp. PCC 6803, Haemophilus influenzae; exonuclease A from Streptococcus pneumoniae, Treponema pallidum, Borrelia burgdorferi; Plasmodium falciparum.
Different parameter settings and sequence lists were tested in CLUSTALW for their effect on alignment and subsequent motif definition with MASIA. Areas peripheral to the endonuclease domain, such as the 50 amino acid mammalian and the 428 residue Drosophila Rrp-1 N-terminal regions, were eliminated to improve the consensus.
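As a rough illustration of the culling step, the Python sketch below computes pairwise percent identity between already aligned sequences and drops entries that duplicate one already kept. The text does not state how identity was computed or what cutoff defined a duplicate, so the convention used here (identity over columns where neither sequence has a gap, duplicates defined as 100% identical) and all names and toy sequences are assumptions.

```python
def percent_identity(seq_a, seq_b):
    """Percent identical residues over aligned columns where neither sequence has a gap."""
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    matches = sum(a == b for a, b in pairs)
    return 100.0 * matches / len(pairs)

def cull_redundant(aligned, cutoff=100.0):
    """Keep the first entry of each group of sequences that reach `cutoff` % identity."""
    kept = {}
    for name, seq in aligned.items():
        if all(percent_identity(seq, other) < cutoff for other in kept.values()):
            kept[name] = seq
    return kept

# Toy aligned sequences; names and residues are illustrative only.
aligned = {
    "entry_a": "MKR-DI",
    "entry_b": "MKR-DI",   # same protein deposited under a second name
    "entry_c": "MKQADL",
}
unique = cull_redundant(aligned)
print(sorted(unique))
print(percent_identity(aligned["entry_a"], aligned["entry_c"]))
```

Other identity conventions (for example, dividing by the length of the shorter sequence) give somewhat different percentages, so the reported range of 25% to 98% should not be expected to match this sketch exactly.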

Identification of motifs using MASIA

Motifs were identified in the aligned sequences using the MASIA consensus macro. Motifs start when at least 3 of 4 consecutive positions are more than 40% conserved according to the dominant criterion [51], and extend until at least 2 positions in a row are less than 40% conserved. To allow for mistakes in the alignment of all the sequences, essential residues are those >90% conserved by MASIA criteria over all sequences in the alignment.
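The start and extension rules above can be sketched in a few lines of Python. This is not the MASIA macro itself: the "dominant criterion" is assumed here to mean the fraction of sequences carrying the most common residue in a column (gaps counted in the denominator), and the function names and toy alignment are hypothetical.

```python
from collections import Counter

def column_conservation(column):
    """Fraction of sequences sharing the most common residue (assumed 'dominant' criterion)."""
    residues = [c for c in column if c != "-"]
    if not residues:
        return 0.0
    return Counter(residues).most_common(1)[0][1] / len(column)

def find_motifs(alignment, threshold=0.40, essential=0.90):
    """Return motifs as dicts with 0-based start/end columns and essential positions."""
    n_cols = len(alignment[0])
    cons = [column_conservation([seq[i] for seq in alignment]) for i in range(n_cols)]
    ok = [c > threshold for c in cons]
    motifs = []
    i = 0
    while i < n_cols:
        # Start rule: a conserved position opening a window with >= 3 of 4 conserved.
        if ok[i] and sum(ok[i:i + 4]) >= 3:
            end = i
            j = i + 1
            # Extension rule: stop at two non-conserved positions in a row
            # (or at a non-conserved position at the end of the alignment).
            while j < n_cols and (ok[j] or (j + 1 < n_cols and ok[j + 1])):
                if ok[j]:
                    end = j
                j += 1
            motifs.append({
                "start": i,
                "end": end,
                "essential": [k for k in range(i, end + 1) if cons[k] > essential],
            })
            i = end + 1
        else:
            i += 1
    return motifs

# Toy alignment of four sequences (illustrative only).
toy = ["AKDAILAFWA", "CKDCILCGWC", "DKDGIMDHWD", "EKETIMEIWE"]
print(find_motifs(toy))
```

In the toy alignment a single poorly conserved column inside the block is tolerated, while the isolated conserved column near the end does not open a motif because it fails the 3-of-4 start rule.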

Genomic cross-networking with PSIBLAST

A PSIBLAST search, using huAPE1 as the founder sequence, with an e-value of 0.1 per iteration, did not converge after 6 iterations, but few new sequences were added in the last 2 cycles. Searches with an e-value of 0.01/iteration had similar results, but members of several families were not included until later cycles. Members of the DNase 1, LINE-1 repeats, inositol 5'-polyphosphate phosphatase, Nocturnin, CCR4, cytolethal distending toxin, neutral sphingomyelin phosphodiesterase, and amino acid methyltransferase families were found with expectation values of 10⁻⁴ or less to be significantly similar to APE1. To determine the presence of motifs in these relatives, a CLUSTALW alignment of at least 5 representatives of a protein family was prepared and analyzed with MASIA for significant areas of conservation. In some cases, alignments taken from literature references (e.g., for IPPs [32]) were used to confirm MASIA results. The motifs common to these families were compared with the APE motifs of Table 1. Criteria for inclusion (presence of motif) included conservation of residues >90% conserved (side chains shown in blue in the tables) and patterns of polarity (as determined with a macro included in the MASIA packet as a user specified feature).

Molego building and comparison

The drawings of "molego blocks" and structures (Figures 2–4 and 7), contact plots of protein/DNA (Figures 5 and 6), and calculation of the RMSD between similar segments (Table 2) were done with MOLMOL [52]. The RMSD values in Table 1 were calculated using SwissPDB viewer and the "fit selected residues" option.
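For readers unfamiliar with the segmental RMSD reported in the tables, the quantity is the root of the mean squared distance between equivalent atoms of two segments. The Python sketch below shows only that final step and assumes the two coordinate sets have already been superposed (the fitting performed by the programs named above is not reproduced); the function name and coordinates are illustrative.

```python
import math

def segment_rmsd(coords_a, coords_b):
    """RMSD (in Å) between two equally long lists of (x, y, z) coordinates.

    Assumes the segments are already superposed; no fitting is done here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("segments must be non-empty and of equal length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))

# Toy backbone positions (made-up values, not from any PDB entry).
segment_1 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
segment_2 = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(segment_rmsd(segment_1, segment_2))
```

Because the value is an average over the atoms of one segment, a gapped alignment that splits a segment into two better matched pieces can give a much lower RMSD for each piece than the ungapped segment gives as a whole.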


  1. Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J, et al., International Human Genome Sequencing Consortium: Initial sequencing and analysis of the human genome. Nature 2001, 409: 860–921. 10.1038/35057062

  2. Ramos A, Pastore A: Dissecting nucleases into their structural and functional domains: mapping the RNA-binding surface of RNase III by NMR. In: Methods in Molecular Biology (Edited by: Schein CH). Humana Press, Totowa NJ 2001, 237–248.

  3. Jeltsch A, Pingoud AM: Methods for determining activity and specificity of DNA binding and DNA cleavage by Class II restriction endonucleases. In: Methods in Molecular Biology (Edited by: Schein CH). Humana Press, Totowa NJ 2001, 287–308.

  4. Lichtarge O, Yamamoto KR, Cohen FE: Identification of functional surfaces of the zinc binding domains of intracellular receptors. J Mol Biol 1997, 274: 325–337. 10.1006/jmbi.1997.1395

  5. Sowa ME, He W, Wensel TG, Lichtarge O: A regulator of G protein signaling interaction surface linked to effector specificity. Proc Natl Acad Sci 2000, 97: 1483–1488. 10.1073/pnas.030409597

  6. Demple B, Harrison L: Repair of oxidative damage to DNA: enzymology and biology. Ann Rev Biochem 1994, 63: 915–948. 10.1146/annurev.biochem.63.1.915

  7. Mitra S, Hazra TK, Roy R, Ikeda S, Biswas T, Lock J, Boldogh I, Izumi T: Complexities of DNA base excision repair in mammalian cells. Mol Cells 1997, 7: 305–312.

  8. Mitra S, Izumi T, Boldogh I, Ramana CV, Hsieh CC, Saito H, Lock J, Papaconstantinou J: Repair of oxidative DNA damage and aging: central role of AP-endonuclease. In: Proceedings of the NATO ASI Monograph. Plenum Press Publishers, New York, NY 1999, 295–311.

  9. Barzilay G, Mol CD, Robson CN, Walker LJ, Cunningham RP, Tainer JA, Hickson ID: Identification of critical active-site residues in the multifunctional human DNA repair enzyme HAP1. Nat Struct Biol 1995, 2: 561–568.

  10. Gorman MA, Morera S, Rothwell DG, de La Fortelle E, Mol CD, Tainer JA, Hickson ID, Freemont PS: The crystal structure of the human DNA repair endonuclease HAP1 suggests the recognition of extra-helical deoxyribose at DNA abasic sites. EMBO J 1997, 16: 6548–6558. 10.1093/emboj/16.21.6548

  11. Mol CD, Izumi T, Mitra S, Tainer JA: DNA-bound structures and mutants reveal abasic DNA binding by APE1 and DNA repair coordination. Nature 2000, 403: 451–456. 10.1038/35000249

  12. Erzberger JP, Wilson DM: The role of Mg2+ and specific amino acid residues in the catalytic reaction of the major human abasic endonuclease: new insights from EDTA-resistant incision of acyclic abasic site analogs and site-directed mutagenesis. J Mol Biol 1999, 290: 447–457. 10.1006/jmbi.1999.2888

  13. Izumi T, Malecki J, Chaudhry MA, Weinfeld M, Hill JH, Lee JC, Mitra S: Intragenic suppression of an active site mutation in the human apurinic/apyrimidinic endonuclease. J Mol Biol 1999, 287: 47–57. 10.1006/jmbi.1999.2573

  14. Nguyen LH, Barsky D, Erzberger JP, Wilson DM: Mapping the protein-DNA interface and the metal-binding site of the major human apurinic/apyrimidinic endonuclease. J Mol Biol 2000, 298: 447–459. 10.1006/jmbi.2000.3653

  15. Rothwell DG, Hickson ID: Asparagine 212 is essential for abasic site recognition by the human DNA repair endonuclease HAP1. Nucleic Acids Res 1996, 24: 4217–4221. 10.1093/nar/24.21.4217

  16. Masuda Y, Bennett RAO, Demple B: Dynamics of the interaction of human apurinic endonuclease (Ape1) with its substrate and product. J Biol Chem 1998, 273: 30352–30359. 10.1074/jbc.273.46.30352

  17. Rothwell DG, Hang B, Gorman MA, Freemont PS, Singer B, Hickson ID: Substitution of Asp-210 in HAP1(APE/Ref-1) eliminates endonuclease activity but stabilizes substrate binding. Nucleic Acids Res 2000, 28: 2207–2213. 10.1093/nar/28.11.2207

  18. Wilson DM, Takeshita M, Demple B: Abasic site binding by the human apurinic endonuclease, Ape, and determination of the DNA contact sites. Nucleic Acids Res 1997, 25: 933–939. 10.1093/nar/25.5.933

  19. Zhu H, Schein CH, Braun W: Homology modeling and molecular dynamics simulations of PBCV-1 glycosylase complexed with UV-damaged DNA. J Mol Model 1999, 5: 302–316. 10.1007/s008940050017

  20. Hill JW, Hazra TK, Izumi T, Mitra S: Stimulation of human 8-oxoguanine-DNA glycosylase by AP-endonuclease: potential coordination of the initial steps in base excision repair. Nucleic Acids Res 2001, 29: 430–438. 10.1093/nar/29.2.430

  21. Parikh SS, Mol CD, Slupphaug G, Bharati S, Krokan HE, Tainer JA: Base excision repair initiation revealed by crystal structures and binding kinetics of human uracil-DNA glycosylase with DNA. EMBO J 1998, 17: 5214–5226. 10.1093/emboj/17.17.5214

  22. Waters TR, Gallinari P, Jiricny J, Swann PF: Human thymine DNA glycosylase binds to apurinic sites in DNA but is displaced by human apurinic endonuclease 1. J Biol Chem 1999, 274: 67–74. 10.1074/jbc.274.1.67

  23. Yang H, Clendenin WM, Wong D, Demple B, Slupska MM, Chiang JH, Miller JH: Enhanced activity of adenine-DNA glycosylase (Myh) by apurinic/apyrimidinic endonuclease (Ape1) in mammalian base excision repair of an A/GO mismatch. Nucleic Acids Res 2001, 29: 743–752. 10.1093/nar/29.3.743

  24. Klungland A, Lindahl T: Second pathway for completion of human DNA base excision-repair: reconstitution with purified proteins and requirement for DNase IV (FEN1). EMBO J 1997, 16: 3341–3348. 10.1093/emboj/16.11.3341

  25. Wilson SH, Kunkel TA: Passing the baton in base excision repair. Nat Struct Biol 2000, 7: 176–178. 10.1038/73260

  26. Barzilay G, Walker LJ, Robson CN, Hickson ID: Site-directed mutagenesis of the human DNA repair enzyme HAP1: identification of residues important for AP endonuclease and RNase H activity. Nucleic Acids Res 1995, 23: 1544–1550.

  27. Zhu H, Schein CH, Braun W: MASIA: recognition of common patterns and properties in multiple aligned protein sequences. Bioinformatics 2000, 16: 950–951. 10.1093/bioinformatics/16.10.950

  28. Gerlt JA, Babbitt PC: Divergent evolution of enzymatic function: mechanistically diverse superfamilies and functionally distinct suprafamilies. Ann Rev Biochem 2001, 70: 209–246. 10.1146/annurev.biochem.70.1.209

  29. Cal S, Tan KL, McGregor A, Connolly BA: Conversion of bovine pancreatic DNase I to a repair endonuclease with a high selectivity for abasic sites. EMBO J 1998, 17: 7128–7138. 10.1093/emboj/17.23.7128

  30. Murzin AG, Brenner SE, Hubbard T, Chothia C: SCOP: a structural classification of proteins database for the investigation of sequences and structures. J Mol Biol 1995, 247: 536–540. 10.1006/jmbi.1995.0159

  31. Furano AV: The biological properties and evolutionary dynamics of mammalian LINE-1 retrotransposons. Prog Nucleic Acid Res Mol Biol 2000, 64: 255–294.

  32. Whisstock JC, Romero S, Gurung R, Nandurkar H, Ooms LM, Bottomley SP, Mitchell CA: The inositol polyphosphate 5-phosphatases and the apurinic/apyrimidinic base excision repair endonucleases share a common mechanism for catalysis. J Biol Chem 2000, 275: 37055–37061. 10.1074/jbc.M006244200

  33. Hadi MZ, Ginalski K, Nguyen LH, Wilson DM: Determinants in Nuclease Specificity of Ape1 and Ape2, Human Homologues of Escherichia coli Exonuclease III. J Mol Biol 2002, 316: 853–866. 10.1006/jmbi.2001.5382

  34. Mol CD, Hosfield DJ, Tainer JA: Abasic site recognition by two apurinic/apyrimidinic endonuclease families in DNA base excision repair: the 3' ends justify the means. Mut Res 2000, 460: 211–229. 10.1016/S0921-8777(00)00028-8

  35. Holm L, Sander C: Touring protein fold space with Dali/FSSP. Nucleic Acids Res 1998, 26: 316–319. 10.1093/nar/26.1.316

  36. Karle IL, Das C, Balaram P: De novo protein design: crystallographic characterization of a synthetic peptide containing independent helical and hairpin domains. Proc Natl Acad Sci 2000, 97: 3034–3037. 10.1073/pnas.070042697

  37. Valetti F, Sadeghi SJ, Meharenna YT, Leliveld SR, Gilardi G: Engineering multi-domain redox proteins containing flavodoxin as bio-transformer: preparatory studies by rational design. Biosens Bioelectron 1998, 13: 675–685. 10.1016/S0956-5663(98)00021-9

  38. Winningham MJ, Sogah DY: A modular approach to polymer architecture control via catenation of biomolecular Lego(R) sets. Polymers containing templated beta-sheets. Abstr Pap Am Chem Soc 1997, 213: 88-PMSE.

  39. Messing J, Llaca V: Importance of anchor genomes for any plant genome project. Proc Natl Acad Sci 1998, 95: 2017–2020. 10.1073/pnas.95.5.2017

  40. Sadeghi SJ, Meharenna YT, Fantuzzi A, Valetti F, Gilardi G: Engineering artificial redox chains by molecular 'Lego'. Faraday Discuss 2000, 116: 135–153. 10.1039/b003180l

  41. Hofmann K, Bucher P, Falquet L, Bairoch A: The PROSITE database, its status in 1999. Nucleic Acids Res 1999, 27: 215–219. 10.1093/nar/27.1.215

  42. Shida T, Noda M, Sekiguchi J: Cleavage of single- and double-stranded DNAs containing an abasic site by Escherichia coli exonuclease III (AP endonuclease VI). Nucleic Acids Res 1996, 24: 4572–4576. 10.1093/nar/24.22.4572

  43. Lloyd RS: Processivity of DNA repair enzymes. In: Nuclease methods and protocols (Edited by: Schein CH). Humana Press, Totowa NJ 2001, 3–14.

  44. Feng S, Parrish DD, Lambert MW: A processive versus a distributive mechanism of action correlates with differences in ability of normal and xeroderma pigmentosum group A endonucleases to incise damaged nucleosomal DNA. Carcinogenesis 1997, 18: 279–286. 10.1093/carcin/18.2.279

  45. Simons KT, Bonneau R, Ruczinski I, Baker D: Ab Initio protein structure prediction of CASPIII targets using ROSETTA. Proteins 1999, 37: 171–176. 10.1002/(SICI)1097-0134(1999)37:3+<171::AID-PROT21>3.0.CO;2-Z

  46. Armstrong RN: Mechanistic diversity in a metalloenzyme superfamily. Biochemistry 2000, 39: 13625–13632. 10.1021/bi001814v

  47. Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 1997, 25: 3389–3402. 10.1093/nar/25.17.3389

  48. Macgregor CH, Arora SK, Hager PW, Dahl MB, Phibbs PV: The nucleotide sequence of the Pseudomonas aeruginosa pyrE-crc-rph region and the purification of the crc gene product. J Bacteriol 1996, 178: 5627–5635.

  49. Thompson JD, Higgins DG, Gibson TJ: CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res 1994, 22: 4673–4680.

  50. Thompson JD, Plewniak F, Poch O: A comprehensive comparison of multiple sequence alignment programs. Nucleic Acids Res 1999, 27: 2682–2690. 10.1093/nar/27.13.2682

  51. Hanggi G, Braun W: Pattern recognition and self-correcting distance geometry calculations applied to myohemerythrin. FEBS Lett 1994, 344: 147–153. 10.1016/0014-5793(94)00366-1

  52. Koradi R, Billeter M, Wuthrich K: MOLMOL: a program for display and analysis of macromolecular structures. J Mol Graph 1996, 14: 51–55. 10.1016/0263-7855(96)00009-4

  53. Weston SA, Lahm A, Suck D: X-ray structure of the DNase I-d(GGTATACC)2 complex at 2.3 Å resolution. J Mol Biol 1992, 226: 1237–1256.


We thank our coworkers Venkatarajan Mathura, Sankar Mitra, and Ovidiu Ivanciuc for helpful discussions. Funding for this project was provided by grants from the Department of Energy (DE-FG03-00ER63041), the Texas Higher Education Coordinating Board (ARP 004952-0084-1999) and the John Sealy Memorial Endowment Fund (2535-01).

Author information


Corresponding author

Correspondence to Werner Braun.

Additional information

Authors' contributions

CHS developed the molego concept, performed the sequence analysis and prepared the manuscript. NO analyzed the structural properties of the molegos and prepared Figures 2–7. TI prepared mutants of human APE1 protein to test the conclusions of this paper and provided background and a list of mutations affecting APE function. WB designed and developed the MASIA program and the modular analysis approach to enzyme functions.

Electronic supplementary material


Additional File 1: Sequence alignment and MASIA printout for the APE family. The original alignment used for the APE family and the MASIA consensus macro output. Note that the A. thaliana sequence is consistently an outlier in the file, due to the CLUSTAL-W misalignment of the nuclease area of the protein when the N-terminal sequence is not removed. The real positions of the motifs in the A. thaliana sequence are underlined. (DOC 84 KB)


About this article

Cite this article

Schein, C.H., Özgün, N., Izumi, T. et al. Total sequence decomposition distinguishes functional modules, "molegos" in apurinic/apyrimidinic endonucleases. BMC Bioinformatics 3, 37 (2002).
