Domain analysis of the tubulin cofactor system: a model for tubulin folding and dimerization

Background: The correct folding and dimerization of tubulins, before their incorporation into the microtubule structure, requires a group of conserved proteins called cofactors A to E. Biochemical analysis of the cofactors has given insight into their general functions; however, little is known about the domain structure and detailed molecular function of these proteins.

Results: Combining modelling and fold prediction tools, we present 3D models of all cofactors, including several previously unannotated domains of cofactors B-E. Apart from the new HEAT and Armadillo domains in cofactor D and an unusual spectrin-like domain in cofactor C, we have identified a new subfamily of ubiquitin-like domains in cofactors B and E. Together, these observations provide a reliable, molecular-level model of the cofactor complex.

Conclusion: Distant homology searches allowed previously uncharacterized regions of the cofactors to be identified as independent domains and allowed us to present a detailed hypothesis of how the cofactor complex performs its function.


Background
The cytoskeleton plays a major role in many cell processes. However, it is not a passive scaffold. It is an active system that continuously changes its characteristics, in particular by constantly building and disassembling the opposite ends of the polymers that form the scaffold. Microtubule metabolism plays a leading role in regulating various features of the cytoskeleton. The α- and β-tubulin dimerization and polymerization processes leading to microtubule formation are controlled by several multi-domain proteins. After translation of both proteins, a group of chaperones called chaperonins (e.g. CCT) capture the tubulins and begin the folding of tubulin monomers [1]. Chaperonins are sufficient to fold actin and γ-tubulin completely, but another set of proteins is needed for the final folding and dimerization of α- and β-tubulins [2-5]. In the last ten years, cofactors from different eukaryotic organisms have been cloned and analyzed [4,6-18]. The set appears to consist of five cofactors (A, B, C, D and E) and an ADP ribosylation factor-like protein 2 (Arl2), which regulates the interaction of cofactor D with native tubulin.
The process of folding and dimerization of tubulins is complex. After release from the chaperonin, α-tubulin is captured by cofactor B, while β-tubulin is captured by cofactor A. In the next step, cofactors B and A are replaced by cofactors E and D, respectively. The last stage of this process involves the formation of a super-complex consisting of cofactors C-E [2,4,10,19]. The release of the α/β-tubulin heterodimer is catalyzed in the presence of GTP [4,18]. In the last few years, the function of the cofactors has been described in increasing detail (Fig. 1) [8,19], but little is known about their domains and structural organization. As a result, we lack an understanding of how the cofactor complex may achieve its function.
In recent years, advances in sensitive sequence analysis tools and fold recognition algorithms have allowed us to recognize distant homologies, giving important insights into the function and specific activities of many proteins. Here, we describe such an analysis of the domain architecture of cofactors B-E. In particular, the application of fold recognition algorithms allowed us to predict three-dimensional structures of all but one domain in cofactors B, C, D and E, and provided critical information linking their structure with function (Fig. 2). Such predictions complement our understanding of how the cofactor complex might assemble tubulin dimers by providing a structure-based understanding of the function of individual domains and their possible interactions.

Results
The Schizosaccharomyces pombe homologue of cofactor B, Alp11, was studied in detail in a few recent publications [8,9]. This analysis identified the CLIP-170 domain, described in PFAM as the cytoskeleton-associated protein glycine-rich domain (CAP-Gly domain, PF01302), which is required for efficient binding to α-tubulin. Our analysis suggests that this domain is about twice as long as that described in PFAM (about 80 amino acids). The structure of this domain cannot be predicted with confidence; however, secondary structure prediction programs strongly suggest the presence of six beta strands. This domain is found in many other proteins (kinesin, dynactin, restin, dynein-associated protein). It is often repeated two or three times in one protein, suggesting that it may adopt a beta barrel fold. The structure of this domain from the C. elegans homologue was solved recently by the SouthWest Structure Genomics Center [20], confirming this prediction and identifying the CAP-Gly domain as an OB-fold beta barrel formed by a twisted β-sheet. This fold was previously found to be involved in DNA binding and several enzymatic activities [21], but its presence in cytoskeleton-binding proteins comes as a complete surprise.
The center of cofactor B is occupied by a short helical domain. It was suggested to have a coiled-coil structure [9], although this is not confirmed by most coiled-coil prediction algorithms. This fragment was also characterized as necessary for maintaining normal cellular levels of α-tubulin.

Figure 1. Simplified model depicting the reactions involved in the assembly of the tubulin heterodimer [18]. The colors depict the previously known regions of the cofactors and correspond to the colors ascribed to various domains in Figure 2A. α-tubulin is colored gray and β-tubulin is colored black. For clarity, the involvement of ADP ribosylation factor-like protein 2 (Arl2) in modulating the interaction of cofactors with tubulins [18] is omitted.

Figure 2. Domain structure of cofactors B-E. Domain ranges and the fold recognition algorithms used for function assignment are shown under each domain.
The N-terminal part of cofactor B (Fig. 2 and Fig. 3A) was shown to be essential for its function [9]. A homologous domain is present in the C-terminus of cofactor E (see below), and also in several proteins that do not contain the CLIP-170 domain, including some that bind other cytoskeletal proteins. Here we show, using fold recognition algorithms, that this domain has a structure similar to ubiquitin. Some sequence patterns characteristic of ubiquitin are strongly conserved, for instance the lysine residues important for linkage in multi-ubiquitin chains [22]. Others are not; for instance, the domain lacks the carboxyl-terminal extension containing the exposed double-glycine motif essential for conjugation to target proteins [23]. The new domain can be classified in the broad ubiquitin-like (UBL) family, which consists of sequences from multi-domain proteins known to be involved in the proteasome or in other protein-protein interactions [23,24]. As this domain is crucial for cofactor B function, we hypothesize that it is involved in α-tubulin binding, possibly assisting the CAP-Gly domain in keeping tubulin in the folded state.
Cofactor C is composed of two domains. The N-terminal part has a strong coiled-coil sequence signature, but fold recognition programs narrow the prediction down to a significant similarity to the spectrin repeat. A spectrin repeat consists of three α-helices that fold to form a short triple α-helical unit [25-27]. Several variants of the spectrin domain have been identified and assigned to different functions [28-32]. Unfortunately, the cofactor C spectrin domain seems to be equally distant from all known spectrin variants, so no conclusion can be drawn regarding its function. The most intriguing problem is that cofactor C contains only a single spectrin-like domain (the length of the N-terminal fragment corresponds to the average length of a spectrin repeat), while all other known examples always contain many spectrin repeats forming specific super-structures [33]. This may indicate complex formation with other proteins containing similar domains.
The C-terminal domain of cofactor C shares significant similarity with a domain found in other cytoskeleton-related proteins. Using the FFAS algorithm, distantly homologous domains were detected in the distal part of adenylyl cyclase-associated protein (CAP), which is implicated in F-actin binding (13% sequence identity) [34], as well as in the proximal part of the RP2 protein (29% identity). Mutations in RP2 cause X-linked retinitis pigmentosa, a severe retinal degeneration that leads to loss of visual acuity and blindness [35]. Secondary structure predictions for this domain suggest an all-beta structure, but there is no consensus among the fold prediction programs, possibly indicating a novel fold.
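Identity values such as the 13% and 29% quoted above depend on how identical positions are normalized, and conventions differ between tools. The minimal Python sketch below (not the FFAS implementation) computes percent identity from a pair of already-aligned, gapped sequences under two common conventions. The function name `percent_identity`, the `-` gap character and the toy fragments are assumptions made for illustration only.

```python
def percent_identity(aln_a: str, aln_b: str, denominator: str = "aligned") -> float:
    """Percent identity between two equal-length, gapped aligned sequences.

    Hypothetical helper for illustration. Conventions:
      "aligned" -> identical columns / columns where neither sequence has a gap
      "shorter" -> identical columns / ungapped length of the shorter sequence
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    identical = 0
    aligned = 0
    for a, b in zip(aln_a.upper(), aln_b.upper()):
        if a == "-" or b == "-":
            continue  # gap columns are not counted as aligned residue pairs
        aligned += 1
        if a == b:
            identical += 1
    if denominator == "aligned":
        denom = aligned
    elif denominator == "shorter":
        denom = min(len(aln_a.replace("-", "")), len(aln_b.replace("-", "")))
    else:
        raise ValueError(f"unknown denominator: {denominator}")
    return 100.0 * identical / denom if denom else 0.0


# Toy aligned fragments (invented, not real cofactor sequences).
a = "MKV-LAGE"
b = "MRVQL--E"
print(percent_identity(a, b))             # normalized by aligned columns
print(percent_identity(a, b, "shorter"))  # normalized by the shorter sequence
```

For this toy pair, normalizing by aligned columns gives 80%, whereas normalizing by the shorter ungapped sequence gives about 66.7%, so a reported identity is only meaningful together with its definition.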
Cofactor D interacts directly with β-tubulin in vivo [18]. Recent sequence analysis revealed the presence of HEAT motifs in cofactor D [8]. The HEAT-repeat motif is characterized by pairs of antiparallel helices stacked in a consecutive array. The repeats are stacked in parallel [36] and are involved in protein-protein interactions [37]. Our analysis suggests that, in addition to the previously identified HEAT repeats, the N-terminal part of cofactor D contains several Armadillo repeats, which are evolutionarily related to HEAT repeats but differ by the presence of an additional short (two-turn) helix followed by two longer helices [36]. Armadillo repeats are also involved in mediating protein-protein interactions [36-38]. However, fold recognition programs were not able to determine whether the second domain belongs to the HEAT or the Armadillo family. Therefore, we can only classify this part of the protein as a member of the ARM repeat superfamily.
It is worth noting that analysis of the close homologue of cofactor D, the Alp1 protein from S. pombe, revealed a significant difference in the structures of the two proteins. The 615-843 region of the cofactor D sequence is not similar to the corresponding 571-766 region of Alp1. In the fission yeast protein, the HEAT motifs are interrupted by a coiled-coil region similar to the last coiled-coil domain of the S. cerevisiae kinesin-related protein Cin8p (region 898-1017). Cin8p is essential for yeast spindle function, opposing the force that draws separated poles back together [39-41]. Its coiled-coil region might be involved in dimerization, as it is in kinesin [42]. This difference may be important, since fission yeast Alp1 is able to bind [13], whereas mammalian cofactor D does not [43].

Figure 3. (A) Alignment of a representative subset of ubiquitin-like domain homologs.
In the super-complex, α-tubulin is directly bound to cofactor E [5]. The central 300 amino acids of cofactor E contain ten leucine-rich repeats (LRRs), which are usually involved in protein-protein interactions [44,45]. The positions most important for LRR function are conserved (forming the typical subfamily of LRRs), although a few of them seem to have been lost (Fig. 3B). The altered repeats roughly fit the RI-like consensus from animals. However, according to Kajava [44], different LRR subfamilies never occur together within one LRR protein, because the specific packing of repeats from one subfamily allows the formation of a specific hydrogen bond network between neighboring LRRs that could not be achieved with LRRs from different subfamilies (Fig. 3B).
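The conserved LRR positions discussed here are often summarized by the segment LxxLxLxxNxL, in which L marks a hydrophobic residue and N is most often Asn. As an illustration only, the Python sketch below scans a sequence for this simplified segment and reports the spacing between consecutive hits as a rough proxy for repeat length. The residue classes ([LIVMF] for L, [NCT] for N), the function names and the synthetic test sequence are assumptions; real subfamily assignment (for example, RI-like repeats of about 28-29 residues) requires the full repeat consensus and structural context.

```python
import re

# Simplified LRR core segment LxxLxLxxNxL (assumed residue classes).
# The lookahead lets overlapping candidate segments be reported.
LRR_CORE = re.compile(r"(?=([LIVMF]..[LIVMF].[LIVMF]..[NCT].[LIVMF]))")


def find_lrr_cores(seq: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched segment) for each simplified LRR core."""
    return [(m.start(), m.group(1)) for m in LRR_CORE.finditer(seq.upper())]


def repeat_spacings(hits: list[tuple[int, str]]) -> list[int]:
    """Distances between consecutive core starts, a rough repeat-length proxy."""
    starts = [start for start, _ in hits]
    return [b - a for a, b in zip(starts, starts[1:])]


# Synthetic sequence with two cores 28 residues apart (not a real protein).
demo = "LKKLDLSNNKL" + "G" * 17 + "LKELDLSGNKI" + "GG"
hits = find_lrr_cores(demo)
print(hits)
print(repeat_spacings(hits))
```

A spacing close to 28-29 residues would be consistent with RI-like repeats, but this heuristic cannot by itself distinguish LRR subfamilies.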
The two terminal domains of cofactor E are the same as in cofactor B, but in reverse order. The CAP-Gly domain is at the N-terminus (in cofactor B it is at the C-terminus) and, by simple analogy with cofactor B function, it is probably also involved in α-subunit binding, as suggested previously [8]. The ubiquitin-like domain of cofactor E is in the C-terminal part (region 453-526). As discussed above, the function of this domain is still unknown.
Cofactor A is the first component of the β-tubulin folding pathway. It is discussed here for completeness, since its structure is already known and it is not a subject of this analysis. It stabilizes the β-subunit and probably acts as a reservoir of tubulin folding intermediates for cofactor D [8]. It does not integrate into the super-complex but acts independently. Recently, the crystal structure of human cofactor A was described [46]; it consists of a three-α-helix bundle and interacts with β-tubulin via its helical regions.

Discussion
In the current study, we have shown the complete domain composition of the cofactors involved in the final folding and dimerization of tubulin. Our results narrow down the possible roles of previously uncharacterized regions of these proteins.
The domain resemblance between cofactors B and E reflects their similar function in the α-tubulin folding pathway, but the significance of the reverse order in which the homologous domains appear in the two proteins is unknown. The CAP-Gly domain seems to be involved in α-subunit binding, whereas the function of the ubiquitin-like domain is unclear, although it is most probably involved in tubulin binding. It is interesting to note that both cofactors E and D contain extensive segments with multiple domains involved in protein-protein interactions. The structure and length (1192 amino acids) of cofactor D predispose this protein to bridge all the functions of the super-complex by spatially integrating all subunits (Fig. 4) [8].
There are still other known features of the cofactor super-complex that cannot be explained by the analysis presented here, such as its GAP activity, manifested in stimulating the polymerization-independent hydrolysis of GTP by β-tubulin [19]. By a process of elimination (i.e. the presence of protein-protein interaction motifs only in cofactor D and the lack of any requirement for cofactor E in GAP activity), we can deduce that the GTPase-activating function possibly resides in cofactor C. A recent article by Bartolini and colleagues [47] presents evidence that the C-terminal part of cofactor C is indeed involved in this process. They also showed that mutation of an arginine residue, which they name the "arginine finger", causes loss of function in both cofactor C and RP2 proteins [47].
To summarize, we present a model for the interactions of the cofactors in the final complex (Fig. 4). The goal of the cofactor complex is to facilitate the interaction of the two tubulins, which therefore have to be brought together by cofactors D and E. After cofactor E binds α-tubulin through both its CAP-Gly and ubiquitin-like domains, it interacts with the repetitive region of cofactor D through its LRR region, bringing the tubulin monomers together. Once the complex has formed, cofactor C joins it by binding through its spectrin-like domain to cofactor D, or to both cofactors D and E. The C-terminal domain of cofactor C then stimulates the release of the tubulin dimer, which is ready to be incorporated into microtubule structures.
An interesting evolutionary association can be made by comparing the domains of proteins implicated in tubulin folding and in chromosome segregation. The CAP-Gly domain was found in CLIP-170/restin and in p150Glued from the dynactin complex, which is implicated as the dynein 'receptor' on organelles and kinetochores [14]. HEAT repeats were discovered in a microtubule-associated protein family important for mitotic spindle assembly and chromosome movement [37]. LRRs are present in yeast topoisomerase II, which is involved in chromosome segregation [48]. Finally, ubiquitin is needed to separate sister chromatids [49]. It seems that the same bricks are used to build different houses on the same (microtubular) foundation.

Conclusions
This analysis points to the importance of several newly recognized cofactor domains in the formation of the tubulin dimer. The discovery of ubiquitin-like domains (known to be essential for cofactor B function [9])

Figure 4. Model of domain interactions in the α/β-tubulin heterodimer folding pathway. The colors correspond to those in Figure 2A. It is important to emphasize that our hypothesis most probably applies to the human system. As suggested by experiments, in fission yeast the cofactors can act in a more linear manner, with the cofactor D homologue (Alp1) acting downstream of cofactors E and B [8].

Methods
Distant homology recognition
The PSI-BLAST [50], FFAS03 [51] and Pcons [51-57] distant homology and fold recognition methods were used to find possible structural templates for the tubulin cofactor sequences. In addition, the fold recognition Meta Server [58] (available at http://bioinfo.pl/) was applied to the cofactor C and D sequences. Table 1 summarizes the scores and templates used in this work. Domain boundaries were predicted by a combination of multiple alignment analysis, distant homology/fold recognition and modeling.
The best-scoring alignments from the FFAS03 [51] and Pcons [52] methods were used for model building. The FFAS03 and Pcons4 methods have consistently ranked among the best automated fold recognition methods in recent years, both in LiveBench [59] and in the CAFASP competition [60]. Structural models based on these alignments were obtained with the WHATIF [61] modeling program using the "slow" option (exhaustive side-chain packing optimization). Models were visualized with the MICE [62] servers, available at http://mice.sdsc.edu/.
The sequences of the ubiquitin family were collected from the PFAM domain database and subsequently supplemented with sequences from the SWISS-PROT database [63] with the help of literature sources. The resulting set was clustered at 40% sequence identity with the CD-HIT algorithm [64]. The multiple alignment was obtained with the CLUSTALW algorithm [65].
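The 40% redundancy filter can be illustrated with a greedy incremental scheme in the spirit of CD-HIT: sequences are processed from longest to shortest, and each one either joins the first existing representative it matches at or above the identity threshold or becomes a new representative. The Python sketch below is a simplified illustration, not CD-HIT itself, which uses short-word filtering and banded alignment for speed. The unit scoring scheme, the identity definition (identical aligned residues divided by the length of the shorter sequence) and the function names are assumptions.

```python
def global_identity(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1) -> float:
    """Needleman-Wunsch global alignment with linear gap penalty (assumed scores).

    Returns identical aligned residues / length of the shorter sequence.
    """
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    # Trace back one optimal path, counting identical aligned pairs.
    i, j, identical = n, m, 0
    while i > 0 and j > 0:
        pair = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + pair:
            identical += a[i - 1] == b[j - 1]
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    shorter = min(n, m)
    return identical / shorter if shorter else 0.0


def greedy_cluster(seqs: dict[str, str], threshold: float = 0.4) -> dict[str, list[str]]:
    """Greedy incremental clustering: longest sequences become representatives first."""
    order = sorted(seqs, key=lambda sid: len(seqs[sid]), reverse=True)
    clusters: dict[str, list[str]] = {}  # representative id -> member ids
    for sid in order:
        for rep in clusters:
            if global_identity(seqs[sid], seqs[rep]) >= threshold:
                clusters[rep].append(sid)
                break
        else:
            clusters[sid] = [sid]  # no representative is similar enough
    return clusters


# Toy input (invented sequences, not the ubiquitin-like set used in this work).
toy = {"s1": "MKVLAGEKLLT", "s2": "MKVLAGEKLLS", "s3": "WWPPHHCCDDY"}
print(greedy_cluster(toy, threshold=0.4))
```

Because every new sequence may be aligned against every representative, this sketch scales quadratically and is practical only for small sets; avoiding most of those full alignments is what CD-HIT's word-counting filter is designed for.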

Authors' contributions
MG conceived of the study, carried out the bioinformatics analysis, prepared the sequence alignment and drafted the manuscript. LJ prepared the structural models of the domains. AG provided general guidance in the project, coordinated the modeling and bioinformatics aspects of it and prepared the final version of the manuscript.