The relationship between the L1 and L2 domains of the insulin and epidermal growth factor receptors and leucine-rich repeat modules

Background Leucine-rich repeats are one of the more common modules found in proteins. The leucine-rich repeat consensus motif is LxxLxLxxNxLxxLxxLxxLxx- where the first 11–12 residues are highly conserved and the remainder of the repeat can vary in size. Leucine-rich repeat proteins have been subdivided into seven subfamilies, none of which includes members of the epidermal growth factor receptor or insulin receptor families, despite the similarity between the 3D structures of the L domains of the type I insulin-like growth factor receptor and those of some leucine-rich repeat proteins. Results Here we have used profile searches and multiple sequence alignments to identify the repeat motif Ixx-LxIxx-Nx-Lxx-Lxx-Lxx-Lxx- in the L1 and L2 domains of the insulin receptor and epidermal growth factor receptors. These analyses were aided by reference to the known three-dimensional structures of the insulin-like growth factor type I receptor L domains and two members of the leucine-rich repeat family, porcine ribonuclease inhibitor and internalin 1B. Pectate lyase, another beta helix protein, can also be seen to contain the sequence motif and many of the structural features characteristic of leucine-rich repeat proteins, despite the existence of major insertions in some of its repeats. Conclusion Multiple sequence alignments and comparisons of the 3D structures have shown that right-handed beta helix proteins such as pectate lyase and the L domains of members of the insulin receptor and epidermal growth factor receptor families are members of the leucine-rich repeat superfamily.

There is some evidence to suggest that the L domains of the IR and EGFR families are leucine-rich repeats. At 10-16%, leucine is the most common residue in these domains. Furthermore, the 3D structure of the L1/cys-rich/L2 fragment of the IGF-1R showed that the L domains were single-stranded right-handed β-helices [8] with structural similarities to pectate lyase, a right-handed beta helix protein [11,12] and the ribonuclease inhibitor, a right-handed beta-alpha superhelix protein [13]. Ribonuclease inhibitor (RI) is recognised as a member of the superfamily of leucine-rich repeat proteins [14][15][16] while pectate lyase is not, although similarities in the sequence patterns and 3D structures of pectate lyase and RI have been noted [15][16][17]. The IGF-1R is listed as a leucine-rich repeat protein in the SCOP database [http://scop.mrclmb.cam.ac.uk/scop/] but not in any of the annotated sequence databases such as SwissProt [http://srs.ebi.ac.uk/] or SMART [http://smart.embl-heidelberg.de/index.shtml] [18]. Similarly, none of the other L-domain containing proteins from the IR or EGFR families are listed as leucine-rich repeats in these databases or in a recent summary of the complete protein tyrosine kinase family present in the human genome [19].
The superfamily of leucine-rich repeat proteins has been subdivided into six subfamilies termed: typical, RI-like, CC (cysteine-containing), PS (plant specific), SD22-like and bacterial [16]. These subfamilies are characterised by different lengths and consensus sequences of the repeats (Fig. 1). The bulk of the LRRs have repeats of 22-25 amino acid residues while RI, with its alternating repeats of 28 to 29 residues, is considered somewhat atypical [20]. The family has been expanded further to include the small proteoglycans, which were shown to consist of different combinations of two types of LRRs of 21 (S-type) and 26 (T-type) amino acid residues [21]. The LRR consensus sequence is LxxLxLxxNx-Lxx-Lxx-Lxx-Lxx- (Fig. 1) where the first 11-12 residues are highly conserved and the remainder of the repeat can vary in size [16,17,21]. Some repeats have C instead of N at the 4th highly conserved position and I, V, M, F, Y, A or C at the positions denoted by L in the above consensus (Fig. 1).
In view of this variation in sequence motifs among LRR proteins, the sequences of the L1 and L2 domains of members of the IR and EGFR families were re-examined. The LRR motif is difficult to detect when examining a single sequence, but becomes more readily recognisable when multiple sequence alignments are analysed. The identification of such conserved sequence motifs was greatly aided by the availability of the 3D structures of the IGF-1R L1 and L2 domains [8], pectate lyase [11,12] and the known LRR proteins RI [13] and internalin 1B [22]. The data indicate that pectate lyase and the L domains of members of the IR and EGFR families should be included in the expanding family of LRR proteins.

Results
Preliminary analyses of the SwissProt database using profiles based on alignments of single (Prf-1) or double LRR repeats (Prf-2) failed to score members of the IR or EGFR families in the first 500 ranked scores. However, searches with Prf-4 (based on four tandem repeats) ranked IR family members at positions 234 (INSR_RAT), 245 (INSR_MOUSE), 262 (IG1R_RAT) and 478 (IG1R_HUMAN) and EGFR family members at 146 (LT23_CAEEL), 174 (EGFR_MOUSE), 448 (EGFR_HUMAN) and 457 (EGFR_CHICK). The alignments with the mouse, human and chick EGFRs were to residues equivalent to 363-452 in the L2 domain of EGFR_HUMAN. In contrast, the alignments with the other sequences were to regions in the cytoplasmic domain. When the analyses were restricted to the receptor ectodomains, the alignments were to sequences in the L1 or L2 domains. Examples are: Prf-1 with residues 1-28, Prf-2 with residues 10-78 and Prf-4 with residues 46-153 in the human insulin receptor. An alternative approach, using a profile generated from the alignment of the L1 domains of 13 members of the IR family (see Fig. 2), was also encouraging as it ranked four known LRR proteins in the next 11 hits after the 26 known members of the IR or EGFR families. These were repeats 8-14 (residues 295-450) of ESA8_TRYEQ (ranked 28th) and ESA8_TRYBB (ranked 30th), and repeats 1-5 (residues 28-177) of TSHR_MOUSE (ranked 35th) and TSHR_RAT (ranked 37th).
However, in both approaches, the repeating units in the L1 and L2 domains, known from the 3D structure of IGF-1R [8], did not align exactly with the repeating units in the LRR proteins [20,23] or the LRR protein profiles (see Methods). The presence of insertions in some regions of the IR or EGFR proteins tended to confound the profile alignments. Consequently, the nature of the repeating sequence motifs in the L1 and L2 domains of the IR and EGFR families was examined manually, using the 3D structure of the L1 and L2 domains of IGF-1R as a guide to adjust the alignments obtained from the profile analyses.
The consensus sequences for the six LRR protein subfamilies [16] are shown in Fig. 1. The sizes of the repeating units vary from 22-31 with eight highly conserved positions, numbered 1 to 8 in Fig 1. Conserved position 4 usually contains N or C but sometimes S, T, P, Q or A. The other 7 highly conserved positions are generally L but frequently contain alternative non-polar residues such as I, V, M, C, A, F, or Y, particularly in some LRR subfamilies. Also included are the type S and type T LRRs from the small proteoglycans [21]. The repeating motifs have been arranged in Fig. 1 so that they all begin with the 11-residue stretch LxxLxLxxNxL which contains the highly conserved positions 1 to 5. This motif corresponds to a linear central sequence with two half turns at either end [20]. Deletions and/or insertions, particularly in the C-terminal portion of the repeats (after conserved positions 5, 6, 7 and 8) account for the size differences seen between different LRR protein members.
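The 11-residue core LxxLxLxxNxL described above can be expressed as a pattern for a quick first-pass scan of a single sequence. The following is a minimal sketch, not the method used in this study: the function name `find_lrr_cores`, the choice of allowed residues (L, I, V, M, C, A, F, Y at positions 1-3 and 5; only N or C at position 4) and the test peptide are all assumptions made for illustration. As noted in the Background, the motif is hard to detect in a single sequence, so a scan of this kind is expected to give both false positives and false negatives.

```python
import re

# Non-polar residues the text lists as alternatives to L at conserved
# positions 1-3 and 5 (assumption: all are treated as equally acceptable).
HYDROPHOBIC = "LIVMCAFY"
# Conserved position 4: usually N or C. The rarer S, T, P, Q and A
# mentioned in the text are deliberately left out of this sketch.
POS4 = "NC"

# LxxLxLxxNxL: conserved positions 1-5 sit at offsets 0, 3, 5, 8 and 10.
# The pattern is wrapped in a lookahead so overlapping candidates are reported.
LRR_CORE = re.compile(
    rf"(?=([{HYDROPHOBIC}]..[{HYDROPHOBIC}].[{HYDROPHOBIC}]..[{POS4}].[{HYDROPHOBIC}]))"
)


def find_lrr_cores(sequence):
    """Return (0-based start, 11-residue match) for every candidate core."""
    return [(m.start(), m.group(1)) for m in LRR_CORE.finditer(sequence.upper())]


if __name__ == "__main__":
    # Hypothetical peptide, not taken from any real protein.
    print(find_lrr_cores("GGLKELDLSGNKLGG"))
```

Widening `POS4` or adding optional gaps after position 5 would bring the pattern closer to the subfamily variants in Fig. 1, at the cost of more spurious matches.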
The repeat sequences in members of the IR family that correspond to the six turns of the parallel beta helix of the L1 and L2 domains of IGF-1R [8] are shown in Fig. 2. The corresponding predicted repeats for members of the EGFR family are shown in Fig. 3. LRR-like motifs are most readily identified in the 4th and 5th repeats in the IR family L1 and EGFR family L2 domains and in the 3rd, 4th and 5th repeats of the IR family L2 and EGFR family L1 domains. The size of these repeats (22-30 residues) is similar to the range found in other LRRs (Fig. 1). The 4th repeat in the L1 and L2 domains of the IR family contains an insert of 7-8 residues between conserved positions 1 and 2 (Fig. 2), which appears as an extra loop in the 3D structure [8]. The 4th repeats of the majority of the EGFR L1 and L2 domains show similar inserts of 6-7 residues in this region, with the EGFR from Schistosoma mansoni (SMEGFRA) and Drosophila melanogaster (TOP_DROME) having inserts of 11 and 13 residues respectively (Fig. 3). Variable sized insertions are also seen in some (or all) of the 1st and 2nd repeats of the IR and EGFR L1 domains and the 1st, 2nd and 6th repeats of the IR and EGFR family L2 domains.
The nature of the residues in the eight highly conserved positions of the LRR-like repeats in the L domains of IR and EGFR is summarised in Table 1. The most commonly found residues in the L domains of the IR and EGFR families are: isoleucine at positions 1 and 3; asparagine at position 4; and leucine at positions 2, 5, 6, 7 and 8. However, other non-polar residues, particularly valine, phenylalanine and cysteine, occur in positions 1-3 and 5-8 in some sequences (Figs 2 & 3). The most common alternatives to asparagine at the 4th conserved position are valine, methionine, serine and tryptophan (Table 1).

Figure 1
Comparison of repeat motifs in IR and EGFR family L domains with the sequence motifs of LRR subfamilies. The motifs for LRR_Typical, LRR_PS type, LRR_bacterial type, LRR_SD22-like, LRR_CC type, LRR_R1 type-a, LRR_R1 type-b are based on [16], the motifs for the S and T type repeats of small proteoglycans (LRR_PGC) are from [21]. The size of the repeat can vary by insertions at the positions denoted by the dashes [16]. Also shown is a consensus motif for the pectate lyase superfamily based on the alignments in Fig. 5 and [26].

Figure 3
Leucine-rich repeats in the L1 and L2 domains of members of the epidermal growth factor receptor family. The sequences were sourced from SwissProt except for SMEGFRA which is from EMBL and CVULET23 which is from GENBANK. Residues equivalent to conserved positions 1 to 8 in the sequence motif LxxLxLxxNxLxxLxxLxxLxx are shaded.
The locations of the eight conserved residues in the 3D structure of IGF-1R L1 domain have been compared with the location of the equivalent residues in the 3D structures of the known LRR proteins porcine RI and internalin 1B. As shown in Fig. 4, the 3D structures of the first part of the repeat (LxxLxLxxNx) are very similar in each of these proteins (β-strand flanked by two half turns), while the structures of the remainder of the repeats are more variable. This is despite the size variability of the sequence patterns seen in some of the repeats in the L domains of the IGF-1R and the more frequent occurrence of non-L residues at highly conserved positions 1-3 and 5-8.
The pectate lyase family [24,25] shows even greater variability, with large insertions in some repeats corresponding to loops overlaying the parallel beta helix core (see Fig. 4). As for other LRRs, the most conserved residues include positions 2-4, being two aliphatic side chains and the Asn ladder (Fig. 5). However, at each of the 8 highly conserved positions, isoleucine or valine is preferred over leucine [26]. In positions 2 and 3 the frequency of I or V is greater than 60%. The feature that sets the pectate lyase repeat apart from that of other LRR families is an aromatic residue in position 5, which extends the third β-strand and forms a bulge (top left in Figure 4). Residues at positions 7 and 8 slide over to spatially overlap with positions 6 and 7 in other LRRs.
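Residue preferences such as "the frequency of I or V is greater than 60%" can be tallied directly once the residues at the conserved positions have been extracted from an alignment. The sketch below assumes each repeat has already been reduced to a string with one character per conserved position; the function names and the four toy repeats are hypothetical and are not taken from Fig. 5 or Table 1.

```python
from collections import Counter


def position_frequencies(repeats):
    """Per-position residue frequencies.

    `repeats` holds equal-length strings, one per repeat, with one
    character per conserved position (position 1 is index 0).
    """
    if not repeats:
        raise ValueError("at least one repeat is required")
    width = len(repeats[0])
    if any(len(r) != width for r in repeats):
        raise ValueError("all repeats must list the same number of positions")
    n = len(repeats)
    freqs = []
    for i in range(width):
        counts = Counter(r[i] for r in repeats)
        freqs.append({res: c / n for res, c in counts.items()})
    return freqs


def fraction_in(freqs_at_position, residues):
    """Combined frequency of a set of residues at one position."""
    return sum(freqs_at_position.get(r, 0.0) for r in residues)


if __name__ == "__main__":
    # Hypothetical residues at conserved positions 1-8 of four repeats.
    toy_repeats = ["VIINFLVL", "LVVNYIIV", "IIVNFVLI", "ALINWLVV"]
    freqs = position_frequencies(toy_repeats)
    for pos, f in enumerate(freqs, start=1):
        print(pos, round(fraction_in(f, "IV"), 2))
```

Keeping the per-position dictionaries, rather than only the I/V fraction, makes it straightforward to ask other questions of the same table, for example how often an aromatic residue occupies position 5.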

Discussion
There is considerable interest in the structure of the L domains of the IR and EGFR families and their relationship with other proteins, because of their importance in ligand binding (see [8,10,27,28]). In this paper, evidence is presented to show that these L domains contain all of the features of leucine-rich repeats. Multiple sequence alignments, coupled with the 3D structure of the L1 and L2 domains of the IGF-1R [8], enabled the residues equivalent to the conserved residues in known LRR motifs to be identified, as summarised in Figs 2, 3 and 4.
A variant of the motif LxxLxLxxNx-Lxx-Lxx-Lxx-Lxx- found in LRR proteins can be identified in the L1 and L2 domains of the IR and EGFR families, where I rather than L is the most common residue at positions 1 and 3.
Other non-polar amino acids frequently occur in some positions (Table 1, Figs 2 & 3), as found with other LRR proteins [16,17,21]. The L domains of the IR and EGFR families contain five full repeats with the 6th partially truncated. Some sequences have insertions between conserved positions 1 and 2 and between positions 4 and 5 (Figs 2 & 3), which complicate the analysis. However, the combination of examining multiple sequence alignments with the known 3D structure of the IGF-1R allowed the sequence motifs to be established.

Calculated from conserved positions indicated in Figs 2 & 3.

Figure 5
Structural repeats in the sequences of Erwinia chrysanthemi pectate lyase C (PEC, [11,12]), Bacillus subtilis pectate lyase (BN8, [33]) and Aspergillus niger pectin lyase B (QCX, [34]). Alignment is based on a superposition of the three-dimensional structures, with residues in the sequence motif shaded. Charged residues (boxed) also form part of the motif but have either a compensating buried charge or are solvent exposed.
Leucine-rich repeat proteins are members of a broader class of proteins termed solenoid proteins, where the repeating structural units in the polypeptide chain form a continuous superhelix [17]. Solenoid proteins, including LRRs, show the simplest relationship between sequence and structure, compared to the more complicated folds of globular proteins. Thus recognition of such motifs can provide valuable insights into the predicted structure of such protein domains [17]. The most conserved structural feature of LRRs is the LxxLxLxxNx region, while the remainder of the repeat can differ dramatically [15,17] as illustrated in Fig. 5. The central sequence xLxLx in this region of the repeat forms a β-strand, with successive repeats of this β-strand forming a parallel β-sheet on one side of the LRR module. This corresponds to the second β-sheet in the L domains of the IGF-1R, which is the structural counterpart of the β-sheet that forms the inner face of RI and internalin (Fig. 4). This face is involved in protein-protein interactions in RI [15], U2 [29] and the IR family (see [8,10]).
As shown in Fig. 4, the first region of the repeat (LxxLxLxxNx) adopts a similar structural fold in RI, internalin, the IGF-1R L domains and pectate lyase, while the remainder of the repeats are highly variable. Most of the repeats in the IR and EGFR L domains are between 21 and 30 residues, within the range commonly found in other LRRs. The inserts are accommodated as loops which do not, or are unlikely to, perturb the core structure [8]. Three of the repeats in pectate lyase have large inserts of 11-17 residues between the 6th and 7th conserved positions (Fig. 5). The major difference between the 3D structures of RI and internalin versus the L domains of IGF-1R is the absence of the repetitive helix on the opposite face to the canonical β-sheet in the IGF-1R L domains, and thus a lack of curvature, although the L domains of IGF-1R are capped by α-helices at the N-terminal end and to a lesser extent at the C-terminus [8].
The existence of repeats in the L domains of IR and EGFR was first reported by Bajaj et al. [30] who described five repeats in the region equivalent to 1-119 in human IR. The subsequent 3D structure determination of the two L domains in the IGF-1R showed that they each contain five full and one partial repeat [8].

Conclusion
Here we have shown, using a combination of sequence analyses and 3D structure comparisons, that variations of the repeating motif typical of LRRs are present in the L domains of members of the IR and EGFR subfamilies and in β-helix proteins. This motif is not obvious, is difficult to detect with sequence analysis programs and has not been described previously. Comparison of the 3D structure of these domains with other protein structures showed that L domains matched equally well to the pectate lyase family and LRRs such as porcine ribonuclease inhibitor. We conclude that these three groups should be considered part of the same LRR superfamily. In the IR and EGFR subfamilies, isoleucine (or valine) is preferred over leucine at some positions of the repeat, while in β-helix proteins isoleucine or valine (or occasionally phenylalanine) are always preferred over leucine.

Multiple Sequence Analysis
The sequence analysis programs used were from the sequence analysis software package of the Genetics Computer Group of the University of Wisconsin, Biotechnology Centre, Madison, Wisconsin, USA. Files of individual proteins were edited using the Seqed program and aligned using Pileup. Final adjustments to these alignments were made manually as required using Lineup. Profiles [31,32] were generated from these aligned sequences using ProfileMake. The SwissProt database was probed using ProfileSearch and the alignments displayed with ProfileSegments. Specific sequences were analysed using Gap or ProfileGap. The Gap weight and Length weight penalties used were 3.0 and 0.3 respectively unless stated otherwise.
The proteins used to generate these profiles were the chondroadherin precursor (CHAD_BOVIN, 10 LRRs), platelet glycoprotein 1B alpha chain precursor (GPBA_MOUSE, 8 LRRs), the bone proteoglycan 2 precursor (PGS2_RABIT, 10 LRRs) and the putative receptor protein tyrosine kinase from Arabidopsis (TMK1_ARATH, 8 LRRs). Three profiles were generated: Prf-1, based on the alignment of the 38 single LRRs in these four proteins; Prf-2, based on the alignment of the 19 tandem repeats of two LRRs from these four proteins; and Prf-4, which was generated from the alignment of eight sequences, each containing four LRRs in tandem. The fragments from CHAD_BOVIN and PGS2_RABIT used in Prf-4 corresponded to repeats 1-4 and 7-10.
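The general idea of building a profile from aligned repeats and sliding it along a target sequence can be illustrated with a much simpler scheme than the one used here. The sketch below is not the GCG ProfileMake/ProfileSearch algorithm: it assumes ungapped, equal-length repeats, a uniform background frequency and a fixed pseudocount, and it ignores the gap and length weights quoted above. The function names, the three aligned repeats and the target sequence are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(aligned, pseudocount=1.0):
    """Log-odds score per residue per column from ungapped aligned repeats.

    Assumptions of this sketch: uniform background (1/20 per residue)
    and the same pseudocount added to every residue in every column.
    """
    width = len(aligned[0])
    if any(len(s) != width for s in aligned):
        raise ValueError("aligned sequences must have equal length")
    background = 1.0 / len(AMINO_ACIDS)
    total = len(aligned) + pseudocount * len(AMINO_ACIDS)
    profile = []
    for i in range(width):
        column = [s[i] for s in aligned]
        profile.append({
            aa: math.log(((column.count(aa) + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return profile


def best_window(profile, sequence):
    """Slide the profile along the sequence without gaps.

    Returns (best score, 0-based start). Residues outside the 20
    standard amino acids score 0. Ties go to the later window.
    """
    width = len(profile)
    if len(sequence) < width:
        raise ValueError("sequence is shorter than the profile")
    scores = [
        (sum(col.get(res, 0.0) for col, res in zip(profile, sequence[i:i + width])), i)
        for i in range(len(sequence) - width + 1)
    ]
    return max(scores)


if __name__ == "__main__":
    # Hypothetical 11-residue repeat cores, not taken from the four proteins above.
    repeats = ["LKELDLSGNKL", "LRTLNLSNNQI", "LEVLDLSHNRL"]
    profile = build_profile(repeats)
    print(best_window(profile, "GGGGLKELDLSGNKLGGGG"))
```

Because the windows are scored without gaps, an insertion inside a repeat shifts the conserved positions out of register. This is one way to picture why insertions in the IR and EGFR repeats confounded the profile alignments and made manual adjustment against the 3D structure necessary.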