- Research article
The relationship between the L1 and L2 domains of the insulin and epidermal growth factor receptors and leucine-rich repeat modules
BMC Bioinformatics volume 2, Article number: 4 (2001)
Leucine-rich repeats are one of the more common modules found in proteins. The leucine-rich repeat consensus motif is LxxLxLxxNxLxxLxxLxxLxx-, where the first 11–12 residues are highly conserved and the remainder of the repeat can vary in size. Leucine-rich repeat proteins have been subdivided into seven subfamilies, none of which includes members of the epidermal growth factor receptor or insulin receptor families, despite the similarity between the 3D structure of the L domains of the type I insulin-like growth factor receptor and that of some leucine-rich repeat proteins.
Here we have used profile searches and multiple sequence alignments to identify the repeat motif Ixx-LxIxx-Nx-Lxx-Lxx-Lxx-Lxx- in the L1 and L2 domains of the insulin receptor and epidermal growth factor receptors. These analyses were aided by reference to the known three-dimensional structures of the insulin-like growth factor type I receptor L domains and two members of the leucine-rich repeat family, porcine ribonuclease inhibitor and internalin 1B. Pectate lyase, another beta helix protein, can also be seen to contain the sequence motif and many of the structural features characteristic of leucine-rich repeat proteins, despite the existence of major insertions in some of its repeats.
Multiple sequence alignments and comparisons of the 3D structures have shown that right-handed beta helix proteins such as pectate lyase, and the L domains of members of the insulin receptor and epidermal growth factor receptor families, are members of the leucine-rich repeat superfamily.
Many proteins have a modular architecture and are composed of a number of different, sometimes repeated structural units [1, 2]. The four most common modules found in the extracellular regions of proteins are immunoglobulin (Ig) domains, epidermal growth factor (EGF)-like repeats, fibronectin type 3 (Fn3) modules and leucine-rich repeats. Two of these, Fn3 modules [3–6] and EGF-like repeats [7–10], have been identified in members of the insulin receptor (IR) family.
There is some evidence to suggest that the L domains of the IR and EGFR families are leucine-rich repeats. At 10–16%, leucine is the most common residue in these domains. Furthermore, the 3D structure of the L1/cys-rich/L2 fragment of the IGF-1R showed that the L domains were single-stranded right-handed β-helices with structural similarities to pectate lyase, a right-handed beta helix protein [11, 12], and the ribonuclease inhibitor, a right-handed beta-alpha superhelix protein. Ribonuclease inhibitor (RI) is recognised as a member of the superfamily of leucine-rich repeat proteins [14–16] while pectate lyase is not, although similarities in the sequence patterns and 3D structures of pectate lyase and RI have been noted [15–17]. The IGF-1R is listed as a leucine-rich repeat protein in the SCOP database (http://scop.mrc-lmb.cam.ac.uk/scop/) but not in any of the annotated sequence databases such as SwissProt (http://srs.ebi.ac.uk/) or SMART (http://smart.embl-heidelberg.de/index.shtml). Similarly, none of the other L-domain-containing proteins from the IR or EGFR families are listed as leucine-rich repeats in these databases or in a recent summary of the complete protein tyrosine kinase family present in the human genome.
The superfamily of leucine-rich repeat proteins has been subdivided into six subfamilies termed: typical, RI-like, CC (cysteine-containing), PS (plant specific), SD22-like and bacterial. These subfamilies are characterised by different lengths and consensus sequences of the repeats (Fig. 1). The bulk of the LRRs have repeats of 22–25 amino acid residues while RI, with its alternating repeats of 28 to 29 residues, is considered somewhat atypical. The family has been expanded further to include the small proteoglycans, which were shown to consist of different combinations of two types of LRRs of 21 (S-type) and 26 (T-type) amino acid residues. The LRR consensus sequence is LxxLxLxxNx-Lxx-Lxx-Lxx-Lxx- (Fig. 1), where the first 11–12 residues are highly conserved and the remainder of the repeat can vary in size [16, 17, 21]. Some repeats have C instead of N at the 4th highly conserved position and I, V, M, F, Y, A or C at the positions denoted by L in the above consensus (Fig. 1).
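The consensus and its permitted substitutions can be written as a search pattern. The following is a minimal sketch in Python, assuming only the 11-residue conserved core LxxLxLxxNxL, with the substitutions listed above (I, V, M, F, Y, A or C for L; C for N). The function name `find_lrr_cores` and the example sequences are hypothetical illustrations and are not part of the analysis reported here, which relied on profiles and manual alignment rather than pattern matching.

```python
import re

# Residues listed in the text as acceptable at positions written "L".
HYDROPHOBIC = "LIVMFYAC"
# The conserved "N" position may carry C in some repeats.
ASN_LIKE = "NC"

# Core stretch LxxLxLxxNxL; "." stands for any residue (x).
# The lookahead lets overlapping candidate cores be reported.
CORE = re.compile(
    f"(?=([{HYDROPHOBIC}]..[{HYDROPHOBIC}].[{HYDROPHOBIC}]"
    f"..[{ASN_LIKE}].[{HYDROPHOBIC}]))"
)

def find_lrr_cores(sequence):
    """Return (start, segment) for every candidate core, zero-based."""
    return [(m.start(), m.group(1)) for m in CORE.finditer(sequence.upper())]

# Hypothetical toy sequence: one core flanked by glycines.
print(find_lrr_cores("GGGLKELDLSNNQLGGG"))
```

A pattern this permissive will match many unrelated sequences, which is consistent with the observation that the motif is hard to recognise in a single sequence and becomes clearer in multiple alignments.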
In view of this variation in sequence motifs among LRR proteins, the sequences of the L1 and L2 domains of members of the IR and EGFR families were re-examined. The LRR motif is difficult to detect when examining a single sequence, but becomes more readily recognisable when multiple sequence alignments are analysed. The identification of such conserved sequence motifs was greatly aided by the availability of the 3D structures of the IGF-1R L1 and L2 domains, pectate lyase [11, 12] and the known LRR proteins RI and internalin 1B. The data indicate that pectate lyase and the L domains of members of the IR and EGFR families should be included in the expanding family of LRR proteins.
Preliminary analyses of the SwissProt database using profiles based on alignments of single (Prf-1) or double LRR repeats (Prf-2) failed to score members of the IR or EGFR families in the first 500 ranked scores. However, searches with Prf-4 (based on four tandem repeats) ranked IR family members at positions 234 (INSR_RAT), 245 (INSR_MOUSE), 262 (IG1R_RAT) and 478 (IG1R_HUMAN) and EGFR family members at 146 (LT23_CAEEL), 174 (EGFR_MOUSE), 448 (EGFR_HUMAN) and 457 (EGFR_CHICK). The alignments with the mouse, human and chick EGFRs were to residues equivalent to 363–452 in the L2 domain of EGFR_HUMAN. In contrast, the alignments with the other sequences were to regions in the cytoplasmic domain. When the analyses were restricted to the receptor ectodomains, the alignments were to sequences in the L1 or L2 domains. Examples are: Prf-1 with residues 1–28, Prf-2 with residues 10–78 and Prf-4 with residues 46–153 in the human insulin receptor. An alternative approach, using a profile generated from the alignment of the L1 domains of 13 members of the IR family (see Fig. 2), was also encouraging as it ranked four known LRR proteins in the next 11 hits after the 26 known members of the IR or EGFR families. These were repeats 8–14 (residues 295–450) of ESA8_TRYEQ (ranked 28th) and ESA8_TRYBB (ranked 30th), and repeats 1–5 (residues 28–177) of TSHR_MOUSE (ranked 35th) and TSHR_RAT (ranked 37th).
However, in both approaches, the repeating units in the L1 and L2 domains, known from the 3D structure of IGF-1R, did not align exactly with the repeating units in the LRR proteins [20, 23] or the LRR protein profiles (see Methods). The presence of insertions in some regions of the IR or EGFR proteins tended to confound the profile alignments. Consequently, the nature of the repeating sequence motifs in the L1 and L2 domains of the IR and EGFR families was examined manually, using the 3D structure of the L1 and L2 domains of IGF-1R as a guide to adjust the alignments obtained from the profile analyses.
The consensus sequences for the six LRR protein subfamilies are shown in Fig. 1. The sizes of the repeating units vary from 22 to 31 residues, with eight highly conserved positions, numbered 1 to 8 in Fig. 1. Conserved position 4 usually contains N or C but sometimes S, T, P, Q or A. The other 7 highly conserved positions are generally L but frequently contain alternative non-polar residues such as I, V, M, C, A, F, or Y, particularly in some LRR subfamilies. Also included are the type S and type T LRRs from the small proteoglycans. The repeating motifs have been arranged in Fig. 1 so that they all begin with the 11-residue stretch LxxLxLxxNxL, which contains the highly conserved positions 1 to 5. This motif corresponds to a linear central sequence with two half turns at either end. Deletions and/or insertions, particularly in the C-terminal portion of the repeats (after conserved positions 5, 6, 7 and 8), account for the size differences seen between different LRR protein members.
The repeat sequences in members of the IR family that correspond to the six turns of the parallel beta helix of the L1 and L2 domains of IGF-1R are shown in Fig. 2. The corresponding predicted repeats for members of the EGFR family are shown in Fig. 3. LRR-like motifs are most readily identified in the 4th and 5th repeats in the IR family L1 and EGFR family L2 domains and in the 3rd, 4th and 5th repeats of the IR family L2 and EGFR family L1 domains. The size of these repeats (22–30 residues) is similar to the range found in other LRRs (Fig. 1). The 4th repeat in the L1 and L2 domains of the IR family contains an insert of 7–8 residues between conserved positions 1 and 2 (Fig. 2), which appears as an extra loop in the 3D structure. The 4th repeats of the majority of the EGFR L1 and L2 domains show similar inserts of 6–7 residues in this region, with the EGFR from Schistosoma mansoni (SMEGFRA) and Drosophila melanogaster (TOP_DROME) having inserts of 11 and 13 residues respectively (Fig. 3). Variable sized insertions are also seen in some (or all) of the 1st and 2nd repeats of the IR and EGFR L1 domains and the 1st, 2nd and 6th repeats of the IR and EGFR family L2 domains.
The nature of the residues in the eight highly conserved positions of the LRR-like repeats in the L domains of IR and EGFR is summarised in Table 1. The most commonly found residues in the L domains of the IR and EGFR families are: isoleucine at positions 1 and 3; asparagine at position 4; and leucine at positions 2, 5, 6, 7 and 8. However, other non-polar residues, particularly valine, phenylalanine and cysteine, occur in positions 1–3 and 5–8 in some sequences (Figs 2 & 3). The most common alternatives to asparagine at the 4th conserved position are valine, methionine, serine and tryptophan (Table 1).
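A tally of this kind can be sketched in a few lines of Python. The sketch assumes that the repeats have already been aligned so that the eight conserved positions fall in fixed columns; the column indices below are read off the ungapped consensus LxxLxLxxNxLxxLxxLxxLxx and would need adjusting for real alignments with insertions. The three toy repeats are invented for illustration and are not taken from Figs 2 or 3 or from Table 1.

```python
from collections import Counter

# Zero-based columns of conserved positions 1-8 in the ungapped consensus
# LxxLxLxxNxLxxLxxLxxLxx (an assumption; real alignments contain insertions).
CONSERVED_COLUMNS = (0, 3, 5, 8, 10, 13, 16, 19)

def tally_conserved(aligned_repeats, columns=CONSERVED_COLUMNS):
    """Count the residues seen at each conserved column, skipping gaps."""
    tallies = [Counter() for _ in columns]
    for repeat in aligned_repeats:
        for tally, col in zip(tallies, columns):
            if col < len(repeat) and repeat[col] != "-":
                tally[repeat[col]] += 1
    return tallies

def most_common_residues(aligned_repeats):
    """Return the most frequent residue per conserved position (None if empty)."""
    return [
        tally.most_common(1)[0][0] if tally else None
        for tally in tally_conserved(aligned_repeats)
    ]

# Hypothetical toy repeats; "G" is used as filler at the variable positions.
toy_repeats = [
    "IGGLGIGGNGLGGLGGLGGLGG",
    "IGGLGIGGNGLGGLGGLGGLGG",
    "VGGLGVGGSGLGGFGGLGGLGG",
]
print(most_common_residues(toy_repeats))
```

Returning the full `Counter` objects from `tally_conserved` keeps the minority residues visible, which matters here because the alternatives at each position are part of the result.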
The locations of the eight conserved residues in the 3D structure of IGF-1R L1 domain have been compared with the location of the equivalent residues in the 3D structures of the known LRR proteins porcine RI and internalin 1B. As shown in Fig. 4, the 3D structures of the first part of the repeat (LxxLxLxxNx) are very similar in each of these proteins (β-strand flanked by two half turns), while the structures of the remainder of the repeats are more variable. This is despite the size variability of the sequence patterns seen in some of the repeats in the L domains of the IGF-1R and the more frequent occurrence of non-L residues at highly conserved positions 1–3 and 5–8.
The pectate lyase family [24, 25] shows even greater variability, with large insertions in some repeats corresponding to loops overlaying the parallel beta helix core (see Fig. 4). As for other LRRs, the most conserved residues include positions 2–4, being two aliphatic side chains and the Asn ladder (Fig. 5). However, at each of the 8 highly conserved positions, isoleucine or valine is preferred over leucine. In positions 2 and 3 the frequency of I or V is greater than 60%. The features which set apart the pectate lyase repeat from that of other LRR families are an aromatic residue in position 5, extending the third β-strand and forming a bulge (top left in Fig. 4). Residues at positions 7 and 8 slide over to spatially overlap with positions 6 and 7 in other LRRs.
There is considerable interest in the structure of the L domains of the IR and EGFR families and their relationship with other proteins, because of their importance in ligand binding (see [8, 10, 27, 28]). In this paper, evidence is presented to show that these L domains contain all of the features of leucine-rich repeats. Multiple sequence alignments, coupled with the 3D structure of the L1 and L2 domains of the IGF-1R, enabled the residues equivalent to the conserved residues in known LRR motifs to be identified, as summarised in Figs 2, 3 and 4.
A variant of the motif LxxLxLxxNx-Lxx-Lxx-Lxx-Lxx- found in LRR proteins can be identified in the L1 and L2 domains of the IR and EGFR families, where I rather than L is the most common residue at positions 1 and 3. Other non-polar amino acids frequently occur in some positions (Table 1, Figs 2 & 3), as found with other LRR proteins [16, 17, 21]. The L domains of the IR and EGFR families contain five full repeats with the 6th partially truncated. Some sequences have insertions between conserved positions 1 and 2 and between positions 4 and 5 (Figs 2 & 3), which complicate the analysis. However, the combination of examining multiple sequence alignments with the known 3D structure of the IGF-1R allowed the sequence motifs to be established.
Leucine-rich repeat proteins are members of a broader class of proteins termed solenoid proteins, where the repeating structural units in the polypeptide chain form a continuous superhelix. Solenoid proteins, including LRRs, show the simplest relationship between sequence and structure, compared to the more complicated folds of globular proteins. Thus recognition of such motifs can provide valuable insights into the predicted structure of such protein domains. The most conserved structural feature of LRRs is the LxxLxLxxNx region, while the remainder of the repeat can differ dramatically [15, 17] as illustrated in Fig. 5. The central sequence xLxLx in this region of the repeat forms a β-strand with successive repeats of this β-strand forming a parallel β-sheet on one side of the LRR module. This corresponds to the second β-sheet in the L domains of the IGF-1R, which is the structural counterpart of the β-sheet that forms the inner face of RI and internalin (Fig. 4). This face is involved in protein-protein interactions in RI, U2 and the IR family (see [8, 10]).
As shown in Fig. 4, the first region of the repeat, LxxLxLxxNx, adopts a similar structural fold in RI, internalin, the IGF-1R L domains and pectate lyase, while the remainder of the repeats are highly variable. Most of the repeats in the IR and EGFR L domains are between 21 and 30 residues, within the range commonly found in other LRRs. The inserts are accommodated as loops which do not, or are unlikely to, perturb the core structure. Three of the repeats in pectate lyase have large inserts of 11–17 residues between the 6th and 7th conserved positions (Fig. 5). The major difference between the 3D structures of RI and internalin versus the L domains of IGF-1R is the absence of the repetitive helix on the opposite face to the canonical β-sheet in the IGF-1R L domains, and thus a lack of curvature, although the L domains of IGF-1R are capped by α-helices at the N-terminal end and, to a lesser extent, at the C-terminus.
The existence of repeats in the L domains of IR and EGFR was first reported by Bajaj et al., who described five repeats in the region equivalent to 1–119 in human IR. The subsequent 3D structure determination of the two L domains in the IGF-1R showed that they each contain five full and one partial repeat.
Here we have shown, using a combination of sequence analyses and 3D structure comparisons, that variations of the repeating motif typical of LRRs are present in the L domains of members of the IR and EGFR subfamilies and in β-helix proteins. This motif is not obvious, is difficult to detect with sequence analysis programs and has not been described previously. Comparison of the 3D structure of these domains with other protein structures showed that L domains matched equally well to the pectate lyase family and to LRRs such as porcine ribonuclease inhibitor. We conclude that these three groups should be considered part of the same LRR superfamily. In the IR and EGFR subfamilies, isoleucine (or valine) is preferred over leucine at some positions of the repeat, while in β-helix proteins isoleucine or valine (or occasionally phenylalanine) is always preferred over leucine.
Multiple Sequence Analysis
The sequence analysis programs used were from the sequence analysis software package of the Genetics Computer Group of the University of Wisconsin, Biotechnology Centre, Madison, Wisconsin, USA. Files of individual proteins were edited using the Seqed program and aligned using Pileup. Final adjustments to these alignments were made manually as required using Lineup. Profiles [31, 32] were generated from these aligned sequences using ProfileMake. The SwissProt database was probed using ProfileSearch and the alignments displayed with ProfileSegments. Specific sequences were analysed using Gap or ProfileGap. The Gap weight and Length weight penalties used were 3.0 and 0.3 respectively unless stated otherwise.
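The general idea of scoring a sequence against a profile built from aligned repeats can be illustrated with a much simplified sketch. The Python code below is not the ProfileMake or ProfileSearch algorithm: it builds gapless per-column log-odds scores against a uniform background with a pseudocount, and it ignores the gap and length weights quoted above as well as any substitution matrix. The function names, the pseudocount value and the three aligned toy segments are all assumptions made for illustration.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_profile(aligned, pseudocount=1.0):
    """Per-column log-odds scores versus a uniform background (simplified)."""
    background = 1.0 / len(AMINO_ACIDS)
    profile = []
    for col in range(len(aligned[0])):
        counts = Counter(s[col] for s in aligned if s[col] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        profile.append({
            aa: math.log(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return profile

def best_window(profile, sequence):
    """Slide the profile along the sequence without gaps.

    Returns (score, start) for the best window, or None if the
    sequence is shorter than the profile.
    """
    width = len(profile)
    best = None
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(col.get(aa, 0.0) for col, aa in zip(profile, window))
        if best is None or score > best[0]:
            best = (score, start)
    return best

# Hypothetical aligned core segments, all of equal length.
toy_alignment = ["LKELDLSNNQL", "LTSLDLSNNKI", "IRTLDLSGNPL"]
toy_profile = build_profile(toy_alignment)
print(best_window(toy_profile, "GGGG" + "LKELDLSNNQL" + "GGGG"))
```

Because this sketch allows no gaps, it would be confounded by insertions within repeats even more readily than a gapped profile search, which is one reason manual adjustment against a known 3D structure can still be needed.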
The proteins used to generate these profiles were the chondroadherin precursor (CHAD_BOVIN, 10 LRRs), platelet glycoprotein 1B alpha chain precursor (GPBA_MOUSE, 8 LRRs), the bone proteoglycan 2 precursor (PGS2_RABIT, 10 LRRs) and the putative receptor protein tyrosine kinase from Arabidopsis (TMK1_ARATH, 8 LRRs). Three profiles were generated: Prf-1, based on the alignment of the 38 single LRRs in these four proteins; Prf-2, based on the alignment of the 19 tandem repeats of two LRRs from these four proteins; and Prf-4, which was generated from the alignment of eight sequences, each containing four LRRs in tandem. The fragments from CHAD_BOVIN and PGS2_RABIT used in Prf-4 corresponded to repeats 1–4 and 7–10.
EGF, epidermal growth factor
EGFR, epidermal growth factor receptor
Fn3, fibronectin type 3
IGF-1, insulin-like growth factor 1
IGF-1R, the type I insulin-like growth factor receptor
IRR, insulin receptor related receptor
SCOP, structural comparison of proteins
SMART, simple modular architecture research tool
Bork P: Mobile modules and motifs. Curr Opin Struct Biol 1992, 2: 413–421. 10.1016/0959-440X(92)90233-W
Bork P, Downing AK, Kieffer B, Campbell ID: Structure and distribution of modules in extracellular proteins. Quart Rev Biophys 1996, 29: 119–167.
O'Bryan JP, Frye RA, Cogswell PC, Neubauer Z, Kitch B, Prokop C, Espinosa III R, Le Beau MM, Earp HS, Liu ET: axl, a transforming gene isolated from primary human myeloid leukemia cells, encodes a novel receptor tyrosine kinase. Mol Cell Biol 1991, 11: 5016–5031.
Mulhern TD, Booker GW, Cosgrove L: A third fibronectin type-3 domain in the insulin-family receptors. Trends Biochem Sci 1998, 23: 465–466. 10.1016/S0968-0004(98)01288-2
Marino-Buslje C, Mizuguchi K, Siddle K, Blundell TL: A third fibronectin type 3 domain in the extracellular region of the insulin receptor family. FEBS Lett 1998, 441: 331–336. 10.1016/S0014-5793(98)01509-9
Ward CW: Members of the insulin receptor family contain three fibronectin type 3 domains. Growth Factors 1999, 16: 315–322.
Ward CW, Hoyne PA, Flegg RH: Insulin and epidermal growth factor receptors contain the cysteine repeat motif found in the tumour necrosis factor receptor. Proteins: Struct Funct Genet 1995, 22: 141–153.
Garrett TPJ, McKern NM, Lou M, Frenkel MJ, Bentley JD, Lovrecz GL, Elleman TC, Cosgrove L, Ward CW: The structure of the first three domains of the type 1 insulin-like growth factor receptor. Nature 1998, 394: 395–399. 10.1038/28668
Ward CW, Garrett TPJ, McKern NM, Lawrence LJ: Structure of the insulin receptor family: unexpected relationships with other proteins. Today's Life Sciences 1999, 11: 26–32.
Adams TE, Epa VC, Garrett TJ, Ward CW: Structure and function of the type I insulin-like growth factor receptor. Cell Molec Life Sci. 2000, 57: 1050–1093.
Yoder MD, Lietzke SE, Jurnak F: Unusual structural features in the parallel beta-helix in pectate lyases. Structure 1993, 1: 241–251.
Yoder MD, Keen NT, Jurnak F: New domain motif: pectate lyase C, a secreted plant virulence factor. Science 1993, 260: 1503–1507.
Kobe B, Deisenhofer J: Crystal structure of porcine ribonuclease inhibitor, a protein with leucine repeats. Nature 1993, 366: 751–756. 10.1038/366751a0
Kobe B, Deisenhofer J: The leucine-rich repeat: a versatile binding motif. Trends Biochem Sci 1994, 19: 415–421. 10.1016/0968-0004(94)90090-6
Kobe B, Deisenhofer J: Proteins with leucine-rich repeats. Curr Opin Struct Biol 1995, 5: 409–416. 10.1016/0959-440X(95)80105-7
Kajava AV: Structural diversity of leucine-rich repeat proteins. J Mol Biol 1998, 277: 519–527. 10.1006/jmbi.1998.1643
Kobe B, Kajava AV: When protein folding is simplified to protein coiling: the continuum of solenoid protein structures. Trends Biochem Sci 2000, 25: 509–515. 10.1016/S0968-0004(00)01667-4
Schultz J, Copley RR, Doerks T, Ponting CP, Bork P: SMART: a web-based tool for the study of genetically mobile domains. Nucleic Acids Res 2000, 28: 231–234. 10.1093/nar/28.1.231
Robinson DR, Wu Y-M, Lin S-F: The protein tyrosine kinase family of the human genome. Oncogene 2001, 19: 5548–5557. 10.1038/sj.onc.1203957
Kajava AV, Vassart G, Wodak SJ: Modeling of the three-dimensional structure of proteins with typical leucine-rich repeats. Structure 1995, 3: 867–877.
Matsushima N, Ohyanagi T, Tanaka T, Kretsinger RH: Super-motifs and evolution of tandem leucine-rich repeats within the small proteoglycans-biglycan, decorin, lumican, fibromodulin, PRELP, keratocan, osteoadherin, epiphycan, and osteoglycin. Proteins: Struct Funct Genet 2000, 38: 210–225. 10.1002/(SICI)1097-0134(20000201)38:2<210::AID-PROT9>3.0.CO;2-1
Marino M, Braun L, Cossart P, Ghosh P: Structure of the InlB leucine-rich repeats, a domain that triggers host cell invasion by the bacterial pathogen L. monocytogenes. Molecular Cell 1999, 4: 1063–1072.
Smiley BL, Stadnyk AW, Myler PJ, Stuart K: The trypanosome leucine repeat gene in the variant surface glycoprotein expression site encodes a putative metal-binding domain and a region resembling protein-binding domains of yeast, Drosophila, and mammalian proteins. Mol Cell Biol 1990, 10: 6436–6444.
Henrissat B, Heffron SE, Yoder MD, Lietzky SE, Jurnak F: Functional implications for structure-based sequence alignment of proteins in the extracellular pectate lyase superfamily. Plant Physiol. 1995, 107: 963–976. 10.1104/pp.107.3.963
Yoder MD, Jurnak F: The parallel β helix and other coiled folds. FASEB J 1995, 9: 335–342.
Heffron S, Moe GG, Sieber V, Mengaud J, Cossart P, Vitali J, Jurnak F: Sequence profile of a parallel β helix in the pectate lyase superfamily. J Struct Biol 1998, 122: 223–235. 10.1006/jsbi.1998.3978
Lax I, Bellot F, Howk R, Ullrich A, Givol D, Schlessinger J: Functional analysis of the ligand binding site of EGF-receptor utilizing chimeric chicken/human receptor molecules. EMBO J 1989, 8: 421–427.
Kohda D, Odaka M, Lax I, Kawasaki H, Suzuki K, Ullrich A, Schlessinger J, Inagaki F: A 40-kDa epidermal growth factor/transforming growth factor alpha-binding domain produced by limited proteolysis of the extracellular domain of the epidermal growth factor receptor. J Biol Chem 1993, 268: 1976–1981.
Price SR, Evans PR, Nagai K: Crystal structure of the spliceosomal U2B"-U2A' protein complex bound to a fragment of U2 small nuclear RNA. Nature 1998, 394: 645–650. 10.1038/29234
Bajaj M, Waterfield MD, Schlessinger J, Taylor WR, Blundell T: On the tertiary structure of the extracellular domains of the epidermal growth factor and insulin receptors. Biochim.Biophys. Acta 1987, 916: 220–226. 10.1016/0167-4838(87)90112-9
Devereux J, Haeberli P, Smithies O: A comprehensive set of sequence analysis programs for the VAX. Nucleic Acids Res 1984, 12: 387–395.
Gribskov M, Luthy R, Eisenberg D: Profile analysis. Methods Enzymol 1990, 183: 146–159.
Pickersgill R, Jenkins J, Harris G, Nasser W, Robert-Baudouy J: The structure of Bacillus subtilis pectate lyase in complex with calcium. Nat Struct Biol 1994, 1: 717–723.
Vitali J, Schick B, Kester HCM, Visser J, Jurnak F: The three-dimensional structure of Aspergillus niger pectin lyase B at 1.7 Å resolution. Plant Physiol. 1998, 116: 69–80. 10.1104/pp.116.1.69