Antigenic critical positions
In this study, we followed our previous work [5] to select critical positions that have both high information gains (IGs), statistically derived from 343 HI assays, and high entropies, calculated from 125 HA sequences. In total, 64 positions on HA were selected as critical positions (Table S2 in additional file 1). Among these 64 critical positions, 54 are located on the epitopes and 53 are located on the HA surface (Fig. 1B). Additionally, 13 and 42 of these 64 critical positions were positive selection sites [2] and cluster substitution sites [3], respectively.
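The selection criterion can be illustrated with a minimal sketch. It assumes that entropy is the Shannon entropy of the residues at each alignment position and that the IG of each position has already been computed from the HI assays. The function names, the toy alignment, and both cutoffs are hypothetical and are not the values used in this study.

```python
import math
from collections import Counter


def column_entropy(residues):
    """Shannon entropy (in bits) of the residues observed at one position."""
    counts = Counter(residues)
    total = len(residues)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def select_critical_positions(alignment, ig_by_position, min_entropy, min_ig):
    """Return 1-based positions whose entropy and IG both reach the cutoffs.

    alignment      : list of equal-length aligned sequences
    ig_by_position : dict mapping 1-based position -> precomputed IG
    min_entropy, min_ig : hypothetical cutoffs (not taken from the paper)
    """
    selected = []
    for index in range(len(alignment[0])):
        position = index + 1
        column = [sequence[index] for sequence in alignment]
        entropy = column_entropy(column)
        if entropy >= min_entropy and ig_by_position.get(position, 0.0) >= min_ig:
            selected.append(position)
    return selected


# Toy alignment with three positions; only position 2 is both variable
# and informative under the illustrative cutoffs.
toy_alignment = ["AKT", "AKS", "ARS", "ARS"]
toy_ig = {2: 0.3, 3: 0.01}
print(select_critical_positions(toy_alignment, toy_ig, min_entropy=0.5, min_ig=0.1))
```

Requiring both criteria keeps positions that vary among circulating strains and whose variation is also associated with the HI outcome.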
Changed epitopes for antigenic variants
Several existing methods use changed epitopes to measure escape from neutralizing antibodies [8]. Here, we used the number of accumulated mutations within an epitope to decide whether the epitope is changed, considering either all 329 positions or the 64 selected positions. Figures 2 and 3 show the relationships between changed epitopes and antigenic variants for the four models.
Models one and two: Changed epitopes on 329 positions
Figures 2A (Model one) and 2B (Model two) show the relationships between the number of changed epitopes and "antigenic variants" for 343 pairs of HA sequences with HI assays. Among these 343 pairs for Model one, the number of changed epitopes ranges from 1 to 5 for the 225 "antigenic variants" pairs and from 0 to 5 for the 118 "similar viruses" pairs. Among 34 similar viruses with more than 4 changed epitopes for Model one, we observed the following results: (1) the average number of changed epitopes was 4.2; (2) the average number of changed epitopes with only one mutation was 2.02, and 33 pairs have more than one changed epitope with only one mutation. For example, the virus pair A/PortChalmers/1/73 and A/Singapore/4/75 has four changed epitopes with one mutation each (i.e. epitopes A, C, D, and E) (Table 2). Model one would regard these 34 similar virus pairs as "antigenic variants" because they have more than four changed epitopes, although the HI assays classify them as "similar viruses". This result shows that Model one is not reasonable.
For Model two, the average number of changed epitopes was 2.2 for these 34 similar viruses. According to the distribution (Figure 2B), Model two achieved the highest accuracy when a pair with more than two changed epitopes was considered "antigenic variants". The accuracies for predicting antigenic variants were 74.9% (257/343) on the training set and 92.2% (29410/31878) on the independent set. This result is similar to that of the previous work [8].
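How such a cutoff can be chosen from the distribution is sketched below. The sketch assumes each pair is summarized by its number of changed epitopes and its HI-based label. The function names and the toy pairs are hypothetical; only the 257/343 ratio is taken from the text.

```python
def accuracy_at_cutoff(pairs, cutoff):
    """Accuracy when a pair is predicted a variant if it has >= cutoff changed epitopes.

    pairs: list of (number_of_changed_epitopes, is_antigenic_variant)
    """
    correct = sum((count >= cutoff) == is_variant for count, is_variant in pairs)
    return correct / len(pairs)


def best_cutoff(pairs, max_epitopes=5):
    """Scan every possible cutoff and return the one with the highest accuracy.

    Ties are resolved in favor of the smallest cutoff.
    """
    candidates = range(0, max_epitopes + 2)
    return max(candidates, key=lambda cutoff: accuracy_at_cutoff(pairs, cutoff))


# Hypothetical toy data, not the 343 pairs used in the study.
toy_pairs = [
    (0, False), (1, False), (2, False), (2, False),
    (3, True), (3, True), (3, False), (4, True), (5, True),
]
cutoff = best_cutoff(toy_pairs)
print(cutoff, accuracy_at_cutoff(toy_pairs, cutoff))

# Training-set accuracy reported in the text: 257 correct out of 343 pairs.
print(round(100 * 257 / 343, 1))
```

In this toy data the best cutoff is 3, which corresponds to the "more than two changed epitopes" rule described for Model two.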
Model three: Changed epitopes on 64 selected positions
Model three considered an epitope changed when the number of mutations on the 64 selected critical positions is more than 2. In Model two, the numbers of "antigenic variants" and "similar viruses" with ≥ 3 changed epitopes were 119 and 16, respectively (Fig. 2B). The average numbers of changed epitopes with ≥ 2 mutations on the 329 positions for these "antigenic variants" and "similar viruses" were 3.8 and 3.2, respectively. The average numbers of changed epitopes with ≥ 2 mutations on the 64 selected critical positions for "antigenic variants" and "similar viruses" were 3.2 and 1.5, respectively (Fig. 2C). These results show that Model three, which uses mutations on the 64 critical positions, discriminates "antigenic variants" from "similar viruses" better than Model two. For example, for the "similar viruses" pair A/Alaska/10/95 and A/France/75/97, the 12 mutations yield zero changed epitopes because no epitope has ≥ 2 mutations on the 64 selected positions (Table 2).
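The effect of restricting the count to critical positions can be sketched as follows. The epitope map, the critical set, and the position numbers are hypothetical toy values, and the mutation threshold is left as a parameter (the comparisons above use ≥ 2 mutations).

```python
def mutated_positions(seq_a, seq_b):
    """1-based positions at which two aligned sequences differ."""
    return {i + 1 for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b}


def changed_epitopes(mutations, epitope_positions, allowed_positions, min_mutations=2):
    """Names of epitopes with at least min_mutations mutations on allowed positions.

    mutations         : set of mutated positions for one virus pair
    epitope_positions : dict mapping epitope name -> set of its positions
    allowed_positions : positions that are counted (all positions, or only
                        the selected critical positions)
    """
    changed = []
    for name, positions in epitope_positions.items():
        hits = mutations & positions & allowed_positions
        if len(hits) >= min_mutations:
            changed.append(name)
    return sorted(changed)


# Hypothetical toy numbering: epitope A spans positions 1-4, epitope B 5-7,
# and only positions 1, 2, 5, 6 and 7 are treated as critical.
toy_epitopes = {"A": {1, 2, 3, 4}, "B": {5, 6, 7}}
all_positions = set(range(1, 8))
toy_critical = {1, 2, 5, 6, 7}

toy_mutations = {3, 4, 5}
print(changed_epitopes(toy_mutations, toy_epitopes, all_positions))  # counts every position
print(changed_epitopes(toy_mutations, toy_epitopes, toy_critical))   # counts critical positions only
```

With all positions counted, epitope A is reported as changed; with only the critical positions counted, no epitope is changed, which mirrors how a pair with many mutations can still have zero changed epitopes.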
Three HA/antibody complex structures provide structural evidence for the changed epitopes [18] (Fig. S1 in additional file 1). Among these complexes, two antibodies bind to epitopes A and B (PDB codes 1KEN [19] and 2VIR [20]), while the third binds to epitopes C and E (PDB code 1QFU [21]). The antibodies consistently bind to two epitopes, and this result agrees with Models two and three. The HA/antibody structures and Models two and three show that mutations at two positions often induce a conformational change of an epitope to escape from antibody recognition. However, for Model two, 48 "similar viruses" pairs have 2 (35 pairs) or 3 (13 pairs) changed epitopes (Fig. 2B). In contrast, 14 "similar viruses" pairs have more than 2 changed epitopes for Model three (Fig. 2C).
Model four
Among 72 "antigenic variants" pairs with one changed epitope based on Model three, 70 pairs have the change on epitope A or B. The observation that a single changed epitope on A or B can cause "antigenic variants" agrees with the HA/antibody complex structures and the experiments. The receptor binding site, which is surrounded by epitopes A and B, underlies the neutralizing mechanism of the HA protein [19, 22] (Fig. 1B).
Based on this observation, epitopes A and B play a key role for neutralizing antibodies. Model four, which is based on Model three, considered a pair of HA sequences as "antigenic variants" when the pair has ≥ 2 changed epitopes or ≥ 1 changed epitope on A or B. In Model four, a pair of HA sequences with ≥ 3 mutations on the 64 critical positions of epitope B is regarded as "antigenic variants". Thus, we annotated virus pairs with a single changed epitope on A or B as the "1+" type (Fig. 3D). For example, the pair A/Guizhou/54/89 and A/Beijing/353/89 has a changed epitope A (i.e. mutations at positions 135, 144 and 145) (Table 2). The accuracies of Model four were 81.6% and 94.0% on the training set and independent set, respectively. This model outperformed the two compared methods, i.e. Wilson & Cox (89.7%) [8] and Lee & Chen (92.4%) [4], on the independent data set (Fig. S2 in additional file 1).
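The decision rule of Model four, as stated above, can be written as a short sketch. It takes the changed epitopes of a pair as given (however they were determined) and returns the prediction together with the annotated type. The function name and the return format are hypothetical.

```python
def model_four(changed):
    """Apply the Model four rule to the changed epitopes of one virus pair.

    changed: iterable of epitope names ("A".."E") considered changed.
    Returns (is_antigenic_variant, type_label); the label "1+" marks a
    single changed epitope located on A or B.
    """
    changed = set(changed)
    if len(changed) >= 2:
        return True, str(len(changed))
    if len(changed) == 1 and changed <= {"A", "B"}:
        return True, "1+"
    return False, str(len(changed))


print(model_four(["A"]))        # single changed epitope on A
print(model_four(["C"]))        # single changed epitope outside A and B
print(model_four(["C", "E"]))   # two changed epitopes
```

A pair with only epitope A changed, such as the A/Guizhou/54/89 and A/Beijing/353/89 example, falls into the "1+" type under this rule.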
In the HA/antibody complex structure (PDB code 1KEN [19]), the antibody binds to epitopes A and B using two CDRs (i.e. CDR1 and CDR3) on the heavy chain and one CDR (i.e. CDR2) on the light chain (Fig. 4). The interface between the antibody and HA consists of 13 and 5 contact residues located on epitopes B and A, respectively. Among these 13 positions, 7 were selected as critical positions. Based on Model four, 46 "antigenic variants" pairs have a single changed epitope B with 3 mutations on epitope B, denoted as "B+". This result suggests that a single changed epitope B can cause antigenic variants. For example, the pair of virus strains A/NewYork/55/2004 and A/Anhui/1239/2005 has three critical mutations on epitope B (i.e. positions 156, 160 and 193) (Table 2). According to the HA/antibody structure (Fig. 4), residue 156 interacts with CDR2 (position 55 on the antibody), and residue 193 interacts with three residues on CDR2 (positions 50, 55 and 57) and one residue on CDR3 (position 105). This structure suggests that mutations at residues 156, 160 and 193 can induce a conformational change on epitope B to escape from CDR2 and CDR3 of the neutralizing antibody.
Antigenic drift and epitope evolution
We used the changed epitopes to study antigenic drift in 3,331 circulating strains ranging from 1982 to 2009 (38 influenza seasons). One purpose of the WHO surveillance network is to detect the emergence and spread of antigenic variants that may signal a need to update the composition of the influenza vaccine [1, 3]. Here, we considered an emerging antigenic variant according to the WER strain, which was the dominant strain in each influenza season [6] (Table S1 in additional file 1). For a selected season, we applied Model four, which measures the changed epitopes of pairs of vaccine and circulating strains to predict "antigenic variants", together with the variant ratio (VR) to detect emerging antigenic variants.
Among 38 seasons (1982-2009), our model detected 12 seasons with emerging antigenic variants (VR ≥ 0.5), and 10 of them were followed by an update of the WER strain in the next season (Fig. 5A). For example, in the 85-86 season, 80% of the circulating strains had the changed epitope "B+" (Fig. 5B); this was the first season with emerging antigenic variants, and the WER strain was updated in the next season (i.e. from A/Mississippi/1/85 to A/Leningrad/360/86). In addition, among seven "emerging antigenic variants" seasons (matching WHO vaccine updates), four seasons (i.e. 89-90, 91-92, 95-96 and 02-03) matched the antigenic cluster transitions proposed by Smith et al. [3]. The other three seasons, which were detected by one changed epitope on A or B, are consistent with the WER strain updates (i.e. 87-88, 85-86 and 99). These results suggest that "emerging antigenic variants" with ≥ 2 changed epitopes may cause major antigenic drift, while "emerging antigenic variants" with one changed epitope on A or B may cause minor antigenic drift.
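The season-level detection can be sketched as follows. The sketch assumes that VR is the fraction of circulating strains in a season that are predicted as "antigenic variants" relative to the vaccine strain; this definition is an assumption that is consistent with the 85-86 example (80% of strains, VR of 0.8). The function names and the second season are hypothetical.

```python
def variant_ratio(predictions):
    """Fraction of circulating strains predicted as antigenic variants.

    predictions: list of booleans, one per circulating strain in a season.
    """
    if not predictions:
        raise ValueError("a season needs at least one circulating strain")
    return sum(predictions) / len(predictions)


def emerging_seasons(season_predictions, cutoff=0.5):
    """Seasons whose VR reaches the cutoff (VR >= 0.5 in the text)."""
    return [
        season
        for season, predictions in season_predictions.items()
        if variant_ratio(predictions) >= cutoff
    ]


# Toy input: the first season mimics the 80% reported for 85-86 with ten
# made-up strains; the second season is entirely hypothetical.
toy_seasons = {
    "85-86": [True] * 8 + [False] * 2,
    "hypothetical season": [True] * 1 + [False] * 9,
}
print(variant_ratio(toy_seasons["85-86"]))
print(emerging_seasons(toy_seasons))
```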
To observe the epitope evolution, Figure 5B illustrates the Hamming distance (HD) on the 64 critical positions of the five epitopes. For example, the VR of the 85-86 season was 0.8 (Fig. 5A), and the epitope with the largest HD was epitope B (HD is 3.4). For 16 seasons with WER strain updates, the average HDs of epitopes A, B, C, D and E were 1.2, 2.1, 0.4, 0.4 and 0.5, respectively. These results show that epitopes A and B change more frequently in vaccine update seasons and that they play a key role in antigenic drift.
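A per-epitope HD of this kind can be computed as in the sketch below. It assumes that the HD of an epitope is counted only on its critical positions and is averaged over the circulating strains of a season, each compared with the vaccine strain; the averaging is an assumption suggested by the non-integer value of 3.4. Sequences, epitope map and critical set are hypothetical toy values.

```python
def epitope_hamming(vaccine, strain, epitope_positions, critical):
    """Hamming distance per epitope, counted on critical positions only.

    vaccine, strain   : aligned sequences of equal length
    epitope_positions : dict mapping epitope name -> set of 1-based positions
    critical          : set of 1-based critical positions
    """
    distances = {}
    for name, positions in epitope_positions.items():
        distances[name] = sum(
            1 for p in positions & critical if vaccine[p - 1] != strain[p - 1]
        )
    return distances


def mean_epitope_hamming(vaccine, strains, epitope_positions, critical):
    """Average the per-epitope Hamming distance over all circulating strains."""
    totals = {name: 0 for name in epitope_positions}
    for strain in strains:
        per_epitope = epitope_hamming(vaccine, strain, epitope_positions, critical)
        for name, distance in per_epitope.items():
            totals[name] += distance
    return {name: total / len(strains) for name, total in totals.items()}


# Hypothetical toy data with ten positions and two epitopes.
toy_vaccine = "ACDEFGHIKL"
toy_strains = ["TCDQYGHIKL", "ACNEFGHIKL"]
toy_epitopes = {"A": {1, 2, 3}, "B": {4, 5, 6}}
toy_critical = {1, 2, 4, 5, 6}
print(mean_epitope_hamming(toy_vaccine, toy_strains, toy_epitopes, toy_critical))
```

The mutation at position 3 of the second toy strain does not contribute to the HD of epitope A because position 3 is not in the critical set.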