FluTyper-an algorithm for automated typing and subtyping of the influenza virus from high resolution mass spectral data

Wong, Jason WH; Schwahn, Alexander B; Downard, Kevin M

doi:10.1186/1471-2105-11-266

Methodology article
Open access
Published: 19 May 2010

BMC Bioinformatics volume 11, Article number: 266 (2010)

Abstract

Background

High resolution mass spectrometry has been employed to rapidly and accurately type and subtype influenza viruses. The detection of signature peptides with unique theoretical masses enables the unequivocal assignment of the type and subtype of a given strain. This analysis has, to date, required the manual inspection of mass spectra of whole virus and antigen digests.

Results

A computer algorithm, FluTyper, has been designed and implemented to achieve the automated analysis of MALDI mass spectra recorded for proteolytic digests of the whole influenza virus and antigens. FluTyper incorporates the use of established signature peptides and newly developed naïve Bayes classifiers for four common influenza antigens, hemagglutinin, neuraminidase, nucleoprotein, and matrix protein 1, to type and subtype the influenza virus based on their detection within proteolytic peptide mass maps. Theoretical and experimental testing of the classifiers demonstrates their applicability at protein coverage rates normally achievable in mass mapping experiments. The application of FluTyper to whole virus and antigen digests of a range of different strains of the influenza virus is demonstrated.

Conclusions

The FluTyper algorithm facilitates the rapid and automated typing and subtyping of the influenza virus from mass spectral data. The newly developed naïve Bayes classifiers increase the confidence of influenza virus subtyping, especially where signature peptides are not detected. FluTyper is expected to popularize the use of mass spectrometry to characterize influenza viruses.

Background

Influenza is a leading cause of death throughout the developed world and contributes to between 250,000 and 500,000 deaths every year worldwide [1]. On three occasions last century, global pandemics resulted in millions of deaths, while recent pandemic threats have been posed by strains of avian [2] and swine origin [3]. Much higher rates of infection exist in the general population that, while not life threatening, inflict illness and suffering. The virus also imposes a significant social and economic burden through productivity losses in the workplace [4].

The genetic analysis of the influenza virus is derived from RT-PCR sequencing of amplified gene segments for the major antigens of the virus [5]. Most work is focused on the hemagglutinin gene because of its primary role in antigenic drift [6]. This is aided by the Influenza Virus Resource, a sequence database developed by the National Center for Biotechnology Information (NCBI) [7] that provides access to genetic sequence data and facilitates multiple sequence alignments, phylogenetic analysis and the generation of clusters [8, 9]. Typically, in a retrospective analysis, a strain from the most dominant genetic cluster within one influenza season is recommended by the WHO for the vaccine in the following season.

Antigenic change is measured primarily using the hemagglutination inhibition (HI) assay [10], where antisera raised from infection of a host with one strain are cross reacted with other uncharacterized and reference strains in parallel. New computational approaches have been developed to analyze HI data [11] that increase the reliability with which antigenic differences can be assessed, and this has been aided by mass spectrometric approaches [12] that enable epitopic domains to be localized [13–17]. Antigenic maps allow for the visualization of antigenic relationships among many strains in order to follow the short- and long-term evolution of the virus [18]. These maps can aid the comparison of antigenic data derived from different laboratories and enable such data to be more reliably interpreted. Epidemiological modeling to predict whether new emerging strains are likely to cause widespread epidemics in future seasons is also under development [19, 20]. The inclusion of antigenic drift and cross-immunity data can improve the reliability of these models.

We have recently developed the most direct and rapid method yet to survey influenza from the perspective of the viral protein antigens [21–24]. Antigens recovered from the virus or present in whole virus or vaccine preparations are digested with site-specific proteases and the peptide products are analyzed by high resolution mass spectrometry [25]. The mass accuracy attained in these analyses enables the unambiguous identification of conserved signature peptides that are specific to a given type or subtype of the influenza virus. The signature peptides are unique in mass when compared to the in silico digest of all influenza proteins across all strains and hosts and those proteins known to contaminate virus preparations.

To date, the analysis of high resolution mass spectra of influenza proteolytic preparations has required manual interpretation through the identification of signature peptide masses that indicate the type or the subtype of an influenza virus. Currently, manual interpretation can be performed when signature peptides dominate a mass spectrum, but it is not possible to establish the degree of confidence in typing and subtyping strains. Further, spectral analysis often involves the detection of multiple signature peptides, some of low abundance, or in some cases establishing the type and subtype without signature peptides (Po > 90-95). Existing algorithms such as the Mascot Peptide Mass Fingerprinting algorithm [26] can be used to identify proteins within a mass spectrum; however, such algorithms do not provide any level of confidence for the type and subtype of the virus from which the proteins are identified. This is particularly a problem when signature peptides are not detected in a given mass spectrum. To extend our previous work and automate the analysis of high resolution mass spectra of influenza proteolytic preparations, the FluTyper algorithm has been developed. FluTyper implements methods to deisotope, filter and detect peaks from mass spectra. Peaks are then matched against established signature peptides from common antigens [21–24]. In addition, naïve Bayes classifiers have been developed to provide statistical confidence for type and subtype assignments where few or no signature peptides are available. Here the basis of the FluTyper algorithm is described and its application for the automated analysis of MALDI mass spectra derived from antigen and whole virus digests is demonstrated.

Results and Discussion

Algorithm overview

FluTyper has been designed to utilize naïve Bayes classifiers for the typing and subtyping of proteolytic influenza mass spectra. FluTyper is divided into two main parts. First, the algorithm generates naïve Bayes classifiers and determines unique signature peptides. Second, the algorithm pre-processes query mass spectra and determines the virus type and subtype using the classifiers and signature peptides (Figure 1). Naïve Bayes classifiers are generated for four common influenza antigens: hemagglutinin (HA), neuraminidase (NA), nucleoprotein (NP), and matrix protein 1 (M1). Subsequently, the FluTyper algorithm uses all classifiers, in combination, for the computation of the type and subtype probabilities and the identification of proteolytic signature peptides from each mass spectrum analyzed.

Pre-processing of high resolution mass spectra

Mass spectra of tryptic influenza peptides are pre-processed prior to typing and subtyping using the naïve Bayes classifier. First, a user defined threshold is used to remove peaks that are considered to be noise (typically set at a signal-to-noise ratio of 2). Second, all isotope clusters are identified and the spectrum is deisotoped. The deisotoping method used is adapted from the THRASH algorithm [27]. The method involves iterating through each peak in the thresholded mass spectrum starting from the lowest m/z value. As the algorithm proceeds, each peak is compared to previous peaks to determine if it belongs to an existing isotopic cluster. If a peak belongs to an existing isotopic cluster, the peak is removed and its intensity is added to the existing monoisotopic peak. To evaluate the composition of isotopic clusters, the model amino acid averagine (C_4.9384H_7.7583N_1.3577O_1.4773S_0.0417) [28] is used to define both the predicted distance between isotopic peaks and the intensity distribution of ions within an isotopic cluster. A major advantage of mass spectral data acquired by MALDI is that the tryptic peptide ions generated are almost exclusively singly charged (i.e. [M+H]⁺ ions). This eliminates the need to deconvolute (by mass) the mass spectrum.
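The cluster-merging step described above can be sketched as follows. This is a simplified illustration rather than the FluTyper or THRASH implementation: it assumes singly charged ions, uses an assumed average isotope spacing of 1.00235 Da for averagine-like peptides, and omits the comparison against the averagine intensity distribution. The function name and tolerances are hypothetical.

```python
# Simplified deisotoping sketch for singly charged MALDI peaks.
# ASSUMPTION: average spacing between isotopic peaks of averagine-like peptides.
ISOTOPE_SPACING = 1.00235  # Da

def deisotope(peaks, tol_da=0.01, max_isotopes=5):
    """peaks: iterable of (mz, intensity) pairs.

    Returns a list of (monoisotopic_mz, summed_intensity) pairs.
    The intensity-pattern check against averagine is NOT performed here.
    """
    clusters = []  # each entry is [monoisotopic_mz, summed_intensity]
    for mz, intensity in sorted(peaks):  # iterate from the lowest m/z upwards
        for cluster in clusters:
            # Nearest whole number of isotope spacings above the monoisotopic peak
            k = round((mz - cluster[0]) / ISOTOPE_SPACING)
            expected = cluster[0] + k * ISOTOPE_SPACING
            if 1 <= k <= max_isotopes and abs(mz - expected) <= tol_da:
                cluster[1] += intensity  # fold the isotope peak into its cluster
                break
        else:
            clusters.append([mz, intensity])  # no cluster matched: new monoisotopic peak
    return [(c[0], c[1]) for c in clusters]
```

For example, three peaks spaced by roughly 1.002 Da collapse into one monoisotopic entry carrying their summed intensity, while an unrelated peak is kept as its own entry.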

Naïve Bayes classifiers for the typing and subtyping of the influenza virus

Non-redundant HA, NA, NP and M1 sequence sets for human strains of influenza virus type A and B, and subtypes H1N1 excluding pandemic (H1N1) 2009 sequences, pandemic (H1N1) 2009 (P2009), H3N2 and H5N1 were retrieved from the NCBI Influenza Virus Sequence Database [7]. Each set of sequences was then aligned using ClustalW [29] to enable the relative frequency of occurrence Po(M, T) of each unique theoretical monoisotopic tryptic peptide ion [M+H]⁺, M, for a given type or subtype, T, to be determined. Tryptic peptide fragments were generated to allow for up to 2 missed cleavages, with fixed carbamidomethyl cysteine and optional modifications of methionine, glutamic acid and cysteine residues in the form of oxidized methionine, pyroglutamate and acrylamide adducts with cysteine.
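A minimal sketch of how Po(M, T) might be tabulated from a set of sequences for one type or subtype is given below. It is an illustration under stated assumptions, not the FluTyper code: cleavage is taken to occur after every K or R, only the fixed carbamidomethyl modification is applied, the optional modifications and the alignment step are omitted, and masses are binned by rounding to four decimals. All function names are hypothetical.

```python
from collections import Counter

# Monoisotopic residue masses (Da) for the 20 standard amino acids
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER, PROTON, CARBAMIDOMETHYL = 18.010565, 1.007276, 57.021464

def tryptic_peptides(seq, missed=2):
    """In silico digest: cleave after K or R (ASSUMED rule), up to `missed` missed cleavages."""
    pieces, start = [], 0
    for i, aa in enumerate(seq):
        if aa in "KR":
            pieces.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        pieces.append(seq[start:])
    peptides = set()
    for i in range(len(pieces)):
        for j in range(i, min(i + missed, len(pieces) - 1) + 1):
            peptides.add("".join(pieces[i:j + 1]))
    return peptides

def mh_mass(peptide):
    """Monoisotopic [M+H]+ mass with fixed carbamidomethyl cysteine."""
    return (sum(MONO[a] for a in peptide)
            + peptide.count("C") * CARBAMIDOMETHYL + WATER + PROTON)

def frequency_of_occurrence(sequences, missed=2, decimals=4):
    """Po(M, T): fraction of sequences of one type T that yield each peptide mass M."""
    counts = Counter()
    for seq in sequences:
        # A set is used so each mass is counted at most once per sequence
        counts.update({round(mh_mass(p), decimals) for p in tryptic_peptides(seq, missed)})
    return {mass: n / len(sequences) for mass, n in counts.items()}
```

Calling `frequency_of_occurrence` once per type or subtype would give one Po table per class, which is the input the classifier needs.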

A naïve Bayes classifier is a simple probabilistic classifier based on the application of Bayes' theorem. Using the classifier, the type or subtype of an influenza virus can be determined as follows:

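$$p(T \mid M_1 \ldots M_n) = \frac{p(T)\,\prod_{i=1}^{n} p(M_i \mid T)}{p(M_1 \ldots M_n)} \qquad (1)$$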

where p(T|M₁...M_n) is the probability for a type or subtype T based on theoretical tryptic peptide ion monoisotopic masses, M₁...M_n. All parameters (p(M_i|T), p(T) and p(M₁...M_n)) in the model are estimated directly from protein sequence alignments. The independent probability for each mass to be present for a given type or subtype, p(M_i|T), is given by its relative frequency of occurrence Po(M, T). The assumption is made that the presence of a peptide ion mass derived from a particular protein is independent of that of any other mass (i.e. that the presence of one tryptic peptide is independent of the presence of another). Where a particular mass M_i is present in one type or subtype, but not another, Laplace's rule of succession is applied, where 1 is added to the number of observed events to avoid zero probabilities. This correction is useful to account for noise peaks that may be present in mass spectral data. The prior probability, p(T), reflects the probability of occurrence of a given type or subtype, T, and is estimated based on the relative number of sequences in the NCBI database for T. However, this value may be adjusted as necessary to match the observed occurrence of different influenza types and subtypes in a particular season. Finally, the independent probability of observing peaks M₁...M_n, p(M₁...M_n), can be computed as the sum of the probability of observing peaks M₁...M_n across all types or subtypes:

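$$p(M_1 \ldots M_n) = p(T_a)\prod_{i=1}^{n} p(M_i \mid T_a) + p(T_b)\prod_{i=1}^{n} p(M_i \mid T_b) + \cdots + p(T_x)\prod_{i=1}^{n} p(M_i \mid T_x) \qquad (2)$$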

where T_a, T_b, ..., T_x are all the possible types or subtypes being analyzed. A naïve Bayes classifier is built for each of the HA, NA, M1 and NP antigens used to type and subtype the virus.
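The posterior computation described in this section can be sketched as below. This is an illustrative sketch, not the FluTyper source: the exact smoothing formula is an assumption (the rule of succession is taken as (count + 1) / (N + 2)), the data structures are hypothetical, and the products are evaluated in log space to avoid numerical underflow when many peaks are matched.

```python
import math

def posterior(observed, counts, n_seqs, prior):
    """Naive Bayes posterior p(T | M_1...M_n) for each type or subtype T.

    observed : iterable of matched peptide masses
    counts   : counts[T][mass] = number of training sequences of T containing mass
    n_seqs   : n_seqs[T] = number of training sequences of T
    prior    : prior[T] = p(T)
    """
    log_joint = {}
    for t in prior:
        total = math.log(prior[t])
        for mass in observed:
            # ASSUMED form of Laplace's rule of succession: never yields zero
            total += math.log((counts[t].get(mass, 0) + 1) / (n_seqs[t] + 2))
        log_joint[t] = total
    # Normalise by the evidence: the sum of the joint probabilities over all T
    peak = max(log_joint.values())
    evidence = sum(math.exp(v - peak) for v in log_joint.values())
    return {t: math.exp(v - peak) / evidence for t, v in log_joint.items()}
```

Because the same evidence term divides every class, the returned probabilities always sum to one across the types or subtypes considered.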

To assess the peak matching false discovery rate, decoy naïve Bayes classifier models are generated using randomly permutated sequences from the same set of influenza proteins.
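One simple way to produce such decoys is to shuffle the residues of each sequence, which preserves length and amino acid composition while scrambling the tryptic peptides. The sketch below assumes this residue-level shuffling; the text does not specify the exact permutation scheme used by FluTyper, and the function name is hypothetical.

```python
import random

def make_decoys(sequences, seed=0):
    """Return one randomly permuted decoy per input protein sequence.

    ASSUMPTION: residues are shuffled within each sequence; a fixed seed
    keeps the decoy set reproducible between runs.
    """
    rng = random.Random(seed)
    decoys = []
    for seq in sequences:
        residues = list(seq)
        rng.shuffle(residues)  # in-place random permutation of the residues
        decoys.append("".join(residues))
    return decoys
```

The decoy sequences can then be digested and tabulated exactly like the real ones, so that matches against them estimate how often peaks match by chance.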

Uniqueness of peptide ion masses in naïve Bayes classifiers

Since the naïve Bayes classifier is trained based on theoretical protein sequences from specific influenza proteins alone, validation that the tryptic peptide masses are unique to influenza is necessary. This is performed as described previously [21]. Briefly, each theoretical monoisotopic mass, M, from each type and subtype present in the naïve Bayes classifier, is compared against the theoretical monoisotopic tryptic ion masses [M+H]⁺ from a custom database containing all non-redundant influenza protein sequences, and those of possible contaminants, including human keratin, bovine/porcine trypsin and several chicken proteins that have been found to commonly contaminate egg-propagated virus preparations or are introduced during the sample preparation. The included egg-derived chicken protein contaminants are based on our own observations and their identity was confirmed by MALDI tandem mass spectrometry (unpublished observations - spectra available upon request). Other unknown contaminants are always possible, but because high-resolution mass spectrometry routinely achieves mass accuracies better than 1 ppm, the misassignment of contaminants will be largely avoided. Masses are generated for predicted tryptic peptide ions allowing for up to 2 missed cleavages and the same post-translational modifications as described in the previous section. The uniqueness, U_M (in parts per million (ppm)), is defined as the difference between M and the closest theoretical mass of a tryptic peptide derived from a contaminant or influenza antigen with at least 10 entries in the custom database.
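The uniqueness value can be illustrated with a short nearest-neighbour lookup. This is a sketch under assumptions: the background list is taken to hold the theoretical masses of all other database peptides (excluding the peptide being tested), it must be sorted in ascending order, and the function name is hypothetical.

```python
from bisect import bisect_left

def uniqueness_ppm(mass, background_sorted):
    """Distance in ppm from `mass` to the closest mass in a sorted background list.

    ASSUMPTION: `background_sorted` excludes the peptide under test itself.
    """
    if not background_sorted:
        return float("inf")  # nothing to collide with
    i = bisect_left(background_sorted, mass)
    # Only the entries immediately below and above can be the nearest neighbour
    neighbours = background_sorted[max(i - 1, 0):i + 1]
    return min(abs(mass - b) for b in neighbours) / mass * 1e6
```

For a peptide at 1000.0 with background masses at 999.99 and 1000.02, the nearest neighbour is 0.01 Da away, giving a uniqueness of about 10 ppm.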

Peak matching, signature peptide identification and computation of type and subtype probabilities using naïve Bayes classifiers

In a mass spectrum, typically only a portion of the theoretical tryptic peptides is observed experimentally. This may be due to factors ranging from incomplete proteolytic cleavage to the presence of unanticipated post-translational modifications. It is necessary to first define a set of theoretical tryptic peptide masses that are actually observed within a specified mass error tolerance. The list of theoretical masses used for matching is determined based on the specified protein (HA, NA, NP, M1 or all). Where the mass of an observed peak is within the mass error tolerance of two or more theoretical masses, the closest theoretical mass is selected. For a matching peak to be selected for further analysis, the mass must be sufficiently unique as defined by:

$$|\Delta M| < 0.5 \times U_M \qquad (3)$$

where ΔM is the mass error (in ppm) between the observed mass and theoretical tryptic peptide mass, and U_M is the uniqueness as described in the previous section. A scaling of U_M by a factor of 0.5 is necessary to ensure that there cannot be another tryptic contaminant peptide mass present that is closer to the observed mass than that of the theoretical mass.
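The matching rule, in which a peak is accepted only if its mass error is smaller than half the uniqueness of the matched theoretical mass, can be sketched as follows. The linear scan, the data layout and the function name are illustrative assumptions; the 3 ppm default mirrors the peak matching tolerance quoted for the experimental tests.

```python
def match_peaks(observed, theoretical, tol_ppm=3.0):
    """Match observed peak masses to theoretical tryptic peptide masses.

    observed    : iterable of observed [M+H]+ masses
    theoretical : dict mapping theoretical mass -> uniqueness U_M (ppm)
    Returns a dict mapping each accepted observed mass to its theoretical mass.
    """
    matches = {}
    for obs in observed:
        best = None  # (theoretical mass, error in ppm, uniqueness in ppm)
        for theo, u_m in theoretical.items():
            err = abs(obs - theo) / theo * 1e6
            # Keep the closest theoretical mass within the tolerance window
            if err <= tol_ppm and (best is None or err < best[1]):
                best = (theo, err, u_m)
        # Accept only if the error is below half the uniqueness of the match
        if best is not None and best[1] < 0.5 * best[2]:
            matches[obs] = best[0]
    return matches
```

With this rule a peak 1.5 ppm from a theoretical mass is rejected when that mass has a uniqueness of only 2 ppm, because a contaminant peptide could lie closer to the observed value.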

The concept of using signature peptides to type and subtype the influenza virus has been previously described [21]. A signature peptide is defined as a theoretical tryptic peptide that is exclusively present in one type or subtype, but not in any of the others. In the FluTyper algorithm, a signature peptide is defined as any theoretical tryptic peptide, M, where Po(M, T) > 0.7 for one type or subtype and Po(M, T) = 0 for all other types or subtypes for a given influenza protein. Since few signature peptides may be indicative of a particular subtype of the virus, indicator peptides are also used by the algorithm. An indicator peptide is defined similarly to a signature peptide with the exception that it may occur in the sequence of antigens from other viral subtypes with Po(M, T) < 0.1.
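The two definitions can be expressed as a small decision function over the Po values of one peptide mass. This sketch assumes that an indicator peptide must meet the same Po > 0.7 requirement as a signature peptide in its own type or subtype; the function name and return format are hypothetical.

```python
def classify_peptide(po_by_type, signature_min=0.7, indicator_max=0.1):
    """Label one peptide mass from its Po values across types or subtypes.

    po_by_type : dict mapping type/subtype -> Po(M, T)
    Returns (label, type) where label is 'signature', 'indicator' or 'none'.
    """
    best = max(po_by_type, key=po_by_type.get)
    if po_by_type[best] <= signature_min:
        return ("none", None)  # not frequent enough in any single class
    others = [po for t, po in po_by_type.items() if t != best]
    if all(po == 0 for po in others):
        return ("signature", best)  # exclusive to one type or subtype
    if all(po < indicator_max for po in others):
        return ("indicator", best)  # rare, but not absent, elsewhere
    return ("none", None)
```

A peptide with Po values of 0.92 for H3N2 and 0 elsewhere would be labelled a signature, whereas one with 0.85 for H3N2 and 0.05 for H1N1 would be labelled an indicator.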

For the computation of type and subtype probabilities, the naïve Bayes classifier (1) is applied using the set of matching peaks. For typing, this provides a probability that a set of masses is from influenza A (p(FluA|M₁...M_n)) or influenza B (p(FluB|M₁...M_n)). If p(FluA|M₁...M_n) > 0.7 or there is more than one influenza A signature peptide identified, the algorithm will proceed to perform subtyping where p(H1N1|M₁...M_n), p(H3N2|M₁...M_n), p(H5N1|M₁...M_n) and p(P2009|M₁...M_n) are all computed.

Implementation

Since it is only necessary to generate a naïve Bayes classifier when new sequences have been added to the custom database, the implementation of the FluTyper algorithm is divided into two applications, consisting of the naïve Bayes classifier and signature peptide generator, and the mass spectrum analysis program (Figure 1). The classifier and signature peptide generator accepts ClustalW aligned sequences as input to compute the frequency of occurrence of theoretical tryptic peptides and determines the uniqueness of their mass. The output is a table containing all data necessary for naïve Bayes classification and signature peptide determination. The second component of FluTyper accepts a mass spectrum in ASCII format and the classification tables as input. FluTyper outputs the type and subtype prediction based on signature peptides and naïve Bayes probabilities. The number of matches to peptides from decoy sequences is also shown to provide an estimate of the false positive peak matching rate. A summary of all peaks identified can also be downloaded in tab-delimited format. FluTyper is implemented using GNU C++. A web interface has been developed for the second component of FluTyper and can be accessed at http://www.cancerresearch.unsw.edu.au/CRCWeb.nsf/page/flutyper (see Figure S1 for a screenshot of the interface and Table S1 for a description of the parameters).

Theoretical evaluation of naïve Bayes classifier

The performance of the naïve Bayes classifiers was evaluated as a function of the protein coverage. For each protein (i.e. HA, NA, NP or M1), 500 random subsets of theoretical tryptic peptides representing 0-100% coverage of the protein were generated for each protein sequence used to train the classifier. Each set of theoretical tryptic peptide masses represents a simulated mass spectrum. Leave-one-out cross-validation was performed, meaning that a new classifier was built each time, leaving out the protein sequence being tested. For the purpose of this evaluation, a subset of masses was determined to be typed or subtyped if p(T|M₁...M_n) > 0.7 for any T.
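The simulation step can be sketched as two small helpers: one that draws a random subset of a protein's theoretical peptide masses, and one that applies the 0.7 decision threshold to a set of posterior probabilities. The sketch approximates coverage by the fraction of peptides retained, which is an assumption; the text defines coverage in terms of the protein sequence. Function names are hypothetical.

```python
import random

def simulated_spectrum(masses, coverage, rng):
    """Random subset of theoretical peptide masses standing in for a spectrum.

    ASSUMPTION: coverage is approximated as the fraction of peptides kept,
    not the fraction of protein residues covered.
    """
    k = round(coverage * len(masses))
    return rng.sample(sorted(masses), k)

def classification_call(posteriors, threshold=0.7):
    """Return the type or subtype whose posterior exceeds the threshold, else None."""
    best = max(posteriors, key=posteriors.get)
    return best if posteriors[best] > threshold else None
```

Repeating the draw many times per coverage level, and counting how often `classification_call` returns the correct class, a wrong class, or `None`, would yield classification and false positive rates of the kind reported below.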

Figures 2A and 2B show the percentage of simulated mass spectra conclusively classified as a function of protein coverage for typing and subtyping, respectively. For typing, a classification rate of over 90% was achieved with greater than 25% protein coverage in all cases. For subtyping, a classification rate of over 90% was achieved with greater than 30% protein coverage for HA, NA and NP. However, M1 was less reliable, with a classification rate limited to around 80% at protein coverage greater than 40%. The low classification rate for M1 is due to a combination of factors. First, the M1 protein has around 50% fewer amino acids than NP, NA and HA and therefore also has fewer tryptic peptide masses that can be used by the naïve Bayes classifier. Second, the M1 protein is more conserved between different influenza subtypes than NP, NA and HA, thus the classifier may not be able to distinguish the subtype even with full protein coverage.

In the case of typing (Figure 2C), the false positive rate (FPR) is less than 1% in all cases and 0% at protein coverage of greater than 25%. For subtyping (Figure 2D), the FPR was less than 1% for protein coverage of 20% or greater for HA and less than 5% with increased sequence coverage for NA. HA performed more favorably than NA since the neuraminidases of H1N1 and H5N1 are similar, while the hemagglutinin antigens of H1N1, H3N2 and H5N1 are all significantly different. On the other hand, the NA classifier was able to distinguish HxN1 and H3N2 subtypes with 0% FPR (data not shown).

For NP, the FPR is 10% at low protein coverage and decreases to 5% with increased coverage. For M1, the FPR is just under 10% independent of the protein coverage. The high apparent FPR for NP and M1 for subtyping can be expected since the subtype of a virus is characterized by the isoform of its HA and NA proteins. For instance, the reassortment of a virus can lead to the introduction of an NP protein from one subtype to another (e.g. H1N1 to H3N2) without changing the subtype of the actual virus. For example, the translated NP protein sequence derived from the NCBI entry gi148466309 is designated as an H3N2 subtype, but the actual sequence is in fact more similar to other H1N1 NP sequences.

The theoretical testing results demonstrate that the use of naïve Bayes classifiers is appropriate at protein coverage levels expected from experimental mass spectra, where 20-30% or greater protein coverage is common. Crucially, the false positive rate is less than 1% for typing and is still below 10% for subtyping using M1 and NP proteins. It is evident from testing that for confident assignment of the virus subtype, the use of HA or NA tryptic peptides would be most desirable.

Testing with experimental influenza mass spectra

To demonstrate FluTyper using experimental data, mass spectra were acquired from tryptic digests prepared from whole virus preparations and gel-separated influenza antigens. Mass spectra were generated for common human influenza virus strains including influenza type B strain B/Victoria/504/2000, type A (H1N1) strain A/Solomon Islands/03/06 and type A (H3N2) strain A/Brisbane/10/2007 (Additional file 1). The type and subtype of these three strains are in common with those viruses that are in circulation in humans today. All samples were analyzed using default FluTyper settings - with relative peak intensity cutoff at 0.001%, peak matching tolerance of 3 ppm, frequency of occurrence (Po) cutoff of 0.6, one missed cleavage and optional modification of methionine oxidation.

The high resolution mass spectrum of a whole virus digest of influenza type B strain B/Victoria/504/2000 is shown in Figure 3A. The 15 signature peptides for influenza type B identified enable the virus type to be confidently assigned (Table 1). In addition to the signatures, 3 indicator peptides (those that are present with a frequency of occurrence Po < 0.1 in all other types) are also identified. The identified signature and indicator peptides are distributed amongst NP, M1, NA and HA, showing that good sequence coverage of all major antigens can be achieved through whole virus digestion.

Table 1 Identified peptides from a mass spectrum (Figure 3A) of a whole virus digest of type B influenza strain B/Victoria/504/2000

To demonstrate the subtyping ability of FluTyper, a whole virus digest of type A (H3N2) influenza strain A/Brisbane/10/2007 is used (Figure 3B). In total, there are 18 peaks with Po of > 0.6 and the peaks are matched within the 3 ppm threshold (Table 2). 8 of the 18 peaks identified are signature peptides for type A influenza.

Table 2 Identified peptides from a mass spectrum (Figure 3B) of a whole virus digest of type A (H3N2) influenza strain A/Brisbane/10/2007

Generally, type signature peptides are highly conserved with Po > 0.95 across all subtypes and provide little value for distinguishing subtypes (with the exception of the NA peptide (1625.68015 m/z), which is only present in HxN2 subtypes). Nevertheless, of the remaining 10 peptides, FluTyper identified two as H3N2 subtype signatures (852.43626 m/z and 1625.68015 m/z) and one as an indicator (748.47159 m/z). The identification of the signature and indicator peptides alone enables the subtype to be confidently assigned to H3N2. Furthermore, by applying the naïve Bayes classifier using the Po values of all the peaks for all subtypes, a p(H3N2|peaks) value of 1 is obtained, providing additional confidence in the result (see Additional files 2, 3, 4 and 5).

Finally, to demonstrate the use of the naïve Bayes classifier where no signature peptides are available for subtyping, a mass spectrum of in-gel digested nucleoprotein from type A (H1N1) strain A/Solomon Islands/03/06 was analyzed (Figure 3C). In total, 11 peptides are identified by FluTyper (Table 3). While 5 type A influenza signature peptides are identified, no subtype indicator or signature peptides were found. In this case, the naïve Bayes classifier provides the only means for subtype determination. Using the Po values shown in Table 3, the classifier generates probabilities of 0.9998, 0.0002, 0 and 0 for H1N1, H3N2, H5N1 and P2009 respectively, indicating that the peptides identified are almost certain to have come from the H1N1 subtype.

Table 3 Identified peptides from a mass spectrum (Figure 3C) of nucleoprotein derived from type A (H1N1) influenza strain A/Solomon Islands/03/06

To validate the naïve Bayes classification, the protein sequence coverage is shown in Table 4. In the case of the whole virus digests, coverage ranges of between 10.5% and 42%, and between 10.3% and 27.9%, were achieved in mass spectra for the type A (H3N2) and type B virus, respectively. The combined FPR, estimated from Figures 2B and 2D as the product of the individual antigen FPRs, is < 0.1% for both type A (H3N2) and type B. For type A (H1N1), as expected, only nucleoprotein was identified for the in-gel digestion of this antigen, with a sequence coverage of 24.8%. Based on theoretical testing from Figure 2D, there is an approximately 8% chance that the spectrum could be misidentified. As discussed earlier, the high false positive rate is due to the fact that the subtype of an influenza virus is defined based on hemagglutinin and neuraminidase, hence the possibility of reassortment cannot be excluded. Nevertheless, the nano-scale preparation and mass spectrometry analysis of whole virus digests described here provides highly reliable subtyping results for influenza using FluTyper.
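The product rule for the combined FPR can be illustrated with a short calculation. The per-antigen rates used in the usage note are illustrative upper bounds of the kind quoted in the theoretical evaluation, not values read from Table 4 or Figure 2, and the multiplication assumes that misclassification by each antigen classifier is independent.

```python
import math

def combined_fpr(per_antigen_fpr):
    """Combined false positive rate as the product of the individual rates.

    ASSUMPTION: the antigen classifiers err independently of one another.
    """
    return math.prod(per_antigen_fpr)
```

For instance, an HA rate of 1% combined with an NA rate of 5% gives `combined_fpr([0.01, 0.05])`, approximately 0.0005 or 0.05%, which is consistent with a combined rate below 0.1% when several antigens are detected in a whole virus digest.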

Table 4 Total protein coverage of the different antigens identified from the mass spectrum of each of samples tested

Conclusions

The FluTyper algorithm has been developed for automated typing and subtyping of influenza virus using high resolution mass spectral data. FluTyper incorporates the use of influenza antigen signature peptides previously identified in this laboratory. Furthermore, to increase the confidence of subtyping, naïve Bayes classifiers have been developed for four common influenza antigens: hemagglutinin, neuraminidase, nucleoprotein, and matrix protein 1. Theoretical testing of the classifiers demonstrates their applicability at protein coverage rates expected in mass mapping experiments. Using laboratory grown virus samples analyzed by high resolution mass spectrometry, it is shown that FluTyper can rapidly and reliably type and subtype strains of the influenza virus that are in common circulation in humans. Through the use of other signature peptides and classifiers, it is anticipated that the FluTyper algorithm could be applied to the typing/classification of other viruses and bacteria.

Methods

Influenza virus strains

All utilized human strains of type A and type B influenza viruses, A/Solomon Islands/03/06(H1N1), A/Brisbane/10/07(H3N2), and B/Victoria/504/2000, were purchased from Advanced ImmunoChemicals Inc. (Long Beach, California, USA). The inactivated viruses, prepared from allantoic fluid of embryonated eggs, were used without further purification.

Protein preparation and digestion

A suspension corresponding to 35 μg of influenza virus type B and type A (H1N1) was evaporated to near dryness, resuspended in digestion buffer without trypsin (50 mM NH₄HCO₃, 10% ACN, 2 mM DTT) and incubated at 37°C for 3 h. Modified trypsin (1.0 mg•mL^-1; Roche Diagnostics GmbH, Mannheim, Germany) was added to a final concentration of about 30 ng•μL^-1 and the digestion carried out at 37°C overnight.

Where gel recovered, viral protein was first separated from 20 μg of the virus by SDS-PAGE (12.5%), excised and destained (25 mM NH₄HCO₃ in 50% acetonitrile). The reduction and alkylation of cysteine residues with DTT (10 mM DTT, 50 mM NH₄HCO₃; 30 min, 56°C) and iodoacetamide (55 mM iodoacetamide, 50 mM NH₄HCO₃; 20 min at room temperature in the dark) was followed by tryptic digestion as previously described [21]. Cleaved peptides were extracted by repeated sonication in 60% acetonitrile containing 0.1% trifluoroacetic acid. Extracted peptides were dried completely in a vacuum concentrator and dissolved in 25 mM NH₄HCO₃.

Nano-scale digestion of whole virus

2.5 μL of a suspension containing 500 ng•μL^-1 of the influenza virus type A (H3N2) was irradiated in a microwave (Samsung MX245) at 900 W power for 2 × 20 s. 7.5 μL of a 2.6 mM DTT solution was added to reduce cysteine residues. The sample was sonicated in a sonicator bath and incubated at 60°C in an Eppendorf thermomixer for 30 min. The suspension was evaporated to dryness in a vacuum concentrator and viral protein was reconstituted in 4 μL digestion buffer (31.3 mM NH₄HCO₃, 12.5% acetonitrile, 4.3 mM octyl-β-D-glucopyranoside) by vortexing and sonication. 1.0 μL modified trypsin (65 ng•μL^-1; Roche Diagnostics, Mannheim, Germany) was added and the digestion carried out overnight at 37°C. The digestion mixture was concentrated to dryness and the tryptic cleavage products were dissolved directly in matrix solution (1.5 mg•mL^-1 α-cyano-4-hydroxycinnamic acid, 6.3 mM NH₄HCO₃, 45% acetonitrile, 0.075% TFA) to create a peptide concentration of ~250 ng•μL^-1.

MALDI FT-ICR mass spectrometry

MALDI FT-ICR mass spectra were recorded on a 7T Bruker APEX-Qe instrument (Bruker Daltonics, Billerica, MA, USA) in the positive ion mode as previously described [21–24]. Briefly, mass spectra were acquired for 1 M data points using broadband excitation. Mass spectra were calibrated externally using a mixture of peptides comprising Angiotensin I, adrenocorticotropic hormone (ACTH) fragments containing residues 1-17, 7-38 and 18-39, and a synthetic hemagglutinin antigen derived peptide. Mass spectra were processed using the Data Analysis v3.4 software (Bruker Daltonics, Billerica, MA, USA) and recalibrated internally utilizing identified peptide ions in each spectrum derived from the viral proteins. Mass lists were exported as tab-delimited files. Mass accuracies of between 0.1 and 1 ppm are routinely achieved for all ions detected, with mass resolutions (FWHM) exceeding 100,000.

Availability and Requirements

Project name: FluTyper

Project home page: http://www.cancerresearch.unsw.edu.au/CRCWeb.nsf/page/flutyper

Operating system: Windows, Linux

Programming language: C++

License: Free for non-commercial use. Source code available upon request.

References

1. World Health Organization: Fact Sheet No. 211. 2003.
2. Peiris JS, de Jong MD, Guan Y: Avian influenza virus (H5N1): a threat to human health. Clin Microbiol Rev 2007, 20(2):243–267. 10.1128/CMR.00037-06
3. Garten RJ, Davis CT, Russell CA, Shu B, Lindstrom S, Balish A, Sessions WM, Xu X, Skepner E, Deyde V, et al.: Antigenic and genetic characteristics of swine-origin 2009 A(H1N1) influenza viruses circulating in humans. Science 2009, 325(5937):197–201. 10.1126/science.1176225
4. Maynard A, Bloor K: The economic impact of pandemic influenza. BMJ 2009, 19: 339.
5. Wright KE, Wilson GA, Novosad D, Dimock C, Tan D, Weber JM: Typing and subtyping of influenza viruses in clinical samples by PCR. J Clin Microbiol 1995, 33(5):1180–1184.
6. White J, Hoffman L, Arevalo J: Attachment and entry of influenza virus into host cells. Pivotal roles of hemagglutinin. In Structural Biology of Viruses. Edited by: Chiu W, Burnett R, Garcea R. New York: Oxford University Press; 1997:80–104.
7. Bao Y, Bolotov P, Dernovoy D, Kiryutin B, Zaslavsky L, Tatusova T, Ostell J, Lipman D: The influenza virus resource at the National Center for Biotechnology Information. J Virol 2008, 82(2):596–601. 10.1128/JVI.02005-07
8. Ina Y, Gojobori T: Statistical analysis of nucleotide sequences of the hemagglutinin gene of human influenza A viruses. Proc Natl Acad Sci USA 1994, 91(18):8388–8392. 10.1073/pnas.91.18.8388
9. Plotkin JB, Dushoff J, Levin SA: Hemagglutinin sequence clusters and the antigenic evolution of influenza A virus. Proc Natl Acad Sci USA 2002, 99(9):6263–6268. 10.1073/pnas.082110799
10. Pedersen JC: Hemagglutination-inhibition test for avian influenza virus subtype identification and the detection and quantitation of serum antibodies to the avian influenza virus. Methods Mol Biol 2008, 436: 53–66.
11. Lapedes A, Farber R: The geometry of shape space: application to influenza. J Theor Biol 2001, 212(1):57–69. 10.1006/jtbi.2001.2347
12. Downard KM, Morrissey B, Schwahn AB: Mass spectrometry analysis of the influenza virus. Mass Spectrom Rev 2009, 28(1):35–49. 10.1002/mas.20194
13. Downard KM, Morrissey B: Fingerprinting a killer: surveillance of the influenza virus by mass spectrometry. Analyst 2007, 132(7):611–614. 10.1039/b701835e
14. Kiselar JG, Downard KM: Antigenic surveillance of the influenza virus by mass spectrometry. Biochemistry 1999, 38(43):14185–14191. 10.1021/bi991609j
15. Morrissey B, Downard KM: A proteomics approach to survey the antigenicity of the influenza virus by mass spectrometry. Proteomics 2006, 6(7):2034–2041. 10.1002/pmic.200500642
16. Morrissey B, Streamer M: Antigenic characterisation of H3N2 subtypes of the influenza virus by mass spectrometry. J Virol Methods 2007, 145(2):106–114. 10.1016/j.jviromet.2007.05.015
17. Schwahn AB, Downard KM: Antigenicity of a type A influenza virus through comparison of hemagglutination inhibition and mass spectrometry immunoassays. J Immunoassay Immunochem 2009, 30(3):245–261. 10.1080/15321810903084350
18. Bush RM, Bender CA, Subbarao K, Cox NJ, Fitch WM: Predicting the evolution of human influenza A. Science 1999, 286(5446):1921–1925. 10.1126/science.286.5446.1921
19. Flahault A, Deguen S, Valleron AJ: A mathematical model for the European spread of influenza. Eur J Epidemiol 1994, 10(4):471–474. 10.1007/BF01719679
20. Rvachev L, Longini I: A mathematical model for the global spread of influenza. Math Biosci 1985, 73: 3–22. 10.1016/0025-5564(85)90064-1
21. Schwahn AB, Wong JWH, Downard KM: Subtyping of the influenza virus by high resolution mass spectrometry. Anal Chem 2009, 81(9):3500–3506. 10.1021/ac900026f
22. Schwahn AB, Wong JWH, Downard KM: Signature peptides of influenza nucleoprotein for the typing and subtyping of the virus by high resolution mass spectrometry. Analyst 2009, 134(11):2253–2261. 10.1039/b912234f
23. Schwahn AB, Wong JWH, Downard KM: Typing of Human and Animal Strains of Influenza Virus with Conserved Signature Peptides of Matrix M1 Protein by High Resolution Mass Spectrometry. J Virol Methods 2010, 165(1):178–185. 10.1016/j.jviromet.2010.01.015
24. Schwahn AB, Wong JWH, Downard KM: Rapid Typing and Subtyping of Vaccine Strains of the Influenza Virus with High Resolution Mass Spectrometry. Eur J Mass Spectrom 2010, in press.
25. Rompp A, Taban IM, Mihalca R, Duursma MC, Mize TH, McDonnel LA, Heeren RM: Examples of Fourier transform ion cyclotron resonance mass spectrometry developments: from ion physics to remote access biochemical mass spectrometry. Eur J Mass Spectrom 2005, 11(5):443–456. 10.1255/ejms.732
26. Perkins DN, Pappin DJ, Creasy DM, Cottrell JS: Probability-based protein identification by searching sequence databases using mass spectrometry data. Electrophoresis 1999, 20: 3551–3567. 10.1002/(SICI)1522-2683(19991201)20:18<3551::AID-ELPS3551>3.0.CO;2-2
27. Horn DM, Zubarev RA, McLafferty FW: Automated reduction and interpretation of high resolution electrospray mass spectra of large molecules. J Am Soc Mass Spectrom 2000, 11(4):320–332. 10.1016/S1044-0305(99)00157-9
28. Senko MW, Beu SC, McLafferty FW: Determination of monoisotopic masses and ion populations for large biomolecules from resolved isotopic distributions. J Am Soc Mass Spectrom 1995, 6(4):229–233. 10.1016/1044-0305(95)00017-8
29. Thompson JD, Higgins DG, Gibson TJ: CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res 1994, 22(22):4673–4680. 10.1093/nar/22.22.4673

Acknowledgements

The FT-ICR mass spectrometer was purchased with funds provided by an Australian Research Council Discovery Linkage Infrastructure Equipment Facility (LIEF) Grant (LE0668439) and the University of Sydney. A. Schwahn was supported by an ARC Discovery Project Grant (DP0770619).

Author information

Authors and Affiliations

Prince of Wales Clinical School & Lowy Cancer Research Centre, Faculty of Medicine, University of New South Wales, Sydney, NSW, Australia
Jason WH Wong
School of Molecular Science, University of Sydney, NSW, Australia
Alexander B Schwahn & Kevin M Downard

Corresponding author

Correspondence to Jason WH Wong.

Additional information

Authors' contributions

JWHW designed, developed and implemented the algorithm and wrote the manuscript. ABS designed the algorithm, carried out the mass spectrometry experiments, prepared the virus digest and drafted the manuscript. KMD conceived the project, participated in its design and coordination and drafted the manuscript. All authors read and approved the final manuscript.

Electronic supplementary material

Additional file 1: Zipped file containing the raw mass spectra used for testing FluTyper. (ZIP 23 KB)

Additional file 2: Screenshot of the input web interface for FluTyper. (TIFF 48 KB)

Additional file 3: Description of parameters used in FluTyper (DOC 31 KB)

Additional file 4: FluTyper HTML web output for influenza type A (H3N2) strain A/Brisbane/10/2007 shown in Figure 3D. (TIFF 168 KB)

Additional file 5: FluTyper output on the analysis of the mass spectrum of whole virus digest of type A influenza (H3N2) strain A/Brisbane/10/2007 shown in Figure 3B. (TAB 73 KB)

Rights and permissions

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Wong, J.W., Schwahn, A.B. & Downard, K.M. FluTyper-an algorithm for automated typing and subtyping of the influenza virus from high resolution mass spectral data. BMC Bioinformatics 11, 266 (2010). https://doi.org/10.1186/1471-2105-11-266

Received: 19 February 2010
Accepted: 19 May 2010
Published: 19 May 2010
DOI: https://doi.org/10.1186/1471-2105-11-266
