Growth and Maintenance of Stagonospora nodorum
S. nodorum SN15 and gna1 strains were maintained on CZV8CS agar as previously described [14]. These two strains were chosen as part of a relative quantitation analysis coupled to this proteome mapping experiment. For proteomic analysis, 100 mg of fungal mycelia were inoculated into minimal medium broth supplemented with 25 mM glucose as the sole carbon source. The fungi were grown to a vegetative state by incubation at 22°C with shaking at 150 rpm for three days. Vegetative mycelia were harvested via cheesecloth filtration and freeze-dried overnight.
Protein Extraction
Soluble intracellular proteins were extracted from freeze-dried mycelia as previously described [2]. Briefly, freeze-dried mycelia were mechanically broken with a cooled mortar and pestle and proteins were solubilised with 10 mM Tris-Cl (pH 7.5). The crude homogenate was collected and centrifuged at 20,000 g for 15 min at 4°C. The resulting supernatant was retained and treated with nucleases to remove nucleic acids. All protein samples were checked via SDS-PAGE to ensure that proteolysis was minimal during sample preparation (data not shown).
Sample Preparation
Proteins from SN15 and gna1-35 strains were precipitated individually by adding five volumes of acetone, incubating for 1 hour at -20°C and pulse centrifuging for 5-10 seconds. The protein pellets were resuspended in 0.5 M triethylammonium bicarbonate (TEAB) (pH 8.5) before reduction and alkylation according to the iTRAQ protocol (Applied Biosystems, Foster City, CA, USA). Samples were centrifuged at 13,000 g for 10 min at room temperature before the supernatant was removed and assayed for protein concentration (Bio-Rad protein assay kit, Hercules, CA, USA). A total of 55 μg of each sample was digested overnight with 5.5 μg trypsin at 37°C. Each digest was desalted on a Strata-X 33 μm polymeric reverse phase column (Phenomenex, Torrance, CA, USA) and dried. The entire experiment was performed in triplicate (including the generation of ground mycelia).
Strong Cation Exchange Chromatography
Dried peptides were dissolved in 70 μl of 2% acetonitrile and 0.05% trifluoroacetic acid (TFA) and separated by strong cation exchange chromatography on an Agilent 1100 HPLC system (Agilent Technologies, Palo Alto, CA, USA) using a PolySulfoethyl column (4.6 × 100 mm, 5 μm, 300 Å, Nest Group, Southborough, MA, USA). Peptides were eluted with a linear gradient of Buffer B (1 M KCl, 10% acetonitrile and 10 mM KH2PO4, pH 3). A total of 37 fractions were collected, pooled into 8 fractions, desalted, dried and resuspended in 20 μl of 2% acetonitrile and 0.05% TFA.
Reverse Phase Nano LC MALDI-MS/MS
Peptides were separated on a C18 PepMap100, 3 μm column (LC Packings, Sunnyvale, CA, USA) with a gradient of acetonitrile in 0.1% formic acid using the Ultimate 3000 nano HPLC system (LC Packings-Dionex, Sunnyvale, CA, USA). The eluent was mixed with matrix solution (5 mg/ml α-cyano-4-hydroxycinnamic acid) and spotted onto a 384 well Opti-TOF plate (Applied Biosystems, Framingham, MA, USA) using a Probot Micro Fraction Collector (LC Packings, San Francisco, CA, USA).
Peptides were analysed on a 4800 MALDI-TOF/TOF mass spectrometer (Applied Biosystems, Framingham, MA, USA) operated in reflector positive mode. MS data were acquired over a mass range of 800-4000 m/z and a total of 400 shots were accumulated for each spectrum. A job-wide interpretation method selected the 20 most intense precursor ions with a signal/noise ratio above 20 from each spectrum for MS/MS acquisition, but only in the spot where their intensity was at its peak. MS/MS spectra were acquired with 4000 laser shots per selected ion over a mass range from 60 m/z to 20 m/z below the precursor ion.
Data Analysis
Mass spectral data from all three biological replicates were combined and analysed using the Mascot sequence matching software (Matrix Science, Boston, USA) with the support of the facilities at the Australian Proteomics Computational Facility (Victoria, Australia). Search parameters were: Enzyme, Trypsin; Max missed cleavages, 1; Fixed modifications, iTRAQ4plex (K), iTRAQ4plex(N-term), Methylthio(C); Variable modifications, Oxidation(M); Peptide tol, 0.6 Da; MS/MS tol, 0.6 Da. The MOWSE algorithm (MudPIT scoring) of Mascot was used to score the significance of peptide/protein matches, with p < 0.05 required for each protein identification. Four protein datasets were constructed for proteogenomic screening: the combination of version 1 and 2 proteins as defined from annotation of the SN15 genome sequence [12]; a between-stop-codon 6-frame translation of the S. nodorum genome assembly; 6-frame translated, CAP3-generated [15] contigs of unassembled reads of the S. nodorum assembly; and 6-frame translated singleton unassembled reads which did not assemble into contigs via CAP3. All 6-frame open reading frames (ORFs) were subject to a 10 amino acid minimum length threshold.
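The between-stop-codon 6-frame translation with a 10 amino acid minimum length can be illustrated with a minimal Python sketch. This is not the pipeline used in this study; the function names (reverse_complement, translate, six_frame_orfs) are hypothetical, the standard genetic code is assumed, and segments at the ends of a sequence are kept even though a stop codon bounds them on one side only.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (an assumption of this sketch).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons; unknown codons (e.g. containing N) become X."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_orfs(dna, min_len=10):
    """Return (strand, frame, peptide) for every stop-free segment >= min_len."""
    dna = dna.upper()
    orfs = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for frame in range(3):
            protein = translate(seq[frame:])
            # Splitting on '*' yields the segments lying between stop codons.
            for segment in protein.split("*"):
                if len(segment) >= min_len:
                    orfs.append((strand, frame, segment))
    return orfs
```

A call such as six_frame_orfs(scaffold_sequence) returns the candidate ORFs for one sequence; tracking genomic coordinates for each segment, which is needed to map peptides back to the assembly, is omitted here for brevity.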
For the purpose of false discovery rate (FDR) calculation, randomised sequences from the version 1 and 2 proteins and the 6-frame translated assembly protein datasets were generated as Mascot decoy databases [16] (as detailed at http://www.matrixscience.com/help/decoy_help.html).
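As an illustration of how a decoy search yields an FDR estimate, the sketch below uses one common estimator: the number of decoy matches at or above a score threshold divided by the number of target matches at or above the same threshold. The function name decoy_fdr and the score lists are hypothetical, and the exact calculation reported by Mascot may differ in detail.

```python
def decoy_fdr(target_scores, decoy_scores, threshold):
    """Estimate FDR as decoy matches / target matches at or above a threshold.

    Assumes separate target and decoy searches of equal size, so that the
    decoy match count approximates the false positives among target matches.
    """
    n_target = sum(score >= threshold for score in target_scores)
    n_decoy = sum(score >= threshold for score in decoy_scores)
    if n_target == 0:
        return 0.0  # no accepted matches, so nothing can be a false discovery
    return n_decoy / n_target
```

For example, three target matches and one decoy match above the threshold give an estimated FDR of one third.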
Characterisation of peptide-supported genes
Peptide supported genes were analysed for abundance of assigned gene ontology (GO) terms [12]. Gene counts for GO terms were compared between peptide supported and unsupported genes via Fisher's exact test. A p-value threshold of 0.05 was imposed to determine significance. Gene counts for SignalP [17] and WolfPsort [18] cellular location predictions and relative molecular mass predictions were also compared by this method.
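The comparison of gene counts can be sketched as a two-sided Fisher's exact test on a 2 × 2 table (genes with or without a given GO term, split by peptide support). The implementation below is a minimal standard-library version written for illustration; the function name and the example counts are hypothetical, and the study does not state which software was used for this test.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]].

    Example layout (an assumption of this sketch):
      a = supported genes with the term,   b = supported genes without it
      c = unsupported genes with the term, d = unsupported genes without it
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)
```

A GO term would then be called significant when fisher_exact_two_sided(a, b, c, d) < 0.05, matching the threshold stated above.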
De novo proteogenomics
MudPIT-filtered peptide matches to the 6-frame translated assembly were mapped back to their genomic location. Peptides mapping in the same orientation, with either overlapping genomic coordinates or within 200 bp of each other, were combined into groups referred to herein as peptide clusters. The purpose of peptide cluster formation was merely to reduce the redundancy in the peptide data and so aid the interpretation of subsequent comparisons with annotated gene features; clusters containing a single peptide were therefore retained. Individual peptides and peptide clusters were compared for overlap and proximity within 200 bp to S. nodorum version 1 and 2 genes.
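The clustering rule can be sketched as a single pass over peptides sorted by position. This is an illustrative reading of the rule, not the code used in the study: the Peptide record and cluster_peptides function are hypothetical, and "within 200 bp" is interpreted as a gap of at most 200 bp between the end of the growing cluster and the start of the next peptide.

```python
from typing import NamedTuple


class Peptide(NamedTuple):
    scaffold: str
    strand: str  # "+" or "-"
    start: int
    end: int


def cluster_peptides(peptides, max_gap=200):
    """Merge same-strand peptides that overlap or lie within max_gap bp."""
    clusters = []
    ordered = sorted(peptides, key=lambda p: (p.scaffold, p.strand, p.start, p.end))
    for pep in ordered:
        last = clusters[-1] if clusters else None
        if (
            last is not None
            and last["scaffold"] == pep.scaffold
            and last["strand"] == pep.strand
            and pep.start - last["end"] <= max_gap
        ):
            # Overlapping or nearby: extend the current cluster.
            last["end"] = max(last["end"], pep.end)
            last["peptides"].append(pep)
        else:
            # Start a new cluster; single-peptide clusters are retained.
            clusters.append(
                {
                    "scaffold": pep.scaffold,
                    "strand": pep.strand,
                    "start": pep.start,
                    "end": pep.end,
                    "peptides": [pep],
                }
            )
    return clusters
```

The same interval comparison, applied between cluster coordinates and gene coordinates, would serve for the overlap and 200 bp proximity check against annotated genes.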
Potential homologs to S. nodorum SN15 genes were detected by tblastn comparison of the genome assembly with the proteomes of the dothideomycete fungi Leptosphaeria maculans, Pyrenophora tritici-repentis, Cochliobolus heterostrophus, Alternaria brassicicola, Mycosphaerella graminicola and Mycosphaerella fijiensis. Tblastn high-scoring pairs (HSPs) were grouped according to hit, subject to additional criteria: best HSP e-value < 1e-10; individual HSP e-values < 1e-5; and HSPs mapped on the S. nodorum genome no further than 2 kb apart, with more distant HSPs split into sub-groups each subject to the previous criteria. Grouped HSP sequence coordinates were compared to both peptide cluster and annotated gene coordinates on the S. nodorum genome assembly for overlap (Figure 1). By this method we detected peptide clusters which could be linked back to a nearby gene model through a shared homolog, as well as peptide clusters representing potential new gene annotations with homology support.
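The HSP grouping criteria can be sketched as a filter, a positional split and a second filter. The sketch below is one possible reading of the criteria, not the study's own code: the HSP record and group_hsps function are hypothetical, "grouped according to hit" is taken to mean grouping by query protein and genome scaffold, and strand is ignored for simplicity.

```python
from typing import NamedTuple


class HSP(NamedTuple):
    protein: str   # query protein from another dothideomycete proteome
    scaffold: str  # S. nodorum scaffold carrying the match
    start: int
    end: int
    evalue: float


def group_hsps(hsps, max_gap=2000, hsp_evalue=1e-5, best_evalue=1e-10):
    """Group HSPs per protein/scaffold, splitting where the gap exceeds max_gap."""
    # Criterion: every individual HSP must have an e-value below 1e-5.
    kept = [h for h in hsps if h.evalue < hsp_evalue]
    kept.sort(key=lambda h: (h.protein, h.scaffold, h.start))

    groups, current = [], []
    for h in kept:
        same_hit = bool(current) and (h.protein, h.scaffold) == (
            current[-1].protein,
            current[-1].scaffold,
        )
        if same_hit and h.start - max(x.end for x in current) <= max_gap:
            current.append(h)
        else:
            # Different hit or more than max_gap apart: start a sub-group.
            if current:
                groups.append(current)
            current = [h]
    if current:
        groups.append(current)

    # Criterion: the best HSP in each (sub-)group must have an e-value below 1e-10.
    return [g for g in groups if min(h.evalue for h in g) < best_evalue]
```

Each returned group spans the interval from its smallest start to its largest end, which can then be compared for overlap with peptide cluster and annotated gene coordinates.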