A software application for comparing large numbers of high resolution MALDI-FTICR MS spectra demonstrated by searching candidate biomarkers for glioma blood vessel formation

Background A Java™ application is presented, which compares large numbers (n > 100) of raw FTICR mass spectra from patients and controls. Two peptide profile matrices can be produced simultaneously, one with occurrences of peptide masses in samples and another with the intensity of common peak masses in all the measured samples, using the peak- and background intensities of the raw data. In latter way, more significantly differentially expressed peptides are found between groups than just using the presence or absence in samples of common peak masses. The software application is tested by searching angiogenesis related proteins in glioma by comparing laser capture micro dissected- and enzymatic by trypsin digested tissue sections. Results By hierarchical clustering of the presence-absence matrix, it appears that proteins, such as hemoglobin alpha and delta subunit, fibrinogen beta and gamma chain precursor, tubulin specific chaperone A, epidermal fatty acid binding protein, neutrophil gelatinase-associated lipocalin precursor, peptidyl tRNA hydrolase 2 mitochondrial precursor, placenta specific growth hormone, and zinc finger CCHC domain containing protein 13 are significantly different expressed in glioma vessels. The up-regulated proteins in the glioma vessels with respect to the normal vessels determined by the Wilcoxon-Mann-Whitney test on the intensity matrix are vimentin, glial fibrillary acidic protein, serum albumin precursor, annexin A5, alpha cardiac and beta actin, type I cytoskeletal 10 keratin, calcium binding protein p22, and desmin. Peptide masses of calcium binding protein p22, Cdc42 effector protein 3, fibronectin precursor, and myosin-9 are exclusively present in glioma vessels. Some peptide fragments of non-muscular myosin-9 at the C-terminus are strongly up-regulated in the glioma vessels with respect to the normal vessels. Conclusion The less rigorous than in general used commercial propriety software de-isotope algorithm results in more mono-isotopic peptide masses and consequently more proteins. Centroiding of peptide masses takes place by taking the average over more spectra in the profile matrix. Cytoskeleton proteins and proteins involved in the calcium signaling pathway seem to be most up-regulated in glioma vessels. The finding that peptides at the C-terminus of myosin-9 are up-regulated could be ascribed to splicing or fragmentation by proteases.


Background
Gliomas are the most common primary brain tumors, which resort among the neoplasms with the highest degree of blood vessel formation. The identification of new angiogenesis-related proteins is important for the development of anti-angiogenesis therapy. In a previous study [1], angiogenesis related proteins were identified in micro-dissected glioma vessels using Matrix Assisted Laser Desorption Ionization Fourier Transform Ion Cyclotron Resonance (MALDI-FTICR) Mass Spectrometry (MS). In brief, four different micro-dissected tissue groups were compared, namely, 1) samples from glioma blood vessels, 2) samples from glioma tissue surrounding these glioma vessels, 3) samples blood vessels in normal brain, and 4) samples from tissue that surrounded the normal vessels. The study resulted in the discovery of enzymatic by trypsin digested peptide fragments of proteins, which are exclusively present in samples of the glioma blood vessels using MS. The presence of two of these proteins was confirmed with specific antibodies on various tissue sections including glioma. The combination of techniques led to the discovery of validated biomarkers for glioma vessels, respectively, fibronectin precursor with SwissProt™ accession code P02751, collagen-binding protein 2, also named colligin-2 with SwissProt™ accession code P50454, and one candidate marker acidic calponin-3 with Swiss-Prot™ accession code Q15417. The presence of fibronectin precursor and colligin-2 was confirmed by staining of tissue sections with specific commercial antibodies. Staining of fibronectin and colligin-2 by immunohistochemistry in glioma vessels is shown in Figure 3 and Figure 4 of ref.
[1]. However, it is desirable to find more than these 3 proteins as candidate biomarkers. Ideally, the finding of more specific proteins will help to elucidate protein pathways that function in angiogenesis. Sophisticated bioinformatics may retrieve more information from present MS data. In a previous manuscript, a database application is presented, which enables to compare hundreds of MALDI-TOF MS spectra from patient and control groups [2]. In this study, the application, written in Java™ [3], is adapted to handle hundreds of mass spectra, which are measured on a FTICR MS instrument. This required complete rewriting of some parts of the source code, avoiding computer memory and performance issues dealing with high resolution FTICR MS spectra. The new architecture can also compare intensities of spectra as we address later. This was not possible in the previous version. The high resolution of MALDI FTICR spectra with respect to MALDI-TOF spectra enables de-isotoping, which we implemented in current version. In addition, the application can handle LC MALDI MS peak lists, which de-convolutes peptide masses with extra dimension elution time. Table 1 gives an overview of all the instruments and file types of exported mass spectra that are accessible for the application. The files types may consist of raw binary MS spectra or exported peak lists containing masses and corresponding intensities or XML files. The use of peak lists or XML files with respect to the use of raw binary data (the fid files) has the disadvantage of the so called "missing data" problem, which is demonstrated in Table 2a. When comparing peak lists, it appears that the signal intensity from a specific peak in a sample is not matched with intensities of peptide masses from other samples. This problem is tackled by using the raw binary data generated from the mass spectrometer instead of peak lists. The database application is adapted to handle raw binary fid (free induction decay) files. The data in these files is processed by Fast Fourier Transformation (FFT) into a frequency signal. By this conversion the mass can be calculated from the frequency, using the calibration constants in the acqus (acquisition status) files. The average signal intensity of noise, the baseline, is calculated according to a method developed by Horn et al. [4]. Figure 1 shows a fragment of an MALDI-FTICR MS spectrum as generated from the raw binary fid file. The baseline is the bottom horizontal line in Figure 1. The scatter of noise intensities around the baseline as seen in Figure 1 is expressed in a variable noise N. Real peaks are expected to display a signal, S, with an intensity above the sum of 2 intensities, namely 1) the baseline intensity, and 2) the factor*N. The combined intensity is denoted as signal to noise, S/N, threshold. The signal to noise threshold is the upper horizontal line in Figure 1. The peaks at masses 1808.9034 and 1809.9062 Da have intensities above the S/N threshold. The masses 1808.9034 and 1809.9062 Da are added in the peptide profile matrix with number of occurrences of a peptide mass for different samples [2]. Average masses over all spectra are calculated from these masses in the different samples with intensities above the S/N threshold. When working with raw data creating the intensity matrix, it is possible to register the background noise signals by recording all the peaks, such as approximately at 1810.9 and 1811.9 Da in Figure 1 with intensities between the baseline and the S/N threshold in a separate "noise peaks" file. These masses are not added to common peak list of the peptide profile matrix. Solely the background intensities > 0 but smaller than the S/N threshold of these "noise peaks" file are used for calculations. All signals below the baseline are neglected. This approach results in a more reliable and complete comparison of the intensities of peak masses in different groups (Table 2b). For each mass of the common peak list a Wilcoxon-Mann-Whitney rank sum test can be performed comparing the intensities of this mass in the samples of two groups. The p-value of the Wilcoxon-Mann-Whitney test ranks peak masses, that have for instance intensities that are strong up-regulated in the glioma vessel group compared to the normal control vessel group, and vice versa. In this way, we can find more significantly differentially expressed peptides, than just using the presence-absence matrix of peak masses. In a presence-absence matrix the presence of a specific mass in a sample, part of a common peak list of the masses in all samples, is represented by 1, and the absence is represented by 0. The process of generating the binary matrix is explained in [2] with that difference that in this FTMS study no replicate spectra of samples are measured, resulting in counted values of just 1 and 0. The peak lists of the samples are processed from raw data with high sensitivity at a relative low S/N threshold of 4. It is well known that MALDI TOF MS analysis suffers from limited reproducibility in term of peak intensities. The peak intensities measured with the FTICR MS, however, vary about 10% in replicate spectra of a single sample. The peak intensities are not normalized, because experimental conditions are kept constant, e.g. number of laser shots. A simple, but effective de-isotope algorithm is introduced, which is able to remove most of the isotopic masses from the raw data. The resulting peak list of one sample is compared with that obtained with commercially DataAnalysis™ software (Bruker Daltonics, Germany), which uses the SNAP™ deisotope algorithm. The tryptic protonated peptide frag-ments of proteins are compared with those theoretically calculated from the proteins in the SwissProt™ database, using the MASCOT™ search engine [5] and MS-MS sequencing of MALDI-TOF MS data of Liquid Chromatography (LC) fractionated samples using the WARP-LC™ software (Bruker Daltonics, Germany) as described in [1].

Mass accuracy
In the peak picking algorithm the data point are used with the local highest intensity without an m/z centroiding. In Equation (2) to (5) the frequency and the mass difference between the data points is given to describe the inaccuracy due to this simple peak detection. We do not consider mass accuracy due to peak broadening by space charging.
In our experiments, we accumulate the ions of 10 laser shots in the storage hexapole for each scan, and a total of 100 scans is used for each mass spectrum. Storing the ions in the hexapole prior to ICR analysis prevents overloading of the ICR cell. The mass accuracy in ppm of peak maxima in FTICR spectra can theoretically be calculated for the A fragment of a MALDI-FTICR MS spectrum as generated from the raw binary fid file FTICR MS device used as described in any advanced MS textbook [6]. Recapitulated, the dimensionless value of is inversely proportional to the frequency f in Hz of the FTICR signal for a particle with charge z*e (C) and mass m*u (kg).
where the device specific calibration constant in the acqus file ML1 ≈ 1.443*10 8 Hz is used for calculating peak masses from frequencies. A derivative of above equation originates in the following equation [6]: The frequency resolution Δf is: where the frequency sweep SW_h is 1.818*10 5 Hz; the number of data points TD = 524288 and a chosen factor zerofilling is 4. Equation (3) shows that zero filling increases the mass accuracy [7]. Applying these values results in: The average distance between the real maximum and the raw spectrum data point is the distance between the data points. Therefore the absolute value of mass accuracy for a 9.4 T ICR magnet is: For a dimensionless value of for instance 1000 equal to Daltons for single charged MALDI masses, a mass accuracy of 0.6 ppm can be calculated and for a dimensionless value of 2500, a mass accuracy of 1.5 ppm can be the-oretically be expected. The average value of mass accuracy for a between a 1000 and 2500 is 1.05 ± 0.27 ppm.
Given these values, corresponding peptide masses of proteins where searched in the SwissProt™ database with MASCOT™ using a mass tolerance of peptides of no more than 2 ppm.

De-isotope algorithm
A simple, but effective, de-isotope algorithm is implemented in the database application based on the methodology described earlier [8]. The de-isotope algorithm starts with the determination of a number of isotopic clusters of peaks above the signal to noise threshold (Figure 1), denoted as C, in the mass spectrum, where the difference in peptide mass between the mono-isotope and first isotope in each cluster, j, is , and the difference in mass of the first and second isotope, , and so on. The theoretical isotopic difference reported for the first isotope in Figure 1a of ref. [4] is about 1.00289 Da. The relative atomic abundance in the theoretical amino acid "averagine" is 31.71% C, 49.82% H, 8.72% N, 9.49% O, and 0.27% S [9], while the abundance of the first isotope C 13 is 1.11% with an isotopic difference mass C 13 -C 12 [4] calculate the isotopic distances also for higher isotopes with an average value of 1.00235 Da. When applying an input isotopic distance of 1.00235 Da [4] in the application, the average isotopic distance in the best clusters as will be described shortly in 38 samples is 1.00327 Da. Clearly the isotopic distance in the best clusters is predominantly influenced by the C 13 isotopic difference of 1.0034 Da, an experimental value which we most frequently observe in our spectra. This might be due to ion statistics, because with low concentrations of sample, average values are not measured. The value 1.0034 Da is used as a default setting in the application. The intensity of the mono-isotope peak in the mass spectrum may be larger than the intensity of the first isotope in the low mass area m/z < 1800, or the intensity of the mono-isotope peak may be smaller than the first, second, or higher isotope peak when m/z > 1800 [11,12]. However, in both cases the intensity of subsequent isotopes in a cluster never increases, once it has decreased (at least not with (1) non-overlapping isotopic clusters). Furthermore, a monoisotope peak in this algorithm does not have an accompanying isotopic peak with an approximately 1.0034 Da smaller mass. These considerations are taken into account when determining the peptide masses that belong to true isotopic clusters. Within each isotopic cluster all peptide masses are gathered which have an m/z distance with the previous isotope of 1.0034 ± 0.0100, thus with an initial relative high mass tolerance of 1%. The average isotopic distance of each cluster j is calculated according to: Where N j is the number of peptide masses in an isotopic cluster, j, including the mono-isotope, . From the total amount of clusters C, a number of clusters are selected; C s (≤ C) that shows the smallest intra isotopic distance variance, for example less than 0.1%. For each selected cluster j: In this case the variable variance_isotopic_distance = 0.1. If at least 10 good clusters are found, C s ≥ 10, a new average isotopic distance, μ s , is calculated from the mean isotopic distance, μ j , of the selected clusters, j.  text *.txt mass, intensity and Retention time peak list, space or tab separated data xml *.xml <Compound mass="mass1" RetentionTime="retentiontime1" AbsoluteIntensity="intensity1"> </Compound> The database application is written in Java and R and uses a MySQL™ database. The files types may consist of 1) raw binary MS spectra (the fid files), 2) exported peak lists containing tab or space separated masses and corresponding intensities, and 3) XML files. The mean isotopic distance μ s of the selected clusters has a deviation of: True isotopic clusters must have a mean isotopic distance, μ j , between μ s -σ s and μ s + σ s . Figure 2 shows the distribution of the mean isotopic distance, μ j of potential isotopic clusters of sample H2 as a function of the mono-isotopic masses. The dashed line in Figure 2 represents the acceptance window of mean isotopic distances real clusters, μ sσ s and μ s + σ s. As shown in Figure 2, the larger mass accuracy at larger peptide masses is reflected by the larger scatter of isotopic distances μ j . All isotopic masses of the clusters, that have a mean isotopic distance within the acceptance window, μ s -σ s and μ s + σ s , are removed from the peak lists. If the mean isotopic distance value of a potential isotopic cluster in not in the acceptance window, isotopic masses are peeled from this cluster from high to low mass until the mean isotopic mass is within the acceptance window, the remaining isotopes are removed from the peak list. As an example, the masses 1300.529351, 1301.531681, and 1302.525368 Da are detected as a potential isotopic cluster. However, the mean isotopic distance is too low with 0.9980 Da, due to the low distance between the first and second isotope of 0.9937 Da. The peak at mass 1302.525368 Da appears not an isotopic signal after visual inspection of the mass spectrum. Consequently, only the isotopic mass 1301.531681 Da with distance 1.0023 Da from the mono-isotope is removed from the peak list.

Tissue Sections
Identical raw MS fid files of 40 micro-dissected tissue sections were used as described in [1]. The 10 spectra of glioma blood vessels were coded H1 to H10, 10 spectra of tissue surrounding the glioma vessels were coded TH1 to TH10, 10 spectra of normal vessels were coded S1 to S10, and 10 spectra of tissue surrounding the normal vessels were coded TS1 to TS10.

MS-MS sequencing of peptides in tissue sections
Proteins were identified by the MS-MS sequencing as described in [1]. In brief, a number of 4 pooled samples were subjected to nano-LC fractionation. First, 8 sections comprising tissue and vessels of sample TH8 were combined (of which 10% was estimated to be vessels). Secondly, for comparison 8 sections of the normal brain sample TS5 were combined in exactly the same way. In addition, micro-dissected glioma samples of series H1 to H10 were combined, resulting in one pooled glioma blood vessel sample. Finally, the 10 samples of normal vessels in series S1 to S10 were pooled according to the same procedure. With time intervals of 15 s, fractions of the samples were spotted automatically upon a 384 prespotted anchor-chip plate. The plates were measured by an automated Ultraflex™ MALDI TOF-TOF instrument (Bruker Daltonics, Germany), using WARLP-LC™ software. The WARLP-LC™ software interprets MS spectra of each individual spot and subsequently performs MS-MS on each peptide peak mass. The best peak masses for performing the MS-MS sequencing were determined automatically by the WARLP-LC™ software. The BTDX.xml export files, containing the MS and the MS-MS peak masses, were imported in Biotools™ software version 3.0 (build 1.68) (Bruker Daltonics, Germany) and submitted by this software application to the SwissProt™ version 40.21 database, using the MASCOT™ search engine. A 150 ppm parent mass tolerance, 0.5 Da fragments tolerance, and one possible missed trypsin cleavage site was allowed.
Distribution of the mean isotopic distances μ j of real isotopic clusters as a function of the mono-isotopic peptide mass in the glioma vessels sample H2 Figure 2 Distribution of the mean isotopic distances μ j of real isotopic clusters as a function of the mono-isotopic peptide mass in the glioma vessels sample H2. The dashed lines represent the acceptance window, between μ sσ s and μ s + σ s of mean isotopic distances of real isotopic clusters. The larger mass accuracy with larger peptide masses is reflected by the larger scatter of isotopic distances μ j

MALDI-FTICR MS measurements
The samples of H1 toH10, S1 to S10, TH1 to TH10, and TS1 to TS10 were enzymatic digested by trypsin, mixed with 2,5-dihydroxybenzoic acid (DHB) solution (1 mg/ ml H 2 O), spotted upon a 600/384 anchor-chip plate (Bruker Daltonics, Germany), and measured by an type APEX-Q™ FTICR MS instrument with a 9.4 T magnet (Bruker Daltonics, Germany). The details of this procedure is described in [1]. The mono-isotopic peak list of glioma sample H7 obtained with the new de-isotope algorithm of the database application was compared with that obtained by the SNAP™ algorithm using DataAnaly-sis™ version 3.4 (Build 169) software using an S/N > 1.7.
The SNAP™ de-isotope algorithm is performed with the following parameters; the instrument type is set default (Fourier transform), the quality factor threshold 0, the S/ N threshold 1.7, the relative intensity threshold (base peak) 0.01%, the absolute intensity threshold 0, the maximum charge state 4, the repetitive building block of C 4.9384 N 1.3577 O 1.4773 S 0.0417 H 7.7583 (the theoretical "averagine" amino acid composition [13]), the additional constant unit "empty", the algorithm version 2.0, an include component isotope pattern checkbox "not checked", the filter exclusion masses checkbox "not checked", the use peak finder to calculate peak position checkbox "checked".

Data analysis
The Java™ software package described in [2] was adapted to handle raw FTICR MS data and was used to create a profile matrix of all peptide masses present in the 40 samples from the MALDI-FTICR mass spectra. The application was adapted to annotate peptide peak masses from raw FTICR MS fid files (Bruker Daltonics, Germany), which had intensities above an S/N threshold of 4. The search window for peaks was 20 ppm in both directions, which could be varied in the Graphical User Interface (GUI) of the application [2]. A window of 20 ppm was chosen, because of the experimental constraints of the simple peak picking (no interpolation between data points or curve fittings). The peak picking algorithm searches for a local maximum in this window in the MALDI-FTICR MS spectrum. A too small window of 3 ppm generates too large peak lists and causes performance problems. The software package was written in Java™ and R [14] and used a MySQL™ database [15]. It required special libraries to be installed, namely edtftpj-1.4.8.jar, mysql-connector-java-3.1.6-bin.jar and serializer.jar [15][16][17].

Apodization Function
The following apodization function F k was multiplied with the raw FTICR time signal of each data point, k, before applying the FFT: Where the number of data points TD = 524288 (with no factor for zero filling) and 1 ≤ k ≤ TD. LB and GB represent the Lorentzian and Gaussian broadening factors. The spectra were processed with a LB of 0 and a GB of 0.3276932. The values of these factors are default settings of the application GUI.

Internal calibration
All mass spectra of the samples were internally calibrated on ubiquitous cytoplasmic 1 beta actin single charged peptide masses of 1198.70545, 1515.74913, 1790.89186, 2215.06990 and 3183.61423 Da (see additional file 1). The protein cytoplasmic 1 beta actin has the SwissProt™ accession code P60709. The internal calibration required that measured actin masses had to be within 30 ppm distance of the calibration masses with allowance of one missing actin mass, resulting in at least 4 calibration points for each calibrated spectrum. A shift of 30 ppm was chosen since this value is somewhat smaller than the broadness of peaks at the baseline in the FTICR MS spectra with our settings and larger than the expected shift of mass values. A too small value < 5 ppm would result in a to small amount calibration masses and result in excluding the spectrum. A too large value > 30 ppm would result in the usage of peptide masses from wrong proteins other than actin. The shift by internal calibration is smaller than 5 ppm. The value of 30 ppm can be changed in the GUI of the application. An insufficient number of calibrate masses were found with the "normal vessel" sample S5 and "tissue surrounding the normal vessels" sample TS5 and they were not used for further analysis. If more than three calibration masses are found within 30 ppm of the measured peak masses, the following quadratic equation is applied to calculate the constants a [0], a[1], and a [2] from the observed frequencies of the peak masses, using R's linear model lm [14].
All peak masses were subsequently recalculated from the observed frequencies using this formula.

Peptide profiles
Two different peptide profile matrices were produced simultaneously, one with the presence or absence of common peptide masses in spectra of different samples, and a matrix with the intensity of the peak mass if present in the  (11) sample or the intensity of the background signal if absent [2]. To avoid including too many noise peaks in the matrix, each peptide mass had to be present 10% of the spectra (= 4

Mass accuracy
The experimental mass accuracy was determined from the mean value of the absolute differences between the measured and theoretical masses of tryptic peptide fragments of 2 proteins; 1) GFAP with entry name GFAP_HUMAN and SwissProt™ accession number P14136, and 2) type I cytoskeletal 10 keratin, with entry name K1C10_HUMAN and SwissProt™ accession number P13645. The protein K1C10_HUMAN was determined by the WARP-LC™ software to be present in 2 pooled samples; 1) glioma vessels, and 2) normal vessels (see Additional file 1). The protein GFAP_HUMAN was determined by the WARP-LC™ software to be present in all the 4 pooled samples; 1) glioma vessels 2) glioma tissue and vessels, 3) normal vessels, and 4) normal tissue and vessels (see Additional file 1). Keratin and Glial Fibrillary Acidic Protein were selected, since the tryptic peptide masses from these proteins appeared to be present in all the MALDI-FTICR MS spectra. From in total about 60 peak masses, the mass accuracy was determined from 20 peak masses that had the highest mean peak intensities, 20 peak masses with middle mean peak intensities, and 20 peak masses with the lowest mean peak intensities. The mean peak intensity of each mass was calculated from peak and background intensities from the present signals in the intensity matrix.

MS-MS peptide sequencing of tissue sections
All MS-MS sequencing analysis results of 1) the glioma vessels samples, 2) glioma tissue and vessels samples, 3) normal vessels samples, and 4) normal tissue and vessels samples generated by the WARP-LC™ software are presented in the additional material (Additional file 1).

Peak finding and de-isotope algorithm
The peak finding algorithm in our database application results in a peak list of 2026 peptide peak masses with a S/ N threshold of 4 for sample H7 (Additional file 3). The DataAnalysis™ 3.4 (Build 169) software gives approximately the same number of 2064 peak masses (≈ 2026) at a S/N threshold of 1.7 (Additional file 4). The definition of S/N threshold for peak finding is different in both software applications. The overlap of both peak lists is about 90% with 1834 peptide masses. A large decrease with 82% to 379 peak masses is assigned when the SNAP™ de-isotope algorithm is applied on the peak list exported from the DataAnalysis™ software (Additional file 5). A more modest decrease with 28% to 1451 peak masses is seen with the de-isotope algorithm described in this publication (Additional file 6). A number of about 250 isotopic clusters fall within the acceptance window ( Figure 2), with an average of 3 isotopes (including the mono-isotope). When about 500 isotopes are remove from the peak list (≈ 2026 -1451), it appear that the main amount of the 1451 peaks, namely ± 1200 (≈ 1451 -250) are single peaks with intensities just above the signal to noise threshold. A percentage of 83% (314 peak masses) of the DataAnalysis™ mono-isotopic peak masses appears to be present in the mono-isotopic list of the database application. At a higher S/N ratio of 4 instead of 1.7, using the DataAnalysis™ software, the number of total annotated peak masses gradually drops to 654, while the number of mono-isotopic peaks established with the SNAP™ algorithm yields 283 peak masses. This is a smaller decrease of 57% than the 82% measured at a low signal to noise threshold of 1.7. It indicates that the mono-isotopic peaks that are detected with the SNAP™ algorithm are in the high intensity peptide clusters.

Mass accuracy
In Table 3, the measured mass accuracy of an exported peak list of a single sample H7 from the database application is about 1.10 ± 0.91 ppm. This mass accuracy is about equal to the theoretical value of 1.05 ± 0.27 ppm. The masses in the peak list of the same sample exported with the DataAnalysis™ (Bruker Daltonics, Germany) software are more accurate with a low value of 0.81 ± 0.59 ppm. The higher accuracy is probably due to another internal calibration formula and a sophisticated propriety centroiding algorithm. When taking into account the average of the peptide masses in all the samples, the measured mass accuracy in the database application from raw data decreases from 1.10 ± 0.91 to 1.03 ± 0.72 and further to 0.67 ± 0.48 ppm when determined on peptide masses, which appear to be presents in at least 5 or more and 18 or more samples in the peptide matrix of in total 38 samples, respectively. The peak masses with the highest intensities do not necessary have a higher mass accuracy, namely 0.91 ± 0.58 ppm, than peak masses with middle mean peak intensities, 0.66 ± 0.39 ppm. Peak masses with the lowest mean peak intensities have the worst mass accuracy, 1.54 ± 0.82 ppm, but are also calculated with the smallest number of 254 peak and background signals. Figure 3 shows the hierarchical clustering on the presenceabsence peptide mass profile matrix of 38 of the in total 40 samples in a heat-map. Peptide masses present in glioma vessels clustering visually together in the red heated area in the marked blue box. This helps selecting the peptide masses to search for proteins in the SwissProt™ database using the MASCOT™ search engine. The clustering table is added as the Additional file 7. Two samples, S5 and TS5, were excluded because they could not be calibrated. Sample S10 appears to be an outlier in Figure 3, since it is clustered between samples H6 and H7 and other glioma samples. Peptide masses clustering to glioma vessels, illustrated in the highlighted box of Figure 3 are submitted to the SwissProt™ database (version 40.21, entry homo sapiens) using the MASCOT™ search engine to identify the proteins differentially expressed in the "glioma vessels" samples. A peptide mass error tolerance of ± 2 ppm is used with the MASCOT™ search. The highlighted box in Figure 3 represents the hierarchical clustering order 490 with mass 1037.5355 Da to 789 with mass 1665.7891 Da (see Additional file 7). Only proteins that have at least two matched values of experimental and calculated peptide masses are taken into consideration. One of the top 20 proteins reported by MASCOT™ is hemoglobin alpha subunit with SwissProt™ accession code P69905 with 2 matched peptide masses. Hemoglobin alpha subunit is identified by the MS-MS sequencing options of the WARP-LC™ software in the glioma vessels (Additional file 1). Hemoglobin delta subunit with Swiss-Prot™ accession code P02042 is also found is with 2 matched peptide masses. It is identified by the MS-MS  The data of the clustering table is added as an additional file 7. A number of 10 spectra of glioma blood vessels with codes H1 to H10, 10 spectra of tissue surrounding the glioma vessels with codes TH1 to TH10, 10 spectra of normal vessels with codes S1 to S10, and 10 spectra of tissue surrounding the normal vessels with codes TS1 to TS10 are included. Two "normal vessels" samples, S5 and TS5, were excluded because they could not be calibrated. The highlighted box in Figure 3

Wilcoxon-Mann-Whitney
The list of p-values based on the peak intensity differences between the normal and glioma vessel sample group for 2375 peptide masses is presented in the Additional file 2.
A number of 95 (4%) in glioma vessels up-regulated peptide masses with p-values < 0.01 are used in MASCOT™ to search for calculated peptides masses of proteins. A mass tolerance of 2 ppm is used. The MASCOT™ search is repeated with another number of 442 (19%) in glioma vessels up-regulated peptide masses with p-values < 0.1. Table 4 summarizes the MASCOT™ search and MS-MS sequencing results of the differentially expressed proteins.
Only proteins are listed that have at least three experimental measured peptide masses that correspond with the calculated values of these proteins. In Table 4, Calciumbinding protein p22 is one of the proteins with 3 peptides Increased peak intensity at the mass MH + 1508.7103 Da from a peptide of probably Calcium-binding protein p22 in glioma ves-sel samples with a p-value < 0.1 that match with the SwissProt™ database. An increased intensity of a peak at peptide mass MH + of 1508.7103 Da of likely a peptide of Calciumbinding protein p22 is measured in glioma vessel samples represented by the dark lines in Figure 4, which shows a fraction of the MALDI-FTICR mass spectra of all samples. By contrast the peptide mass of 2',3'-cyclic-nucleotide 3'phosphodiesterase (CNPase) at 1508.8739 Da represented by the grey lines is as expected not present in glioma vessels. This protein is identified by MS-MS sequencing in normal tissue and vessels (see Additional file 1).

Peptide masses exclusively present in glioma vessels
The peptide masses which are exclusively present in the glioma vessels group (series H in Additional file 8) are used to search calculated masses of proteins using MAS-COT™ with a mass tolerance of 2 ppm. In Table 5, all proteins are listed that have 2 hits or more by a MASCOT™ search and one of these matched peptide masses has a Wilcoxon-Mann-Whitney p-value < 0.01. MASCOT™ lists proteins due with a score. However, peptides masses can be ascribed to more than one protein in this summary. Peptide masses are preferably ascribed to proteins present in the MS-MS runs (Additional file 1) than proteins with the highest score in the MASCOT™ summary. Calciumbinding protein p22 with SwissProt™ accession number Q99653 which was found with the Wilcoxon-Mann-Whitney test is also added to Table 5. From the proteins listed in Table 5, Fibronectin precursor and Myosin-9 are also identified with the MS-MS sequencing of the pooled samples (Additional file 1), while Cdc42 effector protein 3 and Calcium-binding protein p22 are not identified by MS-MS sequencing. Except for a small peak in the outlier sample S10, Collagen-binding protein 2 precursor (Colligin-2) with SwissProt™ accession code P50454 is exclusively measured in the glioma vessels group (series H) using the new algorithm. Collagen-binding protein 2 precursor (Colligin-2) is listed in a separate Table 6. The peak intensities of the glioma samples are just above 300*10 3 A.U. and the value of S10 just under this value (Additional file 2).

Myosin-9
From  1). Two of them are displayed in Figure 5 and 6. Figure 5 shows a fraction of the mass spectra of all samples, where the peak at peptide mass MH + of 1155.6643 Da of Myosin-9 displays an increased intensity in glioma samples represented by the dark lines, while the first isotopic mass of Neurofilament triplet L protein at 1154.7128 Da, represented by the grey lines, is as A number of 95 (4%) of the measured peptide masses with Wilcoxon-Mann-Whitney p-values < 0.01 and raised intensities in the glioma vessel group are compared with calculated peptide masses from proteins in the SwissProt™ database using MASCOT™. A mass tolerance of 2 ppm is used. The MASCOT™ search is repeated with another number of 442 (19%) of peptide masses that have a p-value < 0.1 and with up-regulated intensities in the "glioma vessels" group. Only proteins are listed that have at least three hits of experimental measured peptide masses corresponding within 2 ppm with the calculated peptide masses of these proteins. The listed proteins are found by a MASCOT™ search with 2 or more hits between measured and calculated tryptic peptide masses with an error tolerance of 2 ppm. Only proteins are listed that had at least one peptide mass with a Wilcoxon-Mann-Whitney p-value < 0.01 comparing the peak intensities of glioma vessels with normal vessels. expected not present in glioma vessels. Neurofilament triplet L is identified by MS-MS sequencing in the "normal tissue and vessels" sample (see Additional file 1). Figure 6 shows an equal distribution of peak intensities among all samples of a Myosin-9 peptide mass at 1193.6166 Da. The strong up-regulated peptides of Myosin-9 are all located at the C-terminus of the protein, approximately from amino-acid position 1301 to 1959 (Figure 7). Table 7 shows  vessels and normal vessels, respectively (Table 7). By contrast, the p-value of peptides of fibronectin precursor in Table 5

Candidate glioma vessel formation biomarkers
In this study, proteins of laser micro-dissected tissues are analyzed using an improved high sensitivity detection, thus lower S/N threshold in comparison with our previous experiments [1]. Fibronectin precursor is 'rediscovered'. The mass 1275.5568 Da previously ascribed to acidic calponin [1] is measured again, however it differs relative much with 2.5 ppm from the calculated mass of 1275.5600 Da in this analysis. It is 5 times measured in the glioma vessels, 2 times in the normal vessels (including the outlier sample S10 in hierarchical clustering) and not in the other tissues. The mass 1659.8009 Da of Colligin-2 is measured again ( Table 6) with 0.10 ppm mass accuracy. Cytoskeleton proteins and proteins involved in the calcium signaling pathway seem to be most up-regu-lated. Tubulin-specific chaperone A is likely to be detected by hierarchical clustering and reported to be biomarker for grade IV gliomas [19,20]. Annexin A5, detected with the Wilcoxon-Mann-Whitney test, is an ion channel protein with calcium-and phospholipid-binding properties. Calcium-binding protein p22 is a member of the calcium signaling pathway [21]. It is interesting that the mass of 1116.54323 Da was not identified in our previous study [1] matches with 0 ppm with a tryptic fragment of Calcium/calmodulin-dependent 3',5'-cyclic nucleotide phosphodiesterase 1C with SwissProt™ accession code Q14123, while the mass of 2157.1065 Da matches with 0.38 ppm with a tryptic fragment of Calcium/calmodulindependent protein kinase kinase 2 with SwissProt™ accession code Q96RR4. Both proteins belong to the same calcium signaling pathway [21]. The finding of only part of fragments of Myosin-9 up-regulated in the glioma vessel samples may have various reasons. It could be ascribed to splicing, fragmentation of Myosin-9 by proteases, or other technical and concentration related reasons. It is not unusual that some peptides or part of one protein may be upregulated in one group. In a previous peptide-profiling study between a control and an end stage prostate cancer group (Figure 10 in ref. [2]), simultaneously up-regulated tryptic and down-regulated semi-tryptic peptide masses of Strong up-regulated peptides of Myosin-9 in the "glioma vessels group" versus the "normal vessels group" located at the C-ter-minus of the protein Figure 7 Strong up-regulated peptides of Myosin-9 in the "glioma vessels group" versus the "normal vessels group" located at the C-terminus of the protein. The strong up-regulated peptides are approximately located from amino-acid position 1301 to 1959. one protein, namely human serum albumin affected by proteases were measured [22]. Some part of myosin-9 has a specific function since calponin binds (in addition to calmodulin) a specific region S2 of the Myosin-9 rod to actin [23]. Some relatively high p-values for fibronectin in Table 5 can be ascribed by wrongly clustered peptide masses of other proteins than fibronectin. Caldesmon binds Myosin-9 in the same S2 region. It is interesting that Glia maturation factor gamma (GMF-gamma) with Swiss-Prot™ accession code O60234 is another protein preferentially expressed in human micro vascular endothelial cells that modulates actin cytoskeleton reorganization [24]. It could be measured in glioma vessels with one mass of 1762.8413 Da and a low Wilcoxon-Mann-Whitney pvalue of 8.7 *10 -3 . Glia maturation factor gamma is measured co-expressed with CDC42 effector protein 3 in stromal vascular cells (see Table 3 in ref. [25]). It is suggested that Glia maturation factor gamma in combination with the protein CDC42 plays a role in angiogenesis [24].

Mass accuracy
The theoretical mass accuracy is about 1.05 ± 0.27 ppm for the FTICR MS instrument used without centroiding. It decreases to 0.67 ± 0.48 ppm when determined on peptide masses, which appear to be presents in 18 samples or more in the peptide matrix of in total 38 samples. Probably, centroiding of the mean peptide mass takes place when taking the average over more samples. In a single high resolution MALDI-FTICR spectrum the real peak mass is between the data point with the local highest intensity and some of the data points with the next highest intensities. The measured peak mass of the data point with the highest local intensity is by change 50% left or right from the real peak maximum. It is hard to distinguish centroiding from effects of the dynamic range or ion-statistics. The mass accuracy decreases for peaks with low intensity. The peaks less frequently measured in all samples are probably also the less intensive which results in less accurate masses (Table 3). To limit performance problems a relative large peak search window of 20 ppm is chosen. The most abundant proteins have always the largest peak intensities within this window and the peptide masses of these proteins are chosen by the algorithm. There is more competition among the peptides of less abundant proteins with about equal low intensities. May be the combination of peptides from different proteins is the reason that the mass averaging does not improve accuracies by reducing statistical deviations for small peaks. The observation that the most intensive peaks do not have necessary the highest accuracy supports an additional effect of centroiding. A further improvement of the mass accuracy would significantly increase the reliability and direct identification of candidate proteins found using the MASCOT™ search engine on the SwissProt™ database. We applied a centroiding by a weighted mean calculated with various mass windows to 20 ppm in both directions of the peak mass, which is about the size of the peak-width, with a total window of 40 ppm. This did not improve the accuracy of the measured peptide masses, probably because the peaks are not symmetric. The larger number of peptide masses obtained by LC-MS demonstrates that there are a lot of overlapping peptide masses. This problem was recognized by Strittmatter et al. [26] who applied a double Gaussian fit to eliminate shoulders on the asymmetric peak distribution to get a more accurate mass. A double Gaussian fit is a special case of a Gaussian Mixture Modeling (GMM) [27][28][29][30][31], where 2 Gaussian curves are fitted through the peak distribution, with two maximum intensities at mass 1 and mass 2 separated by a distance Δ, thus mass 1 -mass 2 = Δ. We encounter the same failure of less likelihood of a convergence of masses, where mass 1 = mass 2 when applying a double Gaussian fit, but don't like to use the fixed Δ as suggested in [26].

De-isotope algorithm
The introduced de-isotope algorithm, which only removes peaks which are isotopes in an initial window of 1.0034 ± 0.0100 Da leads to more peptide masses compared to the commercial software in the processed peak lists and consequently to more hits with calculated peptide masses of proteins in the SwissProt™ database by using the MASCOT™ search engine. Although we obtained satisfactory results with the de-isotope algorithm, the relative intensities of peaks within the isotopic clusters also should be compared with the theoretical distributions. Especially, the high masses display a larger scatter of isotopic distance and could easily fall outside the restrictive acceptance window ( Figure 2). Ideally, it should be an (equipment dependent) function of mass. Since the isotopic distribution of individual elements, C, H, N, O and S of peptides is known, the abundance for each permutation of individual elements can be calculated with a multinomial distribution and the total number of atoms of the element in the peptide [32]. Combining the abundances for all the known number of different elements in a peptide, the isotopic distribution for a peptide can be calculated. Senko [13] calculates the number of each element C, H, N, O and S in a peptide by division of the peptide mass by the mass of a theoretical "averagine" amino acid, while Gay and co-workers [11] used all the tryptic fragments of the SwissProt™ database in a mass window. This resulted in a plot of the percentage intensity distribution of the 5 isotopes M 0 , M 1+ , M 2+ , M 3+ , M 4+ , and M 5+ as a function of the peptide mass in Da. Samuelsson and coworkers [12] have developed an algorithm to determine the mono-isotope masses in an overlapping cluster comparing the measured with the expected intensity distribution of the isotopic masses in a cluster. Valkenborg and co-workers [9] can distinguish peptides containing 0, 1 and 2 sulfur atoms from isotopic clusters, which would help with the analysis of proteins. The database application with such an algorithm can result in finding reliable mono-isotopic masses and perhaps leads to more protein identifications.

Conclusion
A database application is presented that can compare hundreds of raw high resolution FTICR mass spectrometry files without serious performance limitations. A new less rigorous than commercial propriety software de-isotope algorithm is introduced, that results in more mono-isotopic peptide masses and consequently more protein identifications. From the peptide masses in the mass spectrometry files 2 peptide profile matrices are created taking the average of peptide masses in different samples within a mass window of 3 ppm and listing for the individual samples the 1) presence or absence of the peptide peaks and 2) peak intensities. The mass accuracy of the Java™ application is predominantly influenced by the data point resolution in the raw FTICR mass spectrometry files. Centroiding of peptide masses takes place by taking the average over more spectra in the profile matrix. The usage of raw MS spectra instead of peak lists results in a more reliable comparison of peak intensities between groups. The Wilcoxon-Mann-Whitney test can be performed on the intensity matrix, using raw spectrometry data. From this test it appears that cytoskeleton proteins and proteins involved in the calcium signaling pathway seem to be most up-regulated. Tryptic fragments at the C-terminus of the Myosin-9 protein are more up-regulated in glioma vessels compared to the peak intensities observed in normal vessels. The Wilcoxon-Mann-Whitney p-values of peptide fragments show a significant decline from the Nterminus of Myosin-9. The software described in this paper gives a new opportunity to find and quantify significantly differentially expressed peptides close to noise level in clinical samples.