Normalization method for metabolomics data using optimal selection of multiple internal standards
 Marko Sysi-Aho^{1},
 Mikko Katajamaa^{2},
 Laxman Yetukuri^{1} and
 Matej Orešič^{1}
DOI: 10.1186/1471-2105-8-93
© Sysi-Aho et al; licensee BioMed Central Ltd. 2007
Received: 17 November 2006
Accepted: 15 March 2007
Published: 15 March 2007
Abstract
Background
The success of metabolomics as a phenotyping platform largely depends on its ability to detect various sources of biological variability. Removal of platform-specific sources of variability such as systematic error is therefore one of the foremost priorities in data preprocessing. However, the chemical diversity of molecular species included in typical metabolic profiling experiments leads to different responses to variations in experimental conditions, making normalization a very demanding task.
Results
With the aim of removing unwanted systematic variation, we present an approach that utilizes variability information from multiple internal standard compounds to find an optimal normalization factor for each individual molecular species detected by a metabolomics approach (NOMIS). We demonstrate the method on mouse liver lipidomic profiles using Ultra Performance Liquid Chromatography coupled to high resolution mass spectrometry, and compare its performance to two commonly utilized normalization methods: normalization by l_{2} norm and by retention time region specific standard compound profiles. The NOMIS method proved superior in its ability to reduce the effect of systematic error across the full spectrum of metabolite peaks. We also demonstrate that the method can be used to select the best combinations of standard compounds for normalization.
Conclusion
Depending on experiment design and biological matrix, the NOMIS method is applicable either as a one-step normalization method or as a two-step method where the normalization parameters, influenced by variabilities of internal standard compounds and their correlation to metabolites, are first calculated from a study conducted in repeatability conditions. The method can also be used in analytical development of metabolomics methods by helping to select the best combinations of standard compounds for a particular biological matrix and analytical platform.
Background
Metabolomics is a discipline dedicated to the global study of metabolites, their dynamics, composition, interactions, and responses to interventions or to changes in their environment, in cells, tissues, and biofluids. Concentration changes of specific groups of metabolites may be descriptive of systems responses to environmental or genetic interventions, and their study may therefore be a powerful tool for characterization of complex phenotypes [1–3] as well as for development of biomarkers for specific physiological responses [4, 5].
Study of the variability of metabolites in different states of biological systems is therefore an important task of systems biology. As we are primarily interested in systems responses resulting in metabolite level regulation as related to diverse genetic or environmental changes, it is important to separate such interesting biological variation from obscuring sources of variability introduced in experimental studies of metabolites. Since multiple experimental platforms are commonly applied in the studies of metabolites [6, 7], the sources of the obscuring variation are many and platform specific [8]. Such sources include variability arising from inhomogeneity of samples, their lability and inevitable minor differences in sample preparation. In mass spectrometry based detection, the sources include the variations in the ion source as well as matrix specific effects such as ion suppression [9]. Following the measurement, the data preprocessing steps such as peak detection, peak integration and alignment may introduce an additional error.
The chemical diversity of metabolites, leading for example to different recoveries during extraction or different responses during ionization in the mass spectrometer, makes separation of interesting and obscuring variation a difficult task. Quantitative analytical methods have commonly relied on an isotope labeled internal standard for each metabolite measured. However, in broad metabolic profiling approaches this is not practical. The number of metabolites is large, they are chemically too diverse to afford a common labeling approach, and many of them may not even be known. The availability of stable isotope labeled references is generally also very limited.
Strategies for normalization of metabolic profile data can be divided into two major categories:

Statistical models used to derive optimal scaling factors for each sample based on complete dataset [10], such as normalization by unit norm [11] or median [12] of intensities, or the maximum likelihood method [2] adopted from the approach developed for gene expression data [13].

Normalization by a single or multiple internal or external standard compounds based on empirical rules, such as specific regions of retention time [14].
The choice of multiple internal (added to sample prior to extraction) and external (added to sample after extraction) standard compounds may be a more reasonable choice, but even in that case the assignment of the standards to normalize specific peaks remains unclear. One possible approach is to assign a specific standard to metabolite peaks based on similarity in a specific chemical property such as retention time in the liquid chromatography (LC) column. For example, Bijlsma and colleagues utilize three external standard references for lipid profiling, chosen as mono-, di-, and triacyl lipid species representing the most abundant lipid classes in their respective region of retention time [14]. Such an approach still suffers from at least two problems:

The retention time is not necessarily descriptive of all matrix and chemical properties leading to obscuring variation. For example, in lipid separation based on reverse phase LC, diverse lipid species such as ceramides, sphingomyelins, diacylglycerols, and several phospholipid classes overlap in retention time, and it is not reasonable to assume that the same normalization factor can be applied to all these species. The situation is even more complex when analyzing water soluble metabolites.

The normalization by a single molecular component is very sensitive to its own obscuring variation, which becomes a problem in very complex samples where matrix specific effects such as ion suppression may play an important role.
Recently we introduced a related approach for liquid chromatography – mass spectrometry (LC/MS) that normalizes metabolites based on multiple internal standards, with the normalization factor based on distance to the metabolite peaks both in the retention time and mass-to-charge ratio (m/z) [17]. While such a method partially resolves the second issue listed above, it still suffers from the ad hoc assignments of internal standard(s) for each component based on a subset of relevant chemical properties.
None of the normalization methods mentioned above systematically take advantage of the obscuring variability that can be learned from the measured data itself. For example, monitoring multiple standard compounds across multiple sample runs may help determine how the standards are correlated, which variation is specific to a particular standard, and which patterns of variation are shared between the measured metabolites and the standards so they can be removed. In this paper we present a new approach to normalization of metabolomics data aiming to address these issues and develop a mathematical model that optimally assigns normalization factors for each metabolite measured based on internal standard profiles. We demonstrate the approach on mouse liver lipid profiling using UPLC/MS, and compare its performance to two other commonly utilized approaches: normalization by l_{2} vector norm and by retention time region specific standard compounds. We discuss method performance and several application possibilities.
Results and discussion
Formulation of the normalization model
The unnormalized metabolomics data resulting from the first stages of preprocessing, usually including peak detection and alignment [17], can be represented by a matrix of N variables (metabolite peaks) and M objects (samples). For example, in liquid chromatography mass spectrometry (LC/MS) based profiling, each peak is represented by its mass-to-charge ratio (m/z) and retention time (rt).
In the rest of the text we will use the following notation:

i parameterizes peaks: i → {m/z, rt} and i = 1...N.

s parameterizes peaks from internal standard compounds: s → {m/z, rt} and s = 1...S

j parameterizes experiment runs: j = 1...M.

X is N by M intensity matrix of metabolite peak profiles, with elements X_{ ij }

Z is S by M intensity matrix of standard compound peak profiles, with elements Z_{ sj }.
Variation arising from the above mentioned sources is often intensity (or metabolite concentration) dependent, larger variation being associated with higher intensities. The property that the magnitude of variation is not constant is commonly referred to as heteroscedasticity. Therefore, it is reasonable to assume a multiplicative model for the observed intensities:
X_{ ij }= m_{ i }× r_{ ij }(Z) × e_{ ij }, (1)
where m_{ i } is the intensity independent of the run (i.e. the true intensity value), r_{ ij } (a function of Z) is the correction factor, and e_{ ij } is the random error. We assume that the true intensity value depends only on index i. In practice this assumption means that we independently measure several samples from one biological specimen (e.g. under repeatability conditions). This assumption is crucial when the normalization model is trained, i.e., when the parameters of the model are learned from the data, but it can be relaxed when the normalization is applied to a new set of data. Reasons for this will become clear below. Taking logarithms of Equation (1), we define
Y = log X, Ω = log Z, μ = log m, ρ(Ω) = log r (Z), ε = log e (2)
to obtain an additive model:
Y_{ ij }= μ_{ i }+ ρ_{ ij }(Ω) + ε_{ ij }. (3)
The randomness in the values of Y is modeled by the error ε_{ ij } that is drawn from the normal distribution with zero mean and variance ${\sigma}_{i}^{2}$:
ε_{ ij }~ N(0, ${\sigma}_{i}^{2}$). (4)
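The effect of the log transformation on heteroscedastic data can be illustrated with a small numerical sketch. It assumes a purely multiplicative error (the correction factor is set to 1), and the helper `simulate_peak`, the two true intensities, and the error scale 0.1 are illustrative choices rather than values from this study.

```python
import math
import random
from statistics import stdev

def simulate_peak(true_intensity, log_errors):
    """Apply multiplicative errors exp(eps) to one true intensity."""
    return [true_intensity * math.exp(eps) for eps in log_errors]

random.seed(0)
# Hypothetical log-scale errors, normally distributed, for 16 runs
log_errors = [random.gauss(0.0, 0.1) for _ in range(16)]

low = simulate_peak(100.0, log_errors)      # low-abundance peak
high = simulate_peak(10000.0, log_errors)   # high-abundance peak

# Raw scale: the spread grows with the intensity (heteroscedasticity)
sd_low, sd_high = stdev(low), stdev(high)

# Log scale: the spread no longer depends on the intensity
sd_log_low = stdev([math.log(v) for v in low])
sd_log_high = stdev([math.log(v) for v in high])

print(f"raw sd: {sd_low:.3g} vs {sd_high:.3g}")
print(f"log sd: {sd_log_low:.3g} vs {sd_log_high:.3g}")
```

Because both peaks share the same relative errors, the raw standard deviations differ by the ratio of the true intensities, while the log-scale standard deviations coincide.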
We aim to remove the effect of ρ_{ij} in Equation (3), which we presume represents the variation in the data that can be explained by changes in the levels of the standard compounds. For the sake of simplicity, we treat the observed values of the standards Ω as explanatory variables without modeling their error. We parameterize ρ as a linear function of the levels of internal standards:
$\rho_{ij}(\Omega) = \sum_{s=1}^{S} \beta_{is}\,(\Omega_{sj} - \langle \Omega_{s\cdot} \rangle)$, (5)
where the average 〈 〉 is taken over the samples j = 1...M, i.e. $\langle \Omega_{s\cdot} \rangle \equiv \frac{1}{M}\sum_{j} \Omega_{sj}$. The parameters β therefore relate the variability of internal standard intensities to the variability of intensities of endogenous metabolite peaks, i.e. the larger the parameter β_{ is }, the larger the contribution of the variability of internal standard s to the normalization correction factor of metabolite peak i.
From the equations (3–5) it follows that Y_{ ij }can be modeled as normally distributed,
Y_{ ij }~ N(μ_{ i }+ ρ_{ ij }, ${\sigma}_{i}^{2}$), (6)
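therefore the log likelihood $\mathcal{L}$ of observing data Y under the assumption of normality is
$\mathcal{L} = -\frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{M}\left[\log(2\pi\sigma_i^2) + \frac{(Y_{ij} - \mu_i - \rho_{ij})^2}{\sigma_i^2}\right]$. (7)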
We note that the simple form of Equation (7) is due to the assumption of independence of the random errors in Equation (4), both across different sample measurements and across different metabolites. While the former assumption is easy to accept, the latter assumption is arguable, because it is well known that coregulated metabolites are highly correlated [18]. However, in order to keep the number of parameters in the model moderate, we decided to adhere to the latter assumption, being aware of its possible effect on the precision of the parameter estimates [19].
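We solve for the values of parameters μ_{ i }, β_{ is }, and ${\sigma}_{i}^{2}$ that maximize the log likelihood of observing the data:
$(\hat{\mu}, \hat{\beta}, \hat{\sigma}^2) = \arg\max_{\mu,\,\beta,\,\sigma^2} \mathcal{L}$. (8)
Setting ∂$\mathcal{L}$/∂μ_{ i }= 0, ∂$\mathcal{L}$/∂β_{ is }= 0, and ∂$\mathcal{L}$/∂σ_{i} = 0 leads to the following equations:
$\sum_{j}\left[Y_{ij} - \mu_i - \sum_{s'}\beta_{is'}(\Omega_{s'j} - \langle\Omega_{s'\cdot}\rangle)\right] = 0$, (9)
$\sum_{j}\left[Y_{ij} - \mu_i - \sum_{s'}\beta_{is'}(\Omega_{s'j} - \langle\Omega_{s'\cdot}\rangle)\right](\Omega_{sj} - \langle\Omega_{s\cdot}\rangle) = 0$, (10)
$\sigma_i^2 = \frac{1}{M}\sum_{j}\left[Y_{ij} - \mu_i - \sum_{s'}\beta_{is'}(\Omega_{s'j} - \langle\Omega_{s'\cdot}\rangle)\right]^2$. (11)
Since $\sum_{j}\Omega_{sj} \equiv \sum_{j}\langle\Omega_{s\cdot}\rangle$, the Equation (9) leads to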
μ_{ i }= 〈Y_{i.}〉. (12)
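The Equation (10) can be written as a matrix product:
$\mathbf{R} = \boldsymbol{\beta}\,\boldsymbol{\Sigma}$, (13)
where
$R_{is} = \frac{1}{M}\sum_{j}(Y_{ij} - \langle Y_{i\cdot}\rangle)(\Omega_{sj} - \langle\Omega_{s\cdot}\rangle)$ (14)
correlates internal standards and endogenous metabolite peaks, while
$\Sigma_{ss'} = \frac{1}{M}\sum_{j}(\Omega_{sj} - \langle\Omega_{s\cdot}\rangle)(\Omega_{s'j} - \langle\Omega_{s'\cdot}\rangle)$ (15)
is a covariance matrix for the internal standards. Multiplying each side of Equation (13) by the inverse of matrix Σ, the estimates for the parameters β can be written as a product of two matrices:
$\hat{\boldsymbol{\beta}} = \hat{\mathbf{R}}\,\hat{\boldsymbol{\Sigma}}^{-1}$, (16)
where the hat notation means that the matrices are evaluated using the actually observed data Y. Based on the multiplicative error model from Equation (1), the normalized intensities for each peak are then calculated as
$\tilde{X}_{ij} = X_{ij}\,\exp\left(-\sum_{s}\hat{\beta}_{is}(\Omega_{sj} - \langle\Omega_{s\cdot}\rangle)\right)$. (17)
Once the model has been trained, i.e., the parameters β have been estimated, it can be applied to new data from samples of a similar biological experiment with arbitrary true metabolite levels; that is, in Equation (1) the true level, m_{ ij }, can be sample dependent.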
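The training and application steps described above can be sketched in a few lines of NumPy. This is a minimal sketch rather than the authors' Matlab implementation: it assumes that β is estimated as the cross-covariance between log metabolite and log standard intensities multiplied by the inverse covariance matrix of the log standards, and that normalization divides each intensity by the exponentiated, β-weighted deviations of the log standards from their training means. The function names `nomis_fit` and `nomis_apply` and the synthetic data are hypothetical.

```python
import numpy as np

def nomis_fit(X, Z):
    """Estimate beta from repeatability data.

    X: (N, M) raw metabolite peak intensities, Z: (S, M) raw standard intensities.
    Returns beta with shape (N, S) and the mean log intensity of each standard.
    """
    Y = np.log(X)
    Omega = np.log(Z)
    M = X.shape[1]
    Yc = Y - Y.mean(axis=1, keepdims=True)       # centered log metabolite levels
    omega_mean = Omega.mean(axis=1)
    Oc = Omega - omega_mean[:, None]             # centered log standard levels
    R = Yc @ Oc.T / M                            # (N, S) metabolite-standard covariance
    Sigma = Oc @ Oc.T / M                        # (S, S) covariance of the standards
    # beta = R Sigma^{-1}; Sigma is symmetric, so solve instead of inverting
    beta = np.linalg.solve(Sigma, R.T).T
    return beta, omega_mean

def nomis_apply(X, Z, beta, omega_mean):
    """Normalize intensities X using standards Z measured in the same runs."""
    Oc = np.log(Z) - omega_mean[:, None]
    return X * np.exp(-(beta @ Oc))

# Synthetic example: 4 peaks, 2 standards, 16 repeatability runs
rng = np.random.default_rng(0)
N, S, M = 4, 2, 16
Z = np.exp(rng.normal(5.0, 0.2, size=(S, M)))
true_beta = np.array([[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [0.8, 0.2]])
m = np.array([100.0, 500.0, 2000.0, 8000.0])
Oc = np.log(Z) - np.log(Z).mean(axis=1, keepdims=True)
X = m[:, None] * np.exp(true_beta @ Oc + rng.normal(0.0, 0.02, size=(N, M)))

beta, omega_mean = nomis_fit(X, Z)
X_norm = nomis_apply(X, Z, beta, omega_mean)
```

In a two-step use, `nomis_fit` would be run once on the repeatability data and `nomis_apply` on later runs; reusing the training means of the log standards as the reference level for new runs is an assumption of this sketch.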
Normalization in case of a single internal standard
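In the case of a single internal standard (S = 1), Equations (16) and (17) straightforwardly lead to:
$\tilde{X}_{ij} = X_{ij}\,\exp\left(-\frac{r_{i1}}{\sigma_1^2}\,(\Omega_{1j} - \langle\Omega_{1\cdot}\rangle)\right)$, (18)
where
$r_{i1} = \frac{1}{M}\sum_{j}(Y_{ij} - \langle Y_{i\cdot}\rangle)(\Omega_{1j} - \langle\Omega_{1\cdot}\rangle)$ (19)
and
$\sigma_1^2 = \frac{1}{M}\sum_{j}(\Omega_{1j} - \langle\Omega_{1\cdot}\rangle)^2$. (20)
We write the internal standard levels as
log Z_{1j}= C + ω_{ j }, (21)
with C = 〈Ω_{1.}〉 being the mean and ω_{ j } the deviation of sample j from the mean:
ω_{ j }= Ω_{1j} − 〈Ω_{1.}〉. We model the endogenous metabolite peaks as:
log X_{ ij }= T_{ i }+ β_{ i }ω_{ j }+ ε_{ i j }, (22)
where T_{ i } is the true log intensity of the peak of metabolite i, β_{i} is a parameter that describes by how many units the log intensity of peak i changes when the log intensity of the standard increases by one unit, and ε_{ij} is a random error drawn from a normal distribution with zero mean. The coefficients r_{i 1} and ${\sigma}_{1}^{2}$ from Equations (19) and (20) can then be written as
$r_{i1} = \beta_i\,\frac{1}{M}\sum_{j}\omega_j^2 + \frac{1}{M}\sum_{j}\varepsilon_{ij}\,\omega_j$ (23)
and
$\sigma_1^2 = \frac{1}{M}\sum_{j}\omega_j^2$. (24)
If we ignore the effect of the residual term in the r_{i 1}/${\sigma}_{1}^{2}$ ratio:
$\frac{r_{i1}}{\sigma_1^2} = \beta_i + \frac{\sum_{j}\varepsilon_{ij}\,\omega_j}{\sum_{j}\omega_j^2} \approx \beta_i$, (25)
then the Equation (18) reduces to
$\tilde{X}_{ij} = X_{ij}\left(\frac{M}{Z_{1j}}\right)^{\beta_i}$, (26)
where M = exp(C) (here M denotes this constant, not the number of runs). From Equation (16) we see that β_{ i }= c_{i 1}/c_{11}, with c_{i 1} and c_{11} being estimators for the covariance between metabolite i and the standard and for the variance of the standard, respectively. The interpretation of the result is now straightforward. For example, if β_{ i }= 1, i.e. the covariance of metabolite i with the standard is of the order of the variance of the standard, then the Equation (26) describes a simple correction by the internal standard
$\tilde{X}_{ij} = M\,\frac{X_{ij}}{Z_{1j}}$. (27)
Such a correction is commonly applied to specific metabolites when their corresponding isotope labeled standards are available. In contrast, if a specific metabolite is uncorrelated with the internal standard, β_{ i }= 0, and the normalization factor is 1, leading to $\tilde{X}$_{ ij }= X_{ ij }. Thus, if the linear association between a metabolite and the standard is weak, the NOMIS method reduces the extent of normalization.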
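The two limiting cases discussed above can be checked numerically. The sketch below assumes that the single-standard correction multiplies each intensity by (exp(C)/Z)^β, with C the mean log intensity of the standard, and that β is estimated as the ratio of the log-scale covariance to the log-scale variance of the standard. The function names and the toy intensities are hypothetical.

```python
import math

def estimate_beta(x, z):
    """Covariance of (log x, log z) divided by the variance of log z."""
    lx = [math.log(v) for v in x]
    lz = [math.log(v) for v in z]
    mx, mz = sum(lx) / len(lx), sum(lz) / len(lz)
    cov = sum((a - mx) * (b - mz) for a, b in zip(lx, lz))
    var = sum((b - mz) ** 2 for b in lz)
    return cov / var

def normalize_single(x, z, beta):
    """Correct one peak x by one standard z, both measured across the same runs."""
    c = sum(math.log(v) for v in z) / len(z)   # mean log level of the standard
    scale = math.exp(c)                        # geometric mean of the standard
    return [xi * (scale / zi) ** beta for xi, zi in zip(x, z)]

z = [1.0, 2.0, 4.0]                # standard intensities in three runs
x_tracking = [10.0, 20.0, 40.0]    # peak that follows the standard exactly
beta = estimate_beta(x_tracking, z)
flat = normalize_single(x_tracking, z, beta)

# A peak unrelated to the standard (beta = 0) is left untouched
unchanged = normalize_single([7.0, 7.0, 7.0], z, 0.0)
```

With β close to 1 the peak is divided by the standard and rescaled by its geometric mean, so the run-to-run pattern disappears; with β equal to 0 the intensities are returned as measured.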
In the following section we demonstrate the NOMIS method using the multiple internal standard applications in real biological samples.
Method performance and comparison to other methods: mouse liver lipidomics dataset
Lipid internal standards. The list of internal standards utilized in the demonstrations of the paper, with their abbreviations, common names, amount in the sample, retention time in the UPLC/MS method described in the Methods, mean intensity as peak height, and coefficient of variation (CV) based on the 16-run liver repeatability study.

Abbreviation  Name                Amount (μg/sample)  Retention time (s)  Mean intensity  CV
LPC           GPCho(17:0/0:0)     6.408               210                 5574            0.118
Cer           Cer(d18:1/17:0)     1.832               381                 1044            0.197
PC            GPCho(17:0/17:0)    0.198               388                 521             0.111
PE            GPEth(17:0/17:0)    1.790               392                 316             0.134
TAG           TG(17:0/17:0/17:0)  2.072               543                 202             0.335
The performance of the NOMIS method is compared to two other methods. The first is a commonly utilized normalization by l_{2} vector norm (abbreviated as L2N) [10]:
where the average 〈 〉 is taken over the samples j = 1...M. The second method is essentially the same as in [14], based on the application of three internal standard compounds (3STD) with the choices of retention time ranges reflecting the analytical method used: the LPC standard is applied for peaks with rt < 300 s, the PC standard for 300 s < rt < 410 s, and the TAG standard for rt > 410 s.
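The retention time rule of the 3STD method can be written down as a short sketch. Two assumptions are made here that the text does not state: peaks falling exactly on a boundary (300 s or 410 s) are assigned to the later region, and the normalization itself is a plain division by the assigned standard in the same run. The function names are hypothetical.

```python
def assign_standard(rt):
    """Pick the standard for a peak from its retention time in seconds."""
    if rt < 300:
        return "LPC"
    if rt < 410:          # assumption: rt == 300 falls into the PC region
        return "PC"
    return "TAG"          # assumption: rt == 410 falls into the TAG region

def normalize_3std(x, rt, standards):
    """Divide one peak's intensities by its region-specific standard, run by run.

    x: intensities of one peak across runs
    standards: dict mapping standard name to its intensities across the same runs
    """
    z = standards[assign_standard(rt)]
    return [xi / zi for xi, zi in zip(x, z)]

# Toy intensities for two runs (illustrative values only)
standards = {"LPC": [2.0, 4.0], "PC": [1.0, 1.0], "TAG": [5.0, 10.0]}
early_peak = normalize_3std([10.0, 20.0], 210, standards)
late_peak = normalize_3std([10.0, 20.0], 543, standards)
```

Unlike the NOMIS model, this rule uses a single standard per peak with a fixed exponent of 1, regardless of how strongly the peak actually follows that standard.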
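We utilize the coefficient of variation (CV) as the main performance measure for normalization methods. For each peak i, the CV is defined as the ratio of the standard deviation to the mean of its intensities across the M runs:
$CV_i = \frac{\mathrm{sd}_j(X_{ij})}{\langle X_{i\cdot}\rangle}$ (29)
As the overall measure of variability we apply the median CV: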
MCV ≡ median {CV_{ i }}_{i =1...N} (30)
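A minimal sketch of the two performance measures follows. It assumes the sample standard deviation (divisor M − 1); the text does not say which estimator was used, and the function names are hypothetical.

```python
from statistics import mean, median, stdev

def cv(values):
    """Coefficient of variation of one peak across runs."""
    return stdev(values) / mean(values)

def median_cv(matrix):
    """Median CV over all peaks; each row holds one peak's intensities across runs."""
    return median(cv(row) for row in matrix)

# Toy intensity matrix: three peaks measured in three runs
peaks = [
    [2.0, 4.0, 6.0],   # sd 2, mean 4
    [1.0, 1.0, 1.0],   # no variation
    [1.0, 2.0, 3.0],   # sd 1, mean 2
]
per_peak = [cv(row) for row in peaks]
overall = median_cv(peaks)
```

The median is used rather than the mean so that a few very noisy peaks do not dominate the overall measure.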
Heteroscedasticity
Calculation of the β matrix using the log transformation is of potential concern because such transformation, while efficient in correcting for heteroscedasticity, may also amplify the high variability of low abundance metabolites [8, 21]. The log transformation is also unable to deal with value zero.
We deal with the problem of zeros in the log transformation by utilizing the post-alignment peak picking algorithm from the MZmine software [20]. In the datasets utilized in this paper, no exact zeros were found among the 1470 peaks following such processing.
Application of the NOMIS method to selection of the internal standard mixture
The systematic study of the results obtained by the NOMIS method can also be utilized for selection of the standard compound mixture used in the analytical method. This may be useful in practical analytical work, as more standards do not necessarily guarantee better quality of normalization. It is also important to gain understanding of how each individual standard affects the variability of individual molecular species across the full spectrum.
While the variability of the TAG standard is high (CV = 0.335), its inclusion with the other standards still improved the MCV from 0.129 to 0.116. The TAG standard in combination with other standards can therefore model the variability of some metabolites better than the four other standards alone. For example, the correlation of the triacylglycerol levels with the TAG standard is higher than with other standards. The median Pearson correlation coefficient between the internal standard levels and each of the 184 identified TAG species is 0.25, compared to 0.30, 0.17, 0.18, and 0.08 for LPC, Cer, PC, and PE standards, respectively.
The results from analysis of different internal standard combinations above suggest the NOMIS method can be a valuable tool in analytical development. Different biological matrices and different analytical platforms may require different combinations of standards for optimal normalization and systematic evaluation of different standards as illustrated above may provide the necessary clues.
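The evaluation of different standard combinations can be sketched as an exhaustive search over subsets, ranking each subset by the median CV it achieves. The code below is an illustration under several assumptions: the normalization is the linear log-scale model with β estimated from covariances, fitting and evaluation use the same runs, and the data are synthetic. The labels "LPC", "PC", and "TAG" are used only as names for simulated standards, and all function names are hypothetical.

```python
from itertools import combinations
import numpy as np

def nomis_normalize(X, Z):
    """Fit and apply a NOMIS-style correction on the same runs (one-step use)."""
    Yc = np.log(X) - np.log(X).mean(axis=1, keepdims=True)
    Oc = np.log(Z) - np.log(Z).mean(axis=1, keepdims=True)
    R = Yc @ Oc.T                          # metabolite-standard cross products
    Sigma = Oc @ Oc.T                      # standard-standard cross products
    beta = np.linalg.solve(Sigma, R.T).T   # common 1/M factors cancel
    return X * np.exp(-(beta @ Oc))

def median_cv(X):
    """Median over peaks of (sample standard deviation / mean) across runs."""
    return float(np.median(X.std(axis=1, ddof=1) / X.mean(axis=1)))

def rank_standard_subsets(X, Z, names):
    """Return (median CV, subset) pairs for every non-empty subset, best first."""
    results = []
    for k in range(1, len(names) + 1):
        for idx in combinations(range(len(names)), k):
            X_norm = nomis_normalize(X, Z[list(idx), :])
            results.append((median_cv(X_norm), tuple(names[i] for i in idx)))
    return sorted(results)

# Synthetic data: three peaks driven by the first and third standard only
rng = np.random.default_rng(1)
names = ["LPC", "PC", "TAG"]
Z = np.exp(rng.normal(6.0, 0.2, size=(3, 16)))
Oc = np.log(Z) - np.log(Z).mean(axis=1, keepdims=True)
true_beta = np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.5, 0.0, 0.5]])
X = 1000.0 * np.exp(true_beta @ Oc + rng.normal(0.0, 0.02, size=(3, 16)))

ranking = rank_standard_subsets(X, Z, names)
best_mcv, best_subset = ranking[0]
```

Because the parameters are fitted and evaluated on the same runs, larger subsets tend to look at least as good as smaller ones in this sketch; scoring each subset on runs held out from the fit would be a natural refinement when enough repeatability runs are available.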
Investigation of the results in context of the identified molecular species
Comparison of coefficients of variation for three lipid classes. The raw data variability for identified lipid species of the same class is compared to the results from the NOMIS method, as well as to results adjusted for an internal standard of the same class.

Lipid class (number of species)     Raw data  NOMIS  Internal standard
Lysophosphatidylcholines (N = 13)   0.245     0.094  0.221
Phosphatidylcholines (N = 74)       0.183     0.100  0.209
Triacylglycerols (N = 184)          0.227     0.146  0.308
Normalization using β matrix obtained independently
The matrix β relates the variability of each individual metabolite to that of the internal standards for a specific platform and biological matrix. Therefore, the parameters β can be obtained from a separate repeatability experiment involving a large number of repeated measurements. This may often be desirable because of the large number of normalization parameters, N × (S + 2), determined by the method. The correction factors from Equation (17) in a real biological application then include the matrix β obtained independently and the measured levels of internal standards Ω_{ sj } from the biological experiment.
As shown in Equation (16) and demonstrated in Figure 7 for the case of lipids, the β matrix captures the variation of all detected metabolites in a biological matrix as modeled by the selected standard compounds. In the case of the repeatability runs demonstrated in this paper, the variability modeled is due to the obscuring variation introduced in experimental studies of metabolites. We therefore believe the best usage of the NOMIS method should include a large run in repeatability conditions for a specific platform and specific biological matrix (i.e. biofluid or tissue type) in order to obtain a β matrix, which would then be applied to normalization of other samples using the same standard compounds and peak lists. The latter requires that the analytical method is sufficiently accurate and precise so that one can reliably track a specific set of peaks within a specific biological matrix even if some peaks remain unidentified. The β matrix may even be updated when new runs are made, and we believe there are opportunities to develop sophisticated probabilistic methods to model and update the β matrix based on new experimental data.
The NOMIS normalization model is derived from the variabilities and correlation structure observed in data measured under repeatability conditions and does not specifically model different sources of systematic variation, including ion suppression. Therefore, as long as the assumption of a multiplicative error model is valid to a reasonable extent, the NOMIS approach may be applicable. The heteroscedasticity of GC/MS spectra has in fact already been studied and demonstrated earlier [21]. We therefore believe the NOMIS approach may be applicable to analytical platforms other than the LC/MS platform demonstrated in this paper.
Conclusion
We introduced a new method for normalization of metabolomics data which utilizes variability information from multiple internal standard compounds to find optimal normalization factor for each individual molecular species detected (NOMIS). The method proved superior to two other commonly utilized normalization strategies in its ability to reduce variability across the full spectrum of metabolites.
The NOMIS method can be used directly as a one-step normalization method or as a two-step method where the normalization parameters, containing information about the variabilities of internal standard compounds and their association with the variabilities of metabolites, are first calculated from a study carried out in repeatability conditions. Additionally, the method can be used to select standard compounds for normalization and to evaluate their influence on the variability of all detected metabolites.
While we focused on applications of NOMIS to LC/MS based approaches, we believe the same strategy can be applied to other analytical platforms used in metabolomics, as well as to other levels of molecular profiling such as mass spectrometry based proteomics.
Methods
Liver LC/MS based lipid profiling
An aliquot of 20 μl of an internal standard mixture (5 reference compounds at concentration level 8–10 μg/ml), 50 μl of 0.15 M sodium chloride and chloroform:methanol (2:1) (200 μl) was added to the tissue sample (20–30 mg). The sample was homogenized, vortexed (2 min), left to stand (1 hour for liver) and centrifuged at 10000 RPM for 3 min. From the separated lower phase, an aliquot was mixed with 10 μl of a labelled standard mixture (3 stable isotope labelled reference compounds at concentration level 9–11 μg/ml) and a 0.5–1.0 μl injection was used for LC/MS analysis.
Total lipid extracts were analysed on a Waters QTof Premier mass spectrometer combined with an Acquity Ultra Performance LC™ (UPLC). The column, which was kept at 50°C, was an Acquity UPLC™ BEH C18 10 × 50 mm with 1.7 μm particles. The binary solvent system (flow rate 0.200 ml/min) included A. water (1% 1 M NH_{4}Ac, 0.1% HCOOH) and B. LC/MS grade (Rathburn) acetonitrile/isopropanol (5:2, 1% 1 M NH_{4}Ac, 0.1% HCOOH). The gradient started from 65% A/35% B, reached 100% B in 6 min and remained there for the next 7 min. The total run time per sample, including a 5 min re-equilibration step, was 18 min. The temperature of the sample organizer was set at 10°C.
Mass spectrometry was carried out on a QTof Premier (Waters, Inc.) run in ESI+ mode. The data was collected over the mass range of m/z 300–1600 with a scan duration of 0.08 sec. The source temperature was set at 120°C and nitrogen was used as desolvation gas (800 L/h) at 250°C. The voltages of the sampling cone and capillary were 39 V and 3.2 kV, respectively, and the collision energy was 5 V. Reserpine (50 μg/L) was used as the lock spray reference compound (10 μl/min; 10 sec scan frequency).
Data processing including peak detection, alignment, and deisotoping, was performed using the MZmine software version 0.60 [17]. Identification was performed based on an internal reference database of lipid species. The implementation of normalization methods and data analysis were performed using Matlab version 7.2 (Mathworks, Inc.).
Abbreviations
MS: Mass spectrometry.
UPLC™: Ultra Performance Liquid Chromatography (Waters, Inc.).
LC/MS: Liquid chromatography – mass spectrometry.
GC/MS: Gas chromatography – mass spectrometry.
QTof: Quadrupole – time of flight mass spectrometer.
CV: Coefficient of variation.
MCV: Median coefficient of variation.
m/z: Mass-to-charge ratio (m is molecular mass and z is charge of the ion).
NOMIS: Normalization using Optimal selection of Multiple Internal Standards (the method introduced in this paper).
3STD: Normalization by retention-time-region-specific standard compounds.
L2N: Sum of squares normalization.
LPC: Lysophosphatidylcholine.
Cer: Ceramide.
PC: Phosphatidylcholine.
PE: Phosphatidylethanolamine.
TAG: Triacylglycerol.
Declarations
Acknowledgements
This work was funded by Academy of Finland Grants 111338 and 8207492, and by the Marie Curie International Reintegration Grant from the European Community. The authors thank Tuulikki Seppänen-Laakso and Tapani Suortti for performing the analytical work.
References
1. Raamsdonk LM, Teusink B, Broadhurst D, Zhang N, Hayes A, Walsh MC, Berden JA, Brindle KM, Kell DB, Rowland JJ, Westerhoff HV, van Dam K, Oliver SG: A functional genomics strategy that uses metabolome data to reveal the phenotype of silent mutations. Nat Biotech 2001, 19: 45–50. doi:10.1038/83496
2. Oresic M, Clish CB, Davidov EJ, Verheij E, Vogels JTWE, Havekes LM, Neumann E, Adourian A, Naylor S, Greef J, Plasterer T: Phenotype characterization using integrated gene transcript, protein and metabolite profiling. Appl Bioinformatics 2004, 3: 205–217. doi:10.2165/00822942-200403040-00002
3. Oresic M, Vidal-Puig A, Hanninen V: Metabolomic approaches to phenotype characterization and applications to complex diseases. Expert Rev Mol Diagn 2006, 6: 575–585. doi:10.1586/14737159.6.4.575
4. Pauling L, Robinson AB, Teranishi R, Cary P: Quantitative analysis of urine vapor and breath by gas-liquid partition chromatography. Proc Nat Acad Sci U S A 1971, 68: 2374–2376. doi:10.1073/pnas.68.10.2374
5. van der Greef J, Davidov E, Verheij E, Vogels JTWE, van der Heijden R, Adourian AS, Oresic M, Marple EW, Naylor S: The role of metabolomics in systems biology: A new vision for drug discovery and development. In Metabolic profiling: Its role in biomarker discovery and gene function analysis. Edited by: Harrigan GG and Goodacre R. Boston, Mass., Kluwer Academic Publishers; 2003:171–198.
6. van der Greef J, Stroobant P, Heijden R: The role of analytical sciences in medical systems biology. Curr Opin Chem Biol 2004, 8: 559–565. doi:10.1016/j.cbpa.2004.08.013
7. Goodacre R, Vaidyanathan S, Dunn WB, Harrigan GG, Kell DB: Metabolomics by numbers: acquiring and understanding global metabolite data. Trends Biotechnol 2004, 22: 245–252. doi:10.1016/j.tibtech.2004.03.007
8. van den Berg RA, Hoefsloot HC, Westerhuis JA, Smilde AK, van der Werf MJ: Centering, scaling, and transformations: improving the biological information content of metabolomics data. BMC Genomics 2006, 7: 142. doi:10.1186/1471-2164-7-142
9. de Hoffmann E, Stroobant V: Mass spectrometry: Principles and applications. 2nd edition. John Wiley & Sons; 2001.
10. Crawford LR, Morrison JD: Computer methods in analytical mass spectrometry. Identification of an unknown compound in a catalog. Anal Chem 1968, 40: 1464–1469. doi:10.1021/ac60266a027
11. Scholz M, Gatzek S, Sterling A, Fiehn O, Selbig J: Metabolite fingerprinting: detecting biological features by independent component analysis. Bioinformatics 2004, 20: 2447–2454. doi:10.1093/bioinformatics/bth270
12. Wang W, Zhou H, Lin H, Roy S, Shaler TA, Hill LR, Norton S, Kumar P, Anderle M, Becker CH: Quantification of proteins and metabolites by mass spectrometry without isotopic labeling or spiked standards. Anal Chem 2003, 75: 4818–4826. doi:10.1021/ac026468x
13. Hartemink AJ, Gifford DK, Jaakkola TS, Young RA: Maximum likelihood estimation of optimal scaling factors for expression array normalization. In Microarrays: optical technologies and informatics, Proceedings of SPIE (vol 4266). Edited by: Bittner M, Chen Y and Dorsel A. 2001, 132–140.
14. Bijlsma S, Bobeldijk I, Verheij ER, Ramaker R, Kochhar S, Macdonald IA, van Ommen B, Smilde AK: Large-scale human metabolomics studies: A strategy for data (pre-) processing and validation. Anal Chem 2006, 78: 567–574. doi:10.1021/ac051495j
15. Aitchison J: The Statistical Analysis of Compositional Data. Caldwell, NJ, The Blackburn Press; 2003.
16. Zhang Y, Proenca R, Maffei M, Barone M, Leopold L, Friedman JM: Positional cloning of the mouse obese gene and its human homologue. Nature 1994, 372: 425–432. doi:10.1038/372425a0
17. Katajamaa M, Oresic M: Processing methods for differential analysis of LC/MS profile data. BMC Bioinformatics 2005, 6: 179. doi:10.1186/1471-2105-6-179
18. Steuer R, Kurths J, Fiehn O, Weckwerth W: Observing and interpreting correlations in metabolomic networks. Bioinformatics 2003, 19: 1019–1026. doi:10.1093/bioinformatics/btg120
19. Diggle P, Heagerty P, Liang KY, Zeger S: Analysis of longitudinal data. 2nd edition. New York, Oxford University Press; 2002.
20. Katajamaa M, Miettinen J, Oresic M: MZmine: toolbox for processing and visualization of mass spectrometry based molecular profile data. Bioinformatics 2006, 22: 634–636. doi:10.1093/bioinformatics/btk039
21. Kvalheim OM, Brakstad F, Liang Y: Preprocessing of analytical profiles in the presence of homoscedastic or heteroscedastic noise. Anal Chem 1994, 66: 43–51. doi:10.1021/ac00073a010
22. Cleveland WS: Robust locally weighted regression and smoothing scatterplots. J Am Stat Assoc 1979, 74: 829–836. doi:10.2307/2286407
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.