Statistical elimination of spectral features with large between-run variation enhances quantitative protein-level conclusions in experiments with data-independent spectral acquisition

Cheng, Lin-Yang; Liu, Yansheng; Chang, Ching-Yun; Röst, Hannes; Aebersold, Ruedi; Vitek, Olga

doi:10.1186/1471-2105-16-S2-A4

Volume 16 Supplement 2

Highlights from the Tenth International Society for Computational Biology (ISCB) Student Council Symposium 2014

Meeting abstract
Open access
Published: 28 January 2015

Lin-Yang Cheng¹,
Yansheng Liu²,
Ching-Yun Chang¹,
Hannes Röst²,
Ruedi Aebersold²,³ &
Olga Vitek⁴

BMC Bioinformatics volume 16, Article number: A4 (2015)

Background

Many proteomic investigations summarize quantitative information from multiple spectral features into protein-level conclusions. Data-independent spectral acquisition (DIA) has attracted considerable interest because it quantifies many spectral features in a single run. However, compared with, e.g., Selected Reaction Monitoring (SRM), DIA features are more prone to interference and noise. We argue that between-run variation provides additional insight for distinguishing good-quality DIA features from noisy ones. To use between-run variation appropriately, it is important to account for the properties of the experimental design and to distinguish random artifacts from biological changes. We have previously proposed a method (Chang et al., ASMS 2013) that accounts for the experimental design to eliminate features with low information content.
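As a rough illustration of the idea of using between-run variation to separate good-quality from noisy features, the sketch below scores each feature of one protein by the variance that remains after a run-level profile shared by the features is removed. This is not the authors' method: the function names, the median-based run profile, the threshold `ratio`, and the toy data are all assumptions made for illustration. A change shared by all features of a protein (for example, a difference between conditions) is absorbed by the run profile and is therefore not counted as noise.

```python
import numpy as np

def between_run_residual_variance(log_intensity):
    """Per-feature variance left after removing a shared run-level profile.

    log_intensity: 2-D array, rows = runs, columns = features of one protein,
    on the log scale. Assumes no missing values (a simplification).
    """
    y = np.asarray(log_intensity, dtype=float)
    # Center each feature so that features with different baselines are comparable.
    centered = y - y.mean(axis=0, keepdims=True)
    # Run-level profile shared by the features: median across features in each run.
    run_profile = np.median(centered, axis=1, keepdims=True)
    residual = centered - run_profile
    # Sample variance of the residuals across runs, one value per feature.
    return residual.var(axis=0, ddof=1)

def flag_noisy_features(log_intensity, ratio=3.0):
    """Flag features whose residual variance exceeds ratio x the median variance.

    The threshold rule is a hypothetical choice, not taken from the abstract.
    """
    v = between_run_residual_variance(log_intensity)
    return v > ratio * np.median(v)

# Toy data (made up): 4 runs, 4 features. Runs 1-2 and runs 3-4 belong to two
# conditions; features 0-2 share the condition shift, feature 3 does not.
toy_log_intensity = np.array([
    [10.0, 12.1, 8.0, 11.0],
    [10.2, 12.0, 8.1, 13.0],
    [11.0, 13.1, 9.2, 10.0],
    [11.1, 13.0, 9.0, 12.5],
])

variances = between_run_residual_variance(toy_log_intensity)
noisy = flag_noisy_features(toy_log_intensity)
print(variances)
print(noisy)
```

In the toy data only the last feature is flagged, because its run-to-run pattern disagrees with the pattern shared by the other three features.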

Results

In this project we further emphasize that regularization avoids an exhaustive search over every subset of features and allows subsequent hypothesis tests, so that the false discovery rate of the feature selection process can be controlled. We evaluated the proposed approach on three datasets with some notion of ground truth: an extensive simulation study; a controlled mixture in which proteins were spiked into a complex background at known concentrations; and a study of 232 plasma samples in which 18 proteins were quantified in both SWATH and SRM mode in the presence of heavy-labeled reference peptides. We assessed [1] protein-level estimates of fold changes between conditions, [2] sensitivity and specificity of detecting changes in protein abundance, and [3] accuracy of relative quantification of protein abundance in individual biological samples. A family of linear mixed models similar to that in MSstats (http://www.msstats.org) was fit to all the datasets. We then applied the regularization and hypothesis tests to control the false discovery rate of the selection.
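When ground truth is available, as in a simulation or a spike-in mixture, sensitivity and specificity of detecting changes in protein abundance can be computed from the standard definitions. The sketch below is a minimal illustration; the function name and the toy labels are assumptions and do not come from the study.

```python
def sensitivity_specificity(truly_changed, called_changed):
    """Return (sensitivity, specificity) for paired boolean labels.

    truly_changed:  ground truth per protein (True = abundance really changed).
    called_changed: test outcome per protein (True = change was detected).
    """
    pairs = list(zip(truly_changed, called_changed))
    tp = sum(1 for t, c in pairs if t and c)          # true positives
    fn = sum(1 for t, c in pairs if t and not c)      # false negatives
    tn = sum(1 for t, c in pairs if not t and not c)  # true negatives
    fp = sum(1 for t, c in pairs if not t and c)      # false positives
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity

# Toy example (made up): 3 proteins truly changed, 5 unchanged.
truth = [True, True, True, False, False, False, False, False]
calls = [True, True, False, True, False, False, False, False]
sens, spec = sensitivity_specificity(truth, calls)
print(sens, spec)
```

Here two of the three true changes are detected and four of the five unchanged proteins are correctly left uncalled.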

Conclusion

The results demonstrate that the proposed feature selection approach enhanced the sensitivity and specificity of the conclusions, was robust to the number of noisy fragments, and increased the correlation of subject-level quantification between SRM and DIA workflows. Importantly, its performance exceeded that of the frequently used 'top 3' approach, which uses the three spectral features with the highest average intensity across runs. Furthermore, the proposed approach outperformed correlation-based selection of informative features.
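For reference, the 'top 3' baseline described above can be sketched in a few lines. The function name, the array layout (rows = runs, columns = features), and the toy intensities are assumptions made for illustration; the selection rule itself follows the description in the text.

```python
import numpy as np

def top_k_features(intensities, k=3):
    """Indices of the k features with the highest mean intensity across runs.

    intensities: 2-D array, rows = runs, columns = spectral features.
    Missing values, if any, are expected to be encoded as NaN.
    """
    x = np.asarray(intensities, dtype=float)
    mean_per_feature = np.nanmean(x, axis=0)
    # Sort features by mean intensity, highest first, and keep the first k.
    order = np.argsort(mean_per_feature)[::-1]
    return order[:k]

# Toy data (made up): 3 runs, 5 features. Feature 2 has a spike in the last run.
runs_by_features = np.array([
    [10.0, 12.0, 8.0, 15.0, 9.0],
    [11.0, 13.0, 7.0, 14.0, 9.5],
    [10.5, 12.5, 20.0, 15.5, 9.2],
])

selected = top_k_features(runs_by_features)
print(selected)
```

In this toy example the feature with a single spike is selected because the spike inflates its average intensity, which illustrates why a rule based only on average intensity can retain a noisy feature.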

References

Clough T, Thaminy S, Ragg S, Aebersold R, Vitek O: Statistical protein quantification and significance analysis in label-free LC-MS experiments with complex designs. BMC Bioinformatics. 2012, 13: S16-
Chang CY, Picotti P, Hüttenhain R, Heinzelmann-Schwarz V, Jovanovic M, Aebersold R, Vitek O: Protein significance analysis in selected reaction monitoring (SRM) measurements. Molecular and Cellular Proteomics. 2012, 11: Article M111.014662
Choi M, Chang CY, Clough T, Broudy D, Killeen T, MacLean B, Vitek O: MSstats: an R package for statistical analysis of quantitative mass spectrometry-based proteomic experiments. Bioinformatics. 2014
Lockhart R, Taylor J, Tibshirani R, Tibshirani R: A significance test for the lasso. The Annals of Statistics. 2014, 42:

Author information

Authors and Affiliations

Department of Statistics, Purdue University, West Lafayette, IN, USA
Lin-Yang Cheng & Ching-Yun Chang
Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, 8093, Zurich, Switzerland
Yansheng Liu, Hannes Röst & Ruedi Aebersold
Faculty of Science, University of Zurich, 8057, Zurich, Switzerland
Ruedi Aebersold
Department of Computer Science, Purdue University, West Lafayette, IN, USA
Olga Vitek

Corresponding author

Correspondence to Lin-Yang Cheng.

Rights and permissions

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

About this article

Cite this article

Cheng, LY., Liu, Y., Chang, CY. et al. Statistical elimination of spectral features with large between-run variation enhances quantitative protein-level conclusions in experiments with data-independent spectral acquisition. BMC Bioinformatics 16 (Suppl 2), A4 (2015). https://doi.org/10.1186/1471-2105-16-S2-A4
