Convex-hull voting method on a large data set

Ellingson, Sally R; Wang, Chi; Nagarajan, Radhakrishnan

doi:10.1186/1471-2105-16-S15-P2

Volume 16 Supplement 15

Proceedings of the 14th Annual UT-KBRIN Bioinformatics Summit 2015

Poster presentation
Open access
Published: 23 October 2015

Sally R Ellingson^1,3, Chi Wang^2,4 & Radhakrishnan Nagarajan^1

BMC Bioinformatics volume 16, Article number: P2 (2015)

Background

Genes work in concert as a system, not as independent entities, to mediate disease states, and there has been considerable interest in understanding how molecular signatures vary between normal and disease states. The selective-voting convex-hull ensemble procedure accommodates molecular heterogeneity within and between groups, and it allows retrieval of sample-specific sets and investigation of variations in individual networks relevant to personalized medicine [1]. This work describes applying the convex-hull voting method to a large data set. Using parallelization techniques, we predict that we can execute the convex-hull voting algorithm on the University of Kentucky cluster (DLX) with a data set much too large to process in a feasible time on a single machine.

Materials and methods

Normalized RNA-seq data for 208 samples (104 matched normal/tumor pairs) from the TCGA breast carcinoma data set were downloaded and analyzed with the edgeR package, which identified 2,882 differentially expressed genes with at least a 2-fold difference between tumor and normal samples at a 1% false discovery rate. The convex-hull voting method [1] was applied to the data for these differentially expressed genes. An overview of the algorithm, including its levels of parallelism, is given in Figure 1.
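The gene-selection thresholds described above can be illustrated with a minimal Python sketch. It does not reproduce the edgeR analysis; it only applies the two stated cutoffs (a 2-fold difference, i.e. |log2 fold change| >= 1, and a 1% false discovery rate) to per-gene results. The gene names, values, and the choice of inclusive comparisons are assumptions made for illustration.

```python
# Hypothetical per-gene results: (gene, log2 fold change tumor vs normal, FDR).
# These values are invented for illustration; they are not TCGA results.
results = [
    ("GENE_A", 2.3, 0.0004),
    ("GENE_B", -1.0, 0.0090),
    ("GENE_C", 0.8, 0.0001),   # fails the fold-change threshold
    ("GENE_D", -3.1, 0.0300),  # fails the FDR threshold
]

MIN_ABS_LOG2FC = 1.0  # a 2-fold difference corresponds to |log2 FC| >= 1
MAX_FDR = 0.01        # 1% false discovery rate


def is_differentially_expressed(log2fc, fdr):
    """Apply both cutoffs; inclusive comparisons are an assumption."""
    return abs(log2fc) >= MIN_ABS_LOG2FC and fdr <= MAX_FDR


selected = [gene for gene, lfc, fdr in results
            if is_differentially_expressed(lfc, fdr)]
print(selected)  # ['GENE_A', 'GENE_B']
```

Taking the absolute value of the log fold change keeps both up- and down-regulated genes, which matches the phrase "at least a 2-fold difference" without a stated direction.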

A parallel-for loop within the R code allows multiple processors within a node to concurrently perform the voting calculations for different sample pairs within one iteration. Multiple jobs are then submitted to perform the randomized iterations. This turns a computationally intensive problem into a data-intensive one, since each iteration produces just over 6 GB of data.
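The two levels of parallelism can be sketched as follows. This is a Python analogue under stated assumptions, not the authors' R code: `placeholder_vote` is a hypothetical stand-in for the convex-hull voting calculation, the enumeration of all sample pairs and the gene-subset randomization are assumptions (the text does not specify either), and a thread pool is used only to keep the sketch portable where a real run would use separate processors.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations
import random


def placeholder_vote(pair, data, genes):
    """Hypothetical stand-in for the per-pair vote (NOT the convex-hull vote)."""
    a, b = pair
    # Count the selected genes on which sample a exceeds sample b.
    return pair, sum(1 for g in genes if data[a][g] > data[b][g])


def run_iteration(data, seed, workers=4):
    """One randomized iteration; the inner level parallelizes over sample pairs."""
    rng = random.Random(seed)
    n_genes = len(next(iter(data.values())))
    # Assumption: each iteration draws a random gene subset.
    genes = rng.sample(range(n_genes), k=max(1, n_genes // 2))
    # Assumption: votes are computed for every unordered pair of samples.
    pairs = list(combinations(sorted(data), 2))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(lambda p: placeholder_vote(p, data, genes), pairs))


# Toy data: 4 samples x 6 genes (hypothetical values).
toy = {
    "s1": [1, 5, 3, 2, 8, 0],
    "s2": [2, 4, 3, 1, 9, 1],
    "s3": [0, 6, 2, 2, 7, 3],
    "s4": [3, 3, 3, 3, 3, 3],
}

# Outer level: on a cluster each seed would be a separately submitted job;
# here the iterations simply run in a loop.
all_votes = {seed: run_iteration(toy, seed) for seed in range(3)}
```

Because each iteration depends only on its seed and the shared input data, the iterations are independent and can be submitted as separate jobs with no communication between them.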

Results

One iteration on the large data set ran in just under 34 hours, and up to 32 iterations can run concurrently. The entire run of 100 iterations on this data set took less than a week.
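The reported figures are mutually consistent, as a back-of-the-envelope check shows. The sketch below assumes that iterations run in full waves of 32 with no queue wait, and it rounds the per-iteration runtime up to 34 hours and the per-iteration output down to 6 GB; these simplifications are assumptions, not measurements from the study.

```python
import math

ITERATIONS = 100     # total randomized iterations (from the text)
CONCURRENT = 32      # maximum iterations running at once (from the text)
HOURS_PER_ITER = 34  # upper bound; the text says "just under 34 hours"
GB_PER_ITER = 6      # lower bound; the text says "just over 6 GB"

waves = math.ceil(ITERATIONS / CONCURRENT)      # 4 waves of jobs
wall_hours = waves * HOURS_PER_ITER             # 136 hours
wall_days = wall_hours / 24                     # about 5.7 days
serial_days = ITERATIONS * HOURS_PER_ITER / 24  # about 141.7 days if run one at a time
total_gb = ITERATIONS * GB_PER_ITER             # at least 600 GB of output

print(f"{waves} waves, {wall_days:.1f} days wall time, "
      f"{serial_days:.1f} days serially, {total_gb} GB output")
```

Four waves of at most 34 hours each come to under six days, in line with the reported runtime of less than a week, while the same work run serially would take more than four months.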

Conclusions

Future work will involve parallelizing all of the computationally and data-intensive steps in a way that reduces the complexity of job submission and improves the scalability of the entire job. Computing paradigms such as Hadoop are being explored for this task.

References

1. Nagarajan R, Kodell RL: A Selective Voting Convex-Hull Ensemble Procedure for Personalized Medicine. AMIA Summits on Translational Science Proceedings. 2012, 2012: 87-94.

Acknowledgements

This research was supported by the Cancer Research Informatics and the Biostatistics and Bioinformatics Shared Resource Facilities of the University of Kentucky Markey Cancer Center (P30CA177558) and the University of Kentucky Center for Computational Sciences.

Author information

Authors and Affiliations

1. Division of Biomedical Informatics, College of Public Health, University of Kentucky, Lexington, KY, 40536, USA
Sally R Ellingson & Radhakrishnan Nagarajan
2. Division of Cancer Biostatistics, College of Public Health, University of Kentucky, Lexington, KY, 40536, USA
Chi Wang
3. Cancer Research Informatics Shared Resource Facility, Markey Cancer Center, Lexington, KY, 40536, USA
Sally R Ellingson
4. Biostatistics and Bioinformatics Shared Resource Facility, Markey Cancer Center, Lexington, KY, 40536, USA
Chi Wang

Corresponding author

Correspondence to Sally R Ellingson.

Rights and permissions

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

Cite this article

Ellingson, S.R., Wang, C. & Nagarajan, R. Convex-hull voting method on a large data set. BMC Bioinformatics 16 (Suppl 15), P2 (2015). https://doi.org/10.1186/1471-2105-16-S15-P2
