Mining the human proteome for conserved mechanisms
© Naulaerts et al.; licensee BioMed Central Ltd. 2015
Published: 13 February 2015
All cells are subject to ever-changing environments to which they must adapt: their sensory systems provide input to regulatory systems, which integrate the information and trigger the downstream effectors. These cascades form highly complex cellular wiring of considerable medical importance. The widespread application of high-throughput analysis techniques has produced an unprecedented level of detail about gene expression and about cellular proteins, including their abundance, function and localization. This information is often captured in well-curated, publicly available compendia.
Although these information-rich inventories exist, the adaptive nature of protein complexes and signalling cascades remains poorly understood, because the currently predominant approaches are not always suited to describing associations between proteins. For example, a binary protein interaction does not necessarily occur in vivo, because the two proteins could be expressed in different cellular compartments or at different time points. This severely complicates the analysis of any protein interaction data, and it remains a challenge to determine how biological entities cooperate to regulate the cellular response to stimuli.
We used an integrative method that relies on advanced pattern mining approaches to gain a deeper understanding of protein network dynamics. To this end, we created a compendium of a large number of proteomics papers on Homo sapiens that report differentially expressed proteins in cell lines. Next, we analysed this collection with frequent itemset mining to identify proteins that frequently co-occur in publications, and used these patterns as the backbone of our further analysis. These patterns of co-occurring proteins were enriched with additional attributes, such as gene expression correlation, protein localization and functional coherence metrics derived from the Gene Ontology tree. The enriched patterns were then used as a filter on top of an integrated binary protein interaction network, obtained by fusing several of the most popular resources.
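The frequent itemset mining and network-filtering steps can be illustrated with a minimal sketch. It is not the authors' pipeline and rests on several assumptions. Each publication is reduced to the set of proteins it reports as differentially expressed. A plain Apriori-style, level-wise search is used with an absolute support threshold; the abstract does not name the mining algorithm. The filtering rule is hypothetical: an interaction is kept only when both of its proteins co-occur in at least one frequent itemset. The function names `frequent_itemsets` and `filter_network`, the publication sets and the interactions are all made up for illustration. The enrichment with expression correlation, localization and GO coherence is omitted.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Apriori-style level-wise search (illustrative choice of algorithm).

    transactions: iterable of sets, one per publication (hypothetical encoding).
    Returns {frozenset(itemset): support_count} for every itemset that occurs
    in at least `min_support` transactions.
    """
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count how many publications report each single protein.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(current)

    k = 2
    while current:
        # Candidate generation: join pairs of frequent (k-1)-itemsets and keep
        # only unions of size k whose (k-1)-subsets are all frequent.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in current for sub in combinations(union, k - 1)
            ):
                candidates.add(union)

        # Support counting: number of publications containing the candidate.
        current = {}
        for cand in candidates:
            support = sum(1 for t in transactions if cand <= t)
            if support >= min_support:
                current[cand] = support
        result.update(current)
        k += 1
    return result


def filter_network(edges, itemsets):
    """Hypothetical filter: keep an undirected interaction (a frozenset pair)
    only if both proteins appear together in some frequent itemset of size >= 2."""
    patterns = [s for s in itemsets if len(s) >= 2]
    return {e for e in edges if any(e <= p for p in patterns)}


# Toy data (made up): proteins reported per publication.
papers = [
    {"TP53", "MDM2", "CDKN1A"},
    {"TP53", "MDM2", "EGFR"},
    {"TP53", "MDM2", "CDKN1A", "GAPDH"},
    {"EGFR", "GAPDH"},
]
# Toy binary interaction network (made up), as undirected edges.
edges = {
    frozenset(("TP53", "MDM2")),
    frozenset(("TP53", "CDKN1A")),
    frozenset(("EGFR", "GAPDH")),
    frozenset(("MDM2", "EGFR")),
}

freq = frequent_itemsets(papers, min_support=2)
kept = filter_network(edges, freq)
```

With a support threshold of 2, the toy data yields five frequent single proteins, three frequent pairs and one frequent triple, {TP53, MDM2, CDKN1A}. The filter retains only the TP53-MDM2 and TP53-CDKN1A interactions, because EGFR-GAPDH and MDM2-EGFR each co-occur in just one publication.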
We found that several proteins and GO functions, such as transcriptional regulation, are consistently reported and deemed significant regardless of the research topic. Furthermore, we found associations across the various "omics" levels that are conserved in a wide range of human cancers. We also identified lists of frequently occurring patterns that can be used to discriminate between pre- and post-metastatic tumour development.
Pattern-based analysis on multiple "omics" levels can be used to identify cellular logic circuits and has many promising applications in biotechnology and biomedicine.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.