  • Meeting abstract
  • Open access

Mining the human proteome for conserved mechanisms


All cells live in ever-changing environments to which they must adapt: sensory systems provide input to regulatory systems, which integrate this information and trigger the appropriate effectors. These cascades form a highly complex cellular wiring of considerable medical importance. The widespread use of high-throughput analysis techniques has produced an unprecedented level of detail about gene expression and about cellular proteins, such as their abundance, function and localization, much of it captured in well-curated, publicly available compendia.

Despite these information-rich inventories, the adaptive nature of protein complexes and signalling cascades remains poorly understood, because the predominant current approaches are not always suited to describing associations between proteins. For example, a reported binary protein interaction does not necessarily occur in vivo, as the two proteins may be expressed in different cellular compartments or at different time points. This severely complicates the analysis of any protein interaction data. It therefore remains a challenge to determine how biological entities cooperate to regulate the cellular response to stimuli.
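The compartment argument can be made concrete with a small filter. As a minimal sketch, assuming localization annotations are available as a set of compartment labels per protein (the function name `colocalized_interactions` and the protein identifiers are hypothetical), a binary interaction is kept only when both partners share at least one compartment:

```python
# Minimal sketch (assumption): filter binary interactions by shared
# subcellular compartment. Protein IDs and annotations are hypothetical.


def colocalized_interactions(interactions, localization):
    """Keep only interactions whose partners share at least one compartment.

    interactions: iterable of (protein_a, protein_b) pairs
    localization: dict mapping protein ID -> set of compartment labels
    Proteins with no localization annotation are treated as having none,
    so any interaction involving them is dropped.
    """
    kept = []
    for a, b in interactions:
        shared = localization.get(a, set()) & localization.get(b, set())
        if shared:
            kept.append((a, b))
    return kept


if __name__ == "__main__":
    localization = {
        "P1": {"nucleus"},
        "P2": {"nucleus", "cytoplasm"},
        "P3": {"mitochondrion"},
    }
    interactions = [("P1", "P2"), ("P1", "P3"), ("P2", "P4")]
    print(colocalized_interactions(interactions, localization))
    # prints [('P1', 'P2')]
```

The temporal argument could be handled the same way by replacing compartment sets with the sets of conditions or time points in which each protein is expressed; this sketch covers localization only.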


We used an integrative method, relying on advanced pattern mining approaches, to gain a deeper understanding of protein network dynamics. To this end, we created a compendium of a large number of Homo sapiens proteomics papers that report differentially expressed proteins in cell lines. Next, we analysed this collection with frequent itemset mining to identify proteins that frequently co-occur across publications, and used these patterns as the backbone of our further analysis. The patterns of co-occurring proteins were enriched with additional attributes, such as gene expression correlation, protein localization and functional coherence metrics derived from the Gene Ontology (GO) hierarchy [1]. They were then used as a filter on top of an integrated binary protein interaction network, obtained by fusing several of the most popular interaction resources.
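The frequent itemset mining step can be illustrated with a plain Apriori sketch. Each publication is treated as a transaction containing the proteins it reports as differentially expressed, and an itemset is kept when it appears in at least `min_support` papers. The abstract does not state which mining algorithm or support threshold was used, so Apriori, the threshold, the function name `frequent_itemsets` and the toy protein identifiers below are assumptions for illustration only.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Apriori-style frequent itemset mining (illustrative sketch).

    transactions: list of sets, e.g. one set of reported proteins per paper
    min_support: minimum number of transactions an itemset must occur in
    Returns a dict mapping frozenset(itemset) -> support count.
    """
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count individual items and keep the frequent ones.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(current)

    k = 2
    while current:
        # Candidate generation: join pairs of frequent (k-1)-itemsets.
        prev = list(current)
        candidates = set()
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                union = prev[i] | prev[j]
                if len(union) != k:
                    continue
                # Apriori pruning: every (k-1)-subset must itself be frequent.
                if all(frozenset(sub) in current
                       for sub in combinations(union, k - 1)):
                    candidates.add(union)

        # Count the support of the surviving candidates.
        current = {}
        for cand in candidates:
            support = sum(1 for t in transactions if cand <= t)
            if support >= min_support:
                current[cand] = support
        result.update(current)
        k += 1
    return result


if __name__ == "__main__":
    # Hypothetical compendium: each set lists proteins reported by one paper.
    papers = [{"P1", "P2", "P3"}, {"P1", "P2"}, {"P1", "P3"}, {"P2", "P3", "P4"}]
    patterns = frequent_itemsets(papers, min_support=2)
    for itemset, support in sorted(patterns.items(),
                                   key=lambda kv: (len(kv[0]), sorted(kv[0]))):
        print(sorted(itemset), support)
```

Apriori's pruning relies on the fact that every subset of a frequent itemset is itself frequent. For compendia of thousands of papers, dedicated implementations such as FP-growth usually scale better than this naive version.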


We found that several proteins and GO functions, such as transcriptional regulation, are consistently reported and deemed significant regardless of the research topic. Furthermore, we found associations across the various "omics" levels that are conserved in a wide range of human cancers, and identified lists of frequently occurring patterns that can be used to distinguish pre- from post-metastatic tumour development.
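The abstract does not describe how the patterns were turned into a classifier. One simple way to picture the idea, assuming each tumour sample or paper is represented as a set of reported proteins, is to keep patterns whose support differs strongly between the two groups (an "emerging pattern" criterion, chosen here for illustration) and to assign a new sample to the group whose enriched patterns it matches more often. The function names, the `min_ratio` threshold and the toy data are hypothetical.

```python
def pattern_support(pattern, transactions):
    """Fraction of transactions (sets of proteins) that contain the pattern."""
    if not transactions:
        return 0.0
    return sum(1 for t in transactions if pattern <= t) / len(transactions)


def discriminative_patterns(patterns, group_a, group_b, min_ratio=2.0):
    """Split patterns into those enriched in group A and those enriched in B.

    A pattern is assigned to a group when its support there is positive and
    at least min_ratio times its support in the other group; zero support in
    the other group therefore always counts as enriched.
    """
    enriched_a, enriched_b = [], []
    for p in patterns:
        sa = pattern_support(p, group_a)
        sb = pattern_support(p, group_b)
        if sa > 0 and sa >= min_ratio * sb:
            enriched_a.append(p)
        elif sb > 0 and sb >= min_ratio * sa:
            enriched_b.append(p)
    return enriched_a, enriched_b


def classify(sample, enriched_a, enriched_b):
    """Return "A" or "B" by counting matched enriched patterns; None on a tie."""
    hits_a = sum(1 for p in enriched_a if p <= sample)
    hits_b = sum(1 for p in enriched_b if p <= sample)
    if hits_a == hits_b:
        return None  # undecided
    return "A" if hits_a > hits_b else "B"


if __name__ == "__main__":
    # Hypothetical data: group A = pre-metastatic, group B = post-metastatic.
    pre = [{"P1", "P2"}, {"P1", "P2", "P5"}, {"P1", "P3"}]
    post = [{"P3", "P4"}, {"P3", "P4", "P1"}, {"P2", "P4"}]
    patterns = [frozenset({"P1", "P2"}), frozenset({"P3", "P4"}),
                frozenset({"P1", "P3"})]
    enriched_pre, enriched_post = discriminative_patterns(patterns, pre, post)
    print(classify({"P1", "P2", "P7"}, enriched_pre, enriched_post))  # prints A
```

Counting matched patterns is the simplest possible decision rule; a real classifier would more likely weight patterns by their support difference or feed pattern presence into a standard learner.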


Pattern-based analysis across multiple "omics" levels can help identify cellular logic circuits and holds promise for many applications in biotechnology and biomedicine.


  1. Ashburner M, Ball C, Blake J, Botstein D, Butler H, Cherry J, Davis A, Dolinski K, Dwight S, Eppig J, et al: Gene ontology: tool for the unification of biology. Nat Genet. 2000, 25 (1): 25-29. 10.1038/75556.




Rights and permissions

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated.


About this article


Cite this article

Naulaerts, S., Meysman, P., Vanden Berghe, W. et al. Mining the human proteome for conserved mechanisms. BMC Bioinformatics 16 (Suppl 3), A6 (2015).
