
A systematic review of biologically-informed deep learning models for cancer: fundamental trends for encoding and interpreting oncology data



There is an increasing interest in the use of Deep Learning (DL) based methods as a supporting analytical framework in oncology. However, most direct applications of DL will deliver models with limited transparency and explainability, which constrain their deployment in biomedical settings.


This systematic review discusses DL models used to support inference in cancer biology with a particular emphasis on multi-omics analysis. It focuses on how existing models address the need for better dialogue with prior knowledge, biological plausibility and interpretability, fundamental properties in the biomedical domain. For this, we retrieved and analyzed 42 studies focusing on emerging architectural and methodological advances, the encoding of biological domain knowledge and the integration of explainability methods.


We discuss the recent evolutionary arc of DL models in the direction of integrating prior biological relational and network knowledge (e.g. pathways or protein-protein interaction networks) to support better generalisation and interpretability. This represents a fundamental functional shift towards models which can integrate mechanistic and statistical inference aspects. We introduce the concept of bio-centric interpretability and, according to its taxonomy, discuss representational methodologies for the integration of domain prior knowledge in such models.


The paper provides a critical outlook on contemporary methods for explainability and interpretability used in DL for cancer. The analysis points in the direction of a convergence between encoding prior knowledge and improved interpretability. We introduce bio-centric interpretability, which is an important step towards the formalisation of the biological interpretability of DL models and towards developing methods that are less problem- or application-specific.



There is an increasing interest in the use of Deep Learning (DL) based methods as a supporting analytical framework in oncology. Recent works have articulated the potential applied impact of DL-based methods in oncology, including drug response prediction [1, 2] and cancer diagnosis or prognosis [3,4,5,6,7,8], and the overall impact of this emerging analytical substrate in delivering the vision of precision and personalised medicine [4, 9]. Despite not being mainstream methods at this point, these architectures point in the direction of addressing existing paradigmatic analytical gaps currently faced by more traditional inference frameworks, including the tension between small study cohorts and the increasingly complex sets of features available per patient (\(p \gg n\)).

However, most direct applications of DL will deliver models with limited transparency and explainability, which constrain their deployment in biomedical settings. In this systematic analysis we tackle an aspect commonly acknowledged but left almost untouched, namely: how authors understand and use the definition of biological interpretability, and how it dialogues with the growing spectrum of biologically-informed models, which integrate prior biological knowledge within existing DL frameworks. This paper provides a systematic review focused on omics-based DL models used in cancer biology, highlighting the dialogue and convergence between biologically-informed models, explainable AI (XAI) and biological interpretability. In this sense, it complements recent surveys on Machine Learning (ML) methods [10,11,12] and DL methods [13, 14] developed for biomarker identification. Moreover, it supports the argument in favour of the integration of multi-omics data within AI pipelines, which is already regarded as important and advantageous over single-omics data (more on this topic in [15,16,17,18]). We perform a systematic review, identifying the motifs within emerging architectures: the domain knowledge integrated in the design of the models, data representation aspects, and emerging architectural patterns, ranging from biological networks and graphs to embedding models. Finally, we introduce the concept of bio-centric interpretability in DL models, which augments contemporary explainable AI (XAI) taxonomies and emerges as a fundamental property and desideratum of biologically-informed DL.

Addressing the above mentioned gaps, we defined the following research questions:

  1. What are the perspectives of interpretability across different DL-based frameworks within the cancer research domain?

  2. What are the methods that deliver biological interpretability?

  3. What are the desirable approaches to the integration of domain knowledge in the models’ architecture?

  4. What are the emerging representation paradigms within these models?

In addition to recent surveys in Explainable AI (XAI) (i.a. [19,20,21,22,23,24,25]), XAI in the field of genomics [15, 16, 26,27,28,29] and medicine [13, 30,31,32,33,34,35,36], we highlight a much more specific sub-field, aiming to link explainability and biological interpretability. Explainability is often used interchangeably with interpretability; however, a distinction must be made, as the second operates on the product of the first. Explainability refers to a collection of features from the interpretable domain that contribute to the production of an abstract statement. Interpretability refers to mapping this statement into a domain that the human expert can perceive, comprehend and understand [37].

Recent works in the area of XAI provide an extensive discussion on the properties and desiderata of explainability methods [38,39,40,41,42,43,44]; however, they do not discuss their reception by a specific user: the biomedical expert. In the area of medical XAI, according to Holzinger et al. [45], the main frontier topics for satisfying the need for trustworthiness at multiple levels of the medical workflow are: explainability methods and their verification, the inference of complex networks, and graph-based causal models and counterfactuals. Our proposed concept of bio-centric interpretability encapsulates all of them.

This systematic review is restricted to the context of multi-omics-based DL in cancer biology, excluding papers from the computer-vision subarea. The sub-field of ML and AI in biomarker identification is discussed in [10,11,12,13, 46]. A recent study by Dhillon et al. [14] examines state-of-the-art feature selection, ML and DL approaches to uncover markers in single- and multi-omics data. Zhao et al. [47] investigate the aspect of reproducibility in models applied to transcriptomics data. For a review of sequence-to-activity and sequence-based DL models, the reader is referred to [29]. For a critical introduction to the application of interpretable genomics, see [48]. Another sub-field, DL for drug response prediction, is discussed in [49,50,51,52].

The importance and advantages of integrating multi-omics data into AI algorithms, over single-omics data, are presented in [15,16,17,18]. A summary of recent data integration methods and frameworks is available in [53]. Alharbi and Rashid [54] catalogue DL tools/software across subareas of genomics for various predictive tasks, and discuss the data types in genomics assays, providing guidance on which DL architecture to use. Mo et al. [55] discuss data integration and contrast DL methods with mechanistic modelling.

These reviews provide a comprehensive overview of current DL modelling techniques and their existing genomic applications. However, the above-mentioned works do not elaborate specifically on the integration of domain knowledge into the model and its impact on interpretability. In this review, we focus on the dialogue between post-hoc explainability (regarding the model’s internal mechanisms and the interpretation of its output) and the encoding of prior biomedical knowledge, thus discussing the contribution of AI to supporting the understanding of oncogenic processes, in particular the methods for integrating existing domain knowledge (DK) into DL models (Additional file 1: Table S1). We highlight the dialogue between explicit and latent representations.

Fig. 1

Current mechanisms to restore or increase biological interpretability. There is a negative correlation between the interpretability of the DL model and the size of the feature space. However, the integration of domain knowledge can systematically address this dimensionality issue. Domain knowledge can be distinguished between expert-level knowledge and knowledge derived from databases. The motifs for integrating domain knowledge within emerging architectures were identified: the domain knowledge (a) is integrated in the design of the models, (b) is integrated in the input data pre-processing, (c) is integrated in the post-hoc analysis process

The paper is organized as follows: First, we substantiate the concept of bio-centric interpretability and explain its three key aspects and four main components. Second, we define a taxonomy for the integration of domain knowledge into models, which is specific to biologically-informed DL models. Then, we provide a detailed review of the DL models for cancer: their architectural patterns, methods of domain knowledge integration and interpretability, and observed trends. We describe 42 selected papers divided into thematic blocks that correspond to the new concept and proposed taxonomy. In the Discussion we highlight the prevalence of graph representations, sparse connections as a key design feature, and improved support for biomarker discovery. Then, we summarize the findings specifically in the context of the four research questions. The paper concludes with a summary of the main findings and future perspectives in the field of DL models for cancer. The last section, Methods, contains the details regarding the papers’ selection criteria and the data extraction form. A diagrammatic outline of the discussion is depicted in Fig. 1.


The search of the electronic bibliographic databases (PubMed and Web of Science) identified 661 records, which were reduced to 591 after removing duplicates. The 591 records were screened on the basis of prespecified inclusion criteria, resulting in 176 records. All these potentially relevant articles were read in full text. The reasons for exclusion were as follows: the paper provided methods not directly linked to cancer and to functional analysis/insights on biological processes; provided DL- or ML-based models using clinical/laboratory data alone; was based on microarray data; or developed a sequence-based algorithmic framework. A list of eligible studies was created and resulted in 42 studies (Additional file 1: Table S1). The PRISMA checklist is provided as Additional file 2: Table S2.

Emerging methodological paradigm: bio-centric model interpretability

Explainability and interpretability are considered key desiderata of machine learning (ML) models (e.g. [22]). They are thought to prevent the risks of misuse of ML models embedded in healthcare applications. Model transparency and explainability are required to deploy AI-derived biomarkers in clinical settings. In addition, the transparency of interpretable methods can minimise the risks in AI-based decision-making in healthcare applications. It is by definition impossible to appeal against decisions resulting from a DL model if they are not presented in an understandable manner and cannot be explained in biomedical terms and grounded in current biomedical reasoning. In biomedicine, the predictions, and the metrics calculated from these predictions alone, are insufficient to characterise the model.

The existence of multiple types and definitions of models’ interpretability makes it difficult to formulate a precise definition of biological interpretability in a cancer biology setting. When is it valid to say that the ML model used in cancer biology is interpretable? The lack of a formal definition needs to be addressed and points in the direction of an unmet research gap. Benk and Ferrario [56] introduced three different dimensions of the need for interpretation: epistemic, pragmatic, ethical. In biology, the impact of these models from a scientific epistemology setting needs to be considered as, at their limit, emerging AI methods bring the promise of integrating heterogeneous evidence and mechanistic and statistical inference paradigms. These methods can ultimately impact fundamental notions of what constitutes a valid scientific argument, bringing alternative perspectives to the notion of statistical significance.

Despite high demand, interpretability remains one of the biggest challenges for bringing these models into a real-world setting. In the AI and ML fields, there is a well-known trade-off between how well the model performs and how well people are able to interpret it [40, 56, 57]. Additionally, there is no consistent agreement on definitions of interpretability. One of its definitions directly refers to the components of interpretable models such as transparency (‘how does the model work?’) and post-hoc explanations (‘what else can the model tell me?’) [40]. It identifies two main objects for interpretation: i) the internal mechanisms, i.e. how the models compute their outcomes, and ii) the outcomes generated by the model. Similarly, according to known taxonomic accounts [32], interpretability can be: algorithmic-centric, focusing on the inner-working of the model; or output-centric, highlighting the model agnostic post-hoc analysis.

In the context of DL, we replace algorithmic-centric with architecture-centric interpretability. We argue that more emphasis and inference is placed on the structure of the model rather than on the learning process of the DNN (via the backpropagation algorithm).

In order to derive biological insights from the model, the interpretation of a biological expert is required, regardless of whether an architecture-centric or output-centric approach is taken. Both need to favour mapping biological mechanisms to the model’s components, aiming to deliver an interpretation for the intended end user (i.e. biologist, oncologist) which relies more on biological knowledge than on DL or mathematical knowledge. More specifically, a preferable format of model transparency would be a biological mechanism integrated in the model’s architecture (e.g. a gene activation pathway) or calculations mimicking biological processes (e.g. mimicking typical molecular biology assays that study functional genomics), in complement to state-of-the-art explainability methods borrowed from other fields. Some formats have already been successfully applied to transcriptomic data, such as the integration of DK of gene modules, or the integration of hierarchical information about molecular subsystems involved in cellular processes. Such models provide informative biological interpretation of the predictions by studying the activation of the various subsystems embedded in the model architecture; moreover, they make it possible to infer the activity of latent factors as a priori characterized gene modules. The interpretation of the biological expert allows for the evaluation of biological plausibility and the satisfiability of biological constraints.

Hence, in this paper we revisit the notion of interpretability to ground it in a biomedical context, introducing the concept of bio-centric interpretability. It encompasses three key aspects which lead to a biological understanding of the investigated problem and to new insights:

  • Architecture-centric interpretability

  • Output-centric interpretability

  • Post-hoc evaluation of biological plausibility

We argue that evaluating a DL model with regard to bio-centric interpretability requires an analysis of all these aspects at once. These three aspects are evaluated via the analysis of four bio-centric interpretability components:

  • The integration of different data modalities

  • The schema level representation of the model

  • The integration of domain knowledge

  • Post-hoc explainability methods

The integration of different data modalities.

Cancer is a complex and multi-faceted disease with a landscape of features that can, separately or together, influence treatment responses and patient prognosis. Important biological relations can be expressed in more than one data modality; e.g. potential cancer driver genes can be represented through the integration of copy number, DNA methylation and gene expression data. Therefore, combining different data modalities in the DL model, including different types of omics data, is inevitable as the field evolves, and inherent whenever biological processes are modelled. Only then can a biologically-informed model reveal both established and novel molecularly altered candidates which can be implicated in predicting advanced disease.

Schema level representation of the model.

Understanding the data flow in the model is crucial for the post-hoc interpretation by an expert user. This is naturally affected by how the data is represented in the subsequent components of the model. Usually, collected multi-omics data is stored in tables (matrices). However, over a series of computation steps, the representation can change into graphs, networks, eigenvalues or eigenvectors, among others. Each representation has its own specific properties and is processed by specific architectural elements in the model, e.g. Graph Neural Networks and Graph Convolutional Networks (for graphs). Thus, in the context of bio-centric interpretability, it is crucial to understand these representations, how they transform, and how to communicate such transformations during the post-hoc inference. The underlying dialogue between the input data model and the architectural structure of the model requires a schema-level representation, which then allows for domain expert interpretation and inference.

The integration of domain knowledge.

A key component, which significantly impacts all three interpretability aspects, is the integration of domain knowledge into the model. A biologically-informed DL model can and should make use of databases that contain an abundance of known biological relations. Later in the paper, we compare and contrast emerging approaches to DK integration and its close dialogue with DL architectures, indicating which models have the highest potential for improving bio-centric interpretability.

Post-hoc explainability methods.

An inherent property of a DL model is its ability to derive latent features, reflected in a large space of weighted connections between neurons. Even when the model’s architecture resembles biological relations, post-hoc explainability methods must be applied to allow for tracking back the information flow, highlighting the importance (and unimportance) of the model’s components. More specifically, when investigating an individual output it is necessary to identify the key neurons, connections or layers that most impact the prediction, as well as those that do not.

Encoding domain priors: Improving bio-centric interpretability and integrating relational knowledge

In cancer, AI/ML is emerging as a methodological enabler to transform omics data into biomarker panels that can diagnose, predict or report on the effectiveness of interventions in the disease. More recently, some of these methods have concentrated on the integration of symbolic-level, explicit domain knowledge into the models. Domain knowledge can be understood as the information so far accumulated in a given field (here: pathways, PPI networks, Gene Ontology), usually expressed as known relational knowledge. In many cases, this knowledge is available in well-known curated databases and expressed in canonical data models that can be integrated into a computational pipeline. The taxonomy for explicit knowledge integration within the informed ML framework proposed by von Rueden et al. [58] includes: (i) the source of knowledge; (ii) the representation of knowledge; and (iii) the integration of knowledge into the ML pipeline. Each dimension contains a set of elements showing different approaches that can be observed in the prior literature. Knowledge sources can be classified according to their degree of formality, ranging from rigorously expressed scientific knowledge (derived from any scientific discipline) to expert-derived statements (mapping, for example, clinical experience). In between lies more general scientific knowledge (also called world knowledge), situated at a basic level of expertise within the domain (e.g. that the body is composed of cells, that DNA resides inside the cell nucleus, that cancer is a disease of the genome); we found such general scientific knowledge not relevant in the context of this work.

Domain knowledge can be integrated into the model to improve its consistency, reliability and biological plausibility, as well as to support better generalisation. As proposed by von Rueden et al. [58], this can be done in a variety of ways, such as incorporating DK into the training data (e.g. via pre-processing or established relational data), the hypothesis set (e.g. sparse connections between neurons), the learning algorithm (e.g. the cost function), and the final hypothesis (e.g. the model’s architecture). On the other hand, DK is needed in order to extract the scientific outcome from the model or from individual elements of the model, and/or to explain such an outcome. For example, based on DK, the contributions of specific model components can be better localised and investigated.

In addition, DK can be used in a post-hoc setting, where the scientific credibility and consistency of the results are cross-validated within existing knowledge. Results that do not match the existing knowledge can be rejected or flagged as incorrect or suspicious, so that the final result is consistent with prior knowledge.

In this paper we define a taxonomy which is more specific for biologically-informed DL models (inspired by von Rueden [58]). We suggest three main categories of DK integration as:

  • Input data pre-processing (PRE) - DK is used to enrich or augment the input data, which results in a change of data representation. Scaling and normalisation are excluded from this category.

  • Architecture definition (ARCH) - DK explicitly impacts the model architecture, such as the connections between neurons and layers.

  • Post-hoc comparison (POSTHOC) - DK is used to investigate and explain the outcome of the model: the outcome is processed and compared to current, known biological relations.

Multiple types of DK integration can be observed in a single model.
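As a minimal illustration of the ARCH category, the sketch below (with assumed gene and pathway names, not drawn from any reviewed study) shows how a gene-pathway membership mask can restrict the connectivity of a layer, so that each pathway node aggregates only its member genes:

```python
import numpy as np

# Hypothetical gene-to-pathway membership (illustrative names):
# mask[i, j] = 1 iff gene i belongs to pathway j.
genes = ["TP53", "BRCA1", "EGFR", "KRAS"]
pathways = ["p53_signaling", "RTK_RAS"]
mask = np.array([
    [1, 0],  # TP53  -> p53_signaling
    [1, 0],  # BRCA1 -> p53_signaling
    [0, 1],  # EGFR  -> RTK_RAS
    [0, 1],  # KRAS  -> RTK_RAS
], dtype=float)

rng = np.random.default_rng(0)
weights = rng.normal(size=mask.shape)

def sparse_layer(x, weights, mask):
    """A layer whose connectivity is restricted to known gene-pathway
    relations: masked-out weights carry no signal."""
    return x @ (weights * mask)

x = rng.normal(size=(1, len(genes)))       # one sample, gene-level features
pathway_activations = sparse_layer(x, weights, mask)
print(pathway_activations.shape)           # (1, 2): one node per pathway
```

Because each hidden node corresponds to a named biological entity, attributions computed post-hoc on such a layer map directly onto pathways rather than onto anonymous neurons.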

Of note, a pre-requisite of developing any DL model in cancer biology is to understand the target domain, needed at least to define the input and output, and to qualitatively or quantitatively evaluate this output. While acknowledging the expert knowledge of the authors of the models, we do not consider it as explicitly integrated domain knowledge. We consider DK integration post-hoc when the output is compared with information derived from external knowledge, or when the representation of the output is changed (e.g. a vector to a graph) by using DK, so that the biological plausibility can be validated.

Fig. 2

Bio-centric interpretability scheme in the overview of a biologically-informed DL model. Grey boxes indicate the three interpretability aspects

An outline of the three categories of DK integration is shown in Fig. 2. The results for selected papers according to proposed taxonomy are summarized in Additional file 1: Table S1.

Trends in DL models for cancer

Fig. 3

Data representation paradigms and the impact of the integration of domain knowledge. Domain knowledge (DK) can be derived from a database (blue blocks) or from an expert (yellow blocks). DK can be used in pre-processing and data augmentation before the training process. DK from databases can be represented in two ways: (A) as a step in the pre-processing of the input data, before the training process. This first paradigm has emerged for the representation of multi-omics data, which are transformed into graphs or networks and fed into a GNN or GCN; it has been applied in DL models such as struc2vec, GLUE, and several GCN and CNN models. (B) As an inductive bias when creating the neural network architecture, defining the connections between nodes in layers. In this case, DK impacts the training process as it affects the back-propagation. This paradigm has emerged mainly for the representation of multi-omics data, which are fed into a sparsely connected deep neural network where connections are defined by biological relations; it has been applied in DL models such as VNN, PNET, KPNN, VAE and CNN

A prominent explanation for the high heterogeneity observed in cancer may be the organisation of genes into various signalling/regulatory pathways and protein complexes. Cellular-level processes and responses are carried out by spatially and temporally organized sets of interacting entities such as proteins or RNA molecules. It is fundamental to understand how these interactions lead to biological processes. The conventional approach to studying biological processes is based on molecular interaction networks between individual biological molecules, represented as nodes with edges describing the interactions between a pair of nodes [59, 60]. There are multiple types of biological interaction networks that represent different biological mechanisms and are based on different types of interaction [61]. Many of these biological interactions are publicly available through various specific databases such as KEGG [62] and Reactome [63], among others. They can be leveraged as DK to deliver a mechanistic and relational inference component which can be integrated into a statistical-probabilistic framework (Fig. 3).

Pathway-level representations subsume sets of pathway genes into pathway nodes, together with the interactions between the individual genes; these interacting genes are collectively involved in biological processes, such as cell proliferation and death. Thus, malfunction of the pathways can lead to disease. Taking into account the topology of gene interactions as prior knowledge may further help to characterise new genes or disease modules. Many network models have been developed to use known gene-gene interactions for prediction, based on the assumption that interacting genes tend to produce similar phenotypes. New biomarkers discovered by a DL model can be tracked inside the model more easily when the model’s design conforms to biological relations.

Biological pathways can be integrated as curated knowledge of molecular relation, reaction and interaction networks, covering metabolism, cellular processes, organismal systems and human diseases, and they are widely used to analyse omics data. The pathway construction function can follow either a data-driven objective (DDO) or a knowledge-driven objective (KDO) [64]. The former is used to establish gene or protein associations identified in a particular experiment; the latter is associated with the development of a detailed knowledge base for specific areas of interest. There are various approaches to mapping the organisation of cellular functions using molecular interaction networks in which the edges represent interactions between genes, proteins or metabolites. Protein-protein interaction (PPI) data are used to construct networks of reactions important for the regulation and implementation of most biological processes, in which proteins have been shown to interact with functionally related proteins. Such an organisation results in the emergence of ‘functional modules’, i.e. functionally related sub-networks in which there is a statistically significant aggregation of nodes with an associated cellular function. Co-expression data, genetic interaction data and combined data types have also been used to generate similar molecular interaction networks.

Data augmentation with domain knowledge

In this subsection we focus on domain knowledge being used to pre-process the input data in order to change its representation by enrichment or augmentation: from measured omics values as matrices into pathways, networks and graphs (Fig. 3A). First, we discuss how the knowledge of pathways derived from databases was integrated into the model in the reviewed studies.

At the input level, pathways are mapped to scores, graphs or images. Oh et al. [65] demonstrated a method called PathCNN to build an interpretable CNN model using multi-omics data, including mRNA expression, copy number variation (CNV) and DNA methylation from the cBioPortal database. Information about pathways, together with the associated genes in each omics type, is extracted from the KEGG database. Input data at the gene level is converted into pathway-level profiles, and then Principal Component Analysis is applied to extract 3 principal components (PCs). Then, vectors containing the PCs of all pathways are represented as a pathway image of a sample (a set of pixels) combining all multi-omics data. These images are the input to the CNN model. As an explainability method, Grad-CAM [66] was used to identify pathways impacting cancer survival predictions by identifying the parts of an image that are most discriminative. The authors assumed that relevant pathways were more likely to be detected if they were grouped together on the pathway images. They managed to highlight the pathways (‘pixels’) that were of importance for the prediction of long-term survival of glioblastoma patients.
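The pathway-image construction can be sketched as follows, using synthetic data in place of real omics profiles (a simplification for one omics type, not the authors' implementation):

```python
import numpy as np

rng = np.random.default_rng(1)
n_samples, n_pathways, genes_per_pathway, n_pcs = 20, 6, 15, 3

# Synthetic stand-in for one omics type: a per-pathway gene-level matrix.
omics = [rng.normal(size=(n_samples, genes_per_pathway)) for _ in range(n_pathways)]

def pathway_pcs(x, n_pcs):
    """Project one pathway's gene matrix onto its first principal
    components (PCA via SVD of the column-centred matrix)."""
    xc = x - x.mean(axis=0)
    _, _, vt = np.linalg.svd(xc, full_matrices=False)
    return xc @ vt[:n_pcs].T                 # (n_samples, n_pcs)

# Rows = pathways, columns = PCs: one greyscale 'pathway image' per
# sample, which can then be fed to a CNN.
images = np.stack([pathway_pcs(x, n_pcs) for x in omics], axis=1)
print(images.shape)                          # (20, 6, 3)
```

In the full method, stacking the per-omics images as channels yields one multi-omics image per patient, which is why a pixel-attribution method such as Grad-CAM can point back to pathways.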

Another model which allows for the integration of multi-omics data at the pathway level was proposed by Lemsara et al. [67]. In this multi-modal sparse denoising autoencoder model, multi-omics features are mapped to NCI pathways. Each pathway is represented as a score obtained via an autoencoder, after which bi-clustering is applied. The model clusters patients based on multi-omics data types, including gene expression, miRNA expression, DNA methylation and CNV data. The SHAP method is used ‘to understand the impact of individual omics modalities and features on the autoencoded score [...] learned for each pathway’ [67].

Lee et al. [68] proposed a DL model for cancer subtype classification, which used 287 pathways retrieved from the KEGG database. Pathways were used to build graphs in which the nodes represent genes and the edges represent molecular interactions between genes in the pathway. Gene expression profiles from RNA-seq were mapped to the nodes, each represented as a vector. To model each pathway, they used a graph convolutional neural network (GCN), which can capture localised patterns in data and consider interactions among genes. In this way, they built multiple GCNs, one for each of the 287 pathways. Then, a multi-attention based ensemble combines all the pathway models into a single one through two attention levels (pathway-level and ensemble-level). This is followed by a multi-layer perceptron (MLP) for the cancer subtype classification task. The attention mechanism allows for highlighting pathways that are important for the classification, and falls into the ARCH category as the notion of pathways directly impacts the model’s architecture. In addition, DK is used POSTHOC to explain the differences in gene expression and gene interactions between subtypes in terms of pathways. The authors used the network propagation method on a pathway-PPI network, where the PPI was derived from the BioGRID database.
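The attention-based pooling over per-pathway models can be sketched generically as below (an illustrative formulation, not the authors' code; the embeddings and query vector are assumed):

```python
import numpy as np

rng = np.random.default_rng(6)
n_pathways, dim = 5, 8

# Assumed per-pathway embeddings, e.g. one vector from each pathway GCN.
E = rng.normal(size=(n_pathways, dim))
q = rng.normal(size=dim)                 # learnable attention query (illustrative)

def attention_pool(E, q):
    """Pathway-level attention: the softmax weights both combine the
    pathway models and indicate which pathways drive the prediction."""
    logits = E @ q
    w = np.exp(logits - logits.max())    # numerically stable softmax
    w = w / w.sum()
    return w, w @ E                      # (weights, pooled representation)

w, pooled = attention_pool(E, q)
print(w.round(3))                        # pathway importance scores, sum to 1
```

The pooled vector would feed the downstream MLP classifier, while the weights `w` provide the pathway-level importance signal discussed above.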

PPI networks as a prevalent type of graph-based input. An example of a DL model for the integration and analysis of multi-omics data is DeepMOCCA [69]. DeepMOCCA is a survival prediction model which integrates DK using PPI networks to transform the input data representation into a graph. The PPI networks are obtained from the STRING database. The multi-omics data is mapped onto the nodes, which represent combinations of genes, transcripts and proteins; the edges reflect physical and other functional interactions between them. The graph is then the input to a GCN with a graph attention mechanism. Additionally, as POSTHOC DK integration, cancer driver genes listed in the COSMIC database [70] are used to interpret the averaged rank derived from the attention mechanism. By looking at genes with repeatedly high scores across samples but not yet reported as cancer genes, the attention mechanism allows for the generation of new hypotheses. Therefore, DeepMOCCA allows for the identification of prognostic markers and cancer driver genes. The authors of DeepMOCCA [69] also investigated the sample representations in the hidden layer of the network (before the Cox regression) with t-SNE visualization [71] and compared their similarity across cancer types. They suggested that this kind of analysis in reduced dimensional spaces could support patient stratification.
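The core graph-convolution step over such a PPI-derived graph can be sketched as follows (toy adjacency and features, using the standard Kipf-Welling propagation rule rather than DeepMOCCA's exact implementation):

```python
import numpy as np

# Toy PPI graph (assumed edges, for illustration): 4 molecular nodes.
A = np.array([
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)

# Multi-omics measurements attached to each node,
# e.g. (expression, methylation, CNV).
X = np.random.default_rng(2).normal(size=(4, 3))

def gcn_layer(A, X, W):
    """One graph-convolution step: add self-loops, symmetrically
    normalise the adjacency, mix neighbour features, apply ReLU."""
    A_hat = A + np.eye(len(A))
    d_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
    return np.maximum(d_inv_sqrt @ A_hat @ d_inv_sqrt @ X @ W, 0.0)

W = np.random.default_rng(3).normal(size=(3, 2))
H = gcn_layer(A, X, W)
print(H.shape)   # (4, 2): each node embedding aggregates neighbour omics signals
```

Because each node is a named gene/transcript/protein, attention scores computed over these embeddings can be ranked and compared against external lists such as COSMIC, as described above.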

Similarly, Chuang et al. [72] used the PPI network to change the input representation. However, their model maps the PPI network into 2D space using spectral clustering and combines it with gene expression data to generate images of cancer-related networks for different types of cancer, which serve as input to a CNN model. More specifically, the adjacency matrix (from the PPI network) is reduced to two dimensions via its leading eigenvectors and represented as 2D images. A CNN model is then trained for cancer type classification. Unfortunately, spectral clustering makes tracing the signal back to individual input features very difficult; this computational step significantly reduces the model's interpretability.
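
A minimal sketch of this kind of spectral mapping is shown below, on a toy adjacency matrix of our own; the actual pipeline of Chuang et al. differs in scale and detail, so this only illustrates the general idea of turning a network plus expression values into an image.

```python
import numpy as np

# Toy symmetric adjacency for a small "PPI network" (illustrative, not real data).
A = np.array([[0, 1, 1, 0, 0],
              [1, 0, 1, 0, 0],
              [1, 1, 0, 1, 0],
              [0, 0, 1, 0, 1],
              [0, 0, 0, 1, 0]], dtype=float)

# Normalised graph Laplacian: L = I - D^{-1/2} A D^{-1/2}
deg = A.sum(axis=1)
D_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
L = np.eye(len(A)) - D_inv_sqrt @ A @ D_inv_sqrt

# The eigenvectors of the two smallest non-trivial eigenvalues give 2D
# coordinates for every gene (the classical spectral embedding).
vals, vecs = np.linalg.eigh(L)       # eigenvalues in ascending order
coords = vecs[:, 1:3]                # skip the trivial constant eigenvector

# Rasterise into a small grid, "painting" each gene's expression value at its
# 2D coordinate — the resulting image can be fed to a CNN.
expr = np.array([2.0, 0.5, 1.0, 3.0, 0.1])
grid = np.zeros((8, 8))
norm = (coords - coords.min(axis=0)) / (np.ptp(coords, axis=0) + 1e-12)
ix = (norm * 7).astype(int)
for g, (i, j) in enumerate(ix):
    grid[i, j] += expr[g]
```

The grid preserves the total expression mass while arranging genes so that network neighbours tend to land near each other, which is what gives the CNN localised structure to exploit.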

Another DL model integrating PPI networks was developed by Chereda et al. [73]. They use the PPI network from the HPRD database [74, 75] to structure the gene expression data. The input data is transformed into a graph and used in a GCN model, which is trained to classify expression profiles from breast cancer patients as metastatic or non-metastatic. They developed Graph Layer-wise Relevance Propagation to interpret the outputs of the GCN, and used this explainability method to build patient-specific subnetworks containing the genes that contribute the most to a prediction.

Ramirez et al. [76] investigated four GCN-based models for expression-based cancer type classification (assigning samples to a cancer type or normal tissue). The input graphs were generated from the co-expression (using Spearman correlation), co-expression+singleton, PPI, and PPI+singleton networks from the STRING database [77]. As an interpretability method, they use an in silico perturbation procedure: gene expression is successively set to 0 or 1 before passing through the model, and the effect of this manipulation on prediction accuracy is examined. The more important a gene is for the classification, the greater the observed change in accuracy. This effect is captured with what the authors call a gene-effect or contribution score, defined as 'the larger prediction accuracy change of the labeled cancer type', and calculated for each gene across all classification labels (33 tumor types plus normal).
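
The perturbation procedure itself is simple enough to sketch. Here a fixed linear map stands in for the trained classifier (Ramirez et al. used a trained GCN); all names and numbers are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in for a trained model: a fixed linear classifier whose own outputs are
# treated as the ground-truth labels, so the baseline accuracy is 1.0.
W = rng.normal(size=(10, 3))            # 10 genes, 3 classes
X = rng.normal(size=(50, 10))           # expression matrix, 50 samples
labels = np.argmax(X @ W, axis=1)

def accuracy(X_in):
    return (np.argmax(X_in @ W, axis=1) == labels).mean()

base = accuracy(X)

# Contribution score: for each gene, clamp its expression to 0 and to 1 and
# record the larger resulting drop in prediction accuracy.
scores = []
for g in range(X.shape[1]):
    drops = []
    for v in (0.0, 1.0):
        Xp = X.copy()
        Xp[:, g] = v
        drops.append(base - accuracy(Xp))
    scores.append(max(drops))

# Genes with the largest score matter most for the classification.
ranking = np.argsort(scores)[::-1]
```

In the real setting the same loop runs per class label, yielding a gene-by-label contribution table rather than a single ranking.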

Schulte-Sasse et al. [78] combined three omics data types with a gene-gene interaction network and a PPI network from ConsensusPathDB (CPDB). DK was integrated both in PRE and to assign labels in the dataset. First, a gene-gene interaction network is created, in which weak correlations are discarded based on DK from the PPI network. This graph is the input to a GCN trained to predict whether a gene is associated with the disease or not. To derive a collection of positive and negative labels for genes in the dataset (y, the true labels), the Network of Cancer Genes (NCG), COSMIC, OMIM and KEGG are used. As both the output of the model and the true labels depend on the integrated DK, the POSTHOC category is also assigned to this model. The authors demonstrate that including the interaction networks with a GCN classifier helps to classify and predict novel genes as well as entire disease modules. Using Layer-Wise Relevance Propagation (LRP) [79], they are able to dissect which features drive the classification of a gene as a driver or not, and to identify, for each gene, the neighboring interacting genes that most influence its classification. This results in sub-modules consisting of directed graphs of gene-gene LRP contributions. As an illustration, this revealed that important neighboring genes of the cancer gene SAPCD2 are enriched for other drivers, suggesting that PPIs between these genes are important for the classification.

Liu et al. [80] developed a network-embedding-based stratification method (NES). The method constructs patient vectors based on a network embedding of the PPI network; more specifically, the struc2vec [81] embedding approach is used. Although this provides relatively good performance in classifying patient subtypes from large-scale somatic mutation profiles, the method lacks interpretability. The authors do not attempt to analyse the inner workings of the model, which may be due to the struc2vec embedding of the input graph making such inference very difficult.

Liu and Xie [82] developed TranSynergy to predict synergistic drug combinations for cancer therapy. Information from the PPI network, gene dependency, and drug-target associations is integrated into the model. They proposed a Shapley Additive Gene Set Enrichment Analysis (SA-GSEA) with the aim of deconvoluting 'genes that contribute to the synergistic drug combination'. The SA-GSEA method ranks the features (i.e. genes) based on their Shapley values and then conducts a gene set enrichment analysis. This approach offers perspectives for therapeutic decisions in the context of personalized medicine.
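
The enrichment step after Shapley-value ranking can be illustrated with a hypergeometric over-representation test on a toy ranking; the gene names, set membership and cut-off below are hypothetical, and this is the classical over-representation flavour of enrichment rather than the exact SA-GSEA statistic.

```python
from math import comb

# Hypothetical: genes already ranked by absolute Shapley value (most important
# first); test whether a known gene set is over-represented among the top-k.
ranked_genes = ["G%d" % i for i in range(20)]
gene_set = {"G0", "G1", "G3", "G15", "G18"}   # e.g. one pathway's members
k = 5                                          # size of the top list
top = set(ranked_genes[:k])

N = len(ranked_genes)                          # universe size
K = len(gene_set)                              # set members in the universe
x = len(top & gene_set)                        # overlap with the top list

# P(overlap >= x) when k genes are drawn at random from N (hypergeometric tail)
p_value = sum(
    comb(K, i) * comb(N - K, k - i) for i in range(x, min(k, K) + 1)
) / comb(N, k)
```

A small `p_value` indicates that the gene set is enriched among the most synergy-relevant genes; in practice this is repeated over many curated gene sets with multiple-testing correction.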

Data enrichment and augmentation driven by relations in the input data. Apart from DK extracted explicitly from knowledge bases (e.g. specific pathways), multi-omics data can be enriched or augmented using relations derived from the input data itself, for instance by calculating correlations between gene expression profiles. The studies described below utilise such data enrichment via co-expression networks, co-expression eigengene matrices, sample similarity networks or guidance graphs with GLUE (graph-linked unified embedding). Of note, expert knowledge is required to define or select an appropriate method.

Huang et al. [83] proposed SALMON (Survival Analysis Learning with Multi-Omics Neural Networks). The input to the model consists of mRNA- and miRNA-seq co-expression eigengene matrices, derived with the lmQCM algorithm [84] in a PRE step. Patient features (diagnosis age, ER and PR status, copy number and tumor mutation burdens) are integrated at a later stage. The model predicts the Cox proportional hazard ratio (survival) for the TCGA breast cancer dataset. As an interpretation method, a perturbation procedure measures the importance of each input variable for survival prognosis: features are ranked according to how much the concordance index (a metric quantifying how well survival prognosis models perform) decreases. In this POSTHOC interpretation, the authors performed Gene Ontology (GO) and cytoband enrichment with the ToppGene Suite to infer the biological implications of the feature ranking. In this way, Huang et al. [83] identified the diagnosis age and PR status, along with five mRNA-seq co-expression modules, as the most determinant features. Genes belonging to these leading co-expression modules were further functionally assessed with gene set enrichment analysis.

A similar way of determining the contribution of input features can be used to identify biomarkers, as illustrated by Wang et al. and their MOGONET model [85]. In MOGONET, DNA methylation, mRNA- and miRNA-seq data are transformed into sample similarity networks. Each network enters a separate GCN. The omic-specific label distributions are then concatenated and integrated with a view correlation discovery network (VCDN), which ‘can exploit the higher-level cross-omics correlations in the label space’ [85]. They identified distinct biomarkers for each of the investigated diseases and performed gene set enrichment analysis yielding results consistent with previous studies.

Another graph embedding of the input was proposed by Cao and Gao [86] in a modular framework called GLUE (graph-linked unified embedding). GLUE utilizes prior knowledge via a knowledge-based graph called a 'guidance graph'. The method combines omics-specific variational autoencoders with this guidance graph, which models regulatory interactions across omics layers, and was used to integrate unpaired single-cell triple-omics data. The nodes in the guidance graph correspond to the features of each omics layer, and the edges represent signed regulatory interactions.

Xing et al. [87] proposed a multi-level attention graph neural network (MLA-GNN) for multi-task prediction. As a first step, the omics data (unimodal, e.g. proteomics or transcriptomics) are converted into a weighted correlation matrix (WGCNA; [88]). Built for the full dataset, the WGCNA represents a co-expression network, from which an edge matrix is derived. Next, a patient-specific graph is constructed, where node values are given by the gene expression levels in a given sample and edges between nodes are drawn according to the WGCNA analysis. The graph serves as input to the first of three graph attention (GAT) layers of the DL model. Features from these three GAT layers are then vectorised after a linear projection and fused into a single vector, which passes through sequential fully connected layers in the prediction module. Finally, a full-gradient graph saliency (FGS) mechanism is implemented to interpret the predictions.

Mapping Domain Knowledge as a direct input to DL models. The degree to which a gene is essential for cancer cell proliferation is defined as gene dependency [89]. Chiu et al. [90] proposed DeepDEP, an autoencoder (AE) model that predicts gene dependency profiles based on representations learned from high-dimensional genomic data, including DNA mutation, gene expression, DNA methylation, and copy number alteration (CNA). The model includes molecular signatures of chemical and genetic perturbations from MSigDB as unique functional fingerprints of a gene dependency of interest. First, five AEs (one for each type of input data) are trained on unlabeled tumor data; the outputs of the five encoders are then combined and passed to a DNN. As one of the AEs is trained on fingerprints from MSigDB, which constitute DK, we considered this integration as PRE. Based on DeepDEP, the authors performed a detailed post-hoc analysis including input data perturbation, exploration of the latent layers, signature scores and multi-variable linear regression.

Explicitly defined architecture

In this section we discuss DL models that use domain knowledge to modify a standard densely connected DL model’s architecture in order to improve both biological plausibility and interpretability (Fig. 3B).

Pathways are used to define connections. Elmarakeby et al. [91] combined ex ante and ex post interpretability approaches, proposing a novel neural network architecture - pathway-aware multi-layered hierarchical network (P-NET). It was built using a set of 3,007 curated biological pathways from the Reactome database. The model predicts disease state in prostate cancer patients on the basis of somatic mutations and copy number alterations data. Encoding the relationships that exist in the Reactome dataset focuses the network on interpretability at the design stage (ARCH).

P-NET comprises one layer to encode the genes and five for the pathways. The input layer corresponds to the features that can be quantified and passed through the network. Three nodes from this layer (representing mutations, copy number amplification and copy number deletion) are connected to one node in the subsequent layer. The connections of the second layer reflect gene-pathway relationships, whereas those of the next layers are arranged according to parent-child relationships borrowed from Reactome. For a given patient, the trained NN returns the probability of having metastatic cancer. For each sample, features can be ranked by importance score in a layer-wise manner using DeepLIFT, where sample-level scores are aggregated to obtain the global importance [92]. To gain additional insights into the information flow inside P-NET, the authors evaluated how a change in an input sample's label affects the activation of a node.
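
The core architectural device, sparse connections derived from a gene-pathway membership mask, can be sketched as follows. The gene and pathway names here are illustrative; the real P-NET mask is built from the Reactome hierarchy and the model has additional layers and nonlinearities.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical gene -> pathway membership (stand-in for the Reactome mapping).
genes = ["TP53", "AR", "PTEN", "RB1"]
pathways = {"p53 signaling": {"TP53", "RB1"},
            "PI3K/AKT": {"PTEN", "AR"}}

# Binary mask M[i, j] = 1 iff gene i belongs to pathway j.
M = np.array([[g in members for members in pathways.values()]
              for g in genes], dtype=float)

# A "sparse" layer is an ordinary dense layer whose weight matrix is multiplied
# elementwise by the mask, so only documented gene-pathway connections carry
# signal during both the forward and backward pass.
W = rng.normal(size=M.shape)
x = rng.normal(size=len(genes))         # per-gene input features for one sample
pathway_activations = x @ (W * M)       # one value per pathway node

# Connections outside the mask contribute exactly nothing:
assert (W * M)[0, 1] == 0.0             # TP53 is not in PI3K/AKT in this toy map
```

Because every node corresponds to a named gene or pathway, attribution scores computed on such a layer refer directly to biological entities, which is what makes the design interpretable at the architectural level.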

Deng et al. [93] proposed a pathway-guided deep neural network (DNN) framework to predict drug sensitivity in cancer cells, using known biological signaling pathways, the expression profiles of cancer cell lines, drug-protein interactions, and drug sensitivity datasets. The pathway maps were obtained from the KEGG database. DK was integrated into the DNN model via the layer of pathway nodes and their connections to the input gene nodes and drug target nodes.

Zhao et al. [94] proposed a scalable and interpretable DL model, called DeepOmix, for multi-omics data integration and survival prediction. DeepOmix incorporates prior biological knowledge defined by users as a functional module input (signaling pathways in this analysis). The pathway gene sets were downloaded from the Molecular Signatures Database (MSigDB) (KEGG and Reactome). DeepOmix integrates multi-omics data as an input gene layer, whose nodes are connected to a functional module layer based on the DK. Again, the pathways define whether there is a connection between nodes.

Feng et al. [95] proposed a DL model, called DeepSigSurvNet, based on a set of 46 selected signaling pathways from the KEGG database, for predicting cancer patients' survival and outcome. The model identifies the individual patterns of these signaling pathways in four types of cancer using gene expression and copy number data (multi-omics data and clinical factors are integrated into the model). Non-densely connected layers are followed by a CNN with inception modules. For interpretability, SmoothGrad [96] is used to assess how perturbations added to the signaling pathways affect the model's predictions. This allows scoring the relevance of each pathway for each cancer type; the distributions of the relevance scores of each pathway are then compared between cancer types. The authors noted that striking discrepancies arise among the cancer types, and also that for a given cancer type only a small subset of the pathways have high relevance scores. This latter observation could be of interest for prioritising drugs or drug combinations that target these driver pathways.

Zhang et al. [97] used a DL architecture constrained by the 46 pathways, with a pathway layer that follows the gene layer. Similarly to Feng et al. [95], connections between the two layers are sparse, and connect genes only to pathways to which they belong. They trained the model (‘consDeepSignaling’) for predicting drug responses in cancer cell lines from the data of dose response and multi-omics (gene expression and copy number). The output from the last layer represents the predicted area under the experimental dose-response curve value of the drug effect on a given cancer cell line. By using Smoothgrad, they analyze the distributions of the importance scores of the signaling pathways from all samples and highlight those important for drug response prediction.

Hao et al. [98] proposed a Pathway-Associated Sparse Deep Neural Network (PASNet) to accurately predict patient prognosis and describe the complex biological processes related to prognosis by incorporating curated biological pathways from MSigDB (Reactome). The sparse DL architecture of PASNet models a multilayered, hierarchical biological system of genes and pathways, enabling model interpretability. PASNet includes a pathway layer, where each node indicates an individual biological pathway (linked with input genes), and a hidden layer, which takes the hierarchical nonlinear relationships of biological processes into account. The associations between the gene layer and the pathway layer were established via well-known pathway databases (e.g., Reactome and KEGG).

Another sparsely connected DL model is a sparse Variational Autoencoder architecture, VEGA (VAE Enhanced by Gene Annotations) proposed by Seninge et al. [99]. The decoder connections are informed by user-provided biological networks based on gene annotation databases (e.g., Reactome). VEGA performance was tested using pathways, gene regulatory networks and cell type marker sets as the gene modules that define its latent space. VEGA was shown to be useful in understanding the response of a population of a specific cell type to a variety of perturbations.

To predict cell states from gene expression profiles, Fortelny and Bock [100] proposed Knowledge-Primed Neural Networks (KPNNs), aiming to provide a biologically interpretable DL model. Their approach combines ex ante and ex post explainability methods. Fully connected NNs are replaced by networks derived from prior knowledge of biological networks, including signaling pathways and gene-regulatory networks. To do this, the authors assumed that most of the regulatory relationships important for the biological system of interest had already been discovered in other contexts. In KPNNs, each node corresponds to a protein or a gene, and each network edge corresponds to a regulatory relationship that has been documented and annotated in biological databases. The model was trained on single-cell RNA-seq data. Of note, contrary to previously described models, the KPNN architecture allows for skipping layers. For the post-hoc analyses, the authors focused on the node weights, applying a perturbation procedure which quantifies, for each node, how the addition of small noise is reflected in changes in the outputs. In this way, they evaluated the global importance of each node. These informative weights (in absolute value) can therefore be used to identify likely relevant transcription factors and/or signaling proteins.

Gene Ontology used to define architectural constraints. Based on terms extracted from Gene Ontology (GO), the system hierarchy can be structured. Each GO term is associated with a number of genes and gene products, hence genes can be organised into a hierarchy of nested gene sets. Multi-scale hierarchical interactions among biological entities such as GO terms and genes can be encoded as a list of relations. Below, we describe two studies that make explicit integration of GO into ARCH.

The response of cancer cells to therapy depends on biological as well as chemical factors [101]. To predict drug responses, Kuenzi et al. [102] developed a DL model, called DrugCell, a modular neural network with two branches. The model combines conventional DNN that process compound chemical structures with a Visible Neural Network (VNN) processing binary encodings of individual genotypes. The DrugCell system hierarchy was structured from a literature-curated database. The VNN was guided by a hierarchy of molecular human cell subsystems, taken from 2,086 biological processes from the GO database. In DrugCell, RLIPP [103] analysis leads to the identification of the gene embedding network subsystems that most contribute to the cell response prediction. Interestingly, Kuenzi et al. [102] further exploited their approach and confirmed the validity of the hypotheses derived from it. Using cell line data, they demonstrated that subsystems identified as important (as evaluated with the RLIPP scores) for the response to a given drug can reveal synergy of drug combination. In addition, they further showed, using patient-derived xenograft models (PDX) data from a public database, how DrugCell can be used to suggest drug combination treatments. DrugCell constitutes a promising example of how analysis of the inner workings of a DL model could translate into therapeutic recommendations.

Another model using GO is the Factor Graph Neural Network model proposed by Ma and Zhang [104]. Each node in the model corresponds to a biological entity such as a gene or GO term (i.e., gene nodes and GO nodes), forming a bipartite graph. The model is based on the RLIPP analysis ('relative local improvement in predictive power') and is used to predict tumor stages of kidney and lung cancers and to classify kidney samples into normal vs. tumor tissue. The method calculates attention matrices, allowing 'capturing multi-scale hierarchical interactions [by assigning] weights to connections between different layers'. By investigating the weights in the last hidden layer, the authors retrieved, for example, the GO terms that contribute most to sample classification.

Gene Regulatory Networks used as constraints for VAEs. In [105], Shu et al. developed DeepSEM, a VAE-based model which contains a Gene Regulatory Network (GRN) layer in the encoder and an inverse GRN layer in the decoder; of note, the weights are shared between these layers. A GRN consists of target genes and transcription factors and can be reconstructed from the representation learnt by the model. DeepSEM is an example of a nonlinear mapping from gene expression to GRN activities. Although no database is used as DK, the GRN layers added to a VAE architecture can certainly be considered a step towards bio-centric interpretability.

POSTHOC explanations

Although in previous sections we already described models that use DK both in ARCH and in POSTHOC phases, here we provide examples that integrate DK only for POSTHOC purposes, not impacting the model’s design.

Cox-nnet [106] is an example of an attempt to link biological features or functions to the (hidden) nodes of a DNN model solely via POSTHOC analysis; DK is not used in ARCH. Cox-nnet uses a Cox regression as the output layer, extending the Cox-PH model [107]. The interpretation of the output includes mapping node weights to regression coefficients, t-SNE, gene set enrichment analysis with KEGG pathways, and the computation of partial derivatives of the output. Results from Cox-nnet compared favourably with those from Cox-PH from a biological perspective, revealing for example the importance of the BAI1 gene in the p53 pathway or MAPK1 in several cancer-related pathways. Importantly, POSTHOC interpretation is executed not only via expert (author) evaluation, but systematically, using DK about known relations extracted from a database.
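
A Cox output layer is trained by minimising the negative partial log-likelihood of the predicted risk scores. The sketch below uses toy data and Breslow-style handling of the risk sets; it illustrates the loss only and is not the exact Cox-nnet implementation.

```python
import numpy as np

# Toy data: risk score per patient (the network's output node), observed time,
# and event indicator (1 = event observed, 0 = censored).
risk = np.array([0.8, -0.2, 1.5, 0.1])
time = np.array([5.0, 8.0, 3.0, 10.0])
event = np.array([1, 0, 1, 1])

def cox_neg_log_partial_likelihood(risk, time, event):
    """Negative partial log-likelihood: for every observed event, the patient's
    risk score is compared against the log-sum-exp of scores over all patients
    still at risk at that time."""
    loss = 0.0
    for i in np.where(event == 1)[0]:
        at_risk = time >= time[i]            # risk set at the i-th event time
        loss -= risk[i] - np.log(np.exp(risk[at_risk]).sum())
    return loss

loss = cox_neg_log_partial_likelihood(risk, time, event)
```

Backpropagating this loss through the network is what lets the hidden nodes specialise towards survival-relevant features, which the POSTHOC analyses then try to name.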

A frequently used POSTHOC interpretation method is the exploration of the association between latent representations and input covariates (e.g. phenotypic features of the patients) [108]. This approach is of particular interest for models such as autoencoders (AEs) and variational autoencoders (VAEs), in which the input data is compressed into a reduced (latent) representation and then reconstructed from the encoded representation with the least possible error. Due to their appealing dimensionality-reduction abilities, AEs and VAEs are frequently used within the oncology domain (e.g. [109,110,111]). They can be used together with PCA, UMAP [112], t-SNE [113] or other algorithms [114] for data visualization, and various clustering methods can be applied on top of that. POSTHOC analyses can then be performed on the weight parameters and/or on the compressed data to gain biological insights into what the model learned. As an example, XOmiVAE was developed to solve supervised and unsupervised tumour classification tasks [115]. It uses the DeepSHAP explanation method [116] to explain novel clusters generated by VAEs, and the results are compared with DK derived from, among others, Reactome and GO.
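
A probe of this kind is straightforward to sketch: encode the samples, then correlate each latent dimension with a covariate. The synthetic latent codes below stand in for a trained encoder's output, with one dimension deliberately tied to the covariate so the association is visible.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical latent codes from a (V)AE encoder: 100 samples, 6 latent dims.
Z = rng.normal(size=(100, 6))
age = rng.uniform(30, 80, size=100)     # a clinical covariate (illustrative)
Z[:, 2] += 0.05 * age                   # dimension 2 secretly tracks age

def pearson(a, b):
    a, b = a - a.mean(), b - b.mean()
    return (a @ b) / np.sqrt((a @ a) * (b @ b))

# POSTHOC probe: Pearson correlation of every latent dimension with the covariate.
corrs = np.array([pearson(Z[:, j], age) for j in range(Z.shape[1])])
most_associated = int(np.abs(corrs).argmax())
```

Repeating this over all available covariates (and, for categorical ones, swapping correlation for an ANOVA or rank test) yields the kind of latent-dimension annotation table these studies report.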

Similarly, Kinalis et al. [117] proposed an AE for clustering analysis of scRNA-seq data. They used guided backpropagation (only positive gradients are used in the backward pass) to compute saliency maps. In their model, saliency values are obtained for each cell and each gene; gene and gene set importance scores are then computed by averaging across cells or across the corresponding genes, respectively. They use DK in POSTHOC to investigate the latent space of the AE, comparing the obtained representation with pathways (i.e. hematopoietic signatures derived from the DMAP study [118]).

In contrast, some AE-based models have been developed without any DK in PRE, ARCH or POSTHOC [119,120,121,122]. The architecture proposed by Hira et al. [111] can integrate multi-omics data (genomics, epigenomics, transcriptomics). Patient subtyping is obtained by first applying a clustering algorithm to the learned latent features. Clinically relevant latent dimensions are identified by building a univariate Cox proportional hazards (Cox-PH) model for each of them, and samples are clustered into survival subgroups. Based on these labels, a Support Vector Machine was trained to allow survival subgroup classification of new samples. With the aim of identifying biomarkers, a linear model (correlations) is used to map the clinically relevant embeddings into the gene space.


Prevalence of graph representations

Recent years have brought an increasing number of specialised DL architectures which encode the structure of biological relations (Fig. 4A, B). DL supports non-linear modelling while encoding complex structures and relationships, in order to learn informative representations at multiple levels of abstraction [123]. Graph Neural Network (GNN, and Graph Convolutional Network - GCN) based architectures provide universal support for encoding structural biological knowledge into neural representations. In general, GNNs are a spectrum of models which capture graph dependencies by passing messages between nodes, simultaneously taking into account the scale, heterogeneity and deep topological information of the input data (Fig. 3A). In a biomedical setting, GNNs demonstrate their applicability in encoding topological relations and mapping them into a high-dimensional embedding space [124]. Compared to other DL models, the advantage of GNNs is the ability to integrate relational data into the inference. With the increasing interest in GNNs, we have observed a spectrum of new models which are combined with explainability methods (Figs. 4C, D and 5).

Fig. 4

The trends in DL models for cancer. There is an upward trend in using multi-omics data (blue) compared to single-omic data (orange) (A) and in the integration of domain knowledge (DK) (orange, green, red) (B), based on recent studies of DL in cancer biology. The most frequently integrated types of domain knowledge are pathways (orange) and other DK (red), such as functional modules, with a recent increase in the usage of PPI networks. C There are three main categories of DK integration: input data pre-processing (PRE) (blue), architecture definition (ARCH) (orange) and post-hoc comparison (POSTHOC) (green). There is a trend in the use of DK in the PRE step, i.e. DK is used to enrich or augment the input data, which results in a change of data representation; D In recent years, there is an increasing number of specialised DL architectures which encode the structure of biological relations. Graph Neural Network (GNN, and Graph Convolutional Network - GCN) based architectures were the most prevalently used (green). There is an increase in the number of sparse DNN (red) and sparse AE/VAE (blue) models

Fig. 5

Network of relations between key components of bio-centric interpretability. The network represents the relations between domain knowledge (red nodes), DK databases (orange), DK integration types according to the proposed taxonomy of bio-centric interpretability (purple nodes), DL models (blue nodes) and explainability methods (green nodes). Node size is proportional to the number of occurrences of the entity; edge width is proportional to the number of pairs observed in the reviewed papers. We observe strong connections between: ARCH-pathways-sparse DNN; VAE-latent space exploration; PRE-PPI network-GCN; KEGG-pathways

Upward trend of graph representations

Many models were developed to use known gene-gene interactions for prediction, based on the assumption that interacting genes tend to produce similar phenotypes. This resonates with the development in the field of graph neural networks. We observe an increase of GCN/GNN application (1 in 2019, 3 in 2020 and 7 in 2021, Fig. 4D), which is associated with integration of PPI networks as DK (1 model in 2020, 4 models in 2021, Fig. 4B).

32% of the models which used prior knowledge are GCN models

GNNs and GCNs models are able to combine heterogeneous omics data types with graph data representations into a predictive model and learn abstract features from both data types. Based on our study, it can be observed that GCNs are the prevalent architectural choice (Fig. 4D). This is due to the fact that the DK is usually represented as a graph (as the phenotype correlates with modules constituting a graph, i.e., sets of related nodes).

60% of the GNN/GCNs used PPIs as a DK

Owing to their ability to handle non-Euclidean structured data such as graphs, GCNs have been successfully used to encode protein-protein interaction networks (PPIs) to predict cancer subtypes and to identify and classify normal tissue and tumour samples for many types of cancer (60% of the GCNs used PPIs as DK, Fig. 5). GCNs can systematically determine which part of a pathway is useful for characterising a tumour. Whether neural network encoding of biological relations as prior knowledge can accelerate biological discoveries remains largely unknown.

PPI networks used in PRE to obtain input to GNN

We observe a pattern that GNN/GCN models are associated with the PPI network application in the pre-processing stage (PRE). Tabular data containing measured multi-omics features are transformed into graphs and then fed into the model. PPI networks are derived from databases such as: STRING, CPDB, HPRD, BioGRID (Fig. 5).

Sparse connections as a key design feature

Encoding pathways via sparse connections is an emerging architectural pattern. We observed a trend towards approaches which employ sparse connections mapping to layers and nodes with a grounded biological meaning. Domain knowledge integration allows for the explicit definition of connections between the nodes of a DNN. To achieve this, the relational biases of pathways are exploited: relations obtained from knowledge bases (KEGG, Reactome, SIGNOR) are used as a mask within the model, removing connections which are not represented. Thus, DK integration in ARCH allows for better, more efficient and meaningful POSTHOC interpretation (Fig. 5), as well as biological plausibility. As a result, a new architectural paradigm emerges (Fig. 3B), which constrains the architecture to reflect biological relations.

As the organisation of genes in pathways shapes the high heterogeneity of cancers, taking into account the topology of gene interactions may further help to characterize new gene or disease modules. This is reflected in the prevalence of pathways in DL models: 48.4% of the models that integrated DK used pathways (Fig. 4B, C). This corresponds to an increase in the popularity of sparse DNN and sparse AE/VAE models, as the sparsity comes from the limited connections between layers defined by the pathways (4 in 2020, 5 in 2021, Fig. 4D).

Improved support for biomarker discovery

From a machine learning (ML) perspective, predicting clinical outcome can be framed as a classification or regression task, and patient- or tumor-specific subnets can be identified as distinguishing features. However, the high dimensionality of multi-omics data drives instability in the feature selection process. In this context, stability means that under minor data perturbations, the process preserves the same features [125, 126]. Thus, for minor changes in samples, a biomarker detection method should select a consistent/similar gene set; ideally, the biomarkers can be applied to any sample in the dataset. In general, finding the relevant features remains a major challenge in the high-dimensional, low sample-size setting, in which features are correlated, either by nature (as is the case in most molecular datasets) or merely by chance (as the number of samples is relatively small). Finding these truly relevant features is significantly more challenging than finding features that provide optimal predictivity. In practice, current algorithms tend to focus on the prediction error of the models and are usually highly unstable, which limits their applicability in a clinical setting and creates barriers to the interpretation of biomedical insights. The stability of biomarker discovery can be improved by including prior knowledge (i.e. DK) of molecular networks (e.g., pathways or PPI networks; [125, 126]).
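
Stability in this sense can be quantified directly, for example as the average pairwise Jaccard index of the feature sets selected on resampled data. The selector below is a deliberately unstable stand-in (it ignores the resample and injects its own randomness) used only to illustrate the metric.

```python
import random

random.seed(0)

def select_features(sample_ids):
    # Stand-in selector: pretends features 0-4 are the informative ones but is
    # mildly unstable, occasionally swapping one feature in/out per resample.
    # (It ignores sample_ids; the randomness stands in for sampling noise.)
    chosen = set(range(5))
    if random.random() < 0.5:
        chosen.discard(random.randrange(5))
        chosen.add(5 + random.randrange(5))
    return chosen

def jaccard(a, b):
    return len(a & b) / len(a | b)

# Run the selector on 10 random 80%-subsamples of a 100-sample cohort and
# average the pairwise Jaccard overlap of the selected sets.
runs = [select_features(random.sample(range(100), 80)) for _ in range(10)]
pairs = [(i, j) for i in range(10) for j in range(i + 1, 10)]
stability = sum(jaccard(runs[i], runs[j]) for i, j in pairs) / len(pairs)
# stability near 1 -> robust biomarker set; low values flag instability.
```

With a real selector, comparing this score with and without network-based priors is one concrete way to verify the claim that DK improves biomarker stability.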

Research questions - summary

What are the perspectives of interpretability across different DL-based frameworks within the cancer research domain?

Based on our proposed taxonomy, we argue that to provide biological interpretability to a DL model used in cancer biology is to enable the domain expert to contemplate the data flow through the entire model and to decompose its architecture into elements which map to a biological reasoning process and to the structure of the underlying biological mechanisms. We argue that the key explainability property for this class of models is decomposability. Each component can also be viewed as a computational step which transforms the data representation, in either an explicit or a latent form. Although individual computational steps may be mathematically complex, which is inherent to modelling a biological system, they should be organised in the model's architecture in a way that supports the decomposability of the inference process. This builds the representational foundations to deliver bio-centric interpretability.

What are the methods that deliver biological interpretability?

We argue that a promising category of methods is grounded in sparse connections between neurons (e.g. KPNN), including skip-connections between hidden layers; this mechanism supports bio-centric interpretability and improves the biological plausibility of the inference. Such architectures, combined with state-of-the-art DL explainability methods, allow the contribution of biologically grounded components to individual outputs to be tracked back through the network. We argue that designing for bio-centric interpretability, i.e. making architectural choices which minimise the construction of latent representations that are not easily linked to biological primitives, should be at the center of any application of DL for cancer.

What are the desirable approaches to integration of domain knowledge in the models’ architecture?

DL models can induce a lack of parsimony in the data representation (excessive latent features), delivering models which are intrinsically opaque. The application of explainability methods cannot fully circumvent this limitation, limiting the ability of these models to deliver meaningful biological insights. Post-hoc interpretation often leads to the confirmation of known existing relations, which is then presented as evidence of the biological plausibility of a model. However, it has been documented that even untrained neural networks can produce saliency maps that appear meaningful [127]. Thus, we argue that bio-centric interpretability may manifest as the ability of the model's architecture to reflect an isomorphism with regard to known biological structures and processes, so that these can be explicitly investigated. The integration of DK allows for the definition of such architectures. These elements allow for a better use of explainability methods, which can rank network components (e.g. node activations or edge weights) and return references to biologically grounded elements.

What are the emerging representation paradigms within these models?

Based on our review, we identify two main trends:

  • Input data is transformed into a graph or network and fed into a GNN or GCN

  • Input data is fed into a sparsely connected deep neural network, where the connections are defined by biological relations

We observed that the multi-omics data is frequently transformed prior to the model input. The transformation extends beyond computational techniques such as enrichment analysis and changes the data representation: tabular data becomes a graph or network (Fig. 3). These can be constructed in a data-driven manner, e.g. based on correlations within the data, as in gene-gene interaction networks, or through the integration of database DK, e.g. input data expressed over the nodes and edges of known PPI networks (Fig. 3A). The graph representation is then processed by a GNN or GCN. We observe an upward trend in the usage of such models, in most cases using PPI networks as DK.
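A minimal sketch of this first trend, under stated assumptions: the function and variable names below are hypothetical, and the toy edge list stands in for a curated PPI resource such as STRING or HPRD. Expression values remain the node features, while the interaction list (restricted to the measured genes) supplies the graph structure that a GNN/GCN would consume.

```python
def tabular_to_graph(expression, ppi_edges):
    """expression: {gene: value} for one sample; ppi_edges: iterable of (gene, gene) pairs.
    Returns (node order, node feature vector, adjacency dict) over the measured genes."""
    genes = sorted(expression)                      # fixed node ordering
    index = {g: i for i, g in enumerate(genes)}
    features = [expression[g] for g in genes]       # expression values as node features
    adjacency = {g: set() for g in genes}
    for a, b in ppi_edges:
        # keep only undirected edges whose endpoints were both measured,
        # and drop self-loops
        if a in index and b in index and a != b:
            adjacency[a].add(b)
            adjacency[b].add(a)
    return genes, features, adjacency
```

In practice the adjacency would be converted to the edge-index or adjacency-matrix format expected by the chosen GNN library, but the key point carries over: the data values and the domain-knowledge structure enter the model through separate channels.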

The second trend focuses more on the architecture of the model, i.e. on the connections between neurons in the network. The input data can retain a tabular representation and, because the bio-interpretability comes from a carefully crafted architecture, the ability to track the signal between output and input is not lost. Intuitively, the more times the representation of the data changes within the model, the less interpretable the data flow appears to be. Despite the advantages of graphs in describing biological relations, they might not be the best solution for a DL model, because transforming input data into a graph makes the data flow less transparent (e.g. graph to PCA, then to a 2D image; convolutions on graphs). Preserving a tabular input data representation may allow for more transparent post-hoc explanations, provided that the model's architecture reflects biological relations. For such models, pathways and functional modules derived from knowledge bases are used to define the sparse connections (Fig. 3B).
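How pathway membership can define such sparse connections is sketched below, assuming a KPNN-style masked first layer; the names and the toy pathway dictionary (a stand-in for e.g. KEGG or Reactome) are illustrative, not from any specific reviewed model. A gene input connects only to the pathway nodes that contain it; all other weights are masked to zero, so every remaining connection has a biological reading.

```python
def pathway_mask(genes, pathways):
    """genes: ordered gene names; pathways: {pathway name: set of member genes}.
    Returns (pathway order, mask) where mask[p][g] = 1 iff gene g is in pathway p."""
    names = sorted(pathways)
    mask = [[1 if g in pathways[p] else 0 for g in genes] for p in names]
    return names, mask

def sparse_layer(x, weights, mask):
    """Forward pass of a masked linear layer (no bias, no activation, for brevity):
    only weights where mask == 1 can contribute to a pathway node's output."""
    return [sum(w * m * xi for w, m, xi in zip(w_row, m_row, x))
            for w_row, m_row in zip(weights, mask)]
```

In a trained model the same mask is applied at every forward and backward pass, so masked weights stay at zero during learning; the activation of each pathway node can then be read back as the contribution of exactly its member genes.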


In this systematic review we focused on the biological interpretability of Deep Learning models developed in the domain of cancer biology that target omics data. We introduced the new concept of bio-centric interpretability and defined its key properties and components. According to a taxonomy centered around this notion, we critically reviewed recent studies in the context of model architecture, domain knowledge integration and biological interpretability methods.

We found that the convergence between the use of external domain knowledge and the design of architectures which reflect the structure of known biological mechanisms can deliver: (i) the model explainability required by domain experts, (ii) improved biological plausibility of these models, (iii) improved quality of the explanations delivered by post-hoc methods and, more fundamentally, (iv) the repositioning of DL models from opaque pure predictors to explainable models which can support new biological insight. The two most common approaches to incorporating DK into a model are to use pathways or PPI networks (Figs. 3 and 4). They can be used (i) for data augmentation (tabular mRNA data \(\rightarrow\) graphs based on gene interactions) and (ii) to biologically ground the architecture of the model (e.g. by mapping the connections between nodes). Domain knowledge is most frequently represented as pathways and PPI networks derived from public databases, such as KEGG and Reactome, exploiting existing curated biological knowledge. The vast majority of the reviewed models attempt to interpret the output by post-hoc analyses, with a clear pattern: the more domain knowledge is reflected in the model design, the more interpretable the post-hoc analysis. Although expert knowledge is always required to interpret the results, we assert that only the integration of explicit domain knowledge in the model design can lead to an improved understanding of the underlying biological mechanisms. As the notion of biological interpretability is still largely unformalised, we highlight the need for universal bio-centric interpretability methods, so that the developed methods are less problem- or application-specific.

In recent years we have observed a significant increase in the number of DL models developed for cancer research. Gradual improvement of their performance and better interpretability will facilitate the adoption of these models to support biomedical inference. Still, there are challenges that need to be systematically addressed. First, with decreasing costs for the acquisition of molecular-level data and improving access to patient screening, more data will become available, most likely as multi-omics. Commonly, there will be an imbalance between the feature set size p and the sample size n (high dimensionality, low sample size). DL models in oncology will in many cases need to integrate various data modalities in an efficient and traceable manner, while at the same time handling the \(p \gg n\) regime.

Second, we emphasise the need for reproducibility and benchmarking of DL models in cancer. Although publicly available datasets are often used, the selection of sub-datasets (e.g. only one tumor type selected for modelling), modelling approaches and explainability methods vary. As a consequence, the biomarkers and biological relations indicated by the models as important predictors are inconsistent, containing already well-known biological facts, potential new discoveries and spurious, false biomarkers. At this point, it is challenging to distinguish between the last two. Benchmarks, i.e. datasets with expected interpretations, will allow for model verification and a reliable comparison between developed models.

Third, a clear direction is set towards the integration of prior domain knowledge. All the studies we reviewed attribute their improved interpretability to the incorporation of some form of biological knowledge into the model. We anticipate that future models will exploit known biological relations to a greater extent by combining the expressivity and flexibility of DL with mechanistic modelling methods.


In this systematic review, we summarise emerging DL models in cancer biology covering the representation of biological processes, diagnosis and prognosis, and recent progress in biologically informed models. To this end, we started by searching electronic bibliographic databases (PubMed and Web of Science) for relevant studies published between Jan 1, 2018, and Jan 1, 2022. We used the following terms: multi-omics and deep learning or computer science or neural networks or network analysis or machine learning and cancer or cancer biology. The same search was repeated just before the final experimental analysis for completeness (Mar 1, 2022).

We concentrated on deep learning methods applied to cancer, or at least those linked to straightforward applications in cancer biology using multi-omics data, conducted in humans or human cell lines. Furthermore, we also searched the reference lists of published trials and the relevant review articles. Lastly, we concentrated only on DL applications for omics data, including genomics, transcriptomics and epigenomics data from cancer in humans. We excluded studies published in languages other than English, studies with insufficient data (i.e. studies where full texts were not available or irrelevant studies), case reports, editorial materials, comments and meeting abstracts. Similarly, all pre-clinical studies conducted in animal cell lines or on animals (e.g. murine models), as well as review articles and meta-analyses, were excluded. Papers providing methods that are not directly linked to cancer and to functional analysis/insights on biological processes were excluded. Papers centered around medical imaging (e.g. histopathology and computed tomography) were excluded, as well as papers providing models based on DL and ML using clinical/laboratory data alone. Studies based on microarray data or those developing a sequence-based algorithmic framework were excluded as well.

Using this search strategy, we obtained the titles and abstracts of the retrieved studies and imported them into EndNote. Two authors independently screened the identified studies on the basis of prespecified inclusion criteria. All potentially relevant articles were read in full text and a list of eligible studies was created. Data were manually extracted using a structured template, and any disagreements were resolved by mutual agreement between the two authors during the process of screening and data extraction, or by the intervention of a third author. A standardised data extraction form was used to extract the following fields: authors' names, year of publication, type of omics data, model output, type of prior knowledge, prior knowledge databases, type of domain knowledge integration, type of deep learning model/architecture and interpretability method used.

Multi-omics data can be represented in various ways in the subsequent components of a model, and this representation can be changed into another representation in a series of computational steps. It is crucial to understand these representations, how they are transformed, and how to communicate such transformations during post-hoc inference. We distinguished four bio-centric interpretability components: the integration of different data modalities, the schema-level representation of the model, the integration of domain knowledge, and post-hoc explainability methods. We took the concept of interpretability and distinguished three categories: architecture-centric interpretability, output-centric interpretability, and post-hoc evaluation of biological plausibility. The association of each model with the identified groups was done manually, based on the authors' expertise.

The selection criteria resulted in 42 studies (see footnote 1). We elaborate on the components involved in bio-centric interpretability within DL models, focusing on emerging architectural and methodological advances, the encoding of biological domain knowledge and the integration of explainability methods. The dimensions of bio-centric interpretability for recent studies are presented in Additional file 1: Table S1.

Availability of data and materials

All data analyzed during this study are included in this published article.


  1. Each of the selected studies is from the existing state of the art and not performed by any of the authors.



Abbreviations

CNA: Copy number alteration
DL: Deep Learning
DK: Domain knowledge
DNN: Deep neural network
GCN: Graph Convolution Network
GNN: Graph Neural Network
GO: Gene Ontology
ML: Machine Learning
PPI: Protein-protein interaction
XAI: Explainable AI

  1. Baptista D, Ferreira PG, Rocha M. Deep learning for drug response prediction in cancer. Brief Bioinform. 2020;22(1):360–79.

  2. Sharifi-Noghabi H, Zolotareva O, Collins CC, Ester M. MOLI: multi-omics late integration with deep neural networks for drug response prediction. Bioinformatics. 2019;35(14):501–9.

  3. Kumar Y, Gupta S, Singla R, Hu Y-C. A Systematic review of artificial intelligence techniques in cancer prediction and diagnosis. Archiv Comput Methods Eng. 2021;29:2043–70.

  4. Tran KA, Kondrashova O, Bradley A, Williams ED, Pearson JV, Waddell N. Deep learning in cancer diagnosis, prognosis and treatment selection. Genome Med. 2021;13(1):152.

  5. Tufail AB, Ma Y-K, Kaabar MKA, Martínez F, Junejo AR, Ullah I, Khan R. Deep learning in cancer diagnosis and prognosis prediction: a minireview on challenges, recent trends, and future directions. Comput Math Methods Med. 2021;2021:1–28.

  6. PCAWG Tumor Subtypes and Clinical Translation Working Group, PCAWG Consortium, Jiao W, Atwal G, Polak P, Karlic R, Cuppen E, Danyi A, de Ridder J, van Herpen C, Lolkema MP, Steeghs N, Getz G, Morris Q, Stein LD. A deep learning system accurately classifies primary and metastatic cancers using passenger mutation patterns. Nat Commun. 2020;11(1):728.

  7. Hassanzadeh HR, Wang MD. An integrated deep network for cancer survival prediction using omics data. Front Big Data. 2021;4:568352.

  8. Kipkogei E, Arango Argoty GA, Kagiampakis I, Patra A, Jacob E. Explainable transformer-based neural network for the prediction of survival outcomes in non-small cell lung cancer (NSCLC). medRxiv. 2021.

  9. Bhinder B, Gilvary C, Madhukar NS, Elemento O. Artificial intelligence in cancer research and precision medicine. Cancer Discov. 2021;11(4):900–15.

  10. Dragani TA, Matarese V, Colombo F. Biomarkers for early cancer diagnosis: prospects for success through the lens of tumor genetics. BioEssays. 2020;42(4):1900122.

  11. Shi K, Lin W, Zhao X-M. Identifying molecular biomarkers for diseases with machine learning based on integrative omics. IEEE/ACM Trans Comput Biol Bioinform. 2020;18(6):2514–25.

  12. Kaur H, Kumar R, Lathwal A, Raghava GP. Computational resources for identification of cancer biomarkers from omics data. Brief Funct Genomics. 2021;20(4):213–22.

  13. Eraslan G, Avsec Z, Gagneur J, Theis FJ. Deep learning: new computational modelling techniques for genomics. Nat Rev Genet. 2019;20(7):389–403.

  14. Dhillon A, Singh A, Bhalla VK. A systematic review on biomarker identification for cancer diagnosis and prognosis in multi-omics from computational needs to machine learning and deep learning. Archiv Comput Methods Eng. 2023;30(2):917–49.

  15. Xiao Y, Bi M, Guo H, Li M. Multi-omics approaches for biomarker discovery in early ovarian cancer diagnosis. EBioMedicine. 2022;79: 104001.

  16. He X, Liu X, Zuo F, Shi H, Jing J. Artificial intelligence-based multi-omics analysis fuels cancer precision medicine. Sem Cancer Biol. 2022;88:187–200.

  17. Kang M, Ko E, Mersha TB. A roadmap for multi-omics data integration using deep learning. Brief Bioinform. 2022;23(1):454.

  18. Yu X, Zhou S, Zou H, Wang Q, Liu C, Zang M, Liu T. Survey of deep learning techniques for disease prediction based on omics data. Hum Gene. 2022;35:201140.

  19. Montavon G, Samek W, Müller K-R. Methods for interpreting and understanding deep neural networks. Digit Signal Process. 2018;73:1–15.

  20. Adadi A, Berrada M. Peeking inside the black-box: a survey on explainable artificial intelligence (XAI). IEEE Access. 2018;6:52138–60.

  21. Guidotti R, Monreale A, Ruggieri S, Turini F, Giannotti F, Pedreschi D. A survey of methods for explaining black box models. ACM Comput Surv. 2018;51(5):1–42.

  22. Marcinkevičs R, Vogt JE. Interpretability and explainability: a machine learning zoo mini-tour. arXiv,1–24. 2020. arXiv:2012.01805

  23. Belle V, Papantonis I. Principles and practice of explainable machine learning. Front Big Data. 2021;4:39.

  24. Samek W, Montavon G, Lapuschkin S, Anders CJ, Muller K-R. Explaining deep neural networks and beyond: a review of methods and applications. Proc IEEE. 2021;109(3):247–78.

  25. Thayaparan M, Valentino M, Freitas A. A survey on explainability in machine reading comprehension. CoRR. 2020. arXiv:2010.00389

  26. Talukder A, Barham C, Li X, Hu H. Interpretation of deep learning in genomics and epigenomics. Brief Bioinform. 2020;22(3):bbaa177.

  27. Watson DS. Interpretable machine learning for genomics. arXiv preprint arXiv:2110.03063. 2021.

  28. Wysocki O, Zhou Z, O’Regan P, Ferreira D, Wysocka M, Landers D, Freitas A. Transformers and the representation of biomedical background knowledge. Comput Linguist. 2023;49(1):73–115.

  29. Novakovsky G, Dexter N, Libbrecht MW, Wasserman WW, Mostafavi S. Obtaining genetics insights from deep learning via explainable artificial intelligence. Nat Rev Genet. 2022;24(2):125–37.

  30. Holzinger A, Biemann C, Pattichis CS, Kell DB. What do we need to build explainable ai systems for the medical domain? arXiv preprint arXiv:1712.09923. 2017.

  31. Ching T, Himmelstein DS, Beaulieu-Jones BK, Kalinin AA, Do BT, Way GP, Ferrero E, Agapow P-M, Zietz M, Hoffman MM, Xie W, Rosen GL, Lengerich BJ, Israeli J, Lanchantin J, Woloszynek S, Carpenter AE, Shrikumar A, Xu J, Cofer EM, Lavender CA, Turaga SC, Alexandari AM, Lu Z, Harris DJ, DeCaprio D, Qi Y, Kundaje A, Peng Y, Wiley LK, Segler MHS, Boca SM, Swamidass SJ, Huang A, Gitter A, Greene CS. Opportunities and obstacles for deep learning in biology and medicine. J R Soc Interface. 2018;15(141):20170387.

  32. Stiglic G, Kocbek P, Fijacko N, Zitnik M, Verbert K, Cilar L. Interpretability of machine learning-based prediction models in healthcare. WIREs Data Mining Knowl Discov. 2020;10(5):e1379.

  33. Tjoa E, Guan C. A survey on explainable artificial intelligence (xai): toward medical xai. IEEE Trans Neural Netw Learn Syst. 2021;32:4793–813.

  34. Yang AC, Kern F, Losada PM, Agam MR, Maat CA, Schmartz GP, Fehlmann T, Stein JA, Schaum N, Lee DP, Calcuttawala K, Vest RT, Berdnik D, Lu N, Hahn O, Gate D, McNerney MW, Channappa D, Cobos I, Ludwig N, Schulz-Schaeffer WJ, Keller A, Wyss-Coray T. Dysregulation of brain and choroid plexus cell types in severe COVID-19. Nature. 2021;595(7868):565–71.

  35. Wysocki O, Davies JK, Vigo M, Armstrong AC, Landers D, Lee R, Freitas A. Assessing the communication gap between AI models and healthcare professionals: explainability, utility and trust in AI-driven clinical decision-making. Artif Intell. 2023;316: 103839.

  36. Bogatu A, Wysocka M, Wysocki O et al. Meta-analysis informed machine learning: Supporting cytokine storm detection during CAR–T cell Therapy. J Biomed Inform. 2023.

  37. Holzinger A, Müller H. Toward Human–AI interfaces to support explainability and causability in medical AI. Computer. 2021;54(10):78–86.

  38. Montavon G, Samek W, Müller K-R. Methods for interpreting and understanding deep neural networks. Proc Natl Acad Sci. 2019;73:1–15.

  39. Bauer K, von Zahn M, Hinz O. Expl(Ai)Ned: the impact of explainable artificial intelligence on cognitive processes.

  40. Lipton ZC. The mythos of model interpretability. arXiv:1606.03490 [cs, stat]. 2017.

  41. Gilpin LH, Bau D, Yuan BZ, Bajwa A, Specter M, Kagal L. Explaining explanations: an overview of interpretability of machine learning. In: 2018 IEEE 5th International Conference on Data Science and Advanced Analytics (DSAA), pp. 80–89.

  42. Samek W, Müller K-R. Towards Explainable Artificial Intelligence. In: Samek, W., Montavon, G., Vedaldi, A., Hansen, L.K., Müller, K.-R. (eds.) Explainable AI: interpreting, explaining and visualizing deep learning. Lecture Notes in Computer Science, vol. 11700, pp. 5–22. Springer International Publishing.

  43. Murdoch WJ, Singh C, Kumbier K, Abbasi-Asl R, Yu B. Definitions, methods, and applications in interpretable machine learning. Proc Natl Acad Sci. 2019;116(44):22071–80.

  44. Thayaparan M, Valentino M, Freitas A. A survey on explainability in machine reading comprehension. arXiv preprint arXiv:2010.00389. 2020.

  45. Holzinger A, Dehmer M, Emmert-Streib F, Cucchiara R, Augenstein I, Ser JD, Samek W, Jurisica I, Díaz-Rodríguez N. Information fusion as an integrative cross-cutting enabler to achieve robust, explainable, and trustworthy medical artificial intelligence. Inf Fusion. 2022;79:263–78.

  46. Tufail AB, Ma Y-K, Kaabar MKA, Rehman AU, Khan R, Cheikhrouhou O. Classification of initial stages of alzheimer’s disease through pet neuroimaging modality and deep learning: quantifying the impact of image filtering approaches. Mathematics. 2021;9(23):3101.

  47. Zhao Y, Shao J, Asmann YW. Assessment and optimization of explainable machine learning models applied to transcriptomic data. Genom Proteomics Bioinform. 2022.

  48. Watson DS. Interpretable machine learning for genomics. Hum Genet. 2022;141(9):1499–513.

  49. Baptista D, Ferreira PG, Rocha M. Deep learning for drug response prediction in cancer. Brief Bioinformat. 2021;22(1):360–79.

  50. Junejo AR, Kaabar MKA, Li X. Optimization: molecular communication networks for viral disease analysis using deep leaning autoencoder. Comput Math Methods Med. 2021;2021:1–11.

  51. Torkamannia A, Omidi Y, Ferdousi R. A review of machine learning approaches for drug synergy prediction in cancer. Brief Bioinform. 2022;23(3):75.

  52. Kumar V, Dogra N. A comprehensive review on deep synergistic drug prediction techniques for cancer. Archiv Comput Methods Eng. 2021;29(3):1443–61.

  53. Picard M, Scott-Boyer M-P, Bodein A, Périn O, Droit A. Integration strategies of multi-omics data for machine learning analysis. Comput Struct Biotechnol J. 2021;19:3735–46.

  54. Alharbi WS, Rashid M. A review of deep learning applications in human genomics using next-generation sequencing data. Human Genom. 2022;16(1):26.

  55. Mo H, Breitling R, Francavilla C, Schwartz J-M. Data integration and mechanistic modelling for breast cancer biology: Current state and future directions 24, 100350. Accessed 2023-03-04.

  56. Benk M, Ferrario A. Explaining interpretable machine learning: theory, methods and applications. SSRN Electron J. 2020. 10/gktgb9. Accessed 2022-01-02.

  57. Doshi-Velez F, Kim B. Towards a rigorous science of interpretable machine learning. 2017. arXiv:1702.08608.

  58. von Rueden L, Mayer S, Beckh K, Georgiev B, Giesselbach S, Heese R, Kirsch B, Pfrommer J, Pick A, Ramamurthy R, Walczak M, Garcke J, Bauckhage C, Schuecker J. Informed machine learning – a taxonomy and survey of integrating knowledge into learning systems. IEEE Transactions on Knowledge and Data Engineering, 1–1. 2021. 10/gkzc3j. arXiv: 1903.12394. Accessed 2021-11-17.

  59. Barabási A-L, Oltvai ZN. Network biology: understanding the cell’s functional organization. Nat Rev Genet. 2004;5(February):101–13.

  60. Mcgillivray P, Clarke D, Meyerson W, Zhang J, Lee D, Gu M, Kumar S, Zhou H, Gerstein M. Network analysis as a grand unifier in biomedical data science. Annu Rev Biomed Data Sci. 2018;1:153–80.

  61. Vidal M, Cusick ME, Barabási A-L. Interactome networks and human disease. Cell. 2011;144:986–98.

  62. Kanehisa M, Goto S, Sato Y, Furumichi M, Tanabe M. KEGG for integration and interpretation of large-scale molecular data sets. Nucleic Acids Res. 2011;2012(40):109–14.

  63. Matthews L, Gopinath G, Gillespie M, Caudy M, Croft D, Bono BD, Garapati P, Hemish J, Hermjakob H, Jassal B, Kanapin A, Lewis S, Mahajan S, May B, Schmidt E, Vastrik I, Wu G, Birney E, Stein L, Eustachio PD. Reactome knowledgebase of human biological pathways and processes. Nucleic Acids Res. 2009;37:619–22.

  64. Viswanathan GA, Seto J, Patil S, Nudelman G, Sealfon SC. Getting started in biological pathway construction and analysis. PLoS Comput Biol. 2008;4(2):1–5.

  65. Oh JH, Choi W, Ko E, Kang M, Tannenbaum A, Deasy JO. PathCNN: interpretable convolutional neural networks for survival prediction and pathway analysis applied to glioblastoma. Bioinformatics. 2021;37(S1):443–50.

  66. Selvaraju RR, Cogswell M, Das A, Vedantam R, Parikh D, Batra D. Grad-CAM: visual explanations from deep networks via gradient-based localization. IEEE Int Conf Comput Vis. 618–26 2017.

  67. Lemsara A, Ouadfel S, Fröhlich H. PathME: pathway based multi-modal sparse autoencoders for clustering of patient-level multi-omics data. BMC Bioinform. 2020;21(1):146.

  68. Lee S, Lim S, Lee T, Sung I, Kim S. Cancer subtype classification and modeling by pathway attention and propagation. Bioinformatics. 2020;36(12):3818–24.

  69. Althubaiti S, Kulmanov M, Liu Y, Gkoutos GV, Schofield P, Hoehndorf R. DeepMOCCA: a pan-cancer prognostic model identifies personalized prognostic markers through graph attention and multi-omics data integration. Bioinformatics. 2021.

  70. Tate JG, Bamford S, Jubb HC, Sondka Z, Beare DM, Bindal N, Boutselakis H, Cole CG, Creatore C, Dawson E, Fish P, Harsha B, Hathaway C, Jupe SC, Kok CY, Noble K, Ponting L, Ramshaw CC, Rye CE, Speedy HE, Stefancsik R, Thompson SL, Wang S, Ward S, Campbell PJ, Forbes SA. COSMIC: the catalogue of somatic mutations in cancer. Nucleic Acids Res. 2018;47(D1):941–7.

  71. Hinton GE, Roweis S. Stochastic neighbor embedding. In: Becker, S., Thrun, S., Obermayer, K. (eds.) Advances in neural information processing systems, vol. 15. MIT Press, Cambridge. 2002.

  72. Chuang Y-H, Huang S-H, Hung T-M, Lin X-Y, Lee J-Y, Lai W-S, Yang J-M. Convolutional neural network for human cancer types prediction by integrating protein interaction networks and omics data. Sci Rep. 2021;11(1):20691.

  73. Chereda H, Bleckmann A, Menck K, Perera-Bel J, Stegmaier P, Auer F, Kramer F, Leha A, Beißbarth T. Explaining decisions of graph convolutional neural networks: patient-specific molecular subnetworks responsible for metastasis prediction in breast cancer. Genome Med. 2021;13(1):42.

  74. Peri S, Navarro JD, Amanchy R, Kristiansen TZ, Jonnalagadda CK, Surendranath V, Niranjan V, Muthusamy B, Gandhi TKB, Gronborg M, Ibarrola N, Deshpande N, Shanker K, Shivashankar HN, Rashmi BP, Ramya MA, Zhao Z, Chandrika KN, Padma N, Harsha HC, Yatish AJ, Kavitha MP, Menezes M, Choudhury DR, Suresh S, Ghosh N, Saravana R, Chandran S, Krishna S, Joy M, Anand SK, Madavan V, Joseph A, Wong GW, Schiemann WP, Constantinescu SN, Huang L, Khosravi-Far R, Steen H, Tewari M, Ghaffari S, Blobe GC, Dang CV, Garcia JGN, Pevsner J, Jensen ON, Roepstorff P, Deshpande KS, Chinnaiyan AM, Hamosh A, Chakravarti A, Pandey A. Development of human protein reference database as an initial platform for approaching systems biology in humans. Genome Res. 2003;13(10):2363–71.

  75. Keshava Prasad TS, Goel R, Kandasamy K, Keerthikumar S, Kumar S, Mathivanan S, Telikicherla D, Raju R, Shafreen B, Venugopal A, Balakrishnan L, Marimuthu A, Banerjee S, Somanathan DS, Sebastian A, Rani S, Ray S, Harrys Kishore CJ, Kanth S, Ahmed M, Kashyap MK, Mohmood R, Ramachandra YL, Krishna V, Rahiman BA, Mohan S, Ranganathan P, Ramabadran S, Chaerkady R, Pandey A. Human Protein Reference Database-2009 update. Nucleic Acids Res. 2009;37:767–72.

  76. Ramirez R, Chiu Y-C, Hererra A, Mostavi M, Ramirez J, Chen Y, Huang Y, Jin Y-F. Classification of cancer types using graph convolutional neural networks. Front Phys. 2020;8:203.

  77. Szklarczyk D, Gable AL, Nastou KC, Lyon D, Kirsch R, Pyysalo S, Doncheva NT, Legeay M, Fang T, Bork P, Jensen LJ, von Mering C. The STRING database in 2021: customizable protein-protein networks, and functional characterization of user-uploaded gene/measurement sets. Nucleic Acids Res. 2020;49(D1):605–12.

  78. Schulte-Sasse R, Budach S, Hnisz D, Marsico A. Graph convolutional networks improve the prediction of cancer driver genes. In Artificial Neural Networks and Machine Learning-ICANN 2019: Workshop and Special Sessions: 28th International Conference on Artificial Neural Networks, Munich, Germany, September 17-19, 2019, Proceedings 2019;28:658–68.

  79. Binder A, Montavon G, Bach S, Müller K-R, Samek W. Layer-wise relevance propagation for neural networks with local renormalization layers. arXiv. 2016.

  80. Liu C, Han Z, Zhang Z-K, Nussinov R, Cheng F. A network-based deep learning methodology for stratification of tumor mutations. Bioinformatics. 2021;37(1):82–8.

  81. Ribeiro LFR, Saverese PHP, Figueiredo DR. struc2vec: Learning node representations from structural identity. In: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, New York. 2017.

  82. Liu Q, Xie L. TranSynergy: mechanism-driven interpretable deep neural network for the synergistic prediction and pathway deconvolution of drug combinations. PLOS Comput Biol. 2021;17(2):e1008653.

  83. Huang Z, Zhan X, Xiang S, Johnson TS, Helm B, Yu CY, Zhang J, Salama P, Rizkalla M, Han Z, Huang K. SALMON: survival analysis learning with multi-omics neural networks on breast cancer. Front Genet. 2019;10:166.

  84. Zhang J, Huang K. Normalized lmQCM: an algorithm for detecting weak quasi-cliques in weighted graph with applications in gene co-expression module discovery in cancers. Cancer Inform. 2014;13(s3):14021.

  85. Wang T, Shao W, Huang Z, Tang H, Zhang J, Ding Z, Huang K. MOGONET integrates multi-omics data using graph convolutional networks allowing patient classification and biomarker identification. Nat Commun. 2021;12(1):3445.

  86. Cao Z-J, Gao G. Multi-omics integration and regulatory inference for unpaired single-cell data with a graph-linked unified embedding framework. Bioinformatics. 2021.

  87. Xing X, Yang F, Li H, Zhang J, Zhao Y, Huang J, Meng MQ-H, Yao J. Multi-level attention graph neural network for clinically interpretable pathway-level biomarkers discovery. Bioinformatics. 2020.

  88. Langfelder P, Horvath S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics. 2008;9:559.

  89. Tsherniak A, Vazquez F, Montgomery PG, Weir BA, Kryukov G, Cowley GS, Gill S, Harrington WF, Pantel S, Krill-Burger JM, Meyers RM, Ali L, Goodale A, Lee Y, Jiang G, Hsiao J, Gerath WFJ, Howell S, Merkel E, Ghandi M, Garraway LA, Root DE, Golub TR, Boehm JS, Hahn WC. Defining a cancer dependency map. Cell. 2017;170:564–76.

  90. Chiu Y-C, Zheng S, Wang L-J, Iskra BS, Rao MK, Houghton PJ, Huang Y, Chen Y. Predicting and characterizing a cancer dependency map of tumors with deep learning. Sci Adv. 2021;7(34):1275.

  91. Elmarakeby HA, Hwang J, Arafeh R, Crowdis J, Gang S, Liu D, AlDubayan SH, Salari K, Kregel S, Richter C, Arnoff TE, Park J, Hahn WC, Van Allen EM. Biologically informed deep neural network for prostate cancer discovery. Nature. 2021;598(7880):348–52.

  92. Shrikumar A, Greenside P, Shcherbina AY, Kundaje A. Not just a black box: learning important features through propagating activation differences. arXiv. 2017. arXiv:1605.01713v3.

  93. Deng L, Cai Y, Zhang W, Yang W, Gao B, Liu H. Pathway-guided deep neural network toward interpretable and predictive modeling of drug sensitivity. J Chem Inf Model. 2020;60:4497–505.

  94. Zhao L, Dong Q, Luo C, Wu Y, Bu D, Qi X, Luo Y, Zhao Y. DeepOmix: a scalable and interpretable multi-omics deep learning framework and application in cancer survival analysis. Comput Struct Biotechnol J. 2021;19:2719–25.

  95. Feng J, Zhang H, Li F. Investigating the relevance of major signaling pathways in cancer survival using a biologically meaningful deep learning model. BMC Bioinformatics. 2021;22(1):47.

  96. Smilkov D, Thorat N, Kim B, Viégas F, Wattenberg M. SmoothGrad: removing noise by adding noise. arXiv. 2017. arXiv:1706.03825.

  97. Zhang H, Chen Y, Li F. Predicting anticancer drug response with deep learning constrained by signaling pathways. Front Bioinform. 2021;1:639349.

  98. Hao J, Kim Y, Kim T-K, Kang M. PASNet: pathway-associated sparse deep neural network for prognosis prediction from high-throughput data. BMC Bioinformatics. 2018;19(1):510.

  99. Seninge L, Anastopoulos I, Ding H, Stuart J. VEGA is an interpretable generative model for inferring biological network activity in single-cell transcriptomics. Nat Commun. 2021;12(1):5684.

  100. Fortelny N, Bock C. Knowledge-primed neural networks enable biologically interpretable deep learning on single-cell sequencing data. Genome Biol. 2020;21(1):190.

  101. Turner RM, Park BK, Pirmohamed M. Parsing interindividual drug variability: an emerging role for systems pharmacology. WIREs Syst Biol Med. 2015;7:221–41.

  102. Kuenzi BM, Park J, Fong SH, Sanchez KS, Lee J, Kreisberg JF, Ma J, Ideker T. Predicting drug response and synergy using a deep learning model of human cancer cells. Cancer Cell. 2020;38(5):672–684.e6.

  103. Ma J, Yu MK, Fong S, Ono K, Sage E, Demchak B, Sharan R, Ideker T. Using deep learning to model the hierarchical structure and function of a cell. Nat Methods. 2018;15(4):290–8.

  104. Ma T, Zhang A. Incorporating biological knowledge with factor graph neural network for interpretable deep learning. arXiv. 2019. arXiv:1906.00537.

  105. Shu H, Zhou J, Lian Q, Li H, Zhao D, Zeng J, Ma J. Modeling gene regulatory networks using neural network architectures. Nat Comput Sci. 2021;1(7):491–501.

  106. Ching T, Zhu X, Garmire LX. Cox-nnet: an artificial neural network method for prognosis prediction of high-throughput omics data. PLOS Comput Biol. 2018;14(4):e1006076.

  107. Therneau TM, Grambsch PM. Modeling Survival Data: Extending the Cox Model (Statistics for biology and health). New York: Springer; 2000. p. 350.

  108. Guo L-Y. Deep learning-based ovarian cancer subtypes identification using multi-omics data. BioData Mining. 2020;13:1–12.

  109. Rampášek L, Hidru D, Smirnov P, Haibe-Kains B, Goldenberg A. Dr.VAE: improving drug response prediction via modeling of drug perturbation effects. Bioinformatics. 2019;35(19):3743–51.

  110. Simidjievski N, Bodnar C, Tariq I, Scherer P, Andres Terre H, Shams Z, Jamnik M, Liò P. Variational autoencoders for cancer data integration: design principles and computational practice. Front Genet. 2019;10:1205.

  111. Hira MT, Razzaque MA, Angione C, Scrivens J, Sawan S, Sarker M. Integrated multi-omics analysis of ovarian cancer using variational autoencoders. Sci Rep. 2021;11(1):6265.

  112. McInnes L, Healy J, Melville J. UMAP: uniform manifold approximation and projection for dimension reduction. arXiv. 2018. arXiv:1802.03426v3.

  113. van der Maaten L, Hinton G. Visualizing Data using t-SNE. J Mach Learn Res. 2008;9:2579–605.

  114. Anowar F, Sadaoui S, Selim B. Conceptual and empirical comparison of dimensionality reduction algorithms (PCA, KPCA, LDA, MDS, SVD, LLE, ISOMAP, LE, ICA, t-SNE). Comput Sci Rev. 2021;40:100378.

  115. Withnell E, Zhang X, Sun K, Guo Y. XOmiVAE: an interpretable deep learning model for cancer classification using high-dimensional omics data. Brief Bioinform. 2021;22(6):bbab315.

  116. Lundberg SM, Lee S-I. A unified approach to interpreting model predictions. In: Proceedings of the 31st International Conference on Neural Information Processing Systems. NIPS’17, pp. 4768–4777. Curran Associates Inc., Red Hook, NY, USA. 2017.

  117. Kinalis S, Nielsen FC, Winther O, Bagger FO. Deconvolution of autoencoders to learn biological regulatory modules from single cell mRNA sequencing data. BMC Bioinformatics. 2019;20(1):379.

  118. Novershtern N, Subramanian A, Lawton LN, Mak RH, Haining WN, McConkey ME, Habib N, Yosef N, Chang CY, Shay T, Frampton GM, Drake ACB, Leskov I, Nilsson B, Preffer F, Dombkowski D, Evans JW, Liefeld T, Smutko JS, Chen J, Friedman N, Young RA, Golub TR, Regev A, Ebert BL. Densely interconnected transcriptional circuits control cell states in human hematopoiesis. Cell. 2011;144(2):296–309.

  119. Way GP, Greene CS. Extracting a biologically relevant latent space from cancer transcriptomes with variational autoencoders. bioRxiv. 2017.

  120. Titus AJ, Wilkins OM, Bobak CA, Christensen BC. Unsupervised deep learning with variational autoencoders applied to breast tumor genome-wide DNA methylation data with biologic feature extraction. bioRxiv. 2018.

  121. Wang Z, Wang Y. Extracting a biologically latent space of lung cancer epigenetics with variational autoencoders. BMC Bioinformatics. 2019;20(S18):568.

  122. Palazzo M, Beauseroy P, Yankilevich P. A pan-cancer somatic mutation embedding using autoencoders. BMC Bioinformatics. 2019;20(1):655.

  123. Lin Y, Zhang W, Cao H, Li G, Du W. Classifying breast cancer subtypes using deep neural networks based on multi-omics data. Genes. 2020;11(8):1–18.

  124. Gao J, Lyu T, Xiong F, Wang J, Ke W, Li Z. MGNN: A Multimodal Graph Neural Network for Predicting the Survival of Cancer Patients. New York: Association for Computing Machinery; 2020. p. 1697–700.

  125. Cun Y, Fröhlich H. Prognostic gene signatures for patient stratification in breast cancer - accuracy, stability and interpretability of gene selection approaches using prior knowledge on protein-protein interactions. BMC Bioinformatics. 2012;13(69):1–13.

  126. Oller-Moreno S, Kloiber K, Machart P. Algorithmic advances in machine learning for single-cell expression analysis. Curr Opin Syst Biol. 2021;25:27–33.

  127. Adebayo J, Gilmer J, Muelly M, Goodfellow IJ, Hardt M, Kim B. Sanity checks for saliency maps. arXiv. 2018. arXiv:1810.03292.

Acknowledgements

Not applicable.


Funding

This project has received funding from the European Union's Horizon 2020 research and innovation programme under Grant agreement no. 965397. It has also been supported by funding from the digital Experimental Cancer Medicine Team, Cancer Biomarker Centre, Cancer Research UK Manchester Institute (P126273). The funding bodies played no role in the design of the study, research, writing or publication of the paper.

Author information

Authors and Affiliations



Contributions

MW, OW and MZ researched data for the article. MW and OW wrote the article. All authors contributed substantially to discussion of the content, and reviewed and/or edited the manuscript before submission. All authors read and approved the final manuscript.

Corresponding authors

Correspondence to Magdalena Wysocka or Oskar Wysocki.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Additional file 1:

Strategies of domain knowledge integration and explainability methods. Legend: Types of omic data: G - genomics, P - proteomics, T - transcriptomics, E - epigenomics; GO - Gene Ontology; PPI - protein-protein interaction; WGCNA - Weighted Correlation Network Analysis. Deep Learning architecture: AE - autoencoder, ANN - Artificial Neural Network, CNN - Convolutional Neural Network, DAE - Denoising Autoencoder, DBN - Deep Belief Network, DNN - Deep Neural Network, GCNN - Graph Convolutional Neural Network, GCNN-MLP - GCNN multilayer perceptron, MMD-VAE - Maximum Mean Discrepancy Variational Autoencoder, VAE - Variational Autoencoder, VCDN - View Correlation Discovery Network. Interpretability method: LRP - layer-wise relevance propagation. Interpretability group: II - intrinsically interpretable, PH - post-hoc; PROC - Processing, REPR - Representation, CREATE - Explanation producing. NA - not applicable.

Additional file 2:

PRISMA checklist.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Wysocka, M., Wysocki, O., Zufferey, M. et al. A systematic review of biologically-informed deep learning models for cancer: fundamental trends for encoding and interpreting oncology data. BMC Bioinformatics 24, 198 (2023).


Keywords

  • Multi-omics Data
  • Cancer Genomics
  • Deep Learning
  • Explainable AI
  • Graph Neural Networks
  • Sparse Neural Networks
  • Domain Knowledge Integration