Graph embedding on mass spectrometry- and sequencing-based biomedical data

Graph embedding techniques are using deep learning algorithms in data analysis to solve problems of such as node classification, link prediction, community detection, and visualization. Although typically used in the context of guessing friendships in social media, several applications for graph embedding techniques in biomedical data analysis have emerged. While these approaches remain computationally demanding, several developments over the last years facilitate their application to study biomedical data and thus may help advance biological discoveries. Therefore, in this review, we discuss the principles of graph embedding techniques and explore the usefulness for understanding biological network data derived from mass spectrometry and sequencing experiments, the current workhorses of systems biology studies. In particular, we focus on recent examples for characterizing protein–protein interaction networks and predicting novel drug functions.


Introduction
In the literature, several reviews present graph embedding models used to solve multiple tasks such as pathogen-host protein interactions, predicting drug efficiency, linking a metabolite with a metabolic network, etc [1][2][3].However, wide spread application of graph embedding techniques in the life-science community has been scarce, which may be in part because the complex mathematical framework underlying graph embedding requires considerable bioinformatical expertise.To make graph embedding known to a wider research community we have focused our review to be accessible for wet-lab biologists as well as bioinformaticians, mainly using more accessible wording for life scientists and focussing on potential future applications.
Biological data is usually presented as graphs; some of the most famous ones are represented in the book Cellular Biochemical Networks (Editor: Gerhard Michal), which describes the known metabolomic network of eukaryotic cells and comprises most of the cellular metabolites and their interactions (i.e., possible conversions and connections between metabolic pathways such as sugar and amino acid metabolism).Although traditional biology tools have been extremely successful in identifying most components and some of the major linear interactions contained in the Cellular Biochemical Networks graphs, one of the significant challenges in biology is comprehending the nonlinear or dynamic interactions among the cellular constituents to unravel the organization and interactions within cellular networks.For example, understanding which metabolic subnetworks are active in a particular cell type under specific conditions is critical to decipher the influence of the metabolic network on cellular function.
Mass spectrometry (MS) is an excellent example of a tool for understanding the underlying interactions among large numbers of cellular constituents.MS-based metabolomic and proteomics studies can follow various linear and nonlinear interactions (based on signal abundances) and dynamic interactions from time series measurements.The interactions are visualized via correlation plots of the MS signals [4,5].In a correlation plot, metabolites, proteins, etc., are represented as dots (or nodes), and a line illustrates their correlations with other network elements.Using carefully designed experiments and bioinformatic tools makes it possible to model and quantify the different types of interactions between the nodes.Hence, a traditional approach in molecular biology is to compare two or more graphs to identify which metabolites or proteins in the biological network are associated with a particular physiology (i.e., disease) or phenotype of interest [4,5].
Unfortunately, clear insight into biological information via visual inspection of the correlation plots is challenging due to the large number of biological species present in cells that MS can detect.Furthermore, artifacts such as the presence of ghost peaks or batch effects can futher obscure the information within these graphs [4,5].Graph embedding techniques have been developed to analyze complex graphs of diverse origins.A graph embedding technique takes graphs as input and converts the graphs into a matrix of vectors (i.e., a lower-dimensional latent space), thus allowing researchers to better identify the interactions between their different elements.Although graph embedding techniques have been applied to various fields of study, e.g., to analyze relationships between client and providers in financial transactions [6], to recommend locations using recommender systems [7], or to detect malware [8]; they have not been routinely applied to biological systems and are not well-known to life-scientists.
This review discusses the suitability of graph embedding techniques for analyzing masss spectrometry-and sequencing-based biomedical data and explains the theoretical background to understand graph embedding.We classify graph embedding techniques from the perspective of biomedical data, considering the canonical classification, thereby subdividing graph embedding techniques into random walk-based, matrix factorizationbased, and deep learning-based algorithms.Additionally, we review articles that applied graph embedding for link prediction, node classification, and node clustering tasks on biomedical data and highlight novel biological insights obtained by graph embedding.In particular, we will focus on protein-protein and drug-protein interactions.Our review will help future readers to identify, which graph embedding models can be applied to solve a given task on biomedical datasets, which datasets can be used, and which metrics are available to evaluate the results.
The paper is structured as follows: section "Theory of graphs embeddings" contains the necessary definitions and summarizes the theoretical background of graph embedding.Then, section "Applications of graph embeddings in mass spectrometry-and sequencing-based biomedical data" describes the existing applications of graph embedding techniques on biomedical data.Finally, section "Conclusion" discusses conclusions and future applications.

Background techniques
To be able to understand graph embedding, we first must introduce the term word embedding, which transforms a group of words (i.e., text) into a matrix of vectors and is frequently used in natural language processing (NLP) [9].In more detail, word embedding technique results in the (n-dimensional) vector representation of a word (token) within a text [10].Since words often occur in the same semantic or syntactic context, a cosine similarity measure among the vectors in the matrix can be used to identify the relationship between words.Hence, the semantic and syntactic similarity between words can be mathematically identified [11,12].For example, word embedding is used when a word processing program suggests a phrase after the computer user types just a few words.Two different strategies were proposed for word embedding (i.e., architectures): Continuous bag-of-words (CBOW) [13] predicts a word w i in one particular position in the sentence based on the context of words surround- ing that position w i−2 , w i−1 , w i+1 , w i+2 , while continuous skip-gram model [13] predicts the context (surrounding words) with respect to a particular word in the sentence.The first formulation of skip-gram model defines the conditional likelihood P(w context | w center ) ≈ P(w o | w c ) utilizing the function softmax [13,14], where o is the index of the context word (output) in the dictionary, c is the index of the central word (input) in the dictionary, and W is the vocabulary.
Similarly, continuous bag-of-words defines conditional likelihood P(w c | w o 1 , . . ., w o 2m ) [13,14], where o 1 , . . ., o 2m are the indexes of the context words in the dictionary.Although the skip-gram architecture performs slightly worse on syntactic tasks than the CBOW model, it does much better on semantic tasks [13].Executing the definition (Equ. 1) has a very high computational cost [13,14].Therefore, [15] optimized the training process of the skip-gram model by adding the hierarchical softmax and negative sampling techniques.
Graph embedding is applied to dot (scatter) graphs.In analogy to word embedding, in graph embedding a point (i.e., node) in a graph is considered as a word, which is surrounded by other points (i.e., words).Furthermore, the graph contains information about the relationship between any two given points (words); this relationship is defined as an edge between two nodes.Hence, graph embedding can be used to create a matrix of vectors for all the nodes in a graph based on their edges by using the following (1) analogy [16][17][18][19]: given a sequence of words, S n 1 = (w 1 , w 2 , . . ., w n ) where w i ∈ W , it can be inferred P(w n | w 1 , w 2 , . . ., w n−1 ) ≈ P(v i | v 1 , v 2 , . . ., v i−1 ) and v i represents a node in a graph G.

Graph embedding
The following definitions are useful to better understand and develop graph embedding and its applications.
Definition 1 (Graph) In mathematics and computer science, a graph is a scatter plot with a defined data structure.Let G be a graph, defined as G = (V , E) where V = {v 1 , v 2 , . . ., v n } is a set of nodes (vertices), and E represents the connection (edge) between 2 nodes (v i , v j ) ∈ V [1,[20][21][22][23].Given a graph (Fig. 1a), this graph can be rep- resented by an adjacency matrix: is 1 when there is an edge from node v i to node v j , and is 0 when there is no edge (Fig. 1b).The adjacency list groups the neighboring nodes of each node v i (Fig. 1c), while the edge list consists of ordered pairs (v i , v j ) when there is an edge from node v i to node v j (Fig. 1d) [20,21,24,25].
Definition 2 (Homogeneous and heterogeneous graphs) In a homogeneous graph, all nodes and/or edges are of the same type.For example, in the friends' network, each node represents a person, and an edge represents friendship between two people.In contrast, in heterogeneous graphs, nodes and edges can be of different types.Heterogeneous graphs are exemplified by an education network, in which there may be nodes representing teachers and students, and it is possible to have the relationships (edges) between teachers (colleagues), between teachers and students, and between students (classmates) [1,20,21,24].By their nature, biochemical networks can be defined as homogeneous or heterogeneous graphs.For example, protein-protein interaction studies are represented   in homogeneous graphs [26][27][28][29], while miRNA-disease/gene interaction studies are represented by heterogeneous graphs [30][31][32].
Definition 3 (Directed and undirected graphs) In directed graphs (digraph), the list of nodes (i.e., vertices) that generates the graph is ordered, and each interaction (i.e., edge) has a direction.Traversal in this type of graph is done according to the direction of the interactions among nodes, while in undirected graphs traversal can be done in both directions [1,20,21,24].In metabolic pathways, both types of graphs are present.Metabolic pathways, in which each product (i.e., node) is solely dependent on its precursor (i.e., a previous node in the pathway), can be defined as directed.However, most metabolic pathways are represented as undirected graphs, since their chemical reactions are reversible and regulated by feedback loops, where downstream products influence the formation of their upstream precursors (e.g., in glycolysis) [33,34].
Definition 4 (First-order and second-order proximity) The first-order proximity measures the proximity between a pair of nodes v i , v j , and represents the weight w of the edge e ij ( w ≥ 0 ).If the edge does not have a weight, then the default value is 0.Then, first- order proximity is defined as the neighborhood of the node v i containing a set of adja- cent nodes The second-order proximity measures the num- ber of 2-hop paths between a pair of nodes v i , v j [2,24].
Definition 5 (Graph embedding) Given a graph as input G = (V , E) , graph embed- ding (see Fig. 2) is defined as a mapping function f : v i → Z i ∈ R d (latent space) with i ∈ {1, 2, . . ., n} where d ≪ |V| and Z i is a vector of dimension d known as an embedding [2,22,24].

Classification of graph embedding techniques
Most commonly, graph embedding techniques are classified as either matrix factorization-based, random walk based, or deep learning-based [1,2,[22][23][24]35].However, in the literature, an alternative classification has been introduced based on the point of view of the mathematical problems, which can be vector point-based, gaussian distribution-based, or based on dynamic graph embedding [1].Vector point-based approaches aim to project the nodes of a high-dimensional graph onto low-dimensional vectors within a vector space [1].Gaussian distribution-based methods allow the vector representation (embedding) of a node as potential functions of continuous densities in a

Embedding Algorithm
Fig. 2 Graph embedding scheme vector space.[1].Dynamic graph embedding is often the method of choice for practical applications, as many networks are dynamic and evolve, leading to the addition of removal of nodes or edges [1].
Alternatively, it was proposed that embedding techniques can be grouped from the perspective of biomedical networks, including biomedical relation data, biomedical knowledge graphs, biomedical ontology, or clinical data, in non-attributed network embedding and attributed network embedding [36].Below is the classification of nonattributed network embedding [1,2,22,24,35,36] and attributed network embedding [2,36].Table 1 shows the graph embedding models published by category.

Mathematical concepts behind graph embeddings
Shallow embeddings are the earliest graph embedding technique applied to life-science data based on homogenous networks (i.e., networks based on only one biological entity, such as proteins).Shallow graph embeddings are subdivided into random-walk and matrix-factorization algorithms.Examples of random-walk algorithms are (DeepWalk [16] and Node2vec [17]; while matrix-factorization examples are graph factorization [43] and GraphRep [44]. DeepWalk [16] was the first graph embedding technique used to represent the vertices (nodes) of a homogeneous graph in vectors [91].The process begins when the random walk algorithm generates a sequence of vertices.The model is then trained using the skip-graph algorithm [13].Finally, the result is the vector representation for each vertex, also called embedding.
Node2vec [17] is a generalization of DeepWalk [16].The authors added two parameters, p, and q, which drive the generation of paths (see Fig. 3) by using the idea of breadth-first traversal (BFS) and depth-first traversal (DFS).When q > 1 , the traversal approaches BFS, and the random walks lead to a micro-view of node
Netpro2vec [92]: In the techniques described above, nodes of a network were transformed into tokens.Instead, the main concept of Netpro2vec is to transform networks into documents.The process is carried out in 3 steps: 1) building the probability distributions representing each graph, 2) extracting tokens from probability distributions, and 3) building the graph embedding using token extraction.The graph is then represented as a word document (a set tokens), and the Doc2vec (document embedding) technique is applied to obtain the graph embedding [93].The proposal was compared with other techniques of whole-graph embeddings to solve classification tasks in gene networks.The results were evaluated based on accuracy, precision, recall, F-measure, and Matthews correlation coefficient (MCC) metrics.
Pathway2vec [33] incorporates multiple random walk-based techniques, Node-2vec [17], Metapath2vec, Metapath2vec++ [18], JUST [94], and RUST [33], to represent learning by automatically generating features of metabolic pathways.It consists of three layers that interact: compounds, enzymes, and pathways.This interaction between layers results in a heterogeneous network of multi-layer information, and each layer has associated nodes.The layered architecture captures meaningful relationships to learn a low-dimensional space based on neural embeddings of metabolic features.Finally, applying the skip-gram [13] model, the embeddings for each node are extracted.Pathway2vec was applied for node clustering, embedding visualization, and pathway prediction tasks.Evaluation of the results was performed using MetaCyc software and F1-micro metric.
Graph Factorization (GF) [43]: GF is a factorization technique based on partitioning a graph to minimize the number of neighboring vertices instead of edges between partitions.GF begins from the assumption that the information regarding the presence of an edge (i, j) with a weight Y ij can be captured by the inner product between vertices with attributes Z i , Z j .Finally, the value of the vector Z is determined by the following objective function: where is the regularization parameters, E is the list of edges.To validate their proposal, the authors applied GF on a graph of 200 million vertices and 10 billion edges.In order to evaluate convergence and execution time, they used 3 architectures: a single machine, (3) a synchronous parallel implementation and an asynchronous parallel implementation.
The results showed that asynchronous parallel implementation is very beneficial for scalability.
GraRep [44]: GrapRep is a model for learning node representation.This model captures the relational information of different k-steps with different values of k between vertices of the graph, directly manipulating different global transition matrices defined on the graph without slow and complex sampling processes.GraRep defines different loss functions and optimizes each model with matrix factorization techniques, constructing global representations of each node by combining the different model representations.Experiments were run to solve the node clustering and node classification tasks on linguistic networks and social networks, respectively.In both tasks, GraRep showed an empirical efficiency of the learned representations compared to the LINE and DeepWalk models.
While shallow-embedding algorithm applications focus on solving link prediction, node classification, and community detection tasks, more complex problems such as graph matching, subgraph matching, and calculating the maximum common subgraphs require more complex models.Graph-neural network (GNN) algorithms can address these problems by combinatorial optimization using graph theory.Furthermore, these problems are solved through representation learning (deep learning); for example, in [95] a GNN model is proposed that addresses the subgraph matching problem for molecular fingerprint detection.
Graph Convolutional Network (GCN): Kipf et al. [56] present GCN for semi-supervised learning that works directly on graphs.GCN is a variation of convolutional neural networks.It scales linearly with the number of edges and encodes the local structure of the graph and features of nodes.The task of node classification is approached on a graph with partially labeled nodes, using a neural network f(X, A) trained in a supervised environment with node feature matrix X and adjacency matrix A ∈ R N ×N .For this purpose, a multilayer GCN is considered with the following layer-wise propagation rule.
where Ã = A + I N is the adjacency matrix of undirected graph with added loops, I N is the identity matrix, Dii = j Ãij and W (l) is a layer-specific trainable weight matrix, σ (.) is an activation function, H (l) ∈ R N ×D is the matrix of activations in the l th layer H (0) = X .The experiments were run on 4 datasets (Citeseer, Cora, Pubmed, and NELL), and the results showed that CGN significantly outperforms DeepWalk.
In the case of attribute data-biomedical data based on heterogeneous networks (i.e., networks based on more than one biological entity, such as drug-protein target interactions)-the graph embedding algorithms must consider both the node distribution and the edge information of the graph.Embeddings are generated that encode the proximity between nodes based on their attributes and connectivity patterns.Graph embeddings algorithms for attribute data can be divided into semantic matching models (e.g., DDKG, DistMult, etc.), translational distance models (e.g., TransE, TransR), and metapath-based methods (e.g., Metapath2vec).
DDKG: Xiaori et al. [96] used an approach denominated "attention-based knowledge graph representation learning framework" or DDKG to simultaneously consider drug (4) attributes and triple facts in knowledge graphs (KG).A triple fact is the link between one entity (e.g., metabolite, protein, etc.), usually referred as subject or head, and another entity referred as object or tail.The relationship between this two entities is referred as relationship or label.Xiaori et al. 's work aimed to use all the information available in biomedical KGs and improve the results in the link prediction task in drug-drug interaction (DDI) networks.The proposal was developed in 4 steps: 1) Building the KG, 2) Generating the initial embeddings for each drug according to its KG, 3) Generating the global embeddings of the drugs considering the node-embeddings of their neighbors, 4) finally, DDKG determines the probability of interaction of drugs in pairs with their respective embeddings through a binary classification.The experiments were conducted on two biomedical KGs and compared with ten state-of-the-art models, including LINE and SDNE.Results obtained from DDKGs were evaluated by metrics of accuracy, sensitivity, specificity, AUC, and AUPR, demonstrating that DDKGs outperformed the stateof-the-art models.DistMult [67] considered learning entity and relationship representations in knowledge bases (KBs) using the neural-embedding approach.The learning process seeks to learn entity and relationship representations such that valid triple facts (i.e., known facts) receive high scores.The triple facts are denoted by ( e 1 , r, e 2 ), where e 1 is the subject, e 2 is the object, and r is the relationship between the two.The first layer of the model projects a pair of entities from the input into low-dimensional vectors, and the second layer combines these two vectors into a scalar to be compared by a scoring function.Entity representation learning can be defined as: where f can be a linear/nonlinear function, W is a parameter matrix, W can be initialized randomly/pre-trained, and X is a one-hot/n-hot vector representing the input entities e 1 and e 2 .DistMult was empirically evaluated for link prediction tasks on the Free- base dataset.The results showed that a bilinear model successfully captures the compositional semantics of the relationships.It is also reported that DistMult outperforms TransE with a top-10 accuracy of 73.2% versus 54.7%.
TransE: Antoine et al. [80] addressed the problem of embedding different class entities (e.g., metabolites, proteins, etc.) and relationships of multi-relational data in low-dimensional latent spaces.The primary condition is that all the different entities (e.g., protein, metabolite, gene, etc.) must be present in a directed graph.In this directed graph, a triple fact consists of one entity (designated head), which is related to another entity (designated tail) by an edge (designated label).TransE is an energy-based model that learns embeddings of low-dimensional entities.For TransE the relationships are represented as translations in latent space; if a strong relation (edge) exists among two nodes (i.e., head and tail), then the embedding of the tail entity must be similar to the embedding of the head entity plus some vector that satisfies the relationship.For its simplicity, TransE has a small number of parameters and is scalable.Experiments showed that TransE performs well and significantly outperforms the RESCAL method in the link prediction task on two large knowledge base, Firebase and Wordnet.
TransR [82]: In contrast to the TransE model, where entities and relations (edges) are embedded in the same latent space, in TransR it was proposed to build the embeddings (5) y e 1 = f (WX e 1 ), y e 2 = f (WX e 2 ) of the entities and the edges in separate latent spaces linked by specific relation matrices, yielding one entity space and multiple relation spaces.TransR was based on the idea that entities that have a relationship of the form (head, label, tail) are first projected from the entity space into the r-relation space as h r and t r with M r operation, and then h r + r ≈ t r .The relation-specific projection can make the head/tail entities that actu- ally hold a strong relation (edge) close to each other and also move away those that do not.In the experiments, Lin et al. [82] evaluated the model with three tasks: link prediction, triple classification, and relational fact extraction using the WordNet and Freebase datasets.The results showed that TransR obtains significant improvements compared to TransE.Additionally, they proposed CTransR, a combination of TransR and Clustering.
Metapath2vec [18]: Unlike DeepWalk [16] and Node2vec [17], Metapath-2vec, guides and generates paths using random walks through meta-path schemes.It captures the structural and semantic relationships between different types of nodes in heterogeneous networks.Formally, a meta-path is defined as a path P represented by, complex relationships between node types V 1 and V l .The skip-gram architecture is also used by Metapath2vec to determine embeddings.Dong et, al. [18] evaluated their proposal on heterogeneous graphs for solving multi-classification nodes, node clustering, and similarity search problems.The results were evaluated using the F1-score metric.
As an example, Su et al. [28] applied graph embedding to improve the identification of protein-protein interactions.To avoid the high computational cost of identifying the possible protein-protein interactions based on previous graph embedding techniques, the authors studied different approaches (algorithms) to accelerate graph embedding and improve its accuracy.The authors' contribution was 2-fold.Firstly, their approach denominated LPPI integrated protein attributes into the graph embedding task.This way, multi-view information was used, improving the accuracy of the

Link prediction
Precision@k and recall@k graph embedding process.Secondly, the graph was reconstructed using the Graph-Zoom algorithm to reduce the graph's size.Therefore, the authors could accelerate the efficiency of the embedding algorithms.Combining the above two aspects, the authors' algorithm, LPPI, saves execution time without losing accuracy (AUC 0.99996) in identifying protein-protein interactions in a large dataset.Despite representing a major advance in the use of graph embedding, Su et al. [28] only used a homogeneous dataset from protein data.However, biological information from systems biology studies is typically derived from multi-omics datasets and contain heterogeneous information (DNA, RNA, protein, and metabolite information).Furthermore, as the network of interactions is time or condition-sensitive, multilayer networks must be considered [4].
Gong et al. [105] proposed the use of a multilayer network embedding to handle data sets with multiple types of nodes and edges found in heterogeneous graphs.This approach becomes extremely useful for evaluating the performance of node embedding in link prediction, which tries to predict edges that most likely will appear in theoretical networks (not experimentally measured data); this is similar to the approach performed by bioinformatics in in-silico studies.As some tested datasets are very large and complex, it is hard to predict links on the whole node sets.Hence, Gong et al. [105] suggested first extracting a core set of nodes of each dataset and conducting link prediction in these core sets.Hence, many authors similar to Gong et al. are encouraging the use of more complex graph-embedding algorithms that are based on combinations of the above-mentioned ones.These combinations of graph-embedding algorithms are known as encores or graph neural networks.
For example, Ray et al. [106] used a combination of graph-embedding algorithms as proposed by Gong et al. to generate a graph embedding encore algorithm approach to identify potential drugs that could affect the protein-protein interaction (PPI) between the SARS-CoV-2 virus and its human target proteins.The SARS-CoV-2 viral protein and human interaction datasets (i.e., protein interaction graph) were based on the experimental data obtained by Gordon et al. [107] by means of affinity-purification mass spectrometry (AP-MS) screening and on the theoretical data by Dick et al. [108].
The graph embedding-based algorithm proposed by Ray et al. [106] to repurpose drugs against COVID-19 considered that the available data was heterogeneous.They suggested to combine three different data sets: (i) SARS-CoV-2-host protein interactions, (ii) human protein-protein interactions, and (iii) drug-human protein interactions to predict possible novel treatments to interfere with infection.As described by Gong et al. [105], these three datasets were very large and complex; hence, Ray et al. [106] had to reduce the dataset complexity by performing the data reduction step, i.e., a first graph embedding based on the Nod2vec algorithm to obtain the feature matrix ( X ).In the second step, the novel graph embedding algorithm denominated variational graph autoencoder (VGAE) was used for link prediction tasks.As input, VGAE receives the adjacency matrix ( A ) and the feature matrix ( X ) from the original graph ( X replaces the one-hot matrix that the VGAE model uses by default and also helps improve prediction precision).The encoder of VGAE converts the input data to lower-dimensional representation ( Z ) and the decoder takes Z to reconstruct the original input in ( Â ), where Â is similar to A , and in Â new connections between the different types of nodes can be discovered.
The results of Ray et al. [106] were compatible with those observed by other authors.For example, Ray et al. [106] identified the angiotensin-converting enzyme-2 (ACE-2) as a potential drug target against SARS-CoV-2 [109,110].Interestingly, the authors also found that drugs used to prevent Malaria and pneumocystis pneumonia (PCP) relapses, such as Primaquine, have therapeutic potential against SARS-CoV-2 based on the interaction of Primaquine with the TIM complex, consisting of TIMM29 and ALG11.
Similarly, Zitnik et al. [111] used a graph convolutional network, a combination of graph-embedding algorithms with a convolutional neural network that can work directly on graphs, to predict clinical side effects in patients taking multiple drugs simultaneously.
As in the case of Ray et al. [106], Zitnik et al. [111] combined multimodal graphs of protein-protein interactions, drug-protein target interactions, and known clinical drug side effects.Their new graph embedding algorithm, named Decagon, could accurately predict drug side effects in patients with complex diseases or co-existing conditions necessitating simultaneous medication for their treatment.
The use of shallow embeddings, such as (Nod2vec) is limited as shallow embeddings do not share information between the nodes and do not take advantage of the characteristics of the nodes in the coding process.To mitigate these limitations, graph neural networks (GNN) have more sophisticated encoders that take advantage of the structure, features, and attributes of graphs [112].
Su et al. [113] proposed constrained multi-view nonnegative matrix factorization (CMNMF), a model based on GNN, to determine the similarity between drugs and viruses within their space of characteristics (latent space).Therefore, CMNMF is oriented towards preserving drug and virus similarity information as much as possible.Then, they apply a graph convolutional network (GCN) with attention-based neighbor sampling to optimize the vectorial representation of drugs and viruses in virus-drug associations (VDA) networks, whereas VDA networks are considered heterogeneous graphs.The experiments were executed on three VDA datasets to identify possible drugs against SARS-CoV-2.The embedding algorithm from Su et al. outperformed other models and was evaluated with the accuracy, F1, AUC, and AUPR metrics.
Decagon, a DeepWalk neural graph embedding, outperformed baseline algorithms by up to 69% (accuracy).Specifically, Decagon could automatically predict side effects with a known strong molecular basis with high precision, but still performed well on predicting side effects with a non-molecular basis due to its effective sharing of model parameters across edge types.
Finally, Nelson et al. [114] mentioned the advantages of graph embedding techniques compared to other techniques that operate directly on biological/biomedical networks.One advantage is a more rapid analysis of the learnt embedding.Unlike the tasks mentioned in the other works (link prediction, node classification, and node clustering), Nelson et al. [114] demonstrated the usefulness of graph embeddings for more specific tasks in biology, such as protein network alignment, protein module detection, and protein function prediction.Taken together, these examples establish the high value of graph embedding techniques for the analysis of mass spectrometry-and sequencing-based-OMICs datasets.Several other applications have been published, which could not be discussed in greater detail, but have been showcased in Table 2 and classified for the use for (i) link prediction, (ii) node classification, and (iii) node clustering tasks.
Although Table 2 shows how graph embedding algorithms have become popular for representing biomedical data, several major limitations are apparent that limit the general applicability of graph embedding to life sciences: • Most graph embedding algorithms have been developed to accomplish a specific task on a specific dataset, with no standards or even flexibility for incorporating other datasets.For using the same graph embedding algorithm to solve a different task, the new data set must be rewritten, thus limiting the application for other researchers.• Shallow-embedding algorithm applications are limited in their applications, such as link prediction, node classification, and community detection tasks.More complex problems such as graph matching, subgraph matching, and calculating the maximum common subgraphs require more complex models requiring combinatorial optimization (graph theory).Furthermore, these problems are solved through representation learning (deep learning).However, most deep-learning graph embedding techniques are not deterministic because they use probabilities to perform their tasks, yielding similar, but not identical results for different runs.• Loss of structural information: graph embedding methods typically aim to preserve the proximity of nodes based on their graph structure.However, they may lose certain structural information during the embedding process.For instance, (i) higherorder relationships within the graph may not be accurately captured.Furthermore, (ii) graph embeddings may not effectively leverage node attributes or features.Node attributes (metadata) can provide valuable information in life sciences, such as measurement conditions.It may be computationally expensive to maintain graph embeddings for (iii) dynamic data sets where nodes and edges are frequently added, removed, or modified (due to experimental conditions).• Interpretability: The interpretability of graph embeddings can be more challenging compared to other clustering techniques, as it is often difficult to interpret the specific features or relationships each dimension captures.
Addressing these limitations is an active area of research, and researchers continue to develop new techniques and algorithms to enhance the performance and versatility of graph embedding methods to make them more applicable to life-science research questions.

Conclusion
As can be easily appreciated from the by far not exhaustive list of discussed algorithms for graph embedding in this review, there is currently not yet a gold standard for graph embedding for biological data emerging that can provide reliable data for biologists and serve as a reference point for future developments of in the field.So far, the presented applications for graph embedding on biological data have all been developed for the specific data sets at hand.All these studies have thus mainly remained theoretical, focusing on the development of computational techniques rather than taking the interpretation of the data to the identification of novel biology or drug developments.Yet, with the evergrowing datasets available to life science researchers, the community needs novel tools to understand better the underlying biological processes.Given their nature of reducing the dimensionality of complex data, graph embedding algorithms are an exciting and novel tool for extracting novel insight from large biological datasets (Table 2).We envision that graph embedding will become an essential tool aiding hypothesis generation leading to novel biological discoveries.Specifically, graph embedding techniques hold significant potential in various biological and biomedical research fields.In the context of the drug-disease association (DDA), disease-gene association (DGA), drug-target interaction (DTI), protein-protein interaction (PPI), and drug-drug interaction (DDI) (Table 2), graph embedding methods can provide valuable insights and aid in understanding complex relationships.By representing drugs, diseases, genes, targets, and proteins as nodes in a graph and capturing their interactions as edges, graph embedding algorithms can (i) infer novel insight into a biological system based on information about its elements (i.e., link prediction), (ii) classify the relevance of biological elements (e.g., proteins, metabolites, etc.) and their interactions within a system (i.e., node classification), and (iii) identify a phenotype or physiology of interest based on the networks formed by their elements (i.e., node clustering).
Furthermore, with the help of low-dimensional representations obtained using graphneural networks (GNN) algorithms, it is possible to encode the underlying relationships and functional associations to find similarities between individuals sharing the same condition (e.g., graph matching or subgraph matching).These low-dimensional embeddings can then be leveraged to gain an understanding of the underlying molecular events occurring within the biological system (i.e., molecular phenotype characterization).
Hence, the ability to integrate multiple data sources, such as genomic, transcriptomic, proteomic, metabolomic, and clinical data, further enhances the predictive power and potential impact of graph embedding techniques, mainly in the field of personalized medicine, paving the way for improved disease management, identifying potential therapeutic targets, elucidating underlying molecular mechanisms, and exploring drug synergy or adverse interactions.
In conclusion, this increased predictive power gained by using graph embedding techniques on biological data will allow life-science researchers to conduct more targeted experiments by extracting novel unseen links.Developing applications will require substantial further research on the bioinformatic side to identify the most promising approaches to be applied to specific types of datasets, as well as thorough experimental validation of the generated outputs.Despite posing a challenging problem to either field, the rapid rise of AI tools in our everyday life as a researcher will certainly fuel interest in incorporating novel AI-based analysis methods on high dimensional biological data.Therefore, we anticipate that graph embedding applications will soon be invaluable in the broader life science community.

Fig. 3
Fig. 3 BFS (red arrows) and DFS (blue arrows) traversals, from node A with a path length of 3

Table 2
Summary of graph embedding on biomedical dataAUPR area under precision-recall curve, ROC receiver operating characteristics, AUC area under the curve ROC, MCC Matthews correlation coefficient, DBI Davies-Bouldin index, NMI normalized mutual information, ARI adjusted rand index, MMR maximum matching ratio, MAP mean average precision