Path-level interpretation of Gaussian graphical models using the pair-path subscore

Background: Construction of networks from cross-sectional biological data is increasingly common. Many recent methods have been based on Gaussian graphical modeling, and prioritize estimation of conditional pairwise dependencies among nodes in the network. However, it remains challenging to determine how specific paths through the resultant network contribute to overall ‘network-level’ correlations. For biological applications, understanding these relationships is particularly relevant for parsing structural information contained in complex subnetworks.

Results: We propose the pair-path subscore (PPS), a method for interpreting Gaussian graphical models at the level of individual network paths. The scoring is based on the relative importance of such paths in determining the Pearson correlation between their terminal nodes. PPS is validated using human metabolomics data from the Hyperglycemia and Adverse Pregnancy Outcome (HAPO) study, with observations confirming well-documented biological relationships among the metabolites. We also highlight how the PPS can be used in an exploratory fashion to generate new biological hypotheses. Our method is implemented in the R package pps, available at https://github.com/nathan-gill/pps.

Conclusions: The PPS can be used to probe network structure on a finer scale by investigating which paths in a potentially intricate topology contribute most substantially to marginal behavior. Adding PPS to the network analysis toolkit may enable researchers to ask new questions about the relationships among nodes in network data.

Supplementary Information: The online version contains supplementary material available at 10.1186/s12859-021-04542-5.


Installing the package
The package can be installed and loaded using the following commands.

    library(devtools)
    devtools::install_github("nathan-gill/pps")
    library(pps)

Running the app
After loading the pps package, the app can be run by calling the function run.PPS.app(). The app will launch in a new window; its home screen is shown in Figure 1.

Load data
Click the "Browse" button to open your computer's file explorer. From here, navigate to your desired .csv or .xlsx file. This file should contain an n × p numeric data matrix that is ready for input into the graphical lasso algorithm. The file should have column headings corresponding to the names of the nodes; these will be used as labels in the results.

Select GGM and PPS parameters
Once a dataset is loaded, input fields for the model parameters will appear. The first is the graphical lasso penalty parameter, i.e. λ in the precision matrix estimator $\hat{\Theta} = \arg\max_{\Theta \succ 0} \{ \log\det\Theta - \mathrm{tr}(S\Theta) - \lambda \lVert \Theta \rVert_1 \}$, where S is the empirical covariance matrix of the data. For any chosen λ, the estimated GGM will appear onscreen, as shown in Figure 2. The column headings from the data will be used to label the nodes. Next, select the two terminal nodes to which you would like to apply PPS. The drop-down menus under "Node 1" and "Node 2" will contain the user-supplied node names. The selected nodes will appear in blue in the network (Figure 2).
Finally, select the maximum path length (K in the paper) up to which to search for network paths between the chosen nodes.
Note: there is nothing special about the default values for these parameters, and they should not be viewed as suggestions. The maximum path length should not exceed 5 unless the network is small (small p) or sparse (large λ), since computation time grows rapidly with the maximum path length.
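To give a feel for why the maximum path length matters, the following base R sketch counts the simple paths between two nodes as K increases. It is an illustration only, not the package's implementation: the helper count_paths and the toy complete graph are hypothetical, and path length is assumed to mean the number of edges.

```r
# Count simple paths (no repeated nodes) from `from` to `to` using at most K edges.
# `adj` is a logical adjacency matrix. Hypothetical helper, for illustration only.
count_paths <- function(adj, from, to, K) {
  walk <- function(node, visited, len) {
    if (node == to) return(1)   # reached the other terminal node
    if (len == K) return(0)     # path length budget exhausted
    total <- 0
    for (nb in which(adj[node, ] & !visited)) {
      visited[nb] <- TRUE
      total <- total + walk(nb, visited, len + 1)
      visited[nb] <- FALSE
    }
    total
  }
  visited <- rep(FALSE, nrow(adj))
  visited[from] <- TRUE
  walk(from, visited, 0)
}

# Fully connected network on 8 nodes: the densest case for that size
p <- 8
adj <- matrix(TRUE, p, p)
diag(adj) <- FALSE

# Number of candidate paths between nodes 1 and 2 for K = 1, ..., 5
counts <- sapply(1:5, function(K) count_paths(adj, 1, 2, K))
counts
```

In this dense toy network the number of candidate paths grows from 1 at K = 1 to 517 at K = 5; a sparser network (larger λ) or a smaller p keeps the count manageable.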

Check that the data file was read properly
If something unexpected has happened thus far, it is a good idea to check that the data file was read properly. The "View Data" tab (Figure 3) prints the n × p data matrix back to the user so that issues can be diagnosed. The column headings should be node names, and the entries in each column should be the individual observations of that node.
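The same checks can be made in the R console before launching the app. The sketch below is a hedged example rather than part of the package: it writes a small toy file to a temporary location so that it runs as-is, and the column names nodeA, nodeB and nodeC are placeholders. Replace path with the path to your own .csv file.

```r
# Toy data written to a temporary .csv so the sketch is runnable (illustrative values only)
set.seed(1)
toy <- data.frame(nodeA = rnorm(10), nodeB = rnorm(10), nodeC = rnorm(10))
path <- tempfile(fileext = ".csv")
write.csv(toy, path, row.names = FALSE)

# check.names = FALSE keeps headings that contain spaces, such as "AC C10", unchanged
dat <- read.csv(path, check.names = FALSE)

all_numeric <- all(vapply(dat, is.numeric, logical(1)))  # every column should be numeric
has_missing <- anyNA(dat)                                # missing values need handling first
dim(dat)       # n rows (observations) by p columns (nodes)
colnames(dat)  # node names that will label the network
```

If all_numeric is FALSE or has_missing is TRUE, the file likely needs cleaning before it is passed to the graphical lasso.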

Run PPS
Once all the parameters have been set, pressing the "Get PPS" button will apply PPS to the graphical model currently appearing onscreen. The "Paths" tab lists all paths of length at most K connecting the two selected terminal nodes, along with their respective PPS values, in order of decreasing PPS (Figure 4).
The "Subnetwork" tab shows the union of the top 20 PPS paths between the selected terminal nodes (Figure 5). The terminal nodes are again shown in blue, and the edges comprising the highest PPS path are bolded.

PPS Analysis Outside the App
We can perform the same analysis without the app in the R console. The main function is pps(P, i, j, K = 5, prec = TRUE, use.names = TRUE).
Here, P is a precision matrix or partial correlation matrix, and i and j are the nodes (i.e. row numbers in P) to use as endpoints. If the matrix has named columns, these names can be used for i and j. K is the maximum path length, prec indicates whether P is a precision matrix or a partial correlation matrix, and use.names indicates whether the results should be shown in terms of node numbers or node names. The output is a list with the following elements.
path: A list of paths between nodes i and j in order of descending PPS.
pps: A list of the PPS values for the paths in path.
gamma: The individual contributions of each path in path on the correlation scale.
Note that this information can also be found by typing ?pps into the console.
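The two kinds of input accepted for P are linked by the standard identity ρ_ij = −P_ij / sqrt(P_ii P_jj) for the partial correlation between nodes i and j. The base R sketch below applies it to a toy 3 × 3 precision matrix; the matrix values and the node names A, B and C are made up for illustration, and the code does not call the pps package.

```r
# Toy precision matrix (symmetric, positive definite); illustrative values only
nodes <- c("A", "B", "C")
P <- matrix(c( 2.0, -0.8,  0.0,
              -0.8,  2.0, -0.5,
               0.0, -0.5,  1.5),
            nrow = 3, byrow = TRUE, dimnames = list(nodes, nodes))

# Partial correlations: rho_ij = -P_ij / sqrt(P_ii * P_jj), with ones on the diagonal
pcor <- -cov2cor(P)
diag(pcor) <- 1

# Marginal (Pearson) correlations implied by the same model
marg <- cov2cor(solve(P))

pcor["A", "C"]  # zero: no direct edge between A and C
marg["A", "C"]  # nonzero: A and C are still correlated through B
```

Here A and C have no direct edge, yet their marginal correlation is nonzero because of the path A-B-C. Attributing a marginal correlation to the paths that produce it is the kind of question PPS is designed to answer.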

Example
The analysis shown above in the app can be performed in the console using the following commands, assuming the n × p data matrix is stored in a matrix called data.

    # Estimate the precision matrix with the graphical lasso.
    # glasso() takes a covariance matrix and the penalty rho (the λ above).
    res <- glasso::glasso(cov(data), rho = 0.4)

    # Extract the precision matrix and restore the node names
    prec <- res$wi
    dimnames(prec) <- list(colnames(data), colnames(data))

    # Run PPS for the nodes of interest
    pps_res <- pps(P = prec, i = "AC C10", j = "AC C14", K = 4, prec = TRUE, use.names = TRUE)