### The original algorithm

We summarize the basic PageRank algorithm, which was developed by Larry Page and Sergey Brin at Stanford University [2] and forms the basis of the successful search engine Google. Further details may also be found in [1, 22].

PageRank assigns a measure of relevance or importance to each web page, allowing Google to return high-quality pages in response to a user query. The algorithm is designed to be robust to methods of deception, where web page designers attempt to artificially boost the PageRank of their page by altering the local link structure. Robustness follows from the recursive nature of the algorithm, where a page is highly ranked if it is linked to by other highly ranked pages. A link from page $i$ to page $j$ is regarded as a "vote of confidence" for page $j$ from page $i$. The algorithm views the web as a directed graph $G(V, E)$, where the $N$ nodes $V$ are the web pages and the edges $E$ represent the links between pages. This information can be stored in an adjacency matrix $W \in \mathbb{R}^{N \times N}$, where $w_{ij} = 1$ if there is a link from page $i$ to page $j$ and $w_{ij} = 0$ otherwise. We define $\mathrm{deg}_i := \sum_{j=1}^{N} w_{ij}$ to be the *degree* (more precisely, the out-degree) of the $i$th page. Suppose we have assigned an initial ranking $\mathbf{r}^{[0]} \in \mathbb{R}^{N}$. The PageRank algorithm proceeds iteratively, updating the ranking for the $j$th page from $r_j^{[n-1]}$ to $r_j^{[n]}$ according to the formula

$$r_j^{[n]} = (1 - d) + d \sum_{i=1}^{N} \frac{w_{ij}\, r_i^{[n-1]}}{\mathrm{deg}_i}, \qquad 1 \le j \le N. \qquad (1)$$

Here $r_j^{[n]}$ denotes the ranking of page $j$ at the $n$th iteration and $d \in (0,1)$ is a fixed parameter. The value $d = 0.85$ appears to be used by Google [1, 2]. We see from (1) that the rank of a page depends on the rank of all pages that link to it. Scaling by $1/\mathrm{deg}_i$ in the summation ensures that each page has equal influence in the voting procedure. Each page gets a rank of $1 - d$ automatically and also gets $d$ times the votes given by other pages. Iterating to convergence in (1) is equivalent to solving for $\mathbf{r} \in \mathbb{R}^{N}$ in the linear system

$$(I - dW^{T}D^{-1})\,\mathbf{r} = (1 - d)\,\mathbf{e}, \qquad (2)$$

where $I$ is the identity matrix, $W^{T}$ is the transpose of $W$, $D = \mathrm{diag}(\mathrm{deg}_i)$ and $\mathbf{e} \in \mathbb{R}^{N}$ has all $e_i = 1$. Applying PageRank is equivalent to applying the Jacobi iteration [23] to (2), and convergence to a unique solution $\mathbf{r}$ is guaranteed under the condition

$$\rho(dW^{T}D^{-1}) < 1, \qquad (3)$$

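where $\rho(\cdot)$ denotes the spectral radius. The convergence condition (3) holds for any $0 < d < 1$, because each column of $W^{T}D^{-1}$ sums to one and hence $\rho(dW^{T}D^{-1}) \le d\,\|W^{T}D^{-1}\|_1 = d$.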
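
As a minimal sketch of iteration (1), the following Python code ranks a small hypothetical four-page web and then checks that the result satisfies the linear system (2). The function name `pagerank`, the example matrix, the all-ones starting vector and the stopping tolerance are illustrative assumptions, and the sketch assumes that every page has at least one outgoing link.

```python
def pagerank(W, d=0.85, tol=1e-12, max_iter=1000):
    """Iterate r_j <- (1 - d) + d * sum_i w_ij * r_i / deg_i until the update is below tol."""
    N = len(W)
    deg = [sum(row) for row in W]          # out-degree of each page
    if any(k == 0 for k in deg):
        raise ValueError("this sketch assumes every page has an outgoing link")
    r = [1.0] * N                          # arbitrary initial ranking (assumption)
    for _ in range(max_iter):
        new = [(1 - d) + d * sum(W[i][j] * r[i] / deg[i] for i in range(N))
               for j in range(N)]
        change = max(abs(a - b) for a, b in zip(new, r))
        r = new
        if change < tol:
            break
    return r


# Hypothetical 4-page web: W[i][j] = 1 means page i links to page j.
W = [
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [1, 0, 0, 1],
    [0, 0, 1, 0],
]
d = 0.85
r = pagerank(W, d)

# Residual of the linear system (I - d W^T D^{-1}) r = (1 - d) e.
deg = [sum(row) for row in W]
residual = [r[j] - d * sum(W[i][j] * r[i] / deg[i] for i in range(len(W))) - (1 - d)
            for j in range(len(W))]
print([round(x, 4) for x in r])
print(max(abs(x) for x in residual))
```

In this example the page with the most incoming links receives the highest rank, and the residual is at the level of the stopping tolerance.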

### A random walk interpretation

The PageRanking process has an alternative interpretation in terms of a random walk [1, 2, 22]. Suppose that a random walker is currently at page *i*. On the next step the walker

**teleports:** with probability 1 - *d* moves to a new page, chosen uniformly over all web pages, or,

**surfs:** with probability $d$ moves to a page that is linked to from page $i$; in this case each page $j$ such that $w_{ij} = 1$ is equally likely to be chosen as the destination.

The PageRank vector $\mathbf{r}$, when normalised so that its components sum to one, corresponds to the invariant measure for this process. In other words, $r_j$ is the long-time proportion of visits made to page $j$. A further interpretation based on mean hitting times rather than invariant measures is given in [24]. The biological implication of the random walk interpretation is discussed in the description of the algorithm in the Results and Discussion section.
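
The correspondence between the normalised PageRank vector and the invariant measure can be checked numerically. The sketch below, under the same assumptions as before (a hypothetical four-page web with no page lacking outgoing links), builds the one-step transition probabilities of the teleport/surf walker, evolves the walker's location probabilities for many steps, and compares them with the normalised PageRank vector. It follows the probabilities deterministically instead of simulating individual walks, and the names `pagerank` and `walker_distribution` are illustrative.

```python
def pagerank(W, d, iters=500):
    """Run a fixed number of sweeps of the PageRank update, starting from all ones."""
    N = len(W)
    deg = [sum(row) for row in W]
    r = [1.0] * N
    for _ in range(iters):
        r = [(1 - d) + d * sum(W[i][j] * r[i] / deg[i] for i in range(N))
             for j in range(N)]
    return r


def walker_distribution(W, d, iters=500):
    """Evolve the walker's location probabilities under the teleport/surf rule."""
    N = len(W)
    deg = [sum(row) for row in W]
    # P[i][j]: probability of moving from page i to page j in one step
    # (teleport with probability 1 - d, surf along a link with probability d).
    P = [[(1 - d) / N + d * W[i][j] / deg[i] for j in range(N)] for i in range(N)]
    p = [1.0 / N] * N                      # start from a uniformly chosen page
    for _ in range(iters):
        p = [sum(p[i] * P[i][j] for i in range(N)) for j in range(N)]
    return p


# Hypothetical 4-page web: W[i][j] = 1 means page i links to page j.
W = [
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [1, 0, 0, 1],
    [0, 0, 1, 0],
]
d = 0.85
r = pagerank(W, d)
normalised = [x / sum(r) for x in r]
p = walker_distribution(W, d)
gap = max(abs(a - b) for a, b in zip(normalised, p))
print(gap)
```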

### The modified algorithm: GeneRank

The PageRank idea translates intuitively to the analogous situation of gene expression analysis. Instead of producing a ranked list of web pages, we produce a ranked list of genes. PageRank views hyperlinks as votes of confidence, so we similarly allow functional connections to boost rank. Just as PageRank counts votes from a highly-ranked page as more influential than votes from a lowly-ranked page, we will allow connections to genes with high differential expression to carry greater significance than connections to genes with low differential expression. Figure 1 gives a graphical view of the concept.

PageRank gives each web page a rank of $(1 - d)$ "for free". We will adapt this to give each gene a rank of $(1 - d)\,\mathrm{ex}_i$, where $\mathrm{ex}_i$ is the absolute value of the expression change for gene $i$. Letting $r_j^{[n]}$ denote the ranking of gene $j$ after the $n$th iteration, we take initial ranking $\mathbf{r}^{[0]} = \mathbf{ex}/\|\mathbf{ex}\|_1$, where $\|\cdot\|_1$ denotes the vector 1-norm. Then we let

$$r_j^{[n]} = (1 - d)\,\mathrm{ex}_j + d \sum_{i=1}^{N} \frac{w_{ij}\, r_i^{[n-1]}}{\mathrm{deg}_i}, \qquad 1 \le j \le N. \qquad (4)$$

Here $W \in \mathbb{R}^{N \times N}$ is the connectivity matrix for the gene network, so $w_{ij} = w_{ji} = 1$ if genes $i$ and $j$ are connected and $w_{ij} = w_{ji} = 0$ otherwise.

We remark that this iteration may also be motivated from the viewpoint of *personalised PageRanking* [1, 2], where teleporting jumps in the random walk process are biased towards a user's preferred locations – here, we are biasing according to expression level.

The iteration (4) corresponds to Jacobi on the system

$$(I - dW^{T}D^{-1})\,\mathbf{r} = (1 - d)\,\mathbf{ex}, \qquad (5)$$

and, because the iteration matrix has not been altered, convergence remains guaranteed for all $0 < d < 1$. Since the network is undirected, $W$ is symmetric and we could replace $W^{T}$ by $W$. This is unlike the original algorithm, where a directed network is used.
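
The GeneRank iteration can be sketched in a few lines of Python. The code below is an illustration under stated assumptions and is not the Matlab implementation supplied with the paper: the five-gene network, the expression values `ex`, the choice $d = 0.5$ and the stopping tolerance are all made up for the example, and every gene is assumed to have at least one connection. After iterating, the sketch checks the residual of the linear system (5).

```python
def generank(W, ex, d, tol=1e-12, max_iter=10000):
    """Iterate r_j <- (1 - d) * ex_j + d * sum_i w_ij * r_i / deg_i."""
    N = len(W)
    deg = [sum(row) for row in W]
    if any(k == 0 for k in deg):
        raise ValueError("this sketch assumes every gene has at least one connection")
    norm = sum(ex)                         # ex holds absolute (nonnegative) changes
    r = [x / norm for x in ex]             # r^[0] = ex / ||ex||_1
    for _ in range(max_iter):
        new = [(1 - d) * ex[j] + d * sum(W[i][j] * r[i] / deg[i] for i in range(N))
               for j in range(N)]
        change = max(abs(a - b) for a, b in zip(new, r))
        r = new
        if change < tol:
            break
    return r


# Hypothetical undirected network of five genes (W is symmetric) with
# made-up absolute expression changes.
W = [
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
]
ex = [0.2, 1.5, 0.1, 0.8, 2.0]
d = 0.5                                    # arbitrary value chosen for illustration
r = generank(W, ex, d)

# Residual of (I - d W^T D^{-1}) r = (1 - d) ex.
deg = [sum(row) for row in W]
residual = [r[j] - d * sum(W[i][j] * r[i] / deg[i] for i in range(len(W)))
            - (1 - d) * ex[j] for j in range(len(W))]
ranking = sorted(range(len(r)), key=lambda j: r[j], reverse=True)
print(ranking)
print(max(abs(x) for x in residual))
```

Summing (5) over all genes shows that the solution has the same 1-norm as $\mathbf{ex}$, which gives a quick sanity check on the output.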

In summary, the GeneRank algorithm finds the customised ranking vector $\mathbf{r}$ defined by the linear system (5). A Matlab implementation of the algorithm is available in the additional file geneRank.m. The random walk interpretation carries through to this more general setting. If the teleporting step is re-defined so that the destination gene is not chosen uniformly over the whole set, but rather is chosen with probability proportional to absolute expression level, then $\mathbf{r}$ in (5), suitably scaled, is the invariant measure. Overall, we have a true generalization of PageRank in the sense that (a) the algorithm has both "vote of confidence" and "random walk" interpretations and (b) for the case where all $\mathrm{ex}_i = 1$ we recover the original PageRank algorithm.
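
Both properties can be checked on a small example. The sketch below assumes a hypothetical five-gene network with made-up expression values and an arbitrary $d = 0.5$. It compares the scaled GeneRank vector with the long-run location probabilities of a walker whose teleport step is biased by expression level, and then confirms that setting all $\mathrm{ex}_i = 1$ yields a vector satisfying the original PageRank update. The function names are illustrative.

```python
def generank(W, ex, d, iters=300):
    """Run a fixed number of sweeps of r_j <- (1 - d) * ex_j + d * sum_i w_ij * r_i / deg_i."""
    N = len(W)
    deg = [sum(row) for row in W]
    r = list(ex)
    for _ in range(iters):
        r = [(1 - d) * ex[j] + d * sum(W[i][j] * r[i] / deg[i] for i in range(N))
             for j in range(N)]
    return r


def biased_walker_distribution(W, ex, d, iters=300):
    """Location probabilities of a walker that teleports proportionally to ex."""
    N = len(W)
    deg = [sum(row) for row in W]
    norm = sum(ex)
    v = [x / norm for x in ex]             # teleport distribution
    P = [[(1 - d) * v[j] + d * W[i][j] / deg[i] for j in range(N)] for i in range(N)]
    p = [1.0 / N] * N
    for _ in range(iters):
        p = [sum(p[i] * P[i][j] for i in range(N)) for j in range(N)]
    return p


# Hypothetical undirected five-gene network and made-up expression changes.
W = [
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
]
ex = [0.2, 1.5, 0.1, 0.8, 2.0]
d = 0.5
N = len(W)
deg = [sum(row) for row in W]

# (a) Scaled GeneRank vector versus the biased walker's long-run distribution.
r = generank(W, ex, d)
scaled = [x / sum(r) for x in r]
p = biased_walker_distribution(W, ex, d)
gap = max(abs(a - b) for a, b in zip(scaled, p))

# (b) With all ex_i = 1 the result satisfies r_j = (1 - d) + d * sum_i w_ij r_i / deg_i.
ones = generank(W, [1.0] * N, d)
pagerank_residual = max(
    abs(ones[j] - (1 - d) - d * sum(W[i][j] * ones[i] / deg[i] for i in range(N)))
    for j in range(N)
)
print(gap, pagerank_residual)
```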

It is trivial to check that with the choice *d* = 0 the system (5) has solution **r** = **ex**. In this case the genes are ranked purely on expression level. We will now study the other extreme, where *d* = 1, and show that this case may be regarded as ranking purely on connectivity.

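For $d = 1$, the iteration (4) becomes

$$r_j^{[n]} = \sum_{i=1}^{N} \frac{w_{ij}\, r_i^{[n-1]}}{\mathrm{deg}_i}, \qquad 1 \le j \le N, \qquad (6)$$

and the system (5) for the corresponding fixed point becomes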

$$(I - W^{T}D^{-1})\,\mathbf{r} = \mathbf{0}. \qquad (7)$$

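First, we show that the sum of the rankings is preserved by the iteration. From (6),

$$\sum_{j=1}^{N} r_j^{[n]} = \sum_{j=1}^{N} \sum_{i=1}^{N} \frac{w_{ij}\, r_i^{[n-1]}}{\mathrm{deg}_i} = \sum_{i=1}^{N} \frac{r_i^{[n-1]}}{\mathrm{deg}_i} \sum_{j=1}^{N} w_{ij} = \sum_{i=1}^{N} r_i^{[n-1]}.$$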

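Also, it is clear from (6) that the iteration preserves the nonnegativity of the initial ranks; that is, $r_j^{[n]} \ge 0$. Next, we note that $\mathbf{deg}/\|\mathbf{deg}\|_1$ is a fixed point of (6). To see this, put $\mathbf{r}^{[n-1]} = \mathbf{deg}/\|\mathbf{deg}\|_1$ in the right-hand side of (6) to obtain, using the symmetry of $W$ in the last step,

$$r_j^{[n]} = \sum_{i=1}^{N} \frac{w_{ij}}{\mathrm{deg}_i} \cdot \frac{\mathrm{deg}_i}{\|\mathbf{deg}\|_1} = \frac{1}{\|\mathbf{deg}\|_1} \sum_{i=1}^{N} w_{ij} = \frac{\mathrm{deg}_j}{\|\mathbf{deg}\|_1}.$$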

Now, we observe that $\rho(W^{T}D^{-1}) \le \|W^{T}D^{-1}\|_1 = 1$, and hence all eigenvalues of $W^{T}D^{-1}$ are less than or equal to 1 in modulus. Because $W$ is symmetric, we have $W^{T}D^{-1}\mathbf{deg} = W^{T}\mathbf{e} = \mathbf{deg}$, showing that there is at least one eigenvector, $\mathbf{deg}$, corresponding to eigenvalue 1. Suppose now that $\lambda = 1$ is a simple eigenvalue of $W^{T}D^{-1}$ and that $\mathbf{r}^{*}$ with $\|\mathbf{r}^{*}\|_1 = 1$ is another solution of (7). Then

$$W^{T}D^{-1}\left(\mathbf{r}^{*} - \frac{\mathbf{deg}}{\|\mathbf{deg}\|_1}\right) = \mathbf{r}^{*} - \frac{\mathbf{deg}}{\|\mathbf{deg}\|_1}.$$

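So $\mathbf{r}^{*} - \mathbf{deg}/\|\mathbf{deg}\|_1$ is either zero or an eigenvector of $W^{T}D^{-1}$ corresponding to eigenvalue 1. Because this eigenvalue is simple, $\mathbf{r}^{*} - \mathbf{deg}/\|\mathbf{deg}\|_1$ must be a multiple of $\mathbf{deg}$, and hence so is $\mathbf{r}^{*}$; the normalisation $\|\mathbf{r}^{*}\|_1 = 1$ then gives $\mathbf{r}^{*} = \pm\,\mathbf{deg}/\|\mathbf{deg}\|_1$. We may summarize our findings in the following result.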

**Result** If the eigenvalue $\lambda = 1$ of $W^{T}D^{-1}$ is simple, then $\mathbf{r} = \mathbf{deg}/\|\mathbf{deg}\|_1$ is the unique solution of (7) that satisfies the required constraints $\|\mathbf{r}\|_1 = 1$ and $r_i \ge 0$.

Overall, we conclude that the extremal parameter values *d* = 0 and *d* = 1 represent ranking by pure expression level and pure degree, respectively, and hence by changing the value of *d* we may interpolate between these two extremes.
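
The $d = 1$ analysis can be illustrated on a small example. The sketch below assumes a hypothetical four-gene undirected network (a triangle with one extra gene attached) and a made-up starting vector. It applies one step of the $d = 1$ iteration to $\mathbf{deg}/\|\mathbf{deg}\|_1$ to confirm that it is a fixed point, then iterates from the starting vector to show that the sum of the rankings is preserved. For this particular network the iterates also approach $\mathbf{deg}/\|\mathbf{deg}\|_1$; that behaviour is a property of the example and is not guaranteed for every network (on a bipartite network, for instance, the iterates can oscillate).

```python
def step(W, r):
    """One sweep of the d = 1 iteration: r_j <- sum_i w_ij * r_i / deg_i."""
    N = len(W)
    deg = [sum(row) for row in W]
    return [sum(W[i][j] * r[i] / deg[i] for i in range(N)) for j in range(N)]


# Hypothetical undirected network: triangle 0-1-2 plus gene 3 attached to gene 2.
W = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
deg = [sum(row) for row in W]
fixed = [k / sum(deg) for k in deg]        # deg / ||deg||_1

# deg / ||deg||_1 is unchanged by one sweep.
after = step(W, fixed)
fixed_point_gap = max(abs(a - b) for a, b in zip(after, fixed))

# Iterate from a made-up nonnegative starting vector with unit 1-norm.
r = [0.7, 0.1, 0.1, 0.1]
for _ in range(200):
    r = step(W, r)
drift = abs(sum(r) - 1.0)                  # the sum of the rankings is preserved
distance = max(abs(a - b) for a, b in zip(r, fixed))
print(fixed_point_gap, drift, distance)
```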