We propose five possible methods to predict drug side-effect profiles from the chemical structures.

### Ordinary canonical correlation analysis (OCCA)

Suppose that we have a set of *n* drugs with *p* substructure features and *q* side-effect features. Each drug is represented by a chemical substructure feature vector **x** = (*x*_{1}, ⋯, *x*_{p})^{T}, and by a side-effect feature vector **y** = (*y*_{1}, ⋯, *y*_{q})^{T}.

Consider two linear combinations for chemical substructures and side-effects, *u*_{i} = α^{T}**x**_{i} and *v*_{i} = β^{T}**y**_{i} (*i* = 1, 2, ⋯, *n*), where α = (*α*_{1}, ⋯, *α*_{p})^{T} and β = (*β*_{1}, ⋯, *β*_{q})^{T} are weight vectors. The goal of ordinary CCA is to find weight vectors α and β that maximize the following canonical correlation coefficient:

$$\rho = \mathrm{corr}(u, v) = \frac{\sum_{i=1}^{n} \alpha^{T}\mathbf{x}_{i} \cdot \beta^{T}\mathbf{y}_{i}}{\sqrt{\sum_{i=1}^{n} (\alpha^{T}\mathbf{x}_{i})^{2}} \, \sqrt{\sum_{i=1}^{n} (\beta^{T}\mathbf{y}_{i})^{2}}} \tag{1}$$

where *u* and *v* are assumed to have zero mean, and *u* (resp. *v*) is called the *canonical component* for **x** (resp. **y**) [31].
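As a small illustration, the canonical correlation coefficient for a given pair of weight vectors can be computed as follows. This is a minimal sketch on random data; the function name `canonical_correlation` and the toy dimensions are assumptions made for the example, and the columns of the data matrices are centered so that the components have zero mean.

```python
import numpy as np

def canonical_correlation(X, Y, alpha, beta):
    """Correlation between u_i = alpha^T x_i and v_i = beta^T y_i.

    Assumes the columns of X (n x p) and Y (n x q) are centered,
    so that u and v have zero mean.
    """
    u = X @ alpha  # canonical component for x, one value per drug
    v = Y @ beta   # canonical component for y, one value per drug
    return float((u @ v) / (np.sqrt(u @ u) * np.sqrt(v @ v)))

# Toy data (hypothetical): 20 drugs, 5 substructures, 3 side-effects.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
Y = rng.normal(size=(20, 3))
X = (X - X.mean(axis=0)) / X.std(axis=0)  # center and scale columns
Y = (Y - Y.mean(axis=0)) / Y.std(axis=0)

alpha = rng.normal(size=5)
beta = rng.normal(size=3)
rho = canonical_correlation(X, Y, alpha, beta)
```

With centered columns, `rho` coincides with the ordinary Pearson correlation between the two sets of component values.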

Let *X* denote the *n* × *p* matrix defined as *X* = [**x**_{1}, ⋯, **x**_{n}]^{T}, and let *Y* denote the *n* × *q* matrix defined as *Y* = [**y**_{1}, ⋯, **y**_{n}]^{T}. The columns of *X* and *Y* are assumed to be centered and scaled. Then the maximization problem can be written as follows:

$$\max_{\alpha, \beta} \; \alpha^{T} X^{T} Y \beta \quad \text{subject to} \quad \alpha^{T} X^{T} X \alpha \le 1, \;\; \beta^{T} Y^{T} Y \beta \le 1 \tag{2}$$

In other high-dimensional problems, it is known that good results can be obtained by treating the covariance matrix as a diagonal matrix [32, 33], as suggested in [34]. Therefore, we substitute identity matrices for *X*^{T}*X* and *Y*^{T}*Y*, and consider the following optimization problem:

$$\max_{\alpha, \beta} \; \alpha^{T} X^{T} Y \beta \quad \text{subject to} \quad \|\alpha\|_{2}^{2} \le 1, \;\; \|\beta\|_{2}^{2} \le 1 \tag{3}$$
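Once identity matrices replace *X*^{T}*X* and *Y*^{T}*Y*, the weight vectors are constrained only by their Euclidean norms, and the objective α^{T}*X*^{T}*Y*β is then maximized by the leading left and right singular vectors of *X*^{T}*Y*. The following minimal sketch illustrates this on random data; the function name `occa_leading_pair` and the toy dimensions are assumptions made for the example.

```python
import numpy as np

def occa_leading_pair(X, Y):
    """Leading weight vectors under unit L2-norm constraints.

    Returns (alpha, beta, d), where d = alpha^T X^T Y beta is the
    largest singular value of Z = X^T Y.
    """
    Z = X.T @ Y  # p x q cross-product matrix
    U, d, Vt = np.linalg.svd(Z, full_matrices=False)
    return U[:, 0], Vt[0], float(d[0])

# Toy data (hypothetical): 15 drugs, 6 substructures, 4 side-effects.
rng = np.random.default_rng(1)
X = rng.normal(size=(15, 6))
Y = rng.normal(size=(15, 4))
alpha, beta, d = occa_leading_pair(X, Y)
```

No other pair of unit-norm vectors gives a larger objective value than `d`, which is why a single SVD suffices in this unpenalized setting.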

### Sparse canonical correlation analysis (SCCA)

In OCCA, the weight vectors α and β are not unique if *p* or *q* exceeds *n*. In addition, the results are difficult to interpret when the weight vectors α and β contain many non-zero elements. In practical applications, especially when *p* and *q* are large, we want linear combinations of **x** and **y** that are highly correlated but whose weight vectors are sparse, for easier interpretation.

To impose sparsity on α and β, we propose to consider the following optimization problem with additional *L*_{1} penalty terms:

$$\max_{\alpha, \beta} \; \alpha^{T} X^{T} Y \beta \quad \text{subject to} \quad \|\alpha\|_{2}^{2} \le 1, \;\; \|\beta\|_{2}^{2} \le 1, \;\; \|\alpha\|_{1} \le c_{1}\sqrt{p}, \;\; \|\beta\|_{1} \le c_{2}\sqrt{q} \tag{4}$$

where || · ||_{1} is the *L*_{1} norm (the sum of the absolute values of the vector elements), and *c*_{1} and *c*_{2} are parameters that control the sparsity, restricted to the ranges 0 < *c*_{1} ≤ 1 and 0 < *c*_{2} ≤ 1. For simplicity, we use the same value for *c*_{1} and *c*_{2} in this study. The sparse version of CCA is referred to as sparse canonical correlation analysis (SCCA).

The optimization problem in SCCA can be regarded as a penalized matrix decomposition of the matrix *Z* = *X*^{T}*Y*. A useful algorithm for solving the penalized matrix decomposition (PMD) problem has recently been proposed and applied to this kind of analysis [34].
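To make the idea concrete, the sketch below implements one common way of solving a rank-one PMD problem: alternating updates of α and β, each followed by soft-thresholding and rescaling to unit *L*_{2} norm, with the threshold found by bisection. It is an illustration under stated assumptions, not necessarily the exact procedure of [34]: the *L*_{1} bounds are assumed to be *c*_{1}√*p* and *c*_{2}√*q* (clipped below at 1 so that a unit-norm solution exists), and the names `soft_threshold`, `l1_unit_vector` and `scca_rank_one` are hypothetical.

```python
import numpy as np

def soft_threshold(a, delta):
    """Shrink each element of a toward zero by delta."""
    return np.sign(a) * np.maximum(np.abs(a) - delta, 0.0)

def l1_unit_vector(a, bound, n_bisect=50):
    """Soft-threshold a and rescale to unit L2 norm so that the L1 norm
    is at most `bound` (assumes bound >= 1 and a unique largest |a_i|)."""
    norm = np.linalg.norm(a)
    if np.abs(a).sum() <= bound * norm:
        return a / norm  # constraint already satisfied, no shrinkage
    lo, hi = 0.0, np.abs(a).max()
    for _ in range(n_bisect):  # bisection on the threshold
        mid = (lo + hi) / 2.0
        s = soft_threshold(a, mid)
        if np.abs(s).sum() > bound * np.linalg.norm(s):
            lo = mid  # still too dense: shrink more
        else:
            hi = mid  # feasible: try shrinking less
    s = soft_threshold(a, hi)
    return s / np.linalg.norm(s)

def scca_rank_one(Z, c1, c2, n_iter=100):
    """One sparse pair (alpha, beta) and d = alpha^T Z beta."""
    p, q = Z.shape
    b1 = max(1.0, c1 * np.sqrt(p))  # assumed L1 bound for alpha
    b2 = max(1.0, c2 * np.sqrt(q))  # assumed L1 bound for beta
    # Start from the leading right singular vector of Z.
    beta = np.linalg.svd(Z, full_matrices=False)[2][0]
    for _ in range(n_iter):
        alpha = l1_unit_vector(Z @ beta, b1)
        beta = l1_unit_vector(Z.T @ alpha, b2)
    return alpha, beta, float(alpha @ Z @ beta)

# Toy cross-product matrix (hypothetical): p = 30, q = 20.
rng = np.random.default_rng(2)
Z = rng.normal(size=(30, 20))
alpha, beta, d = scca_rank_one(Z, c1=0.3, c2=0.3)
n_selected = np.count_nonzero(alpha)  # substructures with non-zero weight
```

With *c*_{1} = *c*_{2} = 1 the *L*_{1} bounds are never active in this sketch, and it reduces to the unpenalized leading singular pair of *Z*; smaller values force more weights to exactly zero.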

The optimization problem formulated in eq. (4) yields a single canonical component. To extract multiple canonical components, we apply the following deflation step iteratively:

$$Z^{(k+1)} \leftarrow Z^{(k)} - d_{k}\, \alpha_{k} \beta_{k}^{T} \tag{5}$$

where *Z*^{(k)} is the input of step *k* (*Z*^{(1)} = *X*^{T}*Y*), *d*_{k} is the highest singular value, and α_{k} and β_{k} are the weight vectors estimated in the *k*-th step (*k* = 1, 2, ⋯, *m*).
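The deflation loop can be sketched as follows, assuming the update subtracts the rank-one term *d*_{k}α_{k}β_{k}^{T} from *Z*^{(k)} at each step. To keep the example self-contained, the rank-one solver used here is a plain (unpenalized) SVD stand-in; in SCCA it would be replaced by a sparse solver. The names `leading_singular_pair` and `extract_components` are hypothetical.

```python
import numpy as np

def leading_singular_pair(Z):
    """Stand-in rank-one solver: unpenalized leading singular triple."""
    U, d, Vt = np.linalg.svd(Z, full_matrices=False)
    return U[:, 0], Vt[0], float(d[0])

def extract_components(Z, m, rank_one=leading_singular_pair):
    """Extract m pairs of weight vectors by iterative deflation.

    Returns A (p x m), B (q x m) and the vector of d_k values.
    """
    Z_k = Z.copy()  # Z^(1) = X^T Y
    alphas, betas, ds = [], [], []
    for _ in range(m):
        alpha, beta, d = rank_one(Z_k)
        alphas.append(alpha)
        betas.append(beta)
        ds.append(d)
        Z_k = Z_k - d * np.outer(alpha, beta)  # deflation step
    return np.column_stack(alphas), np.column_stack(betas), np.array(ds)

# Toy cross-product matrix (hypothetical): p = 8, q = 5, m = 3.
rng = np.random.default_rng(4)
Z = rng.normal(size=(8, 5))
A, B, ds = extract_components(Z, m=3)
```

With the unpenalized stand-in, the extracted values `ds` are simply the three largest singular values of `Z`; with a sparse solver the components are no longer exactly orthogonal, which is why the deflation is applied explicitly.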

Finally, we obtain *m* pairs of weight vectors α_{1}, ⋯, α_{m} and β_{1}, ⋯, β_{m}. For easier interpretation, the sign of the weight vectors is adjusted such that the weight element with the highest absolute value is positive in each component. High scoring substructures and side-effects in the weight vectors are extracted as correlated sets.

If the extracted sets of chemical substructures and side-effects are biologically meaningful, it should be possible to predict potential side-effects of a new drug candidate molecule by looking for the extracted chemical substructures in its chemical structure. Suppose that we are given the chemical structure profile **x** of a new drug candidate molecule, and we want to predict its potential side-effect profile **y** based on the extracted sets of chemical substructures and side-effects encoded in α_{1}, ⋯, α_{m} and β_{1}, ⋯, β_{m}.

The profiles **x** and **y** are assumed to have canonical components **u** = *A*^{T}**x** and **v** = *B*^{T}**y**, respectively, where *A* = [α_{1}, ⋯, α_{m}] and *B* = [β_{1}, ⋯, β_{m}]. Since **y** is unknown, we need to estimate **y** such that **v** is as close to **u** as possible. This estimation can be done by minimizing ||*B*^{T}**y** − *A*^{T}**x**||^{2}, which leads to the following solution:

$$\hat{\mathbf{y}} = B^{-T} A^{T} \mathbf{x} \tag{6}$$

where *B*^{-T} is the pseudo-inverse matrix of *B*^{T}. Because all data features are normalized in the CCA, each element of the estimate **ŷ** is de-normalized using the standard deviation and the mean calculated on the training set. If the *j*-th element in **ŷ** has a high score, the new molecule **x** is predicted to have the *j*-th side-effect (*j* = 1, 2, ⋯, *q*).
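The estimate can be sketched in a few lines, assuming it is the least-squares solution obtained by applying the pseudo-inverse of *B*^{T} to the canonical components *A*^{T}**x**. The function name `predict_side_effects`, the random weight matrices and the training statistics `y_mean` and `y_std` are hypothetical placeholders; the input **x** is assumed to be already normalized with the training-set statistics.

```python
import numpy as np

def predict_side_effects(x, A, B, y_mean, y_std):
    """Estimate a side-effect profile from a normalized structure profile x.

    A is p x m, B is q x m; y_mean and y_std are per-side-effect
    training-set statistics used for de-normalization.
    """
    u = A.T @ x                      # canonical components of the molecule
    y_hat = np.linalg.pinv(B.T) @ u  # least-squares solution of B^T y = u
    return y_hat * y_std + y_mean    # back to the original scale

# Toy setting (hypothetical): p = 10, q = 6, m = 3.
rng = np.random.default_rng(3)
A = rng.normal(size=(10, 3))
B = rng.normal(size=(6, 3))
x = rng.normal(size=10)
y_mean = rng.uniform(0.1, 0.5, size=6)
y_std = rng.uniform(0.2, 0.5, size=6)
y_hat = predict_side_effects(x, A, B, y_mean, y_std)
```

When *m* < *q*, many profiles **y** reproduce the same canonical components; the pseudo-inverse selects the one with the smallest norm in the normalized space.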

We also consider another prediction score. Based on the weighted sum of canonical components, we propose the following prediction score for a given molecule **x**:

$$s(\mathbf{x}) = \sum_{k=1}^{m} \beta_{k}\, \rho_{k}\, \alpha_{k}^{T} \mathbf{x} = B \Lambda A^{T} \mathbf{x} \tag{7}$$

where Λ is the diagonal matrix whose elements are the canonical correlation coefficients ρ_{1}, ⋯, ρ_{m}. Note that *s*(**x**) is the *q*-dimensional vector whose *j*-th element represents a prediction score for the *j*-th side-effect. If the *j*-th element in *s*(**x**) has a high score, the new molecule **x** is predicted to have the *j*-th side-effect (*j* = 1, 2, ⋯, *q*).
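A minimal sketch of this score follows, assuming it takes the form *s*(**x**) = *B*Λ*A*^{T}**x**, that is, each side-effect weight vector β_{k} is weighted by its canonical correlation coefficient and by the component value α_{k}^{T}**x**. The function name `prediction_score`, the random weight matrices and the coefficient values in `rho` are hypothetical.

```python
import numpy as np

def prediction_score(x, A, B, rho):
    """q-dimensional score vector B @ diag(rho) @ A^T @ x.

    rho holds the m canonical correlation coefficients (diagonal of Lambda).
    """
    u = A.T @ x           # m canonical components of the molecule
    return B @ (rho * u)  # correlation-weighted sum of the beta_k

# Toy setting (hypothetical): p = 10, q = 6, m = 3.
rng = np.random.default_rng(5)
A = rng.normal(size=(10, 3))
B = rng.normal(size=(6, 3))
rho = np.array([0.9, 0.7, 0.5])
x = rng.normal(size=10)

s = prediction_score(x, A, B, rho)
ranked = np.argsort(-s)  # side-effect indices, highest score first
```

Multiplying by `rho` elementwise avoids forming the diagonal matrix explicitly, and no pseudo-inverse is needed, which makes this score cheaper to compute than the least-squares estimate.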

In our experience, eq. (7) performs similarly to, or slightly better than, eq. (6), so we use eq. (7) as the prediction score in the Results section.