Model framework
For clarity, we first present our main methodological development for a univariate regression problem. We extend it to multiple regression problems in the “Extension to multiple regressions” section.
Consider the following regression-type HT problem:
$$ {\begin{aligned} \mathbf{y} = \mathbf{1} \mu + \mathbf{x} \beta + {\boldsymbol{\epsilon}}, \\ \end{aligned}} $$
(1)
$$ {\begin{aligned} &\text{where} \quad \mu, \beta \in \mathbb{R}, \quad \mathbf{y},\mathbf{x}, {\boldsymbol{\epsilon}}, \mathbf{1}=(1, \cdots, 1)' \in \mathbb{R}^{n} \\ &\quad \text{and} \quad {\boldsymbol{\epsilon}} \sim \mathcal{N}(\mathbf{0}, \Sigma); \end{aligned}} $$
$$ {\begin{aligned} H_{0}: \beta=0 \quad \text{versus} \quad H_{1}: \beta \ne 0. \end{aligned}} $$
(2)
Here, y is the response variable, x is the covariate, and ε is the error term that follows an n-dimensional multivariate normal distribution \(\mathcal {N}\) with mean zero and a general variance-covariance matrix Σ. By considering a random variable Y in the n-dimensional space, the above problem can also be stated as
$$ \mathbf{Y} = \left(\begin{array}{c} Y_{1} \\ \vdots \\ Y_{n} \\ \end{array} \right), \qquad \mathbf{Y} \sim \left\{\begin{array}{ll} \mathcal{N}\left(\mathbf{1} \mu, \; \Sigma \right), & \text{under}~ H_{0}, \\ \mathcal{N}\left(\mathbf{1} \mu + \mathbf{x} \beta, \; \Sigma \right), & \text{under}~ H_{1}. \\ \end{array}\right. $$
(3)
In this model, μ is the intercept or grand mean, which is a nuisance parameter, and β is the parameter of interest, which quantifies the effect size. We express the variance-covariance matrix of ε in the form
$$ \text{cov}\left(\boldsymbol{\epsilon} \right) = \Sigma = \sigma^{2} \cdot \mathbf{S}, $$
(4)
where σ^{2} is a nonzero scalar that quantifies the magnitude of the covariance structure, and S is a symmetric, positive-definite matrix that captures the shape of the covariance structure. Additional constraints are needed to determine σ^{2} and S; here, we choose a special form that subsequently simplifies our mathematical derivations. For any given Σ, define
$$ {\begin{aligned} \sigma^{2} := \left(\sum\limits_{i,j} \left(\Sigma^{-1}\right)_{i,j}\right)^{-1} \quad \text{and} \quad \mathbf{S} := \sigma^{-2} \Sigma = \left(\sum\limits_{i,j} \left(\Sigma^{-1}\right)_{i,j}\right) \Sigma. \end{aligned}} $$
From the above definition, we have the following convenient property
$$ \sum\limits_{i,j} \left(\mathbf{S}^{-1} \right)_{i,j}= \mathbf{1}' \, \mathbf{S}^{-1} \, \mathbf{1} = 1. $$
(5)
Hereinafter, we refer to S as the standardized structure matrix satisfying Eq. 5.
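As a quick numerical illustration of this standardization, consider the following Python/NumPy sketch; the matrix values are arbitrary (any symmetric positive-definite Σ works) and are not from any dataset.

```python
import numpy as np

# Illustrative covariance matrix (any symmetric positive-definite Sigma works).
Sigma = np.array([[2.0, 0.6, 0.2],
                  [0.6, 1.5, 0.4],
                  [0.2, 0.4, 1.0]])

Sigma_inv = np.linalg.inv(Sigma)
sigma2 = 1.0 / Sigma_inv.sum()       # sigma^2 := (sum_{i,j} (Sigma^{-1})_{ij})^{-1}
S = Sigma / sigma2                   # S := sigma^{-2} Sigma

# Property (5): the entries of S^{-1} sum to one, i.e. 1' S^{-1} 1 = 1.
print(np.linalg.inv(S).sum())        # -> 1.0 (up to floating point)
```

The check holds by construction: the inverse of S is σ² Σ^{-1}, so its entries sum to σ² · σ^{-2} = 1.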
The proposed method
As a special case of Model (3), if S is proportional to I, the identity matrix, it is well-known that the regression t-test is a valid solution to this HT problem. If S ≠ I, e.g., when the observed data are correlated and/or have a heterogeneous variance structure, the assumptions of the standard t-test are violated. In this paper, we propose a linear transformation, namely \(\mathbf {P}\mathbf {B} : \mathbf {Y} \to \tilde {\mathbf {Y}}\), which transforms the original data into a new set of data that are independent and identically distributed. Furthermore, we prove that the transformed HT problem for the new data is equivalent to the original problem, so that we can approach the original hypotheses using standard parametric (or, later, rank-based) tests on the new data.
To shed more light on the proposed method, we first provide a graphical illustration in Fig. 1. The proposed procedure consists of three steps.

1. Estimate \(\hat \mu (\mathbf {Y})\) (i.e., the weighted mean of the original data) and subtract \(\hat \mu \) from all data points. This is an oblique (i.e., non-orthogonal) projection from \(\mathbb {R}^{n}\) to an (n−1)-dimensional subspace of \(\mathbb {R}^{n}\). The intermediate data from this step are Y^{(1)} (the centered data). Clearly, \(\mathbb {E} \mathbf {Y}^{(1)}\) is the origin of the reduced space if and only if H_{0} is true.

2. Use the eigendecomposition of the covariance matrix of Y^{(1)} to reshape its “elliptical” distribution into a “spherical” distribution. The intermediate data from this step are Y^{(2)}.

3. Use the QR decomposition to find a unique rotation that transforms the original HT problem into an equivalent problem of testing for a constant deviation along the unit vector. The data generated by this step are \(\tilde {\mathbf {Y}}\), and the HT problem associated with \(\tilde {\mathbf {Y}}\) can be approached by existing parametric and rank-based methods.
In the proposed PB-transformation, the B-map performs the transformations in Steps 1 and 2; the P-map from Step 3 is designed to improve the power of the proposed semiparametric test described in the “A semiparametric generalization” section.
Centering data
Using weighted least squares, the mean estimate based on the original data is \(\hat {\mu } (\mathbf {Y})=\mathbf {1}' \mathbf {S}^{-1} \mathbf {Y}\) (for details, see Additional file 1: Section S1.1). We subtract \(\hat {\mu }\) from all data points and define the centered data as
$$ \mathbf{Y}^{(1)} := \mathbf{Y} - \mathbf{1} \hat{\mu} = \left(\mathbf{I} - \mathbf{J}\mathbf{S}^{-1}\right) \mathbf{Y}, $$
where J=1·1^{′} (i.e. a matrix of all 1’s). With some mathematical derivations (see Additional file 1: Section S1.1), we have
$$ \begin{aligned} \mathbb{E}\mathbf{Y}^{(1)} \,=\, \left\{\begin{array}{ll} \mathbf{0}, & \text{under}~ H_{0},\\ \left(\mathbf{I} - \mathbf{J}\mathbf{S}^{-1}\right) \mathbf{x} \beta, & \text{under} ~H_{1}; \end{array}\right. \quad \text{cov}\left(\mathbf{Y}^{(1)}\right) \,=\, \sigma^{2} \left(\mathbf{S} - \mathbf{J} \right). \end{aligned} $$
The B-map
Now, we focus on S−J, which is the structure matrix of the centered data. Let TΛT^{′} denote the eigendecomposition of S−J. Since the data are centered, there are only n−1 nonzero eigenvalues. We express the decomposition as follows
$$ \mathbf{S} - \mathbf{J} = \mathbf{T}_{n-1} \Lambda_{n-1} \mathbf{T}_{n-1}', $$
(6)
where T_{n−1}∈M_{n×(n−1)} is a semi-orthogonal matrix containing the first n−1 eigenvectors and Λ_{n−1}∈M_{(n−1)×(n−1)} is a diagonal matrix of the nonzero eigenvalues. Based on Eq. 6, we define (see Additional file 1: Section S1.2)
$$\mathbf{B} := \Lambda_{n-1}^{1/2} \mathbf{T}_{n-1}' \mathbf{S}^{-1} \in \mathrm{M}_{(n-1) \times n}, $$
so that \(\mathbf {Y}^{(2)} := \mathbf {B}\mathbf {Y} \in \mathbb {R}^{n-1}\) has the following mean and covariance
$$ {\begin{aligned} \mathbb{E}\mathbf{Y}^{(2)} = \left\{\begin{array}{ll} \mathbf{0}_{n-1}, & \text{under}~ H_{0}, \\ \mathbf{B} \mathbf{x} \beta, & \text{under}~ H_{1}; \end{array}\right. \quad \text{cov}\left(\mathbf{Y}^{(2)}\right) = \sigma^{2} \mathbf{I}_{(n-1)\times (n-1)}. \end{aligned}} $$
(7)
We call the linear transformation represented by matrix B the “B-map”. So far, we have centered the response variable and standardized the general structure matrix S into the identity matrix I. However, the covariate and the alternative hypothesis in the original problem are also transformed by the B-map. For normally distributed Y, the transformed HT problem in Eq. 7 can be approached by the regression t-test; however, there is no appropriate rank-based counterpart. In order to conduct a rank-based test for Y from a broader class of distributions, we propose the next transformation.
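The B-map can be sketched numerically as follows (a Python/NumPy illustration with an arbitrary Σ; the checks mirror Eq. 7: B annihilates the constant vector and whitens the structure matrix).

```python
import numpy as np

# Standardize an illustrative Sigma so that 1' S^{-1} 1 = 1 (Eq. 5).
Sigma = np.array([[2.0, 0.6, 0.2],
                  [0.6, 1.5, 0.4],
                  [0.2, 0.4, 1.0]])
n = Sigma.shape[0]
S = Sigma * np.linalg.inv(Sigma).sum()

# Eigendecomposition of S - J; drop the single zero eigenvalue.
J = np.ones((n, n))
vals, vecs = np.linalg.eigh(S - J)
keep = vals > 1e-10                      # the n-1 nonzero eigenvalues survive
T, Lam = vecs[:, keep], vals[keep]

# B-map: B = Lambda^{1/2} T' S^{-1}, an (n-1) x n matrix.
B = np.diag(np.sqrt(Lam)) @ T.T @ np.linalg.inv(S)

# Check Eq. 7: B 1 = 0 (centering) and B S B' = I (whitening).
print(np.allclose(B @ np.ones(n), 0),
      np.allclose(B @ S @ B.T, np.eye(n - 1)))   # expect: True True
```

Both checks follow from the identity T′S^{-1}T = Λ^{-1}, which holds because the columns of T are orthogonal to the null eigenvector of S − J.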
The P-map
From Eq. 7, define the transformed covariate
$$ \mathbf{z} := \mathbf{B}\mathbf{x} \in \mathbb{R}^{n-1}. $$
(8)
We aim to find an orthogonal transformation that aligns z to 1_{n−1} in the reduced space. We construct such a transformation through the QR decomposition of the following object
$$\mathbf{A} = \left(\mathbf{1}_{n-1} \;\; \mathbf{z}\right) = \mathbf{Q}\mathbf{R}, $$
where A∈M_{(n−1)×2} is the column-wise concatenation of the target vector 1_{n−1} and z, Q∈M_{(n−1)×2} is a semi-orthogonal matrix, and R∈M_{2×2} is an upper triangular matrix. We also define the following rotation matrix
$$ {\begin{aligned} \text{\texttt{Rot}} &:= \left(\begin{array}{cc} \xi & \sqrt{1-\xi^{2}} \\ -\sqrt{1-\xi^{2}} & \xi \end{array}\right) \in \mathrm{M}_{2\times 2}, \quad \text{where} \\&\qquad \xi := \frac{\langle \mathbf{z}, \mathbf{1}_{n-1}\rangle}{\sqrt{n-1}\cdot \|\mathbf{z}\|} \in \mathbb{R}. \end{aligned}} $$
Geometrically speaking, ξ= cosθ, where θ is the angle between z and 1_{n−1}.
With the above preparations, we have the following result.
Theorem 1
Matrix P:=I−QQ^{′}+Q Rot Q^{′}=I_{(n−1)×(n−1)}−Q(I_{2×2}−Rot)Q^{′} is the unique orthogonal transformation that satisfies the following properties:
$$\begin{array}{*{20}l} \mathbf{P} \mathbf{P}' &= \mathbf{P}' \mathbf{P} = \mathbf{I}_{(n1)\times (n1)}, \end{array} $$
(9)
$$\begin{array}{*{20}l} \mathbf{P} \mathbf{z} &= \zeta \cdot \mathbf{1}_{n-1}, \qquad \zeta := \frac{\|\mathbf{z}\|}{\sqrt{n-1}}, \end{array} $$
(10)
$$\begin{array}{*{20}l} \mathbf{P} \mathbf{u} &= \mathbf{u}, \qquad \forall \mathbf{u} ~\text{ s.t. }~ \langle \mathbf{u}, \mathbf{1}_{n-1}\rangle = \langle \mathbf{u}, \mathbf{z}\rangle = 0. \end{array} $$
(11)
Proof
See Additional file 1: Section S1.3. □
We call the linear transformation P defined by Theorem 1 the “P-map”. Equation 9 ensures that this map is an orthogonal transformation. Equation 10 shows that the vector z is mapped to 1_{n−1} scaled by a factor ζ. Equation 11 is an invariance property in the linear subspace \(L_{\mathbf {z}}^{\perp }\), the orthogonal complement of the linear subspace spanned by 1_{n−1} and z, i.e., L_{z}=span(1_{n−1},z). This property yields a unique minimal map that transforms only the components of the data in L_{z} and leaves the components in \(L_{\mathbf {z}}^{\perp }\) invariant. A similar idea for constructing rotation matrices has been used in [22].
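A numerical sketch of the P-map construction follows (Python/NumPy; the vector z is illustrative, and Q is built by Gram–Schmidt so that its sign convention matches the Rot matrix above).

```python
import numpy as np

# Illustrative transformed covariate z in R^{n-1} (any z not parallel to 1 works).
z = np.array([0.8, -0.3, 1.1, 0.4])
m = z.size
ones = np.ones(m)

# Gram-Schmidt on (1_{n-1}, z): Q has a positive "R" diagonal,
# matching the sign convention used in Rot.
q1 = ones / np.sqrt(m)
w = z - (q1 @ z) * q1
q2 = w / np.linalg.norm(w)
Q = np.column_stack([q1, q2])

xi = (z @ ones) / (np.sqrt(m) * np.linalg.norm(z))     # cos(theta)
Rot = np.array([[xi, np.sqrt(1 - xi**2)],
                [-np.sqrt(1 - xi**2), xi]])

P = np.eye(m) - Q @ Q.T + Q @ Rot @ Q.T

zeta = np.linalg.norm(z) / np.sqrt(m)
print(np.allclose(P @ z, zeta * ones),      # Eq. 10: P z = zeta * 1
      np.allclose(P @ P.T, np.eye(m)))      # Eq. 9: orthogonality  -> True True
```

Because P differs from the identity only on span(Q), any u orthogonal to both 1 and z is left untouched, which is exactly Eq. 11.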
With both B and P, we define the final transformed data as \( \tilde {\mathbf {Y}} := \mathbf {P} \mathbf {Y}^{(2)} = \mathbf {P} \mathbf {B} \mathbf {Y}\), which has the following joint distribution
$$ {}\tilde{\mathbf{Y}} \!\sim\! \mathcal{N}\left(\mathbf{P}\mathbf{B} \mathbf{x} \beta, \; \mathbf{P}\mathbf{B} (\sigma^{2} \mathbf{S}) \mathbf{B}'\mathbf{P}' \right) \,=\, \left\{\begin{array}{ll} \mathcal{N}\left(\mathbf{0},\; \sigma^{2} \mathbf{I}\right), & \!\!\text{under}~ H_{0}, \\ \mathcal{N}\left(\mathbf{1} \zeta \beta,\; \sigma^{2} \mathbf{I}\right), & \!\!\text{under}~ H_{1}. \\ \end{array}\right. $$
The normality assumption implies that the \(\tilde Y_{i}\), i=1,⋯,n−1, are i.i.d. normal random variables. The location parameter of the common marginal distribution is to be tested with unknown σ^{2}. Therefore, we can approach this equivalent HT problem with the classical one-sample t-test and the Wilcoxon signed-rank test (more in the “A semiparametric generalization” section).
Correlation estimation for repeated measurements
If Σ is unknown, we can decompose Σ in the following way
$$ \Sigma = \mathbf{W}^{-\frac{1}{2}} \, \text{\texttt{Cor}} \, \mathbf{W}^{-\frac{1}{2}}, $$
(12)
where W is a diagonal weight matrix and Cor is the corresponding correlation matrix. By definition, the weights are inversely proportional to the variances of the observations. In many real-world applications, including RNA-seq analysis, these weights can be assigned a priori based on the quality of the samples; but the correlation matrix Cor needs to be estimated from the data. In this section, we provide a moment-based estimator of Cor for a class of correlation structures commonly used for repeated measurements. This estimator does not require computationally intensive iterative algorithms.
Let Y be a collection of repeated measures from L subjects such that the observations from different subjects are independent. With an appropriate data rearrangement, the correlation matrix of Y can be written as a blockdiagonal matrix
$$\text{cor}(\mathbf{Y}) = \left(\begin{array}{ccc} \text{\texttt{Cor}}_{1} & & \\ & \ddots & \\ & & \text{\texttt{Cor}}_{L} \end{array}\right). $$
We assume that the magnitude of correlation is the same across all blocks, and denote it by ρ. Each block can be expressed as \(\phantom {\dot {i}\!}\text {\texttt {Cor}}_{l}(\rho) = (1-\rho) \mathbf {I}_{n_{l} \times n_{l}} + \rho \mathbf {J}_{n_{l} \times n_{l}}, \quad \text {for} \quad l = 1, \cdots, L,\) where n_{l} is the size of the lth block and \(n={\sum \nolimits }_{l=1}^{L} n_{l}\).
We estimate the correlation based on the weighted regression residuals \(\hat {\boldsymbol {\epsilon }}\) defined by Eq. (S3) in Additional file 1: Section S2.1. Define two forms of residual sum of squares
$$SS_{1}= \sum\limits_{l}\hat{\boldsymbol{\epsilon}}_{l}'\mathbf{I}\hat{\boldsymbol{\epsilon}}_{l} \quad \text{and} \quad SS_{2}= \sum\limits_{l}\hat{\boldsymbol{\epsilon}}_{l}'\mathbf{J}\hat{\boldsymbol{\epsilon}}_{l}, $$
where \(\hat {\boldsymbol {\epsilon }}_{l}\) is the corresponding weighted residuals for the lth block. With these notations, we have the following Proposition.
Proposition 1
Denote \(\Sigma _{\epsilon } = \text {cov}(\hat {\boldsymbol {\epsilon }})\) and assume that, for some nonzero σ^{2},
$$\Sigma_{\epsilon} = \sigma^{2} \cdot \text{diag}(\text{\texttt{Cor}}_{1}(\rho), \cdots, \text{\texttt{Cor}}_{L}(\rho)). $$
An estimator of ρ based on the first moments of SS_{1} and SS_{2} is
$$ \hat{\rho}_{\text{moment}} = \frac{SS_{2} - SS_{1}}{ \frac{1}{n} \sum\nolimits_{l=1}^{L} \left(n_{l}(n_{l}-1) \right) SS_{1}}. $$
Moreover, if \(\hat {\boldsymbol {\epsilon }} \sim \mathcal {N}(\mathbf {0}, \Sigma _{\epsilon })\) and n_{1}=⋯=n_{L}=n/L (i.e. balanced design), the above estimator coincides with the maximum likelihood estimator of ρ, which has the form
$$\hat \rho_{\text{MLE}} = \frac{SS_{2} - SS_{1}}{(n_{1}-1)SS_{1}}. $$
Proof
See Additional file 1: Section S2.1. □
Standard correlation estimates are known to have a downward bias [23], which can be corrected by Olkin and Pratt’s method [24]. With this correction, our final correlation estimator is
$$ \hat{\rho} = \hat{\rho}_{\text{moment}} \left[ 1+ \frac{1-\hat{\rho}_{\text{moment}}^{2}}{2(L-3)} \right]. $$
(13)
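The moment estimator and the Olkin–Pratt correction above can be sketched as follows (Python/NumPy; the residual values and block sizes are illustrative, and the helper function names are ours). The final check verifies the balanced-design equivalence with the MLE form stated in Proposition 1.

```python
import numpy as np

def rho_moment(resids, blocks):
    """Moment-based estimator of rho.

    resids: 1-D array of weighted residuals; blocks: list of index arrays,
    one per subject (block).
    """
    n = resids.size
    ss1 = sum(resids[b] @ resids[b] for b in blocks)          # e' I e per block
    ss2 = sum(resids[b].sum() ** 2 for b in blocks)           # e' J e = (sum e)^2
    denom = sum(len(b) * (len(b) - 1) for b in blocks) / n
    return (ss2 - ss1) / (denom * ss1)

def olkin_pratt(rho, L):
    """Bias correction of Eq. 13 (requires L > 3 subjects)."""
    return rho * (1 + (1 - rho**2) / (2 * (L - 3)))

# Illustrative residuals: L = 4 subjects with 2 repeated measures each (balanced).
e = np.array([0.5, 0.3, -0.7, -0.4, 0.2, 0.6, -0.1, -0.3])
blocks = [np.arange(2 * l, 2 * l + 2) for l in range(4)]

r = rho_moment(e, blocks)
# Balanced design: the moment estimator coincides with the MLE form.
ss1 = e @ e
ss2 = sum(e[b].sum() ** 2 for b in blocks)
print(np.isclose(r, (ss2 - ss1) / ((2 - 1) * ss1)))           # expect: True
rho_hat = olkin_pratt(r, L=4)
```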
Kenward-Roger approximation to the degrees of freedom
The degrees of freedom (DF) can have a nontrivial impact on hypothesis testing when the sample size is relatively small. Intuitively, a correlated observation carries less information than an independent observation; in such cases, the effective DF is smaller than the apparent sample size. Simple examples include the two-sample t-test and the paired t-test. With n observations in each group, the former has DF=2n−2 for i.i.d. observations, whereas the latter has only DF=n−1 because the observations are perfectly paired. These simple examples indicate that we need to adjust the DF according to the correlation structure in our testing procedures.
We adopt the degrees-of-freedom approximation proposed by [25] (henceforth the KR approximation) for the proposed tests. The KR approximation is a fast moment-matching method, which is efficiently implemented in the R package pbkrtest [26]. In broad terms, we use the DF approximation as a tool to adjust the effective sample size when partially paired data are observed.
Alternative approach using the mixed-effects model
As mentioned in the “Background” section, the HT problem stated in Model (3) for repeated measurements can also be approached by the linear mixed-effects regression (LMER) model. Suppose the ith observation is from the lth subject; we may fit the data with a random-intercept model such that
$$Y_{i(l)} = \mu + x_{i} \beta + 1_{l} \gamma + \epsilon_{i}, $$
where 1_{l} is the indicator function of the lth subject, \(\gamma \sim N\left (0, \sigma ^{2}_{\gamma }\right)\), and \(\epsilon _{i} \stackrel {i.i.d.}{\sim } N\left (0, \sigma ^{2}_{\epsilon }\right)\). The correlation is modeled as
$$ \rho = \text{cor}\left(Y_{i(l)}, Y_{i'(l)}\right) = \frac{\sigma^{2}_{\gamma}}{\sigma^{2}_{\gamma} + \sigma^{2}_{\epsilon}}. $$
(14)
The LMER model is typically fitted by a likelihood approach based on the EM algorithm, and weights can be incorporated into the likelihood function. The lmer() function in the R package lme4 [16] provides a reference implementation for fitting the LMER model. The algorithm iterates until convergence. Due to its relatively high computational cost, the mixed-effects model has limited application in high-throughput data.
The R package lmerTest [17] performs hypothesis tests on lmer() outputs. By default, it adjusts the DF using Satterthwaite’s approximation [27], and can optionally use the KR approximation.
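The correlation induced by the random intercept in Eq. 14 can be checked by simulation. The following Python/NumPy sketch uses illustrative variance components σ²_γ = σ²_ε = 1, so Eq. 14 gives ρ = 0.5; the subject count and seed are arbitrary.

```python
import numpy as np

# Simulate the random-intercept model: Y_il = mu + gamma_l + eps_il,
# with var(gamma) = 1 and var(eps) = 1, so Eq. 14 gives rho = 0.5.
rng = np.random.default_rng(0)
L = 5000                                   # subjects, 2 measurements each
gamma = rng.normal(0.0, 1.0, size=L)
y1 = 2.0 + gamma + rng.normal(0.0, 1.0, size=L)
y2 = 2.0 + gamma + rng.normal(0.0, 1.0, size=L)

rho_hat = np.corrcoef(y1, y2)[0, 1]
print(rho_hat)        # close to 0.5
```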
A semiparametric generalization
In the sections above, we developed the PB-transformed t-test using linear-algebra techniques. These techniques can also be applied to non-normal distributions to transform their mean vectors and covariance matrices. With the following proposition, we extend the proposed method to an appropriate semiparametric distribution family. Treating uncorrelated observations with equal variance as a second-order approximation of the transformed data, we can apply a rank-based test to the transformed data to test the original hypotheses. We call this procedure the PB-transformed Wilcoxon test.
Proposition 2
Let \(\check {\mathbf {Y}} := \left \{ \check {Y}_{1}, \dots, \check {Y}_{n-1}\right \}\) be a collection of i.i.d. random variables with a common symmetric density function g(y), g(−y)=g(y). Assume that \(\mathbb {E} \check {Y}_{1} = 0\) and \(\text {var}(\check {Y}_{1}) = \sigma ^{2}\). Let Y^{∗} be a random variable that is independent of \(\check {\mathbf {Y}}\) and has zero mean and variance σ^{2}. For every symmetric, positive semi-definite \(\mathbf {S} \in \mathrm {M}_{n \times n}\), \(\mathbf {x} \in \mathbb {R}^{n}\), and \(\mu, \beta \in \mathbb {R}\), there exist a linear transformation \(\mathbf {D}: \mathbb {R}^{n-1} \to \mathbb {R}^{n}\) and constants u, v such that
$$ \mathbf{Y} := \mathbf{D} \left(\check{\mathbf{Y}} + u \mathbf{1}_{n1}\right) + (Y^{*} + v) \mathbf{1}_{n} $$
(15)
is an ndimensional random vector with
$$ \mathbb{E} (\mathbf{Y}) = \mathbf{1} \mu + \mathbf{x} \beta \quad \text{and} \quad \text{cov}(\mathbf{Y}) = \sigma^{2} \mathbf{S}. $$
Furthermore, if we apply the PB-transformation to Y, the result is a sequence of n−1 uncorrelated random variables with equal variance, whose mean is zero if and only if β=0.
Proof
See Additional file 1: Section S1.4. □
The essence of this Proposition is that, starting with an i.i.d. sequence of random variables with a common symmetric p.d.f., we can use linear transformations to generate a family of distributions expressive enough to include a non-normal distribution with an arbitrary covariance matrix and a mean vector specified by the effect to be tested. This distribution family is semiparametric because: (a) the “shape” of the density function, g(y), has infinitely many degrees of freedom; (b) the “transformation” (D, u, and v) has only finitely many parameters.
As mentioned above, applying both the B- and P-maps enables us to use the Wilcoxon signed-rank test for the hypotheses under this semiparametric distribution family. This approach has better power than the test with only the B-map, as shown in the “Simulations” section. Once the PB-transformed data are obtained, we calculate the Wilcoxon signed-rank statistic and follow the testing approach in [21], which approximates the asymptotic distribution of the test statistic by a t-distribution with an adjusted DF. Note that the Wilcoxon signed-rank test is only valid when the underlying distribution is symmetric; therefore, the symmetry assumption in Proposition 2 is necessary. In summary, the PB-transformed Wilcoxon test provides an approximate test (up to second-order moments) for data that follow this flexible semiparametric distributional model.
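The signed-rank statistic itself is straightforward to compute on the transformed data. The following is a minimal Python/NumPy sketch for a tie-free sample; the helper name is ours, and the adjusted-DF t-approximation of [21] is omitted.

```python
import numpy as np

def signed_rank_statistic(d):
    """W+: sum of the ranks of |d| over the positive entries of d (no ties)."""
    ranks = np.argsort(np.argsort(np.abs(d))) + 1   # ranks 1..n of |d|
    return ranks[d > 0].sum()

# Tiny illustrative sample standing in for PB-transformed values.
d = np.array([1.2, -0.5, 2.3, 0.8, -0.1])
W = signed_rank_statistic(d)
print(W)   # positive entries 1.2, 2.3, 0.8 carry ranks 4, 5, 3 -> W = 12
```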
Extension to multiple regressions
In this section, we present an extension of the proposed methods for the following multiple regression
$$ \begin{aligned} \mathbf{y} &= \mathbf{X} \boldsymbol{\beta} + \boldsymbol{\epsilon}, \qquad \mathbf{y} \in \mathbb{R}^{n}, \quad \mathbf{X} \in \mathrm{M}_{n\times p},\\& \quad \boldsymbol{\beta} \in \mathbb{R}^{p}, \quad \boldsymbol{\epsilon} \in \mathbb{R}^{n}. \end{aligned} $$
(16)
Here the error term ε is assumed to have zero mean but does not need to have a scalar covariance matrix. For example, ε can be the sum of random effects and measurement errors in a typical LMER model, with a form specified in Eq. 4.
To test the significance of β_{k}, k=1,…,p, we need to specify two regression models: the null and the alternative. Here the alternative model is simply the full Model (16), and the null model is a regression model whose covariate matrix is X_{−k}, constructed by removing the kth covariate (X_{k}) from X:
$$ {\begin{aligned} \mathbf{y} &= \mathbf{X}_{-k}\boldsymbol{\beta}_{-k} + \boldsymbol{\epsilon},\quad \mathbf{X}_{-k} \in \mathrm{M}_{n\times (p-1)},\\& \quad \boldsymbol{\beta}_{-k} \in \mathbb{R}^{p-1}, \quad \text{span}\left(\mathbf{X}_{-k} \right) \subsetneq \text{span}\left(\mathbf{X} \right). \end{aligned}} $$
(17)
Compared with the original univariate problem, the nuisance covariates in the multiple-regression case are X_{−k}β_{−k} instead of 1μ in Eq. 1. Consequently, we replace the centering step by regressing out the linear effects of X_{−k}:
$$ \mathbf{E} := \mathbf{C} \mathbf{Y} := \left(\mathbf{I}_{n\times n} - \mathbf{X}_{-k} \left(\mathbf{X}_{-k}' \mathbf{S}^{-1} \mathbf{X}_{-k} \right)^{-1}\mathbf{X}_{-k}' \mathbf{S}^{-1}\right) \mathbf{Y}. $$
The new B-transformation is defined via the eigendecomposition of cov(E)=σ²(S−X_{−k}(X_{−k}′S^{−1}X_{−k})^{−1}X_{−k}′). The P-transformation is derived as before, but with the new B matrix.
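As a numerical sanity check of this projection, the following Python/NumPy sketch uses illustrative dimensions and random matrices (Xk below denotes X_{−k}); it verifies that C annihilates the nuisance covariates and that cov(E) takes the stated form.

```python
import numpy as np

# Illustrative setup: n = 6 observations, p = 3 covariates; test beta_k for k = 2.
rng = np.random.default_rng(1)
n = 6
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
Xk = np.delete(X, 2, axis=1)                # X_{-k}: drop the tested column

A = rng.normal(size=(n, n))
S = A @ A.T + n * np.eye(n)                 # any symmetric positive-definite S

Si = np.linalg.inv(S)
C = np.eye(n) - Xk @ np.linalg.inv(Xk.T @ Si @ Xk) @ Xk.T @ Si

# C annihilates the nuisance covariates, and
# C S C' = S - X_{-k} (X_{-k}' S^{-1} X_{-k})^{-1} X_{-k}'.
H = Xk @ np.linalg.inv(Xk.T @ Si @ Xk) @ Xk.T
print(np.allclose(C @ Xk, 0), np.allclose(C @ S @ C.T, S - H))   # expect: True True
```

Both identities follow by expanding C S C′ and cancelling the cross terms, exactly as in the univariate case where S − J played the role of S − H.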