Sparse regression models for unraveling group and individual associations in eQTL mapping
 Wei Cheng^{1}Email author,
 Yu Shi^{2},
 Xiang Zhang^{3} and
 Wei Wang^{4}
https://doi.org/10.1186/s1285901609869
© Cheng et al. 2016
Received: 20 August 2015
Accepted: 10 March 2016
Published: 22 March 2016
Abstract
Background
As a promising tool for dissecting the genetic basis of common diseases, expression quantitative trait loci (eQTL) study has attracted increasing research interest. Traditional eQTL methods focus on testing the associations between individual singlenucleotide polymorphisms (SNPs) and gene expression traits. A major drawback of this approach is that it cannot model the joint effect of a set of SNPs on a set of genes, which may correspond to biological pathways.
Results
To alleviate this limitation, in this paper, we propose geQTL, a sparse regression method that can detect both groupwise and individual associations between SNPs and expression traits. geQTL can also correct the effects of potential confounders. Our method employs computationally efficient technique, thus it is able to fulfill large scale studies. Moreover, our method can automatically infer the proper number of groupwise associations. We perform extensive experiments on both simulated datasets and yeast datasets to demonstrate the effectiveness and efficiency of the proposed method. The results show that geQTL can effectively detect both individual and groupwise signals and outperforms the stateofthearts by a large margin.
Conclusions
This paper well illustrates that decoupling individual and groupwise associations for association mapping is able to improve eQTL mapping accuracy, and inferring individual and groupwise associations.
Keywords
eQTL mapping Groupwise association Computation efficiencyBackground
Expression quantitative trait loci (eQTL) mapping aims at identifying single nucleotide polymorphisms (SNPs) that influence the expression level of genes. It has been widely applied to analyze the genetic basis of gene expression and molecular mechanisms underlying complex traits [1, 2]. In a typical eQTL study, the association between each expression trait and each SNP is assessed separately [3–5]. This approach does not consider the interactions among SNPs and among genes. However, multiple SNPs may interact with each other and jointly influence the phenotypes [6]. This assumption will inevitably miss complex cases where multiple genetic variants jointly affect the coexpressions of multiple genes. It has been observed in biological experiments that the joint effect of multiple SNPs to a phenotype may be nonadditive [6], and genes from the same biological pathway are usually coregulated [7] by the same genetic basis. The biological process contains both individual effects and joint effects between SNPs and genes [8]. A straightforward approach to detect associations between sets of SNPs and a gene expression level can be done using the standard gene set enrichment analysis [9]. Wu et al. [10] further proposed the variance component models for SNP set testing. Braun et al. employed aggregationbased approaches to cluster SNPs [11]. In [12], Listgarten et al. further considered the potential confounding factors.
In this paper, we propose a novel method, geQTL, to automatically detect individual and groupwise associations in eQTL studies. It uses a twolayer feature selection strategy and adopts efficient optimization techniques, which make it suitable for large scale studies. Moreover, geQTL can automatically infer the optimal number of groupwise associations. We perform extensive experiments on both simulated datasets and yeast datasets to demonstrate the effectiveness and efficiency of the proposed method.
Methods
Preliminaries
Notations
Symbols  Description 

K  Number of SNPs 
N  Number of genes 
H  Number of samples 
M  Number of groupwise associations 
x  Random variables of K SNPs 
z  Random variables of N genes 
y  Latent variables to model groupwise associaiton 
\(\mathbf {X}\in {\mathbb {R}}^{K \times H}\)  SNP matrix data 
\(\mathbf {Z} \in {\mathbb {R}}^{N \times H}\)  Gene expression matrix data 
\(\mathbf {A} \in \mathbb {R}^{M\times K}\)  Groupwise association coefficient matrix between x and y 
\(\mathbf {B} \in \mathbb {R}^{N\times M}\)  Groupwise association coefficient matrix between y and z 
\(\mathbf {C} \in \mathbb {R}^{N\times K}\)  Individual association coefficient matrix between x and y 
α,β,γ,ρ  Regularization parameters 
\(\mathbf {R} \in \mathbb {R}^{N\times K}\)  Indicator matrix showing which elements in C can be nonzero 
where z is a linear function of x with coefficient matrix W, μ is an N×1 translation factor vector. And ε is the additive noise of Gaussian distribution with zeromean and variance γ I, where γ is a scalar. That is, \(\mathbf {\epsilon } \sim \mathcal {N}(\mathbf {0},\gamma \mathbf {I})\).
In association studies, sparsity is a reasonable assumption because only a small fraction of genetic variants are expected to be associated with a set of gene expression traits. This can be modeled as a feature selection problem. For example, the standard Lasso [5] can be used in association mapping, which applies ℓ _{1} penalty on W for sparsity.
where ·_{ F } denotes the Frobenius norm, ·_{1} is the ℓ _{1}norm. η is the empirical parameter for the ℓ _{1} penalty. W is the parameter (also called weight) matrix parameterizing the space of linear functions mapping from X to Z.
where ·_{∗} is the nuclear norm [14]. ρ is the regularization parameter to control the rank of L. L is a lowrank matrix assuming that there are only a small number of hidden factors influencing the gene expression levels.
When we fix {W, we can optimize {L} by using singular value decomposition (SVD) according to the following lemma.
Lemma 1.
is given by \(\mathbf {\widehat {S}}=\mathbf {H}_{\lambda }(\mathbf {O})\), where \(\mathbf {H}_{\lambda }(\mathbf {O})=\mathbf {U}\mathbf {D}_{\lambda }\mathbf {V}^{\mathrm {T}}\) with D _{ λ }=diag[(d _{1}−λ)_{+},…,(d _{ r }−λ)_{+}], U D V ^{T} is the Singular Value Decomposition (SVD) of O, D=diag[d _{1},…,d _{ r }], and \((d_{i}\lambda)_{+}=\max ((d_{i}\lambda),0), (1\leq i \leq r)\).
Both Lasso and LORS do not consider the existence of groupwise associations. Below, we will introduce the proposed model to infer both groupwise and individual associations for eQTL mapping.
geQTL
\(\mathbf {A} \in \mathbb {R}^{M\times K}\) denotes the matrix of coefficients between x and y. \({\sigma _{1}^{2}}\mathbf {I}_{M}\) denotes the variances of the additive noise. I _{ M } is an identity matrix. Here we drop the intercept terms because the input data X and Z are normalized to zero mean and unit variance as preprocessing.
\(\mathbf {B} \in \mathbb {R}^{N\times M}\) denotes the matrix of coefficients between y and z, \(\mathbf {C} \in \mathbb {R}^{N\times K}\) denotes the matrix of coefficients between x and z to encode the individual associations.
where E is a Gaussian whitenoise term. In Eq. (9) and (10), A and B should be sparse since a single gene is often influenced by a small number of SNPs and vice versa [12].
Here, we choose different penalties for A,B,C because the sparsities of different matrices are typically of different scales.
Optimization
The optimization for L can be achieved by following a similar approach as in [14]. To optimize A,B,C, many tools can be used to optimize the ℓ _{1} penalized objective function, e.g., the OrthantWise Limitedmemory QuasiNewton (OWLQN) algorithm [16]. Due to space limitation, we omit the details. In the next, we devise optimization techniques that can dramatically improve the computational efficiency of geQTL.
Boosting the computational efficiency
Given a large number of SNPs and gene expression traits, scalability of the algorithm is a crucial issue. We propose two improved models, geQTL ^{+} and geQTLridge, which optimize the search for significant individual associations, which is the main computational bottleneck of the algorithm.
geQTL ^{+}
In a typical eQTL study, we usually have M≪m i n(K,N). Thus, the bottleneck of the algorithm is to optimize C. Our strategy is to confine the space of C. The intuition is that we only permit a small fraction of elements in C to be nonzero. It has been shown that if Z and X are standardized with zero mean and unit sum of squares, then r=a b s(Z X ^{T}) is equal to the geneSNP correlations (r _{ gs }=c o r(z _{ g },x _{ s })) [17]. Since for many test statistics, e.g., t, F, R ^{2}, and LR, for the simple linear regression problem can be expressed as functions of the sample correlation r _{ gs }, e.g., R ^{2}=r ^{2}, and \(t=\frac {r\sqrt {n2}}{1r^{2}}\), we can find a threshold according to the required pvalue, such that test statistics exceeding the threshold are significant at the required significance level. The test statistics for every geneSNP pair in r are compared with the threshold, and only those elements whose r are greater than the threshold are optimized. We denote \(\mathbf {R} \in \mathbb {R}^{N\times K}\) as the indicator matrix indicating which elements in C can be nonzero (i.e., r _{ gs }>t h r e s h o l d).
geQTLridge
Theorem 1.
The proof of the Theorem 1 is in the following section.
Proof of Theorem 1
Proof.
where D=Z−L−B A X, c _{ i }=(C)_{ i,:} and d _{ i }=(D)_{ i,:}.
is solved by \(\mathbf {C} = \left (\mathbf {c}_{1}^{\mathrm {T}}, \ldots, \mathbf {c}_{N}^{\mathrm {T}}\right)^{\mathrm {T}},\) which leads to the update formula given in Eq. (14).
Determining the number of hidden variables
In Eq. (12), we use B A+C to formulate the overall associations between SNPs and expression traits.

A has linearly independent rows. Since M≪K, this condition is equivalent to that A has full rank;

B has linearly independent columns. Since M≪N, this condition is equivalent to that B has full rank.
The last equality holds because both A and B have full rank.
Due to the existence of noise, we should allow small singular values to be considered as zero. Therefore, we can draw a plot with singular values of \((\mathbf {Z} \mathbf {L} \mathbf {C} \mathbf {X}) \mathbf {X}^{\mathrm {T}}\left (\mathbf {X} \mathbf {X}^{\mathrm {T}}\right)^{1}\) in descending order and set M to be k, if the first k singular values are large and significantly greater than the (k+1)th singular value.
Based on the discussion above, in order to find optimal M, we can first use Lasso to infer the initial value of C. Then, using Eq. 25, we can infer the optimal M at this stage. After that, we can optimize new C, and calculate new optimal M. We can repeat this procedure until M became stable or reach maximal number of iterations.
Results
In this section, we perform extensive experimental study using both simulated and real eQTL datasets to evaluate the performance of our methods. For comparison, we select several stateoftheart eQTL methods, including twograph guided multitask lasso (MTLasso2G) [13], FaSTLMM [12], SETeQTL [18], LORS [14], Matrix eQTL [17] and Lasso [5]. Note that we did not compare with our previous work, GDL, in [19] because it needs to incorporate many prior knowledge, that is not relevant to this work. For all the methods, the tuning parameters are learned using cross validation. The discussion of setting proper number of groupwise associations M is included in the Additional file 1. The shrinkage of the coefficients is also presented in the Additional file 1.
Simulated data
We use a similar setup for simulation study to that in [14]. First, 100 SNPs are randomly selected from the yeast eQTL dataset [20]. This gives birth to the matrix X. 100 gene expression profiles are generated by Z _{ j∗}=β _{ j∗} X+Ξ _{ j∗}+E _{ j∗} (1≤j≤N), where \(\mathbf {E}_{j*}\sim \mathcal {N}(0,\phi I)\) (ϕ=0.1) is used to simulate the Gaussian noise. To simulate the effects of confounding factors, we use Ξ _{ j∗}, drawn from \(\mathcal {N}(\mathbf {0},\tau \Lambda)\). In this paper, we set τ=0.1. Λ is given by F F ^{T}. Here, \(\mathbf {F}\in \mathbb {R}^{H\times J}\) and \(\mathbf {F}_{ij}\sim \mathcal {N}(0,1)\). J denotes hidden factor number. In this paper, we set J to 10.
In Fig. 2, we report the associations inferred by geQTL. Recall that groupwise associations can be inferred from matrix A and B, and individual associations can be inferred from matrix C. It is obvious that geQTL can detect both groupwise and individual signals.
It can be seen that geQTL and geQTL ^{+} outperform all alternative methods by a large margin since they considers both individual and groupwise associations. We also observe that geQTLridge is not as good as geQTL and geQTL ^{+}. This is because geQTLridge does not provide a sparse solution for individual associations. MTLasso2G is comparable to LORS. LORS can correct the effects of the confounders, however, it is not able to detect groupwise mappings. We also observe that by decoupling individual and groupwise associations, the proposed models (geQTL, geQTL ^{+}, and geQTLridge) are more robust to noise than other methods.
Yeast eQTL data
We also validated geQTL using the bench mark dataset–yeast (Saccharomyces cerevisiae) eQTL dataset. The dataset contains 112 yeast segregants generated from a cross of two inbred strains [20]. Originally, It contains 6229 gene epxressions and 2956 SNPs. SNPs with >10 % missing values in the remaining SNPs are imputed using the function fill.geno in R/qtl [21]. The neighboring SNPs with the same genotype profiles are combined, resulting in 1027 genotype profiles. Remove gene expression traits with missing values, we get 4474 expression profiles.
cis and trans analysis
We follow the standard cisenrichment analysis that is used in [22, 23] for evaluation. Moreover, we use the transenrichment with a similar strategy [24]. Genes regulated by transcription factors (obtained from http://www.yeastract.com/download.php) are treated as transacting signals.
Pairwise comparison of different models using cisenrichment and transenrichment
FaSTLMM  geQTLridge  SETeQTL  MTLasso2G  LORS  Matrix eQTL  Lasso  

cis  geQTL ^{+}  <0.0163  0.0124  <0.0001  <0.0001  <0.0001  <0.0001  <0.0001 
FaSTLMM    0.0247  <0.0001  <0.0001  <0.0001  <0.0001  <0.0001  
geQTLridge      <0.0001  <0.0001  <0.0001  <0.0001  <0.0001  
SETeQTL        0.0117  <0.0001  <0.0001  <0.0001  
MTLasso2G          <0.0001  <0.0001  <0.0001  
LORS            <0.0001  0.0052  
Matrix eQTL              0.0134  
MTLasso2G  FaSTLMM  LORS  SETeQTL  Matrix eQTL  Lasso  geQTLridge  
trans  geQTL ^{+}  0.0042  0.0040  0.0033  0.0029  0.0027  0.0022  0.0001 
MTLasso2G    0.0212  0.0134  0.0049  0.0042  0.0038  0.0005  
FaSTLMM      0.0233  0.0178  0.0125  0.0073  0.0006  
LORS        0.3110  0.1103  0.0151  0.0008  
SETeQTL          0.1223  0.0578  0.0016  
Matrix eQTL            0.0672  0.0021  
Lasso              0.0025 
Gene ontology enrichment analysis on detected groupwise associations for yeast
We further evaluate the quality of detected groups of genes by measuring the correlations between the detected groups of genes and the GO (Gene Ontology) categories [25]. Specifically, the GO enrichment test is calculated by DAVID [26]. In this paper, gene sets reported by our algorithm with calibrated pvalues less than 0.01 are considered as significantly enriched.
Reproducibility of eQTLs between studies

Reproducibility of detected SNPgene associations: Let L _{1} and L _{2} be the sets of SNPgene associations detected in the two yeast datasets, respectively. We can rank the associations according to the weights (or qvalues for FaSTLMM). Let \({L_{1}^{T}}\) and \({L_{2}^{T}}\) be the top T most significant associations from the two datasets. The reproducibility is defined as \(\frac {{L_{1}^{T}}\bigcap {L_{2}^{T}}}{T}\).

Reproducibility of trans regulatory hotspots: For each SNP, we count the number of associated genes from the detected SNPgene associations. We use this number as the regulatory degree of each SNP. For FaSTLMM, SNPGene pairs with a qvalue < 0.001 are considered significant associations. We also tried different cutoffs for FaSTLMM (from 0.01 to 0.001), the results are similar. SNPs with large regulatory degrees are often referred to as hotspots. We sort SNPs in descending order of their regulatory degrees. We denote the sorted SNPs lists as S _{1} and S _{2} for the two yeast datasets. Let \({S_{1}^{T}}\) and \({S_{2}^{T}}\) be the top T SNPs in the sorted SNP lists. The trans calling consistency of reported hotspots is denoted by \(\frac {{S_{1}^{T}}\bigcap {S_{2}^{T}}}{T}\).
Conclusions
In literature, much efforts have been done on eQTL mapping. Traditional eQTL mapping approaches can not detect the groupwise associations between sets of SNPs and sets of genes. To achieve that, we propose a fast approach, geQTL, to detect both individual and groupwise associations for eQTL mapping. geQTL can also correct the effects of potential confounders. We also introduce efficient algorithms to scale up the computation so that the algorithms are able to tackle large scale data sets. Additionally, the proposed model provides an effective strategy to automatically infer the proper number of groupwise associations. We perform extensive experiments on both simulated datasets and yeast datasets to demonstrate the effectiveness and efficiency of the proposed method. Inferring individual and groupwise associations also helps us better explain the genetic basis of gene expression. Due to scalability issue, our model simply assume random errors between different genes are independent and have the same variance. That is the reason why current model only identified a small number of groupwise associations. Our future work will further incorporate the relationships between genes by integrating gene coexpression network or proteinproteininteraction network.
Data availability
The software is publicly available at http://www.cs.unc.edu/~weicheng/bioinformatics_code.zip.
Declarations
Acknowledgments
This work is partially supported by NSF IIS1313606, NIH U01HG00848801, NIH R01GM11583301, NSF CAREER, NSF IIS1162374, and NSF IIS1218036.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
Authors’ Affiliations
References
 Bochner BR. New technologies to assess genotype henotype relationships. Nat Rev Genet. 2003; 4:309–14.View ArticlePubMedGoogle Scholar
 Michaelson J, Loguercio S, Beyer A. Detection and interpretation of expression quantitative trait loci (eQTL). Methods. 2009; 48(3):265–76.View ArticlePubMedGoogle Scholar
 Cheung VG, Spielman RS, Ewens KG, Weber TM, Morley M, Burdick JT. Mapping determinants of human gene expression by regional and genomewide association. Nature. 2005; 437:1365–69. doi:10.1038/nature04244.View ArticlePubMedPubMed CentralGoogle Scholar
 Zhu J, Zhang B, Smith EN, Drees B, Brem RB, Kruglyak L, Bumgarner RE, Schadt EE. Integrating largescale functional genomic data to dissect the complexity of yeast regulatory networks. Nat Genet. 2008; 40(7):854–61. doi:10.1038/ng.167.View ArticlePubMedPubMed CentralGoogle Scholar
 Tibshirani R. Regression shrinkage and selection via the lasso. J Royal Statist Soc B. 1996; 58(1):267–88.Google Scholar
 Lander ES. Initial impact of the sequencing of the human genome. Nature. 2011; 470(7333):187–97.View ArticlePubMedGoogle Scholar
 Pujana MA, Han JDJ, Starita LM, Stevens KN, Muneesh Tewari ea. Network modeling links breast cancer susceptibility and centrosome dysfunction. Nat Genet. 2007:1338–1349.Google Scholar
 Musani SK, Shriner D, Liu N, Feng R, Coffey CS, Yi N, Tiwari HK, Allison DB. Detection of gene x gene interactions in genomewide association studies of human population data. Hum Hered. 2007:67–84.Google Scholar
 Holden M, Deng S, Wojnowski L, Kulle B. GSEASNP: applying gene set enrichment analysis to SNP data from genomewide association studies. Bioinformatics. 2008; 24(23):2784–785.View ArticlePubMedGoogle Scholar
 Wu MC, Lee S, Cai T, Li Y, Boehnke M, Lin X. Rarevariant association testing for sequencing data with the sequence kernel association test. Am J Hum Genet. 2011; 89(1):82–93.View ArticlePubMedPubMed CentralGoogle Scholar
 Braun R, Buetow K. Pathways of distinction analysis: a new technique for multiSNP analysis of GWAS data. PLoS Genet. 2011; 7(6):1002101.View ArticleGoogle Scholar
 Listgarten J, Lippert C, Kang EY, Xiang J, Kadie CM, Heckerman D. A powerful and efficient set test for genetic markers that handles confounders. Bioinformatics. 2013; 29(12):1526–1533.View ArticlePubMedPubMed CentralGoogle Scholar
 Chen X, Shi X, Xu X, Wang Z, Mills R, Lee C, Xu J. A twograph guided multitask lasso approach for eQTL mapping In: Lawrence ND, Girolami MA, editors. Proceedings of the fifteenth international conference on artificial intelligence and statistics (AISTATS12); vol. 22: 2012. p. 208–17. http://jmlr.csail.mit.edu/proceedings/papers/v22/chen12b/chen12b.pdf.
 Yang C, Wang L, Zhang S, Zhao H. Accounting for nongenetic factors by lowrank representation and sparse regression for eQTL mapping. Bioinformatics. 2013; 29(8):1026–34. doi:10.1093/bioinformatics/btt075. Epub 2013 Feb 17.View ArticlePubMedPubMed CentralGoogle Scholar
 Mazumder R, Hastie T, Tibshirani R. Spectral regularization algorithms for learning large incomplete matrices. JMLR. 2010; 11:2287–322.PubMedPubMed CentralGoogle Scholar
 Andrew G, Gao J. Scalable training of L1regularized loglinear models. In: International conference on machine learning: 2007. http://research.microsoft.com/apps/pubs/default.aspx?id=78900.
 Shabalin AA. Matrix eQTL: ultra fast eQTL analysis via large matrix operations. Bioinformatics. 2012; 28(10):1353–1358.View ArticlePubMedPubMed CentralGoogle Scholar
 Cheng W, Zhang X, Wu Y, Yin X, Li J, Heckerman D, Wang W. Inferring novel associations between SNP sets and gene sets in eQTL study using sparse graphical model. In: Proceedings of the ACM Conference on Bioinformatics, Computational Biology and Biomedicine. ACMBCB’12. New York, NY, USA: ACM: 2012. p. 466–73. http://doi.acm.org/10.1145/2382936.2382996.Google Scholar
 Cheng W, Zhang X, Guo Z, Shi Y, Wang W. Graphregularized dual lasso for robust eQTL mapping. Bioinformatics. 2014; 30(12):i139–i148. doi:10.1093/bioinformatics/btu293.View ArticlePubMedPubMed CentralGoogle Scholar
 Brem RB, Storey JD, Whittle J, Kruglyak L. Genetic interactions between polymorphisms that affect gene expression in yeast. Nature. 2005; 436:701–03. doi:10.1038/nature03865.View ArticlePubMedPubMed CentralGoogle Scholar
 Broman KW, Wu H, Sen S, Churchill GA. R/qtl: QTL mapping in experimental crosses. Bioinformatics. 2003; 19(7):889–90.View ArticlePubMedGoogle Scholar
 Listgarten J, Kadie C, Schadt EE, Heckerman D. Correction for hidden confounders in the genetic analysis of gene expression. Proc Natl Acad Sci U.S.A. 2010; 107(38):16465–16470.View ArticlePubMedPubMed CentralGoogle Scholar
 McClurg P, Janes J, Wu C, Delano DL, Walker JR, Batalov S, Takahashi JS, Shimomura K, Kohsaka A, Bass J, Wiltshire T, Su AI. Genomewide association analysis in diverse inbred mice: power and population structure. Genetics. 2007; 176(1):675–83.View ArticlePubMedPubMed CentralGoogle Scholar
 Yvert G, Brem RB, Whittle J, Akey JM, Foss E, Smith EN, Mackelprang R, Kruglyak L. Transacting regulatory variation in Saccharomyces cerevisiae and the role of transcription factors. Nat Genet. 2003; 35(1):57–64.View ArticlePubMedGoogle Scholar
 The Gene Ontology Consortium. Gene ontology: tool for the unification of biology. Nat Genet. 2000; 25(1):25–9.View ArticlePubMed CentralGoogle Scholar
 Huang d. a. W., Sherman BT, Lempicki RA. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat Protoc. 2009; 4(1):44–57.View ArticleGoogle Scholar
 Smith EN, Kruglyak L. Geneenvironment interaction in yeast gene expression. PLoS Biol. 2008; 6(4):e83. doi:10.1371/journal.pbio.0060083.View ArticlePubMedPubMed CentralGoogle Scholar
 Fusi N, Stegle O, Lawrence ND. Joint modelling of confounding factors and prominent genetic regulators provides increased accuracy in genetical genomics studies. PLoS Comput Biol. 2012; 8(1):1002330.View ArticleGoogle Scholar
 Joo JW, Sul JH, Han B, Ye C, Eskin E. Effectively identifying regulatory hotspots while capturing expression heterogeneity in gene expression studies. Genome Biol. 2014; 15(4):61.View ArticleGoogle Scholar