  • Methodology article
  • Open access

Penalized likelihood for sparse contingency tables with an application to full-length cDNA libraries



The joint analysis of several categorical variables is a common task in many areas of biology, and is becoming central to systems biology investigations whose goal is to identify potentially complex interactions among variables belonging to a network. Interactions of arbitrary complexity are traditionally modeled in statistics by log-linear models. It is challenging to extend these models to the high-dimensional and potentially sparse data arising in computational biology. An important example, which provides the motivation for this article, is the analysis of so-called full-length cDNA libraries of alternatively spliced genes, in which we investigate relationships among the presence of various exons in transcript species.


We develop methods to perform model selection and parameter estimation in log-linear models for the analysis of sparse contingency tables, to study the interaction of two or more factors. Maximum Likelihood estimation of log-linear model coefficients might not be appropriate because of the presence of zeros in the table's cells, and new methods are required. We propose a computationally efficient ℓ1-penalization approach extending the Lasso algorithm to this context, and compare it to other procedures in a simulation study. We then illustrate these algorithms on contingency tables arising from full-length cDNA libraries.


We propose regularization methods that can be used successfully to detect complex interaction patterns among categorical variables in a broad range of biological problems.


One of the most striking discoveries of the genomic era is the unexpectedly small number of genes in the human genome. This estimate has decreased from more than 100,000 [1] to roughly 20,000–25,000 [2, 3], tens of thousands fewer than initially expected and essentially the same number as found in phenotypically much simpler organisms. A question of overriding biological significance is how the complex phenotypes of higher organisms arise from such limited genomes. Part of the explanation may be that many genes undergo a process called alternative RNA splicing, which can generate many distinct proteins from a single gene.

RNA splicing is a post-transcriptional process that occurs prior to mRNA translation. After a gene has been transcribed into a pre-messenger RNA (pre-mRNA), the transcript consists of intronic regions destined to be removed during pre-mRNA processing (RNA splicing), as well as exonic sequences that are retained within the mature mRNA. The actual splicing process takes place after transcription; it determines which exons are retained in the mature message and which are targets for removal. In general, exons and introns are retained and deleted in different combinations to create a diverse array of mRNAs from a common coding sequence. This process is known as alternative RNA splicing. Depending on the source, the percentage of alternatively spliced genes lies between 35% and 60% [4–10]. By screening many full-length cDNAs it is possible to record the complete cDNA from a mature RNA of the same gene again and again, and a full-length cDNA library, also known as a single-gene library (SGL), builds up. The library contains detailed information about which exon combinations go together. This information is directly related to the functional regions of the proteins, because proteins are organized into domains, which in many cases are each encoded by a single exon. For example, a transcription factor consists of a DNA binding domain and a regulatory domain, so an alteration of the exon structure corresponds to an alteration in the function of the affected domain. The central premise is that a dependency among domains points to a functional association: if domains interact functionally, then their splicing should be co-regulated. This co-regulation has direct biological significance because it shows which variable components also interact in the expressed protein.
Because the polypeptide is intricately folded and tightly packed, segments that are separated by dozens of introns in the primary transcript may encode domains that interact functionally within the protein. These domains need not be structural neighbors even in the folded protein, but may interact through electrical or van der Waals forces, effects of global conformational changes, or even associations with other proteins. Because of these intricacies, there are no inherent distance restrictions, or limits on the number of interacting sites, and separate domains may combine their functional effect in unpredictable ways.

Due to the large number of potential combinations in highly alternatively spliced genes, any library will only comprise a small portion of the total theoretically possible inventory of combinations. Statistically, this leads to sparse contingency tables in which dimensions represent exons and cells represent variants. Investigating interactions among categorical variables when not all possible combinations are observed means addressing a model selection problem that is challenging both inferentially and computationally.

As far as alternative splicing is concerned, there is an important reason to determine this interaction structure: searching for intrapeptide interactions in functional assays is a very difficult, open-ended problem, and statistical analysis of the splicing interaction structure in the transcriptome can simplify this task enormously by identifying the sets of interacting domains. As more investigators become interested in this type of information, and large-scale single-gene libraries become available, there is a strong need for reliable statistical methods for analyzing the resulting datasets.

We develop several statistical methods to analyze sparse contingency tables in order to determine the underlying interaction pattern, and we use graphical models to visualize these patterns. The methods are compared in a simulation study and illustrated on full-length cDNA libraries.



General introduction to contingency tables and Log-linear Models

In this section we provide general definitions and notations.

Assume we have q categorical random variables or factors, C = {C_1,..., C_q}, where each C_j can take on a finite number g_j of possible values, called levels. The vector (c_1,..., c_q) represents a particular combination of levels of the joint random variable C = {C_1,..., C_q}. The total cardinality of C is

m = \prod_{j=1}^{q} g_j,

which corresponds to the m different combinations of levels (m = 2^q when all C_j are dichotomous, as in our splicing example).

We simplify the notation by mapping each configuration of C to a unique natural number i ∈ {1,..., m} with a (bijective) function f:

f: (c_1,..., c_q) ↔ i ∈ {1,..., m},

so we may write c_i = (c_1,..., c_q). For n observations of C, the corresponding q-way contingency table has m cells, each listing the frequency of a particular configuration c_i:

n_{c_1,\ldots,c_q} = n_i, \qquad \sum_{i=1}^{m} n_i = n.

A general introduction to contingency tables can be found in [11].
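As a concrete illustration, the bijection f and the cell counts n_i can be computed directly. The Python sketch below uses a hypothetical sample of q = 3 dichotomous factors; the enumeration order of the configurations is one arbitrary choice of f.

```python
import numpy as np
from itertools import product

# hypothetical sample: n = 6 observations of q = 3 dichotomous factors
obs = [(1, 1, -1), (1, 1, -1), (1, -1, -1), (-1, 1, 1), (1, 1, -1), (1, -1, -1)]

q = 3
configs = list(product([1, -1], repeat=q))   # the m = 2^q possible configurations
f = {c: i for i, c in enumerate(configs)}    # one choice of the bijection f

counts = np.zeros(len(configs))              # the contingency table, n_i per cell
for c in obs:
    counts[f[c]] += 1
```

Most of the 2^q = 8 cells stay at zero, which is exactly the sparsity discussed above.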

If the observations are independent, with p_i the probability of sampling configuration c_i, the distribution of the cell counts (n_1,..., n_m)^t is multinomial with probability vector p = (p_1,..., p_m).

In the splicing example, we may consider the C_j as dichotomous random variables representing q sites of alternative splicing, each with two levels, denoted by c_j ∈ {1, -1}, corresponding to the presence or absence of exon j in a transcript. The contingency table therefore has m = 2^q cells, with each cell represented by the q-dimensional binary vector c_i = (c_1,..., c_q). A log-linear model for the cell probabilities can be written the following way:

\log p_i = \beta_\emptyset + \sum_{l \in \{1,\ldots,q\}} \beta_l c_l + \sum_{\substack{j,k \in \{1,\ldots,q\} \\ j < k}} \beta_{jk} c_j c_k + \cdots + \beta_{12\ldots q}\, c_1 c_2 \cdots c_q. \qquad (1)

A general log-linear model represents p as:

\log(p) = X\beta, \qquad (2)

where β is a vector of unknown coefficients and X a suitable design matrix as indicated below. Assume that the cell probabilities can be expressed in the following way:

\log p_{c_1,\ldots,c_q} = \delta_\emptyset + \delta^{C_1}_{c_1} + \cdots + \delta^{C_q}_{c_q} + \delta^{C_1,C_2}_{c_1,c_2} + \cdots + \delta^{C_1,\ldots,C_q}_{c_1,\ldots,c_q}, \qquad (3)

where \delta_\emptyset is the global mean, \delta^{C_1}_{c_1} is the main effect of the first variable and depends only on the distribution of C_1. Similarly, \delta^{C_1,C_2}_{c_1,c_2} is the first order interaction between the first two variables, and its value depends only on the joint distribution of these two variables.

We now look for a suitable parametrization \tilde{X}^{C_i} of the vector spaces spanned by the main effects \delta^{C_i}, a parametrization \tilde{X}^{C_i,C_j} for the vector spaces spanned by the first order interactions \delta^{C_i,C_j}, and so on.
To ensure identifiability, we impose constraints on these matrices and denote the resulting matrices by X^{C_i}, X^{C_i,C_j} and so on. The design matrix X finally consists of these submatrices. The constitution of the design matrix X for factors with two levels can directly be derived from (1). The derivation of the design matrix X from (3) in the case of more than two levels per factor is basically an analysis of variance (ANOVA) parametrization with poly-contrasts. Details can be found in Additional file 1, Section 1.
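For dichotomous factors coded c_j ∈ {1, -1}, the saturated design matrix corresponding to (1) can be assembled column by column, one column per subset a ⊆ C holding the products of the levels indexed by a. The sketch below assumes this ±1 coding; the column ordering is an arbitrary choice.

```python
import numpy as np
from itertools import combinations, product

q = 3
cells = np.array(list(product([1, -1], repeat=q)))   # 2^q rows of levels (c_1,...,c_q)

# one column per subset a of {1,...,q}: the product of the levels indexed by a
# (the empty subset yields the intercept column of ones, an empty product)
cols = [np.prod(cells[:, list(a)], axis=1)
        for k in range(q + 1) for a in combinations(range(q), k)]
X = np.column_stack(cols)                            # 2^q x 2^q design matrix
```

With this coding the columns are mutually orthogonal, X^t X = 2^q I, which is the orthonormal-basis property (up to scaling) referred to in the discussion of reparametrization.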

Sometimes we may assume a smaller model without some of the interaction terms. It is of the form (2) with some columns removed from the design matrix X. We denote matrices of the form X^{C_{j_1},\ldots,C_{j_k}} by X_a, with a = \{C_{j_1},\ldots,C_{j_k}\} \subseteq C. The corresponding subvector of \beta is denoted by \beta_a.

Graphical Models

A powerful way to visualize conditional dependencies among variables is a graph. A graph G = (V, E) consists of a finite set V of vertices and a finite set E of edges between these vertices. In our context, the vertices correspond to the different discrete random variables. We form the so-called Conditional Independence Graph by connecting all pairs of vertices that appear in the same generator, that is, the maximal terms a ⊆ C which are present in the model. To translate a vector β into a graphical model, we look for β_a ≠ 0 with β_b = 0 for all b ⊃ a (where b is a strict super-set of a and |a| > 1), and we draw edges between all vertices corresponding to a.
From this graph we can directly read off all marginal and conditional independences by the global Markov property for undirected graphs, which states: if two sets of variables a and b are separated by a third set of variables c, then a and b are conditionally independent given c (a ⊥ b | c), where for three subsets a, b and c of V, we say c separates a and b if all paths from a to b intersect c. For details, see [12].
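This edge rule is easy to mechanize. The sketch below uses a hypothetical set of generators (the maximal active terms) and connects all pairs of vertices that occur together in one of them.

```python
from itertools import combinations

# hypothetical generators: the maximal sets a with beta_a != 0
generators = [{1, 2}, {2, 3, 4}, {5}]

# conditional independence graph: connect all pairs inside a generator
edges = set()
for a in generators:
    edges.update(frozenset(pair) for pair in combinations(sorted(a), 2))
```

In this hypothetical graph, every path from vertex 1 to vertex 3 passes through vertex 2, so by the global Markov property C_1 and C_3 are conditionally independent given C_2, while vertex 5 is isolated.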

Model selection – non-hierarchical versus hierarchical models

In the following subsections we introduce different model selection strategies for log-linear models. We first develop an ℓ1-regularization model selection approach, which is then expanded to the new so-called level-ℓ1-regularization approach. In addition, different Bayesian model selection strategies, which we use for comparisons, are explained in Additional file 1, Section 2. Hierarchical models are a subclass of models such that if an interaction term β_a is zero, then all higher order interaction terms β_b for b ⊃ a are also zero. In the example above with 2 levels, this means that if the first order interaction coefficient β_ij = 0, then all higher order interaction coefficients including i and j are also zero, i.e. β_ijk = 0 for all k. While it is possible that the true underlying interaction model is not hierarchical from a biological standpoint, a difficulty in the use of non-hierarchical models arises from the fact that they are not invariant under reparametrization. We have chosen the design matrix X with some constraints to ensure identifiability, and we used a specific basis, namely an orthonormal one. In terms of ANOVA, this choice is equivalent to choosing a poly-contrast. We could have imposed different constraints or chosen a different basis, and this would have resulted in a different design matrix X, or in terms of ANOVA, in a different choice of contrast. Suppose we have found an interaction vector β for one parametrization of the log-linear model and that this vector corresponds to a non-hierarchical model, meaning there is at least one lower order interaction term β_a equal to zero while β_b ≠ 0 for at least one b ⊃ a. If we reparametrize the model using a different design matrix, the coefficient for the model term a may no longer be zero. On the other hand, when reparametrizing a hierarchical model, all zero terms remain zero.
Therefore, hierarchy is preserved under reparametrization, while non-hierarchy depends on the parametrization. This is a distinct advantage of working within the hierarchical class: in a hierarchical model, all zero coefficients can be directly interpreted in terms of conditional independence, while this is not true for non-hierarchical models.

ℓ1-regularized model selection

The Lasso, originally proposed by [13] for linear regression, performs regularized parameter estimation and variable selection at the same time. The Lasso estimate is defined as follows:

\hat{\beta}^{\lambda} = \arg\min_{\beta} \left[ \sum_i (Y - X\beta)_i^2 + \lambda \sum_j |\beta_j| \right],

where Y = (Y_1,..., Y_n) is the response vector. This can also be viewed as a penalized maximum likelihood estimator, as \sum_i (Y - X\beta)_i^2 is proportional to the negative log-likelihood function for Gaussian linear regression. While the MLE for the general regression model is no longer uniquely defined and performs very poorly when there are more variables than observations, the Lasso estimator remains reasonable as long as λ > 0. For our analysis, we face a similar problem, namely that the MLE does not exist in the case of zero counts in the contingency table; a detailed description of the existence of the MLE in general log-linear interaction models is given in [14]. Inspired by the Lasso, we estimate our parameter vector β by the following expression:

\hat{\beta}^{\lambda} = \arg\min_{\beta} \left[ -l(\beta) + \lambda \sum_j |\beta_j| \right], \qquad (4)

where l(\beta) = \log \mathcal{L}_\beta[n] \propto \sum_{i=1}^{m} \frac{n_i}{n} (X\beta)_i is the (multinomial) log-likelihood function. This minimization has to be carried out under the additional constraint that the cell probabilities add to 1:

\sum_{i=1}^{m} \exp\{(X\beta)_i\} = 1. \qquad (5)

A problem of the optimization (4) is that the solution is no longer independent of the choice of the orthogonal subspaces X_a. That is, if any set of orthogonal columns X_a of X is reparametrized by a different orthogonal set, we get a different solution. To avoid this undesirable outcome, we use a penalty that is intermediate between the ℓ1- and the ℓ2-penalty. This penalty, called the group-ℓ1-penalty, has the following form:

\sum_{a \subseteq C} \|\beta_a\|_2, \quad \text{where } \|\beta_a\|_2^2 = \sum_j (\beta_a)_j^2,

Originally, this has been proposed by [15] for the linear regression problem with factor variables. The estimator of β then becomes

\hat{\beta}^{\lambda} = \arg\min_{\beta} \left[ -l(\beta) + \lambda \sum_{a \subseteq C,\; a \neq \emptyset} \|\beta_a\|_2 \right], \qquad (6)

subject to the constraint in (5). By imposing a penalty on the coefficients of the log-linear interaction terms, the overfitting that can occur with the MLE is reduced. Furthermore, while the ℓ1-penalty encourages sparse solutions for the single components of β, the group-ℓ1-penalty encourages sparsity at the interaction level, meaning that the vector β_a corresponding to the interaction term a is either present or absent in the model as a whole. In the case of factors with only 2 levels, the group-ℓ1-penalty and the ℓ1-penalty are equivalent.
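For a toy table with two dichotomous factors, this estimator can be sketched with a plain proximal-gradient (ISTA) iteration: absorb the normalization constraint into the objective via an added term \sum_i \exp((X\beta)_i) (the Lagrangian form obtained with Lagrange multipliers), take gradient steps on the smooth part, and soft-threshold all penalized coefficients, leaving the intercept unpenalized. This is only an illustrative sketch, not the solution-path algorithm of the Implementation section; the values of λ, the step size and the iteration count are ad hoc choices for this example.

```python
import numpy as np
from itertools import product

# saturated design for q = 2 dichotomous factors:
# columns = intercept, c_1, c_2, c_1 * c_2
cells = np.array(list(product([1, -1], repeat=2)))
X = np.column_stack([np.ones(4), cells[:, 0], cells[:, 1], cells[:, 0] * cells[:, 1]])

counts = np.array([5.0, 3.0, 2.0, 0.0])    # a sparse toy table with one empty cell
n = counts.sum()

lam, step = 0.1, 0.05
beta = np.zeros(4)
for _ in range(5000):                       # ISTA: gradient step + soft-thresholding
    mu = X @ beta
    grad = X.T @ (np.exp(mu) - counts / n)  # gradient of -l(beta) + sum_i exp(mu_i)
    beta -= step * grad
    # soft-threshold every coefficient except the unpenalized intercept (a = empty set)
    beta[1:] = np.sign(beta[1:]) * np.maximum(np.abs(beta[1:]) - step * lam, 0.0)

p = np.exp(X @ beta)                        # fitted cell probabilities
```

At the optimum the fitted probabilities satisfy the normalization constraint automatically (stationarity with respect to the unpenalized intercept enforces it), and all cell probabilities stay strictly positive despite the empty cell, which is exactly why the penalized estimate exists where the MLE does not.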

For both the ℓ1- and the group-ℓ1-regularization, the parameter λ can be assessed by cross-validation: we divide the individual counts into a number of equal parts and in turn leave out one part, the rest forming a training contingency table with cell counts n_train. The solution for an array of values of λ, the so-called solution path, is calculated according to an algorithm described in the following Implementation section. The corresponding vectors of cell probabilities are denoted by p(\hat{\beta}^{\lambda}). We then use the remainder of the cell counts, n_test, to calculate the predictive negative log-likelihood score

\frac{-\sum_{i=1}^{m} n_{test,i} \log\left(p_i(\hat{\beta}^{\lambda})\right)}{\sum_{i=1}^{m} n_{test,i}}, \qquad (7)

which is proportional to the out-of-sample negative log-likelihood. This score stays on the same scale when the number of observations varies, and may therefore be used to compare contingency tables of the same dimension but with different numbers of cell entries. The parameter λ is chosen as the value that minimizes the cross-validated score (7). We use ten-fold cross-validation in our example.
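Computing the score (7) itself is straightforward; the helper below is a hypothetical illustration, with the convention that cells with n_test,i = 0 contribute nothing to the sum.

```python
import numpy as np

def predictive_score(n_test, p_hat):
    """Predictive negative log-likelihood score (7), per test observation."""
    n_test = np.asarray(n_test, dtype=float)
    p_hat = np.asarray(p_hat, dtype=float)
    mask = n_test > 0                    # empty test cells contribute 0 to the sum
    return -np.sum(n_test[mask] * np.log(p_hat[mask])) / n_test.sum()
```

A fit placing more mass on the observed cells scores lower; for a uniform p over m = 4 cells the score is log 4 regardless of the test counts.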

The resulting model need not be hierarchical, and if we consider the hierarchical model induced by this procedure, the final model can be large, for example when a single high order interaction is estimated to be active. To address this, we set up the algorithm described in the next section.

Level-ℓ1-regularized model selection

In order to prevent the procedure from choosing single high-order interactions, we alter the ℓ1-regularized algorithm described in the previous section: we do not apply it exclusively to the fully saturated model but also to submodels with lower order interactions. More precisely, a model is fitted with main effects only, and the predictive negative log-likelihood score (7) is calculated for the best main effects model (level 1). The same is done for the model including all main effects and first order interactions (level 2). Proceeding accordingly, we obtain |C| log-likelihood scores corresponding to the |C| levels. The level with minimal score (7) is then chosen (and within this selected level, we have an ℓ1-regularized estimate).

With this procedure the tendency to include a single high-order interaction while most of its lower order interactions are absent is decreased; such an inclusion is only forced if the predictive negative log-likelihood score speaks strongly in its favour. We therefore tend to select sparser models which, in contrast to those from the ordinary ℓ1-model selection procedure, can be more easily hierarchized and interpreted in terms of conditional independence.

Algorithm for ℓ1-regularization for factors with two levels

For the regularization approaches we calculate \hat{\beta}^{\lambda} over a large number of values of λ in order to perform cross-validation using (7). For this purpose, an efficient algorithm is required. As one can easily verify by introducing Lagrange multipliers, finding the solution to (6) under the constraint (5) is equivalent to minimizing an unconstrained function g(β):

$$g(\beta) = -l(\beta) + \sum_{i=1}^{m} \exp(\mu_i) + \lambda \sum_{a \subseteq C,\; a \neq \emptyset} \|\beta_a\|_{\ell_2},$$

with μ = Xβ and l(β) ∝ Σ_i (n_i/n)(Xβ)_i. Here, g is a convex function. If each factor has two levels only, as in our application with single-gene libraries, we can set up an algorithm which efficiently yields the estimates for a whole sequence of parameters λ. Let 𝒜 denote the set of active interaction terms, which means that β_a ≠ 0 for a ∈ 𝒜; X_𝒜 is the corresponding sub-matrix of X, β_𝒜 the corresponding sub-vector of β, and g_𝒜 is g restricted to the subspace of β_𝒜.
We restrict ourselves to the currently active set 𝒜, on which ∇g_𝒜 and ∇²g_𝒜 are well-defined:

$$\nabla g_{\mathcal{A}}(\beta_{\mathcal{A}}, \lambda) = -X_{\mathcal{A}}^{t}\left(\frac{\mathbf{n}}{n} - \exp(X_{\mathcal{A}}\beta_{\mathcal{A}})\right) + \lambda\,(0, \operatorname{sign}(\beta_{\mathcal{A}}))^{t}$$
$$\nabla^{2} g_{\mathcal{A}}(\beta_{\mathcal{A}}, \lambda) = X_{\mathcal{A}}^{t}\,\operatorname{diag}\{\exp(X\beta)\}\,X_{\mathcal{A}}.$$
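In numpy, the restricted gradient and Hessian can be written directly from these formulas. This is a sketch, not the paper's implementation; it assumes `X_A` is the active sub-matrix, `n_vec` the vector of cell counts with total `n`, and that the first active coordinate is the unpenalized intercept.

```python
import numpy as np

def grad_hess_active(X_A, beta_A, n_vec, n, lam):
    """Gradient and Hessian of the penalized objective restricted to the
    active set; the intercept (first coordinate) carries no penalty."""
    mu = X_A @ beta_A
    resid = n_vec / n - np.exp(mu)
    penalty = lam * np.concatenate(([0.0], np.sign(beta_A[1:])))
    grad = -X_A.T @ resid + penalty
    hess = X_A.T @ (np.exp(mu)[:, None] * X_A)   # X^t diag(exp(mu)) X
    return grad, hess
```

The gradient can be checked against finite differences of the unconstrained objective g on the active coordinates.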

The algorithm, which is an adaptation of the path-following algorithm proposed by [16], is set up as follows:

  1. Start with β̂ = (−log(m), 0, ..., 0).

  2. Set λ_0 = 1, 𝒜 = ∅ and t = 0.

  3. While λ_t > λ_min:

    (3.1) λ_{t+1} = λ_t − ε

    (3.2) 𝒜 = 𝒜 ∪ { j ∉ 𝒜 : |[Xᵗ · (𝐧/n − exp(Xβ̂))]_j| > λ_{t+1} }

    (3.3) β̂ is updated as β̂_{t+1} = β̂_t − ∇²g_𝒜(β̂_t, λ_{t+1})⁻¹ · ∇g_𝒜(β̂_t, λ_{t+1})

    (3.4) 𝒜 = 𝒜 \ { j ∈ 𝒜 : |β̂_{t+1,j}| < δ }

    (3.5) t = t + 1

The pairs (β̂_t, λ_t) obtained from the algorithm above represent the estimates from (6) under the constraint (5) for a range of penalty parameters λ_t (λ_t = 1 − tε). The choice of the step length ε represents the tradeoff between computational complexity and accuracy. To increase accuracy, one can perform more than one Newton step in (3.3) if the gradient starts deviating from zero. The coefficient δ is also flexible; typically it is chosen in the order of ε. The lowest λ for which one wants the solution to be calculated is denoted by λ_min. Technical details concerning the algorithm can be found in the Appendix.
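Steps (1)-(3.5) can be sketched compactly in numpy. This is an illustrative sketch only; the actual implementation is the R package logilasso mentioned below, which may differ in detail. Here `X` is the full ±1-coded design matrix with the intercept in column 0, and `eps`, `delta` and `lam_min` play the roles of ε, δ and λ_min.

```python
import numpy as np

def l1_path(X, n_vec, eps=0.01, delta=0.01, lam_min=0.05):
    """Path-following l1 estimation for a log-linear model with two-level
    factors; returns a list of (lambda_t, beta_t) pairs."""
    m, p = X.shape
    n = n_vec.sum()
    beta = np.zeros(p)
    beta[0] = -np.log(m)                           # step (1): uniform start
    active = np.array([True] + [False] * (p - 1))  # intercept always active
    lam = 1.0                                      # step (2)
    path = []
    while lam > lam_min:                           # step (3)
        lam -= eps                                 # (3.1)
        score = np.abs(X.T @ (n_vec / n - np.exp(X @ beta)))
        active |= score > lam                      # (3.2): add strong terms
        A = np.flatnonzero(active)
        XA, bA = X[:, A], beta[A]
        mu = XA @ bA
        pen = lam * np.sign(bA)
        pen[0] = 0.0                               # intercept unpenalized
        grad = -XA.T @ (n_vec / n - np.exp(mu)) + pen
        hess = XA.T @ (np.exp(mu)[:, None] * XA)
        beta[A] = bA - np.linalg.solve(hess, grad)  # (3.3): one Newton step
        small = np.abs(beta) < delta               # (3.4): drop tiny coefs
        small[0] = False
        beta[active & small] = 0.0
        active &= ~small
        path.append((lam, beta.copy()))            # (3.5): next grid point
    return path
```

As noted above, performing additional Newton steps in (3.3) trades computation for accuracy; the single step shown here mirrors the basic version of the algorithm.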



Simulation study

We choose a true underlying interaction vector β for five factors with two levels each. Enumerating the factors from 1 to 5, the generators of the model are 345 + 235 + 234 + 135 + 123 + 14, which means that all third- and fourth-order interactions are absent, only five of the ten second-order interactions are present, and all first-order interactions are present. The corresponding coefficients of β are independently simulated from a normal distribution with mean zero and variance one.

Then, 250 draws from a multinomial distribution with probability vector p, where log(p) = Xβ, are taken. This corresponds to a reasonable number of cDNAs in a single-gene library. This is repeated 10 times. With our choice of β, the resulting contingency tables are sparse. With the simulated cell counts, β̂ is estimated with the different methods described in the previous sections, and these methods are then compared as follows:
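Generating one such simulated table takes only a few lines. The sketch below builds the full ±1-coded design matrix for five two-level factors and draws one library of 250 cDNAs; the generator structure (all main effects and two-way terms, five of the ten three-way terms) is hard-coded from the model 345 + 235 + 234 + 135 + 123 + 14 above, with factor labels shifted to 0-based. The seed is arbitrary.

```python
import itertools
import numpy as np

def design_matrix(q):
    """Full +/-1 design matrix for q two-level factors: one row per cell,
    one column per subset of factors (the empty subset is the intercept)."""
    cells = np.array(list(itertools.product([1, -1], repeat=q)))
    subsets = [s for k in range(q + 1)
               for s in itertools.combinations(range(q), k)]
    X = np.column_stack([cells[:, list(s)].prod(axis=1) for s in subsets])
    return X, subsets

rng = np.random.default_rng(0)
X, subsets = design_matrix(5)
# Active three-way terms 345, 235, 234, 135, 123 (0-based labels):
triples = {(2, 3, 4), (1, 2, 4), (1, 2, 3), (0, 2, 4), (0, 1, 2)}
beta = np.zeros(X.shape[1])
for j, s in enumerate(subsets):
    if 1 <= len(s) <= 2 or s in triples:
        beta[j] = rng.normal()              # N(0, 1) coefficients
p = np.exp(X @ beta)
p /= p.sum()                                # normalization via the intercept
counts = rng.multinomial(250, p)            # one simulated library of 250 cDNAs
```

Repeating the last line (with fresh draws of β where desired) yields the 10 replicate tables used in the comparison.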


As a model selection score (MSS), we consider the fraction of correctly assigned model terms:

$$\mathrm{MSS} = 1 - \frac{1}{m} \sum_{i=1}^{m} \left| 1_{\{\beta_i \neq 0\}} - 1_{\{\hat{\beta}_i \neq 0\}} \right|.$$

Moreover, we consider the root mean squared error for the interaction coefficients,

$$\mathrm{RMSE} = \sqrt{\frac{1}{m} \sum_{i=1}^{m} \left(\hat{\beta}_i - \beta_i\right)^2}.$$

To assess how much the estimation of β varies over multiple datasets, we calculate for every coefficient β̂_i the estimated standard deviation σ̂_i. The means of these standard deviations are reported as

$$\mathrm{SPREAD} = \frac{1}{m} \sum_{i=1}^{m} \hat{\sigma}_i,$$

a measure of variability.

To compare the different procedures for estimation of probabilities p = exp (X β), we calculate the negative log-likelihood score (NLS) similar to the score in (7):

$$\mathrm{NLS}(\hat{\beta}) = -\sum_{i=1}^{m} p_i \log\left(p_i(\hat{\beta})\right).$$
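The three scores translate directly into code. A minimal sketch, assuming `beta_hat` and `beta` are the estimated and true coefficient vectors, and `p_hat` and `p` the corresponding cell probabilities:

```python
import numpy as np

def mss(beta_hat, beta):
    """Fraction of correctly assigned model terms (active vs. absent)."""
    return 1 - np.mean((beta != 0) != (beta_hat != 0))

def rmse(beta_hat, beta):
    """Root mean squared error of the interaction coefficients."""
    return np.sqrt(np.mean((beta_hat - beta) ** 2))

def nls(p_hat, p):
    """Negative log-likelihood score under the true probabilities p."""
    return -np.sum(p * np.log(p_hat))
```

SPREAD is then simply the mean of the coefficient-wise standard deviations over the 10 replicate estimates.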

Results of simulation study

The results of the simulation study are summarized in Table 1, where we also include the MAP estimators of the Bayesian approaches described in Additional file 1, Section 2. We notice that the penalty-based regularization approaches proposed in this article lead to comparable or better results than the Bayesian approaches with respect to the NLS score, RMSE and variation (SPREAD), though the results of the Bayesian approaches vary with the prior, and the set of possible priors has not been extensively explored.

Table 1 Performance of different algorithms

The level-ℓ1-regularization and the relaxed ℓ1-regularization (see below) are both competitive and can be better than MCMC for model selection.

The results of the MCMC procedures are sensitive to the choice of the prior value or the prior distribution for σ². A flat prior for α_a (σ² = 2) results in worse performance than a prior that shrinks the coefficients more strongly towards zero (σ² = 1/2). This suggests that specification of this prior hyperparameter may be difficult in practice, while we can easily optimize λ in the regularization approach by cross-validation.

The MCMC approaches without model selection perform poorly, as should be expected from data generated by a sparse model. MCMC methods based on a non-hierarchical model selection are also clearly inferior to the hierarchical counterpart. This is not surprising, as we have simulated data from a hierarchical model. In Table 1 we have also added an additional approach, denoted by ℓ2, the equivalent to the ℓ1-regularization but using an ℓ2-penalty instead of an ℓ1-penalty on the coefficients of the log-linear model. This method is equivalent to the MAP estimator with Gaussian priors on β a , with the parameter of the distribution optimized by cross-validation. This Ridge-type method does not perform variable selection, but it is competitive for all other criteria that we assessed.

In addition, we consider the relaxed ℓ1-regularization approach. Rather than using a single penalty parameter λ, the idea of this method is to control variable selection and parameter estimation separately by incorporating two penalty parameters. For linear regression it has been shown, theoretically as well as empirically [17], that under suitable conditions the relaxed ℓ1-regularization is better than the Lasso.

Overall, the level-ℓ1-regularization has good model selection performance (high MSS score) in combination with a low negative log-likelihood score (NLS) and a low mean squared error for the true β (RMSE). In addition, it is feasible to optimize the tuning parameter λ by cross-validation, as the computational cost is very low compared to the MCMC approaches. On the other hand, posterior distributions of estimates from MCMC methods provide additional information about uncertainty in the model space, compared to point estimates from ℓ1- or ℓ2-regularization.



We estimate the splicing interaction pattern for a dataset corresponding to the itpr1 gene, one of three mammalian genes encoding receptors for the second messenger inositol 1,4,5-trisphosphate (InsP3). This gene is subject to alternative RNA splicing, with seven sites of transcript variation, six of these within the ORF, and among these, q = 5 were completely assessed in the single-gene libraries. Five single-gene libraries were built, one for adult rat cerebrum and four for different stages of postnatal cerebellar development, namely on days 6, 12, 22 and 90, the latter being considered adult. Each library consists of between 179 and 277 assessed transcripts, i.e. Σ_{j=1}^{m} n_j ∈ [179, 277]. This gene is 89% identical at the cDNA level and 95% identical at the amino acid level to the human receptor gene. The complete dataset can be found in [18].

Results of application to Single-Gene Libraries

Unless stated otherwise, we report the results using the level-ℓ1-penalization method. We display the interaction vector β̂ graphically by plotting the components β̂_j for the different tissues and developmental stages in Figure 1. Our results suggest that the exons interact mainly in pairs; there is no reliably estimated higher-order interaction in the splicing interaction pattern of rat cerebellum. We further notice that the main interaction pattern is very well conserved over different developmental stages. A strong mutual interaction between exons three, four and five can be observed in all developmental stages of rat cerebellum as well as in the cerebral tissue. The biggest changes in the interaction pattern during development of rat cerebellum occur from postnatal day 6 to day 12; this can be seen at position 10 on the x-axis in Figure 1, which corresponds to the first-order interaction between exons two and three. The first main effect decreases progressively from day 6 to adult, reversing in sign between day 12 and 22. Between day 22 and day 90, the interaction pattern is strongly conserved.
Comparing the splicing interaction patterns between cerebellum and cerebrum in the adult rat, we see a much more complex pattern in the cerebrum, involving several second order interactions, and therefore a clear distinction from that of the cerebellum.

Figure 1

Graphical display of interaction vector. The upper panel shows the estimated splicing interaction vectors β̂ of rat cerebellum tissue at postnatal days 6, 12 and 22. The lower panel shows the splicing interaction vector β̂ of rat cerebellum tissue at the age of 90 days, which is considered adult, as well as the splicing interaction vector β̂ of rat cerebral tissue at the age of 90 days. Within an interaction degree, the sequence of coefficients is ordered from left to right, e.g. for second-order interactions as 123, 124, 125, ..., 345, where 1, ..., 5 represent exons 12, 23B, 40, 41 and 42 in the itpr1 gene, as described in [18].

The conditional independence graphs for the estimated log-linear models are drawn in Figure 2, where the thickness of each edge is proportional to the corresponding coefficient of the interaction vector β̂ (the largest, if several coefficients give rise to the same edge) and the radius of each vertex is proportional to the corresponding main effect coefficient. Figure 2 graphically highlights the strongly conserved interactions between exons three, four and five. Except for a rather strong interaction between exons two and three on day 6, all other interactions appear rather small. The graphical representation of the interaction pattern of adult rat cerebrum reveals a more complex interaction pattern with no conditional independences.

Figure 2

Conditional Independence Graphs. Conditional independence graphs for the estimated log-linear models for the itpr1 gene. For each graph, the predictive probability score (7) is reported as a goodness of fit measure. Note the strong mutual interaction between exons three, four and five.

The approaches and results presented here can provide valuable insight into the underlying processes of alternative splicing in general, and specifically in the brain development experiments considered here. Most striking is the strong conservation over the developmental stages at days 12, 22 and 90 (adult); some differences appear between postnatal day 6 and day 12. Also, the conservation between cerebellum and cerebrum is less pronounced than that over developmental stages. Finally, second- or higher-order interaction terms seem to be of minor relevance, suggesting that in this gene/tissue combination, direct interaction mainly happens between pairs of exons, but not among combinations of three or more exons.

We have also estimated β with the hierarchical Bayesian approach using MCMC. For the choice σ² = 1, this resulted in interaction patterns very similar to those of the level-ℓ1-penalization method; for σ² = 2 it led to remarkably different results. In addition, a further dataset was analyzed; the details can be found in Additional file 1, Section 3.


We have developed an efficient method for identifying interaction patterns of categorical variables. This can be used to fit a graphical model, a valuable tool to visualize the conditional dependence structure among the random variables. In a simulation study, the results of the new level-ℓ1-regularization method are superior to those of plain ℓ1-regularization and slightly better than the MAP estimator from some of the MCMC methods we considered. With real data, the level-ℓ1-regularization and the hierarchical Bayesian approach led to similar results, subject to a specific choice of priors for the Bayesian method. An important computational advantage of the level-ℓ1-method over MCMC is that cross-validation becomes feasible, which in turn allows for an empirical choice of the tuning parameter. While the methodology described in this article is motivated by the study of exon splicing interactions in single-gene transcriptomes, it provides a general and flexible toolbox for regularization analysis in relatively high dimensional, sparse contingency tables. Model selection in high dimensional contingency tables has been a traditionally challenging area, and we hope that our generalization of regularization methodologies to this context will prove useful in a variety of areas of computational biology and biostatistics. Several technologies generate categorical data: these include SNP chips that provide genotype and copy number information at the DNA level, sequencing technologies, assays that study binding properties of proteins and binding of RNA to DNA, a variety of disease phenotypes, and more. In most of these contexts the interactions among the variables are critical features in systems biology investigations that aim at studying how the components of complex systems work together in influencing biological outcomes.
For example, the log-linear models described here provide a natural approach for fitting very general classes of networks to discrete data. The level-ℓ1-regularization is a general tool which can be applied to a wide variety of problems involving sparse contingency tables.

An R package called logilasso will be available for download on the Comprehensive R Archive Network (CRAN).


Appendix

We note that if β is a minimum of g, then β_𝒜 is a minimum of g_𝒜.

In our application with single-gene libraries, all factors have two levels only, which allows us to construct an efficient algorithm. Since the gradient is

$$\nabla\left[-l(\beta) + \sum_{j=1}^{m} \exp(\mu_j)\right] = -X^{t}\left(\frac{\mathbf{n}}{n} - \exp(X\beta)\right),$$

where exp(Xβ) is understood as the componentwise exponential function, it follows that for a minimum β_𝒜 of g_𝒜 the following equation holds:

$$\nabla g_{\mathcal{A}}(\beta_{\mathcal{A}}) = -X_{\mathcal{A}}^{t}\left(\frac{\mathbf{n}}{n} - \exp(X_{\mathcal{A}}\beta)\right) + \left(0, \operatorname{sign}(\beta_{\mathcal{A}})\right)^{t}\lambda = 0 \qquad (9)$$
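The stationarity condition above can be evaluated numerically. The following sketch is our own illustration, not the authors' implementation: the function name `penalized_gradient`, the 2 × 2 example table, and its ±1 contrast coding are all chosen here for demonstration.

```python
import numpy as np

def penalized_gradient(X, counts, beta, lam):
    """Gradient of -l(beta) + sum_j exp(mu_j) + lam * sum_{j>0} |beta_j|
    at a point where every penalized coefficient is nonzero.
    Assumes the first column of X is the (unpenalized) intercept."""
    n = counts.sum()
    resid = counts / n - np.exp(X @ beta)   # n/n - exp(X beta)
    grad = -X.T @ resid                     # -X^t (n/n - exp(X beta))
    penalty = lam * np.sign(beta)
    penalty[0] = 0.0                        # intercept is not penalized
    return grad + penalty

# 2x2 table in +/-1 coding: columns = intercept, A, B, A:B
X = np.array([[1, 1, 1, 1],
              [1, 1, -1, -1],
              [1, -1, 1, -1],
              [1, -1, -1, 1]], dtype=float)
counts = np.array([10.0, 20.0, 30.0, 40.0])
# beta reproducing the observed proportions exactly (X is invertible here),
# so the unpenalized part of the gradient vanishes
beta_sat = np.linalg.solve(X, np.log(counts / counts.sum()))
g = penalized_gradient(X, counts, beta_sat, lam=0.5)
```

At such a saturated fit the residual term is zero, so the gradient reduces to the penalty subgradient: zero for the intercept and ±λ for the interaction terms.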

Without loss of generality, we can restrict ourselves to the subspace $\beta \in \mathbb{R}_{-} \times \mathbb{R}^{m-1}$, because the constraint (5) can only be satisfied for an intercept $\beta_{\emptyset} < 0$, as proved in Lemma 1 below. Therefore the intercept always belongs to the active set $\mathcal{A}$.

Lemma 1. $\beta_{\emptyset} < 0$ for a minimum of $g(\beta)$, for all $\lambda \in \mathbb{R}_{+}$.


Proof. Since all cell probabilities satisfy $p_j < 1$, we have $\log(p) = X\beta < 0$ componentwise, which yields $(1,\ldots,1)X\beta = m\beta_{\emptyset} < 0$; this implies $\beta_{\emptyset} < 0$.

This holds because $(1,\ldots,1)$ is orthogonal to all columns of $X$ except for the first one.   □

Additionally, for $\beta$ to be a minimum, a necessary condition is:

$$\left|\left(X^{t}\left(\frac{\mathbf{n}}{n} - \exp(X\beta)\right)\right)_{j}\right| < \lambda, \qquad \forall j \notin \mathcal{A}. \qquad (10)$$

Conditions (9) and (10) are sufficient for $\beta$ to be a minimum of (8). To find the $\beta$'s that solve these equations for a range of values of $\lambda$, we set up a so-called path-following algorithm. The idea is to start from an optimal solution $\beta^{\lambda_0}$ for $\lambda_0$ and to follow the path for decreasing $\lambda$, using a second-order approximation for $\beta_{\mathcal{A}}$. In the following, we restrict ourselves to the currently active set $\mathcal{A}$, omitting the index $\mathcal{A}$. It then holds:

$$\nabla g(\beta_{t+1}, \lambda_{t+1}) = 0 \approx \nabla g(\beta_t, \lambda_{t+1}) + \nabla^{2} g(\beta_t, \lambda_{t+1})\,\delta\beta. \qquad (11)$$

This implies $\delta\beta = -\nabla^{2} g(\beta_t, \lambda_{t+1})^{-1}\, \nabla g(\beta_t, \lambda_{t+1})$.

The algorithm tries to follow the optimal path as closely as possible. At each step, it aims to meet conditions (9) and (10). In step (3.2), the active set $\mathcal{A}$ is identified, which forces $\hat{\beta}$ to meet condition (10). In step (3.3), a Newton step as described in (11) is performed. Starting from a solution which meets condition (9), the new $\hat{\beta}^{\lambda}$ approximately meets (9) again.
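The active-set and Newton ingredients can be sketched as follows. This is a simplified illustration rather than the authors' implementation: it walks a fixed decreasing $\lambda$ grid with a few plain Newton steps (11) per grid point, folding the active-set update of condition (10) into the same loop; the function name `l1_loglin_path` and all variable names are our own.

```python
import numpy as np

def l1_loglin_path(X, counts, lambdas, newton_steps=5, drop_tol=1e-8):
    """Simplified path follower for the l1-penalized log-linear model.
    For each lambda on a decreasing grid: grow the active set where the
    unpenalized gradient reaches lambda (condition (10)), then take Newton
    steps (11) on the active coefficients. The intercept (column 0) is
    always active and unpenalized (Lemma 1)."""
    n = counts.sum()
    m, p = X.shape
    beta = np.zeros(p)
    beta[0] = -np.log(m)              # start at the uniform table
    active = {0}
    path = []
    for lam in lambdas:
        for _ in range(newton_steps):
            resid = counts / n - np.exp(X @ beta)
            grad_full = -X.T @ resid
            active |= {j for j in range(1, p) if abs(grad_full[j]) >= lam}
            A = sorted(active)        # A[0] == 0: the intercept
            XA = X[:, A]
            g = grad_full[A] + lam * np.sign(beta[A])
            g[0] = grad_full[0]       # no penalty on the intercept
            W = np.diag(np.exp(XA @ beta[A]))
            H = XA.T @ W @ XA         # Hessian of -l + sum exp on A
            beta[A] -= np.linalg.solve(H, g)
            for j in list(active - {0}):
                if abs(beta[j]) < drop_tol:   # coefficient left the path
                    beta[j] = 0.0
                    active.discard(j)
        path.append((lam, beta.copy()))
    return path

# demo: 2x2 table with a strong interaction between the two binary factors
X = np.array([[1, 1, 1, 1],
              [1, 1, -1, -1],
              [1, -1, 1, -1],
              [1, -1, -1, 1]], dtype=float)
counts = np.array([40.0, 10.0, 10.0, 40.0])
path = l1_loglin_path(X, counts, lambdas=[10.0, 1e-4])
```

With a heavy penalty only the intercept survives; as $\lambda \to 0$ the fitted cell probabilities $\exp(X\beta)$ approach the observed proportions.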


References

  1. Liang F, Holt I, Pertea G, Karamycheva S, Salzberg S, Quackenbush J: Gene index analysis of the human genome estimates approximately 120000 genes. Nature Genetics. 2000, 25: 239-240. 10.1038/76126.


  2. International Human Genome Sequencing Consortium: Finishing the euchromatic sequence of the human genome. Nature. 2004, 431: 931-945. 10.1038/nature03001.


  3. Southan C: Has the yo-yo stopped? An assessment of human protein-coding gene number. Proteomics. 2004, 4: 1712-1726. 10.1002/pmic.200300700.


  4. Mironov A, Fickett J, Gelfand M: Frequent alternative splicing of human genes. Genome Research. 1999, 9: 1288-1293. 10.1101/gr.9.12.1288.


  5. Brett D, Hanke J, Lehmann G, Haase S, Delbruck S, Krueger SR, J Bork P: EST comparison indicates 38% of human mRNAs contain possible alternative splice forms. FEBS Lett. 2000, 474: 83-86. 10.1016/S0014-5793(00)01581-7.


  6. International Human Genome Sequencing Consortium: Initial sequencing and analysis of the human genome. Nature. 2001, 409: 860-921. 10.1038/35057062.


  7. Brett D, Pospisil H, Valcarcel J, Reich J, Bork P: Alternative splicing and genome complexity. Nature Genetics. 2002, 30: 29-30. 10.1038/ng803.


  8. The FANTOM Consortium: The transcriptional landscape of the mammalian genome. Science. 2005, 309 (5740): 1559-1563. 10.1126/science.1112014.


  9. Zavolan M, van Nimwegen E, Gaasterland T: Splice variation in mouse full-length cDNAs identified by mapping to the mouse genome. Genome Research. 2003, 12: 1377-1385. 10.1101/gr.191702.


  10. Imanishi T, Itoh T, Suzuki Y, O'Donovan C, Fukuchi S, Koyanagi M, Barrero M, Tamura T, Yamaguchi-Kabata Y, Tanino M: Integrative annotation of 21037 human genes validated by full-length cDNA clones. PLoS Biology. 2004, 2: 1-20. 10.1371/journal.pbio.0020162.


  11. Everitt BS: The Analysis of Contingency Tables. Monographs on Statistics and Applied Probability 45. 1992, Chapman and Hall, 2nd edition.


  12. Lauritzen SL: Graphical Models. Oxford Statistical Science Series 17. 1996, Oxford: Clarendon Press.


  13. Tibshirani R: Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B. 1996, 58: 267-288.


  14. Christensen R: Linear Models for Multivariate, Time Series, and Spatial Data. 1991, Springer-Verlag


  15. Yuan M, Lin Y: Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society, Series B. 2006, 68: 49-67. 10.1111/j.1467-9868.2005.00532.x.


  16. Rosset S: Following Curved Regularized Optimization Solution Paths. Advances in Neural Information Processing Systems. Edited by: Saul LK, Weiss Y, Bottou L. 2005, Cambridge, MA: MIT Press, 17: 1153-1160.


  17. Meinshausen N: Lasso with relaxation. Computational Statistics & Data Analysis.

  18. Regan MR, Lin DDM, Emerick MC, Agnew WS: The effect of higher order RNA processes on changing patterns of protein domain selection: A developmentally regulated transcriptome of type 1 inositol 1,4,5-trisphosphate receptor. Proteins: Structure, Function and Bioinformatics. 2005, 59: 312-331. 10.1002/prot.20225.




CD was partially supported by the Swiss National Science Foundation grant number NF 200020-113270 and by a PhD scholarship from the CC-SPMD. GP was partly supported by NSF grant DMS034211.

Author information



Corresponding author

Correspondence to Corinne Dahinden.

Additional information

Authors' contributions

CD derived the mathematical details, implemented and tested the algorithm. GP initiated the project, suggested ideas and edited the manuscript. ME provided the datasets and the biological interpretation. PB supervised the project and suggested some of the main ideas. All authors read and approved the final manuscript.

Electronic supplementary material


Additional file 1: The Additional file consists of 3 sections. Section 1 contains details concerning the parametrization of the log-linear model. Section 2 describes some Bayesian model selection approaches, which were used for comparison with our algorithm. Section 3 introduces a further dataset on which we tested our algorithm and gives the results for that dataset. (PDF 190 KB)


Rights and permissions

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


About this article

Cite this article

Dahinden, C., Parmigiani, G., Emerick, M.C. et al. Penalized likelihood for sparse contingency tables with an application to full-length cDNA libraries. BMC Bioinformatics 8, 476 (2007).
