Fig. 11 | BMC Bioinformatics

Fig. 11

From: GMMchi: gene expression clustering using Gaussian mixture modeling

GMMchi on gene expression data. As before, in both cases the pink columns indicate clearly negative expression, while the yellow columns indicate at most very low-level expression. The dotted red lines mark the intersection between the two fitted normal curves (a) or between the fitted single normal distribution and the tail (b), and so give the best estimate of where to separate low, probably negative, expressing cell lines from those that are clearly positive. The expression distribution in (a) is for the gene EPHB3, encoding the protein Ephrin Receptor B3. In this example, GMMchi estimates the cutoff between two mixture components that are adequately normal, as shown by the Q–Q plot, in which the data lie along the 45-degree line. Panel (b) is a good example of the tail-trimming process: the first histogram shows the expression distribution for CDH1, encoding the epithelial membrane protein E-Cadherin, prior to tail trimming. The first Q–Q plot reveals data points deviating from the 45-degree line, suggesting an inadequate fit. The second histogram shows the tail identification step, where the grey shading indicates the potential non-normal tail. The third histogram is the result of iterative tail trimming, and the last plot is the Q–Q plot of the resulting fit, indicating an adequately fitted normal component with a non-normal tail.
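
The cutoff in panel (a) is described as the point where the two fitted normal curves intersect. As a minimal sketch of that idea (not GMMchi's own code), the following finds where two weighted normal densities are equal by solving the quadratic obtained from equating their logarithms. The function names `normal_pdf` and `mixture_cutoff`, and the example parameters, are hypothetical and chosen only for illustration.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and sd."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def mixture_cutoff(w1, m1, s1, w2, m2, s2):
    """Return the point between m1 and m2 where
    w1 * N(x; m1, s1) == w2 * N(x; m2, s2), or None if there is none.

    Equating the log densities gives a*x^2 + b*x + c = 0.
    """
    a = 1.0 / (2.0 * s2**2) - 1.0 / (2.0 * s1**2)
    b = m1 / s1**2 - m2 / s2**2
    c = (m2**2 / (2.0 * s2**2) - m1**2 / (2.0 * s1**2)
         + math.log((w1 * s2) / (w2 * s1)))

    if abs(a) < 1e-12:
        # Equal variances: the equation is linear in x.
        if abs(b) < 1e-12:
            return None  # identical means as well, no single crossing
        roots = [-c / b]
    else:
        disc = b * b - 4.0 * a * c
        if disc < 0:
            return None  # the weighted curves never cross
        sq = math.sqrt(disc)
        roots = [(-b + sq) / (2.0 * a), (-b - sq) / (2.0 * a)]

    lo, hi = min(m1, m2), max(m1, m2)
    between = [r for r in roots if lo <= r <= hi]
    return between[0] if between else None


# Illustrative (made-up) components: a "negative" and a "positive" population.
cutoff = mixture_cutoff(0.6, 0.0, 1.0, 0.4, 5.0, 2.0)
print(cutoff)
```

With unequal variances the quadratic can have two roots; this sketch keeps only the one lying between the two component means, which is the crossing relevant for separating the two populations.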
