Learning Sparse Representations for Fruit-Fly Gene Expression Pattern Image Annotation and Retrieval



Fruit fly embryogenesis is one of the best understood animal development systems, and the spatiotemporal gene expression dynamics in this process are captured by digital images. Analysis of these high-throughput images will provide novel insights into the functions, interactions, and networks of animal genes governing development. To facilitate comparative analysis, web-based interfaces have been developed to conduct image retrieval based on body part keywords and images. Currently, the keyword annotation of spatiotemporal gene expression patterns is conducted manually. However, this manual practice does not scale with the continuously expanding collection of images. In addition, existing image retrieval systems based on the expression patterns may be made more accurate using keywords.


In this article, we adapt advanced data mining and computer vision techniques to address the key challenges in annotating and retrieving fruit fly gene expression pattern images. To boost the performance of image annotation and retrieval, we propose representations integrating spatial information and sparse features, overcoming the limitations of prior schemes.


We perform systematic experimental studies to evaluate the proposed schemes in comparison with current methods. Experimental results indicate that the integration of spatial information and sparse features leads to consistent performance improvement in image annotation, while for the task of retrieval, sparse features alone yield better results.


Embryos undergo a temporally ordered differentiation process, starting as basic undifferentiated eggs. Through the process of differentiation, gene expression takes on increasingly complex patterns. Transcriptional regulation in the fruit fly Drosophila melanogaster is one of the best understood examples of the regulatory networks that govern gene expression patterning. An understanding of the regulatory networks responsible for gene patterning in Drosophila embryos has been aided by digital images produced via in situ hybridization [13]. These images document the spatiotemporal dynamics of differentiation found in Drosophila embryos. A comparative analysis of these images is beneficial for the understanding of functions and interactions in gene networks [4–14]. To facilitate these discoveries, tools have been developed to search for images based on keywords that describe embryonic structures [15], and to search for images based on gene expression patterns [13, 14]. Images for these tools have been obtained from databases of Drosophila embryonic images, e.g. the Berkeley Drosophila Genome Project (BDGP), and they are annotated with a controlled vocabulary (CV) [1, 2] (Figure 1). The CV terms describe the developmental and anatomical properties of gene expression during embryogenesis [1]. Currently, groups of BDGP images are manually annotated with CV terms. Annotation is done collectively, so an individual image in a group does not necessarily correspond to every CV term assigned to that group. The manual nature of these tasks puts an inordinate burden on biologists, as the collection of Drosophila gene expression patterns is growing rapidly [1]. It is therefore imperative to investigate efficient and effective computational methods to automate this task [16–18].

Figure 1

Sample image groups (all images within a group are from the same stage range and the same gene) and the associated terms from BDGP for the gene engrailed in stage ranges 7-8 and 9-10.

Image annotation and image retrieval problems have been studied extensively in computer vision and machine learning. However, natural images are the most common subjects of study for these problems, and commonly used annotation and retrieval techniques may not be effective for our task. For example, unlike most natural images, BDGP images have all been aligned and scaled semi-automatically. The binary feature vector (BFV) representation has been developed to correlate pattern similarities between images [13]; however, the BFV representation is not robust to distortions. Other studies have used robust descriptors to represent the BDGP images [19–22], but they have not exploited spatial information. It is desirable to represent images in a way that takes advantage of the spatial properties of image features, while at the same time being robust to image distortions. In our annotation problem, we are interested in collectively annotating groups of images, with each group annotated with multiple CV terms. Previous studies have revealed that ignoring group memberships can be detrimental to annotation performance [19], and that formulating the task as learning the function between local input patterns and CV terms leads to significant performance improvement [21].

In this article we propose a novel approach for the automated annotation and retrieval of Drosophila melanogaster images. We present an image representation model that takes advantage of the spatial information provided by the BDGP images while at the same time being more robust against distortions. We also take advantage of a state-of-the-art learning model in order to boost the performance of our tasks. Our feature representation framework is inspired by the spatial bag-of-words (BoW) approach for image representation. The BoW approach involves first extracting features from local patches on images. These patches are then quantized to a visual word that has been determined by a pre-computed codebook. Our approach involves extracting these local patches from each image in a group, while maintaining a record of the locations where features are extracted. Thus, our bag-of-words method is essentially a spatial-bag-of-words method. As previous experiments have discovered [16], using only one codebook word to describe a local patch does not capture the slight differences between a word and the actual feature. Therefore, we have adopted a sparse learning framework in order to take advantage of multiple codebook words that show varying levels of similarity to a single feature, leading to a “visual sentence” representation of the image patch.

We have tested our methods on BDGP images from the FlyExpress database. Annotation results from our study show that the spatial bag-of-words approach consistently outperforms the non-spatial bag-of-words approach as well as the binary feature vector approach. Results also show that incorporating the sparse learning framework into our representation model further improves performance. For the image retrieval task, however, experiments show that utilizing the sparse representation alone is sufficient.


In this section, we describe the bag-of-words (BoW) and the sparse learning representations for gene expression pattern image annotation and retrieval.

The bag-of-words approach

The bag-of-words method was originally used for text classification problems where each document is represented as a feature vector indicating the frequency of each word in the document. Such feature vector representation is used to classify documents into one or more categories. This text categorization approach has been adapted to image analysis [23]. Specifically, images are represented as a collection of “visual words”, based on features extracted from the images [24].

In the BoW approach for image representation, invariant visual features are usually extracted from a subset of images [24] to produce a visual codebook using a clustering algorithm, though a recent study shows that the clustering process is not really essential [25]. Here the cluster centers are considered to be visual words. From this codebook, each feature from an image patch is quantized to the closest visual word in the codebook. A histogram is then created to represent the number of occurrences of each word located in an image. This histogram is a global representation because it only tracks the number of occurrences of each word in an image but not the location of those words, so the spatial layout of local image features is not captured. This is considered one of the major drawbacks of the BoW model [19]. Next, we discuss in detail each step involved in the BoW model when applied to fruit fly images.

Feature detection

Feature detection involves locating regions in an image to serve as representative boundaries for visual words. We are using images that have been properly scaled and aligned semi-automatically. We use a series of overlapping circles to represent areas where feature information is extracted to construct a single visual word. An example of these overlapping circles is shown in Figure 2. In our experiments, the radius of the patches is set to 16.

Figure 2

Illustration of image patch extraction and the three levels of bag-of-words partitioning with weighting factors for the spatial pyramid approach. After feature description using the overlapping circular patches, three levels of bag-of-words partitioning are shown. The top level of partitioning is just a global bag-of-words representation.

Feature description

Based on the regions described above, a local feature is extracted from each of the overlapping circles. Because of its robustness against variations in image scale and rotation, we use the scale-invariant feature transform (SIFT) descriptor [26] for representing each local patch. Thus, each image consists of a collection of feature vectors.

Codebook generation

The codebook is constructed by obtaining a collection of representative vectors from the extracted features. We use the common generation approach of selecting a subset of images and then using the k-means algorithm to cluster their SIFT feature vectors [27]. The number of cluster centers which represent the visual words can be set manually. For our image annotation and retrieval problem, we have set this number to 2000. The SIFT feature vectors can then be quantized to the closest codebook centers in order to form a visual word representation for each image.
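As an illustration of the clustering step, the following is a minimal pure-Python sketch of Lloyd's k-means on toy 2-D vectors. It is not the implementation used in the study: the function names, the random initialization, the fixed iteration count, and the toy data are all assumptions made for the example. In the setting described above, the inputs would be SIFT descriptors and the number of clusters would be 2000.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=20, seed=0):
    """Plain Lloyd's k-means; returns k cluster centers (the 'visual words')."""
    rng = random.Random(seed)
    # Assumption: initialize centers with k distinct input vectors.
    centers = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        # Assignment step: attach every vector to its nearest center.
        for v in vectors:
            nearest = min(range(k), key=lambda j: squared_distance(v, centers[j]))
            clusters[nearest].append(v)
        # Update step: move each center to the mean of its members.
        for j, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous center
                dim = len(members[0])
                centers[j] = [sum(m[d] for m in members) / len(members)
                              for d in range(dim)]
    return centers


# Toy stand-ins for descriptors: two well-separated groups of points.
descriptors = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
               [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
codebook = sorted(kmeans(descriptors, k=2))
print(codebook)
```

On this toy input the two centers settle on the means of the two groups. A production pipeline would more likely rely on an optimized clustering library, since 128-dimensional descriptors and 2000 centers are far beyond what this sketch is meant to handle.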

Once the codebook has been created, we can assign codebook words to features extracted from image patches. Formally, assume the number of patches (feature vectors) for a given image is $I$ and the size of the codebook is $J$. Define $e_{ij} = 1$ if the $i$th feature vector is assigned to the $j$th codeword, and $e_{ij} = 0$ otherwise. Then the given image can be represented as $H = [h_1, h_2, \ldots, h_J]$, where

$$h_j = \sum_{i=1}^{I} e_{ij}. \tag{1}$$
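The histogram H defined above can be sketched in a few lines of Python. This is a hedged illustration on toy data: the helper names `nearest_codeword` and `bow_histogram`, the 2-D vectors, and the three-word codebook are assumptions for the example, not part of the original pipeline.

```python
def nearest_codeword(feature, codebook):
    """Index j of the codeword closest to the feature (squared Euclidean)."""
    return min(
        range(len(codebook)),
        key=lambda j: sum((f - c) ** 2 for f, c in zip(feature, codebook[j])),
    )


def bow_histogram(features, codebook):
    """h_j counts how many of the I features are assigned to codeword j."""
    hist = [0] * len(codebook)
    for feature in features:
        hist[nearest_codeword(feature, codebook)] += 1
    return hist


# Toy codebook with J = 3 words and an image with I = 4 patch features.
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
features = [[0.1, 0.0], [0.9, 0.1], [0.8, -0.1], [0.1, 0.9]]
hist = bow_histogram(features, codebook)
print(hist)  # [1, 2, 1]
```

Because every feature is assigned to exactly one codeword, the entries of the histogram always sum to the number of patches I.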

The spatial bag-of-words approach

A major limitation of the BoW approach is that the spatial information of local image features is not encoded, as the bag-of-words representation is an un-ordered collection of visual words. A previous study on a bag-of-words approach [19] for automated annotation of Drosophila embryo image groups showed encouraging results, and a recent study [21] showed that using spatial information together with visual information is better than using only visual information. We expect the performance can be further improved by taking advantage of the spatial information, i.e., the location where visual words are found within images. Intuitively, the additional spatial information of visual words within images may facilitate the classification of images when the discriminant features are restricted to a certain region, which is the case for our CV terms. This can be implemented by adopting a method similar to the spatial pyramid matching scheme [28].

Our approach for image representation is based on an implementation of the spatial bag-of-words method. Like the BoW method, the spatial BoW method creates a histogram for each image, counting the number of times each word appears in an image. Additionally, the spatial BoW tracks the position where each visual word is located. Therefore, the spatial BoW method benefits from the robustness of the BoW method while also taking advantage of the spatial properties of images.

A spatial bag-of-words is much like a normal bag-of-words except that it is represented by a larger feature vector. While a histogram of an image is represented by a non-spatial bag-of-words, $H$, a spatial bag-of-words consists of multiple non-spatial bags, concatenated. Specifically, for each image with $n$ spatial sections, a spatial bag $M_n$ can be represented as $M_n = [H_1, H_2, \ldots, H_n]$, where each $H_i$ corresponds to a non-spatial bag-of-words for a particular spatial section. Thus we have $n$ bags-of-words from $n$ spatial sections on each image that are concatenated to form $M_n$. This way, different sections of a spatial vector represent different sections of an image. Our automated annotation representation is created by partitioning feature patches into 3 by 6 sections on each image. This representation has 18 times the dimensionality of a non-spatial representation built from the same visual words. For each image group in the study we also create a global bag-of-words representation to test the differences in annotation performance between the global and the spatial approaches. Figure 2 shows a global bag-of-words representation, a 2 by 2 spatial BoW representation, and a 4 by 4 spatial BoW representation below the circular feature representations of two separate images.
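The concatenation of per-section histograms can be sketched as follows. This is a minimal illustration under stated assumptions: patches are given as hypothetical `(x, y, word_index)` triples in which the word index has already been obtained by quantization, sections form a regular grid, and the image size and codebook size are toy values.

```python
def spatial_bow(patches, codebook_size, width, height, rows, cols):
    """Concatenate one histogram per grid section into a single vector.

    patches: iterable of (x, y, word_index), where (x, y) is the patch center.
    The result has rows * cols * codebook_size entries.
    """
    bags = [[0] * codebook_size for _ in range(rows * cols)]
    for x, y, word in patches:
        # Map the patch center to a grid cell; clamp points on the far edge.
        row = min(int(y * rows / height), rows - 1)
        col = min(int(x * cols / width), cols - 1)
        bags[row * cols + col][word] += 1
    # Flatten [H_1, H_2, ..., H_n] in row-major section order.
    return [count for bag in bags for count in bag]


# Toy example: a 60 x 30 image, a 4-word codebook, and a 3 by 6 grid.
patches = [(5, 5, 0), (5, 5, 2), (55, 25, 3)]
vector = spatial_bow(patches, codebook_size=4, width=60, height=30, rows=3, cols=6)
print(len(vector))  # 72
```

With a 3 by 6 grid the vector is 18 times as long as the global histogram, which is the dimensionality increase noted above.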

The sparse spatial representation

The original BoW representation, as applied to image analysis, assigns each feature vector to the closest visual word in the dictionary. Denote the feature vector obtained for a given patch as $y \in \mathbb{R}^d$ and the dictionary matrix as $D \in \mathbb{R}^{d \times c}$, in which each column is a centroid (visual word). Then, the assignment of an image patch to a visual word can be written formally as the following optimization problem:

$$\min_{e} \ \frac{1}{2}\|De - y\|_2^2 \quad \text{s.t.} \quad e_i \in \{0, 1\}, \quad \sum_{i=1}^{c} e_i = 1 \tag{2}$$

Clearly, the constraints enforce that only one element in the solution e will be set to one, which corresponds to the visual word most similar to the image patch y. In this case, relationships between a feature vector and other visual words are discarded. This would not be a problem if a feature vector were an exact match with the visual word that it is assigned to, as in the case of text classification. For images, however, a feature vector may be close to multiple visual words. In such cases, the relationship with the closest word is overestimated and the relationships with the other similar words are lost, which degrades the accuracy of the representation.

The sparse approach for BoW representation addresses this problem by assigning feature vectors to multiple visual words simultaneously. We seek to represent the local patch using a “visual sentence” made up of a set of “words” instead of a single one. Besides selecting the visual words that form this sentence, we also need to evaluate the “contribution” of each selected word. A commonly used approach is to formulate this as a sparse learning problem, which can be solved by state-of-the-art algorithms.

Mathematically, the generalization from visual word to visual sentence can be done by relaxing the constraint in (2). We construct the representation vector $x \in \mathbb{R}^c$ such that, for the $i$th entry, $i = 1, \ldots, c$, $x_i = w_i$ when the $i$th visual word is selected with contribution $w_i$, and $x_i = 0$ when it is not selected.

In order to make $x$ sparse (that is, to contain multiple zero entries), an $\ell_1$ regularization is imposed, resulting in the following optimization problem:

$$\min_{x} \ \|Dx - y\|^2 + \lambda |x|_1 \quad \text{s.t.} \quad x_i \ge 0, \quad i = 1, \ldots, c \tag{3}$$

where $|\cdot|_1$ is the $\ell_1$ norm and $\lambda$ is a parameter that controls the sparsity. In our experiments, $\lambda$ is fixed to 0.01. This problem is closely related to LASSO [29], and can be solved by many existing software packages, such as SLEP [30].
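To make the non-negative, ℓ1-regularized problem concrete, here is a small pure-Python sketch that solves it with projected proximal gradient descent. This is a swapped-in textbook solver, not SLEP and not the solver used in the study. It assumes the objective is the squared reconstruction error plus λ times the ℓ1 norm, subject to non-negativity; the function name `sparse_code`, the step-size rule, the iteration count, and the toy dictionary are all assumptions for the example.

```python
def sparse_code(D, y, lam=0.01, iterations=500):
    """Minimize ||Dx - y||^2 + lam * sum(x) subject to x >= 0.

    D is a list of d rows with c entries each (columns are visual words).
    For x >= 0 the l1 norm equals sum(x), so the proximal step reduces to a
    gradient step on the smooth part, a shift by lam, and a clamp at zero.
    """
    d, c = len(D), len(D[0])
    # Safe step size: the gradient's Lipschitz constant is at most 2 * ||D||_F^2.
    frobenius_sq = sum(v * v for row in D for v in row)
    step = 1.0 / (2.0 * frobenius_sq)
    x = [0.0] * c
    for _ in range(iterations):
        residual = [sum(D[r][j] * x[j] for j in range(c)) - y[r] for r in range(d)]
        grad = [2.0 * sum(D[r][j] * residual[r] for r in range(d)) for j in range(c)]
        x = [max(0.0, x[j] - step * (grad[j] + lam)) for j in range(c)]
    return x


# Toy dictionary with three 2-D visual words as columns: (1,0), (0,1), (-1,0).
D = [[1.0, 0.0, -1.0],
     [0.0, 1.0, 0.0]]
y = [0.6, 0.4]  # a patch feature lying between the first two words
x = sparse_code(D, y)
print([round(v, 3) for v in x])  # [0.595, 0.395, 0.0]
```

Hard assignment would map this patch to the first word only, whereas the sparse code keeps a contribution from the second word as well and leaves the unrelated third word at zero.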

The comparison between “visual word” and “visual sentence” for image representation is illustrated in Figure 3. As shown in the figure, sparse learning provides a smoother representation.

Figure 3

Different histogram representations obtained for a given image. The histogram on the left is obtained by assigning each local patch to a single visual word, while the one on the right is obtained by applying the sparse learning formula to select a set of visual words for each patch.

Integrating the spatial and sparse approaches into the BoW representation model is therefore expected to produce a more accurate description of Drosophila images. We have created both sparse and non-sparse versions of both our global and spatial bag-of-words representations, and compare different combinations of approaches for image annotation and retrieval. Detailed performance evaluation can be found in the results section.

Results and discussion

Data description

The Drosophila gene expression pattern images used in our study are obtained from the FlyExpress database, which contains standardized images obtained from the Berkeley Drosophila Genome Project (BDGP). In BDGP, the Drosophila embryogenesis is divided into six stage ranges (1-3, 4-6, 7-8, 9-10, 11-12, 13-16). The first stage range is not included in this study because of the small number of CV terms used to describe its images. Images from the remaining stage ranges are annotated separately in their respective groups because the majority of terms are stage range specific. The second through sixth stage ranges consist of 1081, 877, 1072, 2113, and 2816 image groups, respectively. The last two stage ranges contain the largest number of lateral images as well as the highest counts of CV terms.

Evaluation of annotation performance

We employ the one-against-rest support vector machines (SVM) to annotate the gene expression pattern images, where the SVM builds a decision boundary between image groups that contain a particular term and the remaining image groups. We employ the LIBSVM package [31] and the linear kernel is used. The regularization parameter is set to 1 in all cases. Our proposed method combines both the spatial and sparse approaches and is denoted by SVMSpatial+Sparse. We compare our method with those that utilize only sparse, only spatial, or global bag-of-words approaches. These approaches are denoted by SVMSparse, SVMSpatial, and SVMGlobal, respectively. The performance comparison of the four representations in terms of AUC and macro F1 scores is summarized in Tables 1 and 2, respectively.
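For readers unfamiliar with the macro F1 measure, the sketch below computes it under the standard definition, namely the unweighted mean of the per-term F1 scores. The text does not spell out the exact formula used, so this definition is an assumption, and the term names and labels are made-up toy data.

```python
def f1_score(y_true, y_pred):
    """F1 for one binary term; returns 0.0 when there are no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(true_by_term, pred_by_term):
    """Unweighted mean of per-term F1 scores (one binary problem per term)."""
    scores = [f1_score(true_by_term[t], pred_by_term[t]) for t in true_by_term]
    return sum(scores) / len(scores)


# Toy labels for four image groups and two hypothetical terms.
true_by_term = {"term_a": [1, 1, 0, 0], "term_b": [0, 1, 0, 0]}
pred_by_term = {"term_a": [1, 0, 0, 0], "term_b": [0, 1, 0, 0]}
score = macro_f1(true_by_term, pred_by_term)
print(round(score, 4))  # 0.8333
```

Because every term carries equal weight, rare terms with few positive samples influence macro F1 as much as frequent ones.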

Table 1 Comparison of different annotation methods in terms of AUC
Table 2 Comparison of different annotation methods in terms of macro F1

Since most CV terms are stage-range specific, we annotate the image groups according to their stage ranges separately. The numbers and proportions of positive samples for the 10 most frequent terms in each stage range are summarized in Table 3. For each stage range, we begin with the 10 terms that appear most frequently, and then we add additional terms in the order of their frequencies with a step size of 10. This results in different numbers of data sets in each stage range, depending on the total number of CV terms in that stage range. The extracted data sets are randomly partitioned into disjoint training and testing sets using the ratio 1:1 for each term. For each data set, we generate 30 random partitions and the average performance is reported. Because our method models each individual term separately, we can compare the results of our method against the results of the other methods on a term-by-term basis. For example, we can compare annotation results of our method with the non-spatial method in stage range 13-16, term by term, where 40 CV terms are used. In this comparison, of the 40 terms being studied, 39 saw an average increase in AUC and 31 saw an average increase in F1 score (F1). Due to space limitations, we do not show each individual term-by-term comparison. Instead, we show the results for each stage range where various numbers of CV terms are used.

Table 3 Number and proportion of positive samples for 10 most frequent terms in each stage range

Table 1 shows a comparison of AUC results for all four methods discussed. The best results for each case are highlighted in bold. The results show that both the spatial and the sparse methods consistently outperform the non-spatial method in terms of average AUC. The results also show that combining both sparse and spatial approaches outperforms any of the other three methods. The results indicate that the sparse approach offers improved performance over the spatial approach for the earlier stage ranges, and that the two approaches are comparable for the last stage range. The poorer performance of the spatial approach for the earlier stages may have been due to the less developed embryonic structures found earlier in embryogenesis. Combining the spatial and sparse approaches resulted in the best results, particularly in the later stage ranges.

Table 2 shows a similar type of comparison as in Table 1. The only difference is that F1 score is used as a comparison measure instead of AUC. We observe a similar trend: both the spatial and sparse methods outperform the global approach; the sparse approach performs slightly better than the spatial approach in the earlier stages, and they achieve similar performance during the last stage. Again, we can observe that combining the sparse and spatial approaches generates better results than using sparse or spatial information alone.

We have observed significant differences in performance gains between earlier stage ranges, where Drosophila embryos are less developed, and later stage ranges, where embryos are more developed. We also observe that certain terms benefit far more from a spatial bag-of-words approach than other terms. For example, mesectoderm anlage in statu nascendi, central brain anlage, crystal cell specific anlage, hypopharynx primordium P2, procrystal cell, and crystal cell are all stage dependent terms that showed the most dramatic increases in annotation performance. These increases in performance were consistent across multiple stage range tests, where the number of terms being annotated varied. There are also a number of terms such as pole cell, mesectoderm primordium, foregut primordium, germ cell, embryonic central brain neuron, embryonic central brain glia, and lateral cord glia that showed good performance across multiple stage ranges, where various numbers of CV terms were annotated. We include a detailed performance evaluation of individual terms in 6 different data sets in Figures 4 and 5.

Figure 4

The AUC of individual terms on three data sets from stage range 11-12. The three figures, from top to bottom, show the performance with 30, 40, and 50 terms, respectively.

Figure 5

The AUC of individual terms on three data sets from stage range 13-16. The three figures, from top to bottom, show the performance with 40, 50, and 60 terms, respectively.

There are pioneering works on constructing feature representations for Drosophila gene expression image annotation. Zhou et al. [32] applied a multi-resolution 2D discrete wavelet transform followed by min-Redundancy max-Relevance feature selection. Puniyani et al. [12] proposed an automatic system named “SPEX2” that performs pattern extraction using a Markov random field and further extracts features using the SIFT descriptor and singular value decomposition. Using the top 10 most frequent terms [12] in the BDGP data set, Zhou’s system achieved an average F1 score of about 0.35, while Puniyani’s method achieved about 0.45. For comparison purposes, we extract the individual F1 scores for the same terms. Our Sparse + Spatial representation yields an average F1 score of 0.64, which outperforms both methods.

Comparison of different classifiers

Since the main focus of this section is to demonstrate the performance of various image representations, we fix our classifier to be SVM with linear kernel for its effectiveness in high-dimensional data. However, it will also be interesting to investigate how different classifiers perform in this task. As an illustrative example, we use stage range 11-12 with sparse representation and test the classification performance of three different classifiers including SVM, logistic regression and ridge regression. The performance in terms of sensitivity and specificity is reported in Table 4. For all three methods, we apply 4-fold cross validation for parameter selection. As we can see in Table 4, the three classifiers achieve comparable overall performance, and SVM achieves slightly higher sensitivity.

Table 4 Performance evaluation in terms of sensitivity and specificity

Performance of over-sampling

As we can see in Tables 2 and 4, when the number of labels is large, the average sensitivity as well as F1 score is quite low. This is due to the dramatic lack of positive samples for some labels. For example, in stage range 11-12, when we use 50 labels, the proportion of positive samples in these 50 labels can be as low as 0.8%. In this subsection, we present some preliminary results on tackling this problem with over-sampling.

The over-sampling method works as follows. Before training a classifier for a particular label, we first randomly sample the positive samples with replacement, so that the number of positive samples equals the number of negative samples. Then, we train the classifier using the balanced samples. We test this method using the same setting as in the previous subsection, and the classification performance is presented in Table 5. As we can see in Tables 4 and 5, the over-sampling method provides promising improvements in this example, especially when the number of labels is large. For example, when using logistic regression to annotate 50 labels, over-sampling improves sensitivity from 0.2 to 0.36. Exploring methods such as over-sampling to further improve the classification performance will be an interesting future direction.
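The balancing step can be sketched as below. This is a minimal illustration, not the authors' code: the function name, the fixed seed, and the choice to draw exactly as many positives as there are negatives (rather than keeping the originals and adding extra draws) are assumptions, and the sample identifiers are toy data.

```python
import random


def oversample_positives(samples, labels, seed=0):
    """Balance a binary training set by resampling positives with replacement.

    Returns (samples, labels) in which the number of positive samples equals
    the number of negative samples. Inputs that have no positives, or that
    already have at least as many positives as negatives, are returned as is.
    """
    rng = random.Random(seed)
    positives = [s for s, label in zip(samples, labels) if label == 1]
    negatives = [s for s, label in zip(samples, labels) if label == 0]
    if not positives or len(positives) >= len(negatives):
        return list(samples), list(labels)
    # Draw with replacement until positives match the negative count.
    resampled = [rng.choice(positives) for _ in range(len(negatives))]
    return negatives + resampled, [0] * len(negatives) + [1] * len(resampled)


# Toy training set: 2 positive and 8 negative image groups.
samples = ["g%d" % i for i in range(10)]
labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
balanced_samples, balanced_labels = oversample_positives(samples, labels)
print(balanced_labels.count(1), balanced_labels.count(0))  # 8 8
```

Resampling should be applied to the training partition only, so that the test set keeps its original class proportions.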

Table 5 Performance evaluation of the over-sampling method in terms of sensitivity and specificity

Evaluation of retrieval performance

Based on the proposed image representations, we obtain the pair-wise similarity for every two images in the database, which can be used for image retrieval. In our study, the representative images for different views and stage ranges from the well-known Interactive Fly website are used as queries. Then, for a given method and a query image, we select the 8 images with the highest similarity values to obtain a set of query results. Note that the query images are removed from the results, since each query image is always its own best match. Sample query results from different views and stage ranges are presented in Figures 6, 7, 8, 9 and 10.
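The retrieval procedure can be sketched as follows. The text does not state which similarity measure is used, so cosine similarity here is purely an assumption for illustration; the function names, image identifiers, and vectors are likewise hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query_id, database, top_k=8):
    """Return the ids of the top_k most similar images, excluding the query."""
    query = database[query_id]
    scored = [
        (cosine_similarity(query, vector), image_id)
        for image_id, vector in database.items()
        if image_id != query_id  # the query would otherwise rank first
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [image_id for _, image_id in scored[:top_k]]


# Toy database of image representations keyed by hypothetical image ids.
database = {
    "query": [1.0, 0.0, 1.0],
    "a": [1.0, 0.0, 0.9],
    "b": [0.0, 1.0, 0.0],
    "c": [0.5, 0.1, 0.5],
}
results = retrieve("query", database, top_k=2)
print(results)  # ['a', 'c']
```

Any of the representations discussed earlier (global, spatial, sparse, or their combination) could serve as the vectors stored in the database.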

Figure 6

Retrieval results for query image ID insitu21869 with the dorsal view in stage range 4-6.

Figure 7

Retrieval results for query image ID insitu22067 with the lateral view in stage range 7-8.

Figure 8

Retrieval results for query image ID insitu16633 with the lateral view in stage range 9-10.

Figure 9

Retrieval results for query image ID insitu21912 with the lateral view in stage range 11-12.

Figure 10

Retrieval results for query image ID insitu23837 with the lateral view in stage range 13-16.

First, we compare the methods by visually inspecting the images retrieved for each query. The first conclusion we can draw from the figures is that the methods based on the bag-of-words (the first three columns) generally outperform the one that utilizes the binary representation only. For example, for stripe patterns such as those in Figures 6 and 8, the BFV method retrieves fewer than 4 similar images in its top 8 matches, and in Figure 6, even the best match looks quite different from the query image. Also, we can observe that among the three proposed methods, the sparse representations generally yield more satisfactory results, particularly when the layout of the pattern is subtle, such as the ones in Figure 7.

We also give brief interpretations of the retrieved images by analyzing the functions of the corresponding genes in the biological process annotated in the Gene Ontology. Figure 6 shows a stripe pattern expressed by gene odd, obtained from the dorsal view, in stage range 4-6. odd is involved in periodic partitioning. The retrieved genes prd and slp1 are involved in periodic partitioning and blastoderm segmentation, respectively. Both of them are closely related to the query gene. We also observe that several other retrieved genes, such as comm, comm2, run, trn and Alhambra, are not directly related to the segmentation process. However, they are all involved in the development of the nervous system. It will be interesting to examine how these two functions are related.

Figure 8 shows a pattern expressed by gene slp1, during stage range 9-10. As we can see, all of the three “visual sentence” based approaches retrieved 6 images with slp1 expressed. The rest of the genes retrieved, such as slp2 which is involved in periodic partitioning, and en which is associated with the head segmentation process, are all closely connected to the blastoderm segmentation controlled by slp1.

Figure 9 is taken from the lateral view, during stage range 11-12. The corresponding gene pdm2 is linked to the nervous system development. We can observe that our proposed method with the “visual sentence” concept returns 2 images with the same gene as the top query results. The gene nub takes part in the fate determination of ganglion mother cell, neuroblast. beat-IIIc and wg are related to the formation of synapse and endoderm, respectively.

Figure 10 illustrates a pattern expressed by gene Gasp, during stage range 13-16, taken from the lateral view. The spatial and sparse representation retrieves 4 images with the same gene, compared to 2 images by spatial BoW and 1 image obtained by BFV. Gasp as well as CG13676 is involved in the chitin metabolic process. Another gene, Idgf2, which is related to the chitin catabolic process, is also closely related. The trh gene, which affects epithelial cell fate determination and the open tracheal system, is also related because chitin regulates epithelial tube morphogenesis, in addition to its classical role of protecting mature epithelia.


This article presents computational methods for annotating Drosophila gene expression pattern images and for identifying similar images based on gene patterning. In both tasks, images are represented as bags-of-words. The size of the bags is determined by the spatial properties of a representation. For both applications, a sparse learning framework was used. Results on the FlyExpress database indicate that the proposed annotation method outperforms the non-sparse, non-spatial bag-of-words method, as well as approaches that use only a sparse or only a spatial framework.

In our study, the bag-of-words representations were created by partitioning image features with local feature patches. Terms that saw the greatest increases in annotation accuracy may only reside in specific regions of Drosophila embryos during a given stage of development. One promising direction is to create local bags-of-words from these regions in order to eliminate some of the noise created by other unrelated regions when searching for specific embryonic structures. This technique is commonly referred to as region of interest (ROI). We plan to explore this in the future.




  1. Tomancak P, Beaton A, Weiszmann R, Kwan E, Shu S, Lewis SE, Richards S, Ashburner M, Hartenstein V, Celniker SE, Rubin GM: Systematic determination of patterns of gene expression during Drosophila embryogenesis. Genome Biology 2002, 3(12):0088.1–0088.14.

  2. Tomancak P, Berman B, Beaton A, Weiszmann R, Kwan E, Hartenstein V, Celniker S, Rubin G: Global analysis of patterns of gene expression during Drosophila embryogenesis. Genome Biology 2007, 8(7):R145. 10.1186/gb-2007-8-7-r145

  3. Grumbling G, Strelets V, The FlyBase Consortium: FlyBase: anatomical data, images and queries. Nucleic Acids Research 2006, 34: D484-D488. 10.1093/nar/gkj068

  4. Fowlkes CC, Luengo Hendriks CL, Keränen SV, Weber GH, Rübel O, Huang MY, Chatoor S, DePace AH, Simirenko L, Henriquez C, Beaton A, Weiszmann R, Celniker S, Hamann B, Knowles DW, Biggin MD, Eisen MB, Malik J: A Quantitative Spatiotemporal Atlas of Gene Expression in the Drosophila Blastoderm. Cell 2008, 133(2):364–374. 10.1016/j.cell.2008.01.053

  5. Lécuyer E, Yoshida H, Parthasarathy N, Alm C, Babak T, Cerovina T, Hughes TR, Tomancak P, Krause HM: Global Analysis of mRNA Localization Reveals a Prominent Role in Organizing Cellular Architecture and Function. Cell 2007, 131: 174–187. 10.1016/j.cell.2007.08.003

  6. Samsonova AA, Niranjan M, Russell S, Brazma A: Prediction of Gene Expression in Embryonic Structures of Drosophila melanogaster. PLoS Comput Biol 2007, 3(7):e144. 10.1371/journal.pcbi.0030144

  7. Luengo Hendriks C, Keranen S, Fowlkes C, Simirenko L, Weber G, DePace A, Henriquez C, Kaszuba D, Hamann B, Eisen M, Malik J, Sudar D, Biggin M, Knowles D: Three-dimensional morphology and gene expression in the Drosophila blastoderm at cellular resolution, I: data acquisition pipeline. Genome Biology 2006, 7(12):R123. 10.1186/gb-2006-7-12-r123

  8. Keranen S, Fowlkes C, Luengo Hendriks C, Sudar D, Knowles D, Malik J, Biggin M: Three-dimensional morphology and gene expression in the Drosophila blastoderm at cellular resolution, II: dynamics. Genome Biology 2006, 7(12):R124. 10.1186/gb-2006-7-12-r124

  9. Weber GH, Rubel O, Huang MY, DePace AH, Fowlkes CC, Keranen SVE, Luengo Hendriks CL, Hagen H, Knowles DW, Malik J, Biggin MD, Hamann B: Visual Exploration of Three-dimensional Gene Expression Using Physical Views and Linked Abstract Views. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2008, 99: 296–309.

  10. Frise E, Hammonds AS, Celniker SE: Systematic image-driven analysis of the spatial Drosophila embryonic expression landscape. Molecular Systems Biology 2010, 6: 345.

  11. Mace DL, Varnado N, Zhang W, Frise E, Ohler U: Extraction and comparison of gene expression patterns from 2D RNA in situ hybridization images. Bioinformatics 2010, 26(6):761–769. 10.1093/bioinformatics/btp658

  12. Puniyani K, Faloutsos C, Xing EP: SPEX2: automated concise extraction of spatial gene expression patterns from Fly embryo ISH images. Bioinformatics 2010, 26(12):i47-i56. 10.1093/bioinformatics/btq172

  13. Kumar S, Jayaraman K, Panchanathan S, Gurunathan R, Marti-Subirana A, Newfeld SJ: BEST: A Novel Computational Approach for Comparing Gene Expression Patterns From Early Stages of Drosophila melanogaster Development. Genetics 2002, 162(4):2037–2047.

  14. Gurunathan R, Emden BV, Panchanathan S, Kumar S: Identifying spatially similar gene expression patterns in early stage fruit fly embryo images: binary feature versus invariant moment digital representations. BMC Bioinformatics 2004, 5(202):13.

  15. Kumar S, Konikoff C, Van Emden B, Busick C, Davis KT, Ji S, Wu L-W, Ramos H, Brody T, Panchanathan S, Ye J, Karr TL, Gerold K, McCutchan M, Newfeld SJ: FlyExpress: Visual mining of spatiotemporal patterns for genes and publications in Drosophila embryogenesis. Bioinformatics 2011, 27(23):3319–3320. 10.1093/bioinformatics/btr567

  16. Ji S, Sun L, Jin R, Kumar S, Ye J: Automated annotation of Drosophila gene expression patterns using a controlled vocabulary. Bioinformatics 2008, 24(17):1881–1888. 10.1093/bioinformatics/btn347

  17. Lécuyer E, Tomancak P: Mapping the gene expression universe. Current Opinion in Genetics & Development 2008, 18(6):506–512. 10.1016/j.gde.2008.08.003

  18. Ye J, Chen J, Janardan R, Kumar S: Developmental stage annotation of Drosophila gene expression pattern images via an entire solution path for LDA. ACM Transactions Knowledge Discovery from Data 2008, 2: 1–21.

  19. Ji S, Li YX, Zhou ZH, Kumar S, Ye J: A Bag-of-Words Approach for Drosophila Gene Expression Pattern Annotation. BMC Bioinformatics 2009, 10: 119. 10.1186/1471-2105-10-119

  20. Ji S, Yuan L, Li YX, Zhou ZH, Kumar S, Ye J: Drosophila Gene Expression Pattern Annotation Using Sparse Features and Term-term Interactions. Proceedings of the Fifteenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 2009, 407–416.

  21. Li YX, Ji S, Kumar S, Ye J, Zhou ZH: Drosophila Gene Expression Pattern Annotation through Multi-instance Multi-label Learning. Proceedings of the Twenty-First International Joint Conference on Artificial Intelligence 2009, 1445–1450.

  22. Ji S, Sun L, Jin R, Ye J: Multi-label Multiple Kernel Learning. In Advances in Neural Information Processing Systems 21 Edited by: Koller D, Schuurmans D, Bengio Y, Bottou L. 2009, 777–784.

  23. Sivic J, Zisserman A: Efficient Visual Search of Videos Cast as Text Retrieval. IEEE Transactions on Pattern Analysis and Machine Intelligence 2009, 31: 591–606.

  24. Mikolajczyk K, Schmid C: A Performance Evaluation of Local Descriptors. IEEE Trans Pattern Anal Mach Intell 2005, 27(10):1615–1630.

  25. Zhang Y, Jin R, Zhou ZH: Understanding bag-of-words model: a statistical framework. International Journal of Machine Learning and Cybernetics 2010, 1: 43–52. 10.1007/s13042-010-0001-0

  26. Lowe DG: Distinctive Image Features from Scale-Invariant Keypoints. Int J Comput Vision 2004, 60(2):91–110.

  27. Moosmann F, Nowak E, Jurie F: Randomized Clustering Forests for Image Classification. IEEE Trans Pattern Anal Mach Intell 2008, 30(9):1632–1646.

  28. Lazebnik S, Schmid C, Ponce J: Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories. In Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. USA: IEEE Computer Society, Washington, D C; 2006:2169–2178.

  29. Tibshirani R: Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society Series B 1996, 58: 267–288.

  30. Liu J, Ji S, Ye J: SLEP: Sparse Learning with Efficient Projections. Arizona State University; 2009.

  31. Chang CC, Lin CJ: LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2: 27:1–27:27.

  32. Zhou J, Peng H: Automatic recognition and annotation of gene expression patterns of fly embryos. Bioinformatics 2007, 23(5):589–596. 10.1093/bioinformatics/btl680



We thank Bernard Van Emden and Michael McCutchan for help with access to the gene expression data. This work is supported in part by the National Institutes of Health grants (LM010730, HG002516), the National Science Foundation grants (IIS-0953662, DBI-1147134), and the National Science Foundation of China grants (60975043, 2010CB327903).

Author information

Corresponding author

Correspondence to Jieping Ye.

Additional information

Authors’ contributions

All authors analyzed the results and wrote the manuscript. SJ and JY conceived the project and designed the methodology. AW and LY implemented the programs and drafted the manuscript. SJ, YJ, Z, SK, and JY supervised the project and guided the implementation. All authors have read and approved the final manuscript.

Rights and permissions

Open Access This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Yuan, L., Woodard, A., Ji, S. et al. Learning Sparse Representations for Fruit-Fly Gene Expression Pattern Image Annotation and Retrieval. BMC Bioinformatics 13, 107 (2012).
