A growing number of leading institutions now routinely utilize digital imaging technologies to support investigative research and routine diagnostic procedures. The exponential rate at which images and videos are being generated has resulted in a significant need for efficient content-based image retrieval (CBIR) methods, which allow one to quickly characterize and locate images in large collections based upon the features of a given query image. CBIR has been one of the most active research areas in a wide spectrum of imaging informatics fields over the past few decades
[1–13]. Several domains stand to benefit from the use of CBIR including cinematography, education, investigative basic and clinical research, and the practice of medicine. CBIR has been successfully utilized in applications spanning radiology
[4, 11, 14, 15], pathology
[9, 16–18], dermatology
[19, 20], and cytology.
Several successful CBIR systems have been developed for medical applications since the 1980s. Many approaches use simple features such as color histograms
[4, 22], texture
[6, 25], or fuzzy features
 to characterize the content of images while allowing higher level diagnostic abstractions based on systematic queries
[4, 25–27]. The recent adoption and popularity of case-based reasoning
 and evidence-based medicine
 have created a compelling need for more reliable image retrieval strategies to support diagnostic decisions. In fact, a number of state-of-the-art CBIR systems
[4, 9, 11–13, 15, 16, 25, 30–32] have been designed to support the processing of queries across imaging modalities.
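As a concrete illustration of the simplest feature family mentioned above, the following minimal sketch ranks database images by the similarity of their color histograms. It is not taken from any of the cited systems: the names `color_histogram`, `histogram_intersection`, and `rank_by_color` are hypothetical, images are assumed to be flat lists of 8-bit RGB pixels, and histogram intersection is assumed as the similarity measure.

```python
from typing import Dict, List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each in 0..255 (assumed 8-bit input)


def color_histogram(pixels: Sequence[Pixel], bins_per_channel: int = 4) -> List[float]:
    """Normalized joint RGB histogram with bins_per_channel**3 bins."""
    hist = [0.0] * (bins_per_channel ** 3)
    for r, g, b in pixels:
        # Map each 0..255 channel value to a bin index in 0..bins_per_channel-1.
        rb = r * bins_per_channel // 256
        gb = g * bins_per_channel // 256
        bb = b * bins_per_channel // 256
        hist[(rb * bins_per_channel + gb) * bins_per_channel + bb] += 1.0
    total = float(len(pixels)) or 1.0  # avoid division by zero on empty input
    return [h / total for h in hist]


def histogram_intersection(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Similarity in [0, 1] for normalized histograms; 1 means identical."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def rank_by_color(query: Sequence[Pixel],
                  database: Dict[str, Sequence[Pixel]],
                  top_k: int = 3) -> List[Tuple[str, float]]:
    """Return the top_k database entries most similar to the query."""
    q = color_histogram(query)
    scores = [(name, histogram_intersection(q, color_histogram(px)))
              for name, px in database.items()]
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores[:top_k]
```

A histogram of this kind discards all spatial layout, which is one reason such features are described here as simple.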
With the advent of whole-slide imaging technology, the size and scale of image-based data have grown tremendously, making it impractical to perform matching operations across an entire image dataset using traditional methods. To meet this challenge, a new family of strategies is being developed that enables investigators to perform sub-region searching, automatically identifying image patches that exhibit patterns consistent with a given query patch. In practice, this approach makes it possible to select a region or object of interest within a digitized specimen as a query, while the algorithm systematically identifies regions exhibiting similar characteristics in either the same specimen or across disparate specimens. The results can then be used to draw comparisons among patient samples in order to make informed decisions regarding likely prognoses and the most appropriate treatment regimens.
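The idea of sub-region searching can be sketched as an exhaustive sliding-window scan. This is only an illustration under simplifying assumptions: the image is a small grayscale array held as nested lists, patches are compared by the L1 distance between intensity histograms, and the function names (`patch_histogram`, `subregion_search`) are hypothetical rather than drawn from any cited method.

```python
from typing import List, Tuple

Image = List[List[int]]  # grayscale rows, values in 0..255 (assumed)


def patch_histogram(img: Image, top: int, left: int,
                    height: int, width: int, bins: int = 8) -> List[float]:
    """Normalized intensity histogram of one rectangular patch."""
    hist = [0.0] * bins
    for r in range(top, top + height):
        for c in range(left, left + width):
            hist[img[r][c] * bins // 256] += 1.0
    n = float(height * width)
    return [v / n for v in hist]


def l1_distance(a: List[float], b: List[float]) -> float:
    return sum(abs(x - y) for x, y in zip(a, b))


def subregion_search(img: Image, query: Image,
                     stride: int = 1, top_k: int = 1) -> List[Tuple[float, int, int]]:
    """Slide a query-sized window over img; return (distance, top, left) of the best hits."""
    qh, qw = len(query), len(query[0])
    q_hist = patch_histogram(query, 0, 0, qh, qw)
    hits = []
    for top in range(0, len(img) - qh + 1, stride):
        for left in range(0, len(img[0]) - qw + 1, stride):
            d = l1_distance(q_hist, patch_histogram(img, top, left, qh, qw))
            hits.append((d, top, left))
    hits.sort()  # smallest distance first
    return hits[:top_k]
```

The cost of this scan grows with the number of window positions times the cost of each comparison, which is why exhaustive matching becomes impractical at whole-slide scale.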
To perform a region-of-interest (ROI) query, Vu et al.
presented a SamMatch framework-based similarity model. A part-based approach was later reported that addresses the CBIR problem by combining a difference-of-Gaussians (DoG) detector with a local hash-table search algorithm. The primary limitation of this approach, however, is the time cost of computing the large number of features it requires. Intra-expansion and inter-expansion strategies were later developed to boost hash-based search quality using a bag-of-features model that represents the images more accurately. Recently, a structured visual search method was developed to perform CBIR in medical image datasets. The primary advantage of this framework is that it is flexible and can be quickly extended to other modalities.
Most CBIR algorithms rely on content localization, feature extraction, and user feedback steps
[5–7, 25, 27, 36–40]. The retrieved results are then ranked by some criterion, such as appearance similarity or diagnostic relevance, which can also serve as a measure of the practical usability of the algorithm. Typically, the retrieved images include only those cases whose appearance is most similar to a given query image, whereas introducing relevance feedback
[41–47] to CBIR provides a practical means for addressing the semantic gap between visual and semantic similarity.
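One classical way to fold user feedback into a query is the Rocchio update, sketched below. It is shown only as a generic example of relevance feedback and is not the dual-similarity scheme proposed in this paper; the function name and the weights `alpha`, `beta`, and `gamma` are illustrative assumptions.

```python
from typing import List, Sequence

Vector = Sequence[float]


def rocchio_update(query: Vector,
                   relevant: Sequence[Vector],
                   non_relevant: Sequence[Vector],
                   alpha: float = 1.0,
                   beta: float = 0.75,
                   gamma: float = 0.15) -> List[float]:
    """Move the query feature vector toward results the user marked relevant
    and away from results marked non-relevant (Rocchio relevance feedback)."""
    dim = len(query)

    def centroid(vectors: Sequence[Vector]) -> List[float]:
        if not vectors:
            return [0.0] * dim  # no feedback of this kind: contributes nothing
        return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

    pos = centroid(relevant)
    neg = centroid(non_relevant)
    return [alpha * q + beta * p - gamma * n for q, p, n in zip(query, pos, neg)]
```

Re-running the retrieval with the updated vector lets the ranking reflect what the user considers relevant rather than visual similarity alone.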
Large-scale image retrieval applications are generally computationally expensive. In this paper, we present the use of CometCloud
[48, 49] to execute CBIR in a parallel fashion on multiple high performance computing (HPC) and cloud resources as a means for reducing computational time significantly. CometCloud is an autonomic cloud framework that allows dynamic, on-demand federation of distributed infrastructures. It also provides an effective programming platform that supports MapReduce, Workflow, and Master-Worker/BOT models making it possible for investigators to quickly develop applications that can run across the federated resources
[49–53]. The algorithm that our team developed exploits the parallelism of CBIR by combining the HPC assets at Rutgers University with external cloud resources. Moreover, our solution uses cloud abstractions to federate resources elastically to achieve acceleration, while hiding infrastructure and deployment details. In this way, the CBIR algorithm can be made available as accessible services to end users.
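The master-worker (bag-of-tasks) pattern that makes CBIR easy to parallelize can be sketched on a single machine as follows. This sketch does not use or imitate the CometCloud API; a local thread pool from the Python standard library stands in for the federated workers, and `score_chunk` and `master_worker_search` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Sequence, Tuple

Feature = Sequence[float]
Hit = Tuple[str, float]


def score_chunk(query: Feature, chunk: List[Tuple[str, Feature]]) -> List[Hit]:
    """Worker task: L1 distance from the query to every feature vector in one chunk."""
    return [(name, sum(abs(a - b) for a, b in zip(query, feat)))
            for name, feat in chunk]


def master_worker_search(query: Feature,
                         database: Dict[str, Feature],
                         n_workers: int = 4,
                         chunk_size: int = 2,
                         top_k: int = 3) -> List[Hit]:
    """Master: split the database into independent tasks, farm them out, merge results."""
    items = list(database.items())
    chunks = [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = pool.map(lambda chunk: score_chunk(query, chunk), chunks)
        merged = [hit for part in partials for hit in part]
    merged.sort(key=lambda hit: hit[1])  # smallest distance first
    return merged[:top_k]
```

Because each chunk is scored independently, the tasks need no communication with one another, which is the property that allows them to be spread across heterogeneous resources.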
The contributions of this paper are: 1) a novel CBIR algorithm based on a newly developed coarse-to-fine search criterion coupled with a novel feature called the hierarchical annular histogram (HAH); 2) a CBIR refinement scheme based on dual-similarity relevance feedback; and 3) a reliable parallel implementation of the CBIR algorithm based on cloud computing.
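This section names the HAH feature and the coarse-to-fine search without defining them, so the sketch below is only one plausible reading and should not be taken as the authors' formulation. It assumes that a patch is divided into concentric rectangular rings, that an intensity histogram is computed per ring and the histograms are concatenated, and that the coarse stage filters candidates with a single global histogram before the fine stage re-ranks the survivors with the ring-wise feature. All function names and parameters are hypothetical.

```python
from typing import Dict, List

Patch = List[List[int]]  # grayscale rows, values in 0..255 (assumed)


def annular_histogram(patch: Patch, n_rings: int = 3, bins: int = 4) -> List[float]:
    """Concatenate one normalized intensity histogram per concentric ring.
    With n_rings=1 this reduces to an ordinary global histogram."""
    h, w = len(patch), len(patch[0])
    max_depth = (min(h, w) + 1) // 2  # number of one-pixel-wide rings
    counts = [[0.0] * bins for _ in range(n_rings)]
    for r in range(h):
        for c in range(w):
            depth = min(r, c, h - 1 - r, w - 1 - c)  # distance from the border
            ring = depth * n_rings // max_depth      # group depths into n_rings
            counts[ring][patch[r][c] * bins // 256] += 1.0
    feature: List[float] = []
    for ring_counts in counts:
        total = sum(ring_counts) or 1.0  # empty ring stays all-zero
        feature.extend(v / total for v in ring_counts)
    return feature


def l1(a: List[float], b: List[float]) -> float:
    return sum(abs(x - y) for x, y in zip(a, b))


def coarse_to_fine(query: Patch, candidates: Dict[str, Patch],
                   keep: int = 2, n_rings: int = 3, bins: int = 4) -> List[str]:
    """Coarse stage: keep the `keep` best candidates by global histogram.
    Fine stage: re-rank the survivors by the ring-wise feature."""
    q_coarse = annular_histogram(query, 1, bins)
    survivors = sorted(
        candidates,
        key=lambda name: l1(q_coarse, annular_histogram(candidates[name], 1, bins)),
    )[:keep]
    q_fine = annular_histogram(query, n_rings, bins)
    return sorted(
        survivors,
        key=lambda name: l1(q_fine, annular_histogram(candidates[name], n_rings, bins)),
    )
```

Under these assumptions, two patches with the same overall intensity distribution but different spatial arrangement pass the coarse stage together and are separated only by the ring-wise comparison.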