 Methodology article
 Open access
 Published:
ATMAD: robust image analysis for Automatic Tissue MicroArray Dearraying
BMC Bioinformatics volume 19, Article number: 148 (2018)
Abstract
Background
Over the last two decades, an innovative technology called Tissue Microarray (TMA), which combines multitissue and DNA microarray concepts, has been widely used in the field of histology. It consists of a collection of several (up to 1000 or more) tissue samples that are assembled onto a single support – typically a glass slide – according to a design grid (array) layout, in order to allow multiplex analysis by treating numerous samples under identical and standardized conditions. However, during the TMA manufacturing process, the sample positions can be highly distorted from the design grid due to the imprecision when assembling tissue samples and the deformation of the embedding waxes. Consequently, these distortions may lead to severe errors of (histological) assay results when the sample identities are mismatched between the design and its manufactured output. The development of a robust method for dearraying TMA, which localizes and matches TMA samples with their design grid, is therefore crucial to overcome the bottleneck of this prominent technology.
Results
In this paper, we propose an Automatic, fast and robust TMA Dearraying (ATMAD) approach dedicated to images acquired with brightfield and fluorescence microscopes (or scanners). First, tissue samples are localized in the large image by applying a locally adaptive thresholding on the isotropic wavelet transform of the input TMA image. To reduce false detections, a parametric shape model is considered for segmenting ellipseshaped objects at each detected position. Segmented objects that do not meet the size and the roundness criteria are discarded from the list of tissue samples before being matched with the design grid. Sample matching is performed by estimating the TMA grid deformation under the thinplate model. Finally, thanks to the estimated deformation, the true tissue samples that were preliminary rejected in the early image processing step are recognized by running a second segmentation step.
Conclusions
We developed a novel dearraying approach for TMA analysis. By combining waveletbased detection, active contour segmentation, and thinplate spline interpolation, our approach is able to handle TMA images with high dynamic, poor signaltonoise ratio, complex background and nonlinear deformation of TMA grid. In addition, the deformation estimation produces quantitative information to asset the manufacturing quality of TMAs.
Background
Tissue MicroArrays (TMA) history
The development of multitissue techniques was started at the mid1980s in order to address the scarcity issue of diagnostic reagents and tissue samples. The pioneer work was contributed by Dr Battifora who introduced, in 1986, the multitumor “sausage” tissue block [1]. In this method, several rods of tissue, which were extracted from paraffinembedded tissue blocks (or shortened as paraffin blocks), deparaffinized and rehydrated, were put together and reparaffinized after being tightly wrapped in small intestine of small mammals like a sausage. To avoid deparaffinization and reparaffinization procedures of Battifora’s “sausage” technique, in 1987, Wan et al. conceived the punching technique [2] which used 16gauge needle for retrieving cylinders of tissue (also tissue cores) from paraffin blocks and arraying them in a recognizable pattern. Although Wan’s punching technique was a big footstep and was used in nearly all of today TMA techniques, its tissue pattern was not a grid one which is more structured and facilitates the identification of each tissue sample. The first multitissue grid pattern is described by Battifora and Mehta in their 1990’s paper under the name of “checkerboard tissue block” [3] in which tissue rods were manually aligned in a Cartesian coordinate system (checkerboard pattern). By combining the punching technique of Wan and the “checkerboard” concept of Battifora and Mehta, Kononen et al. invented in 1998 a machine for assembling efficiently and accurately extracted tissue cores in grid pattern [4]. The proposed technique called “tissue microarray” (TMA) became therefore popular and widely used in most pathological laboratories. In the last decade, different TMA techniques were developed to improve manufacturing process and minimize manufacturing cost [5–15], but all of them were based on Battifora’s, Wan’s and Kononen’s previous works. Since in most TMA techniques, extracted tissue samples have cylinder form, in the following, we use the terms “tissue cores” or “TMA cores” (or even more shorter cores) to refer TMA tissue samples.
Challenges of TMA dearraying
In a TMA, assembled tissue cores are collected from different donor blocks. It is thus highly important to matching them with their proper metadata for further clinical or pathological analysis. To this end, grid pattern was conceived to ease the localization of each TMA cores. However, in spite of numerous technique improvements [12, 16], TMAs manufactured recently by manual or automated (semiautomated) machine are still subjected to the deformation of the design tissue grid due to bad positioning of the tissue cores with respect to the design. Another main source of deformation is the heat deformation of the paraffin waxes – commonly used in TMA techniques – when embedding tissue cores into recipient block. Sectioning paraffinembedded tissue blocks with a microtome to produce multiple slides may also produce additional deformation. In fact, the design grid may suffer geometric transformations such as translation, rotation and shearing (linear or affine deformations) combined with dilatation, distortion and random perturbations (nonlinear deformations). In addition, some fragile tissue cores may be lost or split into several fragmented parts, making more difficulties to recognize them. Figure 1 illustrates a typical image of TMA imaged in fluorescence. We can clearly observe that the ideal TMA grid which is a square grid is significantly distorted after the manufacturing process and the present tissue cores do not have a perfectly circular shape as expected. These problems need to be taken into account to develop robust dearraying methods.
Stateofart of TMA dearraying methods
Closely similar to TMAs, DNA microarrays (also known as biochips) are constructed by spotting DNA probes by robots with high precision according to a grid pattern. Numerous gridding methods for microarrays were used to localize each DNA probes and find its row and column coordinates with respect to the design grid. This procedure is called “dearraying”. Despite the similitude of these microarray concepts, existing “dearraying” methods for microarrays are not adapted for TMAs because the grids are more highly deformed. Along with the commercialization of digital imaging devices for TMA analysis over the last decade, several methods for TMA “dearraying” have been developed [17–22]. In general terms, a “dearraying” approach consists in two steps: (i) segmentation and localization of assembled tissue cores; (ii) array coordinate (row and column coordinates) estimation of each core.
Firstly, for segmenting tissues, existing dearraying methods usually assume that the histogram of a TMA image is bimodal. Under this assumption, these methods perform in general a thresholding by taking the local minimum between two highest peaks corresponding to the background and the foreground, of the image intensity histogram as global threshold. Various thresholding techniques were proposed from a simple thresholding as in [17] to more sophisticated methods such as the momentpreserving thresholding in [19], the automatic thresholding based on SavitskyGolay filtered histogram in [20] or Otsu’s method used in [21, 22]. To improve the segmentation result, preprocessing like contrast enhancement transform [22] or template matching [19] was applied. Morphological operators were also used as postprocessing for removing outliers in the thresholded map as in [17, 22]. However, this underlying assumption is not satisfied in case of images acquired from novel fluorescence device because of their complex background. Due to the nature of fluorescence imaging, pixels corresponding to irrelevant objects – such as dusts, glue and washing stains – in the background have often high intensities resulting as a high peak in the intensity histogram; in contrast, the intensities of pixels corresponding to tissue cores could be relatively lower. Hence, as a consequence, most of cores fail to be detected with a high threshold and there is a number of outliers corresponding to a low threshold value.
Secondly, for estimating row and column coordinates of each TMA cores, the methods mentioned above were generally based on distance and angle criteria to define the average spacing between the cores and the orientation of the observed grid. These criteria were derived simply from the distance between neighbor tissue cores [19], or from sophisticated measures such as the histogram of distance and angle [17] or the coefficients of the Hough transform [18] or even the Delaunay triangulation [22]. To deal with the case of missing tissue cores or the design of TMA grid in which some positions are left empty [16], linear or local bilinear interpolation were used as in [17, 22] for completing the grid. Whereas these methods yield satisfactory results for further pathological analysis, they can not produce quantitative information about the deformation of the TMA grid which is an indicator for evaluating the quality of the manufactured input TMA. For that reason, we address this issue and develop a dearraying method which is able to provide quantitative information about the deformation. Our approach allows management of traceability and quality control of the whole TMA manufacturing process.
Overview of the method
In this paper, we propose a fast and efficient approach for automated TMA dearraying with the emphasis on fluorescence TMA images and modeling of TMA grid deformation. The proposed approach called ATMAD is based on the following image processing operations: core detection, core segmentation and estimation of the grid deformation. For the tissue localization step of the dearraying procedure, we combine the detection and segmentation tasks to produce reliable inputs for the second step – the computation of the array coordinate of each tissue core. This second step is performed by using the deformation estimation module followed by a segmentation task to refine the result. The outline of our approach is shown in Fig. 2 which describes the two steps of the dearraying procedure and the combination of the three image processing operations.
The “detection” operation (i.e. the detection) is based on a wavelet approach. In order to process images having large dynamic range, complex background and high noise level such as fluorescence images, we compute a stationary wavelet transform of the input TMA image at an appropriate scale to the tissue size – the average tissue core radius given by the manufacturer. By choosing the mother wavelet as a difference of Gaussians, we can deduce the closedform expression of the wavelet atom at any desired scale and use it to perform directly the wavelet decomposition. Our technique is faster and more accurate than the wellknown “à trous” algorithm [23]. The wavelet transform map is then locally thresholded to spatially adapt to the contrast between the foreground – corresponding to TMA cores – and the inhomogeneous background. The position of potential tissue cores is defined as the center of the connected components in the thresholded wavelet transform map.
To delineate the boundary of each tissue core and improve detection result, an ellipseshaped active contour [24] is used for segmenting the detected object at each position obtained from previous step. The segmented objects, which are too large or too small than the given average size of tissue sample or too elongated, will be considered as false detection and be discarded from the list of potential positions. This removal is essential to discard potential outliers and enhance the reliability of the input for the estimation of row and column coordinates of TMA cores.
Instead of estimating directly the row and column coordinates of each core from the position list, we approximate the deformation of the TMA grid using the thinplate model. In fact, the deformed grid is the image (in the sense of set theory) of the regular grid of design by the deformation. Given the deformation at some arbitrary points of the grid, the thinplate interpolation allows to estimate it at other points [25]. The more points we have known, the more precisely we estimate the deformation. Once the deformation is approximated, the computation of row and column coordinates of each tissue core is therefore straightforward. By reformulating as an approximation problem and solving it iteratively, our method is robust to high nonlinear deformations which were observed in most real TMA images. Moreover, according to the thinplate model, the approximation yields information such as the average translation, the rotation angle, the bending energies along the horizontal and vertical axes, etc. These information are useful to assess the quality of the manufactured TMAs.
The remainder of this paper is organized as follows. In the next section, we describe the dearraying approach including a technical presentation of the “detection, “segmentation”, “deformation estimation” tasks. We also figure out how the proposed approach is adapted for TMA images acquired with brightfield microscopes. In “Results and discussion” section, we present the experimental results obtained from simulated and real data. Finally, the last section gathers the conclusions drawn from this research and details the future work.
Methods
In our approach, the estimation of core positions on the input TMA image is subsequently refined in successive tasks by considering different image domains (i.e. patches or regions) in the input original image. Such a strategy allows not only to avoid unnecessary processing on noncontent regions but also to reduce the acquisition time, storage and processing time of high resolution data. To distinguish the inputs and outputs of each task and facilitate the comprehension of the technical details, we present in Fig. 3 a diagram which illustrates a few notations which will be used throughout the paper.
TMA core detection
The detection of approximately circular TMA cores can be performed by spot detection algorithms. Spot detection is a wellknown topic in image processing (see [26] for a recent review). Over past decades, number of spot detection methods have been proposed [27–31]. To produce satisfactory results, most of these methods require fine adjustment of a critical parameter: the detection scale corresponding to the size of the objects of interest. Automatic selection of the detection scale is a challenging problem since the objects of interest may have different sizes or they may have the same size as the irrelevant objects in the background. Few methods of automatic scale selection [32–34] have been proposed recently. However, in the context of tissue microarrays, the diameter of assembled TMA cores is defined by the size of the needle used for extracting cores from paraffin tissue blocks. The determination of the scale parameter used for spot detection is straightforward from this measure which is often given by the manufacturer. We propose here a fast algorithm for tissue core detection by performing directly the wavelet decomposition at the appropriate aforementioned scale and computing a locallyadaptive threshold of the wavelet coefficients.
Preprocessing
Waveletbased detection techniques are known to be robust to nonstationary noises like Poisson noise or mixedPoissonGaussian noise as in TMA images, acquired by brightfield or fluorescence. Preprocessing operations such as image denoising or variance stabilization transform are thus not mandatory. However, our algorithm is primarily designed for detecting bright spots over a dark background, and especially adapted for fluorescence images. brightfield TMA images, in which the tissue cores are darker than the background, are first inverted before further processing.
Scale selection
Nowadays, standard TMAs are manufactured with a diameter of tissue core typically from 0.6 to 1.5 mm. Given an imaging resolution (pixel size), the optimal scale of wavelet decomposition can be determined according to the core radius. If we denote r_{ core } the expected average radius (in pixels) of TMA cores, the optimal scale index \({\hat {\jmath }}\) of the wavelet decomposition that best fits the size of TMA cores is defined as:
where σ_{1} is selected according to the pixel size.
Fast isotropic wavelet decomposition
In contrast to multiresolution approaches, our detection method requires only the wavelet decomposition at the appropriate selected scale. To compute the decomposition at a desired scale, usual wavelet transform techniques perform a sequence of successive convolutions which are used for computing iteratively the decomposition from the smallest scale to the coarsest scale. These techniques are time consuming when dealing with large images and high number of scales. Instead, to address to this computational issue, we build a dyadic isotropic wavelet frame \(\left \lbrace \psi _{j} \right \rbrace _{j \geq 1}\) by choosing the scaling function ϕ_{ j } as a Gaussian function whose variance \(v_{j}^{2}\) is a function of scale \(j \in \{1, \ldots, j_{\max }\}\) and \(j_{\max }\) is the maximum index of the highest scale:
where ∥·∥_{2} denotes the Euclidean norm, \(\mathbf {x} \in \Omega \subset \mathbb {R}^{2} \) is the pixel location in the rectangular domain Ω and
with σ_{ k }=2^{k−1} σ_{1}. Thanks to the semigroup property of Gaussian functions, the relationship between the scaling functions at subsequent scales can be expressed as:
where ⋆ denotes the convolution operator. Therefore, the wavelet decomposition Ψ_{ j }u of \(u: \Omega \subset \mathbb {R}^{2} \rightarrow \mathbb {R} \) at the scale \(j \in \{1, \ldots, j_{\max }\}\) is obtained by convolution of u with the wavelet atom ψ_{ j } as:
with the conventions \(v_{0}^{2} = 0\) and G_{0}(·)=δ(·) (Dirac delta function). For more technical details on the proposed wavelet frame and the wavelet decomposition and reconstruction algorithms, please refer to the Additional file 1.
Locallyadaptive thresholding
While the wavelet decomposition plays the role of a filtering which reduces the noise and enhances the objects of interest, a common way to detect objects is to threshold the filtered image – the wavelet decomposition of the input TMA image in our case. As depicted in [32], a global threshold is not appropriate to handle complex situations, especially when dealing with images acquired in fluorescence context because of their inhomogeneous background. To overcome this difficulty, we propose to define an adaptive threshold according to the local distribution of the wavelet decomposition \(\Psi _{\hat {\jmath }}{u}\) previously computed. Accordingly, we consider the following statistical test at each point x of the TMA image u:
Pixels corresponding to tissue cores have strong positive responses in the wavelet decomposition. Under the null hypothesis \(\mathcal {H}_{0}\), the wavelet coefficient \(\Psi _{\hat {\jmath }}{u}(\mathbf {x})\), which follows the local distribution of the waveletdecomposedimage background with mean μ(x) and variance ν^{2}(x), is lower than a certain value τ(x). Let \(\mathbb {P} \left (\Psi _{\hat {\jmath }}{u}(\mathbf {x}) < \tau (\mathbf {x}) \right)\) be the probability for a pixel x to be classified as “background” class. The threshold τ(x) is used to control the number of misclassification. Given a probability of false alarm \(p_{_{\text {FA}}} > 0\), the corresponding threshold \(\tau _{_{\text {FA}}}\) is selected such that the misclassification probability \(\mathbb {P} \left (\Psi _{\hat {\jmath }} {u}(\mathbf {x}) \geq \tau _{_{\text {FA}}}(\mathbf {x}) \right)\) is lower than \(p_{_{\text {FA}}}\). By applying the conventional probabilistic Tchebychev’s inequality, we get, ∀κ(x)>0:
It follows that
Now, let us define \(\tau _{_{\text {FA}}}(\mathbf {x}) = \mu (\mathbf {x}) + \kappa (\mathbf {x})\) and assume \(\left ({\nu ^{2}(\mathbf {x})}/{\kappa ^{2}(\mathbf {x})}\right) \leq p_{_{\text {FA}}}\) such that \(\mathbb {P} \left (\Psi _{\hat {\jmath }} {u}(\mathbf {x}) \geq \tau _{_{\text {FA}}}(\mathbf {x}) \right) \leq p_{_{\text {FA}}}\). Finally,
and the adaptive threshold \(\tau _{_{\text {FA}}}(\mathbf {x})\) is controlled by the pvalue inferred from the significance level α of the test, and set by the user. If \(p_{_{\text {FA}}} < \alpha \), this suggests that the null hypothesis (i.e. a pixel x is classified as a “background” pixel) may be rejected. In practice, one typically sets \(p_{_{\text {FA}}} = 0.05\) (or 0.01) which corresponds to a significance level α=5% (or 1% respectively).
To determine the threshold \(\tau _{_{\text {FA}}}(\mathbf {x})\), the local mean μ(x) and the local variance ν^{2}(x) of the image background on the wavelet decomposition \(\Psi _{\hat {\jmath }}{u}\) are required. However, prior knowledge about the image background distribution is unfortunately not available in most cases. We consider thus empirical estimations of μ and ν^{2} at each point x from \(\Psi _{\hat {\jmath }}{u}\):
where g(·) is a weighting positive function (i.e. ∥g(·)∥_{1}=1, ∥·∥_{1} is the L_{1} norm and g(x)≥0,∀x∈Ω) mainly used to avoid the estimation of the background distribution statistics being biased from coefficients corresponding to the foreground. By construction, \(\hat {\mu }(\mathbf {x})\) and \(\hat {\nu }^{2}(\mathbf {x})\) are weighted estimators derived from \(\Psi _{\hat {\jmath }}{u}\) which is a filtered version of u by the bandpass filter \(\psi _{\hat {\jmath }}\) in order to enhance the objects of radius r_{ core }. It is thus convenient to define the weighting function g according to the wavelet atom \(\psi _{\hat {\jmath }}\). By using an affine transform which implies the positivity and the normalization conditions, we propose a candidate for g(·) as follows :
where \(\sup \psi _{\hat {\jmath }} = \ \psi _{\hat {\jmath }} \_{\infty }\) denotes the supremum (\(L_{\infty }\) norm) of \(\psi _{\hat {\jmath }}\) and \(\left \ \psi _{\hat {\jmath }} + \sup \psi _{\hat {\jmath }} \right \_{1}\) is the normalization factor to ensure \(\\hat {g}(\cdot)\_{1} = 1\). The choice of this candidate is clarified in Fig. 4 showing the wavelet atom and its derived weighting function according to a given circular spot. The proposed weighting which is constructed from the wavelet atom has the same size of the considered spot and has a hollow shape at the center (see right column in Fig. 4). This specific shape allows to reduce the impact of high wavelet coefficients corresponding to foreground pixels on the estimation of the background statistics.
By substituting the empirical estimators to μ(x) and ν^{2}(x), we obtain the estimated detection threshold:
Thresholding the wavelet decomposition \(\Psi _{\hat {\jmath }}{u}\) with respect to \(\hat {\tau }_{_{\text {FA}}}\) results in a binary image \(I_{_{\text {FA}}}: \Omega \rightarrow \{0,1\}\):
where each connected component in \(I_{_{\text {FA}}}\) represents a region which is potentially a tissue core of the TMA image. The gravity centers of these regions (or detection position) will be used as inputs for estimating the array coordinates of TMA cores. However, the detection reliability has a great impact on the dearraying outcome: few false detections may lead to severely inaccurate results. Removing false detections (i.e. outliers) is then crucial. To this end, the size of detected regions seems to be a relevant criterion since the core size is given in most cases by the TMA manufacturer. Although, due to the complexity of backgrounds, it may be highly different from the true core size. Instead of exploiting the imprecise information derived from the binary detection map \(I_{_{\text {FA}}}\), we perform an activecontourbased segmentation to delineate the objects at each detected position. Also, we reuse the segmentation results to confirm and improve the preliminary detection results.
Segmentation of TMA cores
As depicted in previous section, the detection binary image \(I_{_{\text {FA}}}\) does not allow us to accurately determine the size of detected objects. Active contours [35] are typically well appropriate in our context since they can evolve to closely delineate the object borders and thus yield an estimation of the TMA core size. The family of parametric active contours presented below will help to refine the detected position and the size of TMA cores and eventually to determine the orientation of the potential core if it was deformed during the manufacturing process.
Since the seminal paper of Kass, Witkin, and Terzopoulos, active contour models (or snakes) [35] have been successfully used to detect discontinuities, detect objects of interest or segment images, especially in bioimaging [36]). General purpose closed contours are generally controlled by elastic forces based on local curvature and image based potentials [35, 37–39]. The curve evolves from its initial starting position towards the target object. The optimization of the underlying energy functional is traditionally performed using variational principles and finite differences techniques, which needs an appropriate initialization to converge to a relevant solution. At the end of the nineties and beginning of the 2000’s, geodesic active contours [40] based on the theory of surfaces evolution and geometric flows have been introduced to segment an arbitrary number of highly complex objects in the image. In our TMA context, the 2D shapes of tissue cores can be actually well estimated by ellipseshaped active contours which belong to the family of parametric deformable templates.
Applicationtailored parametrized templates introduced by Yuille et al. [41] were proposed in cases where strong a priori knowledge about the shape being analyzed is available (e.g. eyes or lips in human faces [41]). The models are handbuilt using simple parametrized 2D geometric representations. Another line of research focused on models of random deformations for a given initial shape (deformable template). Grenander et al. [42, 43] obtained the first promising results in image segmentation by considering statistical deformable models which describe the statistics of local deformations applied to an original template. Markov models and MonteCarlo techniques have been introduced in this context to derive optimal random deformations estimates from image data [42–46]. In the approach initially proposed by Cootes et al. [47] and successfully applied to object tracking [45], the shape structure and the parameters describing its deformations are learned from a training set of representative shapes. Meanwhile, Staib and Duncan [48] proposed to combine parametric snakes (Bsplines) to the standard decomposition on a Fourier basis to analyze deformable biomedical structures. All these methods are generally robust to noise but computationally demanding if stochastic iterative procedures are used to conduct the minimization and no initial guess close to the optimal solution is provided. Very recently “snakescules” [49] combined to fast algorithms and Markov point process [50] have been proposed along the same philosophy but dedicated to the detection of cells or nuclei in fluorescence microscopy images.
Finally, the ellipse fitting concept has been furthemore introduced by Thévenaz et al. as an extension of the simple circleshaped active contour [49] which can be defined just by two points [24]. As a consequence, a triplet of points is necessary to parametrize the ellipseshaped version. However, this parametrization which has an extra degree of freedom increases the complexity of the model and makes the optimization of ellipse parameters more challenging when compared to the circleshaped model. To overcome these difficulties, an alternative way was proposed in [51]: the ellipses are configured by their center, their axes and the angle between their major axis and the horizontal. Under this configuration, the cost function introduced in [24], and defined below (see Eq. (12)) as the contrast between the core and the ring defined by the pair of ellipses (see Fig. 5) – and the derivatives with respect to the ellipse parameters could be calculated efficiently by using the Green’s theorem [48]. Nevertheless, the Green’s theorem cannot be applied with no error in the discrete setting and digitized images. In order to handle properly the ellipse parametrization described in [51] instead of [24] in the discrete setting, we propose a pixelbased smooth approximation of the underlying cost energy functional. Our approximation allows us to calculate properly the derivatives of the cost function with respect to the ellipse parameters and is not based on the Green’s theorem also used in [48] for energy minimization.
Definition of the ellipsebased energy
More formally, let Γ be the outer ellipse with parameters \(\left \lbrace \mathbf {x}_{0}, a, b, \theta \right \rbrace \) where \(\mathbf {x}_{0} = \left (x_{0}, y_{0} \right)\) is the center, a and b are the semi major and minor axes respectively, and θ is the angle of rotation. The inner ellipse \({\Gamma }^{\prime }\) is defined as a concentric and coaxial ellipse of Γ such the latter has an area (denoted Γ) twice larger than the former: \({\Gamma } = 2{\Gamma }^{\prime }\) (see Fig. 5). The factor 2 ensures that the area of the elliptical outer ring is equal to the area of the elliptical inner core. Let us consider a rectangular image patch P containing a potential TMA core associated to a connected component estimated by the detection method in the early stage. The ellipse energy (or cost function) is defined as a normalized image contrast between the two domains \({\Gamma }^{\prime }\subset {\Gamma } \subset P\) where P is a rectangular domain in the image domain Ω which contains a single TMA core [24, 52]:
To handle discrete images, the continuous image u defined in (12) can be replaced by its sampled version as follows:
where \(u\left [\mathbf {x}\right ]\) is the discrete sample of u(x) and denotes the set indicator function such as if x∈Γ and 0 otherwise. However, there are two major drawbacks while considering this energy function. Firstly, the calculation of the energy gradient is not trivial in the discrete setting since the indicator functions in (13) are piecewise constant which are not differentiable at some points. Secondly, due to sampling effect, brutal switch of the membership of some points from a domain to another may happen just with an infinitesimal change in the ellipse parameters, giving rise to severe numerical instabilities. Smooth approximations of the underlying piecewise constant functions is recommended to overcome both discontinuity and sampling problems. The calculations of partial derivatives of the energy functional is facilitated if we can define a fuzzy membership to avoid abrupt domain switches (see Fig. 8a and b). Our goal is then to build an approximation which favors the computation of the partial derivative of the energy with respect to each ellipse parameter as much as possible. First, we consider the following quadratic form:
For a given point x, \(\left \ \mathbf {x}  {\mathbf {x}_0} \right \_{{{\Gamma }}}\) is a normalized metric between x and the ellipse center x_{0} induced by the geometry of the ellipse Γ. A pixel x belongs to the interior of the ellipse Γ if and only if \( \left \ \mathbf {x}  {\mathbf {x}_0} \right \_{{{\Gamma }}}^{2} \leq 1\) since \(\\mathbf {x}  {\mathbf {x}_0} \_{{{\Gamma }}}^{2}\) is always positive. The term can be then expressed by a function of \(\left \ \mathbf {x}  {\mathbf {x}_0} \right \_{{{\Gamma }}}\) as . Moreover, we need to find a smooth function which closely approximates as investigated in [24, 38] and has simple derivative. We realized that the graph of looks similar to the \(C^{\infty }\) Sshaped logistic curve whose the derivative is easy to compute. Let us consider therefore the following logistic function:
where ε>0 controls the steepness of the curve (see the plot of t↦S_{ ε }(t) in Fig. 6 for several values of ε). The smaller ε, the closer the curve S_{ ε } approaches the graph of the indicator function . Thanks to the property of logistic functions, the derivative of S_{ ε } can be easily computed as \(S_{\epsilon }'(t)\,=\,\epsilon ^{1}\!S_{\epsilon }(t) \left (1  S_{\epsilon }(t)\right)\). Finally, the energy functional has the following form [24]:
with \(w_{\epsilon }(t) = S_{\epsilon }(t)  2S_{\epsilon }(2t) = \frac {1}{1 + e^{\frac {t1}{\epsilon }}}  \frac {2}{1 + e^{\frac {2t1}{\epsilon }}}\). For illustration, we present in Fig. 7 the plot of w_{ ε } whose the term \(w_{\epsilon }\left (\ \mathbf {x}  {\mathbf {x}_0} \_{{\Gamma }}^{2}\right)\) is nothing else than a smooth approximation of the piecewise constant function . These weights are very similar to those described in [38] and based on the \(\arctan \) function.
Calculation of partial derivatives
By applying the derivation rules of composite functions, the partial derivatives of the energy with respect to each ellipse parameter \(\left \lbrace \mathbf {x}_{0}, a, b, \theta \right \rbrace \) are given by:
and the calculation of partial derivatives of \(\\mathbf {x}  {\mathbf {x}_0} \_{{{\Gamma }}}^{2}\) are detailed in the Additional file 1. As depicted in Fig. 8c, for a given parametrization {x_{0},a,b,θ}, the term \(w_{\epsilon }'\left (\ \mathbf {x}  {\mathbf {x}_0} \_{{{\Gamma }}}^{2} \right)\) vanishes for most of points x. Thus, the computation of the partial derivatives J(u,Γ) takes account only few points near the ellipse boundaries where \(w_{\epsilon }'\left (\.  {\mathbf {x}_0} \_{{{\Gamma }}}^{2} \right)\) is nonzero. Our smooth approximation which is adapted for discrete images produces similar expressions of the partial derivatives of the ellipse energy when comparing with those described in [51] for continuous images. It can be viewed as the expression of the Green’s theorem in the discrete setting and an alternative to the optimization presented in [44].
Multiellipse segmentation for multitissue core analysis
Let {c_{ n }}_{1≤n≤N} be the centroids of the connected components of the binary detection map \(I_{_{\text {FA}}}\). In the original image u, we extract a rectangular patch P_{ n } centered at c_{ n } with a radius ρ larger than the given tissue core radius r_{ core } (for example, ρ=2r_{ core }). Let us define
where x=(x,y)∈P_{ n }, \(\\mathbf {x}\_{\infty } = \sup (x,y)\) and \(\phantom {\dot {i}\!}\Pi _{\rho, \mathbf {c}_{n}} \cdot \) denotes the patch extraction operator with center c_{ n } and radius ρ. In order to perform a multiobject segmentation, we consider the following multiellipse optimization problem:
where \(\left \lbrace {\mathbf {x}_{0,n}}, a_{n}, b_{n}, \theta _{n} \right \rbrace \) are the parameters of the ellipse Γ_{ n } and Υ is a set of constraints to ensure the ellipses fall into an acceptable range of configurations. In practice, we typically set
for some predefined values \(\rho _{\max }, r_{\min }, r_{\max }, \theta _{\min }, \theta _{\max }\) set according to the extracted patch positions and the allowed sizes and orientations of tissue cores. The constraint \(\left \ {\mathbf {x}_{0,n}}  {\mathbf {x}_{0,n'}} \right \_{2} > \rho \) which prevents the distance between two ellipse centers being too close helps to avoid the overlapping of segmented tissue cores. In what follows, we denote \(\mathcal {J}(u, \Gamma _{1}, \ldots, \Gamma _{n})\) the global cost function associated with the optimization problem (18).
By construction, the function \(\mathcal {J}(u, \Gamma _{1}, \ldots, \Gamma _{n})\) is differentiable with respect to \(\left ({{\Gamma }}_{1}, \hdots, {{\Gamma }}_{N} \right)\). The common way to minimize \(\mathcal {J}(u, \Gamma _{1}, \ldots, \Gamma _{n})\) under the constraint set Υ is to use a gradient method whose performance depends on how efficient is the computation of the gradient of \(\mathcal {J}(u, \Gamma _{1}, \ldots, \Gamma _{n})\). Since \(\mathcal {J}(u, \Gamma _{1}, \ldots, \Gamma _{n})\) is a linear combination of separable functions, therefore, the gradient can be simply obtained as:
where
and the expressions of its partial derivatives are given in (17).
The result of the multiellipse optimization problem (18) is a set of ellipses \(\left \lbrace {{\Gamma }}_{n} \right \rbrace _{1 \leq n \leq N}\phantom {\dot {i}\!}\) which fits the objects located in the regions of interest \(\lbrace \Pi _{\rho,\mathbf {c}_{n}} u \rbrace _{1 \leq n \leq N}\phantom {\dot {i}\!}\). Furthermore, given the major axes of these ellipses and the TMA core radius r_{ core }, we discard the tiny, giant and flattened ellipses and we keep those which are most similar to the expected tissue cores. The center of the selected ellipse allows us to determine the position of the recognized TMA core. This reference position will be used to determine the array coordinates of the corresponding tissue core. In the following, we denote \(\mathcal {X}_{0} = \lbrace {\mathbf {x}_{0,n}} \rbrace _{n \in \lbrace 1, \hdots, N \rbrace }\) as the set of centers of the N reliable and selected ellipses.
Estimation of array coordinate and TMA core positions
An ideal TMA is the one which has tissue cores perfectly aligned in both horizontal and vertical directions and equally spaced according to a regular square grid. The array coordinate \(\mathbf {p} = (k, l) \in \mathbb {Z}^{2}\) of a core can be simply obtained by drawing two orthogonal lines crossed at the considered core position. However, due to the deformation of the design TMA grid, the lines passing through tissue cores and their nearest neighbors may be slightly inclined with respect to the horizontal or vertical axes. Moreover, the direction of these lines may have a large spectrum of variations which makes more challenging the tracking of tissue cores over a given direction. To deal with this deformation, existing TMA dearraying methods use usually distanceandanglebased criteria for the purpose of defining the neighborhood of TMA cores. Although this approach estimates robustly the average coretocore distance and the two principal directions of the deformed core grid, it may fail for some welldetected cores whose the position is strongly distorted with respect to their neighbors. In order to avoid this failure, we introduce an algorithm for estimating iteratively the deformation of the TMA grid in a way that the grid which is warped by the estimated deformation at an iteration gets closer to the observed TMA grid. To this end, we assume that the deformation of the TMA grid can be decomposed by linear and nonlinear parts. Under this assumption, we estimate the linear part of the deformation by defining an oblique grid (affine warping) which is derived from the detected core positions as the initialization of the warped grid (see Fig. 9). The latter is used to find nearby cores that will be taken into account to compute an estimator of the grid deformation by using the thinplate interpolation [25] if we do an analogy with material deformation.
Estimation of the linear deformation
Our goal is to approximate the distorted TMA grid Λ (which is observed partially with the set of point \(\mathcal {X}_{0}\)) by an oblique grid Λ_{0} which minimizes the distance between them in the way that the deformation of the grid is approximated by a 2D affine transform. For this purpose, we consider the set \({\mathcal {C}_0}\) of core pairs whose each pair \(\phantom {\dot {i}\!}({\mathbf {x}_{0,n}}, {\mathbf {x}_{0,n'}})\) is formed by an element of \(\mathcal {X}_{0}\) and one of its four nearest neighbors with respect to the Euclidean distance
where \(\mathcal {N}({\mathbf {x}_{0,n}})\) denotes the 4neighborhood of x_{0,n}. To estimate the average coretocore distance \(\bar {d}_{cc}\), we compute the trimmed mean (denoted TM) of the length of the segment defined by the pair \(\phantom {\dot {i}\!}({\mathbf {x}_{0,n}}, {\mathbf {x}_{0,n'}})\) of \({\mathcal {C}_0}\) by discarding the most extreme values (typically 30%):
Let \(\text {ang}({\mathbf {x}_{0,n}}, {\mathbf {x}_{0,n'}})\phantom {\dot {i}\!}\) be the angle between the line passing through \(({\mathbf {x}_{0,n}}, {\mathbf {x}_{0,n'}})\phantom {\dot {i}\!}\) and the horizontal axis such that \(0.25{\pi } \leq \text {ang}({\mathbf {x}_{0,n}}, {\mathbf {x}_{0,n'}}) \leq {0.75\pi }\phantom {\dot {i}\!}\). By analogy, we define the two principal angles of the deformed TMA grid as follows:
Finally, we denote \(\hat {\mathbf {t}}\) as the global translation of the distorted TMA grid with respect to the origin that minimizes the distance between the set \(\mathcal {X}_{0}\) and the linearlyestimated grid Λ_{0}
where \(\mathcal {F}\) maps each array coordinates \(\mathbf {p} \in \mathbb {Z}^{2}\) to a position of Λ_{0} corrected by \(\hat {\mathbf {t}}\) and
Note that \(\mathbf {M}_{\bar {\alpha }, \bar {\beta }}\) is a changeofbasis matrix of unit column vectors and \(\bar {d}\) is a scaling factor which transforms array coordinates (elements of \(\mathbb {Z}^{2}\)) to real spatial positions (in \(\Omega \subset \mathbb {R}^{2}\)). The resulting oblique grid is parametrized with four parameters \(\{\bar {d}, \bar {\alpha }, \bar {\beta }, \hat {t}\}\) and represents the affine part of the grid deformation. We thus arrive at the affine mapping function: \(\mathcal {A}(\mathbf {p}) = \hat {\mathbf {t}} + \mathcal {F}(\mathbf {p}) \in \Lambda _{0}\). The oblique grid Λ_{0} will serve as initialization to estimate of the nonlinear deformation of the grid. Figure 9 illustrates an example showing the oblique grid obtained from a given set of points as well as its estimated parameters.
Thinplatebased estimation of the deformation
Let Λ^{⋆} be the ideal design TMA grid with (0,0) as origin and d the ideal distance between two neighboring cores along the horizontal and vertical axes. The mapping is then defined as:
The deformation \({\mathcal {D}}\) maps each point \({\mathbf {y}_{\mathbf {p}}^{\star }} \in {{\Lambda ^{\star }}}\) onto a point \({\mathbf {y}_{\mathbf {p}}} = \mathcal {D}({\mathbf {y}_{\mathbf {p}}^{\star }})\) in the distorted grid Λ. In order to estimate the deformation \({\mathcal {D}}\) at all points of the grid Λ^{⋆}, we aim at approximating this set from the observed set \(\mathcal {X}_{0} = \{{\mathbf {x}_{0,n}}\} \) by using the thinplate splines as an interpolant. Indeed, given a set of points \({\mathcal {D}}^{1} \mathcal {X}_{0} = \left \lbrace {\mathcal {D}}^{1} \left ({\mathbf {x}_{0,n}} \right) \right \rbrace _{n \in \lbrace 1, \ldots, N \rbrace }\), the coefficients of the interpolating thinplate splines are the minimizers of a quadratic function which is the first approximation of the bending energy of the mapping from \({\mathcal {D}}^{1} \mathcal {X}_{0}\) to the set of target points \(\mathcal {X}_{0}\) (see [25]). Nevertheless, unlike the usual framework [25], the correspondence between the two sets of points is not established, that is \({\mathcal {D}}^{1} \mathcal {X}_{0}\) is unknown. Instead of investigating a matching method to determine \({\mathcal {D}}^{1} \mathcal {X}_{0}\), we propose to build a sequence of grids {Λ^{(m)}}_{m≥0} which evolves iteratively to fit \(\mathcal {X}_{0}\). We initialize this sequence with the oblique grid Λ^{(0)}=Λ_{0} previously computed. The linear approximation of \({\mathcal {D}}\) is then as follows:
At iteration m, a core position \({\mathbf {x}_{0,n}} \in \mathcal {X}_{0}\) is associated to a position \({\mathbf {y}_{\mathbf {p}}^{\star }}\) if the former is located within a radius δ from \({\mathbf {y}_{\mathbf {p}}}^{{{(m)}}} = {\mathcal {D}}^{{{(m)}}}({\mathbf {y}_{\mathbf {p}}^{\star }})\). Pairs of associated positions establish therefore the correspondence between the ideal grid Λ^{⋆} and the set of observed point \(\mathcal {X}_{0}\). We also note that all the positions of Λ^{⋆} do not have a corresponding position in \(\mathcal {X}_{0}\) as shown in Fig. 10 mainly because the cardinality of these sets are not the same. Let \(\mathcal {P}^{{{(m)}}}\) be the set of pairs of associated positions:
Assume that N^{(m)} is the number of elements of \(\mathcal {P}^{{{(m)}}}\). The objective is to estimate the deformation \(\mathcal {D}^{{{(m)}}}\) from the set of N^{(m)} associated pairs \(({\mathbf {x}_{0,n}}, {\mathbf {y}_{\mathbf {p}}^{\star }})\). According to [25], we define the Gram’s matrix \(\left \lbrace \mathbf {K}^{{{(m)}}}_{n, n'}\right \rbrace _{1 \leq n, n' \leq N^{{{(m)}}}}\) as follows:
and the additional matrices as:
and \(\mathbf {W}^{{{(m)}}} \,=\, \left (\!\mathbf {w}_{1}^{{{(m)}}}, \mathbf {w}_{2}^{{{(m)}}}, \hdots, \mathbf {w}_{N^{{{(m)}}}}^{{{(m)}}}\!\right)^{\top }\). By using the entries of the matrix W^{(m)}, the estimators of the deformation \({\mathcal {D}}\) and of the grid Λ at the next iteration m+1 are therefore defined as:
This iterative scheme will be stopped at the iteration m^{∗}=m if there are no change between Λ^{(m)} and Λ^{(m+1)}. At convergence, the row and column coordinates of a detected cores of position \({\mathbf {x}_{0,n}} \in \mathcal {X}_{0}\) is simply given by:
Moreover, since the grid \({{\Lambda }}^{(m^{*})}\) is an estimator of the deformed TMA grid Λ which is partially observed in \(\mathcal {X}_{0}\), it can be used as approximated positions to recognize tissue cores which are missed during detection and segmentation processes. Indeed, to refine dearraying result, we perform another multiellipse optimization at the position of remaining nodes of the grid \({{\Lambda }}^{(m^{*})}\). If there are ellipses that meet the size and the roundness criteria of standard cores, we add them to the list of detected core position and adjust the coefficients of the thinplate splines according to the new list. An example of TMA dearraying is depicted in Fig. 14 showing the gain of our method in term of tissue core detection.
Results and discussion
Description of data sets
To evaluate our dearraying ATMAD approach, we selected a number of DNA microarray and tissue microarray images including those which are artificially simulated and those which are acquired in both brightfield and fluorescence modes. The selected images were collected from various sources and can be classified into three data sets.
The first set is a collection of binary images generated by Dr Yinhai Wang in [22] as pseudo TMA slides. This data set was artificially created by taking account of different possible situations occurring during the TMA manufacturing process, including rotations and stretches of the design grid as well as irregularities in the size and the shape of tissue cores. The average core radius is approximately r_{ core }=15 pixels for all images. The whole set of all these simulated images and ground truths can be freely downloaded at https://get.google.com/albumarchive/117531880452844036890.
The second data set is composed of color TMA images from the AIDS and Cancer Specimen Resource (ACSR) Digital Library of the University of California San Francisco (http://acsr.ucsf.edu). This online library – managed and visualized by Aperio’s WebScope software – contains several hundreds of tissue specimens which are mostly stained with H&E (Hematoxylin and Eosin) stain and are imaged by brightfield microscopy technique. For this experiment, we considered downsampled version (with the magnification between 0.4X and 0.6X) of the original images hosted on ACSR’s server in order to reduce the processing time. The considered resolutions correspond to images of approximately 1000×1000 pixels, on which the TMA cores have radius of only a few dozen pixels but it is sufficient for our approach to localize them.
The third set for the evaluation includes fluorescence highdynamicrange (HDR) images showing DNA microarray and tissue microarray slides. Provided by the courtesy of Innopsys company, these HDR images which were saved in 16bitTIFF format were acquired using a scanner called InnoScan 1100AL (see https://www.innopsys.com/en/lifesciencesproducts/microarrays/innoscan/innoscan1100al for more details). The latter which is equipped with three excitation lasers (488 nm, 532 nm and 625 nm compatible with cyanine dyes such as Cy2, Cy3 and Cy5 respectively). It can perform simultaneously the acquisition on each excitation channel and provides up to three color fluorescence images. The maximal scan area supported by the mentioned device is 22×74 mm^{2} corresponding to the size of typical microscopy slides used in most biological laboratories nowadays. For the same reason with ACSR’s images, we selected typical images acquired by this Innopsys’s scanner with spatial resolutions varying in a range from 10 to 40 μm per pixel in this experiment instead of using those with higher resolution (up to 0.5 μm per pixel or a 20X magnification equivalently). Indeed, considering such images of low resolution and small size as input data not only enables efficient and lowmemoryrequirement processing but also requires very short scanning time – less than just five minutes with a resolution of 10 μm per pixel when compared with typically several hours of acquisition at submicrometer resolutions.
Regarding the complexity of the data sets, it contains difficult cases such as irregular and nonrounded shapes, fragmented cores as well as low contrasts between image background and foreground. Sophisticated array design with incomplete (missing cores) rows and columns is also present in the image set for the purpose of testing the robustness of our dearraying approach (see Figs. 11, 12, 13 and 14).
Experimental results and algorithm evaluation
For the first and second data sets which contain images with dark spots and bright background, we firts performed a color inversion before further processing. The dearraying procedure was directly applied on binary and grayscale images. Multichannel color images as in the case of ACSR’s data require a conversion to grayscale such as a simple average over all channels which we used in these experiments.
In order to evaluate the performance of our ATMAD algorithm, we analyzed the obtained results by considering two criteria: (i) the rate of samples which are successfully localized and (ii) the rate of samples whose array coordinates are correctly estimated. To that end, the dearraying ATMAD outcome was compared with the groundtruth provided by the simulated dataset or by manual annotation of realworld TMA images. The comparative similarity between the dearraying results and groundtruths (simulation, annotation) is quantitatively measured by these six following metrics:

Accuracy: \(\mathrm {A} = \frac {\text {TP} + \text {TN}}{\text {TP} + \text {TN} + \text {FP} + \text {FN}}\),

Precision: \(\mathrm {P} = \frac {\text {TP}}{\text {TP} + \text {FP}}\),

Recall (sensitivity): \(\mathrm {R} = \frac {\text {TP}}{\text {TP} + \text {FN}}\),

Fscore: \(\mathrm {F} = 2\frac {\mathrm {P}\mathrm {R}}{\mathrm {P} + \mathrm {R}}\),

Gscore: \(\mathrm {G} = \sqrt {\mathrm {P} \mathrm {R}}\),

Jaccard coefficient: \(\text {JSC} = \frac {\text {TP}}{\text {TP} + \text {FP} + \text {FN}}\).
“True Positive” (TP) denotes the number of true tissue samples (cores) which are correctly localized, or those whose array coordinates are correctly estimated. “False Negative” (FN) denotes the number of true cores which are not successfully localized (due to non detection or failed segmentation), or those whose array coordinates are not estimated. “False Positive” (FP) denotes the number of cores which are wrongly localized (due to false detection), or those whose array coordinates are wrongly estimated. “True Negative” (TN) denotes the number of “empty” spot positions (no core is placed) where no core is wrongly localized.
To better appreciate the impact of the components (or modules) of our dearraying approach, the performance was evaluated under four different setting options (see also Table 1):

Option #1:
deactivation of ellipsebased segmenta
tion and nonlinear registration modules,

Option #2:
activation of ellipsebased segmentation
module and deactivation of nonlinear
registration module,

Option #3:
deactivation of ellipsebased segmenta
tion module and activation of onlinear
registration module,

Option #4:
activation of ellipsebased segmentation
and nonlinear registratione modules.
When the ellipsebased segmentation module is deactivated, the spot localization is performed only with the waveletbased detection method. Consequently, the process of removal of unreliable detected cores based on the size and the shape criteria is then disable, and the refinement of dearraying result using estimated positions of the deformed TMA grid can not be performed. Meanwhile, the deactivation of the nonlinear registration module implies that the grid deformation is assumed to be approximated by an affine (linear) transform. It could result in a noncoincidence between core positions and estimated positions of the deformed grid for most of cores. A distancebased matching is thus necessary to establish the correspondence of each core position and its array coordinates. To allow a stepbystep evaluation of the performance, besides the final dearraying result, intermediary results of the dearraying procedure were also carefully analyzed.
For a comparative evaluation, we also provide dearraying results on simulated images (which are generated using the deformation model described in [22]) obtained with the Wang’s method [22] – the stateoftheart method for TMA brightfield image dearraying – and compare these results to those obtained with the proposed ATMAD method. Unfortunately, it was not possible to apply the method [22] on realworld images since the software and code are not available. In Table 2, the average performance obtained on each dataset as well as on each example is shown in Figs. 11, 12, 13 and 14. We notice that, except the precision and recall scores which are not in agreement in certain cases, the Accuracy, Fscore, Gscore and Jaccard metrics yield consistent results about the effectiveness of the dearraying method. Accordingly, we will focus on the Fscore metric in the next sections for the sake of simplicity. The results with all the metrics are reported in Table 2.
Simulated images
We evaluated our ATMAD method applied to the Wang’s dataset and we compared the results with those obtained with the method described in [22].
An example of dearraying result with different levels of deformation is illustrated in Fig. 11. The top row shows the original images. The two middle rows show the dearraying outcomes obtained with deactivation and activation of the segmentation respectively (the nonlinear estimation for the deformation is activated in both cases). These two cases correspond to the Option #3 and #4 respectively, as reported in Table 2. The recognized spot positions are marked by green boxes and correctly aligned in a array to facilitate localization and identification. The bottom row of Fig. 11 shows the groundtruth provided by the authors of the dataset.
As expected, in the case of simulated images when the background is constant, our method provided a perfect Fscore = 1 (corresponding to an accuracy of 100%) in average with the Option #3 even if the localization of spots is only performed with the waveletbased detection method. On the second row of Fig. 11 showing the dearraying results obtained on two typical examples with the deactivation of the ellipsebased segmentation method, we notice that all the spots are successfully recognized and the array coordinates are correctly estimated. The results are similar to those obtained with the Wang’s method [22] (for more details, see Table 2). Meanwhile, the dearraying results obtained with the Option #4 achieved a slightly lower Fscore F=0.98 (corresponding to an accuracy of 95%) in average. This score is a direct consequence of the fact that all existing spots were not recognized by the spot localizer due to segmentation failure or elimination. As depicted in Fig. 11e and f, the too small, too large and too elongated spots are not taken into account in the final dearraying results. This behavior is confirmed by a lower Recall value which measures the sensitivity of the method (R=0.95 in average compared to the perfect score R=1 obtained with Option #3). Although, despite a smaller number of correct spot positions, the estimation of the array coordinate yielded exact results for successfully recognized spots (Precision value P=1 in average) comparing with the groundtruth. In terms of deformation estimation, the estimated potential spot positions provided by the dearraying with two setting options are almost identical. It thus allows us to localize spots which were not recognized and demonstrates the robustness of our method for estimating the grid deformation.
Regarding the two remaining options (not illustrated in Fig. 11), when both the segmentation and nonlinear estimation modules are deactivated (Option #1), ATMAD produced surprisingly very satisfying dearraying results with a Fscore = 0.96 in average on this set of simulated images (see Table 2). This score which is slightly lower than those obtained with Option #4 is due to the monotone and nonoscillating nature of the deformation model used to generate the test images, as described in [22] and illustrated in Fig. 11a and b. Meanwhile, the combination of the segmentation activated and the nonlinear estimation deactivated (Option #2) yielded strongly inferior results. The Fscore in average is barely 0.91 (corresponding to an accuracy level of 84%). It is mainly due to lower rate of correctly localized spots, implying inaccurate linear estimation for the deformation.
Moreover, we point out that when there is no false positive (i.e. FP = 0 implies P = 1), the Jaccard similarity coefficient (JSC) coincides with the Recall (R) value. This explains why we have obtained the same values for these two performance measures on this set of simulated images. We also observe similar behaviors in some cases on brightfield and fluorescence images when the method tends to eliminate all false detections during the localization step.
Brightfield images
We have noticed that in the previous experiments with simulated data, our waveletbased detection algorithm was able to localize all spots on images with constant background. In the case of brightfield TMA images whose background is not constant but generally homogeneous, this approach might still be efficient for spot localization since the situation is much more simpler than in fluorescence imaging. In this section, we focus on the evaluation of ATMAD applied to the ACRS dataset with the Options #3 and #4 (see Table 1) to assess the impact of the ellipsebased segmentation algorithm. In Fig. 12, the dearraying result with these two setting options on a H&E stained TMA image containing irregularities in the shape of tissue cores is illustrated. The original input image shown in Fig. 12a is the slide cut #9 of the TMA whose the ID is 550T001101 on ACSR’s database. The dearraying results obtained with the Option #3 and #4 are depicted in Fig. 12c and d respectively. To evaluate the accuracy of these results, we consider in Fig. 12b a reference dearraying obtained by manual annotation. The latter is presented in the same format (i.e. an array representation) as those of the automated dearraying outcomes to facilitate comparison.
Comparing with an accuracy of 100% obtained on simulated data, localization only based the wavelet method achieved in average approximately 94% of existing TMA cores on ACSR’s data (corresponding to a Fscore = 0.96 in average). Indeed, it failed generally to recognize cores with inner hole or cores which are split into parts (see Fig. 12c) since the shape of these cores implies that the wavelet coefficients at their position are lower than the detection threshold – resulting to non detection. Activating the segmentation module does not improve successful recognition rate of the localization step due to the use of detected core position for initializing the ellipse fitting. In our interest, the main role of this module in the localization step is to measure the size and the roundness of detected objects in order to eliminate false detection and to provide reliable input for the estimation of the grid deformation. For this reason, only about 87% of existing cores were correctly recognized during the localization step (corresponding to Fscore = 0.91 in average) with the combination of the detection and the segmentation modules due to the segmentation failure and the elimination of outliers. In spite of the difference between the localization results obtained with the deactivation/activation of the segmentation module, the nonlinear estimation of the grid deformation using these results however yielded similar dearraying outcomes as illustrated in Fig. 12c and d. The overall accuracy of the dearraying procedure with the activation of the ellipsebased segmentation module is approximately 93% (corresponding to Fscore = 0.95 in average) compared to 87% (corresponding to Fscore = 0.91 in average) if the module is activated (see Table 2). Under the latter setting options, the final recognition rate of tissue cores has increased by about 6% with respect to the rate obtained after the localization step. This improvement is due to the segmentation performed using the potential position which is provided by the estimation of the grid deformation to recognize missed cores during the first step of the dearraying procedure (for example, some fragmented cores or cores with inner hole were additionally recognized as shown in Fig. 12d in comparison with Fig. 12c). This approach is useful, not only for brightfield images, but also in the case of fluorescence images, in which the contrast between the background and the foreground is often significantly weaker.
For the two remainder options (Options #1 and #2), ATMAD produced slightly inferior scores when compared to those obtained with the Options #3 and #4. It is due to the imprecise estimation of tissue core positions computed with affine registration of the grid. Quantitative similar results were observed in the case of simulated images as reported in Table 2.
Fluorescence images
In this section, we evaluated ATMAD on a more challenging image dataset which is acquired by fluorescence scanners and characterized by high noise level and nonhomogeneous background. Unlike simulated and brightfield images depicting tissues, fluorescence images provided by Innopsys company, are composed of both DNA microarray and TMA images. Examples of DNA and TMA image dearraying are respectively shown in Figs. 13 and 14.
For the illustrated DNA microarray, we presented in Fig. 11 only the original image, the final dearraying result and the corresponding manual annotations. Whereas, intermediate results were additionally illustrated in Fig. 12 besides the original image as well as the final result and the ground truth in the case of TMA image to allow stepbystep evaluation.
As expected, the proposed ATMAD method achieved 100% accuracy (corresponding to the perfect Fscore = 1) in average on DNA microarray images under all four considered setting options (see Table 2 and Fig. 13). This perfect score was obtained due to the regularity of the size, the shape and the grid of spotted DNA samples which facilitates the localization and the estimation of the array coordinates of each spot.
It is however not possible to reach such performance scores on TMA images in most cases because of the deformation of TMA grid and the irregularities of TMA cores. Indeed, when the segmentation module is deactivated (Options #1 and #3), the localization of TMA cores estimated with only the waveletbased detection method, often suffers from false positives because erroneous detection of irrelevant objects on the background is not eliminated.
False detection mostly occurs in images with complex background such as those illustrated in Figs. 14a and b. Note that in the case of fluorescence TMA images, the number of false positives is significantly larger than in the case of simulated and brightfield TMA images. On the other hand, tanks to the adaptive threshold derived from the wavelet transform, there is in general no false negative (i.e. all existing tissue cores were detected). These results demonstrate that the detection operation is not too sensitive (perfect recall score R=1 in average), but also it is not precise enough (weak precision score P=0.77 in average) in fluorescence imaging. Consequently, it lowered the overall performance of TMA core localization. In Table 2, the accuracy is only about 79% (corresponding to a Fscore = 0.87) in average. Note that the linear transform estimation (Option #1) using the set of localizations with false positives, yielded satisfying dearraying results (with an accuracy of 91% or Fscore = 0.94 in average), mainly because robust estimators are used for TMA grid registration. Nevertheless, in some cases the nonlinear transform estimation (Option #3) was unable to correctly handle erroneous inputs and to produce reliable dearraying results.
In order to reduce the number of false positives during the localization step, we combined the waveletbased detection method with the ellipsebased segmentation method. Despite lowlight fluorescence imaging conditions and low contrast in images, the multicore ellipsebased segmentation perfectly performed with a rate of 100% of successful segmentation over all detected positions. The segmentation procedure provided reliable features of the object found at each detected position (see Fig. 14c). By combining the detection and the segmentation modules, the localizer gave better results; in average, the overall accuracy is about 91% (Fscore = 0.94) to be compared to only 79% when the ellipsebased segmentation module is not activated (see Table 2). Given these precise localization results, the nonlinear transform estimation produced satisfactory outcomes; the row and column coordinates of most existing TMA cores were accurately computed (Fig. 14f). In average, the dearraying with activation of both the ellipsebased segmentation and the nonlinear transform estimation modules (Option #4) achieved a Fscore = 0.96 (corresponding to an accuracy of 92%), which is sightly better than those obtained with Option #1 (Fscore = 0.95). We also notice a gain of about 1% in terms of overall accuracy (0.01 in terms of Fscore performance) comparing to the localization step. The improvement between the two steps of the dearraying procedure demonstrates the positive influence of the ellipsebased segmentation module on the overall performance of the proposed ATMAD method.
To sum up, the proposed dearraying method rarely achieves perfect scores in the case of real (brightfield and fluorescence) images (except those obtained on DNA microarrays) in comparison to simulated images. This weaker performance is often due to the insufficient number of localized cores obtained on images with complex nonhomogeneous background and/or highly irregular shapes of tissue cores. Consequently, we get imperfect dearraying results which represent only array coordinates of each core. In spite of these imperfections, we have noticed that the spline approximation of the grid deformation yields, in most cases, accurate core position. More sophisticated segmentation algorithms can be used to further localize cores which were not recognized, and thus refine dearraying results.
The majority of the time computing is spent on the detection task to compute the wavelet transform. Overall, the computational cost is less than 5 s for dearraying a 1000×1000 image. The experiments were performed on a Macbook Pro equipped with 2.7 Ghz Intel Core i7, 16 Gb of RAM and the Mac OS X v. 10.12.4 operating system. The algorithm was implemented in Matlab and we exploited the intrinsic parallelism of the CPU by performing many ellipsebased segmentation in parallel.
Conclusion
This paper introduced a fast and efficient algorithm for dearraying TMA by combining wavelet transform, active contour and thinplate interpolation. The proposed ATMAD algorithm is adapted not only for brightfield images but also for fluorescence images which are more challenging in terms of tissue localization due to complex backgrounds. This difficulty is carried out by a twostep approach: a fast detection followed by a careful segmentation to reduce the number of false alarms. The row and column coordinates of each localized tissue core are next computed by estimating the deformation of the design grid. Using the estimation of the deformation, tissue cores which are missed during localization can be later recognized and it refine thus the dearraying result.
Abbreviations
 DOG:

Difference of two Gaussians
 FA:

False Alarm
 HDR:

High dynamic range
 TMA:

Tissue MicroArray
 ATMAD:

Automatic tissue microarray dearraying
References
Battifora H. The multitumor (sausage) tissue block: novel method for immunohistochemical antibody testing. Lab Invest. 1986; 55(2):244–8.
Wan WH, Fortuna MB, Furmanski P. A rapid and efficient method for testing immunohistochemical reactivity of monoclonal antibodies against multiple tissue samples simultaneously. J Immunol Methods. 1987; 103:121–9.
Battifora H, Mehta P. The checkerboard tissue block. An improved multitissue control block. Lab Invest. 1990; 63:722–4.
Kononen J, Bubendorf L, Kallionimeni A, Bärlund M, Schraml P, Leighton S, Torhorst J, Mihatsch MJ, Sauter G, Kallionimeni OP. Tissue microarrays for highthroughput molecular profiling of tumor specimens. Nat Med. 1998; 4:844–7.
Gillett CE, Springall RJ, Barnes DM, Hanby AM. Multiple tissue core arrays in histopathology research: a validation study. J Pathol. 2000; 192:549–53.
Chan JK, Wong CS, Ku WT, Kwan MY. Reflections on the use of controls in immunohistochemistry and proposal for application of a multitissue springroll control block. Ann Diagn Pathol. 2000; 5:329–36.
Schoenberg Fezjo M, Slamon DJ. Frozen tumor tissue microarray technology for analysis of tumor RNA, DNA and proteins. J Pathol. 2001; 150:1645–1650.
Packeisen J, Buerger H, Krech R, Boecker W. Tissue microarrays: a new approach for quality control in immunohistochemistry. J Clin Pathol. 2002; 55:613–5.
Badve S, Deshpande C, Hua Z, Lögdberg L. Expression of invariant chain (CD 74) and major histocompatibility complex (MHC) class II antigens in the human fetus. J Histochem Cytochem. 2002; 50:473–82.
Hidalgo A, Piña P, Guerrero G, Lazos M, Salcedo M. A simple method for the construction of small format tissue arrays. J Clin Pathol. 2003; 56:144–6.
Wang L, Deavers MT, Malpica A, Silva EJ, Liu J. Tissue macroarray: a simple and costeffective method for highthroughput studies. Appl Immunohistochem Mol Morphol. 2003; 11:174–6.
Dan HL, Zhang YL, Zhang Y, Wang YD, Lai ZS, Yang YJ, Cui HH, Jian YT, Geng J, Ding YQ, Guo CH, Zhou DY. A novel method for preparation of tissue microarray. World J Gastroenterol. 2004; 10:579–82.
Pan CH, Chen H., C ans Chiang. An easy method for manual construction of highdensity tissue arrays. Appl Immunohistochem Mol Morphol. 2004; 12:370–2.
Datta M, Kahler A, Macias V, Brodzeller T, KajdacsyBalla A. A simple inexpensive method for the production of tissue microarrays from needle biopsy specimens: examples with prostate cancer. Appl Immunohistochem Mol Morphol. 2005; 13:96–103.
Montgomery K, Zhao S, van de Rijn M, Natkunam Y. A novel method for making “tissue” microarrays from small numbers of suspension cells. Appl Immunohistochem Mol Morphol. 2005; 13:80–4.
Pilla D, Bosisio FM, Marotta R, Faggi S, Forlani P, Falavigna M, Biunno I, Martella E, De Blasio P, Borghesi S, Cattoretti G. Tissue microarray design and construction for scientific, industrial and diagnostic use. J Pathol Inform. 2012; 3:42.
Vrolijk H, Sloos W, Mesker W, Franken P, Fodde R, Morreau H, Tanke H. Automated acquisition of stained tissue microarrays for highthroughput evaluation of molecular targets. J Mol Diagn. 2003; 5(3):160–7.
Chen W, Reiss M, Foran DJ. A prototype for unsupervised analysis of tissue microarrays for cancer research and diagnostics. IEEE Trans Inform Technol Biomed. 2004; 8(2):89–96.
Dell’Anna R, Demichelis F, Barbareschi M, Sboner A. An automated procedure to properly handle digital images in large scale tissue microarray experiments. Comput Methods Programs Biomed. 2005; 79:197–208.
Rabinovich A, Krajewski S, Krajewska M, Shabaik A, Hewitt SM, Belongie S, Reed JC, Price JH. Framework for parsing, visualizing and scoring tissue microarray images. IEEE Trans Inform Technol Biomed. 2006; 10(2):209–19.
Lahrmann B, Halama N, Westphal K, Ernst C, Elsawaf Z, Sinn P, Bosch FX, Dickhaus H, Jäger D, Schirmacher P, Grabe N. Robust gridding of TMAs after wholeslide imaging using template matching. Cytometry A. 2010; 77(12):1169–76.
Wang Y, Savage K, Grills C, McCavigan A, James JA, Fennell DA, Hamilton PW. A TMA dearraying method for high throughput biomarker discovery in tissue research. PLoS ONE. 2011; 6(6):26007.
Starck JL, Murtagh R. Image restoration with noise suppression using wavelet transform. Astron Astrophys. 1994; 288:342–8.
Thévenaz P, DelgadoGonzalo R, Unser M. The ovuscule. IEEE Trans Pattern Anal Mach Intell. 2011; 33(2):382–93.
Bookstein FL. Principal warps: thinplate splines and the decomposition of deformationa. IEEE Trans Pattern Anal Mach Intell. 1989; 11(6):567–85.
Kervrann C, Sorzano CO, Acton ST, OlivoMarin JC, Unser M. A guided tour of selected image processing and analysis methods for fluorescence and electron microscopy. IEEE J Sel Topics Signal Process. 2016; 10(1):6–30.
Breen EJ, Joss GH, Williams KL. Locating objects of interest within biological images: The top hat box filter. Comput Assist Microsc. 1991; 3(2):97–102.
OlivoMarin JC. Extraction of spots in biological images using multiscale products. Pattern Recognit. 2002; 35(9):1989–96.
Sage D, Neumann FR, Hediger F, Gasser SM, Unser M. Automatic tracking of individual fluorescence particles: Application to thestudy of chromosome dynamics. IEEE Trans Image Process. 2005; 14(9):1372–83.
Zhang B, Fadili MJ, Starck JL, OlivoMarin JC. Multiscale variancestabilizing transform for mixedPoissonGaussian processes and its applications in bioimaging. Proc IEEE Intl Conf Image Process. 2007; 6:233–6.
Smal I, Niessen W, Meijering E. A new detection scheme for multiple object tracking in fluorescence microscopy by joint probabilistic data association filtering. Proc IEEE Intl Symp Biomed Imaging. 2008;264–7.
Basset A, Boulanger J, Bouthemy P, Kervrann C, Salamero J. SLTLoG: A vesicle segmentation method with automatic scale selection and local thresholding applied to TIRF microscopy. Proc IEEE Intl Symp Biomed Imaging. 2014;533–6.
Püspöki Z, Ward JP, Sage D, M U.Fast detection and refined scale estimation using complex isotropic wavelets. Proc IEEE Intl Symp Biomed Imaging. 2015;512–5.
Püspöki Z, Sage D, Ward JP, Unser M. Spotcaliper: fast waveletbased spot detection with accurate size estimation. Bioinformatics. 2016; 32(8):1278–80.
Kass M, Witkin A, Terzopoulos D. Snakes: Active contour models. Intl J Comput Vision. 1998; 1(4):321–31.
DelgadoGonzalo R, Uhlmann V, Schmitter D, Unser M. Snakes on a plane: A perfect snap for bioimage analysis. IEEE Signal Process Mag. 2015; 32(1):41–8.
Zhu S, Yuille A. Region competition: unifying snakes, region growing, energy/Bayes/MLD for multiband image segmentation. IEEE Trans Pattern Anal Mach Intell. 1996; 18(9):884–90.
Chan TF, Vese L. Active contoru without edges. IEEE Trans Image Process. 2001; 10(2):266–77.
Kervrann C, Trubuil A. Optimal level curves and minimizers of cost functionals in image segmentation. J Math Imaging Vision. 2002; 17(2):153–74.
Caselles V, Kimmel R, Sapiro G. Geodesic active contours. Intl J Comput Vision. 1997; 22(1):61–79.
Yuille A, Hallinan PW, Cohen DS. Feature extraction from faces using deformable templates. Intl J Comput Vision. 1992; 8(2):99–111.
Amit Y, Grenander U, Piccioni M. Structural image restoration through deformable templates. J Am Statist Assoc. 1991; 86:376–87.
Grenander U, Chow Y, Keenan DM. Hands – A Pattern Theoretic Study of Biological Shapes. Berlin/Heidelberg/NewYork: Springer; 1991.
Grenander U, Miller M. Representations of knowledge in complex systems. J Royal Stat Soc Ser B. 1994; 56(4):549–603.
Kervrann C, Heitz F. A hierarchical Markov modeling approach for the segmentation and tracking of deformable shapes. Graph Models Image Underst. 1998; 60(3):173–95.
Descombes X. Stochastic Geometry for Image Analysis: WileyISTE, 9781848212404; 2011.
Cootes TF, Taylor CJ, Cooper DH, Graham J. Active shape models  their training and application. CVGIP: Image Underst. 1994; 61:38–59.
Staib LH, Duncan JS. Boundary finding with parametrically deformable models. IEEE Trans Pattern Anal Mach Intell. 1992; 14(11):1061–75.
Thévenaz P, Unser M. Snakuscules. IEEE Trans Image Processing. 2008; 17(4):585–93.
Descombes X. Multiple objects detection in biological images using a marked point process framework. Methods. 2016; 115:2–8.
Pediredla AK, Seelamantula CS. A unified approach for optimization of snakuscules and ovuscules. Proc IEEE Intl Conf Acoustics, Speech, and Signal Process. 2012;681–4.
O’Sullivan F, Qian Q. A regularized contrast statistic for object boundary estimation  implementation and statistical evaluation. IEEE Trans Pattern Anal Mach Intell. 1994; 16(6):561–70.
Acknowledgements
We would like to show our gratitude to Dr Jinhai Wang (Queen’s University Belfast, Belfast, United Kingdom) and the AIDS and Cancer Specimen Resource Digital Library (University of California San Francisco) for sharing their valuable tissue microarray images which were used to evaluate the proposed approach.
Availability of data and materials
The selected images were collected from various sources and can be classified into three data sets.
∙ The first set is a collection of binary images generated by Dr Yinhai Wang in [22] as pseudo TMA slides. This data set was artificially created by taking account of different possible situations occurring during TMA manufacturing process. The simulated images and ground truths can be freely downloaded at https://get.google.com/albumarchive/117531880452844036890.
∙ The second data set is composed of color TMA images from the AIDS and Cancer Specimen Resource (ACSR) Digital Library of the University of California San Francisco (http://acsr.ucsf.edu). This online library – managed and visualized by Aperio’s WebScope software – contains several hundreds of tissue specimens which are mostly stained with H&E (Hematoxylin and Eosin) stain and are imaged by brightfield microscopy technique.
∙ The third set includes fluorescence highdynamicrange (HDR) images showing DNA microarray and tissue microarray slides. These images were acquired using a scanner called InnoScan 1100AL (see https://www.innopsys.com/en/lifesciencesproducts/microarrays/innoscan/innoscan1100al) and were provided by the courtesy of Innopsys company.
Author information
Authors and Affiliations
Contributions
All the authors contributed to the method design. VP and CC acquired the HDR images using a scanner called InnoScan 1100AL. HNN implemented the algorithm and carried out the experiments. HNN and CK wrote the paper. All authors have read and approved the final manuscript.
Corresponding author
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
This work was done when H.N. Nguyen (Ph.D. student) was with Inria and his PhD thesis was funded by Innopsys company. The authors’ declare that they have no competing interests.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Additional file
Additional file 1
Isotropic wavelet frame. Direct wavelet decomposition algorithm and reconstruction. Partial derivatives of the ellipse quadratic form. (PDF 255 kb)
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver(http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
About this article
Cite this article
Nguyen, H., Paveau, V., Cauchois, C. et al. ATMAD: robust image analysis for Automatic Tissue MicroArray Dearraying. BMC Bioinformatics 19, 148 (2018). https://doi.org/10.1186/s1285901821118
Received:
Accepted:
Published:
DOI: https://doi.org/10.1186/s1285901821118