The biomedical sciences are fraught with uncertainty, and its sources are manifold. Devices used to monitor biological processes vary in resolution. Gaps in our understanding of basic biology compound this problem. Biological diversity or heterogeneity may make predictions difficult. Finally, uncertainty may stem from unpredictable sources of noise, which can lie inside or outside the biological system itself.

In molecular biology uncertainty is ubiquitous; for example, tissue heterogeneity makes it difficult to compare a tissue sample composed of a pure tumor cell population with one composed of tumor and other non-tumoral elements such as supporting structural tissue (i.e. stroma) and vessels. Moreover, one can rarely examine an entire tumor, and biopsies are taken with the assumption that they represent a portion of the whole tumor.

This paper addresses the uncertainty associated with measuring replicate samples. Understanding this kind of uncertainty would help guide decision making and allow alternate strategies to be explored. Usually, the measurements of replicate samples are averaged to derive a single measurement. This value is then used, for example, when building a classification system that may play a critical role in the decision-making process. Unfortunately, an average (or median) measurement hides the uncertainty or heterogeneity present in the replicates, and may thus lead to decision rules that are too reliant on this pooled data. This process may lead to a model that is not sufficiently robust to work on an independent dataset.

TMA studies represent a context where the issue of biological heterogeneity is particularly relevant. Whereas gene expression microarray experiments provide researchers with a quantitative evaluation of transcripts, TMAs evaluate DNA, RNA or protein targets through *in situ* investigations (analyses performed on tissues). *In situ* evaluations preserve the morphology of the analyzed samples, so the potential biological heterogeneity within tumor tissue can be examined. TMAs allow for large-scale profiling of tissue samples. For example, TMAs can be used to investigate panels of proteins that may play a role in tumor progression. They have the potential to be easily translated to clinical applications such as the development of diagnostic biomarkers, e.g., AMACR [1], or the assessment of a therapeutic target, e.g., Her-2/neu [2].

TMA datasets usually include replicate core biopsies of the same tissue from the same individual, both to ensure that enough representative tissue is available in each experiment and to better represent the biological variability of the tissue itself and of the protein activity (i.e. accurate sampling). Most TMA datasets are evaluated by straightforward pooling of the data from replicates, thus ignoring variation among biopsies from the same patient (the so-called within-sample variability). The mean, the maximum or the minimum is usually adopted, and the strategy may be based on biological knowledge or on known protein associations. However, it has been found that different choices can lead to covariates with different significance levels in Cox regression [3]. Interestingly, when multiple biomarkers are evaluated, a single approach is usually chosen and applied to all of them regardless of the biological implications.
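The three pooling strategies mentioned above can be sketched in a few lines. This is an illustrative example with made-up replicate values, not data from the study; `pool` is a hypothetical helper name.

```python
# Sketch (illustrative, not from the paper): three common pooling
# strategies for replicate TMA core measurements from one patient.
from statistics import mean

def pool(replicates, strategy="mean"):
    """Collapse replicate marker measurements into a single value."""
    if strategy == "mean":
        return mean(replicates)
    if strategy == "max":
        return max(replicates)
    if strategy == "min":
        return min(replicates)
    raise ValueError(f"unknown strategy: {strategy}")

# Three cores from one hypothetical patient; note how any pooled value
# hides the spread (heterogeneity) among the replicates.
cores = [0.2, 0.9, 0.4]
print(pool(cores, "mean"))  # ≈ 0.5
print(pool(cores, "max"))   # 0.9
```

The point of the example is that the three strategies can disagree substantially on heterogeneous replicates, which is exactly the sensitivity reported for Cox regression covariates in [3].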

However, the degree of heterogeneity of the tumor tissue may itself be an important biological parameter. In a probabilistic framework, for example, accounting for the within-sample variability caused by tumor tissue heterogeneity could alter the probability of a case belonging to a certain class (even changing the predicted class), providing insight into the particular case under study. When measurements occur at different levels, i.e. different biopsies of the same tumor or different tumors, standard statistical techniques are not appropriate, because they either assume that the groups belong to entirely different populations or ignore the aggregate information entirely.

Hierarchical models (multilevel models) provide a way of pooling the information from the different groups without assuming that they belong to precisely the same population [4]. They are typically used when related information is available but the observational units differ (e.g., meta-analysis of separate randomized trials).

Herein we propose a classification model that accounts for tumor within-sample variability in a probabilistic framework, relying on Bayesian hierarchical models. Hierarchical Bayes models have been used to model individual effects in several experimental contexts, ranging from toxicology to tumor risk [4]. Their use in a classification context therefore seems particularly well suited to handling TMA data and tumor heterogeneity.

The paper is structured as follows: we first provide relevant background on Bayesian classifiers (specifically the Naïve Bayes classifier) and on Bayesian hierarchical models. We then describe the proposed method and compare its performance to that of a Naïve Bayes classifier to which standard pooling strategies were applied. The results will be shown on simulated datasets characterized by different ratios of within- to between-sample variability, and on a real classification problem based on TMA data we generated in our laboratory from a prostate cancer progression array (TMA Core of Dana Farber Harvard Cancer Center, Boston, MA) developed to identify proteins that can distinguish aggressive from indolent forms of this common tumor type.

### Bayesian classifiers and the Naïve Bayes

In this paper we focus on classification problems, where, given the data coming from a target case, we must decide to which class the case belongs. For example, given the set of tumor marker values measured on biopsies of a tissue of a given patient, we must decide if the patient is affected by a particular kind of tumor.

From a Bayesian viewpoint, a classification problem can be written as the problem of finding the class with maximum probability given a set of observed attribute values. Such probability is seen as the posterior probability of the class given the data, and is usually computed using the Bayes theorem, as *P*(*C*|**X**) = *P*(**X**|*C*) *P*(*C*)/*P*(**X**), where *C* is any of the possible class values, **X** is a vector of *N*_{feature} attribute values, while *P*(*C*) and *P*(**X**|*C*) are the prior probability of the class and the conditional probability of the attribute values given the class, respectively. Usually Bayesian classifiers maximize *P*(**X**|*C*)*P*(*C*), which is proportional to *P*(*C*|**X**), since *P*(**X**) is constant for a given dataset.
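The Bayes-rule computation above can be made concrete with a minimal numerical sketch. The class names, prior and likelihood values below are invented for illustration and are not taken from the paper.

```python
# Minimal sketch (illustrative values): choose the class maximizing
# P(X|C) * P(C), which is proportional to the posterior P(C|X).
priors = {"tumor": 0.3, "normal": 0.7}          # P(C)
likelihoods = {"tumor": 0.08, "normal": 0.01}   # P(X|C) for the observed X

scores = {c: likelihoods[c] * priors[c] for c in priors}
evidence = sum(scores.values())                 # P(X), constant across classes
posteriors = {c: s / evidence for c, s in scores.items()}

print(max(scores, key=scores.get))              # class with maximum posterior
print(round(posteriors["tumor"], 3))            # ≈ 0.774
```

Note that the argmax over `scores` and over `posteriors` coincide, which is why Bayesian classifiers can skip dividing by *P*(**X**).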

Bayesian classifiers are known to be optimal, since they minimize the risk of misclassification. However, they require defining *P*(**X**|*C*), i.e. the *joint* probability of the attributes given the class. Estimating this probability distribution from a training dataset is a difficult problem, because it may require a very large dataset, even for a moderate number of attributes, in order to adequately explore all the possible attribute-value combinations.

Conversely, in the framework of Naïve Bayes classifiers, the attributes are assumed to be independent of each other given the class. This allows us to write, following the Bayes theorem, the posterior probability of the class *C* as: *p*(*C*|**X**) = ∏_{l = 1}^{N_{feature}} *p*(**X**^{l}|*C*) *p*(*C*)/*P*(**X**). The Naïve Bayes classifier is therefore fully defined simply by the conditional probabilities of each attribute given the class. The conditional independence property greatly simplifies the process of learning the model from data; for discrete and Gaussian data this process is straightforward. Despite its simplicity, the Naïve Bayes classifier is known to be a robust method, which on average shows good performance in terms of classification accuracy even when the independence assumption does not hold [5, 6]. Due to its fast induction, the Naïve Bayes classifier is often considered a reference method in classification studies. Several approaches have been proposed to generalize this classifier [7], and there has been recent interest in applying hierarchical models to Bayesian classification, such as in the field of expression array analysis [8, 9]. In this paper we present a step forward, describing a hierarchical Naïve Bayesian model which can be conveniently used for classification purposes in the presence of replicated measurements, such as TMA data.
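A Gaussian Naïve Bayes classifier of the kind described above can be sketched as follows. This is a generic illustration with fabricated two-feature training data, not the paper's model or dataset; `fit` and `predict` are hypothetical helper names.

```python
# Sketch of a Gaussian Naive Bayes classifier: each attribute is modeled
# as Gaussian given the class, and attributes are assumed conditionally
# independent, so the class-conditional likelihood factorizes.
import math
from statistics import mean, stdev

def gaussian_pdf(x, mu, sigma):
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def fit(train):
    """train: {class: [feature vectors]} -> {class: (prior, [(mu, sigma) per feature])}."""
    n = sum(len(rows) for rows in train.values())
    model = {}
    for c, rows in train.items():
        cols = list(zip(*rows))
        model[c] = (len(rows) / n, [(mean(col), stdev(col)) for col in cols])
    return model

def predict(model, x):
    """Return the class maximizing log P(C) + sum_l log p(x_l | C)."""
    def score(c):
        prior, params = model[c]
        return math.log(prior) + sum(math.log(gaussian_pdf(v, mu, s))
                                     for v, (mu, s) in zip(x, params))
    return max(model, key=score)

# Fabricated marker levels for two hypothetical classes.
train = {"aggressive": [[2.1, 5.0], [2.4, 4.8], [1.9, 5.3]],
         "indolent":   [[0.9, 2.1], [1.1, 1.8], [0.8, 2.4]]}
model = fit(train)
print(predict(model, [2.0, 5.1]))  # "aggressive"
```

Working in log space avoids numerical underflow when the product runs over many attributes, a standard implementation choice for Naïve Bayes.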

### Bayesian Hierarchical Models

Bayesian hierarchical models (BHM) [4] are powerful instruments for describing complex interactions between the parameters of a stochastic model. In particular, BHMs are often used to describe population models, in which the parameters characterizing the model of one individual are considered to be related to the parameters of the other individuals belonging to the same population. In this paper we address the problem of classifying tumors of different patients for which repeated measurements of tumor markers are available. The probability distribution of such measurements will depend on the patient; however, all the patients suffering from the same disease will be assumed to be related to each other in terms of their tumor marker probability distributions.

Bayesian hierarchical models provide a natural way to represent this relationship by specifying a suitable conditional independence structure and a suitable set of conditional probability distributions.

The simplest structure of a BHM can be summarized as follows: let us suppose that a certain variable *x* (e.g. a tumor marker) has been measured *n*_{rep} times in *m* patients belonging to the same population, i.e. they have the same tumor type. Let us also suppose that *x* is a stochastic variable depending on a set of parameters *θ*, so that for the *i*-th subject such dependency is expressed by the probability *p*(*x*_{i}|*θ*_{i}). The assumption that the individuals are "related" to each other can then be represented by introducing the conditional probability *p*(*θ*_{i}|*ϕ*), where *ϕ* is a set of hyper-parameters typical of a population. In this way, each subject is characterized by a probability distribution that depends on population parameters common to all individuals of the same population. If we assign a prior distribution to *ϕ*, say *p*(*ϕ*), the joint prior distribution will be *p*(*ϕ*, *θ*) = *p*(*θ*|*ϕ*)*p*(*ϕ*), where *θ* = {*θ*_{1}, *θ*_{2}, ..., *θ*_{m}}. Once a data set **X** = {**X**_{1}, ..., **X**_{m}} is observed on all *m* patients, where **X**_{i} = (*x*_{i1}, *x*_{i2}, ..., *x*_{in_{rep}}) is the measurement vector for the *i*-th patient, the joint posterior distribution *p*(*ϕ*, *θ*|**X**) can be computed by applying the Bayes theorem. It is easy to show that this distribution is proportional to *p*(**X**|*θ*) *p*(*θ*|*ϕ*) *p*(*ϕ*). Since the population parameters are usually unknown, integrating this distribution over *ϕ* yields the posterior distribution of the model parameters *θ* given the data coming from all patients.
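The effect of the hierarchical structure can be illustrated with a minimal numerical sketch. This is an assumption-laden toy example, not the paper's model: it uses a normal-normal hierarchy with variances assumed known, invented replicate values, a crude plug-in estimate of the population mean *ϕ*, and the standard conjugate result for the posterior mean of each *θ*_{i}.

```python
# Toy sketch of partial pooling in a normal-normal hierarchy with known
# variances: each patient's mean theta_i is a precision-weighted average
# of that patient's replicate mean and the population mean phi.
from statistics import mean

# Replicate marker measurements for m = 3 hypothetical patients.
patients = [[1.9, 2.3, 2.0], [0.8, 1.2, 1.0], [3.1, 2.7, 2.9]]

sigma2 = 0.05   # assumed within-patient (replicate) variance of p(x_i | theta_i)
tau2 = 1.0      # assumed between-patient variance of p(theta_i | phi)

# Crude plug-in estimate of phi (a full BHM would put a prior on it).
phi = mean(x for reps in patients for x in reps)

def posterior_mean(reps):
    """Posterior mean of theta_i: shrinks the replicate mean toward phi."""
    n = len(reps)
    w = (n / sigma2) / (n / sigma2 + 1 / tau2)   # weight on the patient's own data
    return w * mean(reps) + (1 - w) * phi

for reps in patients:
    print(round(posterior_mean(reps), 3))
```

With small within-patient variance the estimates stay close to each patient's replicate mean; as `sigma2` grows relative to `tau2`, they are pulled toward the population mean, which is the "pooling without assuming a single population" behavior the text describes.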

BHMs have been applied in a variety of contexts, ranging from signal processing [10] to medicine and pharmacology [11, 12] and to bioinformatics [13–16]. Moreover, several computational techniques for specifying and fitting BHMs have been introduced, including techniques for discrete responses, multivariate models, survival models and time series models; useful reviews can be found in papers and books [17, 18]. Recently, hierarchical models have been applied to extend the Naïve Bayes model in order to relax the assumption of conditional independence between attributes [19]. In our paper we use hierarchical models to handle repeated measurements and their heterogeneity.