### Segmentation

Segmentation of tissue images involves separating a tissue image into individual cells. It is done either by identifying regions with common properties or by identifying contours that delineate regions. A natural way to segment such regions is to threshold the intensity. Thresholding works well for large objects with fairly distinct intensity classes, but not for small objects with blurry edges [16].

Active Contour Models (ACM), first introduced by Kass *et al.*, represent an intelligent way of detecting boundary edges by treating boundaries as inherently connected and smooth structures [20]. An energy is associated with the contour and is designed to decrease as the contour becomes smoother and fits the desired image features more closely. Forces can be designed (or derived from the energy terms) so that the resulting contour deformations reduce the contour's energy. Because of the way the contours slither while minimizing their energy, ACM are also called snakes. The contour energy is the sum of three terms: internal, external and constraint energies. The terms are defined so that the final position of the contour has minimum energy, and the problem of detecting objects therefore reduces to an energy minimization problem. A caveat of active contours is that cells are under-segmented when the border between clustered cells is much brighter than the border between cell and background.
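As a concrete illustration of the intensity thresholding described above, the following is a minimal sketch rather than the procedure used in this work. It assumes Otsu's criterion (maximizing the between-class variance) as the rule for choosing the threshold, which the text does not specify; the function name `otsu_threshold` and the `bins` parameter are hypothetical.

```python
import numpy as np

def otsu_threshold(image, bins=256):
    """Return an intensity threshold that maximizes the between-class variance.

    Assumption: Otsu's criterion is used to pick the threshold; the text only
    says that the intensity is thresholded.
    """
    hist, edges = np.histogram(np.asarray(image, dtype=float).ravel(), bins=bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    weight_lo = np.cumsum(hist)                  # pixels in bins 0..k
    weight_hi = weight_lo[-1] - weight_lo        # pixels in bins k+1..end
    sum_lo = np.cumsum(hist * centers)
    mean_lo = sum_lo / np.maximum(weight_lo, 1)
    mean_hi = (sum_lo[-1] - sum_lo) / np.maximum(weight_hi, 1)
    between = weight_lo * weight_hi * (mean_lo - mean_hi) ** 2
    # Use the upper edge of the best bin so the split matches the two classes.
    return edges[np.argmax(between) + 1]

# Synthetic example: one bright "cell" on a dark background.
yy, xx = np.mgrid[:32, :32]
cell = (yy - 16) ** 2 + (xx - 16) ** 2 <= 64
image = np.where(cell, 0.9, 0.1)
mask = image > otsu_threshold(image)
```

On well-separated intensity classes such as this synthetic image, the mask recovers the bright region. With blurred edges or touching cells a single global threshold is expected to merge objects, which is the limitation noted above.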

Because classical snakes and active contour models rely on an edge function, which depends on the image gradient, to stop the curve evolution, these models can detect only objects whose edges are defined by the gradient [21]. In practice, the discrete gradients are bounded, so the stopping function is never zero on the edges and the curve may pass through the boundary. If the image is very noisy, the isotropic Gaussian smoothing has to be strong, which smooths the edges too. Chan and Vese proposed a different active contour model without a stopping edge function, i.e., a model that does not use the gradient of the image $f(x)$ for the stopping process [12]. The stopping term is instead based on Mumford-Shah segmentation techniques [22]. The energy function of the active contour based on this approach is given by:

$$E(\phi) = \alpha \int_\Omega \delta(\phi)\,|\nabla\phi|\,dx + \lambda_I \int_\Omega (f(x) - c_I)^2\, H(\phi)\,dx + \lambda_O \int_\Omega (f(x) - c_O)^2\, \bigl(1 - H(\phi)\bigr)\,dx$$

where $\phi$ is the level set function defined on $\Omega$, whose zero level set $\{x \in \Omega \mid \phi(x) = 0\}$ defines the segmentation such that $\phi > 0$ is inside the cell and $\phi < 0$ is outside the cell. $c_I$ and $c_O$ are the mean intensities of the pixels inside and outside the zero level set. $H$ and $\delta$ are the Heaviside and Dirac functions. $\alpha$, $\lambda_I$ and $\lambda_O$ are fixed positive parameters.

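The image energy is minimized by evolving the level set over time $t$, starting from an initialization $\phi(t = 0, x)$, according to

$$\frac{\partial \phi}{\partial t} = \delta(\phi)\left[\alpha\,\nabla\cdot\frac{\nabla\phi}{|\nabla\phi|} - \lambda_I\,(f(x) - c_I)^2 + \lambda_O\,(f(x) - c_O)^2\right]$$

where $\nabla\cdot(\nabla\phi/|\nabla\phi|)$ is the mean curvature of the level set, which generates a regularizing force that smooths the contours. The two remaining forces expand or shrink the contour towards the actual boundary of the cells. We segmented the cells using the above method.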
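The level set evolution described above can be sketched as an explicit gradient-descent update. This is an illustrative sketch, not the implementation used in this work. It assumes the standard Chan-Vese update with a smoothed Dirac function of width `eps`, central finite differences, and no re-initialization of the level set; the function name `chan_vese_step` and the default parameter values are hypothetical.

```python
import numpy as np

def chan_vese_step(phi, f, alpha=0.1, lam_in=1.0, lam_out=1.0, dt=0.5, eps=1.0):
    """One explicit update of the level set phi on image f (phi > 0 is inside)."""
    inside = phi > 0
    # Mean intensities inside and outside the zero level set (c_I and c_O).
    c_in = f[inside].mean() if inside.any() else 0.0
    c_out = f[~inside].mean() if (~inside).any() else 0.0

    # Curvature term: divergence of the unit normal grad(phi) / |grad(phi)|.
    gy, gx = np.gradient(phi)
    norm = np.sqrt(gx ** 2 + gy ** 2) + 1e-8
    curvature = np.gradient(gx / norm, axis=1) + np.gradient(gy / norm, axis=0)

    # Smoothed Dirac delta (an assumption): largest near the zero level set.
    delta = eps / (np.pi * (eps ** 2 + phi ** 2))

    force = (alpha * curvature
             - lam_in * (f - c_in) ** 2
             + lam_out * (f - c_out) ** 2)
    return phi + dt * delta * force

# Synthetic example: a bright disk of radius 10, contour initialized at radius 13.
yy, xx = np.mgrid[:40, :40]
dist = np.sqrt((yy - 20.0) ** 2 + (xx - 20.0) ** 2)
cell = dist <= 10
f = cell.astype(float)
phi = 13.0 - dist
for _ in range(800):
    phi = chan_vese_step(phi, f)
segmentation = phi > 0
```

In this example the region terms push the level set up inside the disk and down outside it, so the contour shrinks from the initial circle onto the disk boundary. Real images would typically need tuning of `alpha`, `dt` and the number of iterations.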

### Modeling cell trajectories

Among the different models developed for describing a stochastic process, the autoregressive (AR) model is perhaps the most popular [13, 14]. The practical utility of an AR model becomes compelling when the stochastic process is non-stationary, as is the case for biological cell movement, which sustains spatio-temporal patterns. An AR model computes the position $o$ of a cell at time $t$ from its previous positions by

$$o(t) = \beta_0 + \sum_{\tau=1}^{p} \beta_\tau\, o(t - \tau) + \varepsilon(t)$$

where $o(t)$ is the centroid of the cell at time $t$, $p$ is the order of the model (the number of previous positions used), $\beta_0$ is a constant that is usually omitted for simplicity, $\beta_\tau$ are the autoregressive parameters, and $\varepsilon(t)$ is the noise term at time $t$, included to cover the possible cell positions.
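To make the AR prediction concrete, the sketch below estimates the autoregressive parameters by ordinary least squares and predicts the next centroid. It is a sketch under stated assumptions, not the estimation procedure used in this work: the constant term is omitted, the same scalar coefficients are shared by all coordinates, and the names `fit_ar` and `predict_next` are hypothetical.

```python
import numpy as np

def fit_ar(track, order):
    """Least-squares estimate of beta_1..beta_p from a centroid track.

    track: array of shape (T, dims), one centroid per frame.
    Assumptions: beta_0 is omitted and coefficients are shared across coordinates.
    """
    track = np.asarray(track, dtype=float)
    n = len(track) - order
    # lagged[k] holds o(t - k - 1) for every target time t = order .. T-1.
    lagged = [track[order - k - 1: order - k - 1 + n] for k in range(order)]
    X = np.stack(lagged, axis=-1).reshape(-1, order)   # one row per (time, coordinate)
    y = track[order:].reshape(-1)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def predict_next(track, beta):
    """Predict o(T) from the last len(beta) positions (noise term omitted)."""
    track = np.asarray(track, dtype=float)
    past = track[-len(beta):][::-1]                    # most recent position first
    return beta @ past

# A cell moving at constant velocity satisfies o(t) = 2 o(t-1) - o(t-2).
t = np.arange(10.0)
track = np.stack([1.0 + 0.5 * t, 3.0 + 2.0 * t], axis=1)
beta = fit_ar(track, order=2)
next_position = predict_next(track, beta)
```

For the constant-velocity track the fitted coefficients are close to 2 and -1, and the prediction continues the straight line. In a tracker, such a prediction would typically be used to narrow the search for the cell in the next frame.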

### Quantifying cell motility

Eukaryotic cell migration in isotropic environments can be described as a persistent random walk. Over short time periods, cells follow a relatively straight path, showing persistence of movement. If long time intervals are used to observe the cell position, however, cell movement appears similar to Brownian motion, with frequent direction changes. If a cell is executing a random walk, the expected distance (or displacement) $\langle d \rangle$ of its centroid from its original position varies with time according to the formula

$$\langle d^2 \rangle = 2 m \gamma t$$

where $\langle d^2 \rangle$ denotes the mean square displacement of the cell, $\gamma$ is the random motility coefficient (formally equivalent to a diffusion coefficient), and $m$ is a constant giving the dimensionality of the random walk. According to this formula, the average (root-mean-square) distance travelled by a cell is proportional to the square root of the elapsed time. Although they cover short distances rapidly, cells performing random walks travel long distances much more slowly.
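The square-root scaling can be checked numerically. The sketch below simulates unbiased random walkers and recovers the random motility coefficient from the mean square displacement, assuming the relation $\langle d^2 \rangle = 2 m \gamma t$ and independent Gaussian steps in each coordinate. The function name `simulate_msd` and all parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np

def simulate_msd(gamma, m, dt, n_steps, n_cells, seed=0):
    """Mean square displacement of simulated random walkers after each step."""
    rng = np.random.default_rng(seed)
    # Each coordinate takes an independent Gaussian step with variance 2*gamma*dt.
    steps = rng.normal(0.0, np.sqrt(2.0 * gamma * dt), size=(n_cells, n_steps, m))
    positions = np.cumsum(steps, axis=1)           # displacement from the origin
    return np.mean(np.sum(positions ** 2, axis=2), axis=0)

gamma, m, dt = 0.5, 2, 1.0
msd = simulate_msd(gamma, m, dt, n_steps=100, n_cells=2000)
times = dt * np.arange(1, 101)

# Invert <d^2> = 2 m gamma t at the final time point.
gamma_est = msd[-1] / (2 * m * times[-1])
```

With 2000 simulated cells the estimate should fall within a few percent of the value used in the simulation. Because the mean square displacement grows linearly in time, covering ten times the distance takes roughly a hundred times as long.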

At least two parameters are needed to describe a persistent random walk [23]. The first is the persistence time $\rho$, a measure of the average time between significant direction changes. The second is the cell speed $\nu$, intuitively defined as the displacement of the cell centroid per unit time. If the speed is computed in this fashion, care must be taken to use time intervals small enough that cells move in a constant direction within each interval. The persistence time $\rho$ and cell speed $\nu$ can also be defined rigorously using mathematical analysis. Starting from different assumptions about the details of cell paths, the authors of [24, 25] developed the following mathematical model to describe persistent random walks:

$$\langle d^2 \rangle = 2\nu^2\rho\left[t - \rho\left(1 - e^{-t/\rho}\right)\right]$$

For long times ($t \gg \rho$), the above formula reduces to the much simpler expression:

$$\langle d^2 \rangle \approx 2\nu^2\rho\, t$$
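The two regimes of the persistent random walk can be illustrated numerically. The sketch below assumes the commonly cited form $\langle d^2 \rangle = 2\nu^2\rho\,[t - \rho(1 - e^{-t/\rho})]$; the function names and parameter values are hypothetical.

```python
import math

def prw_msd(t, speed, persistence):
    """Mean square displacement of a persistent random walk (assumed form)."""
    # t - rho*(1 - exp(-t/rho)) written with expm1 to stay accurate for t << rho.
    bracket = t + persistence * math.expm1(-t / persistence)
    return 2.0 * speed ** 2 * persistence * bracket

def prw_msd_long_time(t, speed, persistence):
    """Long-time limit (t >> rho): displacement grows like a plain random walk."""
    return 2.0 * speed ** 2 * persistence * t

speed, persistence = 2.0, 5.0

# Short times (t << rho): nearly straight motion, <d^2> close to (speed * t)^2.
t_short = 0.001 * persistence
short_ratio = prw_msd(t_short, speed, persistence) / (speed * t_short) ** 2

# Long times (t >> rho): the full model approaches the linear expression.
t_long = 1000.0 * persistence
long_ratio = prw_msd(t_long, speed, persistence) / prw_msd_long_time(t_long, speed, persistence)
```

Both ratios are close to 1, which shows that under this form the model interpolates between directed motion at short times and diffusion-like motion at long times.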

The persistent random walk analysis is applicable only when cell movement takes place in an isotropic environment. Modifications are necessary to analyze biased cell movement (e.g., in the presence of a chemoattractant) or to check whether cell locomotion has a preferred direction. One such approach is based on the stochastic concept of Markov chains.

The method proposed by Dickinson and Tranquillo uses a generalized non-linear regression algorithm in which each cell track is assumed to consist of a sequence of cell positions at a series of increasing time points separated by a constant time increment [26]. If $o(t)$ represents the centroid of the cell at time $t$, then the squared displacement $d^2(\tau)$ of the cell over the time interval $\tau$, from $o(t)$ to $o(t + \tau)$, is:

$$d^2(\tau) = \lVert o(t + \tau) - o(t) \rVert^2$$

Then $d^2(\tau)$ is considered a random variable with expected value $\eta_\tau \equiv \langle d^2(\tau) \rangle = \langle d^2(t) \rangle$, where $\eta_\tau$ is the theoretical mean-squared displacement over $\tau$.

To obtain the measured mean squared displacement of a cell, several squared displacements along the cell track are averaged. Two obvious and commonly used sampling methods are available: overlapping and nonoverlapping intervals. Averaging squared displacements from overlapping time intervals maximizes the number of samples available from a single track; however, these samples are not statistically independent. An alternative is to average only nonoverlapping intervals. Speed and persistence were calculated by fitting the mean squared displacement to a persistent random walk model. Since the distance travelled in a given time is known, speed was calculated directly, and persistence was obtained by fitting the model.
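The two sampling schemes and the fit for persistence can be sketched as follows. This is an illustrative sketch, not the regression algorithm of [26]. It assumes tracks sampled at a constant frame interval, the persistent random walk form $\langle d^2 \rangle = 2\nu^2\rho\,[t - \rho(1 - e^{-t/\rho})]$, and a simple grid search over candidate persistence times in place of a generalized non-linear regression. All function names are hypothetical.

```python
import numpy as np

def mean_squared_displacement(track, lag, overlapping=True):
    """Average squared displacement over `lag` frames along one track."""
    track = np.asarray(track, dtype=float)
    # Overlapping intervals start at every frame; nonoverlapping ones every `lag` frames.
    step = 1 if overlapping else lag
    starts = np.arange(0, len(track) - lag, step)
    diffs = track[starts + lag] - track[starts]
    return np.mean(np.sum(diffs ** 2, axis=1))

def prw_msd(t, speed, persistence):
    """Persistent random walk model (assumed form)."""
    t = np.asarray(t, dtype=float)
    return 2.0 * speed ** 2 * persistence * (t + persistence * np.expm1(-t / persistence))

def fit_persistence(times, msd, speed, candidates=None):
    """Pick the persistence time minimizing the squared error, with speed known."""
    if candidates is None:
        candidates = np.logspace(-2, 3, 2001)      # hypothetical search range
    msd = np.asarray(msd, dtype=float)
    errors = [np.sum((prw_msd(times, speed, rho) - msd) ** 2) for rho in candidates]
    return candidates[int(np.argmin(errors))]

# Sampling: an 11-frame straight track gives 9 overlapping or 5 nonoverlapping
# intervals of 2 frames, all with squared displacement 4.
line = np.stack([np.arange(11.0), np.zeros(11)], axis=1)
msd_overlap = mean_squared_displacement(line, lag=2, overlapping=True)
msd_disjoint = mean_squared_displacement(line, lag=2, overlapping=False)

# Fitting: recover a known persistence time from noise-free model values.
times = np.arange(1.0, 51.0)
rho_fit = fit_persistence(times, prw_msd(times, 2.0, 5.0), speed=2.0)
```

Overlapping sampling uses more intervals per track but the samples share frames, so their errors are correlated; the nonoverlapping estimate uses fewer but independent intervals. The grid search is only a stand-in for a proper regression and returns the nearest candidate value.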