Skip to main content

Statistical shape analysis of tap roots: a methodological case study on laser scanned sugar beets



The efficient and robust statistical analysis of the shape of plant organs of different cultivars is an important investigation issue in plant breeding and enables a robust cultivar description within the breeding progress. Laserscanning is a highly accurate and high resolution technique to acquire the 3D shape of plant surfaces. The computation of a shape based principal component analysis (PCA) built on concepts from continuum mechanics has proven to be an effective tool for a qualitative and quantitative shape examination.


The shape based PCA was used for a statistical analysis of 140 sugar beet roots of different cultivars. The calculation of the mean sugar beet root shape and the description of the main variations was possible. Furthermore, unknown and individual tap roots could be attributed to their cultivar by means of a robust classification tool based on the PCA results.


The method demonstrates that it is possible to identify principal modes of root shape variations automatically and to quantify associated variances out of laserscanned 3D sugar beet tap root models. The introduced approach is not limited to the 3D shape description by laser scanning. A transfer to 3D MRI or radar data is also conceivable.


In breeding and precision agriculture there is a need for a precise description of the 3D architecture of a crop, a plant organ or a harvested product, in a fast and reproducible way [1].

Potential applications involve automated selection procedures in plant breeding of phenotypes with the desirable features and traits (like grain or root shape), assessment of crop development during the growth period, enabling an optimised crop management and sorting out special forms or non-crop contaminants (stones, weed seeds) of harvested products. An additional application is the adaption of harvester settings, based on archetypal 3D-geometries of harvested cultivars to avoid mechanical damage and by this quality reduction.

Yield formation is the combined result of genetic characteristics and environmental effects. In sugar beets the annual increase in sugar yield accounts 1.5% [2]. This is mainly due to an increase in the root yield, while the sugar content of varieties remained stable [3]. One of the main targets of sugar beet breeding is the root development, including relevant quality parameters such as sugar content, non-sugar compounds or mark-content. It can be observed that the dynamic of storage root development in sugar beet has no specific growth stages. There is no phase of maturation, and therefore, yield development and potential can be estimated depending on the duration of the growing period [3]. Unfortunately, in sugar beet development of leaf biomass is not correlated to storage root biomass or yield formation. However, the shape of the sugar beet tap root plays a significant role for the entire processing chain.

Besides effects of varieties, it is well known that the shape of tap roots mainly depends on the type and status of soil, management and environmental conditions [4, 5]. A strong correlation exists between basic root shape parameters like area, length, or radial variation and sugar yield and quality. Additionally, size and shape is characteristic for cultivars. Factors such as formation of branches or soil-adhesion are relevant breeding traits. On-field devices are proposed to measure and monitor basic parameters of tap roots [6] with varying degrees of success. Shape models are used for the classification of crops and weeds [7]. Image processing methodology already plays a significant role for the observation of plant growth [8]. Computer vision is also used for the analysis of plant root shapes. However, currently these methods are mostly based on 2D imaging modalities [9], whereas the most precise description would result from a robust statistical analysis of the true 3D description of tap roots. First results in imaging the 3D shape of sugar beet have been used to extract scalar parameters of the tap root [10] such as height, width, volume and surface area. By imaging over time the development and growth can be observed. The resulting crop growth model can be improved by a true 4D description of the crop plants [11]. Analysis of 3D point clouds with recent mathematical models or machine learning approaches further improve the efficiency and biological interpretability of plant sensor data [12]. This helps to assess the genotype-phenotype-environment interactions and to dissect important traits [13] [14].

Agricultural crops have to cope with adverse environments but still have to maintain productivity at the highest possible level. Here, breeding has to identify and make use of the best combinations of traits [15, 16]. These traits are mostly quantitative and inherited in a complex manner [13]. Although morphological changes are downstream effects of altered gene expression, metabolic adaptations and environmental changes, their precise, unequivocal, and unbiased identification is of utmost importance in a breeding process to assess Genotype × Environment × Phenotype (G ×E×P) interactions [17]. So far, breeding involves classification of desirable traits (usually around 20) by using empirical scales, which can be assessed mostly by visual assessment in traditional breeding schemes [15]. Considering that there are easily 10,000 individuals or more that have to be classified within a very short time span, the common method is prone to human bias (tiredness, adverse and rapidly changing light conditions). The same holds true for the harvested produce where shapes e. g. of tuber crops have to be assessed. Unequivocal identification of shapes or outlines is of high importance as well for purposes of precision farming where there is a need for an on-the-flight identification of shapes and their assessment, e. g. for the differentiation between crop and weed plants, diseased and healthy plants or plant organs or species identification (see e. g. [18]). Further refinement of phenotyping and the increase of traits that can be simultaneously observed calls for objective and automated ways of trait identification. There have been various attempts so far to describe leaf or tap root shapes with elliptic Fourier analysis [1719]. The challenge is now to analyze true 3D shape variations into main components, while most methods so far rely on 2D image analysis [17, 20].

Knowledge of the statistical variation of root shapes has various applications. In plant breeding such information can be used to select tendencies with favorable traits, such as an even shape, large crop size, or an optimal form for the use of crop harvesters and to minimize breaking root tips, soil tare and mechanical damages. In crop management the statistical data may help to detect deviations from the expected plant development, which can be used in a feedback mechanism to adjust growth control parameters, such as fertilizer or pesticides.

In this study we apply a computer vision tool for statistical analysis that is purely shape based. Statistical models of shape have been used widely in computer vision and graphics [21]. In a 2D setting, PCA-based models such as Active Shape [22] or Appearance Models [23] provide a parametric representation of shape that can be used for segmentation, tracking and recognition. In a 3D setting, they are typically used for fitting to noisy or ambiguous data or for 3D reconstruction via analysis-by-synthesis. Essentially, the statistical model provides a constraint that significantly reduces the parameter space for many shape processing problems.

It is a wide spread assumption that the input data is collected from a shape space that is considered as a Riemannian manifold. The classical treatment of shape space is due to Kendall [24], in which sets of landmarks are considered points on a shape manifold in which the effects of scale, rotation and translation are factored out. The tangent plane to Kendall’s shape space enables linear principle component analysis (PCA) in which Euclidean distance approximates Procrustes distance. Srivastava et al. [25] propose a representation for analysing shapes of curves under an elastic metric. Killian et al. [26] model the space of triangulated shapes. Statistical analysis can be performed on shape spaces in a manner that respects the Riemannian geometry of the manifold. This requires Riemannian notions of concepts such as distance, mean value and covariance [27]. Based on these quantities Fletcher et al. [28] transferred the concept of PCA to manifolds by considering a principal geodesic analysis (PGA). The idea of PGA was used by Tournier et al. [29] to build a statistical skeleton model and by Heeren et al. [30] to perform a statistical analysis in the space of triangle meshes. However, all these approaches assume the underlying shape space to be a Riemannian manifold. In particular, the dissimilarity of two shapes is quantified by the length of an optimal, connecting curve. In contrast, we here consider a purely elastic model that measures shape dissimilarity by the amount of elastic deformation energy [31]—for details on the physical and geometrical differences we refer to [32]. Using the physical model of 3D elasticity Rumpf and Wirth compute shape averages [33] and describe a covariance analysis [34, 35] of shapes represented as boundary contours of elastic objects. Here the elastic average is defined as the shape that minimizes the sum of elastic deformation energies to all the given input shapes. With an average at hand a classical PCA is then applied to the displacements with respect to the input shapes. This elastic shape analysis was performed on the space of triangle meshes in [36] and will be applied in the present work to the shape space formed by sugar beet tap roots.

In our experiment the acquisition of input shapes is realized based on 3D laser scanning—a method to recover 3D point clouds from objects. This method is well established in the agricultural context and has been used on plants for 3D modelling of the canopy of tomato plants [37], for in-field scanning of pear-trees [38] and imaging physiological responses of leaves [39]. More detailed scans for organ specific parameterization were also possible when using close-up laserscanning. This enables e.g. identification of single organs [10, 12, 40] or tracking of growth on organ level [41], and it has shown to be very accurate [42]. Close-up scanning enables high resolution and high accuracy imaging with point to point distances below a millimeter [43]. This enables highly accurate 3D surface models of various plant types.

Our contribution. In this work, we demonstrate the extraction and mathematical description of characteristic shape features in the tap roots of different cultivars of sugar beets. To this end, a huge data set consisting of detailed 3D descriptions of beet root samples is reduced to a small number of important parameters without loosing relevant biological information. Our main contributions are twofold. First, we perform a shape based analysis of laser scanned sugar beet tap roots. In particular, one can compute a robust and reliable shape mean as well as principal modes of shape variation on large ensembles of sugar beets with a substantial variability in the shape geometry. In particular, the statistical tool is not based on predefined quantitative properties (such as length or volume), and it is invariant under rigid body motions. Second, we propose an automatic classification tool based on results from the statistical analysis. The number of tap roots that were classified correctly without having been in the training data set is significantly higher than random classification. This indicates that a cultivar based assignment of a tap root to the corresponding cultivar is possible by only using geometrical shape parameters. The introduced approach is not limited to the 3D shape description by laser scanning and can be generalized for any other 3D measuring device with high precision as structure-from-motion approaches [44] or volume-carving methods [45].


We introduce a method for the statistical analysis of large ensembles of 3D tap roots as well as a classification tool based on information gained from the statistical analysis. To prove the benefits of our approach, four cultivars with 35 sugar beet tap roots each were measured with a laser scanner. After several pre-processing steps each beet is finally represented as a characteristic function.Footnote 1 In order to validate our classification tool, we randomly removed 5 beets of each cultivar from the training data set. Afterwards, we computed a shape average as the minimizer of an elastic matching functional for each of the four training data sets separately. In the same physical setup, a principal component analysis has been applied to the displacements between average and input shapes. This way, we obtained principal modes of main variation within all four training data sets. Again, each of these modes is represented by a displacement of the shape average.

The results are shown in Figs. 1 to 4 for the four different cultivars. In each figure we show the input shapes on the left and the five main shape variations on the right, where the middle shape (dark gray) represents the average beet shape. To better illustrate the directions of main variations, the displacements are displayed with different positive and negative magnitudes. Note that all input beets were scaled to have uniform volume, hence the previously most dominant mode of uniform scaling is no longer relevant. However, the relative order within the variances has not been affected substantially by the rescaling. In detail, if \(\left (\tilde \lambda _{k}\right)_{k\geq 0}\) describe the variances (in decreasing order) before the uniform rescaling, where \(\tilde \lambda _{0}\) is the dominant eigenvalue that represents the impact of uniform scaling, and (λk)k≥1 the variances after rescaling, we observe \(\lambda _{k-1}/\lambda _{k} \approx \tilde \lambda _{k-1} / \tilde \lambda _{k}\) for k>1.

Fig. 1
figure 1

Input beets of sort Berenika (left) and first five modes of main variation (right) with average beet in dark grey and corresponding variances λ1,…,λ5 (multiplied by 100)

For the variation in cultivar Berenika (see Fig. 1) the first mode can be described as a tendency to a multiple or at least double apex (2.01), the second variation as a tendency to a long apex or a more dull one (1.00). For the cultivar Cesira (see Fig. 2) the first variation (1.99) is according to the second mode of variation of Berenika, the tendency from a long apex to a dull one. The second variation is similar to the first one of Berenika (1.63). The main variation for the cultivar Mauricia (see Fig. 3) is similar to the long apex or a dull one (0.94), the second variation is the affinity to a multiple or clear and pointy apex (0.79). The first two main variations of Pauletta (see Fig. 4) can be described as the affinity to a pointy or dull apex (0.93) and the tendency from a clear and pointy apex to a second apex at the side (0.40). Note that our model is able to capture the tendency of some cultivars to develop multiple apices, although this introduces a strong non-convexity in the modes. This is for example visible in the first two modes of Berenika (see Fig. 1) and the third mode of Mauricia (see Fig. 3), which represent an initial growth of a second apex from the root body. Even more strikingly, all depicted modes of Cesira (see Fig. 2) can be associated with the development of apices which is obviously a characteristic feature of the corresponding training data.

Fig. 2
figure 2

Input beets of sort Cesira (left) and first five modes of main variation (right) with average beet in dark grey and corresponding variances λ1,…,λ5 (multiplied by 100)

Fig. 3
figure 3

Input beets of sort Mauricia (left) and first five modes of main variation (right) with average beet in dark grey and corresponding variances λ1,…,λ5 (multiplied by 100)

Fig. 4
figure 4

Input beets of sort Pauletta (left) and first five modes of main variation (right) with average beet in dark grey and corresponding variances λ1,…,λ5 (multiplied by 100)

In a second step, the principal modes were used to derive a classification tool. Based on the classical Mahalanobis distance, we first define a distance measure for each cultivar. Then a given beet is classified to the cultivar that induces the lowest distance. For example, when computing the distance of that given beet to cultivar Pauletta, we effectively compute the distance to the average shape of the training data set of Pauletta. In particular, the distance measure penalizes deviations in direction of dominant principal modes of this data set less, as these directions are supposed to be characteristic for cultivar Pauletta.

Classification results are depicted in Figs. 5 and 6, respectively. In detail, we show for every beet to be classified the (extended) Mahalanobis distance to each of the four cultivars as vertical color bars whose height is proportional to the distance (cf. Fig. 5). For example, the height of the blue bar always represents the distance to the Berenika training set. The validation of the classification reveals a significantly better success (i.e. 55%) in comparison to random classification (with a p-value of 0.0045). Note that we considered only the first 13 principal modes for each cultivar to design the distance measures (details will be explained in the “Methods” section). If we ignore the principal modes and simply compute the distance to the average we still obtain a correct overall classification rate of 40%. However, we observe huge differences in classification success when distinguishing between cultivars (cf. Fig. 6). In detail, a beet from cultivar Pauletta was classified correctly in 80% of the trials, whereas Berenika classification success was more or less random (i.e. 20%).

Fig. 5
figure 5

Five beets of sort Berenika (blue), Cesira (red), Mauricia (black) and Pauletta (green) to be classified. Vertical color bars upon each shape are proportional to the distance to the four different cultivars. Overall 55% beets were classified correctly (whereas 25% is random)

Fig. 6
figure 6

Visualization of results shown in Fig. 5 by means of a confusion matrix to demonstrate differences in classification success for the four different cultivars

Simplified PCA approach. For comparison reasons we also computed a Euclidean principal component analysis on five characteristic parameters extracted directly from the 3D point cloud. In detail, these five different parameters are root length, width, surface and volume (cf. [10]) as well as root complexity which is defined as the quotient between root surface and volume.

Figure 7 shows the distribution of the different cultivars using the first two principal components of the PCA. Obviously, distinguishing different cultivars or even a reliable classification is hard, if not impossible.

Fig. 7
figure 7

A principal component analysis of measured root traits based on five measured traits for comparison. By using these parameters a differentiation between the cultivars is hard to achieve


The aim of our study was to perform a statistical shape analysis and to develop a method which allows to distinguish between different cultivars in a non-parametric way. Figs. 1 to 4 show how our method identifies the main components of shape variation. These are mostly in line with traits for sugar beet shapes like size, maximum length, maximum width, average radius, radial variation, circularity, the ratio width to length, and further shape factors like surface roughness or furrow formation. Moreover, properties that seem to be characteristic for a particular training data set can actually be extracted in form of dominant variations. For example, the training data set of Pauletta (cf. Fig. 4, left) suggests that this cultivar tends to have a rather slim and longish body that varies predominantly in length whereas branching of the main tap root is usually not preferred. On the other hand, the clearly separated first principal mode (λ1>2λ2) captures exactly this variation with respect to length and none of the first five modes represents a clear branching (compared to the other cultivars in Figs. 1 to 3). Besides the qualitative modes of variation, our method allows a robust quantification of the associated variance of the main components of 3D shape variation. A next step would be to compare the results also at different environments or sets of stress conditions which enables quantitative statements about the influence of these parameters on the plant’s development. The present outcomes suggest that it will become possible to dissect environmental from genotypic effects on the phenotype and thus allow a true G ×E×P interaction analysis [13]. Furthermore, it is envisaged that the proposed method is adopted to other plant organs and species and likewise will be useful for approaches in precision farming.

The results of Fig. 5 have shown a classification accuracy of about 55% which is significantly better than a random assignment. However, the robustness of the classification can still be improved. In particular, the dependence on the number Mj of considered principal modes in the definition of the distance measure has to be further investigated. Furthermore, the striking differences of classification success between different cultivars (cf. Fig. 6) has to be explored and explained in detail. We believe that increasing the number of samples in the training data sets might improve the robustness as well as the classification success of individual cultivars.

Currently, the resolution of the shapes is limited by the spatial resolution of the characteristic functions which are defined on a 1293 voxel grid. The choice of spatial resolution is a trade-off between a desired level of details and limitation of computation time. However, the sensor provides a resolution of about 50 microns, hence it is desirable to be able to use the full resolution in the numerical simulations as well. To this end, future work will focus on the reduction of the algorithm’s complexity on the one hand, e.g. by using linearized elasticity and to use more sophisticated algorithmic features on the other hand, such as adaptive grids and advanced parallel computing techniques. An altenative approach is applying the method proposed in [36] to the original triangle meshes constructed directly from the laser scans. This shell PCA performs the same statistical shape analysis but with a different physical model (in detail, shapes are treated as hollow objects and one studies elastic deformations of the surface only). Moreover, it might be interesting to investigate the potential application of machine learning approaches to tackle the classification problem. Finally, we aim at the replacement of a manually moved scanner by a device that is able to perform a high number of scans automatically. This replacement is in particular necessary since we aim at increasing the number of samples in the training data sets.


We applied an established computer vision algorithm to perform a statistical shape analysis on 3D laser scanned tap roots. The method is applied to sugar beets, where the mean root shape and the main variations within a group of sugar beets of the same growth period have been computed. Our investigations can be considered as a case study for the statistical analysis of storage roots without predefined classification criteria. The method indeed demonstrates that it is possible to identify principal modes of shape variation automatically and to quantify the associated variances. In particular, the resulting dominant modes of variations can be used to cluster scanned tap roots into categories which forms the basis for linking growth and environmental conditions. Furthermore, this statistical shape analysis can be used in combination with other invasive or non-invasive sensors that access the 3D shape and can be applied to other plant organs and species as well. Since non-invasive sensors such as MRI or radar imaging are usually affected by significant noise the statistical analysis of large ensembles of laser scanned root shapes could help to increase the accuracy of the classification of noisy input data.


Plant material. Sugar beets were grown during summer 2012 in a central experiment of The objectives of the central experiment were the provision of ground truth data for the interpretation of sensor measurements. The soil was a silty loam soil (WRB: Haplic Luvisol) at the Klein Altendorf experimental research station (659’N, 5037’E) near Bonn, Germany. The mean annual temperature is 9.6 C with an average annual precipitation of 625 mm. The cultivars Pauletta, Berenika, Mauricia and Cesira—obtained commercially from KWS SaatFootnote 2—were chosen because of their differing phenology. At the end of the growing season 35 randomly chosen beets of each variety were sampled, i.e. 140 beets in total. Rows for sampling were 2 m ×3 m within a plot of 50 m2 each. Seed density was 90,000 per hectare at a row distance of 0.2 m (plant to plant distance).

Data acquisition. To acquire the 3D shape of the sugar beet tap roots a close-up laser scanner coupled to an articulated measuring arm device [10, 40] was used (see Fig. 8). This is a well evaluated combination for 3D plant imaging [41, 42]. Hardware details such as resolution and accuracy are given in Table 1. With its seven degrees of freedom it is possible to image the plant from various viewpoints to get an occlusion-free 3D model. The output of the laser scanner is a point cloud with XYZ coordinates with more than 3 million points that is later parsed to an automatic triangulation algorithm.

Fig. 8
figure 8

The laserscanner-measuring arm combination with its seven degrees of freedom enables the 3D imaging of the complete sugar beet tap root. It provides a point accuracy of 45μm and a point resolution of 17μma. An RGB image of a sugar beet b is shown together with its laserscanned point cloud with a shaded visualization with 3 million points c and the down-sampled and smoothed wireframe representation d

Table 1 Hardware details for the measuring setup

Preprocessing (cf. Fig. 9, left column). Before being fed to our statistical method we made use of the commercial 3D-CAD-Software Geomagic Studio 12 to apply basic preprocessing routines. First, as the sensor is moved manually (see Fig. 8), visible parts from the mounting device and the measuring table had to be removed. Second, the integrated outlier removal function was used as well as a grid-based reduction of the point density (to an average distance of 0.5 mm) to enable a smooth surface generation. Subsequently, the automatic point cloud triangulation was performed to approximate the surface boundary \({{\mathcal {S}}}\) of the solid root shape \({{\mathcal {O}}}\), i.e. \({{\mathcal {S}}} = \partial {{\mathcal {O}}}\). We then applied a uniform rescaling such that all shapes have the same (inscribed) volume.Footnote 3

Fig. 9
figure 9

Workflow of our method for data pre-processing (left), statistical analysis (middle) and classification (right)

The proposed statistical method requires the representation of each root shape via a characteristic function \(\chi _{{{\mathcal {O}}}}\) on the computational domain Ω=[0,1]3, i.e. we have \(\chi _{{{\mathcal {O}}}}(x) = 1\) if \(x \in {{\mathcal {O}}}\) and \(\chi _{{{\mathcal {O}}}}(x) = 0\) else. Then, the sugar beet surface \({{\mathcal {S}}}\) is implicitly represented as the interface between these two regions. To obtain such a characteristic function, we first compute the signed distance function \(d:\Omega \to {{\mathbbm {R}}}\) of \({{\mathcal {S}}}\), see [46]. In detail, |d(x)| represents the shortest distance from xΩ to \({{\mathcal {S}}}\), where d(x)>0 if x is in the interior of \({{\mathcal {O}}}\) and d(x)<0 if \(x \notin \bar {{\mathcal {O}}}\). To this end, Ω=[0,1]3 is discretized by a regular grid Ωh with 1293 nodes (129 equally spaced vertices in each spatial direction of the unit cube). Mathematically, we aim at solving the nonlinar partial differential equation |d(x)|=1 for all xΩh with the boundary condition d(x)=0 if \(x \in {{\mathcal {S}}}\), the so-called Eikonal equation. To this end, we first evaluate the signed distance d(x) on the grid nodes xΩh closest to the triangles of \({{\mathcal {S}}}\). Afterwards, we compute its values far from the triangular surface via information propagation by a fast marching algorithm [47] to obtain the solution \(d\colon \Omega _{h} \to {{\mathbbm {R}}}\) of the PDE. Since the triangulated surface might have been non-closed due to local scan deficiencies or self-occlusion, the signed distance function d may have an incorrect sign at a number of nodes. As a remedy, we regularize d by seeking a minimizer dnew of the energy

$$\int_{\Omega}\left(|d|_{\epsilon}-|d_{\text{new}}|_{\epsilon}\right)^{2}\,{{\mathrm{d}}} x+\sigma\int_{\Omega}|\nabla d_{\text{new}}|^{2}\,{{\mathrm{d}}} x\,. $$

Here, \(|d|_{\epsilon }=\sqrt {|d|^{2}+\epsilon ^{2}}\) with ε=10−4 denotes a differentiable approximation of the absolute value function and the regularization parameter was set to σ=0.01 in our application. The nonlinear optimization is performed using a finite element approach with multilinear basis functions on the regular grid Ωh and a standard trust region method [48]. Once dnew is computed we are finally able to define the characteristic function. To this end, we set \(\chi _{{{\mathcal {O}}}}(x)=1\) if dnew(x)<0 and \(\chi _{{{\mathcal {O}}}}(x)=0\) else, which is done for all grid nodes xΩh. After data compression, the required memory to store the function \(\chi _{{{\mathcal {O}}}}\) is comparable to the one needed to store an explicit triangular representation of the shape \({{\mathcal {S}}}\).

Statistical analysis model (cf. Fig. 9, middle column). In the following we will generalize the standard statistical analysis of point sets in a (linear) vector space, e. g. \({{\mathbbm {R}}}^{n}\), to the space of shapes S, where \({{\mathcal {S}}} \in {{\mathbf {S}}}\) represents the surface of a volumetric object \( {{\mathcal {O}}} = {{\mathcal {O}}}({{\mathcal {S}}}) \subset {{\mathbbm {R}}}^{n} \). To underpin this generalization let us briefly recall the basic concepts of the mean and the principal component analysis (PCA) on vector spaces [49].

The arithmetic mean \(\bar x\) of m points x1, x2, …, xm in \({{\mathbbm {R}}}^{n}\) is given as \(\bar x = \frac 1m (x_{1} + x_{2} + \ldots + x_{m})\). It can also be characterized as the minimizer of the energy \(E[x] = {\sum \nolimits }_{i=1}^{m} {{\mathcal {W}}}[x_{i}-x]\), where \({{\mathcal {W}}}[y]=\mu |y|^{2}\) is the elastic energy stored in a spring stretched along a vector y and μ is the associated stiffness coefficient of the spring. Indeed, the arithmetic mean is the equilibrium position of the central hub of a network with m springs, where one end of each spring is attached to one of the input points and the other end is connected to all other springs at a hub so that all springs pull at this hub. If \(X = \left [x_{1} - \bar x|\ldots |x_{m} - \bar x\right ]\) is the n×m (centered) data matrix, the covariance matrix is given by \(\tfrac 1mXX^{T}\). A PCA now involves a spectral decomposition of this n×n matrix, where n might be very large. Hence we use the fact that the m×m matrix \(C= \tfrac 1m X^{T} X\) has the same (non-trivial) eigenvalues as the covariance matrix, where mn. Note that \(C_{ij} = \frac 1m g\left (x_{i}-\bar x,x_{j}-\bar x\right)\) where g denotes a suitable scalar product on \({{\mathbbm {R}}}^{n}\) (typically the standard Euclidean inner product \(\left (x_{i}-\bar x\right)\cdot \left (x_{j}-\bar x\right)\)). If C=QΛQT is the singular value decomposition of C with an orthogonal matrix Q and a diagonal matrix Λ with diagonal entries λ1λ2≥…≥λm, then the eigenvectors of the covariance matrix are obtained via \(w_{j} = {\sum \nolimits }_{i=1}^{m} \sqrt {\lambda _{j}^{-1}} q_{j}^{i}\left (x_{i}-\bar x\right)\). These are exactly the principal directions of variations, and the eigenvalues λj describe the variance in that direction. Here, \(q_{j}^{i}\) denotes the ith entry in the jth column Q. As g is supposed to be positive (semi-)definite, we have λi≥0 for i=1,…,m. Furthermore, if we assume x1,…,xm to be linearly independent we have λ1≥…≥λm−1>λm=0 as we consider centered data, i.e. \(x_{1}-\bar x, \ldots, x_{m} - \bar x\) spans a (m−1)-dimensional subspace.

Now this general concept is transferred to shape statistics. In the context of tap root shapes \({{\mathcal {S}}}_{i}\) for i=1,…,m, the corresponding volumetric objects \({{\mathcal {O}}}_{i} = {{\mathcal {O}}}\left ({{\mathcal {S}}}_{i}\right)\) play the role of the points xi, and the vector xix is replaced by the optimal deformation \(\phi _{i}:{{\mathcal {O}}}_{i}\to {{\mathbbm {R}}}^{n}\) of the object \({{\mathcal {O}}}_{i}\) into an object \({{\mathcal {O}}}\). Here, optimal means that the deformation ϕi costs the least elastic energy \({{\mathcal {W}}}\left [\psi _{i}, {{\mathcal {O}}}_{i}, {{\mathcal {O}}}\right ] = \int _{{{\mathcal {O}}}_{i}} W\left (D \psi _{i}\right) {{\mathrm {d}}} x\) of all deformations with \(\psi _{i}({{\mathcal {O}}}_{i}) = {{\mathcal {O}}}\), where W(·) denotes a hyperelastic energy density acting on the local deformation gradient \(D \psi _{i}\in {{\mathbbm {R}}}^{n\times n}\). This energy and thus the thereby defined shape mean is invariant under rigid body motions, i. e. it does not change if the position or the orientation of the root shapes is varied. The arithmetic mean of m input objects \({{\mathcal {O}}}_{1},\,{{\mathcal {O}}}_{2},\, \ldots, \, {{\mathcal {O}}}_{m}\) is defined as the object \(\bar {{\mathcal {O}}}\) which minimizes the energy

$$\begin{array}{*{20}l} E[{{\mathcal{O}}}] = \sum\limits_{i=1}^{m} {{\mathcal{W}}}\left[\phi_{i},{{\mathcal{O}}}_{i},{{\mathcal{O}}}\right]\,. \end{array} $$

For details we refer to [33]. Different from the vectors \(x_{i}-\bar x\) above, the deformations ϕi are strongly nonlinear and cannot be used in a (linear) PCA. Instead, one can replace the deformation ϕi by its linear representative, the associated elastic stress (cf. Fig. 10)

$$\sigma_{i} = W'\left(\left(D\left(\phi_{i}^{-1}\right)\right)^{-1}\right) \left(\det D\left(\phi_{i}^{-1}\right)\right) \left(D\left(\phi_{i}^{-1}\right)\right)^{-T} \nu, $$

evaluated on the average shape \(\bar {{\mathcal {S}}}\) (the surface of the average object \(\bar {{\mathcal {O}}}\)) with surface normal vector ν [31]. Even more intuitive, one can also replace the nonlinear deformation ϕi by the displacement \(u_{i}:\bar {{\mathcal {S}}}\to {{\mathbbm {R}}}^{n}\) of each point on the surface \(\bar {{\mathcal {S}}}\) of \(\bar {{\mathcal {O}}}\) which is observed if the elastic stress σi is applied at the surface of the average object. The Hessian of the corresponding elastic energy implies a natural scalar product g(·,·) on these displacements ui [31, 35]. Thus, we define the matrix

$$C = \frac1m \left(g\left(u_{i},u_{j}\right)\right)_{i,j=1,\ldots, m} $$

as a representation of the covariance operator, which is defined by

$${{\mathbf{Cov}}} \, u = \frac1m \sum\limits_{i=1}^{m} g(u_{i}, u) u_{i} $$

and can be regarded as the analogon of the covariance matrix in the vector space case. Again, the whole construction is rigid body motion invariant. Via the same spectral decomposition C=QΛQT of the m×m matrix C as above, we finally obtain the principal modes of shape variation,

$$w_{j}(x)= \sum\limits_{i=1}^{m} \sqrt{\lambda_{j}^{-1}} q_{j}^{i} u_{i}(x), $$

and associated variances λj. A comprehensive introduction to this concept can be found in [32, 34]. Let us emphasize that this statistical approach does not make any assumptions on how the different shapes and their variations are configured. It is also not assumed that the original shapes are actually elastic—this is just a mathematical tool to define the dissimilarity between shapes.

Fig. 10
figure 10

Sketch of the stresses σi induced by the deformations ϕi on the averaged shape \(\bar {{\mathcal {S}}} = \partial \bar {{\mathcal {O}}}\), here for i=1,2,3

Classification (cf. Fig. 9, right column). Sample beets were available for N=4 different cultivars. The training data set of cultivar j is represented as a set of displacements \(\left (u^{j}_{k}\right)_{k\leq K_{j}}\) from the group average \(\bar {{\mathcal {S}}}_{j}\), where Kj denotes the sample size. The aim of classification is to assign an arbitrary shape \({{\mathcal {S}}}\) or a corresponding shape displacement u, respectively, to one of these N cultivars. In detail, we need a distance measure dj for each cultivar that quantifies the distance of u to the set \(\left (u^{j}_{k}\right)_{k\leq K_{j}}\). We say that u is classified to cultivar l if dl(u)≤dj(u) for j=1,…,N.

Our distance measure dj is an extension of the classical Mahalanobis distance. First, we assume that we have already performed a PCA (as described above) for each cultivar independently. Let \(\left (\lambda ^{j}_{k}\right)_{k\leq K_{j}}\) and \(\left (w^{j}_{k}\right)_{k\leq K_{j}}\) denote the eigenvalues and eigenvectors computed in the PCA of cultivar j, respectively. We define Uj to be the linear span of \(\left (u^{j}_{k}\right)_{k\leq K_{j}}\) and \(U_{j}^{\perp }\) the orthogonal complement with respect to the metric gj(.,.) associated by the jth cultivar. For some truncation value MjKj the classical (squared) Mahalanobis distance is given by

$$ m_{j}^{2}(u) := \sum\limits_{k=1}^{M_{j}} \frac{g_{j}\left(u, w_{k}^{j}\right)^{2}}{\lambda_{k}^{j} }\,. $$

Note that \(m_{j}^{2}\left (w^{j}_{k}\right) = \left (\lambda ^{j}_{k}\right)^{-1}\), that means the Mahalanobis distance penalizes deviations in direction of dominant eigenvectors less. On the other hand, we have mj(u)=0 for each \(u^{\perp } \in U_{j}^{\perp }\), which requires some kind of regularization. If Pju is a projection of u onto Uj we decompose \(u = P_{j} u + \left (u\,-\,P_{j} u\right) \in U_{j} \oplus U_{j}^{\perp }\) and finally define an extended Mahalanobis distance by

$${d}_{j}^{2}(u) = m_{j}^{2}\left(P_{j} u\right) + \frac1\beta g_{j}\left(u\,-\,P_{j} u, u\,-\,P_{j} u \right) \,. $$

Here β>0 is a regularization parameter. In the experiments shown in Fig. 5 we have chosen β=10−5 and Mj=13 for all four cultivars.

Numerical implementation. The matching deformations ϕi between an input object \({{\mathcal {O}}}_{i}\) and the object \({{\mathcal {O}}}\), both represented by characteristic functions χi and χ, respectively, are computed using a penalty approach, where one minimizes

$$\int_{{{\mathcal{O}}}_{i}} W(D \phi) {{\mathrm{d}}} x+ \gamma \int_{[0,1]^{3}}\left(\chi_{i}-\chi \circ\phi\right)^{2}\,{{\mathrm{d}}} x $$

(for some penalty parameter γ=50) with respect to the deformation ϕ. The first term ensures that the deformation with minimum energy is found, and the second term ensures that ϕ indeed deforms \({{\mathcal {O}}}_{i}\) into \({{\mathcal {O}}}\). In detail, we made use of the polyconvex energy density \(W(D\phi) = \bar W(\|D\phi \|, \det D\phi)\) with

$$\bar W(a,d) = \frac\mu2 a^{2} +\frac\lambda4 d^{2}-\left(\mu+\frac\lambda2\right)\log d-\frac{3\mu}2-\frac\lambda4\,, $$

where the elastic parameters are μ=λ=1. Consequently, the average shape \(\bar {{\mathcal {S}}}\), represented by χ, and the deformations ϕ1,…,ϕm are obtained by minimizing

$$\sum\limits_{i=1}^{m} \int_{{{\mathcal{O}}}_{i}} W\left(D \phi_{i}\right) {{\mathrm{d}}} x+\gamma \int_{[0,1]^{3}}\left(\chi_{i}-\chi \circ\phi_{i}\right)^{2}\,{{\mathrm{d}}} x $$

over all ϕi and χ. For details we again refer to [33]. The functions χi and deformations ϕi are discretized by 1293 nodes (i. e. a function value is assigned to each node of a grid with 129 vertices in each space direction) so that the number of degrees of freedom scales with 1293·3·m, yielding a high-dimensional problem. The principal component analysis basically involves a singular value decomposition of a symmetric m×m correlation matrix and the solution of m linear systems of equations of size 1293 in order to translate the boundary stresses into displacements or shape variations.

Availability of data and materials

The datasets used and analysed during the current study are available from the corresponding author on reasonable request.


  1. A characteristic function assigns 1 = true to every data point inside the sugar beet volume and 0 = false to all other points.


  3. The rescaling proved to improve the later classification substantially while it does not qualitatively change the shape analysis. See discussion above.


G ×E×P :

Genotype ×Environment × Phenotype


principle component analysis


partial differential equation


  1. Bradshaw JE. Root And Tuber Crops. vol. 7. Berlin: Springer; 2010, pp. 173–219.

    Google Scholar 

  2. Märländer B, Hoffmann C, Koch H-J, Ladewig E, Niemann M, Stockfisch N, Varrelmann M, Mahlein A-K. Sustainable intensification – a quarter century of research towards higher efficiency in sugar beet cultivation. Sugar Ind. 2018; 4:200–17.

    Google Scholar 

  3. Hoffmann CM, Kenter C. Yield potential of sugar beet – have we hit the ceiling?Front Plant Sci. 2018; 9:289.

    PubMed  PubMed Central  Google Scholar 

  4. Vamerali T, Guarise M, Ganis A, Mosca G. Effects of water and nitrogen management on fibrous root distribution and turnover in sugar beet. Eur J Agron. 2009; 31(2):69–76.

    CAS  Google Scholar 

  5. Broom I, Edmunds BS, Ip S. Modelling asymmetrical growth curves that rise and then fall: Applications to foliage dynamics of sugar beet (beta vulgaris l.)Ann Bot. 1997; 79(6):657–65.

    Google Scholar 

  6. Shimazu M, Shibata Y, Kataoka T, Okamoto H, Kuriyama A. Development of a measurement device for root diameter of sugar beets. J Soc Agric Mach. 2010; 72(2):177–84.

    Google Scholar 

  7. Persson M, Åstrand B. Classification of crops and weeds extracted by active shape models. Biosyst Eng. 2008; 100(4):484–97.

    Google Scholar 

  8. Midtiby HS, Giselsson TM, Jørgensen RN. Estimating the plant stem emerging points (PSEPs) of sugar beets at early growth stages. Biosyst Eng. 2012; 111(1):83–90.

    Google Scholar 

  9. Moreda GP, Muñoz MA, Ruiz-Altisent M, Perdigones a.Shape determination of horticultural produce using two-dimensional computer vision—a review. J Food Eng. 2012; 108(2):245–61.

    Google Scholar 

  10. Paulus S, Behmann J, Mahlein A-K, Plümer L, Kuhlmann H. Low-cost 3D systems - well suited tools for plant phenotyping. Sensors. 2014; 14(2):3001–18.

    PubMed  Google Scholar 

  11. Berkels B, Fletcher PT, Heeren B, Rumpf M, Wirth B. Discrete geodesic regression in shape space In: Heyden A, Kahl F, Olsson C, Oskarsson M, Tai XC, editors. Energy Minimization Methods in Computer Vision and Pattern Recognition. EMMCVPR 2013. Lecture Notes in Computer Science, vol 8081. Berlin: Springer: 2013. p. 108–22.

  12. Wahabzada M, Paulus S, Kerstin C, Mahlein A-K. Automated interpretation of 3D laserscanned point clouds for plant organ segmentation. BMC Bioinformatics. 2015; 16:248.

    PubMed  PubMed Central  Google Scholar 

  13. Tardieu F, Tuberosa R. Dissection and modelling of abiotic stress tolerance in plants. Curr Opin Plant Biol. 2010; 13(2):206–12.

    PubMed  Google Scholar 

  14. Mahlein A-K, Kuska MT, Behmann J, Polder G, Walter A. Hyperspectral sensors and imaging technologies in phytopathology: State of the art. Annu Rev Phytopathol. 2018; 56(1):535–58.

    CAS  PubMed  Google Scholar 

  15. Chahal G, Gosal S. Principles and Procedures of Plant Breeding: Biotechnological and Conventional Approaches. Pangbourne: Alpha Science International Ltd.; 2002.

    Google Scholar 

  16. Acquaah G. Principles of Plant Genetics and Breeding. Oxford: Blackwell Publishing Ltd.; 2006.

    Google Scholar 

  17. Iwata H, Nesumi H, Ninomiya S, Takano Y, Ukai Y. The evaluation of genotype × environment interactions of citrus leaf morphology using image analysis and elliptic Fourier descriptors. Breed Sci. 2002; 52(4):243–51.

    Google Scholar 

  18. Neto J, Meyer G, Jones D, Samal A. Plant species identification using elliptic Fourier leaf shape analysis. Comput Electron Agric. 2006; 50(2):121–34.

    Google Scholar 

  19. Iwata H, Ukai Y. SHAPE: a computer program package for quantitative evaluation of biological shapes based on elliptic Fourier descriptors. J Hered. 2002; 93(5):384–5.

    CAS  PubMed  Google Scholar 

  20. Tsialtas J, Maslaris N. Sugar beet root shape and its relation with yield and quality. Sugar Tech. 2010; 12:47–52.

    CAS  Google Scholar 

  21. Brunton A, Salazar A, Bolkart T, Wuhrer S. Review of statistical shape spaces for 3d data with comparative analysis for human faces. Comp Vision Image Underst. 2014; 128:1–17.

    Google Scholar 

  22. Cootes TF, Taylor CJ, Cooper DH, Graham J. Active shape models-their training and application. Comp Vision Image Underst. 1995; 61(1):38–59.

    Google Scholar 

  23. Cootes TF, Edwards GJ, Taylor CJ. Active appearance models. IEEE Trans Pattern Anal Mach Intell. 2001; 23(6):681–5.

    Google Scholar 

  24. Kendall DG. Shape manifolds, procrustean metrics, and complex projective spaces. Bull Lond Math Soc. 1984; 16(2):81–121.

    Google Scholar 

  25. Srivastava A, Klassen E, Joshi SH, Jermyn IH. Shape analysis of elastic curves in euclidean spaces. IEEE Trans Pattern Anal Mach Intell. 2011; 33(7):1415–28.

    PubMed  Google Scholar 

  26. Kilian M, Mitra NJ, Pottmann H. Geometric modeling in shape space. ACM Trans Graph. 2007; 26(3):1–8.

    Google Scholar 

  27. Pennec X. Intrinsic statistics on Riemannian manifolds: Basic tools for geometric measurements. J Math Imaging Vision. 2006; 25(1):127–54.

    Google Scholar 

  28. Fletcher PT, Lu C, Pizer SM, Joshi S. Principal geodesic analysis for the study of nonlinear statistics of shape. IEEE Trans Med Imaging. 2004; 23(8):995–1005.

    PubMed  Google Scholar 

  29. Tournier M, Wu X, Courty N, Arnaud E, Reveret L. Motion compression using principal geodesics analysis. Comput Graph Forum. 2009; 28:355–64.

    Google Scholar 

  30. Heeren B, Zhang C, Rumpf M, Smith W. Principal geodesic analysis in the space of shells. Comput Graph Forum. 2018; 37:173–84.

    Google Scholar 

  31. Ciarlet PG. Mathematical Elasticity, Vol. I: Three-Dimensional Elasticity. In: Studies in Mathematics and its Applications. Amsterdam: Elsevier: 1997.

    Google Scholar 

  32. Rumpf M, Wirth B. Variational methods in shape analysis In: Scherzer O, editor. Handbook of Mathematical Methods in Imaging. New York: Springer: 2011. p. 1363–401.

    Google Scholar 

  33. Rumpf M, Wirth B. A nonlinear elastic shape averaging approach. SIAM J Imaging Sci. 2009; 2(3):800–33.

    Google Scholar 

  34. Rumpf M, Wirth B. An elasticity approach to principal modes of shape variation In: Tai XC, Mørken K, Lysaker M, Lie KA, editors. Scale Space and Variational Methods in Computer Vision. SSVM 2009. Lecture Notes in Computer Science, vol 5567. Berlin: Springer: 2009. p. 709–720.

  35. Rumpf M, Wirth B. An elasticity-based covariance analysis of shapes. Int J Comput Vis. 2011; 92(3):281–95.

    Google Scholar 

  36. Zhang C, Heeren B, Rumpf M, Smith WA. Shell PCA: statistical shape modelling in shell space. In: IEEE International Conference on Computer Vision (ICCV). Santiago: 2015. p. 1671–79.

  37. Hosoi F, Nakabayashi K, Omasa K. 3-d modeling of tomato canopies using a high-resolution portable scanning lidar for extracting structural information. Sensors. 2011; 11(2):2166–74.

    PubMed  Google Scholar 

  38. Palacin J, Palleja T, Tresanchez M, Sanz R, Llorens J, Ribes-Dasi M, Masip J, Arno J, Escola A, Rosell J. Real-time tree-foliage surface estimation using a ground laser scanner. IEEE Trans Instrum Meas. 2007; 56(4):1377–83.

    Google Scholar 

  39. Omasa K, Hosoi F, Konishi A. 3D lidar imaging for detecting and understanding plant responses and canopy structure. J Exp Bot. 2007; 58(4):881–98.

    CAS  Google Scholar 

  40. Paulus S, Dupuis J, Mahlein A-K, Kuhlmann H. Surface feature based classification of plant organs from 3D laserscanned point clouds for plant phenotyping. BMC Bioinformatics. 2013; 14:238.

    PubMed  PubMed Central  Google Scholar 

  41. Paulus S, Schumann H, Leon J, Kuhlmann H. A high precision laser scanning system for capturing 3D plant architecture and analysing growth of cereal plants. Biosyst Eng in press. 2014; 121:1–11.

    Google Scholar 

  42. Paulus S, Eichert T, Goldbach HE, Kuhlmann H. Limits of active laser triangulation as an instrument for high precision plant imaging. Sensors. 2014; 14(2):2489–509.

    PubMed  Google Scholar 

  43. Wagner B, Santini S, Ingensand H, Gärtner H. A tool to model 3D coarse-root development with annual resolution. Plant Soil. 2011; 346(1-2):79–96.

    CAS  Google Scholar 

  44. Rose J, Paulus S, Kuhlmann H. Accuracy analysis of a multi-view stereo approach for phenotyping of tomato plants at the organ level. Sensors. 2015; 15(5):9651–65.

    PubMed  Google Scholar 

  45. Jahnke S, Roussel J, Hombach T, Kochs J, Fischbach A, Huber G, Scharr H. pheno seeder - a robot system for automated handling and phenotyping of individual seeds. Plant Physiol. 2016; 172(3):1358–70.

    CAS  PubMed  PubMed Central  Google Scholar 

  46. Russo G, Smereka P. A remark on computing distance functions. J Comput Phys. 2000; 163:51–67.

    Google Scholar 

  47. Sethian JA. A fast marching level set method for monotonically advancing fronts, Vol. 93; 1996. pp. 1591–5.

  48. Nocedal J, Wright SJ. Numerical Optimization, 2nd edn. New York: Springer; 2006.

    Google Scholar 

  49. Hair J, Black W, Babin B, Anderson R. Multivariate Data Analysis, 7th edn. New Jersey: Prentice Hall; 2009.

    Google Scholar 

Download references


Not applicable.


This study was funded by the German Federal Ministry of Education and Research (BMBF) within the scope of the interdisciplinary network program (Funding code: 0315529). The BMBF played no role in the design of the study, the collection, analysis, and interpretation of data and in writing the manuscript. This study was partially funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany’s Excellence Strategy - EXC 2070 - 390732324. Furthermore, we received funding by the German Research Foundation (DFG) via the “Hausdorff Center for Mathematics” (GZ 2047/1 - 390685813) and “Mathematics Münster” (EXC 2044 - 390685587). Open access funding provided by Projekt DEAL.

Author information

Authors and Affiliations



SP, HK, HG, MR and BW designed the study. HG maintained the plant material, SP carried out the measuring, pre-processing was done by SP and BH. BH and BW implemented the main algorithm and performed all numerical simulations as well as the post-processing. SP, BH, AKM, BW and MR interpreted the data and drafted the manuscript. All author(s) read and approved the final manuscript.

Corresponding author

Correspondence to Behrend Heeren.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

Not applicable.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit The Creative Commons Public Domain Dedication waiver ( applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Heeren, B., Paulus, S., Goldbach, H. et al. Statistical shape analysis of tap roots: a methodological case study on laser scanned sugar beets. BMC Bioinformatics 21, 335 (2020).

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: