 Research article
 Open Access
 Published:
Statistical shape analysis of tap roots: a methodological case study on laser scanned sugar beets
BMC Bioinformatics volume 21, Article number: 335 (2020)
Abstract
Background
The efficient and robust statistical analysis of the shape of plant organs of different cultivars is an important investigation issue in plant breeding and enables a robust cultivar description within the breeding progress. Laserscanning is a highly accurate and high resolution technique to acquire the 3D shape of plant surfaces. The computation of a shape based principal component analysis (PCA) built on concepts from continuum mechanics has proven to be an effective tool for a qualitative and quantitative shape examination.
Results
The shape based PCA was used for a statistical analysis of 140 sugar beet roots of different cultivars. The calculation of the mean sugar beet root shape and the description of the main variations was possible. Furthermore, unknown and individual tap roots could be attributed to their cultivar by means of a robust classification tool based on the PCA results.
Conclusion
The method demonstrates that it is possible to identify principal modes of root shape variations automatically and to quantify associated variances out of laserscanned 3D sugar beet tap root models. The introduced approach is not limited to the 3D shape description by laser scanning. A transfer to 3D MRI or radar data is also conceivable.
Background
In breeding and precision agriculture there is a need for a precise description of the 3D architecture of a crop, a plant organ or a harvested product, in a fast and reproducible way [1].
Potential applications involve automated selection procedures in plant breeding of phenotypes with the desirable features and traits (like grain or root shape), assessment of crop development during the growth period, enabling an optimised crop management and sorting out special forms or noncrop contaminants (stones, weed seeds) of harvested products. An additional application is the adaption of harvester settings, based on archetypal 3Dgeometries of harvested cultivars to avoid mechanical damage and by this quality reduction.
Yield formation is the combined result of genetic characteristics and environmental effects. In sugar beets the annual increase in sugar yield accounts 1.5% [2]. This is mainly due to an increase in the root yield, while the sugar content of varieties remained stable [3]. One of the main targets of sugar beet breeding is the root development, including relevant quality parameters such as sugar content, nonsugar compounds or markcontent. It can be observed that the dynamic of storage root development in sugar beet has no specific growth stages. There is no phase of maturation, and therefore, yield development and potential can be estimated depending on the duration of the growing period [3]. Unfortunately, in sugar beet development of leaf biomass is not correlated to storage root biomass or yield formation. However, the shape of the sugar beet tap root plays a significant role for the entire processing chain.
Besides effects of varieties, it is well known that the shape of tap roots mainly depends on the type and status of soil, management and environmental conditions [4, 5]. A strong correlation exists between basic root shape parameters like area, length, or radial variation and sugar yield and quality. Additionally, size and shape is characteristic for cultivars. Factors such as formation of branches or soiladhesion are relevant breeding traits. Onfield devices are proposed to measure and monitor basic parameters of tap roots [6] with varying degrees of success. Shape models are used for the classification of crops and weeds [7]. Image processing methodology already plays a significant role for the observation of plant growth [8]. Computer vision is also used for the analysis of plant root shapes. However, currently these methods are mostly based on 2D imaging modalities [9], whereas the most precise description would result from a robust statistical analysis of the true 3D description of tap roots. First results in imaging the 3D shape of sugar beet have been used to extract scalar parameters of the tap root [10] such as height, width, volume and surface area. By imaging over time the development and growth can be observed. The resulting crop growth model can be improved by a true 4D description of the crop plants [11]. Analysis of 3D point clouds with recent mathematical models or machine learning approaches further improve the efficiency and biological interpretability of plant sensor data [12]. This helps to assess the genotypephenotypeenvironment interactions and to dissect important traits [13] [14].
Agricultural crops have to cope with adverse environments but still have to maintain productivity at the highest possible level. Here, breeding has to identify and make use of the best combinations of traits [15, 16]. These traits are mostly quantitative and inherited in a complex manner [13]. Although morphological changes are downstream effects of altered gene expression, metabolic adaptations and environmental changes, their precise, unequivocal, and unbiased identification is of utmost importance in a breeding process to assess Genotype × Environment × Phenotype (G ×E×P) interactions [17]. So far, breeding involves classification of desirable traits (usually around 20) by using empirical scales, which can be assessed mostly by visual assessment in traditional breeding schemes [15]. Considering that there are easily 10,000 individuals or more that have to be classified within a very short time span, the common method is prone to human bias (tiredness, adverse and rapidly changing light conditions). The same holds true for the harvested produce where shapes e. g. of tuber crops have to be assessed. Unequivocal identification of shapes or outlines is of high importance as well for purposes of precision farming where there is a need for an ontheflight identification of shapes and their assessment, e. g. for the differentiation between crop and weed plants, diseased and healthy plants or plant organs or species identification (see e. g. [18]). Further refinement of phenotyping and the increase of traits that can be simultaneously observed calls for objective and automated ways of trait identification. There have been various attempts so far to describe leaf or tap root shapes with elliptic Fourier analysis [17–19]. The challenge is now to analyze true 3D shape variations into main components, while most methods so far rely on 2D image analysis [17, 20].
Knowledge of the statistical variation of root shapes has various applications. In plant breeding such information can be used to select tendencies with favorable traits, such as an even shape, large crop size, or an optimal form for the use of crop harvesters and to minimize breaking root tips, soil tare and mechanical damages. In crop management the statistical data may help to detect deviations from the expected plant development, which can be used in a feedback mechanism to adjust growth control parameters, such as fertilizer or pesticides.
In this study we apply a computer vision tool for statistical analysis that is purely shape based. Statistical models of shape have been used widely in computer vision and graphics [21]. In a 2D setting, PCAbased models such as Active Shape [22] or Appearance Models [23] provide a parametric representation of shape that can be used for segmentation, tracking and recognition. In a 3D setting, they are typically used for fitting to noisy or ambiguous data or for 3D reconstruction via analysisbysynthesis. Essentially, the statistical model provides a constraint that significantly reduces the parameter space for many shape processing problems.
It is a wide spread assumption that the input data is collected from a shape space that is considered as a Riemannian manifold. The classical treatment of shape space is due to Kendall [24], in which sets of landmarks are considered points on a shape manifold in which the effects of scale, rotation and translation are factored out. The tangent plane to Kendall’s shape space enables linear principle component analysis (PCA) in which Euclidean distance approximates Procrustes distance. Srivastava et al. [25] propose a representation for analysing shapes of curves under an elastic metric. Killian et al. [26] model the space of triangulated shapes. Statistical analysis can be performed on shape spaces in a manner that respects the Riemannian geometry of the manifold. This requires Riemannian notions of concepts such as distance, mean value and covariance [27]. Based on these quantities Fletcher et al. [28] transferred the concept of PCA to manifolds by considering a principal geodesic analysis (PGA). The idea of PGA was used by Tournier et al. [29] to build a statistical skeleton model and by Heeren et al. [30] to perform a statistical analysis in the space of triangle meshes. However, all these approaches assume the underlying shape space to be a Riemannian manifold. In particular, the dissimilarity of two shapes is quantified by the length of an optimal, connecting curve. In contrast, we here consider a purely elastic model that measures shape dissimilarity by the amount of elastic deformation energy [31]—for details on the physical and geometrical differences we refer to [32]. Using the physical model of 3D elasticity Rumpf and Wirth compute shape averages [33] and describe a covariance analysis [34, 35] of shapes represented as boundary contours of elastic objects. Here the elastic average is defined as the shape that minimizes the sum of elastic deformation energies to all the given input shapes. With an average at hand a classical PCA is then applied to the displacements with respect to the input shapes. This elastic shape analysis was performed on the space of triangle meshes in [36] and will be applied in the present work to the shape space formed by sugar beet tap roots.
In our experiment the acquisition of input shapes is realized based on 3D laser scanning—a method to recover 3D point clouds from objects. This method is well established in the agricultural context and has been used on plants for 3D modelling of the canopy of tomato plants [37], for infield scanning of peartrees [38] and imaging physiological responses of leaves [39]. More detailed scans for organ specific parameterization were also possible when using closeup laserscanning. This enables e.g. identification of single organs [10, 12, 40] or tracking of growth on organ level [41], and it has shown to be very accurate [42]. Closeup scanning enables high resolution and high accuracy imaging with point to point distances below a millimeter [43]. This enables highly accurate 3D surface models of various plant types.
Our contribution. In this work, we demonstrate the extraction and mathematical description of characteristic shape features in the tap roots of different cultivars of sugar beets. To this end, a huge data set consisting of detailed 3D descriptions of beet root samples is reduced to a small number of important parameters without loosing relevant biological information. Our main contributions are twofold. First, we perform a shape based analysis of laser scanned sugar beet tap roots. In particular, one can compute a robust and reliable shape mean as well as principal modes of shape variation on large ensembles of sugar beets with a substantial variability in the shape geometry. In particular, the statistical tool is not based on predefined quantitative properties (such as length or volume), and it is invariant under rigid body motions. Second, we propose an automatic classification tool based on results from the statistical analysis. The number of tap roots that were classified correctly without having been in the training data set is significantly higher than random classification. This indicates that a cultivar based assignment of a tap root to the corresponding cultivar is possible by only using geometrical shape parameters. The introduced approach is not limited to the 3D shape description by laser scanning and can be generalized for any other 3D measuring device with high precision as structurefrommotion approaches [44] or volumecarving methods [45].
Results
We introduce a method for the statistical analysis of large ensembles of 3D tap roots as well as a classification tool based on information gained from the statistical analysis. To prove the benefits of our approach, four cultivars with 35 sugar beet tap roots each were measured with a laser scanner. After several preprocessing steps each beet is finally represented as a characteristic function.^{Footnote 1} In order to validate our classification tool, we randomly removed 5 beets of each cultivar from the training data set. Afterwards, we computed a shape average as the minimizer of an elastic matching functional for each of the four training data sets separately. In the same physical setup, a principal component analysis has been applied to the displacements between average and input shapes. This way, we obtained principal modes of main variation within all four training data sets. Again, each of these modes is represented by a displacement of the shape average.
The results are shown in Figs. 1 to 4 for the four different cultivars. In each figure we show the input shapes on the left and the five main shape variations on the right, where the middle shape (dark gray) represents the average beet shape. To better illustrate the directions of main variations, the displacements are displayed with different positive and negative magnitudes. Note that all input beets were scaled to have uniform volume, hence the previously most dominant mode of uniform scaling is no longer relevant. However, the relative order within the variances has not been affected substantially by the rescaling. In detail, if \(\left (\tilde \lambda _{k}\right)_{k\geq 0}\) describe the variances (in decreasing order) before the uniform rescaling, where \(\tilde \lambda _{0}\) is the dominant eigenvalue that represents the impact of uniform scaling, and (λ_{k})_{k≥1} the variances after rescaling, we observe \(\lambda _{k1}/\lambda _{k} \approx \tilde \lambda _{k1} / \tilde \lambda _{k}\) for k>1.
For the variation in cultivar Berenika (see Fig. 1) the first mode can be described as a tendency to a multiple or at least double apex (2.01), the second variation as a tendency to a long apex or a more dull one (1.00). For the cultivar Cesira (see Fig. 2) the first variation (1.99) is according to the second mode of variation of Berenika, the tendency from a long apex to a dull one. The second variation is similar to the first one of Berenika (1.63). The main variation for the cultivar Mauricia (see Fig. 3) is similar to the long apex or a dull one (0.94), the second variation is the affinity to a multiple or clear and pointy apex (0.79). The first two main variations of Pauletta (see Fig. 4) can be described as the affinity to a pointy or dull apex (0.93) and the tendency from a clear and pointy apex to a second apex at the side (0.40). Note that our model is able to capture the tendency of some cultivars to develop multiple apices, although this introduces a strong nonconvexity in the modes. This is for example visible in the first two modes of Berenika (see Fig. 1) and the third mode of Mauricia (see Fig. 3), which represent an initial growth of a second apex from the root body. Even more strikingly, all depicted modes of Cesira (see Fig. 2) can be associated with the development of apices which is obviously a characteristic feature of the corresponding training data.
In a second step, the principal modes were used to derive a classification tool. Based on the classical Mahalanobis distance, we first define a distance measure for each cultivar. Then a given beet is classified to the cultivar that induces the lowest distance. For example, when computing the distance of that given beet to cultivar Pauletta, we effectively compute the distance to the average shape of the training data set of Pauletta. In particular, the distance measure penalizes deviations in direction of dominant principal modes of this data set less, as these directions are supposed to be characteristic for cultivar Pauletta.
Classification results are depicted in Figs. 5 and 6, respectively. In detail, we show for every beet to be classified the (extended) Mahalanobis distance to each of the four cultivars as vertical color bars whose height is proportional to the distance (cf. Fig. 5). For example, the height of the blue bar always represents the distance to the Berenika training set. The validation of the classification reveals a significantly better success (i.e. 55%) in comparison to random classification (with a pvalue of 0.0045). Note that we considered only the first 13 principal modes for each cultivar to design the distance measures (details will be explained in the “Methods” section). If we ignore the principal modes and simply compute the distance to the average we still obtain a correct overall classification rate of 40%. However, we observe huge differences in classification success when distinguishing between cultivars (cf. Fig. 6). In detail, a beet from cultivar Pauletta was classified correctly in 80% of the trials, whereas Berenika classification success was more or less random (i.e. 20%).
Simplified PCA approach. For comparison reasons we also computed a Euclidean principal component analysis on five characteristic parameters extracted directly from the 3D point cloud. In detail, these five different parameters are root length, width, surface and volume (cf. [10]) as well as root complexity which is defined as the quotient between root surface and volume.
Figure 7 shows the distribution of the different cultivars using the first two principal components of the PCA. Obviously, distinguishing different cultivars or even a reliable classification is hard, if not impossible.
Discussion
The aim of our study was to perform a statistical shape analysis and to develop a method which allows to distinguish between different cultivars in a nonparametric way. Figs. 1 to 4 show how our method identifies the main components of shape variation. These are mostly in line with traits for sugar beet shapes like size, maximum length, maximum width, average radius, radial variation, circularity, the ratio width to length, and further shape factors like surface roughness or furrow formation. Moreover, properties that seem to be characteristic for a particular training data set can actually be extracted in form of dominant variations. For example, the training data set of Pauletta (cf. Fig. 4, left) suggests that this cultivar tends to have a rather slim and longish body that varies predominantly in length whereas branching of the main tap root is usually not preferred. On the other hand, the clearly separated first principal mode (λ_{1}>2λ_{2}) captures exactly this variation with respect to length and none of the first five modes represents a clear branching (compared to the other cultivars in Figs. 1 to 3). Besides the qualitative modes of variation, our method allows a robust quantification of the associated variance of the main components of 3D shape variation. A next step would be to compare the results also at different environments or sets of stress conditions which enables quantitative statements about the influence of these parameters on the plant’s development. The present outcomes suggest that it will become possible to dissect environmental from genotypic effects on the phenotype and thus allow a true G ×E×P interaction analysis [13]. Furthermore, it is envisaged that the proposed method is adopted to other plant organs and species and likewise will be useful for approaches in precision farming.
The results of Fig. 5 have shown a classification accuracy of about 55% which is significantly better than a random assignment. However, the robustness of the classification can still be improved. In particular, the dependence on the number M_{j} of considered principal modes in the definition of the distance measure has to be further investigated. Furthermore, the striking differences of classification success between different cultivars (cf. Fig. 6) has to be explored and explained in detail. We believe that increasing the number of samples in the training data sets might improve the robustness as well as the classification success of individual cultivars.
Currently, the resolution of the shapes is limited by the spatial resolution of the characteristic functions which are defined on a 129^{3} voxel grid. The choice of spatial resolution is a tradeoff between a desired level of details and limitation of computation time. However, the sensor provides a resolution of about 50 microns, hence it is desirable to be able to use the full resolution in the numerical simulations as well. To this end, future work will focus on the reduction of the algorithm’s complexity on the one hand, e.g. by using linearized elasticity and to use more sophisticated algorithmic features on the other hand, such as adaptive grids and advanced parallel computing techniques. An altenative approach is applying the method proposed in [36] to the original triangle meshes constructed directly from the laser scans. This shell PCA performs the same statistical shape analysis but with a different physical model (in detail, shapes are treated as hollow objects and one studies elastic deformations of the surface only). Moreover, it might be interesting to investigate the potential application of machine learning approaches to tackle the classification problem. Finally, we aim at the replacement of a manually moved scanner by a device that is able to perform a high number of scans automatically. This replacement is in particular necessary since we aim at increasing the number of samples in the training data sets.
Conclusions
We applied an established computer vision algorithm to perform a statistical shape analysis on 3D laser scanned tap roots. The method is applied to sugar beets, where the mean root shape and the main variations within a group of sugar beets of the same growth period have been computed. Our investigations can be considered as a case study for the statistical analysis of storage roots without predefined classification criteria. The method indeed demonstrates that it is possible to identify principal modes of shape variation automatically and to quantify the associated variances. In particular, the resulting dominant modes of variations can be used to cluster scanned tap roots into categories which forms the basis for linking growth and environmental conditions. Furthermore, this statistical shape analysis can be used in combination with other invasive or noninvasive sensors that access the 3D shape and can be applied to other plant organs and species as well. Since noninvasive sensors such as MRI or radar imaging are usually affected by significant noise the statistical analysis of large ensembles of laser scanned root shapes could help to increase the accuracy of the classification of noisy input data.
Methods
Plant material. Sugar beets were grown during summer 2012 in a central experiment of CROP.SENSe.net. The objectives of the central experiment were the provision of ground truth data for the interpretation of sensor measurements. The soil was a silty loam soil (WRB: Haplic Luvisol) at the Klein Altendorf experimental research station (6^{∘}59’N, 50^{∘}37’E) near Bonn, Germany. The mean annual temperature is 9.6 ^{∘}C with an average annual precipitation of 625 mm. The cultivars Pauletta, Berenika, Mauricia and Cesira—obtained commercially from KWS Saat^{Footnote 2}—were chosen because of their differing phenology. At the end of the growing season 35 randomly chosen beets of each variety were sampled, i.e. 140 beets in total. Rows for sampling were 2 m ×3 m within a plot of 50 m^{2} each. Seed density was 90,000 per hectare at a row distance of 0.2 m (plant to plant distance).
Data acquisition. To acquire the 3D shape of the sugar beet tap roots a closeup laser scanner coupled to an articulated measuring arm device [10, 40] was used (see Fig. 8). This is a well evaluated combination for 3D plant imaging [41, 42]. Hardware details such as resolution and accuracy are given in Table 1. With its seven degrees of freedom it is possible to image the plant from various viewpoints to get an occlusionfree 3D model. The output of the laser scanner is a point cloud with XYZ coordinates with more than 3 million points that is later parsed to an automatic triangulation algorithm.
Preprocessing (cf. Fig. 9, left column). Before being fed to our statistical method we made use of the commercial 3DCADSoftware Geomagic Studio 12 to apply basic preprocessing routines. First, as the sensor is moved manually (see Fig. 8), visible parts from the mounting device and the measuring table had to be removed. Second, the integrated outlier removal function was used as well as a gridbased reduction of the point density (to an average distance of 0.5 mm) to enable a smooth surface generation. Subsequently, the automatic point cloud triangulation was performed to approximate the surface boundary \({{\mathcal {S}}}\) of the solid root shape \({{\mathcal {O}}}\), i.e. \({{\mathcal {S}}} = \partial {{\mathcal {O}}}\). We then applied a uniform rescaling such that all shapes have the same (inscribed) volume.^{Footnote 3}
The proposed statistical method requires the representation of each root shape via a characteristic function \(\chi _{{{\mathcal {O}}}}\) on the computational domain Ω=[0,1]^{3}, i.e. we have \(\chi _{{{\mathcal {O}}}}(x) = 1\) if \(x \in {{\mathcal {O}}}\) and \(\chi _{{{\mathcal {O}}}}(x) = 0\) else. Then, the sugar beet surface \({{\mathcal {S}}}\) is implicitly represented as the interface between these two regions. To obtain such a characteristic function, we first compute the signed distance function \(d:\Omega \to {{\mathbbm {R}}}\) of \({{\mathcal {S}}}\), see [46]. In detail, d(x) represents the shortest distance from x∈Ω to \({{\mathcal {S}}}\), where d(x)>0 if x is in the interior of \({{\mathcal {O}}}\) and d(x)<0 if \(x \notin \bar {{\mathcal {O}}}\). To this end, Ω=[0,1]^{3} is discretized by a regular grid Ω_{h} with 129^{3} nodes (129 equally spaced vertices in each spatial direction of the unit cube). Mathematically, we aim at solving the nonlinar partial differential equation ∇d(x)=1 for all x∈Ω_{h} with the boundary condition d(x)=0 if \(x \in {{\mathcal {S}}}\), the socalled Eikonal equation. To this end, we first evaluate the signed distance d(x) on the grid nodes x∈Ω_{h} closest to the triangles of \({{\mathcal {S}}}\). Afterwards, we compute its values far from the triangular surface via information propagation by a fast marching algorithm [47] to obtain the solution \(d\colon \Omega _{h} \to {{\mathbbm {R}}}\) of the PDE. Since the triangulated surface might have been nonclosed due to local scan deficiencies or selfocclusion, the signed distance function d may have an incorrect sign at a number of nodes. As a remedy, we regularize d by seeking a minimizer d_{new} of the energy
Here, \(d_{\epsilon }=\sqrt {d^{2}+\epsilon ^{2}}\) with ε=10^{−4} denotes a differentiable approximation of the absolute value function and the regularization parameter was set to σ=0.01 in our application. The nonlinear optimization is performed using a finite element approach with multilinear basis functions on the regular grid Ω_{h} and a standard trust region method [48]. Once d_{new} is computed we are finally able to define the characteristic function. To this end, we set \(\chi _{{{\mathcal {O}}}}(x)=1\) if d_{new}(x)<0 and \(\chi _{{{\mathcal {O}}}}(x)=0\) else, which is done for all grid nodes x∈Ω_{h}. After data compression, the required memory to store the function \(\chi _{{{\mathcal {O}}}}\) is comparable to the one needed to store an explicit triangular representation of the shape \({{\mathcal {S}}}\).
Statistical analysis model (cf. Fig. 9, middle column). In the following we will generalize the standard statistical analysis of point sets in a (linear) vector space, e. g. \({{\mathbbm {R}}}^{n}\), to the space of shapes S, where \({{\mathcal {S}}} \in {{\mathbf {S}}}\) represents the surface of a volumetric object \( {{\mathcal {O}}} = {{\mathcal {O}}}({{\mathcal {S}}}) \subset {{\mathbbm {R}}}^{n} \). To underpin this generalization let us briefly recall the basic concepts of the mean and the principal component analysis (PCA) on vector spaces [49].
The arithmetic mean \(\bar x\) of m points x_{1}, x_{2}, …, x_{m} in \({{\mathbbm {R}}}^{n}\) is given as \(\bar x = \frac 1m (x_{1} + x_{2} + \ldots + x_{m})\). It can also be characterized as the minimizer of the energy \(E[x] = {\sum \nolimits }_{i=1}^{m} {{\mathcal {W}}}[x_{i}x]\), where \({{\mathcal {W}}}[y]=\mu y^{2}\) is the elastic energy stored in a spring stretched along a vector y and μ is the associated stiffness coefficient of the spring. Indeed, the arithmetic mean is the equilibrium position of the central hub of a network with m springs, where one end of each spring is attached to one of the input points and the other end is connected to all other springs at a hub so that all springs pull at this hub. If \(X = \left [x_{1}  \bar x\ldots x_{m}  \bar x\right ]\) is the n×m (centered) data matrix, the covariance matrix is given by \(\tfrac 1mXX^{T}\). A PCA now involves a spectral decomposition of this n×n matrix, where n might be very large. Hence we use the fact that the m×m matrix \(C= \tfrac 1m X^{T} X\) has the same (nontrivial) eigenvalues as the covariance matrix, where m≪n. Note that \(C_{ij} = \frac 1m g\left (x_{i}\bar x,x_{j}\bar x\right)\) where g denotes a suitable scalar product on \({{\mathbbm {R}}}^{n}\) (typically the standard Euclidean inner product \(\left (x_{i}\bar x\right)\cdot \left (x_{j}\bar x\right)\)). If C=QΛQ^{T} is the singular value decomposition of C with an orthogonal matrix Q and a diagonal matrix Λ with diagonal entries λ_{1}≥λ_{2}≥…≥λ_{m}, then the eigenvectors of the covariance matrix are obtained via \(w_{j} = {\sum \nolimits }_{i=1}^{m} \sqrt {\lambda _{j}^{1}} q_{j}^{i}\left (x_{i}\bar x\right)\). These are exactly the principal directions of variations, and the eigenvalues λ_{j} describe the variance in that direction. Here, \(q_{j}^{i}\) denotes the i^{th} entry in the j^{th} column Q. As g is supposed to be positive (semi)definite, we have λ_{i}≥0 for i=1,…,m. Furthermore, if we assume x_{1},…,x_{m} to be linearly independent we have λ_{1}≥…≥λ_{m−1}>λ_{m}=0 as we consider centered data, i.e. \(x_{1}\bar x, \ldots, x_{m}  \bar x\) spans a (m−1)dimensional subspace.
Now this general concept is transferred to shape statistics. In the context of tap root shapes \({{\mathcal {S}}}_{i}\) for i=1,…,m, the corresponding volumetric objects \({{\mathcal {O}}}_{i} = {{\mathcal {O}}}\left ({{\mathcal {S}}}_{i}\right)\) play the role of the points x_{i}, and the vector x_{i}−x is replaced by the optimal deformation \(\phi _{i}:{{\mathcal {O}}}_{i}\to {{\mathbbm {R}}}^{n}\) of the object \({{\mathcal {O}}}_{i}\) into an object \({{\mathcal {O}}}\). Here, optimal means that the deformation ϕ_{i} costs the least elastic energy \({{\mathcal {W}}}\left [\psi _{i}, {{\mathcal {O}}}_{i}, {{\mathcal {O}}}\right ] = \int _{{{\mathcal {O}}}_{i}} W\left (D \psi _{i}\right) {{\mathrm {d}}} x\) of all deformations with \(\psi _{i}({{\mathcal {O}}}_{i}) = {{\mathcal {O}}}\), where W(·) denotes a hyperelastic energy density acting on the local deformation gradient \(D \psi _{i}\in {{\mathbbm {R}}}^{n\times n}\). This energy and thus the thereby defined shape mean is invariant under rigid body motions, i. e. it does not change if the position or the orientation of the root shapes is varied. The arithmetic mean of m input objects \({{\mathcal {O}}}_{1},\,{{\mathcal {O}}}_{2},\, \ldots, \, {{\mathcal {O}}}_{m}\) is defined as the object \(\bar {{\mathcal {O}}}\) which minimizes the energy
For details we refer to [33]. Different from the vectors \(x_{i}\bar x\) above, the deformations ϕ_{i} are strongly nonlinear and cannot be used in a (linear) PCA. Instead, one can replace the deformation ϕ_{i} by its linear representative, the associated elastic stress (cf. Fig. 10)
evaluated on the average shape \(\bar {{\mathcal {S}}}\) (the surface of the average object \(\bar {{\mathcal {O}}}\)) with surface normal vector ν [31]. Even more intuitive, one can also replace the nonlinear deformation ϕ_{i} by the displacement \(u_{i}:\bar {{\mathcal {S}}}\to {{\mathbbm {R}}}^{n}\) of each point on the surface \(\bar {{\mathcal {S}}}\) of \(\bar {{\mathcal {O}}}\) which is observed if the elastic stress σ_{i} is applied at the surface of the average object. The Hessian of the corresponding elastic energy implies a natural scalar product g(·,·) on these displacements u_{i} [31, 35]. Thus, we define the matrix
as a representation of the covariance operator, which is defined by
and can be regarded as the analogon of the covariance matrix in the vector space case. Again, the whole construction is rigid body motion invariant. Via the same spectral decomposition C=QΛQ^{T} of the m×m matrix C as above, we finally obtain the principal modes of shape variation,
and associated variances λ_{j}. A comprehensive introduction to this concept can be found in [32, 34]. Let us emphasize that this statistical approach does not make any assumptions on how the different shapes and their variations are configured. It is also not assumed that the original shapes are actually elastic—this is just a mathematical tool to define the dissimilarity between shapes.
Classification (cf. Fig. 9, right column). Sample beets were available for N=4 different cultivars. The training data set of cultivar j is represented as a set of displacements \(\left (u^{j}_{k}\right)_{k\leq K_{j}}\) from the group average \(\bar {{\mathcal {S}}}_{j}\), where K_{j} denotes the sample size. The aim of classification is to assign an arbitrary shape \({{\mathcal {S}}}\) or a corresponding shape displacement u, respectively, to one of these N cultivars. In detail, we need a distance measure d_{j} for each cultivar that quantifies the distance of u to the set \(\left (u^{j}_{k}\right)_{k\leq K_{j}}\). We say that u is classified to cultivar l if d_{l}(u)≤d_{j}(u) for j=1,…,N.
Our distance measure d_{j} is an extension of the classical Mahalanobis distance. First, we assume that we have already performed a PCA (as described above) for each cultivar independently. Let \(\left (\lambda ^{j}_{k}\right)_{k\leq K_{j}}\) and \(\left (w^{j}_{k}\right)_{k\leq K_{j}}\) denote the eigenvalues and eigenvectors computed in the PCA of cultivar j, respectively. We define U_{j} to be the linear span of \(\left (u^{j}_{k}\right)_{k\leq K_{j}}\) and \(U_{j}^{\perp }\) the orthogonal complement with respect to the metric g_{j}(.,.) associated by the jth cultivar. For some truncation value M_{j}≤K_{j} the classical (squared) Mahalanobis distance is given by
Note that \(m_{j}^{2}\left (w^{j}_{k}\right) = \left (\lambda ^{j}_{k}\right)^{1}\), that means the Mahalanobis distance penalizes deviations in direction of dominant eigenvectors less. On the other hand, we have m_{j}(u^{⊥})=0 for each \(u^{\perp } \in U_{j}^{\perp }\), which requires some kind of regularization. If P_{j}u is a projection of u onto U_{j} we decompose \(u = P_{j} u + \left (u\,\,P_{j} u\right) \in U_{j} \oplus U_{j}^{\perp }\) and finally define an extended Mahalanobis distance by
Here β>0 is a regularization parameter. In the experiments shown in Fig. 5 we have chosen β=10^{−5} and M_{j}=13 for all four cultivars.
Numerical implementation. The matching deformations ϕ_{i} between an input object \({{\mathcal {O}}}_{i}\) and the object \({{\mathcal {O}}}\), both represented by characteristic functions χ_{i} and χ, respectively, are computed using a penalty approach, where one minimizes
(for some penalty parameter γ=50) with respect to the deformation ϕ. The first term ensures that the deformation with minimum energy is found, and the second term ensures that ϕ indeed deforms \({{\mathcal {O}}}_{i}\) into \({{\mathcal {O}}}\). In detail, we made use of the polyconvex energy density \(W(D\phi) = \bar W(\D\phi \, \det D\phi)\) with
where the elastic parameters are μ=λ=1. Consequently, the average shape \(\bar {{\mathcal {S}}}\), represented by χ, and the deformations ϕ_{1},…,ϕ_{m} are obtained by minimizing
over all ϕ_{i} and χ. For details we again refer to [33]. The functions χ_{i} and deformations ϕ_{i} are discretized by 129^{3} nodes (i. e. a function value is assigned to each node of a grid with 129 vertices in each space direction) so that the number of degrees of freedom scales with 129^{3}·3·m, yielding a highdimensional problem. The principal component analysis basically involves a singular value decomposition of a symmetric m×m correlation matrix and the solution of m linear systems of equations of size 129^{3} in order to translate the boundary stresses into displacements or shape variations.
Availability of data and materials
The datasets used and analysed during the current study are available from the corresponding author on reasonable request.
Notes
 1.
A characteristic function assigns 1 = true to every data point inside the sugar beet volume and 0 = false to all other points.
 2.
 3.
The rescaling proved to improve the later classification substantially while it does not qualitatively change the shape analysis. See discussion above.
Abbreviations
 G ×E×P :

Genotype ×Environment × Phenotype
 PCA:

principle component analysis
 PDE:

partial differential equation
References
 1
Bradshaw JE. Root And Tuber Crops. vol. 7. Berlin: Springer; 2010, pp. 173–219.
 2
Märländer B, Hoffmann C, Koch HJ, Ladewig E, Niemann M, Stockfisch N, Varrelmann M, Mahlein AK. Sustainable intensification – a quarter century of research towards higher efficiency in sugar beet cultivation. Sugar Ind. 2018; 4:200–17.
 3
Hoffmann CM, Kenter C. Yield potential of sugar beet – have we hit the ceiling?Front Plant Sci. 2018; 9:289.
 4
Vamerali T, Guarise M, Ganis A, Mosca G. Effects of water and nitrogen management on fibrous root distribution and turnover in sugar beet. Eur J Agron. 2009; 31(2):69–76.
 5
Broom I, Edmunds BS, Ip S. Modelling asymmetrical growth curves that rise and then fall: Applications to foliage dynamics of sugar beet (beta vulgaris l.)Ann Bot. 1997; 79(6):657–65.
 6
Shimazu M, Shibata Y, Kataoka T, Okamoto H, Kuriyama A. Development of a measurement device for root diameter of sugar beets. J Soc Agric Mach. 2010; 72(2):177–84.
 7
Persson M, Åstrand B. Classification of crops and weeds extracted by active shape models. Biosyst Eng. 2008; 100(4):484–97.
 8
Midtiby HS, Giselsson TM, Jørgensen RN. Estimating the plant stem emerging points (PSEPs) of sugar beets at early growth stages. Biosyst Eng. 2012; 111(1):83–90.
 9
Moreda GP, Muñoz MA, RuizAltisent M, Perdigones a.Shape determination of horticultural produce using twodimensional computer vision—a review. J Food Eng. 2012; 108(2):245–61.
 10
Paulus S, Behmann J, Mahlein AK, Plümer L, Kuhlmann H. Lowcost 3D systems  well suited tools for plant phenotyping. Sensors. 2014; 14(2):3001–18.
 11
Berkels B, Fletcher PT, Heeren B, Rumpf M, Wirth B. Discrete geodesic regression in shape space In: Heyden A, Kahl F, Olsson C, Oskarsson M, Tai XC, editors. Energy Minimization Methods in Computer Vision and Pattern Recognition. EMMCVPR 2013. Lecture Notes in Computer Science, vol 8081. Berlin: Springer: 2013. p. 108–22.
 12
Wahabzada M, Paulus S, Kerstin C, Mahlein AK. Automated interpretation of 3D laserscanned point clouds for plant organ segmentation. BMC Bioinformatics. 2015; 16:248.
 13
Tardieu F, Tuberosa R. Dissection and modelling of abiotic stress tolerance in plants. Curr Opin Plant Biol. 2010; 13(2):206–12.
 14
Mahlein AK, Kuska MT, Behmann J, Polder G, Walter A. Hyperspectral sensors and imaging technologies in phytopathology: State of the art. Annu Rev Phytopathol. 2018; 56(1):535–58.
 15
Chahal G, Gosal S. Principles and Procedures of Plant Breeding: Biotechnological and Conventional Approaches. Pangbourne: Alpha Science International Ltd.; 2002.
 16
Acquaah G. Principles of Plant Genetics and Breeding. Oxford: Blackwell Publishing Ltd.; 2006.
 17
Iwata H, Nesumi H, Ninomiya S, Takano Y, Ukai Y. The evaluation of genotype × environment interactions of citrus leaf morphology using image analysis and elliptic Fourier descriptors. Breed Sci. 2002; 52(4):243–51.
 18
Neto J, Meyer G, Jones D, Samal A. Plant species identification using elliptic Fourier leaf shape analysis. Comput Electron Agric. 2006; 50(2):121–34.
 19
Iwata H, Ukai Y. SHAPE: a computer program package for quantitative evaluation of biological shapes based on elliptic Fourier descriptors. J Hered. 2002; 93(5):384–5.
 20
Tsialtas J, Maslaris N. Sugar beet root shape and its relation with yield and quality. Sugar Tech. 2010; 12:47–52.
 21
Brunton A, Salazar A, Bolkart T, Wuhrer S. Review of statistical shape spaces for 3d data with comparative analysis for human faces. Comp Vision Image Underst. 2014; 128:1–17.
 22
Cootes TF, Taylor CJ, Cooper DH, Graham J. Active shape modelstheir training and application. Comp Vision Image Underst. 1995; 61(1):38–59.
 23
Cootes TF, Edwards GJ, Taylor CJ. Active appearance models. IEEE Trans Pattern Anal Mach Intell. 2001; 23(6):681–5.
 24
Kendall DG. Shape manifolds, procrustean metrics, and complex projective spaces. Bull Lond Math Soc. 1984; 16(2):81–121.
 25
Srivastava A, Klassen E, Joshi SH, Jermyn IH. Shape analysis of elastic curves in euclidean spaces. IEEE Trans Pattern Anal Mach Intell. 2011; 33(7):1415–28.
 26
Kilian M, Mitra NJ, Pottmann H. Geometric modeling in shape space. ACM Trans Graph. 2007; 26(3):1–8.
 27
Pennec X. Intrinsic statistics on Riemannian manifolds: Basic tools for geometric measurements. J Math Imaging Vision. 2006; 25(1):127–54.
 28
Fletcher PT, Lu C, Pizer SM, Joshi S. Principal geodesic analysis for the study of nonlinear statistics of shape. IEEE Trans Med Imaging. 2004; 23(8):995–1005.
 29
Tournier M, Wu X, Courty N, Arnaud E, Reveret L. Motion compression using principal geodesics analysis. Comput Graph Forum. 2009; 28:355–64.
 30
Heeren B, Zhang C, Rumpf M, Smith W. Principal geodesic analysis in the space of shells. Comput Graph Forum. 2018; 37:173–84.
 31
Ciarlet PG. Mathematical Elasticity, Vol. I: ThreeDimensional Elasticity. In: Studies in Mathematics and its Applications. Amsterdam: Elsevier: 1997.
 32
Rumpf M, Wirth B. Variational methods in shape analysis In: Scherzer O, editor. Handbook of Mathematical Methods in Imaging. New York: Springer: 2011. p. 1363–401.
 33
Rumpf M, Wirth B. A nonlinear elastic shape averaging approach. SIAM J Imaging Sci. 2009; 2(3):800–33.
 34
Rumpf M, Wirth B. An elasticity approach to principal modes of shape variation In: Tai XC, Mørken K, Lysaker M, Lie KA, editors. Scale Space and Variational Methods in Computer Vision. SSVM 2009. Lecture Notes in Computer Science, vol 5567. Berlin: Springer: 2009. p. 709–720.
 35
Rumpf M, Wirth B. An elasticitybased covariance analysis of shapes. Int J Comput Vis. 2011; 92(3):281–95.
 36
Zhang C, Heeren B, Rumpf M, Smith WA. Shell PCA: statistical shape modelling in shell space. In: IEEE International Conference on Computer Vision (ICCV). Santiago: 2015. p. 1671–79.
 37
Hosoi F, Nakabayashi K, Omasa K. 3d modeling of tomato canopies using a highresolution portable scanning lidar for extracting structural information. Sensors. 2011; 11(2):2166–74.
 38
Palacin J, Palleja T, Tresanchez M, Sanz R, Llorens J, RibesDasi M, Masip J, Arno J, Escola A, Rosell J. Realtime treefoliage surface estimation using a ground laser scanner. IEEE Trans Instrum Meas. 2007; 56(4):1377–83.
 39
Omasa K, Hosoi F, Konishi A. 3D lidar imaging for detecting and understanding plant responses and canopy structure. J Exp Bot. 2007; 58(4):881–98.
 40
Paulus S, Dupuis J, Mahlein AK, Kuhlmann H. Surface feature based classification of plant organs from 3D laserscanned point clouds for plant phenotyping. BMC Bioinformatics. 2013; 14:238.
 41
Paulus S, Schumann H, Leon J, Kuhlmann H. A high precision laser scanning system for capturing 3D plant architecture and analysing growth of cereal plants. Biosyst Eng in press. 2014; 121:1–11.
 42
Paulus S, Eichert T, Goldbach HE, Kuhlmann H. Limits of active laser triangulation as an instrument for high precision plant imaging. Sensors. 2014; 14(2):2489–509.
 43
Wagner B, Santini S, Ingensand H, Gärtner H. A tool to model 3D coarseroot development with annual resolution. Plant Soil. 2011; 346(12):79–96.
 44
Rose J, Paulus S, Kuhlmann H. Accuracy analysis of a multiview stereo approach for phenotyping of tomato plants at the organ level. Sensors. 2015; 15(5):9651–65.
 45
Jahnke S, Roussel J, Hombach T, Kochs J, Fischbach A, Huber G, Scharr H. pheno seeder  a robot system for automated handling and phenotyping of individual seeds. Plant Physiol. 2016; 172(3):1358–70.
 46
Russo G, Smereka P. A remark on computing distance functions. J Comput Phys. 2000; 163:51–67.
 47
Sethian JA. A fast marching level set method for monotonically advancing fronts, Vol. 93; 1996. pp. 1591–5.
 48
Nocedal J, Wright SJ. Numerical Optimization, 2nd edn. New York: Springer; 2006.
 49
Hair J, Black W, Babin B, Anderson R. Multivariate Data Analysis, 7th edn. New Jersey: Prentice Hall; 2009.
Acknowledgments
Not applicable.
Funding
This study was funded by the German Federal Ministry of Education and Research (BMBF) within the scope of the interdisciplinary network program CROP.SENSe.net (Funding code: 0315529). The BMBF played no role in the design of the study, the collection, analysis, and interpretation of data and in writing the manuscript. This study was partially funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany’s Excellence Strategy  EXC 2070  390732324. Furthermore, we received funding by the German Research Foundation (DFG) via the “Hausdorff Center for Mathematics” (GZ 2047/1  390685813) and “Mathematics Münster” (EXC 2044  390685587). Open access funding provided by Projekt DEAL.
Author information
Affiliations
Contributions
SP, HK, HG, MR and BW designed the study. HG maintained the plant material, SP carried out the measuring, preprocessing was done by SP and BH. BH and BW implemented the main algorithm and performed all numerical simulations as well as the postprocessing. SP, BH, AKM, BW and MR interpreted the data and drafted the manuscript. All author(s) read and approved the final manuscript.
Corresponding author
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
Not applicable.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Heeren, B., Paulus, S., Goldbach, H. et al. Statistical shape analysis of tap roots: a methodological case study on laser scanned sugar beets. BMC Bioinformatics 21, 335 (2020). https://doi.org/10.1186/s12859020036548
Received:
Accepted:
Published:
DOI: https://doi.org/10.1186/s12859020036548
Keywords
 Tap roots
 Shape variation
 Energy minimization
 3D point clouds