Automated interpretation of 3D laserscanned point clouds for plant organ segmentation
 Mirwaes Wahabzada^{1},
 Stefan Paulus^{2},
 Kristian Kersting^{3} and
 Anne-Katrin Mahlein^{1}
Received: 28 January 2015
Accepted: 8 July 2015
Published: 8 August 2015
Abstract
Background
Plant organ segmentation from 3D point clouds is a relevant task for plant phenotyping and plant growth observation. Automated solutions are required to increase the efficiency of recent high-throughput plant phenotyping pipelines. However, plant geometrical properties vary with time, among observation scales and between plant types. The main objective of the present research is to develop a fully automated, fast and reliable data-driven approach for plant organ segmentation.
Results
The automated segmentation of plant organs using unsupervised clustering methods is crucial in cases where the goal is to get fast insights into the data, or where labeled data is unavailable or costly to obtain. For this we propose and compare data-driven approaches that are easy to realize and make the use of standard algorithms possible. Since normalized histograms, acquired from 3D point clouds, can be seen as samples from a probability simplex, we propose to map the data from the simplex space into Euclidean space using Aitchison's log-ratio transformation, or into the positive quadrant of the unit sphere using the square root transformation. This, in turn, paves the way to a wide range of commonly used analysis techniques that are based on measuring the similarities between data points using the Euclidean distance. We investigate the performance of the resulting approaches in the practical context of grouping 3D point clouds and demonstrate empirically that they lead to clustering results with high accuracy for monocotyledonous and dicotyledonous plant species with diverse shoot architecture.
Conclusion
An automated segmentation of 3D point clouds is demonstrated in the present work. Within seconds, first insights into plant data can be derived, even from non-labelled data. This approach is applicable to different plant species with high accuracy. The analysis cascade can be implemented in future high-throughput phenotyping scenarios and will support the evaluation of the performance of different plant genotypes exposed to stress or in different environmental scenarios.
Keywords
Background
Recent phenotyping platforms implement a variety of imaging methods, such as 3D scanning, RGB imaging, spectral imaging, and/or chlorophyll fluorescence imaging, to collect data for quantitative and qualitative studies on plant genotypes in different stress scenarios [1, 2]. The advantage of optical sensor methods in high-throughput screenings is that a high number of plants can be investigated in time-course experiments and, owing to the non-destructive nature of the sensors, the same individual can be observed over time (in contrast to analytical and destructive approaches). Furthermore, these sensor methods eliminate the human bias that always occurs when plants are rated visually or manually [3, 4]. Although the current state of the art in sensing plants is far from fully recapitulating entire plant systems, optical sensing systems come close to this ambitious aim. Bridging the 'phenotyping bottleneck' in plant breeding by technical means demands sophisticated sensing approaches and adequate data analysis methods [5–7].
Common methods to assess characteristic and functional parameters of plants from their architecture and geometry by optical sensors are 3D laser scanning and photogrammetric techniques [8, 9]. Laser scanning has the advantage of a high resolution combined with a high accuracy, including direct access to the 3D point cloud. These highly resolved 3D point clouds allow an accurate description of the geometry of plant organs and of subtle changes due to abiotic or biotic stress [10]. Relevant plant attributes that can be deduced from 3D point clouds are plant biomass, growth curves, size and number of relevant plant organs, proportions among single plant organs (e.g. leaves, stems and ears of cereals), or shape parameters (product quality).
The segmentation of plant organs is an important task in data analysis, and different approaches have been proposed in the literature. One strategy is to use a preprocessed mesh representation and to partition the mesh manually into morphological regions [9]. This step has recently been automated [11], but it still requires the preprocessed mesh representation of the 3D measurements. Other work on the classification of laser-scanned data comes from robotics, e.g. for object or scene recognition and interpretation. For instance, methods that fall under collective classification take the information surrounding a point into account. However, they often rely on complex algorithms and are time consuming, and much research has gone into making them more efficient (see [12] and references therein). One way to identify and segment plant organs without time- and labor-intensive preprocessing is to use surface feature histograms. As shown by Paulus et al. [8], they are a suitable method for plant organ parametrization from 3D data. These histograms were developed to recognize geometric primitives in 3D point clouds, where e.g. planes, cylinders and spheres produce specific and easily distinguishable histograms. Plant organs lead to specific feature histograms and separate well because leaves and stems correspond closely to primitives such as planes and cylinders. It has previously been shown that this method is independent of the point-to-point distance and applicable to multiple plants. Surface feature histograms therefore provide an interpretation based on the geometry of the surface and can be used as input for machine learning algorithms such as Support Vector Machines (SVMs) [13]. Because the histogram representation incorporates a point's neighborhood, it also makes the application of algorithms such as SVMs possible in general.
However, classification requires a substantial amount of prior knowledge. Until now these approaches have required manual supervision of the model after the data is measured. A fully automated data analysis cascade is missing but highly desirable, as it would save the time and cost of having skilled operators manually label the training data.
Triggered by this, we tackle the challenge of how to analyze this huge amount of data efficiently. In particular, we investigated the question "Can machines help to facilitate the segmentation of plant organs if no labeled data is given?" and show that this is indeed the case. Specifically, we group the surface feature histograms, acquired from 3D point clouds, using unsupervised clustering approaches. The benefit of unsupervised methods is that they can be used for exploratory data analysis and do not require labeled data, such as class information. A common and widely used method for this is k-means clustering using the Euclidean distance, for which good approximation guarantees are known. However, since our data consists of normalized histograms, using the Euclidean distance alone may not be appropriate. Consequently, we propose a data-driven approximation approach that is based on mapping the data into a different space in a preprocessing step. More precisely, since the histograms can be seen as points on a probability simplex, we propose to map the data from the simplex into Euclidean space using Aitchison geometry [14–16] or into the positive quadrant of the unit sphere [17]. This, in turn, makes it possible to employ the Euclidean distance to measure the similarities between normalized histograms in the target space. In fact, since we change the way the data is represented, any standard method devised for Euclidean space can be used. For instance, matrix factorization methods [18, 19], under which k-means can be subsumed, become applicable. Additionally, based on distance computations we can compute a hierarchical decomposition of the data [20], which can also be used in the context of spectral clustering [21]. Furthermore, the proposed approach can also be beneficial for supervised learning, such as SVMs with an RBF kernel, where a common choice is the squared Euclidean distance.
Methods
The workflow of the current paper is illustrated in Fig. 1. After data acquisition with a 3D laser scanner, histograms were calculated on the point cloud data. These histograms were used for clustering the data. In a final step, the results were evaluated with regard to accuracy, speed and applicability.
Notation: We denote vectors by lower-case letters (\(\vec {x}\)); a real-valued vector of size m is written as \(\vec {x}\in \mathbb {R}^{m}\); subscripted lower-case italics (\(x_{j}\)) refer to the components of a vector; matrices are written as bold upper-case letters (X); a real-valued m×n matrix is written as \(\textit {\textbf {X}}\in \mathbb {R}^{m\times n}\) or using the shorthand \(\textit {\textbf {X}}^{m\times n}\).
Histogram calculation
Histogram-based surface representations have been proven to enable the identification of geometrical primitives in low-resolution point clouds acquired on robotic carrier systems [22]. Coming from robotics, point feature histograms were originally used for the detection of basic geometric shapes in low-resolution laser scans [22, 23] and for the registration of different laser scan viewpoints [24]. Surface feature histograms, an advancement of these histograms, recently showed their applicability for the segmentation of organs of grapevine and wheat [8], as well as in barley for an organ-based parametrization in time-course experiments [25]. These histograms encode surface information, such as curvature, using the neighbourhood of a point and the surface normals. This curvature is characteristic for the surface of e.g. plant leaves and stems and can be used as an input for machine learning methods like SVM to classify these organs automatically. Different geometrical features are calculated and their value domain is subdivided into 5 subregions. Each combination of these subregions corresponds to one histogram bin. In this way, the geometrical neighborhood of one point in 3D space can be represented by a histogram with 125 bins.
where β is a weight function \(\beta _{j}= 1-\left (0.5+\frac {d(\vec {z}_{i},\vec {z}_{j})}{r_{H}}\cdot 0.5\right)\). The use of the weights β for the calculation of the final histograms ensures that histograms of points near the limit of the radius \(r_{H}\) have a lower impact than those closer to the point \(\vec {z}_{i}\). For a detailed description we refer to Paulus et al. [8].
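The binning described above can be sketched in a few lines. This is an illustration only, not the implementation of [8]: it assumes three features (which is what 125 = 5^3 bins implies), each already scaled to the interval [0, 1], and the function name `bin_index` is hypothetical.

```python
def bin_index(features, bins_per_feature=5):
    """Map a tuple of features, each assumed to lie in [0, 1], to one bin index."""
    index = 0
    for value in features:
        # Clamp so that value == 1.0 falls into the last subregion.
        sub = min(int(value * bins_per_feature), bins_per_feature - 1)
        # Treat the subregion numbers as digits of a base-5 number.
        index = index * bins_per_feature + sub
    return index


n_bins = 5 ** 3  # three features with five subregions each
lowest = bin_index((0.0, 0.0, 0.0))
highest = bin_index((1.0, 1.0, 1.0))
print(n_bins, lowest, highest)
```

Reading the subregion numbers as base-5 digits gives every combination of subregions its own bin, so the indices run from 0 to 124.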
Metrics for measuring histogram similarity
Here, \(\vec {\mu }_{j}\) denotes the cluster representative, ζ _{ ij } is binary, that is ζ _{ ij }∈{0,1}, describing the cluster membership of a data point x _{ i } to cluster j.
However, using the Euclidean distance directly for analyzing surface feature histograms is not a sensible idea, as it is known to be sensitive to noise and does not generalize well [28]. Therefore, we propose a data-driven approach that looks at the properties of the data itself. Since the histograms represent proportions that sum to one, they can be considered to be samples from a probability simplex. In other words, we are interested in clustering normalized histograms on the simplex. To do this, we consider two approaches that are based on a simple data transformation as a preprocessing step. The presented approaches are easy to realize and still employ the Euclidean distance for measuring histogram similarities. In turn, standard algorithms, for example for clustering or classification, can be applied to normalized data.
In the following we will focus on k-means, as it is a simple and widely used method for clustering objects, and a number of efficient implementations exist for parallel and streaming settings [29]. Since we use it here for clustering normalized histograms, we will discuss and motivate two approaches for measuring the histogram similarity.
Hellinger distance
This, in turn, is equivalent to the square of the Euclidean distance, as given by Eq. (6), between the square roots of two data points \(\vec {x}\) and \(\vec {y}\). Thus, clustering the data using the square root transformation and k-means should lead to a good clustering in terms of minimizing the Hellinger distance between each object and its nearest cluster center. It can be shown that this yields an O(log n) approximation of clustering based on minimizing the KL-divergence [17]. However, the KL-divergence does not satisfy the metric properties, i.e. it is not symmetric and does not satisfy the triangle inequality. The latter also holds for its symmetric alternatives, such as Jeffrey's divergence [31].
i.e. we transform the data from the simplex space into the positive quadrant of the unit sphere [17]. The resulting representation, in turn, can be used to find a clustering of histograms, as considered in this paper, using standard implementations of k-means. Since the cluster centers for the mapped data do not lie on the unit sphere, we recompute them using the original histograms and cluster assignments. This ensures that the cluster centers lie on the simplex.
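The link between the square root transformation and the Euclidean distance can be checked numerically. The sketch below is illustrative: the function names and the two toy histograms are assumptions, and the unscaled sum of squared differences is used, whereas some definitions of the Hellinger distance include a constant factor.

```python
import math


def sqr_transform(hist):
    """Map a normalized histogram to the positive quadrant of the unit sphere."""
    return [math.sqrt(v) for v in hist]


def squared_euclidean(a, b):
    """Squared Euclidean distance between two equally long vectors."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def bhattacharyya_coefficient(p, q):
    """Sum of sqrt(p_i * q_i); equals 1 for identical normalized histograms."""
    return sum(math.sqrt(u * v) for u, v in zip(p, q))


x = [0.5, 0.5, 0.0]    # toy histogram with a zero bin
y = [0.25, 0.25, 0.5]  # second toy histogram
sx, sy = sqr_transform(x), sqr_transform(y)

norm_sq = sum(v * v for v in sx)  # squared length of the mapped point
lhs = squared_euclidean(sx, sy)   # distance after the transformation
rhs = 2.0 - 2.0 * bhattacharyya_coefficient(x, y)  # same quantity on the simplex
print(norm_sq, lhs, rhs)
```

The squared length of a mapped histogram is the sum of its bins, i.e. one, so the mapped point lies on the unit sphere. Zero bins need no special treatment here because the square root of zero is defined.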
Aitchison distance
and its inverse \(clr^{-1}(\vec {y}) = \left ({\exp {(y_{1})}}/{\sum _{j}\exp {(y_{j})}},\ldots,{\exp {(y_{m})}}/{\sum _{j}\exp {(y_{j})}}\right)\). Thus, we can use clr-transformed histograms with the Euclidean distance within k-means clustering. Note that other transformations, such as the isometric log-ratio transformation [32], may be used as well. It avoids the singular covariance matrix that results from the clr transformation while preserving properties such as the isometry between the simplex and the real space.
However, the histograms considered in this work also contain empty (zero) bins. This leads to numerical problems when computing \(clr(\vec {x})\), both because of the logarithm and because the geometric mean in the denominator is \(g(\vec {x})=0\) if any \(x_{j}=0\) for j=1,…,m. Finding a good replacement for zeros is essential when using log-ratio transformations (see [33] and references therein), e.g. for missing or rounded values. For histogram analysis, Wahl et al. [28] suggested replacing the zero bins by a small common value that is lower than the smallest nonzero value. For the experiments in the current work we used a simple procedure that adds a small value ε to all data points. Across different datasets, this led to a better clustering with the clr approach than replacing only the zero bins. By contrast, for the SQr approach we do not need to take care of the zero bins.
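The centered log-ratio transformation, its inverse and the ε shift can be sketched as follows. This is a minimal illustration that assumes the usual definition clr(x)_j = log(x_j / g(x)) with g the geometric mean; the function names and the toy histogram are hypothetical, and ε = 1/m follows the setting reported for the experiments in this paper.

```python
import math


def add_epsilon(hist, eps):
    """Add eps to every bin and renormalize so that the bins sum to one."""
    shifted = [v + eps for v in hist]
    total = sum(shifted)
    return [v / total for v in shifted]


def clr(hist):
    """Centered log-ratio: log of each bin divided by the geometric mean."""
    logs = [math.log(v) for v in hist]  # fails for zero bins, hence add_epsilon
    mean_log = sum(logs) / len(logs)    # log of the geometric mean
    return [value - mean_log for value in logs]


def clr_inverse(y):
    """Inverse clr: exponentiate and renormalize (a softmax)."""
    top = max(y)  # subtracting the maximum avoids overflow, result unchanged
    exps = [math.exp(v - top) for v in y]
    total = sum(exps)
    return [e / total for e in exps]


hist = [0.5, 0.5, 0.0, 0.0]             # toy histogram with zero bins
smoothed = add_epsilon(hist, 1.0 / len(hist))
mapped = clr(smoothed)
restored = clr_inverse(mapped)
print(smoothed, mapped, restored)
```

The clr coordinates always sum to zero, which is the reason for the singular covariance matrix mentioned above, and applying the inverse recovers the smoothed histogram rather than the original one with zero bins.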
Histogram clustering algorithm
The overall procedure for clustering the normalized histograms acquired from 3D point clouds is summarized in Algorithm 1. We start by transforming the data using either the SQr or the clr approach [lines 1–4]. Then, on the new representation of the data, we run k-means clustering [lines 5–14]. This can be done using an EM algorithm that iteratively optimizes the cluster memberships, which are stored in a matrix Z [lines 8–10] (E-step), and computes the cluster representatives in a matrix M [lines 11–13] (M-step). Finally, we determine the cluster representatives on the simplex \(\pmb {\tilde M}\) using the inverse centered log-ratio transformation for the clr approach. For the SQr approach we use the cluster assignments in Z and the original inputs X to get the final cluster centers.
Because we only transform the data before clustering and do not change the underlying algorithm, the time complexity remains the same. The transformations need only one pass over the entire dataset, which can easily be parallelized or done sequentially to overcome memory issues. Using k-means as given by Algorithm 1 [lines 5–14] finds a local optimum, whereas finding a global optimum is an NP-hard problem [34], even for k = 2.
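The SQr variant of the procedure can be sketched in plain Python as follows. This is an illustrative reading of the description above, not the Algorithm 1 listing itself: the initialization from randomly sampled points, the stopping rule, the handling of empty clusters and all function names are assumptions.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def mean(rows):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def kmeans(points, k, iters=100, seed=0):
    """Plain k-means; returns one cluster label per point (a local optimum)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # assumed initialization
    labels = None
    for _ in range(iters):
        # E-step: assign every point to its nearest center.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        # M-step: recompute each center as the mean of its members.
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # an empty cluster keeps its old center
                centers[j] = mean(members)
    return labels


def cluster_histograms_sqr(histograms, k, seed=0):
    """Cluster in square-root space, then average the original histograms."""
    mapped = [[math.sqrt(v) for v in h] for h in histograms]
    labels = kmeans(mapped, k, seed=seed)
    centers = []
    for j in range(k):
        members = [h for h, l in zip(histograms, labels) if l == j]
        centers.append(mean(members) if members else None)
    return labels, centers


toy = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.0], [0.85, 0.15, 0.0],
       [0.0, 0.1, 0.9], [0.0, 0.2, 0.8], [0.05, 0.15, 0.8]]
labels, centers = cluster_histograms_sqr(toy, k=2)
print(labels, centers)
```

Averaging the original histograms of each cluster, rather than mapping the centers back from square-root space, is what keeps the final representatives on the simplex. For the clr variant, the transformation would be replaced and the centers mapped back with the inverse clr transformation.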
Data acquisition
The data was acquired with a 3D measuring combination of an articulated measuring arm (Romer Infinite 2.0 (1.4 m), Hexagon Metrology Services Ltd., London, UK) and a laser triangulation sensor (Perceptron Scan Works V5, Perceptron Inc., Plymouth, MI, USA). The applicability and accuracy of this combination have been shown for the scanning of grapevine, wheat and barley [8, 10]. It provides an accuracy of about 45 μm for points within the 2D scanning field. The single 2D scan lines were combined automatically by the articulated measuring arm into a 3D point cloud. The measuring arm enables the acquisition of an almost occlusion-free point cloud by using many different points of view. The point cloud was processed using Geomagic Studio 12 (Raindrop Geomagic Inc, Morrisville, NC, USA).
The preprocessing of the point cloud is limited to cutting away scanned objects that do not belong to the object of interest. Furthermore, the point cloud density is reduced to a uniform grid with a point-to-point distance of 0.5 mm. This is necessary because the scanning method produces an inhomogeneous point resolution across the point cloud, depending on the speed at which the sensor is moved over the object.
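One simple way to approximate such a density reduction is to keep a single point per grid cell. The sketch below is not the procedure used in Geomagic Studio, which is not described here; the function name and the rule of keeping the first point per cell are assumptions, and the resulting spacing is only roughly the cell size.

```python
import math


def thin_to_grid(points, cell=0.5):
    """Keep the first point that falls into each cubic cell of edge length cell."""
    kept = {}
    for p in points:
        # Integer cell coordinates; floor also handles negative coordinates.
        key = tuple(math.floor(c / cell) for c in p)
        kept.setdefault(key, p)
    return list(kept.values())


cloud = [(0.1, 0.1, 0.1), (0.2, 0.2, 0.2), (0.6, 0.1, 0.1), (-0.1, 0.1, 0.1)]
thinned = thin_to_grid(cloud, cell=0.5)  # coordinates assumed to be in mm
print(thinned)
```

Densely sampled regions lose most of their points while sparsely sampled regions are left almost untouched, which evens out the resolution differences caused by varying sensor speed.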
Datasets

Grapevine (stem, leaves): The grapevine plants (Vitis vinifera ssp. vinifera, variety Mueller Thurgau) were grown in commercial substrate in plastic pots (\(\varnothing 170~mm\)) under greenhouse conditions. The plants were watered and fertilized on demand. Environmental parameters were kept constant at 23/20 °C (day/night), 60 % relative humidity and a photoperiod of 16 h. The measurement was done at growth stage 19 (according to BBCH, [35]). We had a total number of n=55635 calculated histograms, each with a length of m=125. For our evaluation we could make use of label information (stem and leaf), which was set manually by a human annotator.

Grapevine (berry, rachis): The second grapevine dataset (Vitis vinifera ssp. vinifera, variety Mueller Thurgau) included the berries and the rachis. The plant was grown in a vineyard at Geilweilerhof, Siebeldingen, Germany, in the summer of 2012. This point cloud consisted of a total number of n=57989 histograms. For this dataset no label information was available, because the segmentation is very hard even manually.

Wheat: The wheat plants (Triticum aestivum, variety Taifun) were grown in plastic pots (\(\varnothing 200~mm\)) under similar conditions as the grapevine plant. The measurement was done at growth stage BBCH 85. The dataset consisted of n=215090 histograms. For this dataset manually determined labels for histograms on the ear, stem and leaves were provided.

Barley: Additionally, we used three barley datasets (Hordeum vulgare L., cv. Barke). The plants were grown in plastic pots (\(\varnothing 16~cm\)) in a greenhouse under similar conditions as the grapevine plant. The measurements followed the same plant at different developmental stages (19, 26 and 31 days after sowing). The datasets consisted of a total number of n=15064 (plant 1, BBCH 12), n=41167 (plant 2, BBCH 21) and n=139465 (plant 3, BBCH 23) histograms. For each histogram the labels (leaf or stem) were provided and used for the evaluation.
All histogram calculations used fixed radii for the normal and histogram calculation r _{ N }=2.5 and r _{ H }=12.5 according to [8].
Results and discussion

KM: histogram clustering using k-means and the Euclidean distance on normalized histograms directly.

HC1: histogram clustering where we transformed the data using Eq. (9) before processing.

HC2: histogram clustering, where the data was transformed using clr approach as given by Eq. (11), before processing.
In this work we used a simple procedure for replacing the zero bins by adding a small value \(\epsilon =\frac {1}{m}\) to all data points, where m denotes the number of bins used for histogram computation, and normalized the data before computing the clr transformation. This led to similar or better clusterings compared to other settings in the range 10^{−16}≤ε≤10^{−1}. Note that the zero bins were replaced only for computing HC2, whereas for HC1 we used the original inputs directly.
With respect to applications within plant phenotyping, the required number of clusters is often known or given before or during the experiment, as one is looking for specific plant organs. As long as the aim is to separate leaves and stems, it is recommended to use two clusters, one for each organ. Using more clusters enables the recognition of further classes, such as inlaying berries or leaf border points, which have not been in focus before. However, in such cases determining the number of clusters automatically may be crucial; we leave this question for future work. For the sake of better visualization, the qualitative results in the following show only clusterings learned for a small number of clusters. All experiments were conducted on a standard computer with a 3.2 GHz Intel Core i7-3930K and 16 GB main memory.
Quantitative comparison of histogram clustering approaches
where \(n_{ij}\) denotes the number of histograms with label i in cluster j and \(n_{j}\) the total number of objects in cluster j. A lower entropy value stands for a better clustering, indicating that clusters contain mostly objects with similar labels.
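An entropy score of this kind can be computed from the counts alone. The sketch below follows a common definition from the clustering evaluation literature: the per-cluster entropies are averaged with weights n_j/n and divided by the logarithm of the number of labels, so the score lies in [0, 1]. This normalization, the function name and the toy data are assumptions and may differ from the exact formula used in this paper.

```python
import math
from collections import Counter


def clustering_entropy(labels, clusters):
    """Weighted average of per-cluster label entropies, scaled to [0, 1]."""
    n = len(labels)
    q = len(set(labels))                     # number of distinct labels
    norm = math.log(q) if q > 1 else 1.0     # assumed normalization
    total = 0.0
    for j in set(clusters):
        members = [l for l, c in zip(labels, clusters) if c == j]
        n_j = len(members)
        # n_ij: how many histograms with label i ended up in cluster j.
        h_j = -sum((n_ij / n_j) * math.log(n_ij / n_j)
                   for n_ij in Counter(members).values())
        total += (n_j / n) * h_j / norm
    return total


truth = ["leaf", "leaf", "stem", "stem"]
pure = clustering_entropy(truth, [0, 0, 1, 1])   # clusters match the labels
mixed = clustering_entropy(truth, [0, 1, 0, 1])  # every cluster is half and half
print(pure, mixed)
```

A clustering whose clusters each contain a single label scores zero, while clusters that mix the labels evenly score one under this normalization.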
The F-measure in Fig. 2 (top row) clearly shows that histogram clustering using data transformations outperforms the naive method on all datasets. The best results are achieved if the number of clusters is equal to the number of different labels, which is k=2 for the grapevine and barley datasets, and k=3 for the wheat dataset. Additionally, the middle row in Fig. 2 shows the entropy results. A lower value indicates that the clusters contain mostly histograms with a particular label. Here, histogram clustering as given by Algorithm 1 outperforms the direct application of k-means clustering for the grapevine and wheat datasets. For the barley datasets it is comparable to or better than k-means. The lower values for larger numbers of clusters indicate a better separation between leaves and stems for all methods. For the grapevine and wheat datasets the differences are small, which indicates that the separation is already good for a low number of clusters. For all datasets the algorithm required only a few minutes per run and number of clusters (k=2,…,8) to obtain the clustering, as shown in Fig. 2 (bottom row).
Automated identification of plant organs
In the results for the grapevine dataset it was possible to distinguish between the rachis and the parts containing the berries using the histogram clustering approaches HC1 and HC2 (Fig. 3). More interestingly, the clusterings can distinguish between the berry surface, where individual grapes are well captured by the 3D laser scans, and the inner parts of the fruit. However, using k-means directly does not capture this well, as shown in the first column of Fig. 3. It needed one more cluster (k=4) to separate berry and rachis parts, and also required one more cluster to describe the parts on the fruit, compared to the other methods. Interestingly, the clusters obtained for the barley dataset show a more accurate differentiation of the individual parts and are more coherent when using Algorithm 1 with data mappings (HC) than when running k-means (KM) directly. This is illustrated in the first column of Fig. 4, where large parts of the leaves are also assigned to the cluster containing the histograms from the stem. By contrast, using HC1 or HC2 leads to more clearly distinguished clusters, which can also facilitate further labeling of the data. However, when very large datasets of varying dimensionality need to be analyzed, finding a good choice of ε to replace the zero bins can be time consuming and tricky when using HC2 (clr approach). Therefore, the use of HC1 (SQr approach) may be an option, as it led to results of similar quality to those found by HC2. The results for the remaining datasets are shown in Additional files 2 and 3 and can be regarded as further support for the quantitative results discussed in the previous subsection.
In general, the results show that the time-consuming and costly work of manual labelling can be automated with high precision. Furthermore, clustering with an unspecified number of clusters makes regions of points with similar surface structure visible. This helps to gain a deeper knowledge of the structure of plants and organs, as it is now possible, e.g., to access transition regions between single organs. Moreover, by using unlabeled data we could show that our clustering enables an organ segmentation even when manual labelling is very hard or almost impossible. Interestingly, the clustering of the grapevine fruit enabled the segmentation of the inner skeleton, which is hard to access by the human eye.
Conclusions
Modern plant phenotyping with diverse sensors and exhaustive time series measurements of multiple replicates has created an increasing demand for task-oriented data analysis solutions. The present paper provided data-driven approaches for plant organ segmentation that make the use of standard algorithms, such as k-means with the Euclidean distance, possible. In fact, any data analysis method that builds on similarity or distance computations between surface feature histograms, acquired from 3D point clouds, is applicable. We achieved an automation of the data analysis pipeline and a reduction of the prior knowledge needed for the interpretation of plant surfaces. By clustering the histogram representation, different classes of the input point cloud could be identified and separated. Our approach shows that manual labeling can be automated. It can be used especially when manual labeling becomes extremely hard due to occlusion, or when labeling is only possible when viewing from a specific direction. Automated labeling allows the segmentation of unintuitive surface regions, which enables a more objective way of segmenting plant surfaces. Besides getting fast insights into the data, one may additionally use the result of automated clustering to subsequently support active learning approaches. Current state-of-the-art research in developing descriptors for 3D surfaces [40] suggests that our method can easily be transferred to various 3D descriptors such as Spin Images, Shape Context or Local Surface Patches. The presented data analysis pipeline will speed up the assessment of geometrical features in high-throughput plant phenotyping.
Declarations
Acknowledgements
This work was made possible by the financial support of the German Federal Ministry of Education and Research (BMBF) within the scope of the competitive grants program “Networks of excellence in agricultural and nutrition research - CROP.SENSe.net” (Funding code: 0315529).
Authors’ Affiliations
References
Fiorani F, Rascher U, Jahnke S, Schurr U. Imaging plants dynamics in heterogenic environments. Curr Opin Biotechnol. 2012; 23(2):227–35.
Sozzani R, Busch W, Spalding EP, Benfey PN. Advanced imaging techniques for the study of plant growth and development. Trends Plant Sci. 2014; 19(5):304–10.
Mahlein AK, Oerke EC, Steiner U, Dehne HW. Recent advances in sensing plant diseases. Eur J Plant Pathol. 2012; 133:197–209.
Berdugo CA, Zito R, Paulus S, Mahlein AK. Fusion of sensor data for the detection and differentiation of plant diseases in cucumber. Plant Pathol. 2014; 63(6):1344–56.
Furbank RT, Tester M. Phenomics–technologies to relieve the phenotyping bottleneck. Trends Plant Sci. 2011; 16(12):635–44. ISSN 1878-4372.
Wahabzada M, Mahlein AK, Bauckhage C, Steiner U, Oerke EC, Kersting K. Metro maps of plant disease dynamics: Automated mining of differences using hyperspectral images. PLoS ONE. 2015; 10(1):e0116902. doi:10.1371/journal.pone.0116902.
Kuska M, Wahabzada M, Leucker M, Dehne HW, Kersting K, Oerke EC, et al. Hyperspectral phenotyping on the microscopic scale: towards automated characterization of plant-pathogen interactions. Plant Methods. 2015; 11(1):28. ISSN 1746-4811, doi:10.1186/s13007-015-0073-7, http://www.plantmethods.com/content/11/1/28.
Paulus S, Dupuis J, Mahlein AK, Kuhlmann H. Surface feature based classification of plant organs from 3D laserscanned point clouds for plant phenotyping. BMC Bioinf. 2013; 14(1):238.
Frasson RPdM, Krajewski WF. Three-dimensional digital model of a maize plant. Agric Forest Meteorology. 2010; 150(3):478–88.
Paulus S, Schumann H, Leon J, Kuhlmann H. A high precision laser scanning system for capturing 3D plant architecture and analysing growth of cereal plants. Biosystems Engineering. 2014; 121:1–11.
Paproki A, Sirault X, Berry S, Furbank R, Fripp J. A novel mesh processing based technique for 3D plant analysis. BMC Plant Biol. 2012; 12(1):63.
Behley J, Kersting K, Schulz D, Steinhage V, Cremers AB. Learning to hash logistic regression for fast 3D scan point classification. In: Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Taipei, Taiwan: 2010. p. 5960–5.
Vapnik VN. Statistical Learning Theory. New York: Wiley; 1998. ISBN 0471030031, http://www.zentralblattmath.org/zmath/en/search/?an=0935.62007.
Aitchison J. The statistical analysis of compositional data. J R Stat Soc. 1982; 44(2):139–77.
Aitchison J. On criteria for measures of compositional difference. Math Geol. 1992; 24(4):365–79.
Aitchison J, Barcelo-Vidal C, Martin-Fernandez JA, Pawlowsky-Glahn V. Logratio analysis and compositional distance. Mathematical Geology. 2000; 32(3):271–5.
Chaudhuri K, McGregor A. Finding metric structure in information theoretic clustering. In: Proceedings of the Conference on Learning Theory (COLT), Helsinki, Finland: 2008. p. 391–402.
Thurau C, Kersting K, Wahabzada M, Bauckhage C. Convex non-negative matrix factorization for massive datasets. Knowledge Inf Syst. 2011; 29(2):457–78.
Thurau C, Kersting K, Wahabzada M, Bauckhage C. Descriptive matrix factorization for sustainability: Adopting the principle of opposites. J Data Min Knowledge Discovery. 2012; 24(2):325–54.
Kersting K, Wahabzada M, Thurau C, Bauckhage C. Hierarchical convex NMF for clustering massive data. In: Proceedings of the 2nd Asian Conference on Machine Learning (ACML), Tokyo, Japan, JMLR Workshop and Conference Proceedings, vol. 13. JMLR.org: 2010. p. 253–68.
Yan D, Huang L, Jordan MI. Fast approximate spectral clustering. In: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), Paris, France: 2009. p. 907–16.
Rusu RB, Holzbach A, Blodow N, Beetz M. Fast geometric point labeling using conditional random fields. In: Proceedings of the 22nd IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), St. Louis, MO, USA: Oct 2009. p. 7–12. ISBN 9781424438037.
Rusu RB, Marton ZC, Blodow N, Dolha M, Beetz M. Towards 3D point cloud based object maps for household environments. Robotics and Autonomous Systems. Nov 2008; 56(11):927–41. ISSN 0921-8890.
Dupuis J, Paulus S, Behmann J, Plümer L, Kuhlmann H. A multi-resolution approach for an automated fusion of different low-cost 3D sensors. Sensors. 2014; 14:7563–79.
Paulus S, Dupuis J, Riedel S, Kuhlmann H. Automated analysis of barley organs using 3D laser scanning - an approach for high throughput phenotyping. Sensors. 2014; 14(7):12670–86. doi:10.3390/s140712670.
Rusu RB, Blodow N, Beetz M. Fast point feature histograms (FPFH) for 3D registration. In: Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Kobe, Japan: May 2009. p. 3212–17. ISBN 9781424427888.
Bishop CM. Pattern Recognition and Machine Learning. New York: Springer; 2006. ISBN 0387310738.
Wahl E, Hillenbrand U, Hirzinger G. Surflet-pair-relation histograms: a statistical 3D-shape representation for rapid classification. In: Proceedings of Fourth International Conference on 3D Digital Imaging and Modeling (3DIM), Banff, Canada: 2003. p. 474–81.
Apache Software Foundation. Apache Mahout: Scalable machine-learning and data-mining library. http://mahout.apache.org.
Rusu RB, Marton ZC, Blodow N, Beetz M. Persistent point feature histograms for 3D point clouds. In: Proceedings of the 10th International Conference on Intelligent Autonomous Systems (IAS), Baden-Baden, Germany: 2008. p. 119–28.
Vajda I. On metric divergences of probability measures. Kybernetika. 2009; 45(6):885–900.
Egozcue JJ, Pawlowsky-Glahn V, Mateu-Figueras G, Barcelo-Vidal C. Isometric logratio transformations for compositional data analysis. Math Geol. 2003; 35(3):279–300.
Martín-Fernández JA, Barceló-Vidal C, Pawlowsky-Glahn V. Dealing with zeros and missing values in compositional data sets using nonparametric imputation. Math Geol. 2003; 35(3):253–78. ISSN 0882-8121, doi:http://dx.doi.org/10.1023/A:1023866030544.
Dasgupta S, Freund Y. Random projection trees and low dimensional manifolds. In: Ladner RE, Dwork C, editors. Proceedings of the Fortieth Annual ACM Symposium on Theory of Computing (STOC), Victoria, British Columbia, Canada: May 17–20 2008. p. 537–46.
Lorenz DH, Eichhorn KW, Bleiholder H, Klose R, Meier U, Weber E. Growth stages of the grapevine: Phenological growth stages of the grapevine (Vitis vinifera L. ssp. vinifera) - codes and descriptions according to the extended BBCH scale. Aust J Grape and Wine Res. 1995; 1(2):100–3.
Manning CD, Raghavan P, Schütze H. Introduction to Information Retrieval. New York, NY, USA: Cambridge University Press; 2009.
Amigó E, Gonzalo J, Artiles J, Verdejo F. A comparison of extrinsic clustering evaluation metrics based on formal constraints. Inf Retrieval. 2009; 12(4):461–86.
Zhao Y, Karypis G. Empirical and theoretical comparisons of selected criterion functions for document clustering. Machine Learning. 2004; 55(3):311–31. ISSN 0885-6125.
Nguyen HT, Smeulders AWM. Active learning using pre-clustering. In: Proceedings of International Conference on Machine Learning (ICML), Banff, Alberta, Canada: 2004. p. 79–86.
Guo Y, Bennamoun M, Sohel F, Lu M, Wan J, Kwok NM. A comprehensive performance evaluation of 3D local feature descriptors. Int J Comput Vision. 2015:1–24. ISSN 0920-5691, doi:http://dx.doi.org/10.1007/s11263-015-0824-y.
Copyright
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.