Fig. 5 | BMC Bioinformatics

Fig. 5

From: Visually guided classification trees for analyzing chronic patients

Visually guided classification tree. Example of the design of a classification tree guided by visual LDA plots. Each node shows the total number of samples contained in its associated region and, in square brackets, the number of samples per class [CRG-5192, CRG-6144, CRG-7071]. Initially the dataset is balanced (there is the same number of samples per CRG). Guided by the LDA plot above it (also shown in Fig. 4), the root node splits the dataset in two according to the presence or absence of the diagnosis code ‘250’. Most of the samples in the left branch belong to CRG-5192, so this node can be a leaf node representing that class if the clinician considers it appropriate. In the right branch, a new LDA plot is computed with the data subset for which the code ‘250’ = 1 (i.e., is present). Here, the drug code ‘N05AL’ is the feature that best separates samples from CRG-6144 and CRG-7071, so we consider a new split based on this feature. Its right branch (presence) contains only CRG-7071 samples and therefore constitutes a leaf node. Subsequently, we generate a new LDA plot with the remaining data, for which ‘N05AL’ is absent. In this plot, the diagnosis code ‘496’ has the longest axis vector pointing in the direction that best helps to separate the classes, and it is selected to split the corresponding region. Since most of the samples in the right branch belong to CRG-7071, we consider that it should be a leaf node. In the left branch, this process could be performed recursively until each node is defined as a leaf. Here, since most of the samples belong to CRG-6144, the process could be halted, creating the leaf node.
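The three splits described in the caption can be written out as a small rule-based classifier. The following Python snippet is only a minimal sketch of the final tree, not the authors' implementation: the function name `classify` and the representation of a sample as a dictionary of binary code indicators are assumptions made for illustration, and the LDA plots that guided the choice of each split are not reproduced.

```python
def classify(sample):
    """Assign a CRG label using the three splits described in the caption.

    `sample` is a hypothetical representation: a dict mapping a diagnosis or
    drug code to 1 (present) or 0 (absent). Missing codes count as absent.
    """
    # Root split: diagnosis code '250'. Absence leads to the left leaf,
    # where most samples belong to CRG-5192.
    if not sample.get("250", 0):
        return "CRG-5192"

    # Second split: drug code 'N05AL'. Presence leads to a leaf that
    # contains only CRG-7071 samples.
    if sample.get("N05AL", 0):
        return "CRG-7071"

    # Third split: diagnosis code '496'. Presence leads to a leaf where
    # most samples belong to CRG-7071.
    if sample.get("496", 0):
        return "CRG-7071"

    # Remaining region: most samples belong to CRG-6144.
    return "CRG-6144"


if __name__ == "__main__":
    # Illustrative inputs only; these are not samples from the study.
    print(classify({"250": 0}))
    print(classify({"250": 1, "N05AL": 1}))
    print(classify({"250": 1, "N05AL": 0, "496": 1}))
    print(classify({"250": 1, "N05AL": 0, "496": 0}))
```

Because each leaf is labeled by its majority class, the sketch assigns every sample in a region to that class, including the minority samples the caption notes remain in some leaves.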