Volume 14 Supplement 1
Computational Intelligence in Bioinformatics and Biostatistics: new trends from the CIBB conference series
An extensible sixstep methodology to automatically generate fuzzy DSSs for diagnostic applications
 Antonio d'Acierno^{1}Email author,
 Massimo Esposito^{2} and
 Giuseppe De Pietro^{2}
DOI: 10.1186/1471210514S1S4
© d'Acierno et al.; licensee BioMed Central Ltd. 2013
Published: 14 January 2013
Abstract
Background
The diagnosis of many diseases can be often formulated as a decision problem; uncertainty affects these problems so that many computerized Diagnostic Decision Support Systems (in the following, DDSSs) have been developed to aid the physician in interpreting clinical data and thus to improve the quality of the whole process. Fuzzy logic, a well established attempt at the formalization and mechanization of human capabilities in reasoning and deciding with noisy information, can be profitably used. Recently, we informally proposed a general methodology to automatically build DDSSs on the top of fuzzy knowledge extracted from data.
Methods
We carefully refine and formalize our methodology that includes six stages, where the first three stages work with crisp rules, whereas the last three ones are employed on fuzzy models. Its strength relies on its generality and modularity since it supports the integration of alternative techniques in each of its stages.
Results
The methodology is designed and implemented in the form of a modular and portable software architecture according to a componentbased approach. The architecture is deeply described and a summary inspection of the main components in terms of UML diagrams is outlined as well. A first implementation of the architecture has been then realized in Java following the objectoriented paradigm and used to instantiate a DDSS example aimed at accurately diagnosing breast masses as a proof of concept.
Conclusions
The results prove the feasibility of the whole methodology implemented in terms of the architecture proposed.
Background
Making an effective and efficient medical diagnosis is pivotal in clinical daily practice, clearly because of the impact of this singular decision making process in the eventual illness trajectory and disease management. For such a reason, the optimization of the diagnostic process in terms of number and duration of patient examinations, with corresponding accuracy, sensitivity, and specificity, is known to reduce morbidity and mortality rates, control costs and improve both doctorpatient and communityfacility relationships [1].
The task of medical diagnosis, like almost any other diagnostic process, is made more complex to obtain even for a medical expert because of a web of relevant uncertainties, in the form of information incompleteness, impreciseness, fragmentariness, not fully reliability, vagueness and contradictoriness [2]. Specifically, patients may not be able to describe exactly the natural history of their disease in terms of what has happened to them or how they feel; doctors and health care practitioners may not understand or interpret exactly what they hear or observe; the accuracy of available laboratory reports, which may come with some degree of error; and the effects of treatment in an individual patient or in a group or population as a whole in terms of how diseases alter the normal functioning of the body [3]. The need to identify the most accurate medical diagnosis in a very timely manner increases dramatically in the case of mortal diseases, as both the rapid and accurate diagnosis and prompt initiation of treatment are recognized as necessary conditions to limit further complications, cut costs and reduce human suffering.
In order to improve the possibility of early and accurate diagnosis of illness, there is thus the need for the application of Diagnostic Decision Support Systems (in the following, DDSSs) in the process, because these are known to improve practitioner's performance, reduce costs and improve patient outcomes [4].
The first DDSSs proposed in literature relied on crisp models based on thresholding for solving medical classification problems. Nevertheless, they neither take into account the fuzziness of input data nor reproduce the expert decisionmaking process applied in a vagueladen domain such as medicine. As a matter of fact, the decisionmaking model every trustworthy physician has in mind to perform heuristic diagnosis is often pervaded by uncertainty and vagueness.
Recently, DDSSs based on a multivalued logic and, in particular, on Fuzzy Logic, have been applied to medical classification problems (see, for example, [5], [6] and [7]) demonstrating their capability both to overcome the problem of managing imprecise and uncertain knowledge and offer a support for the medical decision making process. Moreover, Fuzzy Logic is quite close to natural language and allows presenting the results to clinicians in a more natural form. This makes fuzzy based DDSSs more acceptable to human users than black box systems, because both the semantic expressiveness and reasoning (by using linguistic variables and rules) are comprehensible and may be validated by human inspection.
Crisp and fuzzy modeling
The first DDSSs were mainly based on Crisp Logic, which provides an inference morphology for drawing conclusions from existing neat and clearcut information: specifically, new truths can be inferred from old ones. In more detail, it relies on rules, defined as conditional statements written in the following form: if crispAntecedent then crispConsequent where crispAntecedent is a crisplogic expression made of one or more simple predicates linked by logical operators and depending on input crisp variables, and crispConsequent is an expression of output variables which are assigned crisp values. A crisp variable C (also named feature) is characterized by a set of crisp values c it can assume, whereas a predicate P classifies the values belonging to a variable into two groups or categories, i.e. values that make it either true or false, respectively. In other words, the set defined by P(c) is written as: {cP(c)}, and is just a collection of all the values for which P is true. For instance, {cc is a positive integer less than 3} is the set {1,2}.
Crisp modeling is the task of determining the parameters characterizing a rule base, classified into the followings:

Structural parameters. Related mainly with the size of the rule base, they include the number of variables involved in the rules and the number of rules.

Connective parameters. Related with the topology of the rule base, these parameters include antecedents, consequents, and weights of the rules.
Unlike Crisp Logic, Fuzzy Logic resembles human reasoning in its use of vague information to generate decisions [8], where vague predicates are used and values belonging to a variable cannot be classified into two groups (either true or false). In this sense, Fuzzy Logic incorporates an alternative way of reasoning, which allows modeling complex systems using a higher level of abstraction originating from knowledge and experience [9].
The fuzzy inference morphology relies on a Fuzzy Inference System (in the following, FIS) based on if fuzzyAntecedent then fuzzyConsequent rules, where fuzzyAntecedent is a fuzzylogic expression made of one or more simple fuzzy expressions linked by fuzzy operators and depending on input fuzzy variables, and fuzzyConsequent is an expression of the output variables which are assigned fuzzy terms.
Essentially, fuzzy reasoning is made of four steps, namely fuzzification of input variables, rule evaluation, aggregation of rule outputs and, finally, defuzzification. Fuzzification of input variables converts crisp (realvalued) inputs into fuzzy terms. For each fuzzy rule, Rule evaluation applies such fuzzified inputs to its antecedents, making use of a fuzzy operator in case of multiple antecedents, with the final aim of generating a single value indicating its degree of activation. Such a value is, then, used in the implication to infer the conclusion of the rule. Aggregation combines the membership functions of all rule consequents previously evaluated in order to generate a single fuzzy set as output. Finally, defuzzification determines the best representative crisp value of this aggregated output fuzzy set.
Fuzzy modeling is the task of determining the parameters of a FIS, classified into the followings [10]:

Logical parameters. They include the shape of the membership functions, the fuzzy logic operators applied for AND, OR, implication, and aggregation operations, and the defuzzification method.

Structural parameters. Related mainly with the size of the fuzzy system, they include the number of variables involved in the rules, the number of membership functions defined for each linguistic variable, and the number of rules.

Connective parameters. Related with the topology of the system, these parameters include antecedents, consequents, and weights of the rules.

Operational parameters. These parameters define the mapping between linguistic and numeric representations of the variables, so characterizing the membership functions of the linguistic variables.
The most usual, and cheapest, way for modeling medical knowledge in fuzzybased DDSSs is asking an expert to write ifthen rules. Moreover, after formalizing the expert's knowledge under the form of rules, the designer and the expert have to choose the shape and location of membership functions for all the linguistic values related to all the variables involved. This step, sure enough, requires both medical expertise and technical intervention along with great effort to identify which among the design choices are suited to the given problem. Alternatively, an emerging solution is represented by data driven fuzzy modeling, that is being widely adopted in different application domains to automatically generate a rule base from data, even if the interpretability is not guaranteed in many situations and redundancy can occur in the rules produced.
According to [11] a fuzzy model is interpretable when: (i) the fuzzy terms associated to a variable (usually indicated as fuzzy partition) are interpretable as linguistic labels, (ii) the rule base is as small as possible, and (iii) the ifpart of each rule does not includes all the independent variables but only a subset of them.
State of the art
A number of fuzzybased DDSSs has faced up these challenges, and has attempted to address the subjects of knowledge acquisition, representation, and utilization in medical diagnosis.
In [12], a rulebase selfextraction and simplification method is proposed, devised to establish interpretable fuzzy models from numerical data. A fuzzy clustering technique associated with a fuzzy partition validity index is used to extract the initial fuzzy rulebase and find out the optimal number of fuzzy rules. To reduce the complexity of fuzzy models while keeping good model accuracy, some approximate similarity measures are presented and a parameter finetuning mechanism is introduced to improve the accuracy of the simplified model. Experimental results are reported with respect to different case studies, such as function approximation, dynamical system identification and mechanical property prediction for hot rolled steels. These testcases are characterized by a relatively small number of inputspace variables. No experimental test has been reported for problems characterized by a high number of inputspace variables.
In [13], an evolving hierarchical fuzzy system based on probabilistic incremental program evolution is presented. The use of hierarchical fuzzy systems allows to limit both the number of rules and the number of fuzzy operations with respect to single level systems. Worthy results are described for case studies concerning non linear system identification, such as MakeyGlass chaotic time series prediction, and the Iris and Wine classification.
In [14] a datadriven innovative approach is presented for generating a fuzzy rule based decision support system for the diagnosis of coronary artery disease. The implemented methodology relies on four stages.
In the first stage, a decision tree is induced from the dataset, while in the second stage, a set of rules is extracted from it. This set of rules is in Disjunctive Normal Form (DNF) and involves crisp variables modeling neat and clearcut quantities. It can be used as a whole to classify new incoming data coherently with the knowledge embedded into the initial dataset. In the third stage, the crisp model is fuzzified, i.e., the crisp rules are transformed into fuzzy ones, using a fuzzy membership function instead of the crisp one and definitions of S and T norms. Finally, in the fourth stage, the parameters entering the fuzzy model are adapted using a global optimization technique.
In [15] a generic methodology is presented for the automated generation of fuzzy models. The methodology is realized in three stages. Initially, a crisp model is created whereas, in the second stage it is transformed into a fuzzy one. In the third stage, all parameters entering the fuzzy model are optimized. A specific realization of this methodology is implemented, using decision trees for the creation of the crisp model, the sigmoid function, the minmax operators and the maximum defuzzifier for the transformation of the crisp model into a fuzzy one, and four different optimization strategies, including global and local optimization techniques, as well as hybrid approaches.
In [16] a generic approach to the design of interpretable datadriven fuzzy models, which can be used in the construction of DDSSs, is proposed. The approach addresses several design steps, including fuzzy partitioning, rule learning, variable selection and rule base simplification. The fuzzy partitioning step consists in generating a collection of fuzzy partitions of various sizes from two to a userdefined maximum value, based upon indices or an objective function. The rule learning step includes two categories of methods, namely region based methods and prototype based ones. The rule base simplification merges some rules into a more generic incomplete rule, where some variables (one or more) appear in some rules. The variable selection determines the number of terms for a given variable necessary to get a good rule base, in terms of tradeoff between its complexity and accuracy, measured by performance indexes.
The contribution of the work
Recently, we informally introduced [17, 18] a methodology to design and develop a fuzzybased DDSS for medical classification problems by extracting fuzzy knowledge from data. In this work, we first propose a formalization of a refined and assessed version of such a methodology, which essentially specifies a flow of stages needed to develop a fuzzybased DDSS as well as the characteristics of the input and output produced and consumed in the different stages. As a result, it formalizes the role assumed by each stage in terms of its interface, but it does not provide any indication about how the single activities have to be done in the form of strategies to be adopted or algorithms to be applied, especially because this choice is strictly linked to the specific application domain. This issue is further motivated by the fact that the methodology, whose strength relies on its generality and modularity, has been thought as a basis for the development process of fuzzybased DDSSs by supporting the integration of alternative techniques in each of its stages. Both the generality and flexibility make it applicable to almost any medical classification domain and, also, enable the possibility to test the efficiency of different methods in order to detect their best integration with respect to specific classes of problems.
The methodology has been realized in the form of a modular and portable architecture according to a ComponentBased Software Development (in the following, CBSD), with the aim of defining a collection of components customizable or extensible by existing available solutions that are compatible to the original placeholders.
The architecture has been developed in Java according to the objectoriented paradigm in order to create a truly portable DDSS, solving the problem of having parts of it implemented for different platforms. The resulting architecture can be considered as wellsuited for almost any medical domain where the real world is simulated in a broad sense and a diagnosis in terms of classification is required.
As a proof of concept, such an architecture has been used to instantiate a DDSS example aimed at accurately diagnosing breast masses starting from the widely used Wisconsin Breast Cancer Dataset (in the following, WBCD) to evaluate the feasibility of the methodology.
The manuscript is structured as follows: in Methods, the proposed methodology is formally described and the choices and techniques which are identified for the specific realization are analyzed. Furthermore, the design approach used for the development of the proposed architecture is introduced. In Results, the architecture designed is diffusely described and a summary inspection of the main components is reported as well. Moreover, the proof of concept DDSS for validating the methodology is introduced and its results are discussed. Finally, conclusions and ongoing activities are outlined in the last section as closure to the paper.
Methods
Our methodology includes six stages, where the first three stages work with crisp models, whereas the last three ones are employed on fuzzy models (Figure 2).
Rule extraction
The Rule Extraction stage is essentially devised to the extraction, from a specific input dataset, of a collection of ifthen rules constituting the crisp rule base (in the following, CRB), specifically represented in a weighted Disjunctive Normal Form (wDNF). We work with this representation because of its high degree of compactness and knowledge synthesis. The CRB, thus, is a disjunctive system of crisp rules, where at most one rule must be satisfied by an item of the initial dataset, i.e. the rules are linked by mutually exclusive or connectives. More formally, each crisp rule ${r}_{ij}^{c}$ (note that the superscript c is used to label a rule as crisp) in the CRB denotes the i^{ th } rule which predicts the j^{ th } class, with j = 1... M and i = 1... N_{ j }, where M is the total number of classes and N_{ j } is the total number of rules that predict the j^{ th } class, respectively. As a result, in general, there could be one or more than one rule for each class predicted, while each class is likely to be covered by at least one rule in the CRB.
where ${x}_{ijk}^{c}$ is the specific feature selected from the set X^{ c }, $o{p}_{ijk}^{c}$ is a comparison operator selected from the sets {=, ≠} and {<, >, ≤, ≥} in the cases of categorical and numerical features, respectively, and ${v}_{ijk}^{c}$ represents a categorical value or a crisp numerical threshold.
This DNF is labeled as weighted since each rule is associated with a degree of relevance, such as its coverage or accuracy, with respect to its predicted class depending on the domain of application and the specific dataset concerning this domain.
Different solutions can be adopted to extract rules, ranging from purely logical approaches to statisticallybased ones or relying on artificial neural networks, genetic algorithms and on nonconnectionist machine learning (e.g. decision trees) [19]. Independently from the specific method used, it is relevant to point out that it is possible to extract rules able to correctly classify an item in the dataset from its known features, i.e. every item in the input dataset is covered by exactly one rule in the CRB, but without avoiding the possibility of overfitting the input dataset which can be characterized by a degree of uncertainty. This uncertainty may arise from two different sources. The first is mismeasurement, i.e., for a variety of reasons, the value of a feature or class may be incorrectly measured. The second source of uncertainty is the occurrence of extraneous factors not recorded, but affecting the results so that the class of an item in the dataset cannot be determined wholly from its recorded features. The resulting CRB extracted in these situations tends to be very large and many rules reflect particular items in the training dataset which are very unlikely to occur in further examples, i.e. they cover a very small part of the input space, are matched only by a few examples, lack generality and can become counterproductive. This issue represents the motivation for the second stage, i.e. Selection.
Selection
The Selection stage is in charge of determining the sufficient number of rules, as are necessary to get a good CRB with respect to the specific dataset concerning the domain under observation, where a good CRB represents a reasonable tradeoff between complexity, determined by the number of rules, and accuracy, measured by appropriate performance indexes. The selection is done with the idea of granting two main factors emerged as primary determinants of interpretability. First, the number of rules should be small so involving that a full set of complete rules should be avoided since it can quickly lead to a combinatorial explosion when the number of features rises [16]. The second condition is strictly linked to the first one and is specific to complex systems with a large number of features: rules must not systematically include all the features, but only the important ones in the context of the rule, so generating the often called incomplete rules [16].
Different methods can be applied to perform the rule selection and, thus, implicitly also the variable selection, each of them exploiting adhoc heuristics guided by userdefined indicators (such as the final number of rules to be chosen, from a minimum of one for each predicted class to a maximum which can be properly indicated by the user) and suitable with respect to the different applications and their specific requirements. For example, the choice of the good CRB can be piloted by a longestmatch criterion, i.e. depending on the longest lefthandside (LHS) of the rules that match an item of the dataset. Its rationale is based on the conclusion that longer antecedent part will contain more accurate and richer information for the final classification than the shorter ones. Differently, a most confident selection could be adopted, by identifying the rules with the highest confidence as the best one, where the longer rule is chosen in case of a tie. The rationale of this criterion is based on the assumption that the testing dataset will share the same characteristics as the training dataset used in the Rule Extraction. So if a rule has a high confidence in the training data, then this rule will also show a high confidence in the testing data, which means the class predicted by that rule will be most likely to occur next.
Summarizing, this stage has a double relevance in terms of the functionalities offered: first, it performs the rule selection to achieve a smaller number of more general rules with the idea that it may have greater predictive accuracy on unseen data, at the expense of no longer correctly classifying some of the items in the original training dataset; then, it implicitly carries out also a feature selection leading to an incomplete rule base which takes into consideration only those features that are really required since included into the significant rules previously selected.
It is worth noting that the elimination of features in order to obtain incomplete rules could be undertaken at the extraction level since the Rule Extraction step can remove features from the whole rule, for example using statistical indexes. Differently, this stage is intended to select only a subset of the features instead of the whole set of candidate ones for other reasons: i) it can be cheaper to measure only a reduced set of features; ii) prediction accuracy may be improved through exclusion of poorly significant features; iii) the final DDSS to be built might be simpler and potentially faster when fewer input variables are used; iv) knowing which features are more relevant can give insight into the nature of the classification problem and allow a better understanding of the final DDSS. At this point, in the context of the same rule r_{ ij } given as output after the Selection, with respect to numerical features, different predicates can contain the same feature selected from the set X^{ c }, i.e. ${x}_{ijl}^{c}={x}_{ijk}^{c}$ with l, k ∈ [1, ..., K_{ ij }] and l ≠ k. This consideration represents the motivation for the third stage, Reduction,
Reduction
After applying this simplification procedure to every couple of candidate predicates, each resulting predicate will be formulated according to one of the forms defined in equations 46. A clarifying note has to be reported about the categorical features. As they assume mutual exclusive values and, in each rule, the antecedent predicates are admitted to be connected only by means of conjunction operators, it is not possible (since meaningless) at all that two different predicates might contain the same feature assuming different values in the context of the same rule. Thus, the Reduction does not involve the categorical features in its simplification procedure.
Fuzzification
After the first three stages, a crisp model made of rules based on clearcut boundaries has been generated in accordance with the fuzzifiability property. Successively, starting from the crisp model produced, the generation of the fuzzy model begins with the fourth stage, named Fuzzification.
In more detail, in order to create fuzzy variables and terms, in this stage, crisp rules are translated into a corresponding fuzzy version, where every feature contained in the crisp rules is associated to a linguistic variable. It is relevant to mention that only features determined relevant to the classification by means of the previous stages are fuzzified. Of course, also the predicted class is associated to a linguistic variable. Successively, each linguistic variable is further characterized by a set of terms subjectively describing it. With respect to this issue, the degree of detail to be used in partitioning the universe of discourse of each variable, i.e. number of linguistic terms to be referred to it, has been chosen in accordance with the crisp model. Summarizing, the fuzzy rule base (in the following, FRB) achieved in this stage continues to be a disjunctive system of rules. Nevertheless, differently from the crisp case, where exactly one rule must be satisfied, the fuzzy rules are linked by simple OR connectives, where at least one rule must be satisfied, i.e. one or more rules may be weakly or strongly activated simultaneously.
where ${v}_{ijk}^{f}$ represents a (fuzzified) numerical interval in the case ${x}_{ijk}^{f}$ is a fuzzy variable associated to a numerical feature or a (fuzzified) categorical value in the case ${x}_{ijk}^{f}$ is fuzzy variable associated to a categorical feature. Finally, also the FRB is weighted since each fuzzy rule is associated with the same degree of relevance pertaining the crisp rule it has been generated from.
Different methods can be applied to fuzzify crisp rules [14], [15], [20], [21], ranging from solutions which, on the one hand, exploit the symbolic structure of the crisp rules to generate fuzzy variables and terms to be inserted into the predicates of the corresponding fuzzy rules and, on the other hand, soften the sharp crisp thresholds to minimise continuous terms close to the decision boundaries from misclassification, to approaches where the crisp rule structure is only used to define fuzzy variables and terms and the sharp thresholds are not taken into account at all into the fuzzification process.
FIS configuration
Connective and structural parameters of the FIS (to be generated) have been thus defined in the previous stage. Differently, in this stage, the most appropriate logical parameters to be used in the FIS have to be determined by the designer based on experience and depending on the domain characteristics. Typical choices for the reasoning mechanism are Mamdanitype, TakagiSugenoKang (TKS)type, and Singletontype. Common fuzzy operators are min, max, product, probabilistic sum, and bounded sum. The most common membership functions are triangular, trapezoidal, gaussian and bellshaped. For defuzzification several methods have been proposed with the center of area (COA) and the mean of maxima (MOM) methods being the most popular. Moreover, depending on the typology of reasoning mechanism desired, different Inference Engines can be used, for instance, for supporting the rule chaining [22] or operating in accordance with an IfThenElse rule structure [23].
FIS adaptation
After determining the most appropriate logical parameters, in order to complete the generation of the FIS underpinning the DDSS, the operational parameters have to be identified in the last stage, named FIS Adaptation, in terms of parameters characterizing shape and location of membership functions for all the terms related to all the linguistic variables involved in the FRB.
For what concerns linguistic variables linked to categorical features, the determination of shape and location of their membership functions is very simple. Since a categorical feature is one that has two or more categories, but there is no intrinsic ordering to them, they can be modeled as independent singular singletons. Referring to linguistic variables linked to numeric features, a tuning process considering the whole FRB obtained has to be used a posteriori to adjust the membership function parameters. The classic way to refine the membership functions is to adjust through slight modifications their parameters in order to find the local or global minimum of a mono/multi objective function F(x) opportunely defined, which takes into account specific indexes modeling at least three characteristics [24] a DDSS should possess.
First, the performance of a DDSS in performing a diagnosis can be evaluated with reference to the correct classification rate (CR), even if the system should jointly provide also a numerical value (the confidence χ) indicating its confident in the outcome produced. Furthermore, a DDSS should provide the physicians with the possibility of deeply understanding how this outcome has been generated (interpretability), in order to increase its trustworthiness and not to appear as a black box that produces unintelligible outputs. It is worth noting that these characteristics can often result conflicting.
This tuning process can be implemented using parameter adjustment algorithms, such as Neural Networks algorithms, and in most cases, the gradient of a cost function with respect to each adjustable parameter can be calculated and the parameters can be updated accordingly. There are also some derivativefree optimization such as Genetic Algorithms and Random Search Methods. As a concluding remark, it is worth noting that in this stage not only the operational parameters can be adapted, but also the weights of each rule in the FRB, which have been previously extracted in the first stage and associated to the fuzzy rules in the fourth stage, can be refined. Such a way, it is possible to induce a better cooperation among rules and to more accurately modulate the firing strength of a rule in the process of determining the output class.
Implementing the methodology
The proposed methodology has been realized in the form of a modular and portable architecture according to a CBSD approach, with the aim of defining a collection of replaceable (and reusable) components characterized by a functional cohesion (i.e. the single component performs a welldefined set of functions) and a low degree of coupling in terms of composition and interaction between them.
The choice of a CBSD approach is based on the idea that each single component can be not only implemented from scratch but, in particular, also customized or extended by existing available solutions that are compatible to the original placeholders. Such a way, the CBSD can significantly reduce development effort and timetomarket, and improve maintainability, reliability and overall quality of final architecture designed. The architecture is developed in Java according to the objectoriented paradigm in order to create a truly portable DDSS, solving the problem of having parts of it implemented for different platforms.
In the following section, the architecture designed will be diffusely described and a summary inspection of the main components in terms of UML class diagrams will be reported as well.
Results
The proposed architecture
The architecture provides for three different layers: a Process Layer, which coordinates the activity flow foreseen by the methodology in terms of two loosely coupled subprocesses, a Method Layer which handles highly cohesive and welldefined operations (from a functional perspective) to be done in each subprocess, and a Model&Data Layer which manages the data structures used to read and store crisp and fuzzy knowledge bases as well as data repositories. Moreover, in accordance with the CBSD approach, the architecture highlights a transversal layer, shared between the Method Layer and the Model&Data Layer, which reflects the idea that each single component at these two layers can be implemented not only from scratch but also by reusing or customizing existing available solutions, such as data mining software packages (e. g. WEKA [25]) or fuzzy logic libraries (e. g. jFuzzyLogic [26] or XFuzzy [27]).
The definition of the components for the process layer has been guided by a topdown approach, where the methodology, which can be seen as a high level business process, has been divided into two smaller, more manageable and loosely coupled subprocesses. The need for a process modularization has been pushed by different motivations. First, the proposed methodology is large and complex, and it becomes very difficult to navigate, understand, debug issues and track changes in its implementation. Moreover, referring to the stages of the methodology, the clear and welldefined separation between the crisp and fuzzy domains suggests a natural strategy of modularization from a logical and functional perspective. As a result, the methodology has been modularized in two subprocesses, one for each domain, so as to contextually balance the need of performance and manageability versus the need of reuse as well. The first subprocess, handled by the Crisp Rule Generator, is devised to generate a set of fuzzifiable crisp rules starting from an input dataset, whereas the second one, managed by the FIS Generator, is thought to produce a whole FIS starting from the output of the first subprocess. Both these components assume the role of coordinators with respect to the activity flow foreseen by each single subprocess. They coordinate all the components of the method layer by activating their functionalities, examining and validating their results, and continuing the handling of the respective subprocess accordingly.
The Method Layer contains the building block components for implementing each single task foreseen in both the crisp and fuzzy subprocesses. These components are implemented on top of the Model&Data Layer in terms of a collection of modules which accesses the respective knowledge bases and data repositories, elaborates such an information and stores the results into the knowledge bases again. Moreover, all these components can be opportunely specialized in order to support different criteria, also by wrapping or utilizing existing tools.
The Crisp Rule Selector (Figure 5, right) is responsible of getting the extracted rules from the Crisp Knowledge Base, determining the best n rules (n is a userdefined parameter) to be successively used with respect to a specific criterion and finally storing such a new rule set into the Crisp Knowledge Base again. Since the criterion applied for the selection might require the identification of the set of rules with the highest confidence, such a component can interact with the Crisp Inference Engine to evaluate how each rule works on the dataset gathered from the Data Repository.
Last, the Crisp Reductor (Figure 5, center) is devised to get the selected rules from the Crisp Knowledge Base, simplify their structure in order to grant the fuzzifiability and finally store the reduced rules into the Crisp Knowledge Base again.
The components of the method layer described until this point cooperate among them to model the first subprocess, and, thus, they are all coordinated by the Crisp Rule Generator. The remaining components belonging to the method layer act together for bringing in realization the second subprocess, and, thus, they are all coordinated by the FIS Generator.
Referring to the fifth stage of the methodology, the component involved is the Fuzzy Configurer (see Figure 6, center), which is responsible of setting the most appropriate logical parameters to be used for the construction of the final FIS, opportunely determined by the designer based on experience and depending on the domain characteristics. Moreover, depending on the typology of reasoning mechanism desired, the specific Fuzzy Inference Engine, aimed at performing a fuzzy inference to obtain a fuzzy output, is also configured and instantiated.
The component engaged for the realization of the last stage of the methodology is the Fuzzy Adapter (Figure 6, right), which is mainly devised to first get the partially defined FIS from the Fuzzy Knowledge Base, successively tune its logical parameters, in terms of parameters characterizing shape and location of membership functions, and rule weights, and finally store the completely defined FIS into the Fuzzy Knowledge Base again. Such a component strictly interacts with the Fuzzy Inference Engine since every optimization strategy requires the evaluation about how each fuzzy rule works on the dataset gathered from the Data Repository.
Finally, the Model&Data Layer is responsible for the management of the structures for inserting and gathering information, respectively into and from both the Crisp and Fuzzy Knowledge Bases and, in addition, for accessing the diskbased data structures used by the data repositories. In particular, with respect to the data format in the repositories, commaseparated values (CSV) are used to store tabular data (numbers and text) in a plaintext form. In particular, the first row contains the attribute names (separated by commas) followed by each data row with attribute values listed in the same order (also separated by commas). This choice is due to the fact that many data repositories or spreadsheet applications save or export data into flat files in this format.
Proof of concept: a DDSS for diagnosing breast masses
The architecture has been used to instantiate a DDSS example aimed at accurately diagnosing breast masses starting from the widely used Wisconsin Breast Cancer Dataset (in the following, WBCD) to evaluate the feasibility of the methodology.
This dataset was computed from fine needle aspiration (FNA) of a breast mass through image processing and was collected at the University of Wisconsin; it can be obtained from UCI (University of California at Irvine) machine learning repository. The samples contain features describing characteristics of the cell nuclei present in the image. The version of WBCD used consists of 10 features obtained from FNA, namely radius, texture, perimeter, area, smoothness, compactness, concavity, concave points, symmetry, fractal dimension. Each feature is represented with 3 values, namely the mean, standard error and the worst or largest (mean of the three largest values), but only the mean value was taken into account. The two outputs are benign and malignant. All the instances were properly recorded without any missing value. The diagnosis class is distributed with 357 benign samples and 212 malignant samples.
The architecture was instantiated for creating the DDSS as described in the following. Preliminary, it is worth noting that the tenfold cross validation method was used for the assessment of such a DDSS and the classification rate was chosen as metric to evaluate the goodness of the final results achieved. Such a way, the whole methodology was tested for its validation, since the goodness of the final results was considered as proof of feasible and efficient integration of different methods according to its activity flow in order to obtain fuzzybased DDSSs. As a result, the DDSS is described below with respect to each stage of the methodology, by reporting the partial results calculated only for a fold for the sake of brevity.
where ${t}_{I}^{f},\phantom{\rule{2.77695pt}{0ex}}{t}_{II}^{f}$ and ${t}_{III}^{f}$ correspond to linguistic terms (e.g. low, medium and high) linearly ordered and with boundary regions between successive terms. It is relevant to highlight that the semantics associated to each linguistic term obtained in such a way is strictly applicationdependent and is meaningful only in the context of the rule for which it has been defined. As a result, this method defines as many terms for each linguistic variable as the different crisp predicates which use the corresponding feature when examining rule by rule. Moreover, in such a way, the crisp boundaries expressed in each predicate for the numeric features involved are discharged. Thus, on the one hand, the same fuzzy predicate can be generated from two different instances of a specific form of crisp predicate appearing in two different rules. On the other hand, by working at rule level, even if, in more than one rule, many crisp predicates share a same numeric feature with different crisp boundaries, its universe of discourse is not further partitioned in order to consider the possible subpartitions identified by the overlapping of different crisp intervals. So, any relationship that could exist between the rules in terms of shared features is not taken into account. Both these methodological choices, which can appear as a simplification and a loss of generality, are guided by the consideration that defining a justifiable, i.e. limited, number of terms for each variable is another determinant of interpretability [16]. Indeed, taking into account all the possible overlapping of different crisp predicates in terms of numeric intervals, by observing the whole CRB, in order to deeply partitioning a shared feature, would generate an overfitting of its universe of discourse, compromising the overall interpretability.
Furthermore, in the case of categorical features, the method generates a collection of as many terms for each linguistic variable as the crisp predicates occurring in the whole rule set which assign a different categorical value to the feature used.
For what concerns the predicted class, since it is also a categorical variable, the number of terms associated to the corresponding linguistic variable is assumed to be equal to the number of different values the predicted class can assume.
As a result, the fuzzification of the reduced rules generated a linguistic variable for every feature appearing in their antecedent parts, namely perimeter, texture, concave points and radius, each of them characterized by at most two partitions for the corresponding Universe of Discourse. Moreover, the output class, i.e. diagnosis, was also modeled as a linguistic variable assuming two specific values, namely malignant and benign. The resulting fuzzified rules are outlined in Figure 9, bottom), where linguistic variables and their terms are indicated with their first letter capitalized.
Concerning the generation of the other parameters pertaining the final FIS underpinning the DDSS, the terms Low and High generated after the fuzzification for the input variables were modeled with piecewise linear membership functions, whereas the terms Malignant and Benign for the output variable were represented as singletons. The Singletontype reasoning mechanism was used, where min and max operators were chosen as Tnorm and Snorm functions, respectively. Moreover, the min operator and the center of gravity singleton methods were applied for implementing the implication and defuzzification functions, respectively. A notchained Fuzzy Inference Engine was used by wrapping the implementation given within the JfuzzyLogic tool.
Figure 13 reports the results achieved in terms of fuzzy rules extracted, linguistic variables and terms involved in the rules and, for each term, the corresponding membership function optimized. Moreover, such results are also enriched by means of the values assumed by the quantitative metrics used for the validation.
Conclusions
Having in mind to extend the range of possible users of fuzzybased DDSSs with extensive and easytouse facilities which could considerably reduce the level of knowledge and experience required to their design and realization, this work has first presented a formalization of a refined and assessed version of a sixstep methodology to design and implement fuzzybased DDSSs. Its strength relies on its generality and modularity since it supports the integration of alternative techniques in each of its stages. Stages are employed for: (i) the extraction of crisp rules, (ii) the selection of a significant partition from the whole rule set extracted, (iii) the reduction of the selected rule set, (iv) the creation of fuzzy rules, (v) the generation of the whole fuzzy inference system and (vi) its optimization.
Differently from other existing approaches [14], [15], the described methodology is extremely flexible and does not depend on the typology of fuzzy model to be defined, since it enables the design and realization of fuzzybased DDSSs by taking into account many different and often conflicting requirements, such as the accuracy maximization or the complexity minimization. In more detail, it can not only integrate stateoftheart ruleinduction and ruleoptimization methods, but also freely choose the structural and operational elements of the fuzzy model to be used, such as shape of membership functions or the tnorm and snorm connectors as well as the implication and defuzzification operators. Moreover, methods that use shared fuzzy sets for the rule base [16] are appropriate within a small size work space with a good coverage. Otherwise, in case of a weak coverage the rule base completeness is not guaranteed and, when dealing with large systems, the number of combinations to manage is huge [16]. On the contrary, the proposed methodology is well adapted for large work spaces and generates more compact incomplete rules with only the most locally significant variables, defined successively with a partitioning strictly dependent on the rules where they are involved.
The presented methodology has been realized, according to a CBSD approach, in the form of a modular and portable architecture that has been carefully described from a software engineering perspective.
This architecture has been conceived to support the design of a fuzzybased DDSS on increasing levels of abstraction, thereby partitioning the overall design problem into several subproblems, where each single component at every layer can be implemented from scratch or customized by existing available solutions. Such a way, it can significantly reduce development effort and timetomarket, and improve maintainability, reliability and overall quality of final CDSS designed.
The development of this architecture has been last carried out by using the Java language since it contains several features that argue for it. It is widely distributed and has become one of the major programming languages. The development kit, including compiler and debugger, is freely available on a number of different computer platforms. The core libraries contain many functions which can be used directly and need to not be adopted from external libraries, which is not the case in C++ for instance. By exploiting Java features and diffusion in the usercommunity, the proposed architecture has several unique advantages, e.g. it reduces programming work. Thanks to the huge amount of available Java software, in fact, it is really easy creating new methods to be added to the Method Layer of the architecture without the effort of starting from scratch. Such a way, it is possible to exploit the richness of quickly incorporating new developments made by the active research community which is always working in emerging fields. Moreover, due to the use of a strict objectoriented approach for the its components, the architecture can be used on any machine with Java. Indeed, the concept of modularity of code is highly essential to increase the level of portability. As a result, any user can apply the architecture to implement a DDSS on his machine, independently of the operating system.
As a proof of concept, such an architecture has been used to instantiate a DDSS example aimed at accurately diagnosing breast masses starting from the widely used Wisconsin Breast Cancer Dataset. The results obtained in terms of classification rate proved the feasibility of the whole methodology implemented in terms of the architecture proposed.
For what concerns the ongoing work, knowledge representation techniques such as ontology modeling are investigated to be exploited in order to better define from a semantic point of view the fuzzy variables and terms involved into the rules and improve the readability and understandability of the whole fuzzybased DDSS. The choice of using Java as programming language will be able to facilitate this integration since the most representative tools in the context of knowledge engineering are implemented in Java and released as open source projects. Moreover, since DDSSs, however, typically have unequal classification error costs so that straight CR cannot be assumed as a careful measure of the goodness of a DDSS, in the future, also the confidence χ will be evaluated to be used for selecting a DDSS; in fact, a good DDSS should be highly confident with correctly classified examples while it should be doubtful with misclassified data points. In such a direction, also more sophisticated adaptation techniques able to optimize multiobjective cost functions will be integrated, so taking into account simultaneously CR, the confidence χ and the interpretability. The last important point for future work is to integrate the multithreading and distributed computing to speed computations up during the definition and the adaptation of the fuzzybased DDSS by using widely available multiprocessors and multicore hardware.
Declarations
The publication costs for this article were funded by the MERIT project.
This article has been published as part of BMC Bioinformatics Volume 14 Supplement 1, 2013: Computational Intelligence in Bioinformatics and Biostatistics: new trends from the CIBB conference series. The full contents of the supplement are available online at http://www.biomedcentral.com/bmcbioinformatics/supplements/14/S1.
Declarations
Acknowledgements
This research has been partially supported by the FLAGSHIP InterOmics project (PB.P05, funded and supported by the Italian MIUR and CNR organizations) and by the MERIT project (funded by the Italian MIUR).
Authors’ Affiliations
References
 Podgorelec V, Kokol P: Towards more optimal medical diagnosing with evolutionary algorithms. J Med Syst. 2001, 25 (3): 195219. 10.1023/A:1010733016906.View ArticlePubMedGoogle Scholar
 Klir GJ, Yuan B: Fuzzy sets and fuzzy logic: theory and applications. 1995, Upper Saddle River, NJ, USA: PrenticeHall, IncGoogle Scholar
 Szolovits P, Patil RS, Schwartz WB: Artificial Intelligence in medical diagnosis. Annals of Internal Medicine. 1988, 108: 8087. 10.7326/00034819108180.View ArticlePubMedGoogle Scholar
 Seising R: From vagueness in medical thought to the foundations of fuzzy reasoning in medical diagnosis. Artif Intell Med. 2006, 38 (3): 237256. 10.1016/j.artmed.2006.06.004.View ArticlePubMedGoogle Scholar
 Estévez J, Alayón S, Moreno L, Sigut J, Aguilar R: Cytological image analysis with a genetic fuzzy finite state machine. Comput Methods Programs Biomed. 2005, 80 (Suppl 1): S3S15.View ArticlePubMedGoogle Scholar
 Anuradha B, Reddy V, Veera C: Cardiac arrhythmia classification using fuzzy classifiers. Journal of Theoretical and Applied Information Technology. 2008, 4 (4): 353359.Google Scholar
 Alayón S, Robertson R, Warfield SK, RuizAlzola J: A fuzzy system for helping medical diagnosis of malformations of cortical development. J Biomed Inform. 2007, 40 (3): 221235. 10.1016/j.jbi.2006.11.002.PubMed CentralView ArticlePubMedGoogle Scholar
 Zadeh LA: Is there a need for fuzzy logic?. Inf Sci. 2008, 178 (13): 27512779. 10.1016/j.ins.2008.02.012.View ArticleGoogle Scholar
 Siagian R: Introducing study fuzziness in complexity research. Journal of Social Complexity. 2003, 1 (2): 3644.Google Scholar
 PenaReyes CA, Sipper M: A fuzzygenetic approach to breast cancer diagnosis. Artificial Intelligence in Medicine. 1999, 17 (2): 131155. 10.1016/S09333657(99)000196.View ArticlePubMedGoogle Scholar
 Guillaume S: Designing fuzzy inference systems from data: an interpretabilityoriented review. Trans Fuz Sys. 2001, 9 (3): 426443. 10.1109/91.928739.View ArticleGoogle Scholar
 Chen MY, Linkens DA: Rulebase selfgeneration and simplification for datadriven fuzzy models. Fuzzy Sets and Systems. 2004, 142 (2): 243265. 10.1016/S01650114(03)00160X.View ArticleGoogle Scholar
 Chen Y, Yang B, Abraham A, Peng L: Automatic design of hierarchical TakagiSugeno type Fuzzy systems using evolutionary algorithms. IEEE T Fuzzy Systems. 2007, 15 (3): 385397. [http://dx.doi.org/10.1109/TFUZZ.2006.882472]View ArticleGoogle Scholar
 Tsipouras MG, Exarchos TP, Fotiadis DI, Kotsia AP, Vakalis KV, Naka KK, Michalis LK: Automated diagnosis of coronary artery disease based on data mining and Fuzzy modeling. IEEE Trans Inf Technol Biomed. 2008, 12 (4): 447458.View ArticlePubMedGoogle Scholar
 Tsipouras MG, Exarchos TP, Fotiadis DI: A methodology for automated fuzzy model generation. Fuzzy Sets Syst. 2008, 159 (23): 32013220. 10.1016/j.fss.2008.04.004.View ArticleGoogle Scholar
 Guillaume S, Charnomordic B: Learning interpretable fuzzy inference systems with FisPro. Inf Sci. 2011, 181 (20): 44094427. 10.1016/j.ins.2011.03.025.View ArticleGoogle Scholar
 d'Acierno A, De Pietro G, Esposito M: Data driven generation of fuzzy systems: an application to breast cancer detection. Proceedings of the 7th international conference on Computational intelligence methods for bioinformatics and biostatistics. 2011, CIBB'10, Berlin, Heidelberg: SpringerVerlag, 203214. [http://dl.acm.org/citation.cfm?id = 2032098.2032120]View ArticleGoogle Scholar
 d'Acierno A, Esposito M, De Pietro G: Eliciting fuzzy knowledge from the PIMA dataset. Biological Wikis, Proceedings of the Joint NETTABBBCC 2010 Workshops. Edited by: Facchiano A, Romano P, Aracne Editrice Srl. 2010, 8589.Google Scholar
 Triantaphyllou E: Data Mining and Knowledge Discovery Approaches Based on Rule Induction Techniques. 2009, Berlin, Heidelberg: SpringerVerlagGoogle Scholar
 Crockett KA, Bandar Z, McLean D, O'Shea J: On constructing a fuzzy inference framework using crisp decision trees. Fuzzy Sets and Systems. 2006, 157 (21): 28092832. 10.1016/j.fss.2006.06.002.View ArticleGoogle Scholar
 Hsu SC, Hsu JYj, Chiang IJ: Automatic generation of Fuzzy control rules by machine learning methods. Proceedings of the 1995 IEEE International Conference on Robotics and Automation. 1995, Nagoya, JapanGoogle Scholar
 Hall L: Rule chaining in fuzzy expert systems. Fuzzy Systems, IEEE Transactions on. 2001, 9 (6): 822828. 10.1109/91.971731.View ArticleGoogle Scholar
 Russo F, Ramponi G: A fuzzy operator for the enhancement of blurred and noisy images. IEEE Trans Image Process. 1995, 4 (8): 11691174. 10.1109/83.403425.View ArticlePubMedGoogle Scholar
 Ghazavi SN, Liao TW: Medical data mining by fuzzy modeling with selected features. Artif Intell Med. 2008, 43 (3): 195206. 10.1016/j.artmed.2008.04.004.View ArticlePubMedGoogle Scholar
 Hall M, Frank E, Holmes G, Pfahringer B, Reutemann P, Witten IH: The WEKA data mining software: an update. SIGKDD Explor Newsl. 2009, 1018. 11
 Cingolani P: jFuzzyLogic. [http://jfuzzylogic.sourceforge.net/html/index.html]
 Baturone I, Barriga A, SánchezSolano S, JiménezFernández CJ, López DR: Microelectronic design of fuzzy logicbased systems. 2000, Boca Raton, FL, USA: CRC Press, IncGoogle Scholar
Copyright
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.