Table 3 Performance of MicroPIE with Character Predictor

From: Microbial phenomics information extractor (MicroPIE): a natural language processing tool for the automated acquisition of prokaryotic phenotypic characters from text sources

| Character | Extraction methods | Type | # of GSM Values | # of MicroPIE Output Values | P | R | F1 | Relaxed_P | Relaxed_R | Relaxed_F1 |
|---|---|---|---|---|---|---|---|---|---|---|
| **%G + C** | linguistic rules | N | 90 | 96 | 0.91 | 0.97 | 0.94 | 0.91 | 0.97 | 0.94 |
| Cell Shape | term matching | S | 125 | 166 | 0.49 | 0.65 | 0.56 | 0.64 | 0.84 | 0.73 |
| **Cell Diameter** | linguistic rules | N | 14 | 18 | 0.67 | 0.86 | 0.75 | 0.72 | 0.93 | 0.81 |
| **Cell Length** | linguistic rules | N | 68 | 68 | 0.89 | 0.89 | 0.89 | 0.93 | 0.93 | 0.93 |
| **Cell Width** | linguistic rules | N | 56 | 58 | 0.91 | 0.95 | 0.93 | 0.93 | 0.96 | 0.95 |
| **Cell Relationships & Aggregations** | term matching | S | 25 | 27 | 0.72 | 0.78 | 0.75 | 0.82 | 0.88 | 0.85 |
| **Gram Stain Type** | term matching | S | 64 | 62 | 1.00 | 0.97 | 0.98 | 1.00 | 0.97 | 0.98 |
| External Features | term matching | S | 23 | 21 | 0.55 | 0.50 | 0.52 | 0.62 | 0.57 | 0.59 |
| **Internal Features** | term matching | S | 63 | 56 | 0.78 | 0.69 | 0.73 | 0.91 | 0.81 | 0.86 |
| **Motility** | term matching | S | 76 | 77 | 0.71 | 0.72 | 0.71 | 0.84 | 0.86 | 0.85 |
| **Pigment Compounds** | term matching | S | 58 | 51 | 0.90 | 0.79 | 0.84 | 0.97 | 0.85 | 0.91 |
| **NaCl Minimum** | linguistic rules | N | 44 | 46 | 0.74 | 0.77 | 0.76 | 0.80 | 0.84 | 0.82 |
| **NaCl Optimum** | linguistic rules | N | 33 | 30 | 0.92 | 0.83 | 0.87 | 1.00 | 0.91 | 0.95 |
| **NaCl Maximum** | linguistic rules | N | 44 | 46 | 0.75 | 0.78 | 0.77 | 0.83 | 0.86 | 0.84 |
| **pH Minimum** | linguistic rules | N | 24 | 24 | 0.92 | 0.92 | 0.92 | 0.92 | 0.92 | 0.92 |
| **pH Optimum** | linguistic rules | N | 26 | 27 | 0.96 | 1.00 | 0.98 | 0.96 | 1.00 | 0.98 |
| **pH Maximum** | linguistic rules | N | 23 | 24 | 0.92 | 0.96 | 0.94 | 0.92 | 0.96 | 0.94 |
| Temperature Minimum | linguistic rules | N | 58 | 44 | 0.89 | 0.67 | 0.77 | 0.89 | 0.67 | 0.77 |
| Temperature Optimum | linguistic rules | N | 62 | 40 | 1.00 | 0.65 | 0.78 | 1.00 | 0.65 | 0.78 |
| Temperature Maximum | linguistic rules | N | 58 | 44 | 0.91 | 0.69 | 0.78 | 0.91 | 0.69 | 0.78 |
| Aerophilicity | term matching | S | 83 | 89 | 0.63 | 0.68 | 0.65 | 0.69 | 0.74 | 0.72 |
| Magnesium Requirement for Growth | term matching | S | 4 | 2 | 0.50 | 0.25 | 0.33 | 1.00 | 0.50 | 0.67 |
| Vitamins and Cofactors Used For Growth | term matching | S | 14 | 26 | 0.39 | 0.71 | 0.50 | 0.39 | 0.71 | 0.50 |
| Salinity Requirement for Growth | linguistic rule + term matching | S | 42 | 65 | 0.58 | 0.89 | 0.70 | 0.60 | 0.93 | 0.73 |
| **Antibiotic Sensitivity** | linguistic rule + term matching | S | 96 | 84 | 0.91 | 0.80 | 0.85 | 0.93 | 0.81 | 0.87 |
| **Antibiotic Resistant** | linguistic rule + term matching | S | 64 | 49 | 0.96 | 0.73 | 0.83 | 0.96 | 0.73 | 0.83 |
| **Colony Shape** | term matching | S | 102 | 98 | 0.97 | 0.94 | 0.96 | 0.98 | 0.94 | 0.96 |
| **Colony Margin** | term matching | S | 43 | 44 | 0.89 | 0.91 | 0.90 | 0.96 | 0.98 | 0.97 |
| **Colony Texture** | term matching | S | 69 | 75 | 0.85 | 0.92 | 0.88 | 0.86 | 0.94 | 0.90 |
| Colony Color | term matching | S | 80 | 127 | 0.53 | 0.84 | 0.65 | 0.59 | 0.93 | 0.72 |
| Fermentation Products | linguistic rules + term matching | S | 127 | 141 | 0.59 | 0.66 | 0.62 | 0.64 | 0.71 | 0.67 |
| Other Metabolic Product | term matching | S | 13 | 56 | 0.07 | 0.31 | 0.12 | 0.07 | 0.31 | 0.12 |
| Pathogenic | term matching | S | 3 | 3 | 0.50 | 0.50 | 0.50 | 0.67 | 0.67 | 0.67 |
| Disease Caused | term matching | S | 7 | 11 | 0.27 | 0.43 | 0.33 | 0.36 | 0.57 | 0.44 |
| Pathogen Target Organ | term matching | S | 4 | 9 | 0.22 | 0.50 | 0.31 | 0.22 | 0.50 | 0.31 |
| Haemolytic & Haemadsorption Properties | term matching | S | 10 | 7 | 0.57 | 0.40 | 0.47 | 0.57 | 0.40 | 0.47 |
| Organic Compounds Used Or Hydrolyzed | term matching | S | 620 | 480 | 0.85 | 0.66 | 0.74 | 0.89 | 0.69 | 0.77 |
| Organic Compounds Not Used Or Not Hydrolyzed | term matching | S | 733 | 468 | 0.92 | 0.58 | 0.71 | 0.92 | 0.59 | 0.72 |
| Inorganic Substances Used | term matching | S | 36 | 45 | 0.59 | 0.74 | 0.65 | 0.61 | 0.76 | 0.68 |
| Inorganic Substances Not Used | term matching | S | 61 | 41 | 0.81 | 0.54 | 0.65 | 0.81 | 0.54 | 0.65 |
| Fermentation Substrates Used | linguistic rules + term matching | S | 411 | 629 | 0.57 | 0.88 | 0.69 | 0.59 | 0.91 | 0.72 |
| **Fermentation Substrates Not Used** | linguistic rules + term matching | S | 442 | 475 | 0.85 | 0.91 | 0.88 | 0.86 | 0.93 | 0.89 |
| In total | | | 4098 | 4049 | | | | | Total relaxed hit scores | 3198.5 |

Abbreviations: N, numerical character; S, string-based/categorical character. Characters with a Relaxed_F1 score ≥ 0.8 are shown in bold.
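
The F1 columns can be cross-checked against the P and R columns. The sketch below assumes the standard definition of F1 as the harmonic mean of precision and recall; the table itself does not state the formula, so this is an assumption. The helper name `f1_score` and the choice of sample rows are illustrative only. Because P and R are reported to two decimals, a recomputed F1 may differ from the reported one by about 0.01. The final lines further assume that relaxed precision and recall are the relaxed hit score divided by the number of MicroPIE output values and GSM values, respectively, which would give an overall figure that the table does not report directly.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (assumed standard definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (character, P, R, reported F1), copied from the strict columns of Table 3
rows = [
    ("%G + C", 0.91, 0.97, 0.94),
    ("Cell Shape", 0.49, 0.65, 0.56),
    ("Gram Stain Type", 1.00, 0.97, 0.98),
    ("Colony Color", 0.53, 0.84, 0.65),
    ("Other Metabolic Product", 0.07, 0.31, 0.12),
]

for name, p, r, reported in rows:
    recomputed = f1_score(p, r)
    # Allow 0.01 of slack because P and R are already rounded in the table.
    status = "ok" if abs(recomputed - reported) <= 0.01 else "check"
    print(f"{name}: recomputed {recomputed:.3f}, reported {reported:.2f} ({status})")

# Totals from the last row of Table 3.
relaxed_hits = 3198.5
n_output = 4049  # of MicroPIE output values
n_gold = 4098    # of GSM values

# Assumption: relaxed P = hits / outputs, relaxed R = hits / GSM values.
micro_p = relaxed_hits / n_output
micro_r = relaxed_hits / n_gold
micro_f1 = f1_score(micro_p, micro_r)
print(f"overall relaxed P={micro_p:.3f}, R={micro_r:.3f}, F1={micro_f1:.3f}")
```

A row flagged `check` would point to either a transcription error or a scoring rule that differs from the assumed one.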