
Table 3 Performance of MicroPIE with Character Predictor

From: Microbial phenomics information extractor (MicroPIE): a natural language processing tool for the automated acquisition of prokaryotic phenotypic characters from text sources

| Character | Extraction methods | # of GSM values | # of MicroPIE output values | P | R | F1 | Relaxed_P | Relaxed_R | Relaxed_F1 |
|---|---|---:|---:|---:|---:|---:|---:|---:|---:|
| **%G + C** | linguistic rules (N) | 90 | 96 | 0.91 | 0.97 | 0.94 | 0.91 | 0.97 | 0.94 |
| Cell Shape | term matching (S) | 125 | 166 | 0.49 | 0.65 | 0.56 | 0.64 | 0.84 | 0.73 |
| **Cell Diameter** | linguistic rules (N) | 14 | 18 | 0.67 | 0.86 | 0.75 | 0.72 | 0.93 | 0.81 |
| **Cell Length** | linguistic rules (N) | 68 | 68 | 0.89 | 0.89 | 0.89 | 0.93 | 0.93 | 0.93 |
| **Cell Width** | linguistic rules (N) | 56 | 58 | 0.91 | 0.95 | 0.93 | 0.93 | 0.96 | 0.95 |
| **Cell Relationships & Aggregations** | term matching (S) | 25 | 27 | 0.72 | 0.78 | 0.75 | 0.82 | 0.88 | 0.85 |
| **Gram Stain Type** | term matching (S) | 64 | 62 | 1.00 | 0.97 | 0.98 | 1.00 | 0.97 | 0.98 |
| External Features | term matching (S) | 23 | 21 | 0.55 | 0.50 | 0.52 | 0.62 | 0.57 | 0.59 |
| **Internal Features** | term matching (S) | 63 | 56 | 0.78 | 0.69 | 0.73 | 0.91 | 0.81 | 0.86 |
| **Motility** | term matching (S) | 76 | 77 | 0.71 | 0.72 | 0.71 | 0.84 | 0.86 | 0.85 |
| **Pigment Compounds** | term matching (S) | 58 | 51 | 0.90 | 0.79 | 0.84 | 0.97 | 0.85 | 0.91 |
| **NaCl Minimum** | linguistic rules (N) | 44 | 46 | 0.74 | 0.77 | 0.76 | 0.80 | 0.84 | 0.82 |
| **NaCl Optimum** | linguistic rules (N) | 33 | 30 | 0.92 | 0.83 | 0.87 | 1.00 | 0.91 | 0.95 |
| **NaCl Maximum** | linguistic rules (N) | 44 | 46 | 0.75 | 0.78 | 0.77 | 0.83 | 0.86 | 0.84 |
| **pH Minimum** | linguistic rules (N) | 24 | 24 | 0.92 | 0.92 | 0.92 | 0.92 | 0.92 | 0.92 |
| **pH Optimum** | linguistic rules (N) | 26 | 27 | 0.96 | 1.00 | 0.98 | 0.96 | 1.00 | 0.98 |
| **pH Maximum** | linguistic rules (N) | 23 | 24 | 0.92 | 0.96 | 0.94 | 0.92 | 0.96 | 0.94 |
| Temperature Minimum | linguistic rules (N) | 58 | 44 | 0.89 | 0.67 | 0.77 | 0.89 | 0.67 | 0.77 |
| Temperature Optimum | linguistic rules (N) | 62 | 40 | 1.00 | 0.65 | 0.78 | 1.00 | 0.65 | 0.78 |
| Temperature Maximum | linguistic rules (N) | 58 | 44 | 0.91 | 0.69 | 0.78 | 0.91 | 0.69 | 0.78 |
| Aerophilicity | term matching (S) | 83 | 89 | 0.63 | 0.68 | 0.65 | 0.69 | 0.74 | 0.72 |
| Magnesium Requirement for Growth | term matching (S) | 4 | 2 | 0.50 | 0.25 | 0.33 | 1.00 | 0.50 | 0.67 |
| Vitamins and Cofactors Used For Growth | term matching (S) | 14 | 26 | 0.39 | 0.71 | 0.50 | 0.39 | 0.71 | 0.50 |
| Salinity Requirement for Growth | linguistic rule + term matching (S) | 42 | 65 | 0.58 | 0.89 | 0.70 | 0.60 | 0.93 | 0.73 |
| **Antibiotic Sensitivity** | linguistic rule + term matching (S) | 96 | 84 | 0.91 | 0.80 | 0.85 | 0.93 | 0.81 | 0.87 |
| **Antibiotic Resistant** | linguistic rule + term matching (S) | 64 | 49 | 0.96 | 0.73 | 0.83 | 0.96 | 0.73 | 0.83 |
| **Colony Shape** | term matching (S) | 102 | 98 | 0.97 | 0.94 | 0.96 | 0.98 | 0.94 | 0.96 |
| **Colony Margin** | term matching (S) | 43 | 44 | 0.89 | 0.91 | 0.90 | 0.96 | 0.98 | 0.97 |
| **Colony Texture** | term matching (S) | 69 | 75 | 0.85 | 0.92 | 0.88 | 0.86 | 0.94 | 0.90 |
| Colony Color | term matching (S) | 80 | 127 | 0.53 | 0.84 | 0.65 | 0.59 | 0.93 | 0.72 |
| Fermentation Products | linguistic rules + term matching (S) | 127 | 141 | 0.59 | 0.66 | 0.62 | 0.64 | 0.71 | 0.67 |
| Other Metabolic Product | term matching (S) | 13 | 56 | 0.07 | 0.31 | 0.12 | 0.07 | 0.31 | 0.12 |
| Pathogenic | term matching (S) | 3 | 3 | 0.50 | 0.50 | 0.50 | 0.67 | 0.67 | 0.67 |
| Disease Caused | term matching (S) | 7 | 11 | 0.27 | 0.43 | 0.33 | 0.36 | 0.57 | 0.44 |
| Pathogen Target Organ | term matching (S) | 4 | 9 | 0.22 | 0.50 | 0.31 | 0.22 | 0.50 | 0.31 |
| Haemolytic & Haemadsorption Properties | term matching (S) | 10 | 7 | 0.57 | 0.40 | 0.47 | 0.57 | 0.40 | 0.47 |
| Organic Compounds Used Or Hydrolyzed | term matching (S) | 620 | 480 | 0.85 | 0.66 | 0.74 | 0.89 | 0.69 | 0.77 |
| Organic Compounds Not Used Or Not Hydrolyzed | term matching (S) | 733 | 468 | 0.92 | 0.58 | 0.71 | 0.92 | 0.59 | 0.72 |
| Inorganic Substances Used | term matching (S) | 36 | 45 | 0.59 | 0.74 | 0.65 | 0.61 | 0.76 | 0.68 |
| Inorganic Substances Not Used | term matching (S) | 61 | 41 | 0.81 | 0.54 | 0.65 | 0.81 | 0.54 | 0.65 |
| Fermentation Substrates Used | linguistic rules + term matching (S) | 411 | 629 | 0.57 | 0.88 | 0.69 | 0.59 | 0.91 | 0.72 |
| **Fermentation Substrates Not Used** | linguistic rules + term matching (S) | 442 | 475 | 0.85 | 0.91 | 0.88 | 0.86 | 0.93 | 0.89 |

In total: 4098 GSM values and 4049 MicroPIE output values; total relaxed hit score: 3198.5.

Abbreviations: P, precision; R, recall; F1, harmonic mean of P and R; N, numerical character; S, string-based/categorical character (marked after the extraction method). Characters with a Relaxed_F1 score ≥ 0.8 are shown in bold.
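The per-character counts and scores above are consistent with the usual definitions: precision as hits divided by the number of MicroPIE output values, and recall as hits divided by the number of GSM values. The following minimal sketch rests on that assumption. The table itself does not state the scoring rule, and the fractional total (3198.5) only suggests that partial matches may earn fractional credit. The function `prf`, the `Scores` tuple, and the 62-hit figure for Gram Stain Type (back-calculated from its P = 1.00 over 62 outputs) are illustrative and are not part of MicroPIE. The code recomputes a micro-averaged overall relaxed score from the "In total" counts and applies the Relaxed_F1 ≥ 0.8 threshold used for bolding to a few rows.

```python
from typing import NamedTuple


class Scores(NamedTuple):
    precision: float
    recall: float
    f1: float


def prf(hits: float, n_gold: int, n_output: int) -> Scores:
    """Precision, recall and F1 from a (possibly fractional) hit score.

    Assumption (not stated in the table): P = hits / n_output and
    R = hits / n_gold, with GSM values as the reference set and
    MicroPIE output values as the predictions.
    """
    p = hits / n_output if n_output else 0.0
    r = hits / n_gold if n_gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return Scores(p, r, f1)


# Micro-averaged relaxed scores from the "In total" counts.
overall = prf(hits=3198.5, n_gold=4098, n_output=4049)
print(f"overall: P={overall.precision:.2f} R={overall.recall:.2f} F1={overall.f1:.2f}")

# Gram Stain Type: 64 GSM values, 62 outputs; 62 hits is a hypothetical,
# back-calculated value that reproduces the published P, R and F1.
gram = prf(hits=62, n_gold=64, n_output=62)
print(f"Gram: P={gram.precision:.2f} R={gram.recall:.2f} F1={gram.f1:.2f}")

# Bolding rule from the footnote, applied to a subset of published Relaxed_F1 values.
RELAXED_F1 = {
    "%G + C": 0.94,
    "Cell Shape": 0.73,
    "Gram Stain Type": 0.98,
    "Colony Color": 0.72,
    "Fermentation Substrates Not Used": 0.89,
}
high = [name for name, f1 in RELAXED_F1.items() if f1 >= 0.8]
print(high)
```

Under these assumptions, the overall relaxed scores come out near P ≈ 0.79, R ≈ 0.78, and F1 ≈ 0.79. These values are not reported in the table itself.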