Table 1 Correlation coefficient of 4-gram frequencies across species.

From: N-gram analysis of 970 microbial organisms reveals presence of biological language models

Genus    Correlation Coefficient    Standard Deviation
1        0.99                       0.0047
2        0.86                       0.0016
3        0.74                       0.0415
4        0.67                       0.0007
5        0.60                       0.0205
6        0.59                       0.0349
7        0.56                       0.0334
8        0.52                       0.2099
9        0.44                       0.0832
10       0.38                       0.0088
11       0.34                       0.0509
12       0.34                       0.1105
13       0.33                       0.2081
14       0.20                       0.1117
15       0.08                       0.0725
16       0.00                       0.0589

  1. Correlation coefficient between the frequencies of the top forty 4-grams in Brucella suis and the frequencies of the same 4-grams in other species, averaged over each genus. Only genera with at least 9 species are considered. The standard deviation is also shown. Brucella belongs to genus 1 (first row): the correlation of 4-gram frequencies is very high (0.99) with species of the same genus, and lower with species of other genera, whether in the same class or in a different class.
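
The computation described in the note above could be sketched roughly as follows. This is a minimal illustration, not the authors' code. It assumes that sequences are plain strings, that the coefficient is Pearson's r, and that the reported spread is a population standard deviation; the source states none of these. The function names (ngram_frequencies, correlation_with_reference, summarize_by_genus) are hypothetical.

```python
import math
from statistics import mean, pstdev
from collections import Counter


def ngram_frequencies(sequence, n=4):
    """Relative frequency of every overlapping n-gram in a sequence string."""
    counts = Counter(sequence[i:i + n] for i in range(len(sequence) - n + 1))
    total = sum(counts.values())
    return {gram: count / total for gram, count in counts.items()}


def pearson(xs, ys):
    """Pearson correlation coefficient (assumed; the source does not name it)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # correlation is undefined for constant data; report 0 here
    return cov / (sx * sy)


def correlation_with_reference(reference, other, n=4, top_k=40):
    """Correlate the reference's top_k n-gram frequencies with those of `other`."""
    ref_freq = ngram_frequencies(reference, n)
    other_freq = ngram_frequencies(other, n)
    # Rank n-grams by their frequency in the reference and keep the top_k.
    top = sorted(ref_freq, key=ref_freq.get, reverse=True)[:top_k]
    return pearson([ref_freq[g] for g in top],
                   [other_freq.get(g, 0.0) for g in top])


def summarize_by_genus(reference, species_by_genus, n=4, top_k=40):
    """Mean and (population) standard deviation of the correlation per genus."""
    summary = {}
    for genus, sequences in species_by_genus.items():
        rs = [correlation_with_reference(reference, s, n, top_k)
              for s in sequences]
        summary[genus] = (mean(rs), pstdev(rs))
    return summary
```

Calling summarize_by_genus with the reference organism's sequence and a mapping from genus label to member sequences would give one (mean, standard deviation) pair per genus, matching the layout of the table. Restricting the input to genera with at least 9 species, as the note specifies, is left to the caller.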