Table 1 Correlation coefficient of 4-gram frequencies across species.

From: N-gram analysis of 970 microbial organisms reveals presence of biological language models

Genus  Correlation coefficient  Standard deviation
1      0.99                     0.0047
2      0.86                     0.0016
3      0.74                     0.0415
4      0.67                     0.0007
5      0.60                     0.0205
6      0.59                     0.0349
7      0.56                     0.0334
8      0.52                     0.2099
9      0.44                     0.0832
10     0.38                     0.0088
11     0.34                     0.0509
12     0.34                     0.1105
13     0.33                     0.2081
14     0.20                     0.1117
15     0.08                     0.0725
16     0.00                     0.0589
Correlation coefficient between the frequencies of the top forty 4-grams in Brucella suis and the frequencies of the same 4-grams in other species, averaged over each genus. Only genera with at least 9 species are considered. The standard deviation is also shown. Brucella belongs to genus 1 (first row): the correlation of 4-gram frequencies with species of the same genus is very high at 0.99, but it is lower with species in other genera, whether in the same class or a different class.
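
The computation described in the caption can be sketched roughly as follows. This is a minimal illustration, not the authors' code: it assumes each organism is available as a single sequence string, that the coefficient is Pearson's, and that the reported standard deviation is the population form. The function names (`ngram_counts`, `top_ngram_correlation`, `genus_summary`) and the `genus_to_sequences` mapping are hypothetical.

```python
from collections import Counter
from math import sqrt
from statistics import mean, pstdev


def ngram_counts(sequence, n=4):
    """Count every overlapping n-gram in a sequence string."""
    return Counter(sequence[i:i + n] for i in range(len(sequence) - n + 1))


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either input has zero variance."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def top_ngram_correlation(reference, other, n=4, top=40):
    """Correlate the reference's top n-grams with the same n-grams in `other`."""
    ref = ngram_counts(reference, n)
    oth = ngram_counts(other, n)
    top_grams = [gram for gram, _ in ref.most_common(top)]
    xs = [ref[gram] for gram in top_grams]
    ys = [oth[gram] for gram in top_grams]  # a missing n-gram counts as 0
    return pearson(xs, ys)


def genus_summary(reference, genus_to_sequences, n=4, top=40):
    """Mean and (assumed population) standard deviation of correlations per genus."""
    summary = {}
    for genus, sequences in genus_to_sequences.items():
        rs = [top_ngram_correlation(reference, s, n, top) for s in sequences]
        summary[genus] = (mean(rs), pstdev(rs))
    return summary
```

Because Pearson's coefficient is unchanged by positive rescaling, raw counts and length-normalised frequencies give the same value here. Whether the reference species is excluded from its own genus average is not stated in the caption, so the sketch leaves that choice to the caller.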