Table 3 Structural comparison of networks on subsets of the data (resilience to incomplete data)

From: A nearest-neighbors network model for sequence data reveals new insight into genotype distribution of a pathogen

                 Avg. degree  D  CC    Number Comp.  Largest Comp.  |C1|  |C>2|/|V|  Genus prec.  Genus recall  Species prec.  Species recall
GroEL Threshold
  Full           10.5         5  0.98  304           46             188   62.3%      90.3%        47.3%         60.6%          63.2%
  Sample Avg.    6.2          4  0.97  221           29             150   56.1%      89.7%        46.7%         62.8%          64.5%
  Sample 1       6.2          5  0.97  221           30             151   53.8%      90.8%        47.2%         62.2%          65.4%
  Sample 2       6.3          4  0.97  229           29             158   52.0%      89.3%        45.4%         62.1%          62.6%
  Sample 3       6.5          4  0.98  223           27             152   53.2%      89.2%        45.5%         61.0%          63.0%
  Sample 4       6.3          5  0.96  212           32             141   56.7%      87.7%        46.2%         63.4%          64.9%
  Sample 5       5.9          4  0.98  218           25             146   58.5%      91.6%        49.4%         65.3%          66.7%
GroEL DiWANN
  Full           2.6          7  0.19  179           34             0     85.4%      80.4%        43.9%         59.5%          61.8%
  Sample Avg.    2.7          6  0.41  119           26             0     86.4%      75.8%        51.1%         55.2%          67.8%
  Sample 1       2.8          7  0.52  100           33             0     86.4%      73.6%        52.1%         53.4%          68.4%
  Sample 2       2.4          5  0.21  111           22             0     85.6%      78.2%        50.4%         56.4%          66.2%
  Sample 3       2.8          6  0.58  113           24             0     84.0%      73.6%        52.7%         54.0%          69.2%
  Sample 4       2.6          5  0.38  105           23             0     87.3%      77.9%        46.0%         57.9%          67.6%
  Sample 5       2.7          7  0.36  105           29             0     88.9%      75.8%        54.1%         54.5%          68.0%
This table compares structure and clustering results for GroEL networks generated from random samples of 60% of the sequences; the full network is included for comparison. D denotes diameter, and CC denotes clustering coefficient. Number Comp. gives the number of connected components, and Largest Comp. gives the size of the largest component. |C1| gives the number of nodes in clusters of size 1 (singletons), and |C>2|/|V| gives the percentage of nodes in clusters of size 3 or above. For the threshold-based networks, we use a threshold of 30, which gave a good trade-off between precision and recall in the community analysis. The full networks contain 812 nodes, while each reduced network contains 487 nodes.
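The component-level columns described in this note can be illustrated with a minimal Python sketch. It is not the authors' code: the function names (`component_sizes`, `structure_summary`, `sample_nodes`), the seed, and the toy graph are all hypothetical. As a simplification it counts singletons and size-3-or-larger groups over connected components, whereas the table's |C1| and |C>2|/|V| columns refer to clusters, which may come from a separate community-detection step. Note also that dropping edges incident to removed nodes reproduces a rebuilt network only in the threshold-based case; a DiWANN would need its nearest neighbors recomputed on the sampled sequences.

```python
import random
from collections import Counter


def component_sizes(nodes, edges):
    """Return connected-component sizes (largest first) of an undirected graph."""
    parent = {v: v for v in nodes}

    def find(v):
        # Path halving keeps the union-find trees shallow.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for u, v in edges:
        # Skip edges that touch a node outside the (possibly sampled) node set.
        if u in parent and v in parent:
            parent[find(u)] = find(v)

    return sorted(Counter(find(v) for v in nodes).values(), reverse=True)


def structure_summary(nodes, edges):
    """Summarize a graph with component-level statistics like those in the table."""
    nodes = list(nodes)
    sizes = component_sizes(nodes, edges)
    in_size_3_plus = sum(s for s in sizes if s >= 3)
    return {
        "num_components": len(sizes),
        "largest_component": sizes[0] if sizes else 0,
        "singletons": sum(1 for s in sizes if s == 1),
        "frac_size_3_plus": in_size_3_plus / len(nodes) if nodes else 0.0,
    }


def sample_nodes(nodes, fraction=0.6, seed=0):
    """Draw a reproducible random subset holding `fraction` of the nodes."""
    nodes = sorted(nodes)
    return random.Random(seed).sample(nodes, int(len(nodes) * fraction))


# Toy graph (hypothetical): one component of size 3, one of size 2, two singletons.
toy_nodes = "abcdefg"
toy_edges = [("a", "b"), ("b", "c"), ("d", "e")]
print(structure_summary(toy_nodes, toy_edges))

# A 60% sample of 812 nodes keeps 487 of them, matching the sizes quoted in the note.
print(len(sample_nodes(range(812))))  # 487
```

Calling `structure_summary(sample_nodes(nodes), edges)` on a threshold network would give one "Sample" row's worth of component statistics; repeating with different seeds and averaging would correspond to a "Sample Avg." row.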