Table 4 Accuracy estimates (100% – error rate) using different parameters for TFBS Identification, based on twenty repetitions, each utilizing ten-fold cross validation for a total of 200 runs

From: Characterizing disease states from topological properties of transcriptional regulatory networks

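Accuracy and SD are given for each combination of promoter range (1 kb or 5 kb upstream) and PWM set (All Profiles or Limited Profiles). Lower and Upper are the parameters of the expression method.

| Classifier | Expression | Lower | Upper | Feature Selection | 1 kb, All Profiles: Accuracy | 1 kb, All Profiles: SD | 1 kb, Limited Profiles: Accuracy | 1 kb, Limited Profiles: SD | 5 kb, All Profiles: Accuracy | 5 kb, All Profiles: SD | 5 kb, Limited Profiles: Accuracy | 5 kb, Limited Profiles: SD |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| IB1 | Threshold | 0.2 | 0.8 | InfoGain | 91.56% | 2.35% | 81.65% | 4.22% | 93.06% | 1.80% | 93.27% | 2.35% |
| IB1 | Threshold | 0.33 | 0.66 | InfoGain | 91.89% | 2.95% | 90.72% | 2.90% | 95.57% | 2.04% | 93.62% | 1.78% |
| IB1 | Threshold | 0.2 | 0.8 | ChiSquared | 89.96% | 2.74% | 81.00% | 4.04% | 93.92% | 1.75% | 92.63% | 2.22% |
| IB1 | Threshold | 0.33 | 0.66 | ChiSquared | 91.10% | 2.90% | 90.67% | 2.79% | 94.07% | 2.43% | 93.43% | 2.31% |
| IB1 | Tanh | 0.25 | 0.75 | InfoGain | 92.71% | 2.43% | 92.74% | 2.30% | 92.13% | 2.47% | 92.00% | 3.01% |
| Naive Bayes | Threshold | 0.2 | 0.8 | InfoGain | 90.47% | 2.85% | 82.35% | 3.78% | 96.04% | 1.34% | 94.98% | 1.41% |
| Naive Bayes | Threshold | 0.2 | 0.8 | InfoGain | 91.67% | 2.53% | 83.18% | 3.11% | 94.39% | 1.73% | 93.78% | 2.00% |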

  1. Table 4 shows the effects of varying the parameters used for connectivity network construction. The genomic region searched for transcription factor binding sites (TFBSs) was either 1000 bp or 5000 bp upstream of known genes. Two collections of position weight matrices (PWMs) were applied: 1) all the matrices provided by TRANSFAC that are relevant to mammalian genes (All Profiles), or 2) the subset of PWMs identified by TRANSFAC as 'high quality' (Limited Profiles).
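
The evaluation protocol named in the table caption (twenty repetitions of ten-fold cross-validation, 200 runs in total, with accuracy defined as 100% minus the error rate) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not the authors' pipeline: the function names are hypothetical, the toy data are synthetic, a plain 1-nearest-neighbour rule stands in for the IB1 classifier, and the SD is taken as the sample standard deviation over all 200 per-fold accuracies, which may differ from how the SD column was computed.

```python
import random
import statistics


def repeated_kfold_indices(n_samples, n_folds=10, n_repeats=20, seed=0):
    """Yield (train_idx, test_idx) pairs: n_repeats reshuffles, n_folds splits each."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        order = list(range(n_samples))
        rng.shuffle(order)
        for fold in range(n_folds):
            test_idx = order[fold::n_folds]  # every n_folds-th shuffled sample
            held_out = set(test_idx)
            train_idx = [i for i in order if i not in held_out]
            yield train_idx, test_idx


def one_nearest_neighbour(train_X, train_y, x):
    """Predict the label of x from its closest training point (squared Euclidean)."""
    best = min(
        range(len(train_X)),
        key=lambda j: sum((a - b) ** 2 for a, b in zip(train_X[j], x)),
    )
    return train_y[best]


def evaluate(X, y, n_folds=10, n_repeats=20, seed=0):
    """Return (mean accuracy in %, SD in %, number of runs)."""
    accuracies = []
    for train_idx, test_idx in repeated_kfold_indices(len(X), n_folds, n_repeats, seed):
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        errors = sum(
            one_nearest_neighbour(train_X, train_y, X[i]) != y[i] for i in test_idx
        )
        # Accuracy = 100% - error rate, as in the table caption.
        accuracies.append(100.0 * (1 - errors / len(test_idx)))
    return statistics.mean(accuracies), statistics.stdev(accuracies), len(accuracies)


# Synthetic toy data (hypothetical): two Gaussian clusters of 50 samples each.
data_rng = random.Random(1)
X = [[data_rng.gauss(c, 1.0), data_rng.gauss(c, 1.0)] for c in (0.0, 3.0) for _ in range(50)]
y = [label for label in (0, 1) for _ in range(50)]

mean_acc, sd_acc, n_runs = evaluate(X, y)
print(f"{n_runs} runs: accuracy {mean_acc:.2f}% (SD {sd_acc:.2f}%)")
```

Reshuffling before each repetition is what makes the twenty repetitions differ from one another; with a fixed ordering, every repetition would reproduce the same ten folds and the spread across runs would be understated.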