Table 4 Comparison of classification results obtained using Bayesian networks learned on low-observability predicted datasets with those obtained using networks learned on the original datasets

From: Gene expression prediction using low-rank matrix completion

| Study | Dataset | True positive rate | False positive rate | Precision | Recall | F-measure | AUROC |
|---|---|---|---|---|---|---|---|
| Lung adenocarcinoma | Original | 0.944 | 0.057 | 0.944 | 0.944 | 0.944 | 0.988 |
| | Low-rank prediction (O = 60 %) | 0.944 | 0.057 | 0.944 | 0.944 | 0.944 | 0.996 |
| | Sampled Uniform distribution (O = 60 %) | 0.757 | 0.256 | 0.758 | 0.757 | 0.755 | 0.777 |
| Myelodysplastic syndrome | Original | 0.865 | 0.866 | 0.844 | 0.865 | 0.854 | 0.673 |
| | Low-rank prediction (O = 40 %) | 0.865 | 0.92 | 0.833 | 0.865 | 0.849 | 0.675 |
| | Sampled Uniform distribution (O = 40 %) | 0.85 | 0.868 | 0.842 | 0.85 | 0.846 | 0.425 |
| Pulmonary hypertension | Original | 0.638 | 0.121 | 0.633 | 0.638 | 0.635 | 0.854 |
| | Low-rank prediction (O = 60 %) | 0.681 | 0.118 | 0.645 | 0.681 | 0.659 | 0.897 |
| | Sampled Uniform distribution (O = 60 %) | 0.267 | 0.372 | 0.213 | 0.267 | 0.218 | 0.424 |
| Pancreatic ductal adenocarcinoma | Original | 0.782 | 0.218 | 0.784 | 0.782 | 0.782 | 0.886 |
| | Low-rank prediction (O = 50 %) | 0.821 | 0.179 | 0.821 | 0.821 | 0.82 | 0.905 |
| | Sampled Uniform distribution (O = 50 %) | 0.397 | 0.603 | 0.389 | 0.397 | 0.385 | 0.417 |
| Psoriasis | Original | 0.912 | 0.088 | 0.913 | 0.912 | 0.912 | 0.96 |
| | Low-rank prediction (O = 40 %) | 0.912 | 0.088 | 0.912 | 0.912 | 0.912 | 0.956 |
| | Sampled Uniform distribution (O = 40 %) | 0.641 | 0.359 | 0.641 | 0.641 | 0.641 | 0.648 |

  1. Datasets were condensed to 100 randomly selected gene attributes. Bayesian networks were learned using a bottom-up search method known as the K2 algorithm and evaluated by 10-fold cross-validation. The predicted datasets were assessed by comparing their classification results with those obtained from datasets in which the unknown values were sampled from a set uniform distribution instead of being recovered by low-rank completion; the fraction of known values was the same in both cases. Notably, the performance of the low-rank recovered datasets closely matched that of the original datasets
  2. Abbreviations: O, observability; AUROC, area under the receiver operating characteristic curve
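
The uniform-sampling baseline described in the table notes can be illustrated with a minimal sketch. It keeps a fraction O of the entries of an expression matrix as known and fills the rest with draws from a uniform distribution. The function names `make_observation_mask` and `fill_uniform` are hypothetical, and two details are assumptions not stated in the source: the known entries are chosen uniformly at random, and the uniform range is taken from the minimum and maximum of the known values.

```python
import random
from typing import List

Matrix = List[List[float]]
Mask = List[List[bool]]


def make_observation_mask(n_rows: int, n_cols: int,
                          observability: float,
                          rng: random.Random) -> Mask:
    """Mark round(observability * n_rows * n_cols) entries as known.

    Assumption: known entries are chosen uniformly at random
    over the whole matrix.
    """
    total = n_rows * n_cols
    n_known = round(observability * total)
    known = set(rng.sample(range(total), n_known))
    return [[(r * n_cols + c) in known for c in range(n_cols)]
            for r in range(n_rows)]


def fill_uniform(data: Matrix, mask: Mask, rng: random.Random) -> Matrix:
    """Keep known entries and replace hidden ones with uniform draws.

    Assumption: the uniform range is [min, max] of the known values;
    the source only says "a set uniform distribution".
    """
    known_values = [v for row, mrow in zip(data, mask)
                    for v, m in zip(row, mrow) if m]
    if not known_values:
        raise ValueError("at least one entry must be observed")
    lo, hi = min(known_values), max(known_values)
    return [[v if m else rng.uniform(lo, hi) for v, m in zip(row, mrow)]
            for row, mrow in zip(data, mask)]


# Toy example: 4 samples x 5 genes at O = 60 %.
rng = random.Random(0)
toy = [[float(r * 5 + c) for c in range(5)] for r in range(4)]
toy_mask = make_observation_mask(4, 5, 0.6, rng)
toy_baseline = fill_uniform(toy, toy_mask, rng)
```

A low-rank completion method would receive the same mask and the same known entries, so that, as the notes require, both reconstructed datasets share the same fraction of known values before classification.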