Faster and more accurate pathogenic combination predictions with VarCoPP2.0

Versbraegen, Nassim; Gravel, Barbara; Nachtegael, Charlotte; Renaux, Alexandre; Verkinderen, Emma; Nowé, Ann; Lenaerts, Tom; Papadimitriou, Sofia

doi:10.1186/s12859-023-05291-3

BMC Bioinformatics

Table 4 DOME table with the essential information needed to assess the machine learning approach [22]

VarCoPP2.0	Version	2.0
Data	Provenance	OLIDA [21] and the 1000 Genomes Project [12]. Positive-to-negative ratio of 1:500 in the training data
	Dataset splits	301 positive and 150,500 negative instances for the training set; 53 positive and 10,000 negative instances for the validation set. Training uses stratified LOGO cross-validation
	Redundancy between data splits	No overlap
	Availability of data	Yes: olida.ibsquare.be (new curated data will be added) and www.internationalgenome.org
Optimization	Algorithm	Balanced Random Forest
	Meta-predictions	Yes: the CADD and ISPP features stem from predictive models
	Data encoding	Global features
	Parameters	400 decision trees in the random forest
	Features	15 features, selected with a wrapper approach on the training data only, using the mean F1 score of 5-fold cross-validation
	Fitting	Decision trees are pruned to avoid overfitting
	Regularization	No
	Availability of configuration	Yes: https://github.com/oligogenic/VarCoPP2.0
Model	Interpretability	Transparent model, 400 decision trees
	Output	Probability, thresholded to classification
	Execution time	10,000 samples in 0.2 seconds
	Availability of software	ORVAL: https://orval.ibsquare.be and GitHub: https://github.com/oligogenic/VarCoPP2.0
Evaluation	Evaluation method	Both stratified LOGO cross-validation and independent validation data
	Performance measures	Average precision score, Precision, Recall, Specificity, F1 and Geometric mean
	Comparison	Confusion matrix and the aforementioned performance measures, compared against the previous version of the model and against that model retrained on the new data
	Confidence	Performance differences apparent
	Availability of evaluation	Yes: GitHub: https://github.com/oligogenic/VarCoPP2.0
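
The Output and Performance measures rows can be illustrated with a minimal sketch that thresholds predicted probabilities into classes and derives precision, recall, specificity, F1 and the geometric mean from the resulting confusion matrix. This is not the authors' evaluation code: the function names, the toy data and the 0.5 threshold are hypothetical, and the geometric mean is assumed here to be the square root of recall times specificity, as is common for imbalanced data. The average precision score is left out for brevity.

```python
from math import sqrt


def confusion_counts(probabilities, labels, threshold=0.5):
    """Threshold probabilities and return (TP, FP, TN, FN).

    Label 1 marks a positive (pathogenic) instance, 0 a negative one.
    The default threshold is a placeholder, not the value used by VarCoPP2.0.
    """
    tp = fp = tn = fn = 0
    for probability, label in zip(probabilities, labels):
        predicted = 1 if probability >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def safe_div(numerator, denominator):
    """Return 0.0 instead of raising when a denominator is zero."""
    return numerator / denominator if denominator else 0.0


def performance_measures(tp, fp, tn, fn):
    """Compute the confusion-matrix measures named in the table."""
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    specificity = safe_div(tn, tn + fp)
    f1 = safe_div(2 * precision * recall, precision + recall)
    # Assumed definition: geometric mean of recall and specificity.
    g_mean = sqrt(recall * specificity)
    return {
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
        "g_mean": g_mean,
    }


# Toy data, invented for illustration only.
probs = [0.9, 0.8, 0.3, 0.6, 0.2, 0.1, 0.4, 0.05]
labels = [1, 1, 1, 0, 0, 0, 0, 0]

counts = confusion_counts(probs, labels, threshold=0.5)
measures = performance_measures(*counts)
print(counts)
print(measures)
```

With a class ratio as skewed as 1:500, specificity and the geometric mean are informative alongside precision and recall, because a classifier that labels everything negative would still score a high accuracy.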


ISSN: 1471-2105
