Fig. 1 | BMC Bioinformatics

Fig. 1

From: Accuracy of a machine learning method based on structural and locational information from AlphaFold2 for predicting the pathogenicity of TARDBP and FUS gene variants in ALS

Workflow of MOVA. The x, y, and z coordinates and the pLDDT score of the amino acid residue at each substitution site, taken from the PDB file in the AlphaFold2 database, together with the ΔBLOSUM62 of the substituted amino acid residue, were used as features for training a random forest, XGBoost, or support vector machine (SVM) model (A). The samples were randomly divided into five subsets so as to avoid bias in the objective variable. The model was built with one subset as the test cases and the remaining four as the training cases, and its predictions were calculated and validated on the test data. This was repeated so that each of the five subsets served once as the test cases (B). The final model was built 30 times using all variants in the dataset as training data. Each model predicted the probability that each possible variant of the gene is pathogenic, and the average of these predictions was used as the MOVA value (C)
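
The splitting and averaging steps in panels B and C can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`stratified_folds`, `cross_validation_splits`, `averaged_score`) and the `train_fn` interface are hypothetical, and the assumption that the 30 models differ only in their random seed is ours, since the caption does not say how the repeats differ. Each sample is assumed to be a feature tuple (x, y, z, pLDDT, ΔBLOSUM62) with a binary pathogenicity label.

```python
import random
from collections import defaultdict
from statistics import mean


def stratified_folds(labels, k=5, seed=0):
    """Split sample indices into k folds with a similar class ratio in each fold."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)                  # random assignment within each class
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)        # deal indices out round-robin
    return folds


def cross_validation_splits(labels, k=5, seed=0):
    """Yield (train_indices, test_indices) so that every fold is the test set once."""
    folds = stratified_folds(labels, k, seed)
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def averaged_score(train_fn, features, labels, candidates, n_models=30):
    """Train n_models models on all data and average their predicted probabilities.

    train_fn(features, labels, seed) is a hypothetical trainer that returns a
    callable mapping one feature tuple to a pathogenicity probability.
    Varying only the seed between repeats is an assumption of this sketch.
    """
    models = [train_fn(features, labels, seed) for seed in range(n_models)]
    return [mean(model(x) for model in models) for x in candidates]
```

In practice `train_fn` would wrap a random forest, XGBoost, or SVM classifier. The split is done per class so that each test subset contains both pathogenic and benign variants in roughly the same proportion as the full dataset.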