Table 7 Data distribution

From: A review and comparative study of cancer detection using machine learning: SBERT and SimCSE application

Gene    Total number of normal sequences    Total number of tumor sequences    Total

Before SMOTE: chosen sampling strategy = “not majority”
APC     305214                              553563                             858777
ATM     545113                              610309                             1155422

After SMOTE: chosen sampling strategy = “not majority”
APC     553563                              553563                             1107126
ATM     610309                              610309                             1220618
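
The "after" rows can be reproduced from the "before" rows with simple arithmetic. The sketch below assumes that “not majority” has the meaning it carries in the imbalanced-learn library, where every class except the largest is oversampled until it matches the largest class; the table does not name the library used. The helper `balance_not_majority` is a hypothetical name introduced for illustration. It only recomputes class counts and does not generate synthetic sequences.

```python
# Class counts before SMOTE, copied from Table 7.
before = {
    "APC": {"normal": 305214, "tumor": 553563},
    "ATM": {"normal": 545113, "tumor": 610309},
}


def balance_not_majority(counts):
    """Hypothetical helper: raise every class to the size of the largest class.

    This mirrors the assumed effect of sampling_strategy="not majority" on
    class counts only; the majority class itself is left unchanged.
    """
    majority = max(counts.values())
    return {label: majority for label in counts}


# Expected class counts after oversampling, per gene.
after = {gene: balance_not_majority(counts) for gene, counts in before.items()}

# Number of synthetic minority sequences that oversampling would have to add.
synthetic = {
    gene: sum(after[gene].values()) - sum(before[gene].values())
    for gene in before
}

for gene in before:
    print(gene, after[gene], "total:", sum(after[gene].values()),
          "synthetic:", synthetic[gene])
```

Under this assumption the computed totals agree with the "After SMOTE" rows, and the gap between the two totals for each gene is the number of synthetic normal sequences added.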