From: CarSite-II: an integrated classification algorithm for identifying carbonylated sites based on K-means similarity-based undersampling and synthetic minority oversampling techniques
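Carbonylation type and number of samples

| Dataset | Subset | K (lysine) | P (proline) | R (arginine) | T (threonine) |
|---|---|---|---|---|---|
| Training dataset | Positive | 618 | 162 | 204 | 191 |
| | Negative | 26,995 | 22,418 | 22,849 | 24,271 |
| Independent test dataset | Positive | 117 | 16 | 54 | 24 |
| | Negative | 7,439 | 5,318 | 5,966 | 6,507 |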
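The counts in the table show a severe class imbalance, and that imbalance is what the undersampling and oversampling steps named in the title are meant to address. The sketch below is a minimal illustration that computes the negative-to-positive ratio for each subset and the per-dataset totals. It does not reproduce the CarSite-II resampling pipeline. The names `COUNTS`, `imbalance_ratios` and `totals` are hypothetical helpers introduced only for this example.

```python
# Hypothetical helper data: sample counts transcribed from the table above,
# stored as (positive, negative) for each carbonylated residue type.
COUNTS = {
    "Training": {"K": (618, 26995), "P": (162, 22418), "R": (204, 22849), "T": (191, 24271)},
    "Independent test": {"K": (117, 7439), "P": (16, 5318), "R": (54, 5966), "T": (24, 6507)},
}


def imbalance_ratios(counts):
    """Return {(dataset, residue): negatives / positives} for every subset."""
    ratios = {}
    for dataset, subsets in counts.items():
        for residue, (pos, neg) in subsets.items():
            ratios[(dataset, residue)] = neg / pos
    return ratios


def totals(counts):
    """Sum positives and negatives over all residue types in each dataset."""
    return {
        dataset: (
            sum(pos for pos, _ in subsets.values()),
            sum(neg for _, neg in subsets.values()),
        )
        for dataset, subsets in counts.items()
    }


if __name__ == "__main__":
    for (dataset, residue), ratio in imbalance_ratios(COUNTS).items():
        print(f"{dataset:>16} {residue}: {ratio:7.2f} negatives per positive")
    print(totals(COUNTS))
```

With these counts, the ratio ranges from roughly 44:1 (training, K) to about 332:1 (independent test, P). The training dataset totals 1,175 positive and 96,533 negative samples.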