Table 1 Benchmark datasets list

From: A benchmark study of sequence alignment methods for protein clustering

Reference name  Dataset ID (a)  Number of sequences (b)  Number of classes (c)  Average length (residues)
Reference1      RV11            236                      38                     301.178
Reference1      RV12            382                      44                     392.6885
Reference2      RV20            1706                     41                     384.3581
Reference3      RV30            1723                     30                     387.9745
Reference4      RV40            1113                     49                     480.0952
Reference5      RV50            443                      16                     516.6546
Reference9      RV911           423                      29                     701.5792
Reference9      RV912           228                      28                     454.0351
(a) Dataset IDs are abbreviations for the datasets and are used to refer to the corresponding datasets in this paper.
(b) Number of sequences is the number of sequences that carry exactly one class label in the raw datasets.
(c) Number of classes is the number of predefined protein clusters in each benchmark dataset.
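The footnotes describe how the per-dataset counts are defined: multi-labelled sequences are excluded, and the remaining sequences are grouped into predefined classes. The sketch below shows one way such statistics could be computed. It is an illustration under stated assumptions, not the authors' pipeline. The input format (a dict mapping a sequence ID to a `(sequence, labels)` pair) and the function name `dataset_stats` are hypothetical. Counting classes from the retained sequences and averaging length over them are also assumptions.

```python
from statistics import mean


def dataset_stats(records):
    """Compute table-style statistics for one benchmark dataset.

    Hypothetical input: dict of sequence ID -> (sequence string, set of class labels).
    Assumption: only sequences with exactly one class label are kept (footnote b),
    and classes and average length are computed over those retained sequences.
    """
    # Keep single-label sequences and unwrap their one label.
    single = {
        sid: (seq, next(iter(labels)))
        for sid, (seq, labels) in records.items()
        if len(labels) == 1
    }
    n_sequences = len(single)
    # Distinct class labels among the retained sequences.
    n_classes = len({label for _, label in single.values()})
    # Mean sequence length in residues (0.0 for an empty dataset).
    avg_length = mean(len(seq) for seq, _ in single.values()) if single else 0.0
    return n_sequences, n_classes, avg_length
```

For example, given four toy sequences where one carries two labels, the multi-labelled sequence is dropped. The function then reports three sequences, two classes and an average length of 19/3 residues.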