Fig. 3
From: ProteinNet: a standardized data set for machine learning of protein structure
Alignment size as a function of ProteinNet subset. Box and whisker charts depict the distribution of number of sequences per MSA for ProteinNet training (30% thinning), validation, and test sets. Individual data points for training sets are not shown due to their large size.