
Table 1 Parameters in the Snakemake configuration file

From: Hypercluster: a flexible tool for parallelized unsupervised clustering optimization

| No. | config.yml parameter | Explanation | Example |
| --- | --- | --- | --- |
| 1 | input_data_folder | Path to the folder in which input data can be found | /input_data |
| 2 | input_data_files | List of prefixes of data files | ['input_data1', 'input_data2'] |
| 3 | gold_standard_file | File name of the gold standard file; must be in input_data_folder | {'input_data': 'gold_standard_file.txt'} |
| 4 | read_csv_kwargs | pandas.read_csv keyword arguments for input data | {'test_input': {'index_col':[0]}} |
| 5 | output_folder | Path to the folder into which results should be written | /results |
| 6 | intermediates_folder | Name of the subfolder for intermediate results | clustering_intermediates |
| 7 | clustering_results | Name of the subfolder for aggregated results | clustering |
| 8 | clusterer_kwargs | Additional arguments to pass to clusterers | {'KMeans': {'random_state': 8}} |
| 9 | generate_parameters_addtl_kwargs | Additional keyword arguments for the hypercluster.AutoClusterer class | {'KMeans': {'random_search': true}} |
| 10 | evaluations | Names of evaluation metrics to use | ['silhouette_score', 'number_clustered'] |
| 11 | eval_kwargs | Additional kwargs per evaluation metric function | {'silhouette_score': {'random_state': 8}} |
| 12 | metric_to_choose_best | Which metric to maximize to choose the labels | silhouette_score |
| 13 | metric_to_compare_labels | Which metric to use to compare label results to each other | adjusted_rand_score |
| 14 | compare_samples | Whether to make a table and figure with counts of how often each pair of samples is in the same cluster | "true" |
| 15 | output_kwargs | pandas.to_csv and pandas.read_csv keyword arguments for output tables | {'evaluations': {'index_col':[0]}, 'labels': {'index_col':[0]}} |
| 16 | heatmap_kwargs | Arguments for seaborn.heatmap for pairwise visualizations | {'vmin':-2, 'vmax':2} |
| 17 | optimization_parameters | Which algorithms and corresponding hyperparameters to try | {'KMeans': {'n_clusters': [5, 6, 7] }} |
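
To show how these parameters might fit together, the following is a minimal Python sketch that collects the example values from the table into one dictionary and runs a few consistency checks before the values are written to config.yml. It is not part of hypercluster: the `check_config` helper is hypothetical, the file prefix `input_data1` is used throughout for illustration, and the checks rest on assumptions the table suggests but does not state (that per-file dictionaries are keyed by the prefixes in `input_data_files`, that per-algorithm dictionaries are keyed by the algorithms in `optimization_parameters`, and that `metric_to_choose_best` is one of the `evaluations`).

```python
# Illustrative only: values are taken from the Example column of the table,
# with a consistent hypothetical file prefix ("input_data1").
config = {
    "input_data_folder": "/input_data",
    "input_data_files": ["input_data1", "input_data2"],
    "gold_standard_file": {"input_data1": "gold_standard_file.txt"},
    "read_csv_kwargs": {"input_data1": {"index_col": [0]}},
    "output_folder": "/results",
    "intermediates_folder": "clustering_intermediates",
    "clustering_results": "clustering",
    "clusterer_kwargs": {"KMeans": {"random_state": 8}},
    "generate_parameters_addtl_kwargs": {"KMeans": {"random_search": True}},
    "evaluations": ["silhouette_score", "number_clustered"],
    "eval_kwargs": {"silhouette_score": {"random_state": 8}},
    "metric_to_choose_best": "silhouette_score",
    "metric_to_compare_labels": "adjusted_rand_score",
    "compare_samples": "true",
    "output_kwargs": {
        "evaluations": {"index_col": [0]},
        "labels": {"index_col": [0]},
    },
    "heatmap_kwargs": {"vmin": -2, "vmax": 2},
    "optimization_parameters": {"KMeans": {"n_clusters": [5, 6, 7]}},
}


def check_config(cfg):
    """Hypothetical helper: return a list of problems (empty if none found).

    The rules below are assumptions about how the parameters relate,
    not behavior documented for hypercluster itself.
    """
    problems = []
    file_prefixes = set(cfg["input_data_files"])
    algorithms = set(cfg["optimization_parameters"])
    metrics = set(cfg["evaluations"])

    # (parameter, allowed keys, parameter the allowed keys come from)
    keyed_by = [
        ("gold_standard_file", file_prefixes, "input_data_files"),
        ("read_csv_kwargs", file_prefixes, "input_data_files"),
        ("clusterer_kwargs", algorithms, "optimization_parameters"),
        ("generate_parameters_addtl_kwargs", algorithms, "optimization_parameters"),
        ("eval_kwargs", metrics, "evaluations"),
    ]
    for param, allowed, source in keyed_by:
        # Report every key that does not appear in the list it should refer to.
        for key in sorted(set(cfg.get(param, {})) - allowed):
            problems.append(f"{param}: key {key!r} not listed in {source}")

    if cfg["metric_to_choose_best"] not in metrics:
        problems.append("metric_to_choose_best is not listed in evaluations")
    return problems


print(check_config(config))
```

A check of this kind would flag, for example, a `read_csv_kwargs` entry keyed by `test_input` when `input_data_files` lists only `input_data1` and `input_data2`. Whether such a mismatch is an error in practice depends on how the workflow resolves these keys.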