Table 1 Parameters in the Snakemake configuration file

From: Hypercluster: a flexible tool for parallelized unsupervised clustering optimization

| # | config.yml parameter | Explanation | Example |
|---|---|---|---|
| 1 | `input_data_folder` | Path to the folder containing the input data | `/input_data` |
| 2 | `input_data_files` | List of prefixes of the data files | `['input_data1', 'input_data2']` |
| 3 | `gold_standard_file` | Name of the gold standard file, which must be in `input_data_folder` | `{'input_data': 'gold_standard_file.txt'}` |
| 4 | `read_csv_kwargs` | `pandas.read_csv` keyword arguments for the input data | `{'test_input': {'index_col':[0]}}` |
| 5 | `output_folder` | Path to the folder into which results should be written | `/results` |
| 6 | `intermediates_folder` | Name of the subfolder for intermediate results | `clustering_intermediates` |
| 7 | `clustering_results` | Name of the subfolder for aggregated results | `clustering` |
| 8 | `clusterer_kwargs` | Additional arguments to pass to the clusterers | `{'KMeans': {'random_state': 8}}` |
| 9 | `generate_parameters_addtl_kwargs` | Additional keyword arguments for the `hypercluster.AutoClusterer` class | `{'KMeans': {'random_search': true}}` |
| 10 | `evaluations` | Names of the evaluation metrics to use | `['silhouette_score', 'number_clustered']` |
| 11 | `eval_kwargs` | Additional keyword arguments for each evaluation metric function | `{'silhouette_score': {'random_state': 8}}` |
| 12 | `metric_to_choose_best` | Metric to maximize when choosing the labels | `silhouette_score` |
| 13 | `metric_to_compare_labels` | Metric used to compare label results to each other | `adjusted_rand_score` |
| 14 | `compare_samples` | Whether to make a table and figure with counts of how often each pair of samples is in the same cluster | `"true"` |
| 15 | `output_kwargs` | `pandas.DataFrame.to_csv` and `pandas.read_csv` keyword arguments for the output tables | `{'evaluations': {'index_col':[0]}, 'labels': {'index_col':[0]}}` |
| 16 | `heatmap_kwargs` | Arguments for `seaborn.heatmap` for pairwise visualizations | `{'vmin':-2, 'vmax':2}` |
| 17 | `optimization_parameters` | Algorithms and corresponding hyperparameters to try | `{'KMeans': {'n_clusters': [5, 6, 7]}}` |
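
To show how the parameters in Table 1 fit together, the following is a minimal sketch that collects the table's example values into one Python dictionary and runs a few consistency checks on it. The helper `check_config` and its rules are hypothetical illustrations and are not part of hypercluster or Snakemake. Keying `gold_standard_file` and `read_csv_kwargs` by an entry of `input_data_files` is likewise an assumption made here for consistency.

```python
# The 17 parameter names listed in Table 1.
PARAMETERS = [
    "input_data_folder", "input_data_files", "gold_standard_file",
    "read_csv_kwargs", "output_folder", "intermediates_folder",
    "clustering_results", "clusterer_kwargs",
    "generate_parameters_addtl_kwargs", "evaluations", "eval_kwargs",
    "metric_to_choose_best", "metric_to_compare_labels", "compare_samples",
    "output_kwargs", "heatmap_kwargs", "optimization_parameters",
]

# Example values taken from the Example column of Table 1.
# Assumption: per-input dictionaries are keyed by an input file prefix.
config = {
    "input_data_folder": "/input_data",
    "input_data_files": ["input_data1", "input_data2"],
    "gold_standard_file": {"input_data1": "gold_standard_file.txt"},
    "read_csv_kwargs": {"input_data1": {"index_col": [0]}},
    "output_folder": "/results",
    "intermediates_folder": "clustering_intermediates",
    "clustering_results": "clustering",
    "clusterer_kwargs": {"KMeans": {"random_state": 8}},
    "generate_parameters_addtl_kwargs": {"KMeans": {"random_search": True}},
    "evaluations": ["silhouette_score", "number_clustered"],
    "eval_kwargs": {"silhouette_score": {"random_state": 8}},
    "metric_to_choose_best": "silhouette_score",
    "metric_to_compare_labels": "adjusted_rand_score",
    "compare_samples": "true",  # quoted string, as shown in the table
    "output_kwargs": {
        "evaluations": {"index_col": [0]},
        "labels": {"index_col": [0]},
    },
    "heatmap_kwargs": {"vmin": -2, "vmax": 2},
    "optimization_parameters": {"KMeans": {"n_clusters": [5, 6, 7]}},
}


def check_config(cfg):
    """Hypothetical sanity checks; returns a list of problem descriptions."""
    problems = [f"missing parameter: {name}"
                for name in PARAMETERS if name not in cfg]
    if problems:
        return problems  # the cross-checks below need every key present

    evaluations = cfg["evaluations"]
    # The metric used to pick the best labels should be one that is computed.
    if cfg["metric_to_choose_best"] not in evaluations:
        problems.append("metric_to_choose_best is not listed in evaluations")
    # Extra kwargs only make sense for metrics that are evaluated.
    for metric in cfg["eval_kwargs"]:
        if metric not in evaluations:
            problems.append(f"eval_kwargs given for unlisted metric: {metric}")
    # Extra clusterer kwargs only make sense for algorithms that are tried.
    for algorithm in cfg["clusterer_kwargs"]:
        if algorithm not in cfg["optimization_parameters"]:
            problems.append(
                f"clusterer_kwargs given for untried algorithm: {algorithm}")
    return problems


print(check_config(config))
```

An empty list means the sketch found no inconsistencies. In practice the same values would be written to config.yml rather than held in a Python dictionary.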