Table 4 Default and dynamic/data-driven parameterizations of BicPAMS

From: BicPAMS: software for biological data analysis with pattern-based biclustering

 

Parameter

Value

Notes

Major parameters

P3 Coherency assumption

Constant assumption

The default assumption considers a (possibly noise-tolerant) constant pattern on a subset of rows/columns/nodes. It provides an adequate degree of flexibility (greater than that of biclusters with differential/dense values or constant values overall) and is well suited for initial analyses.

 

P4 Coherency strength

\(|\mathcal{L}|=5\) or \(\delta=\bar{A}/5\)

Adequate sensitivity to different levels of expression (the symbol sets {-2,-1}, {0} and {1,2} correspond to down-regulation, preserved expression and up-regulation, respectively) or association strength. Multiple symbols can be assigned to a single real-valued element to guarantee robustness to noise.

 

P5 Quality

80%

Guarantees an adequate tolerance to noise, allowing biclusters to have up to 20% of noisy values.

 

P15 Pattern representation

Closed

Closed pattern representations enable the discovery of maximal biclusters (biclusters that cannot be extended without removing rows or columns).

 

P16 Orientation

Patterns on rows

In accordance with Def. 2. For expression data where rows correspond to genes, a bicluster with coherency across rows is defined by a group of genes with the same pattern along a subset of conditions. When rows correspond to conditions, a less-trivial bicluster is given by a group of genes with preserved expression spanning a subset of conditions.

Mapping options

P6 Normalization

Row

Normalization of values per biological entity or sample.

 

P7 Discretization

Gaussian

Cut-off points are taken from a fitted Gaussian curve to minimize imbalanced distributions of items.

 

P8 Noise handler

None

By default, multi-item assignments are deactivated for easier interpretation of results. Nevertheless, we suggest selecting multi-item assignments for greater robustness to discretization drawbacks and noise.

 

P9 Symmetries

Dynamic

Symmetries are dynamically selected if the input data has negative values. This option can be deactivated to force the biclustering task not to distinguish positive from negative values.

 

P10 Missings handler

Remove

The Remove option is suggested since Quality (P5) is already in place to accommodate missing values within biclusters. Nevertheless, the Replace option is suggested for data with a considerable amount of missing values.

 

P11 Remove uninformative elements

None

By default, no items are removed. Alternative options should only be selected when there is prior knowledge of uninformative elements, such as non-differential expression or loose interactions.

Mining options

P12 Stopping criteria

50 biclusters

A minimum number of 50 biclusters (before postprocessing) is suggested by default since the combination of this option with the quality and dissimilarity criteria leads to a compact set of dissimilar biclusters. This number (as well as the number of iterations) can be increased to guarantee more complete solutions for complex or large datasets.

 

P13 Min. #columns

4

Although maximal biclusters have at least 4 columns by default, this number should be increased for datasets where biclusters have a significantly higher number of columns.

 

P14 #Iterations

2

Guarantees the removal of small and highly coherent regions in the dataset (after the 1st iteration) to enable the discovery of less-trivial biclusters. This number can be increased to promote a more even distribution of biclusters across the regions of the input data.

 

P17 Pattern miner

Dynamic

Based on empirical evidence, CharmDiff is suggested for closed patterns, CharmMFI for maximal patterns, and F2G for simple patterns. When order-preserving coherency is selected, IndexSpan is suggested by default.

 

P18 Scalability

Dynamic

Option activated in the presence of very large datasets (>20 million elements under a constant assumption and >1 million elements for the remaining coherency assumptions).

Closing

P19 Merging

Heuristic

Guarantees an efficient yet quasi-exact postprocessing.

 

P20 Filtering

40% dissimilar elements

Guarantees an adequate level of dissimilarity. Biclusters sharing more than 60% of their elements with a larger bicluster are removed.
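
The default filtering rule (P20) can be illustrated with a minimal Python sketch. This is not the BicPAMS implementation: the representation of a bicluster as a pair of row and column index sets, the greedy largest-first ordering, and the names `size`, `shared_fraction` and `filter_biclusters` are all assumptions made for illustration. The sketch only encodes the rule stated above, namely that a bicluster is dropped when more than 60% of its elements are shared with a larger bicluster.

```python
from typing import FrozenSet, Iterable, List, Tuple

# Hypothetical representation: (row indices, column indices).
Bicluster = Tuple[FrozenSet[int], FrozenSet[int]]


def bic(rows: Iterable[int], cols: Iterable[int]) -> Bicluster:
    """Build a bicluster from row and column indices."""
    return (frozenset(rows), frozenset(cols))


def size(b: Bicluster) -> int:
    """Number of matrix elements covered by the bicluster."""
    rows, cols = b
    return len(rows) * len(cols)


def shared_fraction(small: Bicluster, large: Bicluster) -> float:
    """Fraction of the elements of `small` that also belong to `large`."""
    if size(small) == 0:
        return 0.0
    shared = len(small[0] & large[0]) * len(small[1] & large[1])
    return shared / size(small)


def filter_biclusters(
    biclusters: Iterable[Bicluster], max_shared: float = 0.60
) -> List[Bicluster]:
    """Keep a bicluster only if it shares at most `max_shared` of its
    elements with every larger bicluster that was already kept.

    Assumption: candidates are visited from largest to smallest, so
    "larger" means "earlier in this ordering".
    """
    kept: List[Bicluster] = []
    for b in sorted(biclusters, key=size, reverse=True):
        if all(shared_fraction(b, k) <= max_shared for k in kept):
            kept.append(b)
    return kept


# Toy example with made-up index ranges.
large = bic(range(0, 10), range(0, 5))    # 50 elements
nested = bic(range(0, 6), range(0, 4))    # 24 elements, all inside `large`
partial = bic(range(8, 14), range(3, 8))  # 30 elements, 4 shared with `large`

result = filter_biclusters([nested, partial, large])
```

In this toy example `nested` is discarded because all of its elements lie inside `large`, while `partial` is retained because only 4 of its 30 elements overlap. Raising `max_shared` makes the filter more permissive and lowering it yields a more compact, more dissimilar set.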