Table 6 Settings to simulate real datasets

From: G-bic: generating synthetic benchmarks for biclustering

| Parameters | Gene expression | Recommendation systems | Text mining | Clinical data | Spatio-temporal data |
|---|---|---|---|---|---|
| Dataset properties | | | | | |
| Number of rows | 10000 | 200000 | 30000 | 50000 | 30 |
| Number of columns | 100 | 30000 | 20000 | 8000 | 150 |
| Heterogeneous? | No | No | No | Yes | No |
| Properties | Background with 10% noise | Background with 95.5% missing values | Background with 99.8% missing values | Background with 99.8% missing values | Background with 50% noise and 20% errors |
| Bicluster properties | | | | | |
| Number of biclusters | 500 | 3000 | 70 | 30 | 20 |
| Rows structure | U(80,400) | U(30,70) | U(1000,10000) | U(20,100) | U(2,4) |
| Columns structure | U(20,40) | U(3,7) | U(600,6000) | U(5,15) | U(7,10) |
| Contiguity | No | No | No | No | Yes |
| Biclustering patterns | Additive and Order Preserving | Order Preserving | Constant and Order Preserving | Order Preserving | Additive and Multiplicative |
| Overlapping | 10% of biclusters with additive overlap | None | None | None | None |
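
The size-related settings in Table 6 can be written down as plain data and used to draw bicluster shapes. The sketch below is illustrative only and is not G-bic's API: the names `Setting`, `SETTINGS`, and `sample_bicluster_shapes` are hypothetical, and it assumes that U(a,b) denotes a uniform distribution over the integers from a to b inclusive, which the table itself does not state.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Setting:
    """Size-related parameters for one column of Table 6 (hypothetical container)."""
    n_rows: int                 # number of rows in the dataset
    n_cols: int                 # number of columns in the dataset
    n_biclusters: int           # number of planted biclusters
    bic_rows: Tuple[int, int]   # (a, b) of the "Rows structure" U(a,b)
    bic_cols: Tuple[int, int]   # (a, b) of the "Columns structure" U(a,b)


# Values copied from Table 6; the dictionary keys are illustrative names.
SETTINGS = {
    "gene_expression": Setting(10000, 100, 500, (80, 400), (20, 40)),
    "recommendation_systems": Setting(200000, 30000, 3000, (30, 70), (3, 7)),
    "text_mining": Setting(30000, 20000, 70, (1000, 10000), (600, 6000)),
    "clinical_data": Setting(50000, 8000, 30, (20, 100), (5, 15)),
    "spatio_temporal": Setting(30, 150, 20, (2, 4), (7, 10)),
}


def sample_bicluster_shapes(setting: Setting, seed: int = 0) -> List[Tuple[int, int]]:
    """Draw (rows, columns) for each bicluster.

    Assumption: U(a,b) is treated as a discrete uniform distribution on [a, b],
    endpoints included.
    """
    rng = random.Random(seed)
    return [
        (rng.randint(*setting.bic_rows), rng.randint(*setting.bic_cols))
        for _ in range(setting.n_biclusters)
    ]


if __name__ == "__main__":
    shapes = sample_bicluster_shapes(SETTINGS["spatio_temporal"])
    print(len(shapes), shapes[:3])
```

Only the bicluster dimensions are sampled here; noise, missing values, patterns, contiguity, and overlap from the table are left out of the sketch.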