Determining significance of pairwise co-occurrences of events in bursty sequences

BMC Bioinformatics

Table 1 Number of significant pairs in synthetic data

Dataset	Number of significant pairs (1)
	W			C			D
	UL	FL	FL(r)	UL	FL	FL(r)	UL	FL	FL(r)
1. Uncorrelated	39	0	0	54	1	0	54	1	0
2. Correlated	33	1	2	54	2	2	53	2	2
3. Directed	38	1	1	54	1	1	54	2	1.5
4. Distinct correlated	38	1	1	54	1	1	54	1	1
5. Distinct directed	39	1	1	54	1	1	54	1	1
	Number of randomizations where (a, b) is found significant (2)
1. Uncorrelated	85	1	0	97	0	0	90:92	1:1	0:1
2. Correlated	100	100	100	100	100	100	100:100	100:100	100:100
3. Directed	100	93	99	100	88	99	100:98	100:1	100:2
4. Distinct correlated	94	34	35	97	33	34	95:94	35:33	36:33
5. Distinct directed	93	29	31	99	5	17	96:97	31:5	33:0

(1) Median number of pairs of event types, over 100 randomly generated sequences, whose co-occurrence score is significant. Results are shown for five types of synthetic datasets. (2) The number of randomizations in which the planted pair (a, b) is found significant. UL, FL, and FL(r) correspond to the null models, and W, C, and D to the window, undirected, and directed co-occurrence scores. For the D score, the two values s₁:s₂ denote the number of randomizations in which (a → b) and (b → a), respectively, are found significant. The empirical p-values are based on 1000 randomizations. Results are shown for a p-value threshold of 0.01, with 10 event types, w = 50, burst lengths in [100, 200], sequence length 100000, parameter values p₁ = 0.01 and p₂ = 0.1, and 50 bursts per sequence. For datasets 4 and 5, the number of bursts containing correlations was chosen randomly from [5, 25].
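The caption reports empirical p-values based on 1000 randomizations and a significance threshold of 0.01. A minimal sketch of how such a p-value might be computed from a set of null-model scores follows. The function name `empirical_p_value`, the add-one correction, and the one-sided test on large scores are assumptions made for illustration; the caption does not specify the exact estimator. The Gaussian null scores are placeholders and do not correspond to the UL, FL, or FL(r) null models.

```python
import random


def empirical_p_value(observed, null_scores):
    """One-sided empirical p-value with add-one correction (an assumed estimator).

    Counts the null scores that are at least as large as the observed score,
    then adds one to numerator and denominator so the estimate is never zero.
    """
    exceed = sum(1 for s in null_scores if s >= observed)
    return (exceed + 1) / (len(null_scores) + 1)


# Placeholder null distribution: 1000 scores standing in for the
# co-occurrence scores obtained from 1000 randomized sequences.
rng = random.Random(0)
null_scores = [rng.gauss(0.0, 1.0) for _ in range(1000)]

observed_score = 3.0   # hypothetical score of one pair of event types
threshold = 0.01       # p-value threshold stated in the caption

p = empirical_p_value(observed_score, null_scores)
is_significant = p <= threshold
print(p, is_significant)
```

Under this estimator, 1000 randomizations give a smallest attainable p-value of 1/1001, which is below the 0.01 threshold, so a pair can be declared significant only if at most 9 of the 1000 null scores reach the observed score.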

ISSN: 1471-2105