- Research article
- Open Access

# Graph rigidity reveals well-constrained regions of chromosome conformation embeddings

- Geet Duggal¹
- Carl Kingsford¹

**13**:241

https://doi.org/10.1186/1471-2105-13-241

© Duggal and Kingsford; licensee BioMed Central Ltd. 2012

**Received:** 23 March 2012 **Accepted:** 20 August 2012 **Published:** 21 September 2012

## Abstract

### Background

Chromosome conformation capture experiments result in pairwise proximity measurements between chromosome locations in a genome, and they have been used to construct three-dimensional models of genomic regions, chromosomes, and entire genomes. These models can be used to understand long-range gene regulation, chromosome rearrangements, and the relationships between sequence and spatial location. However, it is unclear whether these pairwise distance constraints provide sufficient information to embed chromatin in three dimensions. A priori, it is possible that an infinite number of embeddings are consistent with the measurements due to a lack of constraints between some regions. It is therefore necessary to separate regions of the chromatin structure that are sufficiently constrained from regions with measurements that do not provide enough information to reconstruct the embedding.

### Results

We present a new method based on graph rigidity to assess the suitability of experiments for constructing plausible three-dimensional models of chromatin structure. Underlying this analysis is a new, efficient, and accurate algorithm for finding sufficiently constrained (rigid) collections of constraints in three dimensions, a problem for which there is no known efficient algorithm. Applying the method to four recent chromosome conformation experiments, we find that, even for stringently filtered constraints, a large rigid component spans most of the measured region. Filtering highlights higher-confidence regions, and we find that the organization of these regions depends crucially on short-range interactions.

### Conclusions

Without performing an embedding or creating a frequency-to-distance mapping, our proposed approach establishes which substructures are supported by a sufficient framework of interactions. It also establishes that interactions from recent highly filtered genome-wide chromosome conformation experiments provide an adequate set of constraints for embedding. Pre-processing experimentally observed interactions with this method before relating chromatin structure to biological phenomena will ensure that hypothesized correlations are not driven by the arbitrary choice of a particular unconstrained embedding. The software for identifying rigid components is GPL-licensed and available for download at http://cbcb.umd.edu/kingsford-group/starfish.

## Keywords

- Fission Yeast
- Distance Constraint
- Subtelomeric Region
- Rigid Component
- Chromosome Conformation Capture

## Background

Recent experiments for chromosome conformation capture [1–7] can result in graphs of hundreds of thousands of interactions between chromosome locations. Each edge in such a *chromosome conformation graph* is associated with a weight corresponding to the frequency at which the interaction occurs, and the edges in the graph can be interpreted as spatial distance constraints between chromosome locations with an appropriate mapping from interaction frequency to distance [2–4]. The information contained in chromosome conformation graphs has been used to embed entire genomes as well as portions of chromosomes at kilobase-pair resolution in three dimensions [2–5, 7, 8], and these structures provide first glimpses into how chromosomes take shape within the cell in more detail than is possible with light microscopy [9]. These experiments are also motivated by the potential to associate genome structure with long-range regulation, chromatin accessibility, and somatic copy number alterations [10]. Embedding chromosome conformation data has become a common practice, and a variety of algorithms have been developed to embed these structures in three dimensions [2, 4, 11]. These embedded structures have been used to gain biological insight into how chromatin structure relates to cancer [4] and how sequence relates to structure [7], and to study chromatin territories [5].

Our primary objective is to determine whether chromosome conformation data from recent experiments on the budding yeast, fission yeast, and human genomes provide an adequate set of constraints for confident embedding. Underconstrained, *floppy* substructures of an embedded genome can continuously deform without violating any measured distance constraints, resulting in an infinite number of embeddings consistent with the experimental data. As a pre-processing step before embedding, it is thus desirable to identify non-floppy or *rigid* substructures within the genome. It is for these structures that we have the most confidence in three-dimensional embeddings provided by optimization methods such as those described in [2–4]. Rigid regions are not rigid in the sense of being physically frozen. In fact, a rigid region can be associated with a variety of distinct embeddings consistent with the distance constraints in the conformation graph. In addition, chromosome conformation measurements at various time points may reveal other snapshots of chromatin structure, and this ensemble of embeddings can reflect the highly flexible nature of chromatin. In contrast, if a substructure of chromatin is not rigid, its flexibility is simply due to the fact that the region is underconstrained by the experimental measurements. Restricting subsequent spatial analyses to only those regions that are rigid will help to avoid artifacts created merely by the lack of sufficient constraints to select among consistent, continuously deformable alternatives.

We apply graph rigidity theory [12, 13] to determine the substructures within the genome that are sufficiently constrained to produce a non-floppy embedding in three dimensions. Two key features of our technique are that it deals directly with the chromosome conformation graph rather than relying on computing a spatial embedding and that it does not depend on the precise values of the distance constraints. These are both highly desirable properties for assessing the quality of chromosome conformation data for embedding because there is no consensus yet on a mapping from frequency to distance and computing even a single spatial embedding can be computationally very expensive for an entire genome. In order to efficiently assess rigidity on the scale required by the chromosome conformation capture data, we propose a novel, fast algorithm for identifying rigid substructures. This algorithm uses a family of “pebble game” algorithms [14–17] established for finding rigid substructures in tandem with a novel algorithm using results from rigidity theory [18]. Under the assumption that the edges in these graphs represent fixed distance constraints, the proposed algorithm guarantees that all subgraphs identified are rigid in three dimensions, although they may not be maximal.

While it could be the case that significant portions of the constraints are floppy and potentially uninformative for embedding, we find that, even for strictly filtered graphs, a large rigid subgraph spans most, but not all, of the genome. Thus, since these regions are not underconstrained, the embedded structures of most regions can be interpreted with more confidence. This procedure can be applied to any statistical filtering of chromosome conformation data, and we explore the effect of filtering both low-frequency and short-range interactions on the creation of rigidly embeddable structures. Most interactions in genome-wide chromosome conformation graphs occur either infrequently or at short genomic distances, and some of these interactions could be a result of experimental noise or arise from incidental, transient interactions. By systematically filtering interactions, we quantify the frequency cutoff at which large rigid components begin to disappear. Additionally, we find that the creation of rigid components depends crucially on short-range intra-chromosomal interactions and that the pairing or separation between rigid, subtelomeric regions of chromosomes is consistent with light microscopy data for budding and fission yeast.

## Results and discussion

### Algorithms for identifying rigid components

Rigid components correspond intuitively to substructures in the embedding that cannot be continuously deformed without violating one or more measured proximities between chromosome locations. Formally, a graph of distance constraints is a *rigid graph* or *rigid body* in three dimensions if, when the vertices are embedded in generic position in $\mathbb{R}^{3}$, there is no continuous movement of the vertices (aside from a rotation or translation of all vertices) that maintains all the distances between vertices connected by edges. If a graph is not rigid (i.e., *floppy*), infinitely many embeddings are possible, since there exists at least one continuous movement of vertices that maintains all the distance constraints. A *rigid component*, or maximally rigid subgraph, is a subset of vertices *C* for which the subgraph induced by *C* is rigid and no superset $D\supset C$ exists for which the subgraph induced by *D* is rigid. We only consider rigid components with 3 or more nodes, although isolated vertices and single edges can be trivial rigid components of size 1 and 2, respectively.

In a *bar-joint framework*, vertices represent universal *joints* and edges represent fixed-length *bars* between joints. The double-banana graph (Figure 1) is composed of two rigid components in this framework that rotate around a hinge implied by two joints in the graph. The double-banana can also be represented as a type of bar-joint framework called a *body-bar-and-hinge framework*, where rigid bodies can be connected to one another by fixed-length bars as well as by hinges that allow just one rotational degree of freedom between two rigid bodies. The double-banana is also an example of a graph that contains rigid components that share nodes, illustrating the fact that rigid components of a graph do not necessarily correspond to a partition of the vertices in the graph.

No efficient algorithm is known for identifying all rigid components in three dimensions in general bar-joint frameworks. Efficient algorithms based on the so-called “pebble games” do exist in two dimensions [14, 15] and for more restricted notions of rigidity in three dimensions [13]. Recently, it has been suggested that a variant of a pebble game algorithm designed for two-dimensional rigidity can be applied to arbitrary bar-joint frameworks in three dimensions [13], with good results for most graphs. While this approach often identifies many rigid components, it also erroneously produces components that are floppy. One such example is the double-banana graph of Figure 1. In contrast, efficient, provably correct algorithms exist to find rigid components in body-bar-and-hinge frameworks [16].

We propose an iterative procedure we call Body-Bar-and-Hinge Reduction (Algorithm 1) for more accurately finding rigid components in three dimensions. It begins by gluing together smaller rigid subgraphs and then merges them by reducing the problem to identifying rigid components in a body-bar-and-hinge framework, for which efficient algorithms exist. For graphs close to the minimally rigid threshold (3*n*−6 edges, where *n* is the number of vertices in the graph), we suggest the use of a hybrid algorithm (Algorithm 2) that combines the pebble game with the body-bar-and-hinge reduction. In this variant, whenever the pebble game returns a floppy component, Algorithm 1 is run on that component. The pebble game fails when implied hinges exist, such as the one in the double-banana graph [13]. In these cases, we observe that the pebble game overestimates the size of the actual rigid components, and Algorithm 1 decomposes each such floppy component into rigid subgraphs.

#### Algorithm 1

Body-Bar-and-Hinge Reduction

Let Max-Triangle(*G*, *U*) and Max-Vertex(*G*, *U*) be a triangle or vertex in *G*, respectively, with the largest total degree excluding edges incident to vertices in *U*.

1: **Input:** A graph *G* of distance constraints

2: Remove all vertices of degree ≤ 2

3: Initialize the list of rigid subgraphs $\mathcal{R}$ to the empty list

4: **while** a $T=\mathrm{\text{Max-Triangle}}(G,\bigcup _{C\in \mathcal{R}}C)$ can be found such that *T* is not fully contained in any component in $\mathcal{R}$ **do**

5: **while** a $v=\mathrm{\text{Max-Vertex}}(G,\bigcup _{C\in \mathcal{R}}C)$ with *v*∉*T* and at least three edges to *T* can be found **do**

6: Add *v* to *T*

7: Add *T* to $\mathcal{R}$

8: **while** two components ${C}_{i},{C}_{j}\in \mathcal{R}$ share three or more vertices **do**

9: Remove both $C_i$ and $C_j$ from $\mathcal{R}$

10: Add ${C}_{i}\cup {C}_{j}$ to $\mathcal{R}$

11: Let $\mathcal{R}_2$ be a subset of $\mathcal{R}$ such that for each pair $C_i, C_j$ in $\mathcal{R}_2$, $|C_i \cap C_j| = 0$ or $2$.

**Comment:** The body-bar-and-hinge framework will be represented by a set of hinges *H*, which contains pairs of rigid bodies that share two vertices, and a set of bars *B*, which contains edges that connect rigid bodies.

12: Initialize *B*, *H*, a set of used hinges $U_H$, and a set of used nodes $U_N$ to the empty set.

13: **for** every pair $C_i, C_j$ in $\mathcal{R}_2$ **do**

14: **if** $|C_i \cap C_j| = 2$ and $C_i \cap C_j = \{v, w\} \notin U_H$ **then**

15: Add $\{C_i, C_j\}$ to *H*

16: Add $\{v, w\}$ to $U_H$

17: Add both *v* and *w* to $U_N$

18: **for** all pairs of nodes *v*, *w* in $C_i \triangle C_j$ **do**

19: **if** *G* contains an edge between *v* and *w*, $v \notin U_N$, and $w \notin U_N$ **then**

20: Add {*v*,*w*} to *B*

21: Add both *v* and *w* to $U_N$

22: **Return:** the subsets of vertices in *G* corresponding to the rigid components of the body-bar-and-hinge framework as well as components in $\mathcal{R}\setminus {\mathcal{R}}_{2}$

#### Algorithm 2

Identify Rigid Components

1: **Input:** A graph *G* of distance constraints

2: Initialize the list of rigid components $\mathcal{C}$ to the empty list

3: **for** every connected component $G_i$ in *G* **do**

4: Let $\mathcal{P}$ be the set of components for $G_i$ returned by the pebble game algorithm

5: **for** $H\in \mathcal{P}$ **do**

6: **if** the subgraph induced by *H* is floppy **then**

7: append all components returned by Body-Bar-and-Hinge Reduction on the subgraph induced by *H* to $\mathcal{C}$

8: **else**

9: append *H* to $\mathcal{C}$

10: **Return:** $\mathcal{C}$
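The control flow of Algorithm 2 can be sketched in Python as below. This is a minimal sketch, not the authors' software: graphs are assumed to be dicts mapping each vertex to a set of neighbours, and the pebble game, the rank test, and Body-Bar-and-Hinge Reduction are passed in as callables whose names (`pebble_game`, `is_rigid`, `bbh_reduction`) are hypothetical.

```python
from collections import deque


def connected_components(adj):
    """Breadth-first search over a dict {vertex: set of neighbours}."""
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        comp, queue = {start}, deque([start])
        while queue:
            for w in adj[queue.popleft()]:
                if w not in seen:
                    seen.add(w)
                    comp.add(w)
                    queue.append(w)
        comps.append(comp)
    return comps


def induced(adj, nodes):
    """Subgraph induced by `nodes`, in the same dict-of-sets form."""
    nodes = set(nodes)
    return {u: adj[u] & nodes for u in nodes}


def identify_rigid_components(adj, pebble_game, is_rigid, bbh_reduction):
    """Algorithm 2 with hypothetical stand-ins:
    pebble_game(graph) -> iterable of vertex sets (possibly floppy),
    is_rigid(graph) -> bool, bbh_reduction(graph) -> list of vertex sets."""
    components = []                                     # line 2
    for comp in connected_components(adj):              # line 3
        for h in pebble_game(induced(adj, comp)):       # lines 4-5
            sub = induced(adj, h)
            if not is_rigid(sub):                       # line 6: rank test
                components.extend(bbh_reduction(sub))   # line 7
            else:
                components.append(set(h))               # line 9
    return components                                   # line 10
```

Injecting the three steps keeps the sketch honest about what it does not implement: only the loop structure and the decision at line 6 are shown.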

To determine whether a component produced by the pebble game is floppy or rigid (line 6 of Algorithm 2), we use the standard rank test on the rigidity matrix, which encodes a graph of distance constraints given an embedding in $\mathbb{R}^{3}$ [12]. If a random embedding of a graph of distance constraints is rigid, then all generic embeddings are also rigid [19]. This fact allows the rigidity of an identified subgraph of distance constraints to be tested via random embeddings, ignoring the precise distances on the constraints.
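A minimal NumPy sketch of this rank test, assuming vertices labelled 0..*n*−1 and *n* ≥ 3: the rigidity matrix has one row per edge and three columns per vertex, and at a random (almost surely generic) embedding the graph is rigid exactly when the rank equals 3*n*−6. The function names are ours, and a floating-point rank is only a numerical proxy for the exact generic rank. The double-banana graph of Figure 1 is a useful check, since it has exactly 3*n*−6 = 18 edges yet is floppy.

```python
from itertools import combinations

import numpy as np


def rigidity_matrix(n, edges, coords):
    """One row per edge (u, v), three columns per vertex: the row holds
    p_u - p_v in u's columns and p_v - p_u in v's columns."""
    R = np.zeros((len(edges), 3 * n))
    for row, (u, v) in enumerate(edges):
        diff = coords[u] - coords[v]
        R[row, 3 * u:3 * u + 3] = diff
        R[row, 3 * v:3 * v + 3] = -diff
    return R


def is_generically_rigid_3d(n, edges, seed=0):
    """Rank test at a random, almost surely generic, embedding in R^3.
    Assumes vertices are labelled 0..n-1 and n >= 3."""
    if n < 3:
        raise ValueError("this test is stated for graphs with n >= 3 vertices")
    coords = np.random.default_rng(seed).standard_normal((n, 3))
    rank = np.linalg.matrix_rank(rigidity_matrix(n, edges, coords))
    return rank == 3 * n - 6


def banana(vertices):
    """Hypothetical helper: K5 on `vertices` minus the edge between the first two."""
    return [e for e in combinations(vertices, 2) if e != (vertices[0], vertices[1])]


double_banana = banana([0, 1, 2, 3, 4]) + banana([0, 1, 5, 6, 7])
print(is_generically_rigid_3d(5, banana([0, 1, 2, 3, 4])))  # True: one banana is rigid
print(is_generically_rigid_3d(8, double_banana))            # False: 18 edges, rank 17
```

The double banana fails because rotating one banana about the axis through the two shared vertices preserves every edge length, leaving one nontrivial degree of freedom.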

We construct rigid subgraphs using Algorithm 1, which starts greedily from a triangle with the most connections to other vertices not yet in a rigid component. This rigid subgraph is then grown one vertex at a time such that each added vertex connects to at least three vertices in the existing subgraph and has the most connections to other vertices not in the subgraph (lines 3-6). By Proposition 1, the grown subgraph is rigid. Once no vertex can be added, another triangle not contained in an existing component is selected and grown by the same vertex-addition rule, allowing reuse of any vertex added in a prior step. Once no more triangles can be found, constructed rigid subgraphs that overlap by three or more vertices are merged to form larger rigid subgraphs (lines 8-10). Proposition 2 below guarantees that components merged in this way will be rigid.
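The greedy phase (lines 2-10 of Algorithm 1) can be sketched as follows, assuming graphs stored as dicts from integer vertex labels to neighbour sets. All function names are hypothetical, ties between equally scored triangles or vertices are broken by iteration order, and line 2 is read as a single pass of low-degree removal; this is an illustration of the procedure, not the released implementation.

```python
from itertools import combinations


def to_adj(edges):
    """Undirected edge list -> dict {vertex: set of neighbours}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def triangles(adj):
    """Yield every triangle (u, v, w) with u < v < w exactly once."""
    for u in adj:
        higher = sorted(n for n in adj[u] if n > u)
        for v, w in combinations(higher, 2):
            if w in adj[v]:
                yield (u, v, w)


def greedy_rigid_bodies(adj):
    """Sketch of lines 2-10 of Algorithm 1; returns a list of vertex sets."""
    low = {v for v, nbrs in adj.items() if len(nbrs) <= 2}       # line 2
    adj = {v: nbrs - low for v, nbrs in adj.items() if v not in low}
    bodies, covered = [], set()

    def score(vs):
        # Total degree of vs, ignoring every edge that touches a covered vertex.
        return sum(0 if v in covered else len(adj[v] - covered) for v in vs)

    while True:                                                   # lines 4-7
        fresh = [t for t in triangles(adj) if not any(set(t) <= b for b in bodies)]
        if not fresh:
            break
        body = set(max(fresh, key=score))                         # Max-Triangle
        while True:
            cands = [v for v in adj if v not in body and len(adj[v] & body) >= 3]
            if not cands:
                break
            body.add(max(cands, key=lambda v: score([v])))        # Proposition 1
        bodies.append(body)
        covered |= body

    merged = True                                                 # lines 8-10
    while merged:
        merged = False
        for i, j in combinations(range(len(bodies)), 2):
            if len(bodies[i] & bodies[j]) >= 3:                   # Proposition 2
                other = bodies.pop(j)
                bodies[i] |= other
                merged = True
                break
    return bodies
```

On the double-banana graph this returns the two bananas as separate bodies: they share only two vertices, so Proposition 2 cannot glue them, and resolving such pairs is left to the body-bar-and-hinge step.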

#### Proposition 1

If a vertex connects to at least three nodes in a rigid subgraph, then extending the subgraph to include that vertex results in a rigid subgraph (Vertex 3-Addition Lemma [18]).

#### Proposition 2

If two rigid subgraphs overlap by 3 or more nodes, then the union of the subgraphs is rigid (Generic 3-Gluing Lemma [18]).

The resulting subgraphs are merged further by converting them into a body-bar-and-hinge framework as described in lines 11-21 of Algorithm 1.

#### Proposition 3

Algorithm 1 returns rigid components.

By Propositions 1 and 2, the subgraphs produced by the initial greedy phase of Algorithm 1 are rigid and can be used as bodies. Line 11 eliminates the possibility that pairs of rigid bodies overlap by exactly one node: such an overlap can be represented neither as a hinge between two rigid bodies nor as a bar between two distinct vertices. The framework is then constructed so that each hinge connects exactly two rigid bodies that overlap by two vertices. Lines 14-17 guarantee that whenever a hinge is created between a pair of rigid bodies that overlap by two vertices, that pair of vertices is never used as a hinge again. Lines 18-21 similarly ensure that vertices across two rigid bodies are connected by bars such that no vertex is an endpoint of more than one bar. These basic rules construct a body-bar-and-hinge framework in which hinges allow only one rotational degree of freedom between two rigid bodies and bars do not share endpoints [20]. Rigid components in this framework directly correspond to rigid components in the original graph. By a theorem of Tay [21], a variant of the pebble game can be used to identify rigid components in body-bar-and-hinge frameworks, and this can be done in time quadratic in the number of vertices [16].
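The framework-construction step (lines 11-21 of Algorithm 1) can be sketched as below. Two points are our own reading rather than the paper's specification: $\mathcal{R}_2$ is chosen greedily (a body is kept if it overlaps every previously kept body in 0 or 2 vertices), and the node pairs of lines 18-21 are taken with one endpoint in $C_i \setminus C_j$ and the other in $C_j \setminus C_i$, so that every bar joins two different bodies. The function name is hypothetical, and the body-bar-and-hinge pebble game [16] that would then run on the result is not shown.

```python
from itertools import combinations


def build_body_bar_hinge(adj, bodies):
    """Sketch of lines 11-21 of Algorithm 1.

    adj: dict {vertex: set of neighbours}; bodies: list of vertex sets from the
    greedy phase. Returns (r2, hinges, bars, leftover): hinges are index pairs
    into r2, bars are vertex pairs, and leftover is R minus R2."""
    # Line 11 (our reading): keep a body if it overlaps every kept body in 0 or 2 vertices.
    r2 = []
    for body in bodies:
        if all(len(body & kept) in (0, 2) for kept in r2):
            r2.append(body)

    hinges, bars = [], []
    used_hinges, used_nodes = set(), set()                  # line 12
    for i, j in combinations(range(len(r2)), 2):            # line 13
        ci, cj = r2[i], r2[j]
        shared = frozenset(ci & cj)
        if len(shared) == 2 and shared not in used_hinges:  # lines 14-17
            hinges.append((i, j))
            used_hinges.add(shared)
            used_nodes |= shared
        # Lines 18-21 (our reading): a bar joins Ci \ Cj to Cj \ Ci, and no vertex
        # may be the endpoint of more than one bar or already lie on a hinge.
        for v in sorted(ci - cj):
            for w in sorted(cj - ci):
                if w in adj[v] and v not in used_nodes and w not in used_nodes:
                    bars.append((v, w))
                    used_nodes |= {v, w}

    leftover = [b for b in bodies if b not in r2]           # returned as-is (line 22)
    return r2, hinges, bars, leftover
```

For the two bananas of Figure 1, this yields one hinge between the two bodies and no bars, which is exactly the single rotational degree of freedom that makes the double banana floppy.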

For graphs close to the minimally rigid threshold (3*n*−6 edges in three dimensions where *n* is the number of vertices in the graph), Algorithm 1 may fail to identify the maximal rigid component. In these cases, we propose using a hybrid algorithm (Algorithm 2) that combines the body-bar-and-hinge reduction with the pebble game algorithm. Since the pebble game does not guarantee that the components it returns are rigid, Algorithm 2 performs matrix rank tests on these components to verify that they are indeed rigid. The bottleneck of Algorithm 2 is the matrix rank testing of components returned by the pebble game, which takes *O*(*m* *n*^{2}) time, where *m* is the number of edges in the graph and *n* is the number of vertices.

### Performance of rigid component algorithms

Although there is no known algorithm that efficiently identifies all maximally rigid subgraphs of bar-joint frameworks in three dimensions at this scale, for a few small individual chromosomes in budding yeast (1, 2, and 6) at interaction frequency cutoffs of 98.8, 99.0, 99.2, and 99.4% (see Methods), we observe that Algorithm 1 finds maximally rigid subgraphs. To verify that we find a maximally rigid subgraph, we performed matrix rank tests on all possible induced subgraphs with more vertices than the largest rigid component identified by Algorithm 1. We also compared Algorithm 1 with a recently proposed slow spring relaxation algorithm [13] and found identical rigid components.

For even a single chromosome, the exhaustive subset testing technique takes hours to days on 20 Opteron 8431 (2400 MHz) processors, and the spring relaxation algorithm takes a similar amount of time on a single processor. A genome-wide rigidity analysis using these techniques would be infeasible, but Algorithm 1 can identify rigid components on the entire yeast genome (Duan et al. with their FDR 0.01% filtering) in minutes on a single processor. This is despite the fact that finding the maximum triangle, which takes *O*(*n*^{3}) time, is the bottleneck in Algorithm 1. In contrast, finding any triangle in a graph takes no more time than a matrix multiplication [22]. If we replace the greedy requirement of finding a maximum triangle and maximum vertex with finding any triangle or vertex that meets the edge connection criteria, we obtain identical results at much lower running times (<20 seconds for the Duan et al. genome at FDR 0.01%). In addition, when comparing Algorithm 1 to the pebble game for bar-joint networks, we find identical rigid components for all individual chromosomes in the Duan et al. data set. The pebble game algorithm alone runs in similar time to Algorithm 1 but does not guarantee rigidity. When rank tests are used to confirm rigidity for the pebble game algorithm, the running times are at least 20 times longer than without the rank tests.
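To illustrate why finding *any* triangle reduces to a matrix multiplication: entry $(u, v)$ of $A^2$ counts the common neighbours of $u$ and $v$, so an edge $(u, v)$ with $(A^2)_{uv} > 0$ closes a triangle. The sketch below (function name ours) uses a dense NumPy product for clarity; it shows the reduction, not a fast matrix multiplication algorithm.

```python
import numpy as np


def find_triangle(A):
    """Return one triangle (u, v, w) of the graph with symmetric 0/1 adjacency
    matrix A (no self-loops), or None if the graph is triangle-free."""
    A = np.asarray(A, dtype=np.int64)
    paths2 = A @ A                       # paths2[u, v] = number of common neighbours
    closing = np.argwhere((A > 0) & (paths2 > 0))
    if closing.size == 0:
        return None
    u, v = closing[0]
    w = int(np.flatnonzero(A[u] & A[v])[0])   # any common neighbour completes it
    return int(u), int(v), w


A = [[0, 1, 1, 0],
     [1, 0, 1, 0],
     [1, 1, 0, 1],
     [0, 0, 1, 0]]
print(find_triangle(A))  # (0, 1, 2)
```

Apart from the product itself, the scan over $A$ and $A^2$ costs only $O(n^2)$, which is why the overall cost is dominated by the multiplication.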

### Rigid components in augmented vs. non-augmented chromosome conformation graphs

The pebble game obtains larger rigid components than Algorithm 1 when maximally rigid subgraphs are close to the minimum number of edges required for rigidity (3*n*−6 edges in three dimensions, where *n* is the number of vertices in the graph). Algorithm 2 will always find rigid components at least as large as the rigid components returned by the pebble game, since floppy components returned by the pebble game are decomposed into smaller rigid components, and, by Proposition 3, it will never report a floppy component as rigid. Algorithm 2 uses the pebble game in two ways: first, a version of it [13] is applied directly to the input network; second, if rigidity matrix tests show that any of the resulting components are floppy, Algorithm 1 is applied to them, using a version of the pebble game to find components in the body-bar-and-hinge network. This version of the pebble game explicitly models the bars and hinges in the body-bar-and-hinge framework. Further discussion and demonstrations of the pebble game and its application can be found online [17].

**Largest rigid component sizes for genome-wide experiments**

Experiment | Unaugmented: graph size (vertices) | Unaugmented: largest rigid component | Augmented: graph size (vertices) | Augmented: largest rigid component |
---|---|---|---|---|
GM06690 | 2,880 | 2,879 | 2,882 | 2,880 |
K562 | 2,874 | 2,874 | 2,882 | 2,874 |
Budding yeast | 3,172 | 2,880 | 4,193 | 2,959 |
Fission yeast | 611 | 590 | 619 | 606 |

### Effect of low-frequency and short-range interactions on rigid components

### Rigid components of a graph filtered for metric distances

An alternative way to filter the experimental data is to keep only those constraints that satisfy the metric properties under some frequency to distance mapping. Since chromosome conformation graphs are an aggregation of interactions from millions of cells, each with some conformation of chromatin, it is possible that dense subgraphs resulting from this aggregation are associated with proximities that contradict one another when attempting an embedding. For example, any clique with >4 nodes where the distance between any two nodes is required to be the same is impossible to embed in three dimensions. In general, the problem of determining whether a graph of distance constraints can be embedded in three dimensions is NP-hard [23]. However, one necessary condition for a graph to be embedded in three dimensions is that all interactions satisfy the metric properties.

We therefore also tested a filtering scheme that keeps only sets of edges that satisfy the triangle inequality. This is yet another stringent filtering applied to the data set to test for rigidity. Consider a chromosome conformation graph where weights on the edges are defined to be the distance as determined by a frequency-to-distance mapping [2–4]. We obtain the set of interactions {*u*, *v*} in the subgraph with lengths equal to the weighted shortest path between *u* and *v*; this set satisfies the shortest path metric. The Duan et al. subgraph (FDR 0.01%) after this metric filtering still contains 3,525 vertices and 27,301 edges and one large rigid component with 2,987 vertices. Therefore, even after including only high-frequency, metric interactions, there is sufficient data to obtain a nondeformable embedding.
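The metric filter can be sketched with Dijkstra's algorithm: for each edge {*u*, *v*}, compute the weighted shortest-path distance from *u* and keep the edge only if its own length attains it. This sketch assumes the lengths have already been produced by some frequency-to-distance mapping (none is chosen here), that lengths are positive, and that each unordered pair appears once; the function names are ours.

```python
import heapq
from math import isclose


def dijkstra(adj, source):
    """Shortest-path distances from `source` in a dict {u: {v: length}}."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue                      # stale heap entry
        for v, length in adj[u].items():
            nd = d + length
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def metric_filter(edges):
    """Keep only edges whose length equals the shortest-path distance between
    their endpoints. edges: dict {(u, v): positive length}."""
    adj = {}
    for (u, v), length in edges.items():
        adj.setdefault(u, {})[v] = length
        adj.setdefault(v, {})[u] = length
    cache, kept = {}, {}
    for (u, v), length in edges.items():
        if u not in cache:
            cache[u] = dijkstra(adj, u)
        if isclose(length, cache[u][v]):  # tolerance for floating-point sums
            kept[(u, v)] = length
    return kept


print(metric_filter({(0, 1): 1.0, (1, 2): 1.0, (0, 2): 3.0}))
# {(0, 1): 1.0, (1, 2): 1.0}: the 3.0 edge violates the triangle inequality
```

Edges that exactly tie the shortest path are kept, since they do not violate the triangle inequality.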

### Discussion

Microscopy data also confirm some observed properties of the rigid components in the Duan et al. and Tanizawa et al. data sets. At the 99.4% and 99.6% interaction frequency cutoffs, the larger chromosomes in budding yeast break apart into multiple large rigid components (Figure 4, right), with subtelomeric regions in different rigid components. This is consistent with the fact that the subtelomeric regions of chromosomes 4, 12, and 13 are known to be separated from one another and near the nucleolus and nuclear periphery [24, 25]. For chromosome 12 of budding yeast, a subtelomeric region containing ribosomal DNA close to the nucleolus is part of its own rigid component even at a 98.8% interaction frequency cutoff [2]. For chromosome 1 of the fission yeast genome (interaction frequency cutoff 99.0%), the subtelomeric regions at each end are part of a single rigid component (the red region in Figure 5(B)), and these regions are also observed in close proximity to one another in microscopy experiments [3, 26].

To capture the space of possible structures, the output of our rigid components algorithm can also be used as input to a recent technique that creates an ensemble of embeddings from chromosome conformation data [11]. Generating an ensemble of embeddings can be slow on large collections such as [1], and a potential speedup can be achieved by randomly permuting the order of the edges in the input graph passed to the pebble game. This procedure samples the minimally rigid subgraphs built with the pebble game. Although it is unlikely that any embedding represents a structure that existed in any particular cell, multiple minimally rigid structures can be used to determine whether there exist rigid substructures that are consistent across random samplings of the data. If such recurring substructures exist, there is stronger evidence for relatively fixed regions, or ‘structural invariants’, which can be analyzed spatially with more confidence.

Finally, we find that random graphs produced by applying the configuration model [27] to a chromosome conformation graph generally contain large rigid subgraphs as well. This suggests that the degree distribution of the graphs in these cases is linked to their rigidity.
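A degree-preserving random graph of this kind can be drawn as sketched below. This is the common "erased" variant of the configuration model: degree stubs are paired uniformly at random, and self-loops and repeated edges are then discarded, so realized degrees can fall slightly below the targets. Whether [27] used exactly this variant is an assumption, and the function name is ours.

```python
import random


def erased_configuration_model(degrees, seed=None):
    """Random simple graph whose degree sequence approximates `degrees`.

    Vertex v receives degrees[v] stubs; stubs are shuffled and paired,
    then self-loops and duplicate edges are dropped."""
    if sum(degrees) % 2:
        raise ValueError("the degree sum must be even")
    rng = random.Random(seed)
    stubs = [v for v, d in enumerate(degrees) for _ in range(d)]
    rng.shuffle(stubs)
    edges = set()
    for a, b in zip(stubs[::2], stubs[1::2]):
        if a != b:                                  # drop self-loops
            edges.add((min(a, b), max(a, b)))       # the set drops duplicate edges
    return edges
```

The degree sequence of a conformation graph can be passed in directly, and the resulting edge set then fed to the same rigidity analysis as the real data.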

## Conclusions

Recent chromosome conformation experiments provide an abundance of data which, even after applying several filtering strategies, still result in rigid embeddings for most of the budding yeast, fission yeast, and human genomes. This conclusion is independent of any particular algorithm for embedding a structure. The genome-wide graphs we studied are composed of one large rigid component using fewer than 2% of the edges. Additionally, we find that short-range interactions are crucial for maintaining the large rigid component.

As data for studying the three-dimensional structure of genomes under a variety of conditions become increasingly available, restricting spatial analysis to the high-confidence regions of these structures ensures that conclusions drawn from the structures are not artifacts of a lack of sufficient constraints. The algorithm proposed here efficiently identifies non-deformable, rigid substructures within chromosome conformation graphs by using a variety of results from rigidity theory that guarantee the construction of rigid graphs from rigid subgraphs. Graph rigidity is well suited to assessing the quality of chromosome conformation data, since the experiments do not currently provide precise distances between chromosome locations and graph rigidity does not depend on the precise values of the distances in a graph of distance constraints. Before performing computationally expensive embeddings of chromosome conformation data, pre-processing the data with Algorithm 2, using any choice of filter, quickly isolates regions of the genome for which a sufficient number of constraints exist for an embedding; these subgraphs serve as a basis for embedding chromosome conformation graphs in three dimensions.

## Methods

### Chromosome conformation experiments

Recent experimental methods for chromosome conformation [1–4] operate simultaneously on a million or more eukaryotic cells at the same stage of the cell cycle. The cells are chemically treated so that fragments of DNA bound to pairs of proteins near one another can be sequenced. This procedure results in a set of paired-end reads that can be mapped to pairs of chromosome locations that are near one another.

**Chromosome conformation data sets**

Experiment | Genome | Resolution | Data provided |
---|---|---|---|
Lieberman-Aiden et al. | Human | 100,1000 | R,C,SN |
Duan et al. | Budding yeast | F,10 | R,C,SN,EN |
Tanizawa et al. | Fission yeast | 20 | R,SN,EN |
Bau et al. | Human chr. 16 | F | C |

### Chromosome conformation graphs

A *chromosome conformation graph* encodes experimentally determined constraints between positions along one or more chromatin fibers. Formally, a conformation graph is a graph *G*=(*V*,*E*) where *V* is the set of centers of experimentally observed DNA fragments or larger segments of DNA, and the set of edges *E* corresponds to observed interactions and their frequency. Three of the four data sets we consider provide frequency data directly (Table 2). Tanizawa et al. instead provide experimentally normalized data, effectively dividing the observed counts by 20. Additional statistical normalization methods vary across publications, and there is no consensus yet for which normalization is appropriate to use.

An *augmented chromosome conformation graph* contains the vertices and edges of a chromosome conformation graph, but in addition contains vertices for chromosome fragments that were not observed to have any interaction partners and also includes edges connecting fragments that are adjacent to each other in the genome. Hence, the chromosome conformation graph contains only constraints measured by the experiments, while the augmented graph additionally contains a path representing each chromatin strand (Figure 3). The augmented graph explicitly incorporates the linear nature of the genome as packed chromatin [28]. Various methods to embed chromosome conformation data in three dimensions incorporate this type of constraint [2–4].
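Building the augmented graph is straightforward once the genomic order of fragments on each chromosome is known. In the sketch below (the function name and input layout are assumptions), every fragment becomes a vertex whether or not it has an observed partner, and consecutive fragments on the same chromosome are joined by a backbone edge.

```python
def augment(edges, fragments):
    """Sketch: build the augmented chromosome conformation graph.

    edges: iterable of observed interactions (u, v) between fragment ids.
    fragments: dict {chromosome: list of fragment ids in genomic order},
    including fragments with no observed interaction partner.
    Returns (vertices, edge set), each edge stored as a sorted pair."""
    vertices = {f for frags in fragments.values() for f in frags}
    augmented = {tuple(sorted(e)) for e in edges}
    for frags in fragments.values():
        # Backbone: join fragments adjacent to each other along the same chromosome.
        for a, b in zip(frags, frags[1:]):
            augmented.add(tuple(sorted((a, b))))
    vertices |= {v for e in augmented for v in e}
    return vertices, augmented


V, E = augment({(3, 2)}, {"chrI": [0, 1, 2], "chrII": [3, 4]})
print(sorted(E))  # [(0, 1), (1, 2), (2, 3), (3, 4)]
```

Here the only measured constraint is the inter-chromosomal edge (2, 3); the other edges are the two chromatin paths.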

**Summary of chromosome conformation graphs**

Experiment | # Vertices | Max intra-chromosomal frequency | Max inter-chromosomal frequency |
---|---|---|---|
Lieberman-Aiden et al. GM06690 | 2,882 | 29,931 | 6,068 |
Lieberman-Aiden et al. K562 | 2,882 | 41,124 | 3,331 |
Duan et al. | 4,193 | 4,683 | 107 |
Tanizawa et al. | 619 | 35.25 | 13.75 |
Bau et al. GM12878 | 55 | 5,823 | - |
Bau et al. K562 | 55 | 13,686 | - |

### Preprocessing conformation graphs

1. Frequency: remove the *x*% of interactions with the lowest frequency. Existing filtering schemes keep the frequently occurring interactions while removing transient, potentially noisy ones.
2. Genomic distance: remove all interactions with endpoints separated by fewer than *x* kilobases. Existing filtering schemes also attempt to remove short-range interactions that may be a result of experimental noise.
3. Metric distance: remove all interactions that do not satisfy metric properties. Since existing embedding methods all employ a frequency-to-distance mapping [2–4], it is reasonable to remove constraints that violate metric properties of a graph. The set of interactions {*u*, *v*} in the subgraph with lengths equal to the weighted shortest path between *u* and *v* satisfies the shortest path metric. To obtain this set, we calculate the shortest paths between the source and target nodes for all edges in the graph and keep only those edges whose length equals the shortest path length [29].
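Filters 1 and 2 can be sketched as follows, assuming interactions are stored as a dict from vertex pairs to frequencies and that each fragment's chromosome and midpoint (in base pairs) are known; the function names are hypothetical. Two choices are ours: ties at the frequency cutoff are broken arbitrarily, and inter-chromosomal interactions, which have no genomic separation, are always kept by the distance filter.

```python
def frequency_filter(freqs, percent):
    """Filter 1: drop the `percent`% lowest-frequency interactions.
    freqs: dict {(u, v): frequency}; ties at the cutoff are broken arbitrarily."""
    ranked = sorted(freqs, key=freqs.get)
    n_drop = round(len(ranked) * percent / 100)
    return {e: freqs[e] for e in ranked[n_drop:]}


def genomic_distance_filter(freqs, position, min_kb):
    """Filter 2: drop intra-chromosomal interactions whose endpoints are closer
    than `min_kb` kilobases. position: fragment -> (chromosome, midpoint in bp)."""
    kept = {}
    for (u, v), f in freqs.items():
        (cu, pu), (cv, pv) = position[u], position[v]
        if cu != cv or abs(pu - pv) >= min_kb * 1000:
            kept[(u, v)] = f
    return kept


freqs = {(0, 1): 5, (1, 2): 1, (2, 3): 3, (3, 4): 4}
print(frequency_filter(freqs, 50))  # {(0, 1): 5, (3, 4): 4}
```

At the genome-wide cutoffs used here (98.8% and above), filter 1 keeps only the top 1.2% or fewer of interactions.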

Although we consider a variety of cases and data sets, the following gives an idea of the resulting edge set sizes. For the genome-wide experiments, the frequency cutoffs we consider are 98.8, 99.0, 99.2, 99.4, and 99.6%. The corresponding edge set sizes are 35,892, 29,910, 23,928, 17,946, and 11,964 for Duan et al.; 28,426, 23,921, 19,328, 14,689, and 10,096 (healthy, GM06690) and 26,798, 22,681, 18,471, 14,053, and 9,485 (cancer, K562) for Lieberman-Aiden et al.; and 2,167, 1,806, 1,445, 1,084, and 723 for Tanizawa et al.
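The frequency filter (method 1) amounts to ranking interactions by frequency and discarding the lowest *x*%. The sketch below is only an illustration under stated assumptions: the percentage is taken over whatever list of interactions is passed in (the article does not state the exact denominator here), and ties in frequency are broken by input order because Python's sort is stable. `frequency_filter` is a hypothetical name.

```python
from typing import List, Tuple

Interaction = Tuple[str, str, float]  # (u, v, frequency)


def frequency_filter(interactions: List[Interaction], x: float) -> List[Interaction]:
    """Remove the x% lowest-frequency interactions (e.g. x = 99.0 keeps the top 1%)."""
    ranked = sorted(interactions, key=lambda e: e[2])  # ascending by frequency
    # round() guards against floating-point error in len * x / 100
    # (for example, 98.8 is not exactly representable as a float).
    n_drop = round(len(ranked) * x / 100.0)
    return ranked[n_drop:]


# 1,000 toy interactions with distinct frequencies 0, 1, ..., 999.
toy = [(f"f{i}", f"f{i + 1}", float(i)) for i in range(1000)]
kept_99 = frequency_filter(toy, 99.0)   # keeps the 10 most frequent interactions
kept_988 = frequency_filter(toy, 98.8)  # keeps the 12 most frequent interactions
```

With evenly spaced cutoffs, the number of kept edges shrinks by the same amount at each step, which is the pattern visible in the edge set sizes listed above.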

Filtering methods 1 and 2 allow us to study systematically, and separately, the effect of removing low-frequency interactions and short-range interactions, so that we can identify which of these features contributes to the creation of rigid components. Existing filtering methods combine the two properties, which makes it difficult to isolate the cause of rigid components. Filtering method 3 is relevant because metrically consistent, low-error embeddings are desirable when embedding chromosome conformation data.
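Because the genomic-distance filter (method 2) is applied on its own rather than combined with the frequency filter, it can be written as an independent function. The sketch below assumes a hypothetical position map from fragment id to `(chromosome, coordinate in bp)`. It also assumes that inter-chromosomal interactions, which have no linear genomic separation, are kept; the article does not specify this case here.

```python
from typing import Dict, List, Tuple

Interaction = Tuple[str, str, float]  # (u, v, frequency)


def genomic_distance_filter(interactions: List[Interaction],
                            position: Dict[str, Tuple[str, int]],
                            min_kb: float) -> List[Interaction]:
    """Remove intra-chromosomal interactions whose endpoints are closer than min_kb kilobases."""
    kept = []
    for u, v, freq in interactions:
        chrom_u, pos_u = position[u]
        chrom_v, pos_v = position[v]
        # Assumption: inter-chromosomal pairs have no genomic distance and are kept.
        if chrom_u != chrom_v or abs(pos_u - pos_v) >= min_kb * 1000:
            kept.append((u, v, freq))
    return kept


position = {"a": ("chrI", 0), "b": ("chrI", 5_000),
            "c": ("chrI", 50_000), "d": ("chrII", 100)}
pairs = [("a", "b", 3.0), ("a", "c", 1.0), ("b", "d", 2.0)]
kept = genomic_distance_filter(pairs, position, min_kb=10)
```

Here the a–b interaction, whose endpoints are 5 kb apart, is removed; the 50 kb a–c interaction and the inter-chromosomal b–d interaction are kept. Frequencies play no role, so any change in rigidity after this filter is attributable to short-range interactions alone.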

## Declarations

### Acknowledgements

The authors thank Jeremy Bellay, Darya Filippova, Michelle Girvan, Shridhar Hannenhalli, Guillaume Marçais, Rob Patro, Cara Treglio, Praveen Vaddadi, and Hao Wang for useful discussions.

This work was supported by the National Science Foundation [CCF-1053918, EF-0849899, and IIS-0812111 to C.K.]; the National Institutes of Health [1R21AI085376 to C.K.]; and a University of Maryland Institute for Advanced Studies New Frontiers Award to C.K.

## References

1. Lieberman-Aiden E, van Berkum NL, Williams L, Imakaev M, Ragoczy T, Telling A, Amit I, Lajoie BR, Sabo PJ, Dorschner MO, Sandstrom R, Bernstein B, Bender MA, Groudine M, Gnirke A, Stamatoyannopoulos J, Mirny LA, Lander ES, Dekker J: Comprehensive mapping of long-range interactions reveals folding principles of the human genome. *Science* 2009, 326(5950):289–293. doi:10.1126/science.1181369
2. Duan Z, Andronescu M, Schutz K, McIlwain S, Kim YJ, Lee C, Shendure J, Fields S, Blau CA, Noble WS: A three-dimensional model of the yeast genome. *Nature* 2010, 465(7296):363–367. doi:10.1038/nature08973
3. Tanizawa H, Iwasaki O, Tanaka A, Capizzi JR, Wickramasinghe P, Lee M, Fu Z, Noma Ki: Mapping of long-range associations throughout the fission yeast genome reveals global genome organization linked to transcriptional regulation. *Nucleic Acids Res* 2010, 38(22):8164–8177. doi:10.1093/nar/gkq955
4. Baù D, Sanyal A, Lajoie BR, Capriotti E, Byron M, Lawrence JB, Dekker J, Marti-Renom M: The three-dimensional folding of the α-globin gene domain reveals formation of chromatin globules. *Nat Struct Mol Biol* 2010, 18:107–114.
5. Kalhor R, Tjong H, Jayathilaka N, Alber F, Chen L: Genome architectures revealed by tethered chromosome conformation capture and population-based modeling. *Nat Biotech* 2012, 30:90–98.
6. Sexton T, Yaffe E, Kenigsberg E, Bantignies F, Leblanc B, Hoichman M, Parrinello H, Tanay A, Cavalli G: Three-dimensional folding and functional organization principles of the Drosophila genome. *Cell* 2012, 148(3):458–472. doi:10.1016/j.cell.2012.01.010
7. Umbarger MA, Toro E, Wright MA, Porreca GJ, Baù D, Hong SH, Fero MJ, Zhu LJ, Marti-Renom MA, McAdams HH, Shapiro L, Dekker J, Church GM: The three-dimensional architecture of a bacterial genome and its alteration by genetic perturbation. *Mol Cell* 2011, 44(2):252–264. doi:10.1016/j.molcel.2011.09.010
8. Fraser J, Rousseau M, Shenker S, Ferraiuolo MA, Hayashizaki Y, Blanchette M, Dostie J: Chromatin conformation signatures of cellular differentiation. *Genome Biol* 2009, 10(4):R37. doi:10.1186/gb-2009-10-4-r37
9. Marti-Renom MA, Mirny LA: Bridging the Resolution Gap in Structural Modeling of 3D Genome Organization. *PLoS Comput Biol* 2011, 7(7):e1002125. doi:10.1371/journal.pcbi.1002125
10. Fudenberg G, Getz G, Meyerson M, Mirny LA: High order chromatin architecture shapes the landscape of chromosomal alterations in cancer. *Nat Biotechnol* 2011, 29:1109–1113. doi:10.1038/nbt.2049
11. Rousseau M, Fraser J, Ferraiuolo M, Dostie J, Blanchette M: Three-dimensional modeling of chromatin structure from interaction frequency data using Markov chain Monte Carlo sampling. *BMC Bioinf* 2011, 12:414. doi:10.1186/1471-2105-12-414
12. Hendrickson B: Conditions for unique graph realizations. *SIAM J Comput* 1992, 21:65–84. doi:10.1137/0221008
13. Chubynsky M, Thorpe M: Algorithms for three-dimensional rigidity analysis and a first-order percolation transition. *Phys Rev E* 2007, 76(4):1–25.
14. Jacobs D, Thorpe M: Generic rigidity percolation: the pebble game. *Phys Rev Lett* 1995, 75(22):4051–4054. doi:10.1103/PhysRevLett.75.4051
15. Jacobs D, Hendrickson B: An algorithm for two-dimensional rigidity percolation: the pebble game. *J Comput Phys* 1997, 137(2):346–365. doi:10.1006/jcph.1997.5809
16. Lee A, Streinu I, Theran L: Finding and maintaining rigid components. *17th Canadian Conference on Computational Geometry* 2005, 1–4. [http://www.cccg.ca/]
17. Lee A, Theran L, Streinu I: *Pebble games for rigidity*. 2010. [http://linkage.cs.umass.edu/pg/pg.html]
18. Whiteley W: Some matroids from discrete applied geometry. *Contemporary Mathematics* 1996, 197:171–311.
19. Gluck H: Almost all simply connected closed surfaces are rigid. *Geometric Topology* 1975, 438:225–239. doi:10.1007/BFb0066118
20. Whiteley W: *An Introduction to Body-Bar Frameworks*. 2010. [http://www.maths.lancs.ac.uk/power/BodyBarLancaster.pdf]
21. Tay T: Rigidity of multi-graphs. I. Linking rigid bodies in n-space. *J Comb Theory, Ser B* 1984, 36:95–112. doi:10.1016/0095-8956(84)90016-9
22. Itai A, Rodeh M: Finding a minimum circuit in a graph. In *Proceedings of the ninth annual ACM symposium on Theory of computing, STOC ’77*. New York, NY, USA: ACM; 1977:1–10.
23. Saxe J: Embeddability of weighted graphs in k-space is strongly NP-hard. *17th Allerton Conference in Communications, Control and Computing* 1979, 480–489.
24. Therizols P, Duong T, Dujon B, Zimmer C, Fabre E: Chromosome arm length and nuclear constraints determine the dynamic relationship of yeast subtelomeres. *Proc Natl Acad Sci U S A* 2010, 107(5):2025–2030. doi:10.1073/pnas.0914187107
25. Berger AB, Cabal GG, Fabre E, Duong T, Buc H, Nehrbass U, Olivo-Marin JC, Gadal O, Zimmer C: High-resolution statistical mapping reveals gene territories in live yeast. *Nat Methods* 2008, 5(12):1031–1037. doi:10.1038/nmeth.1266
26. Cam HP, Sugiyama T, Chen ES, Chen X, FitzGerald PC, Grewal SIS: Comprehensive analysis of heterochromatin- and RNAi-mediated epigenetic control of the fission yeast genome. *Nat Genet* 2005, 37(8):809–819. doi:10.1038/ng1602
27. Newman MEJ: The structure and function of complex networks. *SIAM Rev* 2003, 45(2):58.
28. Bystricky K, Heun P, Gehlen L, Langowski J, Gasser SM: Long-range compaction and flexibility of interphase chromatin in budding yeast analyzed by high-resolution imaging techniques. *Proc Natl Acad Sci U S A* 2004, 101(47):16495–16500. doi:10.1073/pnas.0402766101
29. Duggal G, Patro R, Sefer E, Wang H, Filippova D, Khuller S, Kingsford C: Resolving spatial inconsistencies in chromosome conformation data. In *Proceedings of the 12th Workshop on Algorithms in Bioinformatics, Lecture Notes in Computer Science 7534*. Springer; 2012:288–300. [http://link.springer.com/chapter/10.1007/978-3-642-33122-0_23]

## Copyright

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.