cnvCurator is a Java utility designed to provide an interactive platform for conveniently reviewing and curating CNV predictions. It has two major components: 1) segment-central indexing and display of CNV calls for manual review, and 2) editing of problematic CNV segments identified during manual review.
Segment-central index and display of CNV calls for manual review
The list of segments obtained from a given somatic CNV calling program is indexed on the left of the main cnvCurator window. Segments can generally be classified as copy gain, copy loss, or copy neutral (i.e., diploid normal without copy number variation), depending on the program’s threshold for the signal difference between tumor and matched normal. All segments are listed so that users can not only review segments of copy number gain or loss, but also examine whether a copy-neutral call might be a false negative or contain a sub-segment of copy number gain or loss. For easy navigation, segments of copy loss are displayed in blue, copy gain in red, and copy neutral in black.
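As an illustration of the threshold-based classification and color coding described above, the following minimal Java sketch assigns a state to a segment from its mean logR value. The class name `SegmentClassifier`, the symmetric threshold, and the use of the mean logR as the signal are assumptions made for this example; they are not taken from the cnvCurator source code or from any particular CNV caller.

```java
/** Hypothetical sketch: classify a segment by its mean logR and pick a display color. */
final class SegmentClassifier {

    /** Copy number state with the display color described in the text. */
    enum CnvState {
        GAIN("red"), LOSS("blue"), NEUTRAL("black");

        final String displayColor;

        CnvState(String displayColor) {
            this.displayColor = displayColor;
        }
    }

    /**
     * Assumption: a single symmetric threshold on the mean logR separates
     * gain (>= threshold), loss (<= -threshold), and neutral (in between).
     * Real callers may use asymmetric or model-based cutoffs.
     */
    static CnvState classify(double meanLogR, double threshold) {
        if (threshold < 0) {
            throw new IllegalArgumentException("threshold must be non-negative");
        }
        if (meanLogR >= threshold) {
            return CnvState.GAIN;
        }
        if (meanLogR <= -threshold) {
            return CnvState.LOSS;
        }
        return CnvState.NEUTRAL;
    }
}
```

For example, with a threshold of 0.2, `SegmentClassifier.classify(-0.6, 0.2)` yields `LOSS`, whose `displayColor` is `"blue"`.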
For a given segment of interest, cnvCurator supports concurrent visualization of diverse CNV-related data types on the right of the main window, including 1) the ideogram of the currently displayed chromosome, 2) the genomic coordinates, 3) the sequencing read depth for tumor and matched normal samples, 4) the logR ratio derived from the read depth data, 5) the B-allele frequency (BAF) at germline SNV positions [20], 6) details of sequencing read alignment for tumor and matched normal samples, and 7) annotations of genomic features. These tracks provide both direct and indirect evidence from multiple angles to help the reviewer judge the confidence of a given CNV call.
A number of navigation functions, such as zooming in and out, moving around the genome, displaying detailed read alignment information, specifying an alternative threshold for segment indexing, and changing the color scheme, are implemented in cnvCurator to assist manual CNV review and curation.
Navigate between segments
cnvCurator loads CNV segments into a panel docked at the left side of the application and organizes them into a tree structure, with segments from the same chromosome as leaves under the same node. When the user clicks a segment, the viewer switches to the corresponding genomic region and displays additional flanking context. The fraction of flanking sequence to display can be specified by the user.
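The flanking-context behaviour can be sketched as follows. This example assumes that the flanking fraction is expressed relative to the segment length, that coordinates are 1-based and inclusive, and that the view is clipped to the chromosome ends; the class `ViewRegion` and its method are hypothetical and only illustrate the idea.

```java
/** Hypothetical sketch: compute the region to display around a selected segment. */
final class ViewRegion {
    final long start; // 1-based, inclusive
    final long end;   // 1-based, inclusive

    ViewRegion(long start, long end) {
        this.start = start;
        this.end = end;
    }

    /**
     * Expands a segment by flankFraction of its own length on each side
     * (an assumption for this sketch) and clips to [1, chromLength].
     */
    static ViewRegion around(long segStart, long segEnd,
                             double flankFraction, long chromLength) {
        if (segStart < 1 || segEnd < segStart || segEnd > chromLength) {
            throw new IllegalArgumentException("invalid segment coordinates");
        }
        if (flankFraction < 0) {
            throw new IllegalArgumentException("flankFraction must be non-negative");
        }
        long segLength = segEnd - segStart + 1;
        long flank = Math.round(segLength * flankFraction);
        long viewStart = Math.max(1L, segStart - flank);
        long viewEnd = Math.min(chromLength, segEnd + flank);
        return new ViewRegion(viewStart, viewEnd);
    }
}
```

With these assumptions, a 1,000-bp segment at positions 1001 to 2000 and a flanking fraction of 0.5 would be displayed as positions 501 to 2500.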
Display breakpoint windows
Real CNV boundaries are often associated with structural variations, which can be detected by split reads or discordant read pairs [21–24]. The presence of such reads can therefore be used as supporting evidence to distinguish real CNVs from false calls caused by uneven sequencing read depth. In addition to the traditional alignment tracks, which support zooming in and out, cnvCurator provides advanced options to display such signatures in a user-friendly way. For example, once a CNV segment is selected, two pop-up windows, one for each breakpoint of the segment (±100 bp by default), are displayed automatically. Users can zoom in and out of the breakpoint pop-up windows and simultaneously display multiple pop-up windows in adjacent regions to examine alternative breakpoints.
Edit the problematic CNV segments identified from manual review
An important and unique feature of cnvCurator is the ability to dynamically refine the results during manual review and generate a set of manually curated CNV calls. Mis-segmentation is a common problem for CNV detection methods and can greatly affect downstream interpretation and application of CNV calls. One type of error is a missed true breakpoint, which can incorrectly merge two distinct segments into one or miss genuine segments. The other is a false breakpoint, which can incorrectly separate one segment into two parts, create segments with incorrect boundaries, or create entirely false segments. Once segmentation errors are spotted during manual review, it is useful to be able to fix them and have the segment list updated. cnvCurator provides several functions to assist manual curation, such as removing a false or spurious call, adding a missed genuine segment, and correcting the breakpoint(s) of a segment with incorrect boundaries. These can be achieved by merging two adjacent segments and/or splitting a segment. Merging adjacent segments is the simpler operation, while splitting a segment requires knowing the exact location of the new breakpoint. Users can use cnvCurator to narrow down the potential breakpoint location in the current window by identifying the position that minimizes the variation of the logR ratio within the two resulting segments. When there are multiple breakpoints in the current window (segment), the user can recursively apply the splitting function within each sub-window (segment) to generate multiple new segments. We will implement simultaneous splitting at multiple breakpoints in a future version once we find a solid statistical model for this problem. The user can adjust the range of the current window and examine supporting features (e.g., read alignment signatures) to refine the search.
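To make the single-breakpoint search concrete, the sketch below scans every candidate split position in a window of logR values and returns the one that minimizes the total within-segment variation. The text does not specify the exact variation measure, so the sum of squared deviations from each segment's mean is an assumption here, as are the class and method names; this is an illustration of the idea rather than cnvCurator's own implementation.

```java
/** Hypothetical sketch: find the split position that minimizes within-segment logR variation. */
final class BreakpointFinder {

    /**
     * Returns the index k (1 <= k < n) such that splitting logR into [0, k) and [k, n)
     * minimizes the summed squared deviations from each part's mean.
     * Assumption: squared deviation is used as the measure of "variation".
     */
    static int bestSplit(double[] logR) {
        int n = logR.length;
        if (n < 2) {
            throw new IllegalArgumentException("need at least two data points");
        }
        // Prefix sums allow each candidate split to be scored in constant time.
        double[] sum = new double[n + 1];
        double[] sumSq = new double[n + 1];
        for (int i = 0; i < n; i++) {
            sum[i + 1] = sum[i] + logR[i];
            sumSq[i + 1] = sumSq[i] + logR[i] * logR[i];
        }
        int best = 1;
        double bestCost = Double.POSITIVE_INFINITY;
        for (int k = 1; k < n; k++) {
            double cost = sse(sum, sumSq, 0, k) + sse(sum, sumSq, k, n);
            if (cost < bestCost) {
                bestCost = cost;
                best = k;
            }
        }
        return best;
    }

    /** Sum of squared deviations from the mean for the values in [from, to). */
    private static double sse(double[] sum, double[] sumSq, int from, int to) {
        int m = to - from;
        double s = sum[to] - sum[from];
        return (sumSq[to] - sumSq[from]) - s * s / m;
    }
}
```

Applying `bestSplit` again to each of the two resulting sub-windows corresponds to the recursive use of the splitting function for windows that contain more than one breakpoint.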
The segment labels on the left of the main window are automatically updated once a segment edit (i.e., splitting and/or merging segments) is performed, and an “undo” function is provided for these operations. The updated segment list can be saved to a file and reloaded into the segmentation panel. A detailed tutorial on CNV editing during manual review can be found on the project website. It should be noted that merging two adjacent segments and/or splitting a segment with cnvCurator is not automated; these editing functions are provided to assist the manual review process.
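One simple way to support split, merge, and undo on a segment list is to keep a snapshot of the list before each edit, as in the sketch below. The class `SegmentEditor`, its methods, and the snapshot-based history are assumptions chosen for brevity; the text does not describe how cnvCurator implements these operations internally.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical sketch: manual split/merge edits on a segment list with undo. */
final class SegmentEditor {

    /** Immutable segment with 1-based, inclusive coordinates (an assumption). */
    static final class Segment {
        final String chrom;
        final long start;
        final long end;

        Segment(String chrom, long start, long end) {
            this.chrom = chrom;
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return chrom + ":" + start + "-" + end;
        }
    }

    private List<Segment> segments;
    // Each entry is a copy of the list as it was before one edit.
    private final Deque<List<Segment>> history = new ArrayDeque<>();

    SegmentEditor(List<Segment> initial) {
        this.segments = new ArrayList<>(initial);
    }

    /** Returns a copy of the current segment list. */
    List<Segment> segments() {
        return new ArrayList<>(segments);
    }

    /** Splits the segment at index so that its left part ends at leftEnd. */
    void split(int index, long leftEnd) {
        Segment s = segments.get(index);
        if (leftEnd < s.start || leftEnd >= s.end) {
            throw new IllegalArgumentException("breakpoint outside segment");
        }
        history.push(new ArrayList<>(segments));
        segments.set(index, new Segment(s.chrom, s.start, leftEnd));
        segments.add(index + 1, new Segment(s.chrom, leftEnd + 1, s.end));
    }

    /** Merges the segment at index with the next segment on the same chromosome. */
    void mergeWithNext(int index) {
        Segment a = segments.get(index);
        Segment b = segments.get(index + 1);
        if (!a.chrom.equals(b.chrom)) {
            throw new IllegalArgumentException("segments are on different chromosomes");
        }
        history.push(new ArrayList<>(segments));
        segments.set(index, new Segment(a.chrom, a.start, b.end));
        segments.remove(index + 1);
    }

    /** Reverts the most recent edit; returns false if there is nothing to undo. */
    boolean undo() {
        if (history.isEmpty()) {
            return false;
        }
        segments = history.pop();
        return true;
    }
}
```

Because `Segment` is immutable, copying the list is enough to preserve earlier states. Snapshots trade memory for simplicity, which is reasonable for the modest number of segments typically reviewed by hand.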