Outline of the MRN algorithm and examples.
(A) A non-zero coverage window w (gray rectangle) containing two high-confidence transitions (hcTs; white circles and dashed gray lines) that lie in close proximity but mark two distinct protein binding sites. Within this window, the solid black line indicates the observed total coverage of aligned PAR-CLIP reads, which exhibits two proximal peaks corresponding to the interaction sites. The MRN algorithm aims to resolve these binding sites by discriminating binding-dependent coverage fluctuations from noise and then using geometric properties of RBP binding sites to refine the cluster boundaries. First, all coverage fluctuations within w, i.e. positive and negative coverage differences (blue and orange triangles, respectively, with heights proportional to the magnitude of the fluctuation), are computed and stored in two vectors n, one for the positive and one for the negative differences. These values are then used to learn a local threshold δ (see Methods), which is applied to remove noise from the coverage function: coverage fluctuations smaller than δ (solid gray line) are discarded. Next, each hcT is processed separately. The retained positive coverage differences located upstream of the analyzed transition and the retained negative coverage differences located downstream of it are ranked by value, and the rankings are stored in the rank vectors r. Finally, all putative cluster boundaries (rectangles) are identified and the resulting cluster lengths are ranked. Each candidate cluster is represented by a rank vector summarizing its coverage and cluster length rankings (e.g. (0,2,0)); the candidates are then evaluated and the optimal cluster (light blue) is selected (see Methods). (B) Clusters (blue rectangles) identified by the MRN algorithm within a complex 1.6 kb coverage region on chromosome 10 (MOV10 data set). Positive and negative coverage differences are shown in red and blue, respectively. hcTs are indicated by vertical dashed lines. Cyan lines mark hcTs that are located only within clusters identified by the MRN algorithm; clusters identified by the CWT-based algorithm do not contain these sites. (C) AGO2 clusters identified within the 3’-UTR of the KLHL20 transcript. Each cluster contains at least one seed sequence of a microRNA expressed in HEK293 cells. The color scheme is the same as in B. (D) Same as C, but for the 3’-UTR of the RAB5A transcript.
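The first steps described for panel A (computing coverage differences, discarding those below the threshold, and ranking the retained values around one hcT) can be illustrated with a minimal Python sketch. It is not the authors' implementation. All function names are hypothetical, the threshold `delta` is passed in as a fixed number rather than learned as described in Methods, and assigning rank 0 to the largest difference is an assumption. The cluster-length ranking and the selection of the optimal cluster are not covered.

```python
def coverage_differences(coverage):
    """Split position-to-position coverage changes into positive and negative magnitudes."""
    diffs = [b - a for a, b in zip(coverage, coverage[1:])]
    pos = [d if d > 0 else 0 for d in diffs]    # coverage increases
    neg = [-d if d < 0 else 0 for d in diffs]   # magnitudes of coverage decreases
    return pos, neg


def denoise(values, delta):
    """Discard (set to 0) fluctuations smaller than the threshold delta."""
    return [v if v >= delta else 0 for v in values]


def rank_retained(values):
    """Rank retained (non-zero) values; 0 = largest (assumed convention), -1 = discarded."""
    kept = [i for i, v in enumerate(values) if v > 0]
    kept.sort(key=lambda i: -values[i])  # stable sort: ties keep positional order
    ranks = [-1] * len(values)
    for rank, i in enumerate(kept):
        ranks[i] = rank
    return ranks


def rank_around_transition(coverage, hct, delta):
    """Rank positive differences upstream and negative differences downstream of one hcT.

    `hct` is the index of the transition within the window; `delta` is a
    user-supplied threshold standing in for the learned local threshold.
    """
    pos, neg = coverage_differences(coverage)
    upstream = denoise(pos[:hct], delta)      # steps ending at or before the hcT
    downstream = denoise(neg[hct:], delta)    # steps starting at or after the hcT
    return rank_retained(upstream), rank_retained(downstream)


# Toy window with a single peak at index 3 (illustrative values, not real data).
window = [0, 5, 6, 20, 18, 19, 4, 0]
up_ranks, down_ranks = rank_around_transition(window, hct=3, delta=2)
print(up_ranks, down_ranks)
```

In this toy window, the small rise from 5 to 6 and the small rebound from 18 to 19 fall below `delta` and are treated as noise, so they cannot serve as candidate cluster boundaries.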