DCJRNA: double cut and join for RNA secondary structures
BMC Bioinformatics volume 18, Article number: 427 (2017)
Abstract
Background
Genome rearrangements are essential processes for evolution and are responsible for the existing variety of genome architectures. Many studies have sought algorithms that identify the minimum number of inversions necessary to transform one genome into another; for genomes represented as sequences, this can be done in polynomial time. Studies have not been conducted on the topic of rearranging a genome when it is represented as a secondary structure. Unlike sequences, the secondary structure preserves the functionality of the genome. Sequences can differ yet share the same structure and, therefore, the same functionality.
Results
This paper proposes a double cut and join for RNA secondary structures (DCJRNA) algorithm. This algorithm allows for the description of evolutionary scenarios that are based on secondary structures rather than sequences. The main aim of this paper is to suggest an efficient algorithm that can help researchers compare two ribonucleic acid (RNA) secondary structures based on rearrangement operations. The results, which are based on real datasets, show that the algorithm is able to count the minimum number of rearrangement operations, as well as to report an optimum scenario that can increase the similarity between the two structures.
Conclusion
The algorithm calculates the distance between structures and reports a scenario based on the minimum rearrangement operations required to make the given structure similar to the other. DCJRNA can also be used to measure the distance between the two structures. This can help identify the common functionalities between different species.
Background
DNA is a biological blueprint that a living organism must have to exist and remain functional. RNA holds the guidelines for this blueprint. RNA is responsible for transferring the genetic code from the nucleus to the ribosome to build proteins. It is written as a sequence of letters over the bases {A, C, G, U}. RNA’s secondary structure is required to define the functionality of RNA molecules. In contrast to representing the genome as a sequence, representing it as a secondary structure provides more insight into the genome’s function. In this paper, RNA’s secondary structure is presented using a component-based representation, which was proposed in 2011 [1]. In contrast to similarity between gene orders, identifying the similarity of functioning between two structures has a greater impact on comparing species. Comparing two species based on their secondary structures provides more information and reveals more accurate evolutionary scenarios [2]. Comparison of two species based on their secondary structures can also be combined with existing sequence-based algorithms to enhance their efficiency [3]. This helps create more accurate phylogenies [4].
The paper is organized as follows. First, the RNA secondary structure is presented using a component-based representation. We then describe the measures that are used to determine the similarity between components of the given structures. Genome rearrangement in terms of sequences and its operations, sorting scenario, and distance measures are summarized. We then propose a DCJRNA rearrangement algorithm and explain it in detail. Two case studies using real data are presented, illustrating the detection and application of the proposed rearrangement operations for real RNA secondary structures. The results demonstrate that the proposed algorithm provides one evolutionary scenario that shows how to alter one structure to make it similar to, or the same as, the other. Preliminary work has been presented as a poster in [5].
RNA secondary structure component-based representation
Badr and Turcotte [1] propose a component-based representation that can be used to define interacting and noninteracting patterns for RNA secondary structures. A pattern (P = {p_{1}, p_{2}, ..., p_{m}}) is defined by its subpatterns (p_{i}, 1 ≤ i ≤ m). Each subpattern is defined by its length and its intermolecular (INTERM) and intramolecular (INTRAM) components. For noninteracting patterns, there are no INTERM components. These components are defined by their opening bracket (OB), closing bracket (CB), length, and relative locations within the subpatterns. In an INTERM component, OB and CB must be located in two different subpatterns, so there must be at least two subpatterns to have INTERM components. OB is located in p_{i}, and CB is located in another subpattern (p_{j}), where i < j ≤ m. OB and CB are defined by their lengths and locations relative to the beginning of p_{i}. Thus, INTERM = {OB, CB, j, len}. In an INTRAM component, OB and CB must be located in the same subpattern, so there must be at least one subpattern to have INTRAM components. OB and CB are located in p_{i}, where 1 ≤ i ≤ m. OB and CB are both defined by their location and length. Therefore, INTRAM = {OB, CB, len}. Figure 1 shows an example of a noninteracting pattern.
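The representation described above can be sketched as a small data structure. This is only an illustration: the class names (`Intram`, `Interm`, `SubPattern`) and the helper functions are hypothetical and do not come from [1], and the well-formedness checks simply encode the index constraints stated in the text. The sample value is structure B from the worked example later in this paper.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Intram:
    """INTRAM = {OB, CB, len}: both brackets lie in the same subpattern."""
    ob: int      # location of the opening bracket
    cb: int      # location of the closing bracket
    length: int  # length of the bracket (stem) region


@dataclass(frozen=True)
class Interm:
    """INTERM = {OB, CB, j, len}: CB lies in a later subpattern p_j."""
    ob: int
    cb: int
    j: int       # 1-based index of the subpattern that holds CB
    length: int


@dataclass
class SubPattern:
    length: int
    intram: List[Intram] = field(default_factory=list)
    interm: List[Interm] = field(default_factory=list)


def is_noninteracting(pattern: List[SubPattern]) -> bool:
    """A pattern is noninteracting when no subpattern has INTERM components."""
    return all(not sp.interm for sp in pattern)


def is_well_formed(pattern: List[SubPattern]) -> bool:
    """Check the index constraints from the text (assumed, not from [1])."""
    m = len(pattern)
    for i, sp in enumerate(pattern, start=1):
        if any(not (1 <= c.ob < c.cb <= sp.length) for c in sp.intram):
            return False
        if any(not (i < c.j <= m) for c in sp.interm):
            return False
    return True


# Structure B from the worked example: one subpattern, no INTERM components.
structure_b = [SubPattern(76, intram=[Intram(1, 66, 7), Intram(10, 22, 4),
                                      Intram(27, 39, 5), Intram(49, 61, 5)])]
print(is_noninteracting(structure_b), is_well_formed(structure_b))
```

Keeping INTERM and INTRAM as separate types makes the noninteracting case a simple emptiness check, which matches the restriction of this paper to noninteracting patterns.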
Similarities between two RNA secondary structures (Alignment distance)
Badr and AlTurki [6] propose a similarity measure based on aligning two secondary structures that are presented using a component-based representation. The algorithm extracts the features of each component, which are OB, CB, and length. The similarity between two structures depends on the component’s position, full length, and stem length. These measures are used in the new proposed algorithm. The equations that are applied to calculate the similarity between two components, a_{i} in structure A and b_{j} in structure B, d(f_{ai}, f_{bj}), can be found in [6]. The similarity measure between two components is used to fill the dynamic programming matrix using the method proposed by Needleman and Wunsch [7]. The alignment score between two structures is calculated using Eq. 1, while the percentage of the similarity between two structures is calculated using Eq. 2 [6].
where Max(a, b) = Max {Score(a, a), Score(b, b)}
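The alignment idea can be sketched with a Needleman-Wunsch style dynamic program over components. Several parts are assumptions made only for illustration: `component_similarity` is a hypothetical stand-in for the measure d(f_{ai}, f_{bj}) defined in [6], the gap score of zero is arbitrary, and the percentage is assumed to be the alignment score normalized by Max(a, b) as defined above. Components are written as (OB, CB, len) tuples.

```python
def component_similarity(a, b):
    """Hypothetical stand-in for d(f_a, f_b) in [6]; returns a value in (0, 1]."""
    (ob_a, cb_a, stem_a), (ob_b, cb_b, stem_b) = a, b
    span_a, span_b = cb_a - ob_a + 1, cb_b - ob_b + 1
    span_ratio = min(span_a, span_b) / max(span_a, span_b)
    stem_ratio = min(stem_a, stem_b) / max(stem_a, stem_b)
    return (span_ratio + stem_ratio) / 2


def alignment_score(A, B, gap=0.0):
    """Global alignment score of two component lists (Needleman-Wunsch)."""
    n, m = len(A), len(B)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = dp[i - 1][0] + gap
    for j in range(1, m + 1):
        dp[0][j] = dp[0][j - 1] + gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + component_similarity(A[i - 1], B[j - 1]),
                dp[i - 1][j] + gap,   # component of A left unaligned
                dp[i][j - 1] + gap,   # component of B left unaligned
            )
    return dp[n][m]


def percent_similarity(A, B):
    """Assumed normalization: Score(A, B) / Max(A, B) * 100."""
    best = max(alignment_score(A, A), alignment_score(B, B))
    return 100.0 * alignment_score(A, B) / best if best else 0.0


A = [(1, 75, 7), (10, 24, 3), (28, 40, 5), (46, 53, 3), (58, 70, 5)]
B = [(1, 66, 7), (10, 22, 4), (27, 39, 5), (49, 61, 5)]
print(round(percent_similarity(A, B), 1))
```

Because the component measure here is a placeholder, the printed percentage is not comparable to the values reported in this paper.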
RSmatch [8], which is another alignment distance, is a tool for aligning RNA secondary structures and is also used for motif detection. Starting from secondary structures determined with widely used RNA folding algorithms, it decomposes the secondary structure of RNA into a set of atomic structural components. These components are further organized using a tree model to capture the structural particularities. RSmatch can find the optimal global or local alignment between two RNA secondary structures using two scoring matrices: one for single-stranded regions and the other for double-stranded regions. Jiang et al. [9] define the alignment of trees as a measure of similarity between two secondary structures in tree representation.
Sequence-based genome rearrangements
Genomes can be modeled using permutations. Each gene occurs once in the genome and is assigned a unique number. A gene is modeled by a signed integer when the gene strand is known to biologists [10, 11].
Rearrangement operations
Two genomes can have the same number of genes but may have different orders. A sequence of operations can be applied to change one genome into another. The most common rearrangement events or operations are as follows [12, 13]:

Inversion: This reverses the orientation of a gene, or the order and orientation of a group of genes.

Transposition: This changes the order of a gene (or a group of genes). In other words, if the gene is located at one index, it is moved to another index.

Gain: This adds a gene (or a group of genes) to a genome.

Loss: This removes a gene (or a group of genes) from a genome.

Duplication: This duplicates a specific gene (or a group of genes) within a genome.
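The operations listed above can be illustrated on a genome modeled as a list of signed integers. The function names, the half-open index convention `[i, j)`, and the insertion index `k` are choices made for this sketch only, not notation from [12, 13].

```python
def inversion(genome, i, j):
    """Reverse the order and flip the orientation of genome[i:j]."""
    return genome[:i] + [-g for g in reversed(genome[i:j])] + genome[j:]


def transposition(genome, i, j, k):
    """Move the block genome[i:j] in front of original index k (k outside i..j)."""
    block, rest = genome[i:j], genome[:i] + genome[j:]
    k = k if k <= i else k - len(block)  # index shift after removing the block
    return rest[:k] + block + rest[k:]


def gain(genome, i, genes):
    """Insert new genes at index i."""
    return genome[:i] + list(genes) + genome[i:]


def loss(genome, i, j):
    """Remove the block genome[i:j]."""
    return genome[:i] + genome[j:]


def duplication(genome, i, j, k):
    """Insert a copy of genome[i:j] at index k."""
    return genome[:k] + genome[i:j] + genome[k:]


g = [1, 2, 3, 4, 5]
print(inversion(g, 1, 4))         # [1, -4, -3, -2, 5]
print(transposition(g, 1, 3, 5))  # [1, 4, 5, 2, 3]
print(gain(g, 2, [6]))            # [1, 2, 6, 3, 4, 5]
print(loss(g, 0, 1))              # [2, 3, 4, 5]
print(duplication(g, 1, 3, 5))    # [1, 2, 3, 4, 5, 2, 3]
```

Each function returns a new list, so a sequence of operations can be replayed step by step as a sorting scenario.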
Distance measures
The distance between two genomes is the minimum number of events or operations that are required to transform one genome into the other. Yancopoulos et al. [14] first proposed double cut and join (DCJ) operations. A DCJ operation consists of cutting a genome at two distinct positions and joining the four resulting open ends in a different way. Since a gene (e.g., a) has an orientation, its two ends, namely the extremities, can be distinguished and denoted as a_{t} (tail) and a_{h} (head). An adjacency in a genome is either the extremity of a gene that is adjacent to one of its telomeres or a pair of consecutive gene extremities in one of its chromosomes.
The DCJ model is built from two elementary operations: cut, which cuts an adjacency into two telomeres, and join, which connects two telomeres to form an adjacency. Any operation that consists of two cuts followed by two joins on the extremities is considered a DCJ operation [15]. DCJ allows for multichromosomal genomes with both circular and linear chromosomes.
DCJ distance can be easily calculated with the assistance of an adjacency graph, which is a bipartite multigraph in which each partition corresponds to the set of adjacencies and telomeres of one of the two input genomes. An edge connects the same extremities of genes in both genomes. In other words, a one-to-one correspondence exists between the set of edges in an adjacency graph and the set of gene extremities. Vertices have degree one or two. Therefore, an adjacency graph is a collection of paths and cycles. DCJ distance can be defined as follows [11]:
d_{DCJ}(G_{1}, G_{2}) = N − (c(G_{1}, G_{2}) + p(G_{1}, G_{2}) / 2)
In this equation, N is the number of genes, c (G_{1}, G_{2}) is the number of cycles, and p (G_{1}, G_{2}) is the number of odd paths (paths with an odd number of edges) in the adjacency graph.
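A minimal sketch of this computation follows, using the adjacency graph formula of Bergeron et al. [11], d = N − (c + p/2). It assumes linear chromosomes only, genomes given as lists of chromosomes with signed integer genes, and identical gene content in both genomes; the function names are illustrative.

```python
from collections import defaultdict


def extremities(gene):
    """Extremities of a signed gene in reading order (tail first if positive)."""
    g = abs(gene)
    return [(g, "t"), (g, "h")] if gene > 0 else [(g, "h"), (g, "t")]


def adjacencies(genome):
    """Adjacencies and telomeres of a genome made of linear chromosomes."""
    result = []
    for chrom in genome:
        ends = [e for gene in chrom for e in extremities(gene)]
        if not ends:
            continue
        result.append(frozenset(ends[:1]))           # left telomere
        for k in range(1, len(ends) - 1, 2):
            result.append(frozenset(ends[k:k + 2]))  # inner adjacency
        result.append(frozenset(ends[-1:]))          # right telomere
    return result


def dcj_distance(genome_1, genome_2):
    adj_1, adj_2 = adjacencies(genome_1), adjacencies(genome_2)
    side_1 = {e: ("G1", i) for i, a in enumerate(adj_1) for e in a}
    side_2 = {e: ("G2", i) for i, a in enumerate(adj_2) for e in a}
    if side_1.keys() != side_2.keys():
        raise ValueError("both genomes must contain the same genes")
    graph = defaultdict(list)  # one edge per shared gene extremity
    for e, u in side_1.items():
        v = side_2[e]
        graph[u].append(v)
        graph[v].append(u)
    seen, cycles, odd_paths = set(), 0, 0
    for start in list(graph):
        if start in seen:
            continue
        seen.add(start)
        component, stack = [], [start]
        while stack:
            node = stack.pop()
            component.append(node)
            for nxt in graph[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        n_edges = sum(len(graph[n]) for n in component) // 2
        if all(len(graph[n]) == 2 for n in component):
            cycles += 1        # every vertex has degree two
        elif n_edges % 2 == 1:
            odd_paths += 1     # path with an odd number of edges
    n_genes = len(side_1) // 2
    return n_genes - (cycles + odd_paths // 2)


print(dcj_distance([[1, 2, 3]], [[1, -2, 3]]))  # → 1 (a single inversion)
```

Odd paths always come in pairs, so the integer division by two is exact. Circular chromosomes would need an extra adjacency that closes each chromosome, which this sketch leaves out.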
Sorting scenario
One related issue is identifying a sorting scenario for the given distance, which provides the operations themselves. One or several possible solutions or sorting sequences can be found.
Bergeron et al. [11] provide an algorithm to obtain the DCJ operations in O(n) time (Algorithm 1). Mathematically, sorting using DCJ operations is simple. As with DCJ distance, DCJ operations take two adjacencies or telomeres, cut the adjacencies/telomeres, and create new adjacencies or telomeres. There are several DCJ operation types. A DCJ operation may create two adjacencies by cutting two adjacencies. A DCJ operation may also create an adjacency and a telomere by cutting an adjacency and removing a telomere. In addition, a DCJ operation can consist of forming two telomeres by cutting an adjacency. Finally, a DCJ operation may create an adjacency by removing two telomeres.
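The greedy sorting procedure of Bergeron et al. [11] can be sketched as follows. This is an illustration rather than a reproduction of Algorithm 1: genomes are given directly as lists of adjacency and telomere sets whose extremities are strings such as "1h" or "2t" (an encoding chosen here), and lookups are done by linear search for brevity, so this version does not achieve the O(n) bound.

```python
def dcj_sort(adj_a, adj_b):
    """Return the genome states reached while sorting A into B, one per DCJ."""
    current = [set(x) for x in adj_a]
    scenario = []

    def find(ext):
        # The adjacency or telomere of the current genome that contains ext.
        return next(s for s in current if ext in s)

    def replace(old, new):
        current[:] = [s for s in current if all(s is not o for o in old)]
        current.extend(s for s in new if s)  # an empty remainder is dropped
        scenario.append([set(s) for s in current])

    # First pass: create every adjacency {p, q} of the target genome.
    for target in adj_b:
        if len(target) == 2:
            p, q = sorted(target)
            u, v = find(p), find(q)
            if u is not v:
                replace([u, v], [{p, q}, (u - {p}) | (v - {q})])

    # Second pass: create every telomere {p} of the target genome.
    for target in adj_b:
        if len(target) == 1:
            (p,) = target
            u = find(p)
            if len(u) == 2:
                replace([u], [{p}, u - {p}])
    return scenario


# Sorting (1, 2, 3) into (1, -2, 3) takes a single DCJ operation.
a = [{"1t"}, {"1h", "2t"}, {"2h", "3t"}, {"3h"}]
b = [{"1t"}, {"1h", "2h"}, {"2t", "3t"}, {"3h"}]
steps = dcj_sort(a, b)
print(len(steps))  # → 1
```

Each entry of the returned list is the full genome after one operation, so the length of the list is the number of DCJ operations used.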
Method: DCJRNA algorithm
The RNA component-based rearrangement algorithm uses a component-based representation [2] that allows for the unique description of any RNA pattern and shows the main features of the pattern efficiently. The proposed algorithm also uses the DCJ algorithm to describe rearrangement operations. DCJ models the classical operations (inversions, translocations, fissions, fusions, transpositions, and block interchanges) with a single operation and supports multichromosomal genomes. The DCJRNA algorithm (Algorithm 2) is described next.
The DCJRNA algorithm completes three main steps:

Step 1: Alignment of similar components based on their component lengths and stem lengths.
In this step, the similarity between components is calculated in terms of their component lengths and stem lengths [6]. Similar components are assigned together, beginning with those with the greatest similarity. The similarity measure that is used in this step is as follows:
Then, a matrix (m × n) is built; the entries are the component similarities in terms of component length and stem length. The rows represent the components of the first structure, and the columns represent the components of the second structure. We then search greedily for the maximum entry in the matrix. If it is greater than the threshold enhancement (ε) (the minimum similarity score between two components), the components are assigned together, and the corresponding row and column are deleted. If the maximum similarity appears in more than one entry, the position similarity is compared between those components only, and the pair with the greatest similarity in position is assigned. Table 1 shows the matrix structure.
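The greedy assignment described in this step can be sketched as below. The function name, the 0-based indices, and the input matrices are hypothetical; the similarity values are invented for illustration and are not taken from any table in this paper. The sketch assigns a pair when its similarity is at least ε, whereas the text states the condition both as "greater than" and as "greater than or equal to" in different places.

```python
def greedy_assign(sim, eps, pos_sim=None):
    """Greedily pair rows (first structure) with columns (second structure).

    Returns (pairs, unmatched_rows, unmatched_cols); unmatched rows are the
    components to delete and unmatched columns are the components to insert.
    """
    rows = set(range(len(sim)))
    cols = set(range(len(sim[0]))) if sim else set()
    pairs = []
    while rows and cols:
        best = max(sim[i][j] for i in rows for j in cols)
        if best < eps:  # assumption: equality with eps still counts as similar
            break
        ties = [(i, j) for i in sorted(rows) for j in sorted(cols)
                if sim[i][j] == best]
        if pos_sim is not None and len(ties) > 1:
            # Break ties with the position similarity of the tied pairs only.
            i, j = max(ties, key=lambda ij: pos_sim[ij[0]][ij[1]])
        else:
            i, j = ties[0]
        pairs.append((i, j))
        rows.discard(i)  # "delete" the row and the column
        cols.discard(j)
    return pairs, sorted(rows), sorted(cols)


# Invented similarity matrix for a 3 x 3 case with eps = 0.5.
sim = [[0.39, 0.20, 0.10],
       [0.30, 0.83, 0.40],
       [0.20, 0.50, 1.00]]
print(greedy_assign(sim, 0.5))  # ([(2, 2), (1, 1)], [0], [0])
```

Tracking live row and column indices in sets avoids physically deleting matrix rows, which keeps the original component numbers available for the next step.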

Step 2: Permutation generation
In this step, a corresponding permutation is generated for each of the two structures. This is completed by determining the components to be inserted or deleted, as well as the order of the similar components using the alignment that is generated from step 1. A two-dimensional array of size 3 × (the maximum number of components in A or B, plus 1) is constructed and identified as SortArray. The first row contains the desired structure, the second row contains the deleted components from the actual structure, and the third row contains the inserted components from the desired structure. An index value of zero for the first row is reserved for the number of components in the actual structure. An index value of zero for the second row is reserved for the number of deleted components. For the third row, an index value of zero is reserved for the number of inserted components. Table 2 shows the SortArray structure.
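One possible way to build SortArray from the step 1 alignment is sketched below. The function name and the padding of unused cells with 0 are assumptions. The relabeling of an unmatched component i of the second structure as i + N, where N is the number of components in the first structure, follows the convention used in the worked example of this paper. Pairs use 1-based component numbers (i, j), meaning a_{i} is assigned to b_{j}.

```python
def build_sort_array(n_a, n_b, pairs):
    """Build the 3 x (max(n_a, n_b) + 1) SortArray described in the text."""
    width = max(n_a, n_b) + 1
    b_to_a = {j: i for i, j in pairs}
    matched_a = {i for i, _ in pairs}
    # Desired structure: components in the order of the second structure,
    # named by their matched first-structure component or relabeled as j + n_a.
    desired = [b_to_a.get(j, j + n_a) for j in range(1, n_b + 1)]
    deleted = [i for i in range(1, n_a + 1) if i not in matched_a]
    inserted = [j + n_a for j in range(1, n_b + 1) if j not in b_to_a]

    def row(count, items):
        # Index 0 holds the count; unused cells are padded with 0 (assumption).
        return [count] + items + [0] * (width - 1 - len(items))

    return [row(n_a, desired), row(len(deleted), deleted),
            row(len(inserted), inserted)]


# Alignment from the worked example: a3-b3, a5-b4, a2-b2 with 5 and 4 components.
for r in build_sort_array(5, 4, [(3, 3), (5, 4), (2, 2)]):
    print(r)
# [5, 6, 2, 3, 5, 0]
# [2, 1, 4, 0, 0, 0]
# [1, 6, 0, 0, 0, 0]
```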

Step 3: Applying the DCJ algorithm.
The component numbers are used to determine the permutations in the DCJ algorithm [16]. Two permutations are provided. The first is for the given or actual permutation, and the second permutation is for the desired one.
Each permutation has two chromosomes:

For the first permutation: The first chromosome is the actual structure of the components, and the second chromosome is the inserted components.

For the second permutation: The first chromosome is the desired structure, and the second chromosome consists of the deleted components.
Each permutation is represented by its adjacencies and telomeres. Finally, the DCJ algorithm is applied to the first and second permutations as input.
The DCJ algorithm [17] is modified so that it sorts only toward the first chromosome of the second permutation; this changes the first chromosome of the first permutation. The second chromosome of the second permutation consists of the deleted components, which do not need to be sorted.
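The construction of the two permutations and their adjacency and telomere sets can be sketched as follows. The SortArray layout is assumed to be the one described in step 2, with unused cells holding 0; the function names and the string encoding of extremities ("6h", "2t") are illustrative only, and all components are treated as unsigned.

```python
def permutations_from_sort_array(sort_array):
    """Return (first, second); each permutation is a list of two chromosomes."""
    n_actual = sort_array[0][0]  # number of components in the actual structure
    desired, deleted, inserted = ([x for x in row[1:] if x != 0]
                                  for row in sort_array)
    first = [list(range(1, n_actual + 1)), inserted]  # actual + inserted
    second = [desired, deleted]                       # desired + deleted
    return first, second


def adjacency_sets(permutation):
    """Adjacencies and telomeres of linear chromosomes of unsigned components."""
    sets = []
    for chrom in permutation:
        if not chrom:
            continue
        sets.append({f"{chrom[0]}t"})                # left telomere
        for left, right in zip(chrom, chrom[1:]):
            sets.append({f"{left}h", f"{right}t"})   # adjacency
        sets.append({f"{chrom[-1]}h"})               # right telomere
    return sets


sort_array = [[5, 6, 2, 3, 5, 0], [2, 1, 4, 0, 0, 0], [1, 6, 0, 0, 0, 0]]
first, second = permutations_from_sort_array(sort_array)
print(first)   # [[1, 2, 3, 4, 5], [6]]
print(second)  # [[6, 2, 3, 5], [1, 4]]
```

Placing the inserted components in the first permutation and the deleted ones in the second gives both permutations the same component set, which the DCJ model requires.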
Example
In order to clarify the steps of the algorithm, real RNA secondary structures from the Genomic tRNA Database [18] are used as examples. The first structure is for E. coli tRNA for leucine (A), while the other structure is for E. coli tRNA for alanine (B) (see Fig. 2).
The two structures are presented using a component-based representation:

A = (85, INTERM = {}, INTRAM = {a_{1} = (1, 75, 7), a_{2} = (10, 24, 3), a_{3} = (28, 40, 5), a_{4} = (46, 53, 3), a_{5} = (58, 70, 5)})

B = (76, INTERM = {}, INTRAM = {b_{1} = (1, 66, 7), b_{2} = (10, 22, 4), b_{3} = (27, 39, 5), b_{4} = (49, 61, 5)})

The measure weights are equal to one, and threshold enhancement (ε) is equal to 0.5.

Step 1: Alignment of similar components based on their component lengths and stem lengths.
In this step, the similarity between components is calculated in terms of their component lengths and stem lengths. Similar components are assigned together, beginning with those with the greatest similarity (greedy).
In this example, the similarity between components is shown in the matrix in Table 3. First, the maximum value is one. It appears in more than one entry: d_{1} (a_{3}, b_{3}) and d_{1} (a_{3}, b_{4}) are tied, so the components that are nearest in terms of their position (a_{3} and b_{3}) are assigned together, and the row and column are removed. The same case applies for d_{1} (a_{5}, b_{3}) and d_{1} (a_{5}, b_{4}). The maximum value, which is now 0.83, is searched for once again. Then, a_{2} and b_{2} are assigned, and the row and column are deleted. The next value is 0.39, which is less than the threshold enhancement (ε) value, suggesting that b_{1} must be inserted and that a_{1} must be deleted. Then, a_{4} is deleted because no other components remain from the second structure.

Step 2: Permutation generation
In this step, similar components are mapped according to the process outlined in the previous step. The inserted components and deleted components are then identified (Table 4).

Step 3: Applying the DCJ algorithm.
The permutations are constructed to apply the DCJ algorithm. The first permutation is chr_{1} = {1, 2, 3, 4, 5} and chr_{2} = {6}. The permutations are represented as a sequence of numbers. To differentiate between the components of the first structure and the second one, the researchers represent the second structure’s component i as i + N, where N equals the number of components in the first structure. The second permutation is chr_{1} = {6, 2, 3, 5} and chr_{2} = {1, 4}.
Then, each genome is represented with its adjacencies and telomeres to ensure that the DCJ algorithm can be applied; the first and second permutations are as follows:

The first permutation is: {{1 t}, {1 h, 2 t}, {2 h, 3 t}, {3 h, 4 t}, {4 h, 5 t}, {5 h}, {6 t}, {6 h}}

The second permutation is: {{6 t}, {6 h, 2 t}, {2 h, 3 t}, {3 h, 5 t}, {5 h}, {1 t}, {1 h, 4 t}, {4 h}}
In addition, {1 t}, {1 h, 4 t}, and {4 h} will not be sorted because they are included in the second chromosome. After applying the DCJ algorithm, the number of DCJ operations (3) is obtained, and the sorting scenario is:

{{{6 t}, {1 h, 2 t}, {1 t}, {2 h, 3 t}, {3 h, 4 t}, {4 h, 5 t}, {5 h}, {6 h}},

{{6 t}, {6 h, 2 t}, {1 h}, {1 t}, {2 h, 3 t}, {3 h, 4 t}, {4 h, 5 t}, {5 h}},

{{6 t}, {6 h, 2 t}, {1 h}, {1 t}, {2 h, 3 t}, {3 h, 5 t}, {4 h, 4 t}, {5 h}}}.
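As a cross-check of the count of three operations, the adjacency graph formula of Bergeron et al. [11] can be evaluated by hand for this example. The calculation assumes that the adjacencies of the second permutation are read directly from chr_{1} = {6, 2, 3, 5} and chr_{2} = {1, 4}, and that all components are treated as genes, including those in the second chromosomes. The adjacency graph then has N = 6 components, one cycle (on {2 h, 3 t}), and four odd paths (the single-edge paths on 1 t, 5 h, and 6 t, and one seven-edge path from {6 h} to {4 h}):

```latex
d_{DCJ} = N - \left( c + \frac{p}{2} \right) = 6 - \left( 1 + \frac{4}{2} \right) = 3
```

This agrees with the number of DCJ operations reported for the example.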
Figure 3 shows the given structures following each rearrangement operation, as well as the similarity score with the original structure after applying each rearrangement operation. It also shows the final desired operation.
To demonstrate the effect of the DCJRNA on increasing the similarity between the structures, the CompPSA algorithm [6] is used to calculate the similarity between the structures before and after applying the algorithm. The similarity between the structures is 42% before applying any changes and increases to 94% after applying the DCJRNA algorithm (Fig. 4).
Results and discussion
To test and validate the DCJRNA algorithm, three experiments are conducted on three different datasets.
Datasets
There are three different datasets: the adjust dataset, the accuracy dataset, and the scalability dataset. In this section, each dataset is described in detail.
Adjust dataset
This dataset consists of three real RNA structures named A, B, and C, shown in Fig. 5, which were selected from the NCBI GenBank [16]. It is used to determine the best threshold enhancement (ε) value. There are two cases for RNA similarities. In the first case, dissimilar sequences with exactly or approximately similar structures, structures A and B are used. In the other case, dissimilar structures with exactly or approximately similar sequences, structures A and C are used.
Accuracy dataset
The accuracy dataset is used to calculate the performance and accuracy of the DCJRNA algorithm using different RNA structure sizes. This dataset consists of three pairs of RNA structures that are chosen from the GenBank [19] and Rfam database [20] and differ in size. The first pair of RNA structures consists of two small RNA structures, named D and E, as shown in Fig. 6.
The second pair consists of two medium RNA structures, named F and G, as shown in Fig. 7.
The third pair consists of two large RNA structures, named H and I, as shown in Fig. 8.
Scalability dataset
The scalability dataset is used to calculate the scalability of the time and memory performance of the DCJRNA algorithm using different RNA structure sizes. This dataset consists of 11 RNA structures based on the first RNA structure, A, in the adjust dataset. Then the second structure is a duplicate of the first one, the third structure is a duplicate of the second one, and so on. The RNA structures’ numbers, names, sizes, and number of components are shown in Table 5. The first six RNA structures (J, K, L, M, N, and O) are shown in Fig. 9.
Experiments
Three experiments are conducted: threshold adjustment, performance accuracy, and time and memory performance experiments. The experiments use real and simulated data from [19].
Threshold adjustment experiment
Threshold adjustment experiments are conducted to determine the best threshold enhancement (ε) value that gives the minimum number of rearrangement operations to make the RNA structures exactly the same or approximately similar.
Experiment setup
The adjust dataset is used, and the fixed parameters are W_{P} = 0 and W_{cl} = W_{sl} = 1. Experiments are conducted for 11 values of threshold enhancement (ε) from 0 to 1.
Experiment results
We change the value of the threshold enhancement (ε) over 0.0, 0.1, 0.2, …, 1.0 and obtain the results shown in Table 6 for both cases: similar structures with dissimilar sequences and dissimilar structures with similar sequences. As illustrated in Table 7, when the threshold enhancement (ε) equals 1.0, the resulting RNA structures are exactly the same, but the number of rearrangement operations is greater than for the other values. On the other hand, when the threshold enhancement (ε) equals 0.0 and the desired structure has no more components than the given structure, only the order of the components is changed, and no components are added or deleted.
From the results, it can be seen that when the structures are similar, the best threshold enhancement (ε) equals 0.6, because it yields a high similarity between the structures with a reasonable number of rearrangement operations; the structures after sorting for each threshold enhancement (ε) are shown in Fig. 10. For the same reason, when the structures are dissimilar, the best threshold enhancement (ε) equals 0.8.
Performance accuracy experiment
The performance accuracy experiment is conducted to show the accuracy of the DCJRNA algorithm with different RNA sizes. To test the effect of the DCJRNA algorithm and calculate the similarity between structures, the CompPSA algorithm [6] is used.
Experiment setup
The accuracy dataset is used. Since all three pairs of RNA structures are similar in their structures and dissimilar in their sequences, the threshold enhancement (ε) equals 0.6, and the fixed parameters are W_{P} = 0 and W_{cl} = W_{sl} = 1.
Experiment results
DCJRNA was applied to three pairs of RNA structures: small, medium, and large RNA structures. Each experiment is discussed in detail in the following.
Small pairs of RNA structures

Step 1: Alignment of Similar Components Based on Component Lengths and Stem Lengths
Calculate the similarity between components as shown in Table 8. Then assign similar components together whenever the similarity between them is greater than or equal to the threshold enhancement (ε), which is 0.6. Here, assign D_{1} with E_{1}, D_{3} with E_{4}, and D_{2} with E_{2}, and add E_{3}.

Step 2: Permutation Generation
Construct SortArray and fill it as shown in Table 9. After that, construct the permutations to apply the DCJ algorithm.

Step 3: Apply the Double Cut and Join Algorithm
Construct the permutations to apply the DCJ algorithm. The first permutation is (chr_{1} = {1, 2, 3} and chr_{2} = {6}). (Note: each permutation is represented as a sequence of numbers. To differentiate between the first structure’s components and the second structure’s components, we represent the second structure’s component i as i + N, where N equals the number of components in the first structure.) The second permutation is (chr_{1} = {1, 2, 6, 3} and chr_{2} = {}). To apply the DCJ algorithm, each genome is represented with its adjacencies and telomeres; the first and second permutations are as follows:

The first permutation is: {{1 t}, {1 h, 2 t}, {2 h, 3 t}, {3 h}, {6 t}, {6 h}}

The second permutation is: {{1 t}, {1 h, 2 t}, {2 h, 6 t}, {6 h, 3 t}, {3 h}}
After applying the DCJ algorithm, we obtain the number of the DCJ operations, which is 2, and the sorting scenario is:

{{{1 t}, {1 h, 2 t}, {2 h, 3 t}, {3 h}, {6 t}, {6 h}}, {{1 t}, {1 h, 2 t}, {2 h, 6 t}, {6 h, 3 t}, {3 h}}}
The similarity between the given structures D and E is 58% before applying any changes, while it increases to 85% after applying the DCJRNA algorithm; see Fig. 11.
Medium pairs of RNA structures

Step 1: Alignment of Similar Components Based on Component Lengths and Stem Lengths
Calculate the similarity between components as shown in Table 10. Then, assign F_{7} with G_{6}, F_{6} with G_{5}, F_{4} with G_{3}, F_{3} with G_{2}, and F_{5} with G_{1}; delete F_{1} and F_{2}; and add G_{4}.

Step 2: Permutation Generation
Construct SortArray and fill it as shown in Table 11. After that, construct the permutations to apply the DCJ algorithm.

Step 3: Apply the Double Cut and Join Algorithm
Construct the permutations to apply the DCJ algorithm. The first permutation is (chr_{1} = {1, 2, 3, 4, 5, 6, 7} and chr_{2} = {11}). The second permutation is (chr_{1} = {5, 3, 4, 11, 6, 7} and chr_{2} = {1, 2}). Represent each genome with its adjacencies and telomeres as follows:

The first permutation is: {{1 t}, {1 h, 2 t}, {2 h, 3 t}, {3 h, 4 t}, {4 h}, {5 t}, {5 h, 6 t}, {6 h, 7 t}, {7 h}, {11 t}, {11 h}}

The second permutation is: {{5 t}, {5 h, 3 t}, {3 h, 4 t}, {4 h, 11 t}, {11 h, 6 t}, {6 h, 7 t}, {7 h}, {1 t}, {1 h, 2 t}, {2 h}}
After applying the DCJ algorithm, we obtain the number of the DCJ operations, which is 4, and the sorting scenario is:

{{{1 t}, {1 h, 2 t}, {2 h, 3 t}, {3 h, 4 t}, {4 h}, {5 t}, {5 h, 6 t}, {6 h, 7 t}, {7 h}, {11 t}, {11 h}},

{{1 t}, {1 h, 2 t}, {2 h, 6 t}, {3 h, 4 t}, {4 h}, {5 t}, {5 h, 3 t}, {6 h, 7 t}, {7 h}, {11 t}, {11 h}}

{{1 t}, {1 h, 2 t}, {2 h, 6 t}, {3 h, 4 t}, {4 h, 11 t}, {5 t}, {5 h, 3 t}, {6 h, 7 t}, {7 h}, {11 h}}

{{1 t}, {1 h, 2 t}, {2 h}, {3 h, 4 t}, {4 h, 11 t}, {5 t}, {5 h, 3 t}, {6 h, 7 t}, {7 h}, {11 h, 6 t}}}
The similarity between the given structures F and G is 49% before applying any changes, while it increases to 94% after applying the DCJRNA algorithm; see Fig. 12.
Large pairs of RNA structures

Step 1: Alignment of Similar Components Based on Component Lengths and Stem Lengths
Calculate the similarity between components as shown in Table 4.7. Then, assign H_{1} with I_{2}, H_{2} with I_{3}, H_{3} with I_{4}, H_{4} with I_{5}, H_{5} with I_{6}, H_{6} with I_{7}, H_{7} with I_{8}, H_{8} with I_{9}, H_{9} with I_{10}, H_{10} with I_{11}, and H_{11} with I_{12}, and insert I_{1}.

Step 2: Permutation Generation
Construct SortArray and fill it as shown in Table 12. After that, construct the permutations to apply the DCJ algorithm.

Step 3: Apply the Double Cut and Join Algorithm
Construct the permutations to apply the DCJ algorithm. The first permutation is (chr_{1} = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11} and chr_{2} = {12}). The second permutation is (chr_{1} = {12, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11} and chr_{2} = {}). To apply the DCJ algorithm, represent each genome with its adjacencies and telomeres as follows:

The first permutation is: {{1 t}, {1 h, 2 t}, {2 h, 3 t}, {3 h, 4 t}, {4 h, 5 t}, {5 h, 6 t}, {6 h, 7 t}, {7 h, 8 t}, {8 h, 9 t}, {9 h, 10 t}, {10 h, 11 t}, {11 h}, {12 t}, {12 h}}

The second permutation is: {{12 t}, {12 h, 1 t}, {1 h, 2 t}, {2 h,3 t}, {3 h, 4 t}, {4 h, 5 t}, {5 h, 6 t}, {6 h, 7 t}, {7 h, 8 t}, {8 h, 9 t}, {9 h, 10 t}, {10 h, 11 t}, {11 h}}
After applying the DCJ algorithm, we obtain the number of the DCJ operations, which is 2, and the sorting scenario is:

{{{12 t}, {1 t}, {1 h, 2 t}, {2 h, 3 t}, {3 h, 4 t}, {4 h, 5 t}, {5 h, 6 t}, {6 h, 7 t}, {7 h, 8 t}, {8 h, 9 t}, {9 h, 10 t}, {10 h, 11 t}, {11 h},{12 h}},

{{12 t}, {12 h, 1 t}, {1 h, 2 t}, {2 h,3 t}, {3 h, 4 t}, {4 h, 5 t}, {5 h, 6 t}, {6 h, 7 t}, {7 h, 8 t}, {8 h, 9 t}, {9 h, 10 t}, {10 h, 11 t}, {11 h}}}
The similarity between the given structures H and I is 84% before applying any changes, while it increases to 91% after applying the DCJRNA algorithm; see Fig. 13.
Time & Memory performance experiment
The time and memory performance experiment is conducted to test the performance of the DCJRNA algorithm using different RNA structure sizes.
Experiment setup
The scalability dataset is used, and the fixed parameters are W_{P} = 0 and W_{cl} = W_{sl} = 1. The threshold enhancement (ε) equals 0.6 since the structures are similar. The two structures in each experiment are identical, which means the similarity between them is 100%.
Experiment results
Consider the maximum number of components to be N; the time complexity of step 1 is O(N log N) for the worst case. In each round, the maximum value is searched for, and the row and column related to it are then discarded; as a result, the next search is applied to (N − 1) components, and so on. The time complexity of the second step is O(N), since this step determines the inserted components and the deleted components. The algorithm moves through the entries only once to fill the rows of SortArray, each of which is of size N. For step three, the time complexity is O(N) since the DCJ algorithm is used. Therefore, the worst-case time for the entire algorithm is O(N log N). Table 13 and Fig. 14 confirm the time performance analysis empirically using the scalability dataset. The space requirement for the first step is O(N^{2}) when both structures have the same number of components. For the second step, SortArray takes O(3N) memory. For the third step, the memory requirement is O(2N). Hence, the total space requirement for the DCJRNA algorithm is O(N^{2}). Table 13 shows the time and memory performance results from this experiment, and Fig. 14 shows the corresponding graph.
Conclusion
The DCJRNA algorithm is proposed and is able to describe the evolutionary scenarios that are based on rearrangements of secondary structures rather than sequences. The DCJRNA algorithm is optimal. Since RNA secondary structures reveal more functionality, this algorithm can help in the comparison between the functionality of structures. Real data is used to illustrate the details of the proposed algorithm. It demonstrates that the algorithm is able to detect the minimum number of rearrangement operations in order to make one structure more similar to the other. A rearrangement scenario increases similarity between the first structure and any other structure. This creates an ideal framework for applying rearrangement operations to secondary structures rather than sequences.
The algorithm is applied to noninteracting patterns only. Therefore, future work should extend the algorithm to consider interacting RNA patterns. In addition, the researchers would like to explore other well-defined structures, such as chemical structures, and investigate the application of a similar approach that can define a scenario for changing one structure into another structure. Using the DCJRNA approach, we would also like to develop a tool that can help biologists compare RNA structures to folded RNA structures that are based on the corresponding RNA sequence. Such a tool, which is currently unavailable, would be ideal for biologists, as suggested at the RECOMB-CG conference in 2014.
References
Badr G, Turcotte M. Componentbased matching for multiple interacting RNA sequences. In: 7th International Conference on Bioinformatics Research and Application. Berlin, Heidelberg; 2011. p. 73–86.
Gesell T, Schuster P. Phylogeny and evolution of RNA structure. Methods Mol Biol. 2014;1097:319–78.
Shang L, Gardner D, Xu W, Cannone J, Miranker D, Ozer S, Gutell R. Two accurate sequence, structure, and phylogenetic templatebased RNA alignment systems. BMC Syst Biol. 2013;7(4):1–15.
Keller A, Förster F, Müller T, Dandekar T, Schultz J, Wolf M. Including RNA secondary structures improves accuracy and robustness in reconstruction of phylogenetic trees. Biol Direct. 2010;5:1–12.
Badr G, Alaqel H. Genome rearrangement for RNA secondary structure using a component-based representation: An initial framework. New York: Poster presentation at RECOMB-CG; 2014.
Alturki A, Badr G, Benhidour H. Component-based pairwise RNA secondary structure alignment algorithm, Master Project. Riyadh: King Saud University; 2013.
Needleman SB, Wunsch CD. A general method applicable to the search for similarities in the amino acid sequence of two proteins. J Mol Biol. 1970;48(3):443–53.
Liu J, et al. A method for aligning RNA secondary structures and its application to RNA motif detection. BMC Bioinformatics. 2005;6:89. doi:10.1186/1471-2105-6-89.
Jiang T, Wang L, Zhang K. Alignment of trees: An alternative to tree edit. In: Crochemore M, Gusfield D, editors. Combinatorial Pattern Matching. Berlin, Heidelberg: Springer; 1994. p. 75–86.
Hannenhalli S, Pevzner PA. Transforming cabbage into turnip (polynomial algorithm for sorting signed permutations by reversals). In: 27th Annual ACM Symposium on the Theory of Computing; 1995. p. 178–89.
Bergeron A, Mixtacki J, Stoye J. A unifying view of genome rearrangements. In: Bücher P, Moret BE, editors. Algorithms in Bioinformatics. vol. 4175. Berlin, Heidelberg: Springer; 2006. p. 163–73.
Hannenhalli S, Pevzner PA. Transforming men into mice (polynomial algorithm for genomic distance problem). In: Foundations of Computer Science, 1995 Proceedings, 36th Annual Symposium on Foundations of Computer Science; 1995. p. 581–92.
Dias Z, Meidanis J. Genome rearrangements distance by fusion, fission, and transposition is easy. In: String Processing and Information Retrieval, SPIRE 2001 Proceedings, 8th International Symposium; 13–15 Nov 2001. p. 250–3.
Yancopoulos S, Attie O, Friedberg R. Efficient sorting of genomic permutations by translocation, inversion, and block interchange. Bioinformatics. 2005;21:3340–6.
Christie DA. Genome rearrangement problems, Ph.D. Dissertation. Glasgow: Department of Computer Science, Glasgow University; 1998.
Chan PP, Lowe TM. GtRNAdb: A database of transfer RNA genes detected in genomic sequence. Nucleic Acids Res. 2009;37(Database):D93–D97.
Zhang K, Shasha D. Simple fast algorithms for the editing distance between trees and related problems. SIAM J Comput. 1989;18:1245–62.
Alaqel H, Badr G. Genome rearrangement for RNA secondary structure using a component-based representation: Master Project. Riyadh: King Saud University; 2015.
Benson DA, Cavanaugh M, Clark K, Karsch-Mizrachi I, Lipman DJ, Ostell J, Sayers EW. GenBank. Nucleic Acids Res. 2013;41(Database issue):D36–42.
Burge SW, Daub J, Eberhardt R, Tate J, Barquist L, Nawrocki EP, et al. Rfam 11.0: 10 years of RNA families. Nucleic Acids Res. 2012:1–7.
Acknowledgements
A 2-page abstract has been published in Lecture Notes in Computer Science: Bioinformatics Research and Applications.
Funding
This research was supported by the National Plan for Sciences and Technology, King Saud University, Riyadh, Saudi Arabia (Project No. 12BIO2605–02). The funding institute played no role in the study design or conclusions. The publication costs were covered by the authors.
Availability of data and materials
Data are available upon request.
About this supplement
This article has been published as part of BMC Bioinformatics Volume 18 Supplement 12, 2017: Selected articles from the 12th International Symposium on Bioinformatics Research and Applications (ISBRA-16): bioinformatics. The full contents of the supplement are available online at https://bmcbioinformatics.biomedcentral.com/articles/supplements/volume-18-supplement-12.
Author information
Contributions
GB proposed, conceived, designed, and coordinated the study, helped draft the manuscript, and critically revised the final manuscript. HA designed the benchmark, developed the DCJRNA steps, carried out testing and validation, and helped draft the manuscript. Both authors participated in the analysis and interpretation of results, and both read and approved the final manuscript.
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
About this article
Cite this article
Badr, G.H., Alaqel, H.A. DCJRNA: double cut and join for RNA secondary structures. BMC Bioinformatics 18, 427 (2017). https://doi.org/10.1186/s12859-017-1830-6
DOI: https://doi.org/10.1186/s12859-017-1830-6
Keywords
 Genome Rearrangement
 RNA Secondary Structure
 DCJ
 Similarity Measure
 Sorting Scenario