Table 6 Feature importance

From: Comprehensive machine-learning-based analysis of microRNA–target interactions reveals variable transferability of interaction rules across species

| Feature/Dataset | ca1 | ce1 | ce2 | h1 | h2 | h3 | m1 | m2 | Mean |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Number of GU bp within the seed\(^{n}\) | 100* | 87* | 95* | 29* | 40* | 100* | 28 | 100* | 72 |
| bp in the 1st nt of the seed\(^{b}\) | 63* | 79 | 34* | 70* | 25* | 30* | 27 | 85* | 52 |
| Number of GU bp within the site\(^{n}\) | 42* | 71* | 32* | 100* | 19 | 53* | 35* | 28* | 48 |
| Proportion of G in mRNA at the site region\(^{n}\) | 12 | 74* | 12 | 12 | 36* | 33* | 100* | 37* | 39 |
| Duplex minimum free energy\(^{n}\) | 13* | 45 | 11 | 10 | 100* | 19 | 35* | 52* | 36 |
| Number of bp at location 2–7\(^{n}\) | 42* | 33 | 100* | 12 | 18 | 36* | 13 | 18 | 34 |
| Proportion of GG in mRNA at the site region\(^{n}\) | 30* | 21 | 10 | 12 | 7 | 30* | 79* | 26* | 27 |
| bp in the 4th nt of the seed\(^{b}\) | 8 | 100* | 21 | 10 | 11 | 16 | 2 | 12 | 22 |
| Number of bulges outside the seed\(^{n}\) | 3 | 60* | 6 | 25* | 32* | 9 | 9 | 8 | 19 |
| bp in the 2nd nt of the seed\(^{b}\) | 8 | 42 | 37* | 7 | 11 | 13 | 15 | 6 | 17 |
| bp in the 5th nt of the seed\(^{b}\) | 12 | 27 | 14 | 14 | 6 | 15 | 29* | 12 | 16 |
| Number of GC bp within the seed\(^{n}\) | 7 | 22 | 24* | 18* | 12 | 13 | 11 | 12 | 15 |
| Number of GC bp outside the seed\(^{n}\) | 4 | 27 | 11 | 10 | 27* | 8 | 6 | 5 | 12 |
| Accessibility (nt = 21, len = 10)\(^{n}\) | 9 | 19 | 7 | 6 | 25 | 7 | 12 | 7 | 11 |
| Minimum free energy of the target site + 50 nt flanking regions\(^{n}\) | 8 | 11 | 6 | 7 | 8 | 11 | 36* | 6 | 11 |
| Number of mismatches inside the seed\(^{n}\) | 4 | 3 | 15 | 19* | 0 | 13 | 2 | 9 | 8 |

  1. The table shows 16 features, the union of the top 6 features of each dataset, along with their gain values as computed by XGBoost. Gain values are scaled to the range (0, 100), and the features are ordered by their mean gain across all datasets. For the unscaled version of the table, see Additional file 1: Table S5.
  2. *Belongs to the top 6 features of the dataset
  3. \(^b\)Boolean feature
  4. \(^n\)Numeric feature
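
The construction described in the first footnote (top features per dataset, their union, scaled gains, ordering by mean) can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes the per-dataset gains are already available as plain dictionaries (in XGBoost these could come from `Booster.get_score(importance_type="gain")`), that scaling divides each dataset's gains by that dataset's largest gain, and that a feature missing from a dataset counts as 0. The function names and the toy data are hypothetical.

```python
from statistics import mean


def scale_to_100(gains):
    """Scale one dataset's gains so that its largest gain becomes 100.

    Assumption: per-dataset max scaling; the paper only states the target range.
    """
    top = max(gains.values())
    return {feature: 100.0 * gain / top for feature, gain in gains.items()}


def top_k(gains, k):
    """Return the k features with the highest gain in one dataset."""
    return set(sorted(gains, key=gains.get, reverse=True)[:k])


def importance_table(gains_by_dataset, k=6):
    """Build rows for the union of each dataset's top-k features.

    Each row holds the scaled gain per dataset, the datasets in which the
    feature is among the top k (the starred cells), and the mean scaled gain.
    Rows are ordered by that mean, highest first.
    """
    scaled = {d: scale_to_100(g) for d, g in gains_by_dataset.items()}
    top = {d: top_k(g, k) for d, g in scaled.items()}
    features = set().union(*top.values())

    rows = []
    for feature in features:
        # Assumption: a feature absent from a dataset contributes a gain of 0.
        values = {d: scaled[d].get(feature, 0.0) for d in scaled}
        rows.append({
            "feature": feature,
            "values": values,
            "starred": {d for d in top if feature in top[d]},
            "mean": mean(values.values()),
        })
    rows.sort(key=lambda row: row["mean"], reverse=True)
    return rows


# Hypothetical toy gains for two datasets; not values from the paper.
toy_gains = {
    "d1": {"seed_GU": 4.0, "site_GU": 2.0, "duplex_mfe": 1.0},
    "d2": {"seed_GU": 1.0, "site_GU": 0.5, "duplex_mfe": 5.0},
}
toy_rows = importance_table(toy_gains, k=1)
```

With `k=1`, each toy dataset contributes its single strongest feature, so the result has two rows. In the table above `k` is 6 and the union over eight datasets yields 16 features.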