The scoring function to assess an alignment *A* consists of local item *L*(*A*) and distant item *G*(*A*), i.e., *score*(*A*) = *ω*_{
L
}*L*(*A*) + *G*(*A*), where *ω*_{
L
} denotes weight of local item.

Local item is the weighted sum of mutation score *S*_{
m
}, secondary structure compatibility score *S*_{
ss
}, solvent accessibility score *S*_{
sa
}, gap penalty score *S*_{
g
}, and structural segment compatibility score *S*_{
CLE
}[16], i.e., *L*(*A*) = *ω*_{
m
}*S*_{
m
}(*A*) + *ω*_{
ss
}*S*_{
ss
}(*A*) + *ω*_{
CLE
}*S*_{
CLE
}(*A*) + *ω*_{
sa
}*S*_{
sa
}(*A*) + *ω*_{
g
}*S*_{
g
}(*A*), *S*_{
g
}(*A*) = *ω*_{
go
}*GO* + *ω*_{
ge
}*GE*, where *GO* and *GE* are the number of gap open and gap extending, respectively. The weight of these items are to be determined via training on SALIGN benchmark.

The global item *G*(*A*), which contains the nonlocal interaction information implicitly, is captured by the dDFIRE energy over a “partial” decoy corresponding to the alignment *A*. An ideal way to measure non-local interaction is to calculate dDFIRE energy over a full-length decoy. However, it is usually time-consuming to obtain full-length decoy through running structure-generating tools such as MODELLER [17]. Thus, this strategy is unacceptable since we usually need to sample thousands of alignments. Here, we employ an alternative method to build a partial “decoy” from the alignment. Specifically, only the aligned residues are kept with their coordinates simply copied from the corresponding residues in the template.

This section are organized as follows: We first verify that dDFIRE energy function is constantly good-performing when used to evaluate “partial” decoys. Second, both local item and global item should be normalized using match state size. Third, we prove that global item of the our scoring function is effective to capture distance interaction comparing with contact-preference based scoring functions. Fourth, we show that optimal local score can be used to determine “easy” pairs for which local score item is sufficient while adding global item may lead noise contrarily. Last, we train *ω*_{
L
} on SALIGN [18] benchmark dataset.

### Performance of dDFIRE on partial structure

Since we calculate dDFIRE energy on the “partial” decoy instead of a full-length structure, thus it is necessary to verify whether the “partial dDFIRE energy” still have the power to distinguish native-like decoys. To verify this, we performed experiments on three commonly-used benchmark datasets: LKF [1], Gapless Threading [2] and Rosetta [5]. The datasets contain 178, 200 and 232 proteins, respectively; and for each protein, 100 decoys were generated as control to the native structure. The objective of this experiment is to verify whether the “partial” native structure can be distinguished from the “partial” decoys by dDFIRE.

For both native structures and decoys, the “partial” conformations were simulated through randomly excising a set of residues. At various excising percentage, the ratio of proteins for which the partial native structure has the lowest dDFIRE energy relative to all partial decoys are calculated, and denoted as accuracy in Fig.1. As demonstrated by Fig.1, on LKF and Rosetta benchmarks, dDFIRE performs constantly well even if over 40% residues are excised; and on Gapless Threading benchmark, the performance decreases slightly.

### Score normalization

We also investigate the relationship between the scores with the match state size. Analysis suggests the linearity between local(global) scores and match state size. Specifically, the linear correlation coefficient between local(global) scores and the match state size is –0.762 (–0.968) (See Fig.2 and 3 for details). Thus, it is reasonable to normalize both local and global score through dividing by the match state size.

### Effect of global items

We further investigate the effect of global item. As control, we performed comparison with the traditional way to describe non-local interactions via contact preference matrix [13, 15], i.e, *S*_{
p
} = ∑_{
i
}∑_{
j
}*δ*(*i*, *j*)*Pair*(*A*(*i*), *A*(*j*)), and where *A*(*i*) is the matched residue in the sequence, *δ*(*i*, *j*) indicates whether *i* th and *j* th residue in the template have contact, *P*_{
m
} is the profile vector at the *m* th position and *C* is the contact preference matrix.

We first give some notations before presenting the experiments to examine the effects of global items. For each query-template pair, two typical alignments are generated: the structural alignment *A*_{
R
} generated via running TMalign [19], and the optimal alignment (denoted as *A*_{
L
}) when only local item *L*(*A*) is taken into consideration, i.e. *A*_{
L
} = argmin_{
A
}*L*(*A*)*.* For each alignment *A*, its real quality is measured by TMscore [20], denoted as TM(*A*). We also use *L*(*A*) and *G*(*A*) to denote the local score and global score of *A*, and use *C*(*A*) to denote the contact-preference-based score of *A*.

The 200 query-template pairs in SALIGN [18] dataset are categorized into two classes according to the quality of *A*_{
L
}: (i) TM(*A*_{R}) – TM(*A*_{
L
}) < 0.1, 144 pairs in total; and (ii) TM(*A*_{
R
}) – TM(*A*_{
L
}) ≥ 0.1, 56 pairs in total. Intuitively, class 1 contains the pairs for which a scoring function with local score item alone is sufficient; and class 2 contains the pairs for which local score alone failed. For pairs in class 2, we expect global items can help to distinguish the reference alignment. We verify this by comparing the global score of *A*_{
L
} and *A*_{
R
} : only for pairs satisfying *A*_{
L
} – *A*_{
R
} > 0, it is likely to distinguish the reference alignment. Fig.4 and 5 suggest that for the pairs that local item alone cannot separate *A*_{
L
} from *A*_{
R
} (*L*(*A*_{
L
}) ≤ *L*(*A*_{
R
}) because of *A*_{
L
} = argmin_{
A
}*L*(*A*)), global item of our scoring function can effectively measure the quality of alignments. Specifically, we observed that *G*(*A*_{
R
}) <*G*(*A*_{
L
}) on 52 of 56 pairs. In contrast, the contact-preference-based score does not help improve this situation, only on 20 of 56 pairs, *C*(*A*_{
R
}) <*C*(*A*_{
L
})*.*

### Determining pairs for which local score item alone is sufficient

On 144 of 200 pairs of SALGIN benchmark, local score alone is sufficient to find out a “good” alignment(TM(*A*_{
R
}) – TM(*A*_{
L
}) < 0.1). In fact, in these cases, adding global item may lead to false-negative [21]. We observed that the normalization of local scores help recognizing these “easy” pairs. This is reasonable since local score contains most of the homologous information between the sequence and the template.

Fig.6 implies that TMscore value is strongly correlated with local score (linear correlation coefficient is -0.78). Besides, as the local score increases, *A*_{
L
} becomes worse, i.e., the cumulative average value of TM(*A*_{
R
}) – TM(*A*_{
L
}) increases as the local score increasing (the blue curve in Fig.6). Accordingly, we choose a threshold of local score, denoted as *θ*, to determine whether local score item is sufficient: if *L*(*A*_{
L
}) ≤ *θ*, then *A*_{
L
} is treated as a good alignment. In our method, *θ* = –87.

### Weight training process

Parameter *ω*_{
L
} is trained by classification. For each query-template pair in SALIGN benchmark, one positive alignment *A*_{
p
} and 10 negative alignments *A*_{
n
} are selected (We also have tried other number of negative alignments, similar result is obtained).

Here, we use the reference alignment as positive alignment, i.e. *A*_{
p
} = *A*_{
R
}*.* Negative alignments are chosen from the top 100 alignments returned by dynamic programming. We first cluster these alignments to remove redundancy, and then randomly select alignments satisfying TM(*A*_{
p
}) – TM(*A*_{
n
}) > 0.2.

*ω*_{
L
} should divide *A*_{
n
} and *A*_{
p
} as much as possible. Formally

*ω*_{
L
} = arg max *|H|*

where

*H* = {(*A*_{
p
}, *A*_{
n
}) ∈ *P*|*ω*_{
L
}*L*(*A*_{
p
}) + *G*(*A*_{
p
}) <*ω*_{
L
}*L*(*A*_{
n
}) + *G*(*A*_{
n
})}

= {(*A*_{
p
}, *A*_{
n
}) ∈ *P*|*ω*_{
L
}*L*(*A*_{
p
}) – *L*(*A*_{
n
})) <*G*(*A*_{
n
}) – *G*(*A*_{
p
})}*.*

*P* = {(*A*_{
p
}, *A*_{
n
})*|* *A*_{
p
} and *A*_{
n
} are from the same pair}, since alignments of different query-template pair are not comparable. The classification result is showed in Fig.7, *ω*_{
L
} = 0.0047.

After obtaining the parameter *ω*_{
L
} and *θ*, our threading algorithm can be described informally as follows: given a query-template pair, dynamic programming algorithm is employed to calculate *A*_{
L
} *.* If *L*(*A*_{
L
}) <*θ*, then *A*_{
L
} is considered as a good alignment and returned. Otherwise, local search algorithm is then used to find a better alignment under scoring function *score*(*A*) = *ω*_{
L
}*L*(*A*) + *G*(*A*)*.* The initial alignments used in this step are chosen from the dynamic programming table in the previous step.