FALCON2: a web server for high-quality prediction of protein tertiary structures

Kong, Lupeng; Ju, Fusong; Zhang, Haicang; Sun, Shiwei; Bu, Dongbo

doi:10.1186/s12859-021-04353-8

Software
Open access
Published: 15 September 2021

FALCON2: a web server for high-quality prediction of protein tertiary structures

Lupeng Kong^1,2^na1,
Fusong Ju^1,2^na1,
Haicang Zhang^1,2,
Shiwei Sun^1,2 &
…
Dongbo Bu ORCID: orcid.org/0000-0003-4119-4238^1,2

BMC Bioinformatics volume 22, Article number: 439 (2021) Cite this article

2903 Accesses
4 Altmetric
Metrics details

Abstract

Background

Accurate prediction of protein tertiary structures is highly desired as the knowledge of protein structures provides invaluable insights into protein functions. We have designed two approaches to protein structure prediction, including a template-based modeling approach (called ProALIGN) and an ab initio prediction approach (called ProFOLD). Briefly speaking, ProALIGN aligns a target protein with templates through exploiting the patterns of context-specific alignment motifs and then builds the final structure with reference to the homologous templates. In contrast, ProFOLD uses an end-to-end neural network to estimate inter-residue distances of target proteins and builds structures that satisfy these distance constraints. These two approaches emphasize different characteristics of target proteins: ProALIGN exploits structure information of homologous templates of target proteins while ProFOLD exploits the co-evolutionary information carried by homologous protein sequences. Recent progress has shown that the combination of template-based modeling and ab initio approaches is promising.

Results

In the study, we present FALCON2, a web server that integrates ProALIGN and ProFOLD to provide high-quality protein structure prediction service. For a target protein, FALCON2 executes ProALIGN and ProFOLD simultaneously to predict possible structures and selects the most likely one as the final prediction result. We evaluated FALCON2 on widely-used benchmarks, including 104 CASP13 (the 13th Critical Assessment of protein Structure Prediction) targets and 91 CASP14 targets. In-depth examination suggests that when high-quality templates are available, ProALIGN is superior to ProFOLD and in other cases, ProFOLD shows better performance. By integrating these two approaches with different emphasis, FALCON2 server outperforms the two individual approaches and also achieves state-of-the-art performance compared with existing approaches.

Conclusions

By integrating template-based modeling and ab initio approaches, FALCON2 provides an easy-to-use and high-quality protein structure prediction service for the community and we expect it to enable insights into a deep understanding of protein functions.

Background

Proteins are macromolecules composed of amino acid chains and serve important roles in a wide-range of biological processes including catalysis, immunity, and information transmission. A protein performs its biological functions by folding into specific tertiary structures; thus, the knowledge of protein structure is crucially helpful for the deep understanding of its biological functions [1]. Protein structures can be experimentally determined using X-ray crystallography, nuclear magnetic resonance, or cryo-electron microscopy. These experimental technologies, however, are usually time-consuming and thus cannot catch up with the rapid accumulation of protein sequences. Unlike these experimental technologies, the computational prediction of protein structures purely from amino acid sequences is efficient, and accurate prediction approaches are highly desired.

Protein structure prediction has received extensive studies and a large variety of prediction approaches have already been proposed. These approaches can be divided into two categories, namely, template-based modeling (TBM) approaches and ab initio prediction approaches. For a target protein of interest, TBM approaches first identify its homologous proteins with known structures (called templates) through constructing alignments, and then build tertiary structures with reference to the structure of these homologous proteins [2, 3]. Statistical models [4] and combinatorial optimization techniques [5, 6] are widely used to model and calculate the optimal protein alignment. When homologous templates of the target protein are available and high-quality target-template alignments can be constructed, TBM approaches can accurately predict structures for the target protein.

Unlike the template-based modeling approaches, ab initio prediction approaches do not require the availability of homologous templates for target proteins; instead, these approaches predict protein structures in an ab initio fashion, i.e., constructing structures with the lowest free energy [7]. For instance, Rosetta uses an energy function describing Van der Waals force, hydrophobic effects and hydrogen bonds, and uses the Monte Carlo strategy to find the structure that minimizes the energy function [7]. I-TASSER constructs high-quality structural models through iterative threading assembly refinement [8]. The past decade has witnessed a great breakthrough in ab initio prediction approaches: using the inter-residue distance derived from direct-coupling analysis of homologous protein sequences [9, 10], trRosetta [11] and AlphaFold [12] predict structures of target proteins with significantly improved accuracy. Recently, deep learning has been widely applied to improve the estimation of inter-residue distances and construct structures that satisfy the distance restrictions [13,14,15].

Recent advances have shown that the combination of TBM and ab initio approaches is promising as these two types of approaches have different emphasis [16, 17]. Generally speaking, the TBM approaches exploit structure information of homologous templates of target proteins while the recent ab initio approaches usually exploit the co-evolutionary information carried by homologous proteins. We have designed two approaches to protein structure prediction, including a template-based modeling approach ProALIGN [18] and an ab initio prediction approach ProFOLD [19]. Specifically, ProALIGN uses a deep neural network to learn the patterns of context-specific alignment motifs. These patterns enable ProALIGN to model the dependence among residue pairs and thereafter accurately construct target-template alignments for structure building. Unlike the existing approaches using handcrafted features such as covariance matrix [14, 17], ProFOLD employs an end-to-end framework called CopulaNet to estimate inter-residue distances directly from multiple sequence alignment (MSA) of the target protein.

In the study, we present the FALCON2 server that integrates the TBM approach ProALIGN and ab initio approach ProFOLD. For a target protein, we run these two approaches to predict tertiary structures simultaneously, then employs a quality assessment tool ProQ3D to estimate structure quality, and finally selects the best candidate structures from the prediction results by the two approaches. Using 104 CASP13 targets and 91 CASP14 targets, we evaluated FALCON2 server and performed a systematic analysis and comparison of these two approaches. These experimental results suggest that by integrating TBM and ab initio approaches, FALCON2 can predict protein structures with improved accuracy and efficiency. FALCON2 also has a user-friendly interface and we expect it to enable insights into a deep understanding of protein functions.

Implementation

For a target protein, FALCON2 predicts its structure using a four-step procedure, including constructing MSA of the target protein, executing ProALIGN and ProFOLD simultaneously to yield candidate structure models, and subsequently selecting the best model as the final prediction result. The flowchart of FALCON2 is shown in Fig. 1 and more details of these four steps are described as follows.

Constructing MSA of target protein

Construction of high-quality MSA for target protein is the first and fundamental step of the entire prediction procedure. The quality of MSA has great effects on protein alignment, inter-residue distance estimation, and structure quality assessment [20]. To build high-quality MSA, FALCON2 executes HHblits [21] and HMMsearch [22] to search target protein for its homologous proteins within Uniclust [23] and UniRef [24] sequence databases. For virus or bacterial proteins, the MSA thus constructed might have only a few homologous proteins. In this case, we further search target protein against metagenome databases including Metaclust database [25], BFD database [26], and MGnify database [27].

Template-based modeling using ProALIGN

To predict candidate structures for the target protein, we first construct target-template alignment by running ProALIGN with the constructed MSA as input, select the most likely alignment and template, and then generate candidate structures with reference to template structure.

Unlike the existing TBM approaching using a handcrafted scoring function for alignments, ProALIGN directly learns and infer protein alignment through exploiting the patterns of context-specific alignment motifs. Specifically, ProALIGN represents an alignment as a binary matrix in which the symbol ‘1’ denotes an aligned residue pair and ‘0’ denotes unpaired residues. This representation clearly shapes alignment motifs, e.g., aligned helices are shown as diagonal lines while alignment gaps are shown as two diagonal lines with a shift between them. These alignment motifs are context-specific, thus enabling us to recognize alignment motifs based on sequence contexts. ProALIGN uses a deep convolutional neural network to learn the patterns of alignment motifs.

For each template, ProALIGN applies the neural network to directly infer likelihoods of all possible residue pairs with target protein in their entirety, and then constructs the alignment with maximum likelihood. ProALIGN ranks all templates using a CMO-style [28] scoring function and uses the top 5 templates as references to build candidate structures using Modeller [29].

Ab initio protein structure prediction using ProFOLD

ProFOLD predicts structures for target proteins in ab initio fashion. ProFOLD uses an end-to-end framework, called CopulaNet, to estimate inter-residue distances directly from the MSA of the target protein. The CopulaNet consists of the following three key elements: (1) MSA encoder: according to each homologous protein collected in MSA, ProFOLD uses an MSA-encoder to extract context-specific mutation information of the target residues. (2) Co-evolution aggregator: ProFOLD applies a co-evolution aggregator to calculate residue co-evolution. (3) Inter-residue distances estimator: Subsequently, a distance estimator is used to estimate inter-residue distances according to the acquired residue co-evolution.

Finally, ProFOLD transforms the estimated inter-residue distances into an energy potential, and applies the gradient-descent technique to build structures that minimize this potential function [11, 30]. We run ProFOLD to generate 100 decoys and then use the top 5 decoys with the lowest energy potential as candidate structures for further selection.

Estimating structure quality and selecting the best candidate structure

For the candidate structures predicted by ProALIGN and ProFOLD, FALCON2 estimates structure quality by running ProQ3D [31]. Briefly speaking, ProQ3D assesses the quality of a structure by considering a variety of features, including residue contacts, residue conservation, and the agreement with the predicted secondary structure and solvent accessibility area. ProQ3D also takes into consideration the energy terms calculated by Rosetta [7]. ProQ3D feeds these features into a deep neural network, thus yielding the predicted structure quality, including TM-score, GDT_TS, and lDDT. By running ProQ3D on all candidate structures, FALCON2 obtains the predicted quality value lDDT and normalizes them into Z-score. FALCON2 finally selects the candidate structure with the highest lDDT as the final prediction result.

The user interface for FALCON2 server

FALCON2 provides an easy-to-use web service for protein structure prediction. It accepts protein sequence in FASTA format as input and returns the predicted structure of the target protein. FALCON2 also reports additional information for further analysis, including the constructed MSA, target-template alignments, predicted residue contacts, inter-residue distances, and Ramachandran plots of the predicted structures. FALCON2 provides an intuitive way to visualize the predicted 3D structures. Additional file 1: Figures S8-S13 show examples of the job submission page, job status page, and result visualization page.

Results and discussions

We evaluated the performance of FALCON2 over CASP13 and CASP14 official-defined domain targets, and compared FALCON2 with the best CASP server groups. The prediction results by the CASP13 and CASP14 groups were downloaded from the CASP official website.

For each of the predicted structures, we superimposed it onto the corresponding native structure, calculated TM-score, and used it to measure the quality of the predicted structure. TM-score ranges from 0 to 1, and a high TM-score implies that the two protein structures under consideration are similar. It should be noted that the sequence and template databases used by FALCON2 are regularly updated; however, to make the evaluation and comparison fair, we used only the protein sequences and structures released before CASP13 and CASP14 competitions accordingly.

The performance of FALCON2 over CASP13 targets

We first evaluated the performance of FALCON2 over 104 CASP13 official-defined domain targets, and compared FALCON2 with the best CASP13 server and human groups, including A7D, Zhang, MULTICOM, QUARK, Zhang-Server, and RaptorX-DeepModeller. For each CASP13 domain target, we constructed MSA through searching it against three sequence databases, including Uniclust30 (as of Oct. 2017), Uniref90 (as of Mar. 2018), and metagenome database Metaclust (as of Jan. 2018). We used the structures recorded in the PDB database (as of Apr. 2018) as templates to build candidate structures.

We summarized the prediction performance by FALCON2 and six CASP13 groups in Table 1. As shown in this table, over the 104 CASP13 domains, the average TM-score of the predicted protein structures by FALCON2 is 0.755, higher than the top human groups (A7D: 0.699, Zhang: 0.692, MULTICOM: 0.688) and server groups (QUARK: 0.672, Zhang-Server: 0.671, RaptorX-DeepModeller: 0.653). Specifically, for the 31 FM domains, FALCON2 achieves high prediction quality (average TM-score: 0.665), which is better than the state-of-the-art approaches (A7D: 0.580, Zhang: 0.509, MULTICOM: 0.495).

We further investigated the predicted structures by ProALIGN and ProFOLD individually and analyzed the contributions by these two components to FALCON2. Table 1 suggests that the top 1 predicted structures by ProALIGN and ProFOLD show an average TM-score of 0.644 and 0.736, respectively. By combining these two approaches, FALCON2 achieves an average TM-score of 0.755, which is higher than the two approaches.

In-depth examination suggests that the combination strategy leads to significant performance improvement, especially for the TBM target proteins. Specifically, for TBM-hard targets, the top 1 predicted structures by ProALIGN and ProFOLD show an average TM-score of 0.701 and 0.700, respectively. In contrast, FALCON2 achieves an average TM-score of 0.763.

Table 1 TM-score of the predicted structures for CASP13 targets by FALCON2 and top CASP13 groups

Full size table

The performance of FALCON2 over CASP14 targets

We also evaluated the performance of FALCON2 over 91 CASP14 official-defined domain targets, and compared FALCON2 with the top CASP14 server groups, including Zhang-Server, BAKER-ROSETTASERVER, Yang-Server, tFold, and FEIG-S. For each CASP13 domain target, we constructed MSA through searching it against four sequence databases, including Uniref30 (as of Feb. 2020), Uniref90 (as of Feb. 2020), BFD (as of Mar. 2019), and MGnify90 (as of May. 2019). We used the structures recorded in the PDB database (as of Apr. 2020) as templates to build candidate structures.

Table 2 TM-score of the predicted structures for CASP14 targets by FALCON2 and top CASP14 server groups

Full size table

As shown in Table 2, the average TM-score of the top 1 predicted structures by FALCON2 is 0.712, which is better than all CASP14 server groups (Zhang-Server: 0.706, BAKER-ROSETTASERVER: 0.655, Yang-Server: 0.657, tFold: 0.660). The superiority of FALCON2 is much clearer for the FM target proteins. The average TM-score of the top 1 predicted structures for FM target by FALCON2 is 0.562, which is better than Zhang-Server (0.555), and much better than the other CASP14 server groups (BAKER-ROSETTASERVER: 0.378, Yang-Server: 0.438, tFold: 0.456).

Overall, these results lead to similar observations that have been obtained on CASP13 target proteins, i.e., the combination strategy has higher prediction accuracy than the individual prediction approach, especially for the TBM target proteins.

Analyzing the contributions by ProFOLD and ProALIGN to FALCON2

In order to figure out the contribution of ProFOLD and ProALIGN to FALCON2, we performed the head-to-head comparison of the two approaches on CASP13 and CASP14 targets. Figure 2 shows that in general, ProFOLD has better prediction performance than ProALIGN and in some cases, the prediction structures by ProALIGN are better than ProFOLD.

To further examine in what cases ProALIGN outperforms ProFOLD, we divided the CASP13 and CASP14 target proteins into groups according to the availability of high-quality templates. In particular, for each target protein, we calculate the TM-score between its native structure and the most similar template. Next, we divide the target proteins into four groups with calculated TM-score within [0.00, 0.40], (0.40, 0.60], (0.60, 0.80], and (0.80, 1.00], respectively. As shown in Table 3, for the targets in the [0.00, 0.40], (0.40, 0.60], and (0.60, 0.80] groups, ProFOLD generates higher-quality protein structure than ProALGIN. In contrast, for the targets in (0.80, 1.00] group, ProALIGN outperforms ProFOLD. These results suggest that when high-quality templates are available, ProALIGN is superior to ProFOLD and in other cases, ProFOLD shows better performance.

Table 3 Quality of the predicted structures by ProALIGN, ProFOLD and FALCON2 on CASP13 and CASP14 target proteins

Full size table

Case studies

Using two proteins (T0950 and T0966) as representatives, we demonstrated the details of the prediction procedure of FALCON2, including the constructed MSA, predicted inter-residue distances, the selected templates, and the constructed structure models.

Case study 1: CASP13 target T0966

The target protein T0966 has a total of 492 residues, which was classified as TBM-hard in the CASP13 competition. The protein is a MARTX toxin effector domain from Vibrio vulnificus CMCP [32], and its native structure has already been solved and deposited in PDB as 5w6iA.

Table 4 Precision of the contacts predicted by ProFOLD for T0966

Full size table

For T0966, FALCON2 constructed an MSA through searching it against three sequence databases, including Uniclust30 (as of Oct. 2017), Uniref90 (as of Mar. 2018), and Metaclust (as of Jan. 2018). The constructed MSA contains a total of 126 homologous proteins, implying that its quality is relatively lower. Next, FALCON2 executed ProFOLD to predict inter-residue distances for this target. However, due to the low-quality MSA, the accuracy of the predicted inter-residue contacts is relatively lower. As shown in Table 4, the prediction accuracy of top L long-range residue contacts is only 0.443. The predicted inter-residue distances also deviate significantly from the native one (Fig. 3). Consequently, the predicted structure by ProFOLD achieved a TM-score of only 0.310.

Table 5 The top 5 templates reported by ProALIGN for target T0966

Full size table

Table 6 The top 5 predicted structures reported by ProQ3D for target T0966

Full size table

For this target, FALCON2 also executed ProALIGN to yield candidate structures. Table 5 shows the top 5 templates reported by ProALIGN. In the case of template 2ebhX, its sequence identity with T0966 is 25.4%, and according to this template, ProALIGN constructed a target-template alignment with a high confidence score (0.699). In fact, this template is substantially similar to the native structure (TM-score: 0.820), and using this template to construct target-template alignment, ProALIGN yielded a high-quality structure with TM-score 0.833. Finally, according to the predicted lDDT reported by ProQ3D (Table 6), FALCON2 selected the structure predicted by ProALIGN as the final prediction result (TM-score: 0.833; Fig. 4).

Case study 2: CASP13 target T0950

Table 7 Precision of the inter-residue contacts predicted by ProFOLD for target T0950

Full size table

The target protein T0950 has a total of 353 residues, which was classified as FM in the CASP13 competition. T0950 is a membrane protein from Photorhabdus luminescens [33], and its native structure has already been solved and deposited in PDB as 6ek4A.

Table 8 The top 5 templates reported by ProALIGN for target T0950

Full size table

Table 9 The top 5 predicted structures reported by ProQ3D for target T0950

Full size table

For T0950, FALCON2 constructed an MSA through searching it against three sequence databases, including Uniclust30 (as of Oct. 2017), Uniref90 (as of Mar. 2018), and Metaclust (as of Jan. 2018). The constructed MSA contains a total of 462 homologous proteins, implying that its quality is relatively high. Next, FALCON2 executed ProFOLD to predict inter-residue distances for this target. Using the high-quality MSA, ProFOLD yielded accurate distance prediction. As shown in Table 7, the prediction accuracy of top L long-range residue contacts reaches 0.675. The predicted inter-residue distance matrix is also similar to the native one (Fig. 5). Consequently, the predicted structure by ProFOLD achieved a TM-score of 0.730.

For this target, FALCON2 also executed ProALIGN to yield candidate structures. Table 8 shows the top 5 templates reported by ProALIGN. As no homologous template has been deposited in the template database used in this study, ProALIGN failed to find a similar template and thus cannot generate high-quality structure models. Finally, according to the predicted lDDT reported by ProQ3D (Table 9), FALCON2 selected a structure predicted by ProFOLD as the final prediction result (TM-score: 0.730; Fig. 6).

Conclusion

In this study, we present FALCON2, a web server for high-quality protein structure prediction. Using CASP13 and CASP14 target proteins as representatives, we demonstrate that FALCON2 can successfully predict structures for both TBM and FM target proteins when high-quality MSA can be obtained. We also observed that TBM and ab initio approaches have different emphasis, and the combination of these two types of approaches can lead to improved prediction accuracy. FALCON2 provides a user-friendly graphic interface, making it easy to use for the community. We expect FALCON2 web service to enable insights into the structure and function of proteins, especially the proteins with important roles in health and disease.

Availability and requirements

Project name: FALCON2 server

Project home page: http://protein.ict.ac.cn/FALCON2

Operating system(s): Windows, Linux, Mac

Programming language: Python, PHP, C++

License: GPL

Any restrictions to use by non-academics: license needed

Availability of data and materials

The FALCON2 web server is freely available online at http://protein.ict.ac.cn/FALCON2.

The data generated and analyzed during the current study are available at http://protein.ict.ac.cn/FALCON2/experiment_data/falcon2data.tgz.

Abbreviations

TBM:: Template-based modeling method
CASP:: Critical assessment of protein structure prediction
MSA:: Multiple sequence alignment
lDDT:: local Distance Difference Test
PSSM:: Position specific scoring matrix

References

Branden CI, Tooze J. Introduction to protein structure. New York: Garland Science; 2012.
Book Google Scholar
Källberg M, Wang H, Wang S, Peng J, Wang Z, Lu H, Xu J. Template-based protein structure modeling using the RaptorX web server. Nat Protoc. 2012;7(8):1511–22.
Article PubMed PubMed Central Google Scholar
Wang C, Zhang H, Zheng W-M, Xu D, Zhu J, Wang B, Ning K, Sun S, Li SC, Bu D. FALCON@ home: a high-throughput protein structure prediction server based on remote homologue recognition. Bioinformatics. 2016;32(3):462–4.
Article CAS PubMed Google Scholar
Yang Y, Faraggi E, Zhao H, Zhou Y. Improving protein fold recognition and template-based modeling by employing probabilistic-based matching between predicted one-dimensional structural properties of query and corresponding native properties of templates. Bioinformatics. 2011;27(15):2076–82.
Article CAS PubMed PubMed Central Google Scholar
Ma J, Peng J, Wang S, Xu J. A conditional neural fields model for protein threading. Bioinformatics. 2012;28(12):59–66.
Article CAS Google Scholar
Zhu J, Wang S, Bu D, Xu J. Protein threading using residue co-variation and deep learning. Bioinformatics. 2018;34(13):263–73.
Article Google Scholar
Rohl CA, Strauss CE, Misura KM, Baker D. Protein structure prediction using Rosetta. Methods Enzymol. 2004;383:66–93.
Article CAS PubMed Google Scholar
Yang J, Zhang Y. I-TASSER server: new development for protein structure and function predictions. Nucleic Acids Res. 2015;43(W1):174–81.
Article Google Scholar
Morcos F, Pagnani A, Lunt B, Bertolino A, Marks DS, Sander C, Zecchina R, Onuchic JN, Hwa T, Weigt M. Direct-coupling analysis of residue coevolution captures native contacts across many protein families. Proc Natl Acad Sci. 2011;108(49):1293–301.
Article Google Scholar
Zhang H, Gao Y, Deng M, Wang C, Zhu J, Li SC, Zheng W-M, Bu D. Improving residue-residue contact prediction via low-rank and sparse decomposition of residue correlation matrix. Biochem Biophys Res Commun. 2016;472(1):217–22.
Article CAS PubMed Google Scholar
Yang J, Anishchenko I, Park H, Peng Z, Ovchinnikov S, Baker D. Improved protein structure prediction using predicted interresidue orientations. Proc Natl Acad Sci. 2020;117(3):1496–503.
Article CAS PubMed PubMed Central Google Scholar
Senior AW, Evans R, Jumper J, Kirkpatrick J, Sifre L, Green T, Qin C, Žídek A, Nelson AW, Bridgland A, et al. Improved protein structure prediction using potentials from deep learning. Nature. 2020;577(7792):706–10.
Article CAS Google Scholar
Wang S, Sun S, Li Z, Zhang R, Xu J. Accurate de novo prediction of protein contact map by ultra-deep learning model. PLoS Comput Biol. 2017;13(1):1005324.
Article Google Scholar
Li Y, Hu J, Zhang C, Yu D-J, Zhang Y. ResPRE: high-accuracy protein contact prediction by coupling precision matrix with deep residual neural networks. Bioinformatics. 2019;35(22):4647–55.
Article CAS PubMed PubMed Central Google Scholar
Shrestha R, Fajardo E, Gil N, Fidelis K, Kryshtafovych A, Monastyrskyy B, Fiser A. Assessing the accuracy of contact predictions in CASP13. Proteins: Struct Funct Bioinf. 2019;87(12):1058–68.
Article CAS Google Scholar
Xu J, Wang S. Analysis of distance-based protein structure prediction by deep learning in CASP13. Proteins: Struct Funct Bioinf. 2019;87(12):1069–81.
Article CAS Google Scholar
Xu J. Distance-based protein folding powered by deep learning. Proc Natl Acad Sci. 2019;116(34):16856–65.
Article CAS PubMed PubMed Central Google Scholar
Kong L, Ju F, Zheng W-M, Sun S, Xu J, Bu D. ProALIGN: directly learning alignments for protein structure prediction via exploiting context-specific alignment motifs. 2020;bioRxiv
Ju F, Zhu J, Shao B, Kong L, Liu T-Y, Zheng W-M, Bu D. CopulaNet: learning residue co-evolution directly from multiple sequence alignment for protein structure prediction. Nat Commun. 2021;12(1):2535.
Article CAS PubMed PubMed Central Google Scholar
Zhang C, Zheng W, Mortuza S, Li Y, Zhang Y. DeepMSA: constructing deep multiple sequence alignment to improve contact prediction and fold-recognition for distant-homology proteins. Bioinformatics. 2020;36(7):2105–12.
Article CAS PubMed Google Scholar
Remmert M, Biegert A, Hauser A, Söding J. HHblits: lightning-fast iterative protein sequence searching by HMM-HMM alignment. Nat Methods. 2012;9(2):173–5.
Article CAS Google Scholar
Johnson LS, Eddy SR, Portugaly E. Hidden Markov model speed heuristic and iterative HMM search procedure. BMC Bioinform. 2010;11(1):1–8.
Article Google Scholar
Mirdita M, von den Driesch L, Galiez C, Martin MJ, Söding J, Steinegger M. Uniclust databases of clustered and deeply annotated protein sequences and alignments. Nucleic Acids Res. 2017;45(D1):170–6.
Article Google Scholar
Suzek BE, Huang H, McGarvey P, Mazumder R, Wu CH. UniRef: comprehensive and non-redundant UniProt reference clusters. Bioinformatics. 2007;23(10):1282–8.
Article CAS PubMed Google Scholar
Steinegger M, Söding J. Clustering huge protein sequence sets in linear time. Nat Commun. 2018;9(1):1–8.
Article CAS Google Scholar
Steinegger M, Mirdita M, Söding J. Protein-level assembly increases protein sequence recovery from metagenomic samples manyfold. Nat Methods. 2019;16(7):603–6.
Article CAS PubMed Google Scholar
Mitchell AL, Almeida A, Beracochea M, Boland M, Burgin J, Cochrane G, Crusoe MR, Kale V, Potter SC, Richardson LJ, et al. MGnify: the microbiome analysis resource in 2020. Nucleic Acids Res. 2020;48(D1):570–8.
Google Scholar
Di Lena P, Fariselli P, Margara L, Vassura M, Casadio R. Fast overlapping of protein contact maps by alignment of eigenvectors. Bioinformatics. 2010;26(18):2250–8.
Article PubMed Google Scholar
Webb B, Sali A. Comparative protein structure modeling using MODELLER. Curr Protoc Bioinform. 2016;54(1):5–6.
Article Google Scholar
Zhang C, Liu S, Zhou Y. Accurate and efficient loop selections by the DFIRE-based all-atom statistical potential. Protein Sci. 2004;13(2):391–9.
Article CAS PubMed PubMed Central Google Scholar
Uziela K, Menendez Hurtado D, Shu N, Wallner B, Elofsson A. ProQ3D: improved model quality assessments using deep learning. Bioinformatics. 2017;33(10):1578–80.
CAS PubMed Google Scholar
Biancucci M, Minasov G, Banerjee A, Herrera A, Woida PJ, Kieffer MB, Bindu L, Abreu-Blanco M, Anderson WF. Gaponenko V et al The bacterial Ras/Rap1 site-specific endopeptidase RRSP cleaves Ras through an atypical mechanism to disrupt Ras-ERK signaling. Sci Signal. 2018;11(550):eaat8335.
Article PubMed PubMed Central Google Scholar
Bräuning B, Bertosin E, Praetorius F, Ihling C, Schatt A, Adler A, Richter K, Sinz A, Dietz H, Groll M. Structure and mechanism of the two-component α-helical pore-forming toxin YaxAB. Nat Commun. 2018;9(1):1–14.
Article Google Scholar

Download references

Acknowledgements

We greatly appreciate Shuaicheng Li for fruitful discussions on this study.

Funding

We would like to thank the National Key Research and Development Program of China (2020YFA0907000), and the National Natural Science Foundation of China (31671369, 31770775, 62072435) for providing financial supports for this study and publication charges.

Author information

Lupeng Kong and Fusong Ju have contributed equally to this work

Authors and Affiliations

Key Lab of Intelligent Information Processing, Big-Data Academy, Institute of Computing Technology, Chinese Academy of Sciences, 100190, Beijing, China
Lupeng Kong, Fusong Ju, Haicang Zhang, Shiwei Sun & Dongbo Bu
University of Chinese Academy of Sciences, 100049, Beijing, China
Lupeng Kong, Fusong Ju, Haicang Zhang, Shiwei Sun & Dongbo Bu

Authors

Lupeng Kong
View author publications
You can also search for this author in PubMed Google Scholar
Fusong Ju
View author publications
You can also search for this author in PubMed Google Scholar
Haicang Zhang
View author publications
You can also search for this author in PubMed Google Scholar
Shiwei Sun
View author publications
You can also search for this author in PubMed Google Scholar
Dongbo Bu
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

D.B. conceived the study. L.K. and J.F. designed and implemented FALCON2. L.K. and J.F. performed the experiments, and L.K., J.F., D.B., H.Z and S.S analyzed the experimental results. L.K. and D.B. wrote and revised the manuscript. All authors read and approved the manuscript.

Corresponding author

Correspondence to Dongbo Bu.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1

. Files containing additional implementation detail and additional tables of results.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Reprints and permissions

About this article

Cite this article

Kong, L., Ju, F., Zhang, H. et al. FALCON2: a web server for high-quality prediction of protein tertiary structures. BMC Bioinformatics 22, 439 (2021). https://doi.org/10.1186/s12859-021-04353-8

Download citation

Received: 24 June 2021
Accepted: 01 September 2021
Published: 15 September 2021
DOI: https://doi.org/10.1186/s12859-021-04353-8

FALCON2: a web server for high-quality prediction of protein tertiary structures

Abstract

Background

Results

Conclusions

Background

Implementation

Constructing MSA of target protein

Template-based modeling using ProALIGN

Ab initio protein structure prediction using ProFOLD

Estimating structure quality and selecting the best candidate structure

The user interface for FALCON2 server

Results and discussions

The performance of FALCON2 over CASP13 targets

The performance of FALCON2 over CASP14 targets

Analyzing the contributions by ProFOLD and ProALIGN to FALCON2

Case studies

Case study 1: CASP13 target T0966

Case study 2: CASP13 target T0950

Conclusion

Availability and requirements

Availability of data and materials

Abbreviations

References

Acknowledgements

Funding

Author information

Authors and Affiliations

Contributions

Corresponding author

Ethics declarations

Ethics approval and consent to participate

Consent for publication

Competing interests

Additional information

Publisher's Note

Supplementary Information

Additional file 1

Rights and permissions

About this article

Cite this article

Share this article

Keywords

BMC Bioinformatics

Contact us