Artificial neural network models for prediction of intestinal permeability of oligopeptides
© Jung et al; licensee BioMed Central Ltd. 2007
Received: 30 March 2007
Accepted: 11 July 2007
Published: 11 July 2007
Oral delivery is a highly desirable property for candidate drugs under development. Computational modeling could provide a quick and inexpensive way to assess the intestinal permeability of a molecule. Although there have been several studies aimed at predicting the intestinal absorption of chemical compounds, there have been no attempts to predict intestinal permeability on the basis of peptide sequence information. To develop models for predicting the intestinal permeability of peptides, we adopted an artificial neural network as a machine-learning algorithm. The positive control data consisted of intestinal barrier-permeable peptides obtained by the peroral phage display technique, and the negative control data were prepared from random sequences.
The capacity of our models to make appropriate predictions was validated by statistical indicators including sensitivity, specificity, enrichment curve, and the area under the receiver operating characteristic (ROC) curve (the ROC score). The training and test set statistics indicated that our models were of strikingly good quality and could discriminate between permeable and random sequences with a high level of confidence.
We developed artificial neural network models to predict the intestinal permeabilities of oligopeptides on the basis of peptide sequence information. Both binary and VHSE (principal components score Vectors of Hydrophobic, Steric and Electronic properties) descriptors produced statistically significant training models; the models with simple neural network architectures showed slightly greater predictive power than those with complex ones. We anticipate that our models will be applicable to the selection of intestinal barrier-permeable peptides for generating peptide drugs or peptidomimetics.
Successful drug development requires not only the optimization of pharmacological specificity and potency, but also a method for efficient drug delivery to the target site. Many drug candidates fail to achieve their therapeutic potentials because of poor bioavailability. Oral drug delivery avoids the pain and discomfort associated with injections and also the risk of accidents and infections caused by misuse of needles. For these reasons, the oral route is by far the easiest and most convenient mode of drug administration, and oral availability is a highly desirable property for candidate drugs under development. However, before an orally administered drug can reach its site of action, it must first cross the intestinal epithelial barrier by passive diffusion, carrier- or receptor-mediated uptake or active transport and enter the systemic circulation. Molecules with low permeability and/or absorption rates are not suitable for oral administration, and there has been great interest in finding ways to avoid producing potent but non-permeating molecules. Several screening paradigms for evaluating drug absorption have been employed to enhance the probability of success through the stages of drug development, and a number of methods have been developed to assess oral availability using in vivo, in vitro, in situ or in silico models.
The most widely-accepted in vitro absorption model uses Caco-2 cell monolayers. Because Caco-2 cells express several types of transporter proteins, both the passive and active transport potentials of a compound can be investigated [5–7] and several experimental methods have been developed using this model to test the absorption of drugs by the human intestine [8–10]. However, these experimental cell-system methods are rather labor-intensive and not easily applicable to high-throughput screening. As an alternative approach, computational modeling can provide a quick and inexpensive way of evaluating the intestinal permeability of a compound before synthesis. This enables us to prioritize molecules for in vitro and in vivo studies and improve the overall properties of the compounds that proceed along the drug discovery pathway. A number of models for Caco-2 cell permeability or human intestinal absorption have been reported that predict the oral absorption properties of drugs, mostly limited to small organic molecules [11–14].
Rapid developments in biotechnology and peptide synthesis have made it possible to exploit the unique pharmacological activities of peptides; thousands of different peptides have been designed, synthesized and subjected to a range of screening procedures and biological assays. To analyze the vast amounts of biological data on peptides, quantitative structure-activity relationship (QSAR) models have been successfully employed. For example, several QSAR models have been developed to predict the peptide binding activities of target proteins, resulting in good correlations with in vitro data [15–19], and these have proved useful in generating leads through the screening of large peptide libraries. It is surprising that QSAR models have seldom been applied to other pharmacological properties of peptides, especially since failure to comply with pharmacological demands is likely to terminate the development of a candidate peptide drug [20, 21]. Although a few previous QSAR studies have investigated the affinities of peptides to intestinal transport proteins, the machine-learning processes were performed not on the basis of sequence information but of chemical structure [22–24]. There have been a few reports on the prediction of intestinal absorption of non-peptide compounds from molecular structure. Wessel et al. reported a QSAR study on a set of 86 compounds with known percentage human intestinal absorption (%HIA) values. To obtain a predictive model, they used a neural network to map molecular structure descriptors to %HIA. Polley et al. applied Bayesian regularized neural networks to develop a statistically significant QSAR model for human intestinal absorption.
In this work, we report the first QSAR models to predict the intestinal permeabilities of peptides on the basis of their sequences. A group of peptides crossing the intestinal barrier were selected from a random phage-peptide library using the 'peroral phage display technique', a newly developed in vivo technique in which a phage-peptide library is administered orally to rats and the intestinal barrier-permeable phages are collected from the internal organs. Using the sequence set of the selected phage-displayed peptides, we constructed an artificial neural network model to evaluate the intestinal permeabilities of peptides using various descriptors of the physicochemical properties and occurrence of the amino acid residues.
Table 1. Comparison of relative hydrophilicity and hydrophobicity of amino acids for the real data sets.
Table 2. Prediction accuracy for models with various network architectures (columns: N hidden; 1:1 data set; 1:3 data set).
To test the effect of the number of objects on overtraining, we also constructed neural network models for the 1:3 data set, in which the negative control data set was three times larger than the positive one; Table 2 summarizes the predictive capacity of these models. For models with the same network architecture, the differences in ROC scores between the training and test sets were generally smaller in the 1:3 than in the 1:1 data set. This result shows that the performance of the model is less affected by overtraining when the size of the data set is increased.
Table 3. The results of validation for models with network architecture (7 × 20)-0-1. 1:1 data set: 0.841 ± 0.002, 0.760 ± 0.005.
Table 4. Comparison of truth table statistics for the test sets for two models (data sets: 1:1 and 1:3; architectures: (7 × 20)-0-1 and (7 × 8)-0-1).
We have developed models for predicting the intestinal permeabilities of peptides. Our models produced nearly identical statistics for multiple training runs and efficiently discriminated among peptides on the basis of intestinal permeability. As shown in the decoy set analysis, models trained with random sequences had no prediction capacity, but the peptide sequences collected from the in vivo experiment served well as positive control sets for the QSAR models.
Although we tried to optimize the network architecture and to minimize overtraining and related problems during development, some factors in our model might cause prediction errors. We assumed that randomly selected heptapeptide sequences can be used as negative controls. This assumption can be rationalized on the grounds that heptapeptides with random sequences are very likely to be intestinal barrier-impermeable, because the sequences obtained from the in vivo experiment covered only a very small portion of the entire 'heptapeptide space'. Because the positive controls were obtained experimentally whereas the negative controls were only assumed to be impermeable, our model predicts permeable peptides more reliably than impermeable ones. A model based on the 1:3 data set is therefore preferable for eliminating intestinal barrier-impermeable peptides, provided that the random sequences chosen as negative controls do indeed show negligible intestinal absorption; as shown in Table 4, the specificity of the 1:3 data set models is superior to that of the 1:1 data set models. Consequently, given the relative reliability of the two classes of data, our model is best suited to the selection of permeable peptides.
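The sensitivity and specificity compared above can be computed from a truth table, as in the following minimal Python sketch. The 0.5 decision threshold, the function names and the example scores are assumptions made for illustration; they are not taken from the original analysis.

```python
def truth_table(labels, scores, threshold=0.5):
    """Count true/false positives and negatives.

    labels: 1 for permeable (positive control), 0 for random (negative control).
    scores: network outputs; a score >= threshold is called 'permeable'.
    The threshold of 0.5 is an assumed cut-off, not a value from the paper.
    """
    tp = fp = tn = fn = 0
    for label, score in zip(labels, scores):
        predicted_positive = score >= threshold
        if label == 1 and predicted_positive:
            tp += 1
        elif label == 1:
            fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def sensitivity(tp, fn):
    """Fraction of permeable peptides that are recognized as permeable."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Fraction of random peptides that are rejected as impermeable."""
    return tn / (tn + fp)


# Made-up example: three positives and four negatives.
example_labels = [1, 1, 1, 0, 0, 0, 0]
example_scores = [0.8, 0.7, 0.3, 0.6, 0.2, 0.1, 0.4]
tp, fp, tn, fn = truth_table(example_labels, example_scores)
```

A high specificity means that few random sequences are mistaken for permeable ones, which is the property that matters when the model is used to discard impermeable candidates.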
In this work, we developed models for predicting the intestinal permeabilities of peptides using a feed-forward neural network as the training algorithm. As shown in Table 2, the feed-forward neural network has some problems, including overfitting, optimization of the network architecture and selection of the best QSAR model. To avoid shortcomings such as the overtraining that appears to have occurred in our models with greater network complexity, robust QSAR models based on a Bayesian regularized neural network would be more desirable [26, 29]. Developing QSAR models with more robust methods such as the Bayesian neural network would be a fruitful direction for future work, in terms of both the predictive ability and the robustness of models for the intestinal permeability of peptides.
Burden et al. noted that property-based descriptors require a more flexible modeling method than binary descriptors to take account of larger contributions from cross terms or nonlinearity. However, our models produced very similar results on the discrimination of intestinal permeability using binary and VHSE descriptors; no statistically significant difference was observed in their ROC scores for the test sets.
The models most widely used for predicting passive intestinal absorption are drug-likeness prediction models such as the Rule of 5 model introduced by Lipinski et al.; they have the advantages of being simple, easy to interpret and quick to compute. In general, such approaches are formulated on the basis of group additive methods, so the predicted intestinal permeability is similar for peptides consisting of the same numbers and types of amino acids, even though they may have different sequences. However, our analysis showed that the intestinal permeability of a peptide depends on its sequence (Figure 2) and cannot be explained simply by using the drug-likeness prediction models of passive transport. Because of its large size, the peptide-phage complex is expected to be transported across the intestinal barrier by other mechanisms such as transcytosis.
Systemic delivery of macromolecules via the oral pathway remains one of the most challenging problems in the drug delivery field, and transcytosis may be a mechanism for transporting therapeutic agents across the intestinal barrier. If a carrier molecule, either a natural ligand or an antibody binding to a transcytotic receptor on the intestinal epithelium, is covalently bound to a therapeutic agent by a short linker, the conjugate can bind to the cognate receptor and undergo vesicular trafficking across the intestinal barrier. This 'carrier-drug conjugate' approach has been tried using drugs conjugated to immunoglobulin G (IgG), lactoferrin, transferrin or folic acid, all of which have cognate transcytosis receptors in enterocytes. To utilize the transcytosis mechanism for an oral drug delivery system, it is essential to identify ligands that can bind to the receptors and facilitate efficient transcytosis across the intestinal barrier. Our QSAR study on the selection of intestinal barrier-permeable peptides should be applicable to the development of peptide 'carriers' for delivering large molecules such as proteins and drugs.
We used artificial neural networks to develop the first models for predicting the intestinal permeabilities of peptides on the basis of sequence information. The high quality models obtained were capable of making reliable predictions. These models are expected to find applications in the selection of intestinal barrier-permeable peptides from large peptide libraries, and the selected peptides might be used to facilitate the transport of large molecules across the intestinal barrier.
Preparation of intestinal barrier-permeable peptides
The positive control data set of peptides that can cross the intestinal barrier was obtained from the 852 heptapeptide sequences identified by the peroral phage display experiments. The negative control data set was generated from random sequences that had the same frequencies of occurrence of each amino acid residue as in the Ph.D.-C7C phage library. The random sequences were then compared with the positive control data and any common sequences were removed from the negative control data. For the 1:1 data sets, the positive and negative control data comprised the same number of peptides. To evaluate the effect of data size, we also generated a second data set containing three times as many negative controls as positive controls; 2556 random heptapeptide sequences were used as the negative control in this 1:3 data set. About 80% of each data set was used for network training and the remaining data were used as the test set to validate the trained network.
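The data preparation described above can be sketched roughly as follows in Python. This is an illustration under stated assumptions, not the original procedure: the residue frequencies of the Ph.D.-C7C library are not given in the text, so the sketch defaults to uniform sampling (pass `weights` to change this); the sketch also keeps the negative sequences unique, which the text does not state; the function names, seeds and placeholder peptides are hypothetical.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 coded amino acids


def random_heptapeptides(n, exclude, weights=None, seed=0):
    """Draw n distinct random 7-mers that do not occur in `exclude`.

    weights: optional per-residue frequencies (same order as AMINO_ACIDS).
    None means uniform sampling, which is a placeholder assumption because
    the library frequencies are not listed in the text.
    """
    rng = random.Random(seed)
    exclude = set(exclude)
    negatives = set()
    while len(negatives) < n:
        peptide = "".join(rng.choices(AMINO_ACIDS, weights=weights, k=7))
        if peptide not in exclude:  # drop sequences shared with the positives
            negatives.add(peptide)
    return sorted(negatives)


def split_train_test(items, train_fraction=0.8, seed=0):
    """Shuffle and split into roughly 80% training and 20% test data."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


# Placeholder positives; the real set held 852 phage-derived heptapeptides.
positives = ["ACDEFGH", "KLMNPQR"]
negatives = random_heptapeptides(3 * len(positives), exclude=positives)  # 1:3
labelled = [(p, 1) for p in positives] + [(n, 0) for n in negatives]
train, test = split_train_test(labelled)
```

With 852 positives, the same call would produce the 2556 negatives of the 1:3 data set.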
Two types of amino acid descriptors, binary and VHSE, were used to encode important features of individual peptide sequences. The binary descriptor used a set of 20 binary digits to encode each amino acid (all zeros except for the one characterizing the given amino acid). For example, 7 × 20 = 140 variables were used to encode a heptapeptide. The VHSE descriptor is a property descriptor composed of 8 variables for each amino acid and characterizes the hydrophobic, steric and electronic properties of the 20 coded amino acids. For a heptapeptide, 7 × 8 = 56 variables were used to build models based on this descriptor.
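As an illustration of the binary descriptor, the following Python sketch turns a heptapeptide into 7 × 20 = 140 input variables. The alphabetical ordering of residues and the function name are assumptions; any fixed ordering would serve equally well. A VHSE encoding would follow the same pattern, substituting the 8 published VHSE scores for each residue, which are not reproduced here.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed residue order (one-letter codes)


def encode_binary(peptide):
    """Concatenate one 20-digit indicator block per residue.

    Each block is all zeros except for a single 1 at the position of the
    residue in AMINO_ACIDS, so a heptapeptide yields 7 x 20 = 140 variables.
    """
    vector = []
    for residue in peptide:
        bits = [0] * len(AMINO_ACIDS)
        bits[AMINO_ACIDS.index(residue)] = 1
        vector.extend(bits)
    return vector


x = encode_binary("ACDEFGH")  # arbitrary example sequence
```

Because each position has its own block, two peptides with the same composition but a different residue order receive different input vectors.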
Neural network model
We used a machine-learning method to derive structure-activity relationships. The calculations were carried out on a 2.2 GHz Pentium machine using nnet from the VR 7.2 package, which fits feed-forward neural networks with a single hidden layer and multinomial log-linear models. We used a three-layer neural network architecture containing a single hidden layer in which the number of neurons was increased from 0 to 3. This network consisted of a multilayer system of neurons, with each neuron in a given layer fully connected to all the neurons in the two adjacent layers. A neural network was trained to map a set of input data to a corresponding set of output data by iterative adjustment of the weights. The activation function of the hidden layer units was the logistic function and the output units were linear. The Broyden-Fletcher-Goldfarb-Shanno (BFGS) method was used for optimization. To help the optimization process and to avoid over-fitting, the weight decay was set at 0.001. The maximum number of iterations for network training was 50,000 and the other parameters were given the default values set by nnet in the VR 7.2 package. Before the learning network was applied, the input value of the positive control was 0.9 and that of the negative control was 0.1.
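The network described above can be sketched in plain Python as follows. This is not the nnet code itself: the function names are hypothetical, the objective (squared error plus weight decay times the sum of squared weights) is an assumed form of the fitting criterion, and reading 0.9 and 0.1 as training targets for the two classes is this sketch's interpretation of the text. With no hidden units, the inputs are assumed to connect directly to the linear output.

```python
import math


def n_weights(n_inputs, n_hidden):
    """Number of adjustable weights (biases included) for one output unit."""
    if n_hidden == 0:
        return n_inputs + 1  # direct input-to-output connections plus bias
    return (n_inputs + 1) * n_hidden + (n_hidden + 1)


def logistic(z):
    """Numerically stable logistic function for the hidden units."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(x, hidden, output):
    """Forward pass: logistic hidden units, linear output unit.

    hidden: list of (bias, weights) pairs, one per hidden unit (may be empty).
    output: a single (bias, weights) pair for the output unit.
    """
    if hidden:
        activations = [
            logistic(b + sum(w_i * x_i for w_i, x_i in zip(w, x)))
            for b, w in hidden
        ]
    else:
        activations = x
    b_out, w_out = output
    return b_out + sum(w_i * a_i for w_i, a_i in zip(w_out, activations))


def penalized_loss(data, hidden, output, decay=0.001):
    """Sum of squared errors plus decay * (sum of squared weights)."""
    sse = sum((target - predict(x, hidden, output)) ** 2 for x, target in data)
    weights = [output[0], *output[1]]
    for b, w in hidden:
        weights += [b, *w]
    return sse + decay * sum(w * w for w in weights)
```

Under these assumptions a (7 × 20)-0-1 model has `n_weights(140, 0)` = 141 adjustable weights and a (7 × 20)-3-1 model has 427, which gives a sense of how quickly the larger architectures outgrow a data set of a few thousand peptides.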
To score the models, the ROC score, which is the area under the ROC curve, was used for each training and test set. The score is 1 for a perfect classification and 0.5 for a random classification. All the ROC scores reported were generated from a leave-group-out cross-validation of the real and decoy sets.
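The ROC score can be computed without drawing the curve, because the area under the ROC curve equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half. The Python sketch below uses this pairwise form; the function name and the example values are illustrative assumptions.

```python
def roc_score(labels, scores):
    """Area under the ROC curve by pairwise comparison.

    labels: 1 for positive control, 0 for negative control.
    scores: model outputs; higher means 'more likely permeable'.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a win
    return wins / (len(pos) * len(neg))


perfect = roc_score([1, 1, 0, 0], [0.9, 0.8, 0.2, 0.1])
partial = roc_score([1, 1, 0, 0], [0.9, 0.3, 0.4, 0.1])
```

The pairwise loop is quadratic in the number of examples, which is acceptable for data sets of a few thousand peptides; a rank-based formula gives the same value more quickly for larger sets.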
Validation using decoy set
We prepared a supplementary model trained with a decoy set and compared it with the model trained with the real data set for its ability to discriminate between intestinal barrier-permeable and impermeable peptides. The positive as well as the negative control data of the decoy set were generated from random peptides. The decoy set was prepared carefully to ensure that there were no redundant peptides in the positive control data and no overlap between the positive and the negative control data. To ensure the consistency of the data, a non-redundant peptide subset of the real set was also prepared; its positive control data set comprised 677 peptides.
This work was supported by a grant (Code: 20050401034696) from BioGreen 21 Program, Rural Development Administration, Republic of Korea.
- Yang CY, Dantzig AH, Pidgeon C: Intestinal peptide transport systems and oral drug availability. Pharm Res 1999, 16: 1331–1343. 10.1023/A:1018982505021
- Fujikawa M, Ano R, Nakao K, Shimizu R, Akamatsu M: Relationships between structure and high-throughput screening permeability of diverse drugs with artificial membranes: Application to prediction of Caco-2 cell permeability. Bioorganic & Medicinal Chemistry 2005, 13: 4721–4732. 10.1016/j.bmc.2005.04.076
- Egan WJ, Lauri G: Prediction of intestinal permeability. Advanced Drug Delivery Reviews 2002, 54: 273–289. 10.1016/S0169-409X(02)00004-2
- Lin J, Sahakian DC, de Morais SM, Xu JJ, Polzer RJ, Winter SM: The role of absorption, distribution, metabolism, excretion and toxicity in drug discovery. Curr Top Med Chem 2003, 3: 1125–1154. 10.2174/1568026033452096
- Liang R, Fei YJ, Prasad PD, Rammamoorthy S, Han H, Yang-Feng TL, Hediger MA, Ganapathy V, Leibach FH: Human intestinal H+/Peptide cotransporter. Cloning, functional expression, and chromosomal localization. J Biol Chem 1995, 270: 6456–6463. 10.1074/jbc.270.12.6456
- Tamai I, Takanaga H, Maeda H, Sai Y, Ogihara T, Higashida H, Tsuji A: Participation of a proton-cotransporter, MCT1, in the intestinal transport of monocarboxylic acids. Biochem Biophys Res Commun 1995, 214: 482–489. 10.1006/bbrc.1995.2312
- Ueda K, Cornwell MM, Gottesman MM, Pastan I, Roninson IB, Ling V, Riordan JR: The mdr1 gene, responsible for multidrug-resistance, codes for P-glycoprotein. Biochem Biophys Res Commun 1986, 141: 956–962. 10.1016/S0006-291X(86)80136-X
- Pade V, Stavchansky S: Link between drug absorption solubility and permeability measurements in Caco-2 cells. J Pharm Sci 1998, 87: 1604–1607. 10.1021/js980111k
- Camenisch G, Alsenz J, van de Waterbeemd H, Folkers G: Estimation of permeability by passive diffusion through Caco-2 cell monolayers using the drugs' lipophilicity and molecular weight. Eur J Pharm Sci 1998, 6: 317–324.
- Neuhoff S, Ungell AL, Zamora I, Artursson P: pH-Dependent passive and active transport of acidic drugs across Caco-2 cell monolayers. Eur J Pharm Sci 2005, 25: 211–220.
- Klopman G, Stefan LR, Saiakhov RD: ADME evaluation. 2. A computer model for the prediction of intestinal absorption in humans. Eur J Pharm Sci 2002, 17: 253–263. 10.1016/S0928-0987(02)00219-1
- Hou TJ, Zhang W, Xia K, Qiao XB, Xu XJ: ADME evaluation in drug discovery. 5. Correlation of Caco-2 permeation with simple molecular properties. J Chem Inf Comput Sci 2004, 44: 1585–1600. 10.1021/ci049884m
- Ren S, Lien EJ: Caco-2 cell permeability vs human gastrointestinal absorption: QSPR analysis. Prog Drug Res 2000, 54: 1–23.
- Kulkarni A, Han Y, Hopfinger AJ: Predicting Caco-2 cell permeation coefficients of organic molecules using membrane-interaction QSAR analysis. J Chem Inf Comput Sci 2002, 42: 331–342. 10.1021/ci010108d
- Seibert KJ: Quantitative structure-activity relationship modeling of peptide and protein behavior as a function of amino acid composition. J Agric Food Chem 2001, 49: 851–858. 10.1021/jf000718y
- Wu J, Aluko RE, Nakai S: Structural requirements of Angiotensin I-converting enzyme inhibitory peptides: quantitative structure-activity relationship study of di- and tripeptides. J Agric Food Chem 2006, 54: 732–738. 10.1021/jf051263l
- Burden FR, Winkler DA: Predictive Bayesian neural network models of MHC class II peptide binding. J Mol Graph Model 2005, 23: 481–489. 10.1016/j.jmgm.2005.03.001
- Guan P, Doytchinova IA, Walshe VA, Borrow P, Flower DR: Analysis of peptide-protein binding using amino acid descriptors: Prediction and experimental verification for human histocompatibility complex HLA-A*0201. J Med Chem 2005, 48: 7418–7425. 10.1021/jm0505258
- Hou T, McLaughlin W, Lu B, Chen K, Wang W: Prediction of binding affinities between the human amphiphysin-1 SH3 domain and its peptide ligands using homology modeling, molecular dynamics and molecular field analysis. J Proteome Res 2006, 5: 32–43. 10.1021/pr0502267
- Kennedy T: Managing the drug discovery/development interface. Drug Discov Today 1997, 2: 436–444. 10.1016/S1359-6446(97)01099-4
- Prentis RA, Lis Y, Walker SR: Pharmaceutical innovation by the seven UK-owned pharmaceutical companies (1964–1985). Br J Clin Pharmacol 1988, 25: 387–396.
- Gebauer S, Knutter I, Hartrodt B, Brandsch M, Neubert K, Thondorf I: Three-dimensional quantitative structure-activity relationship analyses of peptide substrates of the mammalian H+/peptide cotransporter PEPT1. J Med Chem 2003, 46: 5725–5734. 10.1021/jm030976x
- Biegel A, Gebauer S, Hartrodt B, Brandsch M, Neubert K, Thondorf I: Three-dimensional quantitative structure-activity relationship analyses of β-lactam antibiotics and tripeptides as substrates of the mammalian H+/Peptide cotransporter PEPT1. J Med Chem 2005, 48: 4410–4419. 10.1021/jm048982w
- Andersen R, Jorgensen FS, Olsen L, Vabeno J, Thorn K, Nielsen CU, Steffansen B: Development of a QSAR model for binding of tripeptides and tripeptidomimetics to the human intestinal di-/tripeptide transporter hPEPT1. Pharm Res 2006, 23: 483–492. 10.1007/s11095-006-9462-y
- Wessel MD, Jurs PC, Tolan JW, Muskal SM: Prediction of human intestinal absorption of drug compounds from molecular structure. J Chem Inf Comput Sci 1998, 38: 726–735. 10.1021/ci980029a
- Polley MJ, Burden FR, Winkler DA: Predictive human intestinal absorption QSAR models using Bayesian regularized neural networks. Aust J Chem 2005, 58: 859–863. 10.1071/CH05202
- Creighton TE: Proteins: Structure and molecular properties. Volume 154. 2nd edition. WH Freeman; 1992:154.
- Cramer RD, Bunce JD, Patterson DE, Frank IE: Crossvalidation, bootstrapping, and partial least squares compared with multiple regression in conventional QSAR studies. Quant Struct-Act Relat 1988, 7: 18–25. 10.1002/qsar.19880070105
- Burden FR, Winkler DA: Robust QSAR models using Bayesian regularized neural networks. J Med Chem 1999, 42: 3183–3187. 10.1021/jm980697n
- Lipinski CA, Lombardo F, Dominy BW, Feeney PJ: Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings. Adv Drug Deliv Rev 2001, 46: 3–26. 10.1016/S0169-409X(00)00129-0
- Ivanenkov VV, Menon AG: Peptide-mediated transcytosis of phage display vectors in MDCK cells. Biochem Biophys Res Commun 2000, 276: 251–257. 10.1006/bbrc.2000.3358
- Swaan PW: Recent advances in intestinal macromolecular drug delivery via receptor-mediated transport pathways. Pharm Res 1998, 15: 826–834. 10.1023/A:1011908128045
- Mei H, Lian ZH, Zhou Y, Li SZ: A new set of amino acid descriptors and its application in peptide QSARs. Biopolymer (Peptide Science) 2005, 80: 775–786. 10.1002/bip.20296
- The nnet of VR 7.2 package [http://www.r-project.org/]
- Hanley JA, McNeil BJ: The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 1982, 143: 29–36.
- Springer C, Adalsteinsson H, Young MM, Kegelmeyer PW, Roe DC: PostDock: a structural, empirical approach to scoring protein ligand complexes. J Med Chem 2005, 48: 6821–6831. 10.1021/jm0493360
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.