Enumerating tree-like chemical graphs with given upper and lower bounds on path frequencies
- Masaaki Shimizu^{1},
- Hiroshi Nagamochi^{1} and
- Tatsuya Akutsu^{2}
https://doi.org/10.1186/1471-2105-12-S14-S3
© Shimizu et al; licensee BioMed Central Ltd. 2011
Published: 14 December 2011
Abstract
Background
Enumeration of chemical graphs satisfying given constraints is one of the fundamental problems in chemoinformatics and bioinformatics since it leads to a variety of useful applications including structure determination of novel chemical compounds and drug design.
Results
In this paper, we consider the problem of enumerating all tree-like chemical graphs from a given set of feature vectors, which is specified by a pair of upper and lower feature vectors, where a feature vector represents the frequency of prescribed paths in a chemical compound to be constructed. This problem can be solved by applying the algorithm proposed by Ishida et al. to each single feature vector in the given set, but this method may take much computation time because in general there are many feature vectors in a given set. We propose a new exact branch-and-bound algorithm for the problem so that all the feature vectors in a given set are handled directly. Since we cannot use the bounding operation proposed by Ishida et al. due to upper and lower constraints, we introduce new bounding operations based on upper and lower feature vectors, a bond constraint, and a detachment condition.
Conclusions
Our proposed algorithm is useful for enumerating tree-like chemical graphs with given upper and lower bounds on path frequencies.
Introduction
Development of novel drugs is one of the major goals in chemoinformatics and bioinformatics. To achieve this purpose, it is important not only to investigate common chemical properties over chemical compounds having common structural patterns [1–3] but also to study methods of enumerating chemical structures satisfying given constraints. The enumeration of chemical structures has a long history. Actually, Cayley [4] considered the enumeration of structural isomers of alkanes in the 19th century. Applications for the enumeration of chemical compounds include structure determination using mass-spectrum and/or NMR-spectrum [5, 6], virtual exploration of chemical universe [7, 8], reconstruction of molecular structures from their signatures [9, 10], and classification of chemical compounds [11].
In the field of machine learning, the pre-image problem [12, 13] has been studied. In this problem, a desired object is computed as a feature vector in a feature space, and then the feature vector is mapped back to the input space; the object obtained by this mapping is called a pre-image. Feature vectors defined by the frequency of labeled paths [14, 15] or small fragments [11, 16] have been widely used. Akutsu and Fukagawa [17] formulated the graph pre-image problem as the problem of inferring graphs from the frequency of paths of labeled vertices, and proved that the problem is NP-hard even for planar graphs with bounded degrees [17]. Nagamochi [18] proved that a graph determined by the frequency of paths of length 1 can be found in polynomial time, if one exists.
To enumerate tree-like chemical graphs, Fujiwara et al. [19] proposed a branch-and-bound algorithm which consists of a branching procedure based on the tree enumeration algorithm due to Nakano and Uno [20, 21] and bounding operations based on the path frequency and the atom-atom bonds. In addition, to reduce the size of search trees, Ishida et al. [22] introduced a new bounding operation, called the detachment-cut, based on the result by Nagamochi [18]. Implementations of the algorithm proposed by Ishida et al. [22] are available at a web server (http://sunflower.kuicr.kyoto-u.ac.jp/tools/enumol/) for enumerating tree-like chemical graphs with given path frequency. However, in many cases an instance whose constraint is specified by a single feature vector admits no solution. A constraint more relaxed than a single feature vector is therefore needed to obtain solutions in the tree-like chemical graph enumeration problem.
In this paper, we are given a set of feature vectors, which is specified by a pair of upper and lower feature vectors, and enumerate all tree-like chemical graphs satisfying one of the vectors. It seems that this can be done by simply applying the algorithm proposed by Ishida et al. to each single feature vector in the given set. However, this method will take much computation time because in general there are many feature vectors in a given set. We propose a new exact branch-and-bound algorithm for the problem so that all the feature vectors in a given set are handled directly.
Methods
Preliminaries and problem formulation
Let deg(v; G) denote the degree of a vertex v in a graph G. The tree-like chemical graph enumeration problem with a single given feature vector can be formulated as follows [19].
Enumeration of Tree-like chemical graphs with given Path Frequency (ETPF)
Given a set Σ of labels, a valence function val : Σ → ℤ_{+} and a feature vector g of level K, find all (Σ, val)-labeled multitrees T such that f_{ K }(T) = g and deg(v;T) = val(ℓ(v)) for all vertices v ∈ V(T).
Observe that a large number of chemical compounds contain a high proportion of hydrogens. Based on this fact, another model can be considered in the problem ETPF by removing all hydrogen atoms. These two different models were proposed by Fujiwara et al. [19] and Ishida [23].
Enumeration of Tree-like chemical graphs with given Upper and Lower bounds on path Frequencies (ETULF)
Given a set Σ of labels, a valence function val : Σ → ℤ_{+} and feature vectors g_{ U } and g_{ L } of level K (g_{ L } ≤ g_{ U }), find all (Σ, val)-labeled multitrees T such that g_{ L } ≤ f_{ K }(T) ≤ g_{ U } and deg(v;T) = val(ℓ(v)) for all vertices v ∈ V(T).
For the problem ETULF, we assume that g_{ L }(ℓ) = g_{ U }(ℓ) for every atom type ℓ ∈ Σ, where g(L) denotes the entry in g that corresponds to a label sequence L (thus g(ℓ) specifies the number of vertices of label ℓ), and that g_{ L }(L) ≤ g_{ U }(L) for any label sequence L with |L| ≥ 2.
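As an illustration of the constraint g_{ L } ≤ f_{ K }(T) ≤ g_{ U }, the following minimal Python sketch counts label sequences along paths of a labeled tree and compares them with a pair of bounds. The function names `path_frequencies` and `within_bounds` are hypothetical, and the sketch rests on assumptions not fixed by the text above: every path with at most K edges is read in both directions, a single vertex counts as a path with 0 edges, bond multiplicities are ignored, and a label sequence missing from a bound has bound 0.

```python
from collections import Counter


def path_frequencies(labels, adj, K):
    """Count label sequences of all paths with at most K edges.

    labels: dict vertex -> label; adj: dict vertex -> list of distinct neighbours.
    Assumption: each path is read once from each of its two end vertices.
    """
    freq = Counter()

    def walk(v, parent, seq, remaining):
        freq[seq] += 1
        if remaining == 0:
            return
        for u in adj[v]:
            # In a tree, never stepping back to the parent keeps the path simple.
            if u != parent:
                walk(u, v, seq + (labels[u],), remaining - 1)

    for v in adj:
        walk(v, None, (labels[v],), K)
    return freq


def within_bounds(f, g_lower, g_upper):
    """Check g_lower <= f <= g_upper entry by entry (missing entries count as 0)."""
    keys = set(f) | set(g_lower) | set(g_upper)
    return all(g_lower.get(k, 0) <= f.get(k, 0) <= g_upper.get(k, 0) for k in keys)


# Example: the path C - C - O with K = 1.
labels = {0: "C", 1: "C", 2: "O"}
adj = {0: [1], 1: [0, 2], 2: [1]}
f = path_frequencies(labels, adj, 1)
```

In this example the single-label entries of `f` give the number of vertices of each label, which matches the role of g(ℓ) described above.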
Note that the number n of vertices is given by Σ_{ℓ∈Σ}g(ℓ). To solve the problem ETULF, we start with an empty graph, and repeatedly extend the current tree T by appending a new vertex with each label ℓ ∈ Σ to obtain a valid tree (a tree that does not violate any constraints on output trees) one by one until we get n vertices. In order to avoid duplicate outputs, we follow the branch-and-bound framework of Fujiwara et al. [19], which first defines a canonical representation for isomorphic trees, and then lists them using the algorithm of Nakano and Uno [20, 21] (the branching operation) discarding invalid trees with some bounding operations. Since we cannot directly use the bounding operation proposed by Ishida et al. [22] due to upper and lower constraints, we introduce some new bounding operations.
Canonical representation of trees and the branching operation
In this section, we explain a canonical representation of trees introduced by Fujiwara et al. [19] and the branching operation based on the canonical representation.
First of all, we introduce a root of a tree based on the following theorem.
Theorem 1 (Jordan [24]) For any tree with n′ vertices, either there exists a unique vertex v* such that each subtree obtained by removing v* contains at most ⌊(n′ – 1)/2⌋ vertices, or there exists a unique edge e* such that both of the subtrees obtained by removing e* contain exactly n′/2 vertices.
Such a vertex v* and such an edge e* in Theorem 1 are called the unicentroid and the bicentroid, respectively. Either of them is called the centroid. Note that a bicentroid exists only for an even n′. Since the case of a bicentroid is similar to the case of a unicentroid, we explain only the unicentroid case.
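The centroid of Theorem 1 can be located by computing subtree sizes. The sketch below is one straightforward way to do so and is not taken from the paper; the function name `find_centroid` and the adjacency-list input format are assumptions made for illustration.

```python
def find_centroid(adj):
    """Return ("vertex", v) for a unicentroid or ("edge", (u, v)) for a bicentroid.

    adj: dict vertex -> list of neighbours, describing a tree.
    """
    n = len(adj)
    root = next(iter(adj))
    parent = {root: None}
    order = [root]
    for v in order:  # breadth-first order; the list grows while we iterate
        for u in adj[v]:
            if u not in parent:
                parent[u] = v
                order.append(u)

    # Subtree sizes with respect to the arbitrary root, computed bottom-up.
    size = {v: 1 for v in adj}
    for v in reversed(order):
        if parent[v] is not None:
            size[parent[v]] += size[v]

    def largest_part(v):
        """Size of the largest subtree left after removing v."""
        parts = [size[u] for u in adj[v] if parent.get(u) == v]
        if parent[v] is not None:
            parts.append(n - size[v])
        return max(parts, default=0)

    for v in adj:
        if largest_part(v) <= (n - 1) // 2:
            return ("vertex", v)
    # Otherwise n is even and exactly two adjacent vertices split the tree in halves.
    ends = sorted(v for v in adj if largest_part(v) == n // 2)
    return ("edge", tuple(ends))


path3 = {0: [1], 1: [0, 2], 2: [1]}
path4 = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
```

For the path on three vertices the middle vertex is the unicentroid, while the path on four vertices has the middle edge as its bicentroid.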
Given an arbitrary order of labels, we define the order of depth-label sequences as follows. For any two rooted trees T_{1} and T_{2} with depth-label sequences L(T_{1}) and L(T_{2}), we write L(T_{1}) > L(T_{2}) if L(T_{1}) is lexicographically larger than L(T_{2}). Then the canonical representation of a rooted tree is defined by the largest depth-label sequence among all its plane embeddings. This is equivalent to the left-heavy plane embedding [20, 21].
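A small sketch may help to make the canonical representation concrete. It assumes that a depth-label sequence lists the pairs (depth, label) of the vertices in preorder, and that labels are ordered as Python strings; both are assumptions for illustration, as is the function name `canonical_sequence`. Under these assumptions the largest sequence is obtained by recursively arranging the child subtrees in non-increasing order of their own sequences.

```python
def canonical_sequence(tree, depth=0):
    """Largest depth-label sequence over all plane embeddings of a rooted tree.

    tree: (label, [child trees]); the order of the children in the input is irrelevant.
    """
    label, children = tree
    # Sort child subtrees so that larger sequences come first (left-heavy).
    child_seqs = sorted(
        (canonical_sequence(child, depth + 1) for child in children),
        reverse=True,
    )
    seq = [(depth, label)]
    for s in child_seqs:
        seq.extend(s)
    return seq


# Two plane embeddings of the same rooted tree.
t1 = ("C", [("O", []), ("C", [("N", [])])])
t2 = ("C", [("C", [("N", [])]), ("O", [])])
```

Both embeddings yield the same sequence, so comparing canonical sequences is enough to decide whether two rooted labeled trees are isomorphic.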
Thus our branching task is to list all centroid-rooted left-heavy trees with n vertices and m (= |Σ|) labels. Following the scheme of [20, 21], we define a parent-child relation between two left-heavy trees. The parent P(T) of a left-heavy tree T is obtained from T by removing its rightmost leaf. Clearly P(T) is still left-heavy. In this way, we can define a family tree of left-heavy trees whose leaves are exactly what we want to obtain.
Therefore we only need to enumerate the (leaf) nodes of the family tree. This can be done by starting from the empty tree (the root node of the family tree) and repeatedly appending a new leaf to some appropriate place on the rightmost path of the current tree. Our branching operation employs the algorithm of Nakano and Uno [20, 21], which extends the current tree T (i.e., finds a child of T) in constant time [19].
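The traversal of the family tree can be sketched as follows. This is a simplified illustration and not the constant-time procedure of Nakano and Uno: it represents a tree by its preorder sequence of (depth, label) pairs, tries every attachment point on the rightmost path, and keeps a child only if re-canonicalizing it leaves it unchanged. It also ignores the centroid constraint and all bounding operations. The names `canonical` and `left_heavy_trees` are hypothetical.

```python
def canonical(seq):
    """Largest depth-label sequence over all child orderings of the tree `seq`."""
    def build(i):
        # Returns the canonical sequence of the subtree at position i and the next index.
        depth = seq[i][0]
        j = i + 1
        parts = []
        while j < len(seq) and seq[j][0] > depth:
            sub, j = build(j)
            parts.append(sub)
        parts.sort(reverse=True)
        out = [seq[i]]
        for p in parts:
            out.extend(p)
        return out, j

    return tuple(build(0)[0]) if seq else ()


def left_heavy_trees(n, labels):
    """Yield every left-heavy rooted tree with n vertices as a depth-label tuple."""
    def extend(seq):
        if len(seq) == n:
            yield seq
            return
        # A new leaf hangs below some vertex of the rightmost path.
        min_depth = 1 if seq else 0
        max_depth = seq[-1][0] + 1 if seq else 0
        for d in range(min_depth, max_depth + 1):
            for lab in labels:
                child = seq + ((d, lab),)
                if canonical(child) == child:  # keep only left-heavy children
                    yield from extend(child)

    yield from extend(())


trees4 = list(left_heavy_trees(4, ["C"]))
```

With a single label the sketch lists the four rooted trees on four vertices, each exactly once, because every left-heavy tree is reached only through its unique parent.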
Bounding operations
In this section, we explain how to check the validity of the current tree T. If we can conclude that T and all its descendants are not valid, then we can discard T. Our bounding operation discards T if at least one of the following criteria is violated:
(C1) The root of T remains the centroid of an output (the centroid constraint);
(C2) deg(v;T) ≤ val(l(v)) for all v ∈ V(T) (the valence constraint);
(C3) f_{ K }(T) ≤ g_{ U }, and g_{ L } ≤ f_{ K }(T) if |T| = n (the feature vector constraint);
(C4) T can be extended to a connected and loopless tree with n vertices (the detachment constraint);
(C5) T can have a descendant which has an appropriate number of multiple bonds (the multiplicity constraint).
(C1) and (C2) are the same as in the work by Fujiwara et al. [19] and are not difficult to check. (C3) and (C4) differ from the work by Fujiwara et al. [19] and Ishida et al. [22] because of the upper and lower constraints. (C5) is a new bounding operation that we propose in this paper. In the following three subsections, we discuss the three bounding operations resulting from (C3), (C4), and (C5), called the feature-vector-cut, the detachment-cut, and the multiplicity-cut, respectively.
Feature-vector-cut procedure
In the problem ETULF, we cannot use the bounding operation proposed by Fujiwara et al. [19] directly due to upper and lower feature vectors, but we can introduce a bounding operation based on upper and lower feature vectors by modifying Fujiwara et al.’s work slightly.
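Following (C3), the feature-vector-cut tests two conditions on the current tree T:
f_{ K }(T) ≤ g_{ U }, (1)
g_{ L } ≤ f_{ K }(T) if |T| = n. (2)
If T violates (1), then we discard T.
If T violates (2), then we discard T.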
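The feature vector constraint (C3) can be sketched as a simple test. The function name `feature_vector_cut` and the dictionary representation of feature vectors are assumptions for illustration; a label sequence missing from g_{ U } is treated as having upper bound 0. The upper bound can be tested on every partial tree because appending a leaf only adds paths, whereas the lower bound is tested only on complete trees.

```python
def feature_vector_cut(f, size, n, g_lower, g_upper):
    """Return True if the current tree should be discarded under (C3).

    f: dict label sequence -> frequency in the current tree;
    size: number of vertices of the current tree; n: target number of vertices.
    """
    # Upper bound: an exceeded entry can never decrease in any descendant.
    if any(count > g_upper.get(seq, 0) for seq, count in f.items()):
        return True
    # Lower bound: only meaningful once the tree is complete.
    if size == n and any(f.get(seq, 0) < low for seq, low in g_lower.items()):
        return True
    return False


g_lower = {("C",): 2, ("C", "C"): 2}
g_upper = {("C",): 2, ("C", "C"): 2}
partial = {("C",): 1}
complete = {("C",): 2, ("C", "C"): 2}
```

Here `partial` is kept even though it does not yet reach the lower bound, since it still has fewer than n = 2 vertices.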
Detachment-cut procedure
This subsection describes the definition of detachment [18] and a new bounding operation based on it for the problem ETULF. Let G be a multigraph that may have self-loops, which represents the graph obtained from a chemical graph H by contracting the vertices with the same label into a single vertex, so that each vertex in G corresponds to a label in H (note that we do not eliminate any edges of H when contracting vertices to obtain G). A process of regaining H from G is described as follows. Given a function r : V(G) → ℤ_{+}, an r-detachment H of G is a multigraph obtained from G by splitting each vertex v ∈ V(G) into a set of r(v) copies of v, denoted by W_{ v } = {v^{1}, v^{2}, …, v^{r(v)}}, so that each edge {u, v} ∈ E(G) joins some vertices u^{ i } ∈ W_{ u } and v^{ j } ∈ W_{ v }. Hence an r-detachment H of G is not unique in general. A self-loop {u, u} in G may be mapped to a self-loop {u^{ i },u^{ i }} or a non-loop edge {u^{ i },u^{ j }} in a detachment H of G. Note that, for every pair of vertices u, v ∈ V(G), the number of edges between the subsets W_{ u } and W_{ v } in H is equal to the number of edges between the vertices u and v in G.
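The contraction of H into G described above can be illustrated with a short sketch. The function name `contract_by_label` and the input format (a label dictionary plus an edge list in which a multiple bond is listed once per unit of multiplicity) are assumptions for illustration.

```python
from collections import Counter


def contract_by_label(labels, edges):
    """Contract all vertices of H with the same label into one vertex of G.

    Returns (r, mult): r[label] is the number of vertices of H with that label,
    and mult[(a, b)] with a <= b is the number of edges of G between labels a and b
    (a == b gives the number of self-loops at a).
    """
    r = Counter(labels.values())
    mult = Counter()
    for u, v in edges:
        key = tuple(sorted((labels[u], labels[v])))
        mult[key] += 1  # no edge of H is dropped by the contraction
    return r, mult


# Example: H is the path C - C - O.
labels = {0: "C", 1: "C", 2: "O"}
edges = [(0, 1), (1, 2)]
r, mult = contract_by_label(labels, edges)
```

In the example the C-C bond becomes a self-loop at the vertex C of G, and H is one possible r-detachment of G with r(C) = 2 and r(O) = 1.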
where r(X) = Σ_{v∈X} r(v), c(G′) denotes the number of connected components of a graph G′, G – X denotes the graph obtained from a graph G by removing the vertices in X together with all edges incident to vertices in X, and d(A, B; G) denotes the number of edges (u, v) ∈ E with u ∈ A and v ∈ B.
Ishida et al. [22] proposed a bounding operation for the problem ETPF based on Theorem 2. However, we cannot use the bounding operation proposed by Ishida et al. for the problem ETULF due to the upper and lower constraints. We now describe our new bounding operation based on detachments for the problem ETULF. The new bounding operation, called the detachment-cut, tests whether the current multitree T has, among its descendants in the family tree, a multitree that is consistent with the given path frequencies, based on the difference between the feature vector f_{ K }(T) and the input feature vectors g_{ U } and g_{ L }.
- (a)
- (b) r(X) + c(G_{ U } – X) – d(X, V_{ U }; G_{ U }) ≤ 1 (∀X ⊆ V_{ U }, X ≠ ∅).
In the first condition, we check whether the number of remaining bonds is large enough to satisfy the lower feature vector constraint. In the second condition, we check whether T has a connected and loopless descendant based on G_{ U } and Theorem 2.
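The inequality of the second condition can be checked by brute force over all nonempty vertex subsets, which is feasible because the number of labels is small. The sketch below is only an illustration of that inequality for a generic multigraph given as a pair (r, mult); it does not construct G_{ U } from the current tree, and it assumes that d(X, V; G) counts each edge with at least one endpoint in X once per unit of multiplicity. The names `count_components` and `satisfies_detachment_inequality` are hypothetical.

```python
from itertools import combinations


def count_components(vertices, mult):
    """Number of connected components of the multigraph induced on `vertices`."""
    remaining = set(vertices)
    comps = 0
    while remaining:
        comps += 1
        stack = [remaining.pop()]
        while stack:
            x = stack.pop()
            for (u, v), m in mult.items():
                if m > 0:
                    for a, b in ((u, v), (v, u)):
                        if a == x and b in remaining:
                            remaining.remove(b)
                            stack.append(b)
    return comps


def satisfies_detachment_inequality(r, mult):
    """Check r(X) + c(G - X) - d(X, V; G) <= 1 for every nonempty X."""
    vertices = list(r)
    for k in range(1, len(vertices) + 1):
        for subset in combinations(vertices, k):
            X = set(subset)
            rest = [v for v in vertices if v not in X]
            # Edges with at least one endpoint in X, counted with multiplicity.
            d = sum(m for (u, v), m in mult.items() if u in X or v in X)
            if sum(r[v] for v in X) + count_components(rest, mult) - d > 1:
                return False
    return True


ok = satisfies_detachment_inequality({"C": 2, "O": 1}, {("C", "C"): 1, ("C", "O"): 1})
bad = satisfies_detachment_inequality({"C": 2, "O": 1}, {("C", "O"): 1})
```

The first graph is the contraction of the path C - C - O and passes the test; the second has too few edges to connect three vertices and fails for X = {C}.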
Multiplicity-cut procedure
which means that only M edges are used to construct multiple bonds in an output tree. Note that M ≥ 0. We calculate M from an input of the problem ETULF before the enumeration algorithm starts.
Now we describe the multiplicity-cut based on M(T) and M.
Results
This section reports the experimental results of our algorithm. First of all, we mention that the problem ETULF can be solved by applying the algorithm proposed by Ishida et al. [22] to each single feature vector in a given set of feature vectors, i.e., an instance of the problem ETULF can be regarded as a set of instances of the problem ETPF. We call this algorithm for the problem ETULF, based on the algorithm proposed by Ishida et al., RepEnum (Repeated Enumeration). On the other hand, we call our algorithm SimEnum (Simultaneous Enumeration). It is to be noted that RepEnum is one of the fastest tools to enumerate tree-like chemical structures from a given molecular formula (i.e., feature vector with K = 0) [22] and, to our knowledge, there does not exist any other available tool to enumerate chemical structures from a given feature vector based on path frequency (i.e., feature vector with general K).
Now we compare the performances of two algorithms, SimEnum and RepEnum, and we also compare the performances of two algorithms, SimEnum including multiplicity-cut and SimEnum not including multiplicity-cut. We have tested the algorithm SimEnum for some widths between upper and lower feature vectors. Tests were carried out on a PC with CPU AMD Athlon Dual Core Processor 5050e using instances based on some chemical compounds selected from the KEGG LIGAND database [25] (http://www.genome.jp/ligand/). Note that we treat a benzene ring contained in these compounds as a new virtual atom of valence six.
We define w ∈ ℤ_{+} to be the width between the upper and lower feature vectors. From a feature vector g, we construct two feature vectors g_{ U } and g_{ L } as follows. For each entry a > 0 of g, the corresponding entry a_{ U } of the upper feature vector g_{ U } is given by a + w, and the corresponding entry a_{ L } of the lower feature vector g_{ L } is given by max{0, a – w}. Note that if w = 0, then an instance of the problem ETULF is equivalent to an instance of the problem ETPF.
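The construction of g_{ U } and g_{ L } from g and w can be sketched as follows. The function name `make_bounds` and the dictionary representation are assumptions for illustration. The sketch also assumes that entries for single labels are left unchanged, in line with the assumption g_{ L }(ℓ) = g_{ U }(ℓ) made in the problem formulation, and that zero entries stay zero in both vectors.

```python
def make_bounds(g, w):
    """Build (g_lower, g_upper) from a feature vector g and a width w >= 0.

    g: dict label sequence (tuple of labels) -> frequency.
    """
    g_lower, g_upper = {}, {}
    for seq, a in g.items():
        if a > 0 and len(seq) >= 2:
            g_upper[seq] = a + w
            g_lower[seq] = max(0, a - w)
        else:
            # Assumption: atom counts and zero entries are kept fixed.
            g_upper[seq] = a
            g_lower[seq] = a
    return g_lower, g_upper


g = {("C",): 2, ("C", "C"): 2, ("C", "O"): 1, ("O", "O"): 0}
g_lower, g_upper = make_bounds(g, 2)
```

With w = 0 both returned vectors equal g, which corresponds to an instance of the problem ETPF.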
Table 1. Comparison of previous method and our method
Entry | Formula | n | K | w | f_{ v } | SimEnum time (s) | SimEnum nodes | SimEnum solutions | RepEnum time (s) | RepEnum nodes | RepEnum solutions | RepEnum solved
---|---|---|---|---|---|---|---|---|---|---|---|---
C00062 | C_{6}H_{14}N_{2}O_{4} | 26 | 1 | 1 | 3^{6} | 1037.04 | 177,074,686 | 414,890 | 163.32 | 44,340,488 | 414,890 | 729
C00062 | C_{6}H_{14}N_{2}O_{4} | 26 | 2 | 1 | 3^{18} | 2.97 | 392,246 | 44 | T.O. | 2,381,360,000 | N.F. | 65,909,572
C00062 | C_{6}H_{14}N_{2}O_{4} | 26 | 3 | 1 | 3^{34} | 1.22 | 145,213 | 2 | T.O. | 3,293,260,000 | N.F. | 96,860,588
C00062 | C_{6}H_{14}N_{2}O_{4} | 26 | 4 | 1 | 3^{53} | 0.33 | 34,539 | 1 | T.O. | 2,780,050,000 | N.F. | 81,766,176
C00062 | C_{6}H_{14}N_{2}O_{4} | 26 | 5 | 1 | 3^{71} | 0.24 | 20,361 | 1 | T.O. | 1,561,230,000 | N.F. | 45,918,529
C00062 | C_{6}H_{14}N_{2}O_{4} | 26 | 6 | 1 | 3^{85} | 0.25 | 15,166 | 1 | T.O. | 569,590,000 | N.F. | 16,752,647
C00062 | C_{6}H_{14}N_{2}O_{4} | 26 | 7 | 1 | 3^{96} | 0.18 | 14,547 | 1 | T.O. | 79,870,000 | N.F. | 2,349,117
C03343 | C_{16}H_{22}O_{4} | 37 | 1 | 1 | 3^{6} | T.O. | 377,260,000 | N.F. | T.O. | 413,000,000 | N.F. | 460
C03343 | C_{16}H_{22}O_{4} | 37 | 2 | 1 | 3^{18} | 7.24 | 845,760 | 25 | T.O. | 1,442,760,000 | N.F. | 70,175,902
C03343 | C_{16}H_{22}O_{4} | 37 | 3 | 1 | 3^{31} | 2.81 | 307,151 | 7 | T.O. | 3,316,970,000 | N.F. | 195,115,882
C03343 | C_{16}H_{22}O_{4} | 37 | 4 | 1 | 3^{47} | 1.03 | 99,945 | 1 | T.O. | 2,494,780,000 | N.F. | 146,751,764
C03343 | C_{16}H_{22}O_{4} | 37 | 5 | 1 | 3^{64} | 0.98 | 87,600 | 1 | T.O. | 1,050,480,000 | N.F. | 61,792,941
C03343 | C_{16}H_{22}O_{4} | 37 | 6 | 1 | 3^{82} | 0.76 | 60,194 | 1 | T.O. | 315,820,000 | N.F. | 18,577,647
C03343 | C_{16}H_{22}O_{4} | 37 | 7 | 1 | 3^{99} | 0.57 | 42,538 | 1 | T.O. | 41,450,000 | N.F. | 2,438,235
C07178 | C_{21}H_{28}N_{2}O_{5} | 46 | 1 | 1 | 3^{8} | T.O. | 157,320,000 | N.F. | T.O. | 200,490,000 | N.F. | 1,388
C07178 | C_{21}H_{28}N_{2}O_{5} | 46 | 2 | 1 | 3^{26} | 37.59 | 1,940,295 | 238 | T.O. | 2,911,390,000 | N.F. | 66,167,954
C07178 | C_{21}H_{28}N_{2}O_{5} | 46 | 3 | 1 | 3^{48} | 1.71 | 60,792 | 3 | T.O. | 2,673,940,000 | N.F. | 60,771,363
C07178 | C_{21}H_{28}N_{2}O_{5} | 46 | 4 | 1 | 3^{71} | 0.35 | 14,248 | 1 | T.O. | 1,925,490,000 | N.F. | 43,761,136
C07178 | C_{21}H_{28}N_{2}O_{5} | 46 | 5 | 1 | 3^{92} | 0.27 | 10,866 | 1 | T.O. | 743,940,000 | N.F. | 16,907,727
C07178 | C_{21}H_{28}N_{2}O_{5} | 46 | 6 | 1 | 3^{110} | 0.27 | 10,680 | 1 | T.O. | 93,880,000 | N.F. | 2,133,636
C07178 | C_{21}H_{28}N_{2}O_{5} | 46 | 7 | 1 | 3^{125} | 0.24 | 9,276 | 1 | T.O. | 19,270,000 | N.F. | 437,954
C03690 | C_{24}H_{38}O_{4} | 61 | 1 | 1 | 3^{5} | T.O. | 382,470,000 | N.F. | T.O. | 552,290,000 | N.F. | 61
C03690 | C_{24}H_{38}O_{4} | 61 | 2 | 1 | 3^{16} | T.O. | 211,800,000 | N.F. | T.O. | 530,930,000 | N.F. | 10,451,912
C03690 | C_{24}H_{38}O_{4} | 61 | 3 | 1 | 3^{27} | 1395.13 | 144,244,042 | 206 | T.O. | 3,314,260,000 | N.F. | 194,956,470
C03690 | C_{24}H_{38}O_{4} | 61 | 4 | 1 | 3^{41} | 121.36 | 11,332,363 | 4 | T.O. | 2,392,530,000 | N.F. | 140,737,058
C03690 | C_{24}H_{38}O_{4} | 61 | 5 | 1 | 3^{57} | 83.70 | 6,978,557 | 2 | T.O. | 958,650,000 | N.F. | 56,391,176
C03690 | C_{24}H_{38}O_{4} | 61 | 6 | 1 | 3^{75} | 40.11 | 2,923,819 | 1 | T.O. | 298,600,000 | N.F. | 17,564,705
C03690 | C_{24}H_{38}O_{4} | 61 | 7 | 1 | 3^{92} | 16.50 | 1,096,128 | 1 | T.O. | 38,670,000 | N.F. | 2,274,705
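Table 2. Comparison of varying width
Entry | Formula | n | K | w | SimEnum time (s) | SimEnum nodes | SimEnum solutions
---|---|---|---|---|---|---|---
C00062 | C_{6}H_{14}N_{2}O_{4} | 26 | 2 | 0 | 0.51 | 55,196 | 6
C00062 | C_{6}H_{14}N_{2}O_{4} | 26 | 2 | 1 | 3.58 | 400,501 | 44
C00062 | C_{6}H_{14}N_{2}O_{4} | 26 | 2 | 2 | 7.58 | 835,509 | 503
C00062 | C_{6}H_{14}N_{2}O_{4} | 26 | 2 | 3 | 10.84 | 1,163,548 | 2,351
C00062 | C_{6}H_{14}N_{2}O_{4} | 26 | 2 | 4 | 12.55 | 1,349,057 | 5,430
C00062 | C_{6}H_{14}N_{2}O_{4} | 26 | 2 | 5 | 13.29 | 1,431,075 | 9,852
C00062 | C_{6}H_{14}N_{2}O_{4} | 26 | 2 | 50 | 14.31 | 1,537,496 | 25,425
C03343 | C_{16}H_{22}O_{4} | 37 | 2 | 0 | 0.34 | 35,952 | 9
C03343 | C_{16}H_{22}O_{4} | 37 | 2 | 1 | 8.39 | 845,760 | 25
C03343 | C_{16}H_{22}O_{4} | 37 | 2 | 2 | 48.27 | 4,815,369 | 41
C03343 | C_{16}H_{22}O_{4} | 37 | 2 | 3 | 149.83 | 14,781,738 | 305
C03343 | C_{16}H_{22}O_{4} | 37 | 2 | 4 | 377.01 | 37,435,878 | 40,732
C03343 | C_{16}H_{22}O_{4} | 37 | 2 | 5 | 639.68 | 63,459,180 | 106,870
C03343 | C_{16}H_{22}O_{4} | 37 | 2 | 50 | 1118.75 | 110,703,034 | 510,079
C07178 | C_{21}H_{28}N_{2}O_{5} | 46 | 2 | 0 | 2.33 | 111,781 | 16
C07178 | C_{21}H_{28}N_{2}O_{5} | 46 | 2 | 1 | 46.81 | 2,246,578 | 238
C07178 | C_{21}H_{28}N_{2}O_{5} | 46 | 2 | 2 | 96.52 | 4,715,072 | 1,375
C07178 | C_{21}H_{28}N_{2}O_{5} | 46 | 2 | 3 | 152.18 | 7,420,060 | 6,824
C07178 | C_{21}H_{28}N_{2}O_{5} | 46 | 2 | 4 | 179.42 | 8,744,563 | 19,180
C07178 | C_{21}H_{28}N_{2}O_{5} | 46 | 2 | 5 | 199.66 | 9,677,513 | 29,891
C07178 | C_{21}H_{28}N_{2}O_{5} | 46 | 2 | 50 | 255.01 | 12,292,587 | 54,861
C03690 | C_{24}H_{38}O_{4} | 61 | 5 | 0 | 19.50 | 1,482,017 | 2
C03690 | C_{24}H_{38}O_{4} | 61 | 5 | 1 | 220.14 | 16,063,569 | 5
C03690 | C_{24}H_{38}O_{4} | 61 | 5 | 2 | 439.12 | 33,037,741 | 32
C03690 | C_{24}H_{38}O_{4} | 61 | 5 | 3 | 684.88 | 52,207,745 | 178
C03690 | C_{24}H_{38}O_{4} | 61 | 5 | 4 | 1024.96 | 78,509,554 | 349
C03690 | C_{24}H_{38}O_{4} | 61 | 5 | 5 | 1285.55 | 98,762,291 | 615
C03690 | C_{24}H_{38}O_{4} | 61 | 5 | 50 | T.O. | 136,835,134 | N.F.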
Here, we briefly discuss practical values of K and w, though we do not have concrete evidence and these values depend on the target classes of chemical compounds. The results on similar feature vectors [9, 10, 15] suggest that K between 3 and 10 should be used. Though there is no previous result on w, it is seen from Table 2 that w cannot be large because there may exist too many solutions. Therefore, w less than 4 should be used.
Conclusions
We considered the problem of enumerating all tree-like chemical graphs from a given set of feature vectors, which is specified by upper and lower feature vectors based on frequencies of paths, and proposed a new exact branch-and-bound algorithm. Our experimental results show that our algorithm outperforms the naive algorithm based on a previous method. In comparison to the algorithm based on Ishida et al. [22], our algorithm can greatly reduce the number of search nodes and the computation time and enumerate all the feasible solutions in many instances.
However, the search space of the problem ETULF is much larger than that of the problem ETPF due to the upper and lower constraints, and in fact our algorithm visits many search nodes when solving the problem ETULF. One direction for future work is to improve the bounding operations or to introduce a new bounding operation. For example, in the feature-vector-cut described in the subsection on the feature-vector-cut procedure, information from the lower feature vector g_{ L } is used only if |T| = n. Another direction is to develop a web server that implements our proposed algorithm. Generalization of the proposed techniques to other types of kernel functions and other problems is also left as future work.
Declarations
Acknowledgements
This work was partially supported by Grant-in-Aid #22240009 from MEXT, Japan.
This article has been published as part of BMC Bioinformatics Volume 12 Supplement 14, 2011: 22nd International Conference on Genome Informatics: Bioinformatics. The full contents of the supplement are available online at http://www.biomedcentral.com/1471-2105/12?issue=S14.
Authors’ Affiliations
References
- Bytautas L, Klein DJ: Chemical combinatorics for alkane-isomer enumeration and more. Journal of Chemical Information and Computer Sciences 1998, 38: 1063–1078. 10.1021/ci980095c
- Bytautas L, Klein DJ: Formula periodic table for acyclic hydrocarbon isomer classes: combinatorially averaged graph invariants. Physical Chemistry Chemical Physics 1999, 1: 5565–5572.
- Bytautas L, Klein DJ: Isomer combinatorics for acyclic conjugated polyenes: enumeration and beyond. Theoretical Chemistry Accounts 1999, 101: 371–387. 10.1007/s002140050455
- Cayley A: On the analytic forms called trees with applications to the theory of chemical combinations. Reports British Association for the Advancement of Science 1875, 45: 257–305.
- Buchanan BG, Feigenbaum EA: DENDRAL and Meta-DENDRAL: their applications dimension. Artificial Intelligence 1978, 11: 5–24. 10.1016/0004-3702(78)90010-3
- Funatsu K, Sasaki S: Recent advances in the automated structure elucidation system, CHEMICS. Utilization of two-dimensional NMR spectral information and development of peripheral functions for examination of candidates. Journal of Chemical Information and Computer Sciences 1996, 36: 190–204. 10.1021/ci950152r
- Fink T, Reymond JL: Virtual exploration of the chemical universe up to 11 atoms of C, N, O, F: assembly of 26.4 million structures (110.9 million stereoisomers) and analysis for new ring systems, stereochemistry, physicochemical properties, compound classes, and drug discovery. Journal of Chemical Information and Computer Sciences 2007, 47: 342–353. 10.1021/ci600423u
- Mauser H, Stahl M: Chemical fragment spaces for de novo design. Journal of Chemical Information and Computer Sciences 2007, 47: 318–324. 10.1021/ci6003652
- Faulon JL, Churchwell CJ, Visco DP Jr: The signature molecular descriptor. 2. Enumerating molecules from their extended valence sequences. Journal of Chemical Information and Computer Sciences 2003, 43: 721–734. 10.1021/ci020346o
- Hall LH, Dailey ES: Design of molecules from quantitative structure-activity relationship models. 3. Role of higher order path counts: path 3. Journal of Chemical Information and Computer Sciences 1993, 33: 598–603. 10.1021/ci00014a012
- Deshpande M, Kuramochi M, Wale N, Karypis G: Frequent substructure-based approaches for classifying chemical compounds. IEEE Transactions on Knowledge and Data Engineering 2005, 17: 1036–1050.
- Bakir GH, Weston J, Schölkopf B: Learning to find pre-images. Advances in Neural Information Processing Systems 2003, 16: 449–456.
- Bakir GH, Zien A, Tsuda K: Learning to find graph pre-images. Lecture Notes in Computer Science 2004, 3175: 253–261. 10.1007/978-3-540-28649-3_31
- Kashima H, Tsuda K, Inokuchi A: Marginalized kernels between labeled graphs. Proceedings of the Twentieth International Conference on Machine Learning, AAAI Press 2003, 321–328.
- Mahé P, Ueda N, Akutsu T, Perret JL, Vert JP: Graph kernels for molecular structure-activity relationship analysis with support vector machines. Journal of Chemical Information and Modeling 2005, 45: 939–951. 10.1021/ci050039t
- Byvatov E, Fechner U, Sadowski J, Schneider G: Comparison of support vector machine and artificial neural network systems for drug/nondrug classification. Journal of Chemical Information and Computer Sciences 2003, 43: 1882–1889. 10.1021/ci0341161
- Akutsu T, Fukagawa D: Inferring a graph from path frequency. Lecture Notes in Computer Science 2005, 3537: 371–392. 10.1007/11496656_32
- Nagamochi H: A detachment algorithm for inferring a graph from path frequency. Algorithmica 2009, 53: 207–224. 10.1007/s00453-008-9184-0
- Fujiwara H, Wang J, Zhao L, Nagamochi H, Akutsu T: Enumerating treelike chemical graphs with given path frequency. Journal of Chemical Information and Modeling 2008, 48: 1345–1357. 10.1021/ci700385a
- Nakano S, Uno T: Generating colored trees. Lecture Notes in Computer Science 2005, 3787: 249–260. 10.1007/11604686_22
- Nakano S, Uno T: Efficient generation of rooted trees. NII Technical Report NII-2003–005E 2003.
- Ishida Y, Zhao L, Nagamochi H, Akutsu T: Improved algorithms for enumerating tree-like chemical graphs with given path frequency. Genome Informatics 2008, 21: 53–64.
- Ishida Y: Improved algorithms for enumerating tree-like chemical graphs with given path frequency. Master thesis of Graduate School of Informatics in Kyoto University 2008.
- Kvasnicka V, Pospichal J: Constructive enumeration of acyclic molecules. Collect Czech Chem Commun 1991, 56: 1777–1802. 10.1135/cccc19911777
- Kanehisa M, Goto S, Furumichi M, Tanabe M, Hirakawa M: KEGG for representation and analysis of molecular networks involving diseases and drugs. Nucleic Acids Res 2010, 38: D355-D360.
Copyright
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.