Figure 7 | BMC Bioinformatics

Figure 7

From: GraphFind: enhancing graph searching by low support data mining techniques


GraphFind system. Preprocessing (Step 1): each graph in the database is stored as a set of Berkeley DB tables, one per label-path-set. A label-path-set is the set of id-paths (e.g. (3,0) and (3,2) in g1) of all paths that represent a given label sequence (e.g. CB in g1). Only some label-path-sets are shown for each graph. The maximum length (number of edges) of a label-path is l_p = 3. The fingerprint (index) of the database is a Berkeley DB hash table in which each column corresponds to a graph and each entry is the number of occurrences of a label-path in that graph.

Querying: construct the query fingerprint with l_p = 3 (Step 2). Compare the query fingerprint with the database fingerprint (Step 3). Any database graph for which at least one value in its fingerprint is less than the corresponding value in the query fingerprint is filtered out (Step 4). For example, g2 and g3 are not selected as candidates because they do not contain the path ABCA. Decompose the query into patterns with l_p = 3 (Step 5). From each candidate graph, select the label-path-sets corresponding to the patterns in the query (Step 6), and combine the id-paths in these tables according to the query decomposition. In the patterns (C B, A*BC A*), only labels with equal marks (e.g. _, *) denote the same node. For example, (1,0,3,1) cannot be combined with (3,0) because the two nodes labeled B must be different (the same reasoning applies to (1,2,3,1) and (3,2)). The subgraph obtained by combining (1,2,3,1) and (3,0) is shown in “Filtered Database (2)”. The subgraphs obtained in this way are the only ones that may match the query. Subgraph matching is then performed by applying the VF2 algorithm [11] to these subgraphs rather than to the entire graphs.
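The filtering in Steps 1 to 4 can be sketched in a few lines of Python. This is a minimal in-memory sketch and not GraphFind's implementation. Plain dicts and a Counter stand in for the Berkeley DB tables and the hash index, and labels are assumed to be single characters. The helper names (make_graph, id_paths, fingerprint, is_candidate) and the example graphs are hypothetical. Paths are enumerated from every start node in both directions. The rule that lets a path return to its start node to close a cycle is an assumption based on the id-path (1,0,3,1) quoted in the caption.

```python
from collections import Counter


def make_graph(labels, edges):
    """Build a hypothetical undirected labeled graph as (labels, adjacency)."""
    adj = {v: set() for v in labels}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return labels, adj


def id_paths(graph, lp):
    """Yield id-paths with 0..lp edges. Nodes do not repeat, except that a
    path may return to its start node to close a cycle of 3 or more edges
    (assumed rule, inferred from the id-path (1,0,3,1))."""
    labels, adj = graph

    def extend(path):
        yield tuple(path)
        if len(path) - 1 == lp:  # path already has lp edges
            return
        for n in sorted(adj[path[-1]]):
            if n not in path:
                path.append(n)
                yield from extend(path)
                path.pop()
            elif n == path[0] and len(path) >= 3:
                yield tuple(path) + (n,)  # cycle closed; do not extend further

    for start in labels:
        yield from extend([start])


def fingerprint(graph, lp=3):
    """Count how often each label-path (label sequence) occurs in the graph."""
    labels, _ = graph
    return Counter("".join(labels[v] for v in p) for p in id_paths(graph, lp))


def is_candidate(db_fp, query_fp):
    """Keep a graph only if no query label-path occurs fewer times in it."""
    return all(db_fp[path] >= count for path, count in query_fp.items())


# Hypothetical example: the query is a triangle A-B-C.
query = make_graph({0: "A", 1: "B", 2: "C"}, [(0, 1), (1, 2), (2, 0)])
g_tri = make_graph({0: "A", 1: "B", 2: "C", 3: "D"},
                   [(0, 1), (1, 2), (2, 0), (2, 3)])
g_path = make_graph({0: "A", 1: "B", 2: "C", 3: "A"},
                    [(0, 1), (1, 2), (2, 3)])

q_fp = fingerprint(query)
print(is_candidate(fingerprint(g_tri), q_fp))   # True
print(is_candidate(fingerprint(g_path), q_fp))  # False
```

Comparing counts rather than mere presence is safe. An embedding of the query maps distinct query paths to distinct database paths with the same labels, so a graph that contains the query can never have a smaller count for any label-path. In this example g_path is pruned because, for instance, it has no occurrence of BCAB, even though it does contain the label sequence ABCA as an open path.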
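The combination in Step 6 can be illustrated with a brute-force join over the id-paths quoted in the caption. This is a hypothetical sketch rather than GraphFind's own procedure. Each pattern position gets a variable. Equal variables must bind to the same graph node, and different variables must bind to different nodes. The marks in (C B, A*BC A*) did not survive extraction, so the variables here are inferred from the caption's example: the two A's of ABCA are the same node, C is shared between the two patterns, and the two B's must differ. The names bind, combine, pattern_vars and pattern_paths are illustrative.

```python
from itertools import product


def bind(pattern_vars, choice):
    """Map pattern variables to graph nodes for one choice of id-paths.

    Returns the mapping, or None if one variable would be bound to two
    different nodes, or two different variables to the same node."""
    assignment = {}
    for variables, path in zip(pattern_vars, choice):
        for var, node in zip(variables, path):
            if assignment.setdefault(var, node) != node:
                return None  # same variable, conflicting nodes
    if len(set(assignment.values())) != len(assignment):
        return None  # distinct variables share a node
    return assignment


def combine(pattern_vars, pattern_paths):
    """Try every combination of id-paths (one per pattern); keep consistent ones."""
    return [choice for choice in product(*pattern_paths)
            if bind(pattern_vars, choice) is not None]


# Patterns CB and ABCA, with variables inferred from the caption's example.
pattern_vars = [("c", "b1"), ("a", "b2", "c", "a")]
pattern_paths = [[(3, 0), (3, 2)],               # id-paths of CB
                 [(1, 0, 3, 1), (1, 2, 3, 1)]]   # id-paths of ABCA

for choice in combine(pattern_vars, pattern_paths):
    print(choice, sorted({n for path in choice for n in path}))
# ((3, 0), (1, 2, 3, 1)) [0, 1, 2, 3]
# ((3, 2), (1, 0, 3, 1)) [0, 1, 2, 3]
```

The rejected pairs are exactly the ones the caption rules out, (1,0,3,1) with (3,0) and (1,2,3,1) with (3,2), because in each pair both B variables bind to the same node. Only the surviving subgraphs would then need to be checked with VF2. A practical implementation would likely prune partial combinations as soon as a conflict appears, rather than enumerating the full Cartesian product. That product grows multiplicatively with the number of id-paths per pattern.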
