Figure 9 | BMC Bioinformatics

Figure 9

From: BIOZON: a system for unification, management and analysis of heterogeneous biological data

Graphical representation of a fuzzy search. (a) Complex searches find paths in the data graph. In this pictorial representation, nodes in result paths must lie where the sets of objects satisfying the different search constraints intersect. Introducing similarity extends some query steps to include similar results, enabling the discovery of paths in the graph where none existed before. The graph illustrates a complex fuzzy search for structures of proteins that belong to enzyme family 1.1.1.1 and are involved in known interactions. Circles represent sets of matching documents; matches occur where the circles intersect. Dotted lines represent extensions of the sets based on similarity. Without similarity, the set of proteins with structures (P_structure) intersects the set of proteins in enzyme family 1.1.1.1 (P_1.1.1.1), meaning that there exists a protein with a structure that is a 1.1.1.1 enzyme. Likewise, P_structure intersects P_interaction. However, the three sets have no common intersection, and therefore no protein with a structure is both in family 1.1.1.1 and involved in an interaction. A fuzzy search with a threshold of 1e-100 extends the set of 1.1.1.1 proteins, but there are still no matching results. Increasing the threshold to 1e-50 produces the desired intersection, allowing connected paths that span the entire query space.

(b) Similarity may be introduced at multiple graph steps, further enlarging the solution space of a complex query. For example, a search for E. coli proteins that are members of both enzyme families 1.1.1.145 and 5.3.3.1 returns no results. There are two places in the query graph where similarity relations may be used to extend the query to fuzzy results: on proteins classified as 1.1.1.145, and on proteins classified as 5.3.3.1. When the e-value threshold is set to 1e-10, one protein (docID 737980) is returned, with intriguing similarity to proteins that contain both domains. These proteins are observed in higher organisms as part of the estrogen, androgen and C21-steroid hormone metabolism pathways.
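
The set picture in panel (a) can be mimicked with a small sketch. This is only an illustration, not BIOZON's implementation: the protein identifiers, the similarity table, and the function names `expand` and `fuzzy_search` are all hypothetical, and the toy e-values were chosen so that the three thresholds behave as the caption describes.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical toy data: none of these identifiers come from BIOZON.
P_STRUCTURE: Set[str] = {"p3", "p4", "p5"}    # proteins with a known structure
P_ENZYME: Set[str] = {"p1", "p2", "p3"}       # proteins in enzyme family 1.1.1.1
P_INTERACTION: Set[str] = {"p5", "p6"}        # proteins in a known interaction

# Hypothetical pairwise similarity relation: (protein_a, protein_b) -> e-value.
SIMILARITY: Dict[Tuple[str, str], float] = {
    ("p1", "p4"): 1e-120,
    ("p2", "p5"): 1e-60,
}


def expand(members: Set[str], threshold: float) -> Set[str]:
    """Return members plus every protein similar to a member
    with an e-value at or below the threshold (the dotted circle)."""
    extended = set(members)
    for (a, b), evalue in SIMILARITY.items():
        if evalue <= threshold:
            if a in members:
                extended.add(b)
            if b in members:
                extended.add(a)
    return extended


def fuzzy_search(threshold: Optional[float] = None) -> Set[str]:
    """Intersect the three constraint sets; if a threshold is given,
    first extend the enzyme-family set by similarity."""
    enzyme = P_ENZYME if threshold is None else expand(P_ENZYME, threshold)
    return P_STRUCTURE & enzyme & P_INTERACTION


print(fuzzy_search())         # exact search
print(fuzzy_search(1e-100))   # strict similarity threshold
print(fuzzy_search(1e-50))    # looser similarity threshold
```

In this toy data the exact search and the 1e-100 search both return an empty set, while the 1e-50 search returns one protein, because only the looser threshold admits the similarity edge that pulls a structured, interacting protein into the extended enzyme set. Applying `expand` to more than one constraint set would correspond to the multi-step case in panel (b).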
