Graphical representation of a fuzzy search. (a) Complex searches find paths in the data graph. In this pictorial representation, nodes in result paths must occur where sets of objects satisfying different search constraints intersect. Introducing similarity extends some query steps to include similar results, enabling the discovery of paths in the graph where none existed before. The graph illustrates a complex fuzzy search for structures of proteins that belong to enzyme family 126.96.36.199 and are involved in known interactions. Circles represent sets of matching documents; matches occur where the circles intersect. Dotted lines represent extensions of the sets based on similarity. Without similarity, the set of proteins with structures intersects with the set of proteins in enzyme family 188.8.131.52 (P184.108.40.206), meaning that there exists a protein with a structure that is a 220.127.116.11 enzyme. Likewise, the set of proteins with structures intersects with the set of proteins involved in interactions. However, the three sets share no common intersection, and therefore no protein with a structure is both in family 18.104.22.168 and involved in an interaction. Creating a fuzzy search with a threshold of 1e-100 extends the set of 22.214.171.124 proteins, but there are still no matching results. Increasing the threshold to 1e-50 produces the desired intersection, allowing connected paths that span the entire query space. (b) Similarity may be introduced at multiple graph steps, further enlarging the solution space of a complex query. For example, a search for E. coli proteins that are members of enzyme families 126.96.36.199 and 188.8.131.52 returns no results. There are two places in the query graph where similarity relations may be used to extend the query to fuzzy results: on proteins that are classified as 184.108.40.206, and on proteins that are classified as 220.127.116.11. When the E-value threshold is reduced to 1e-10, one protein (docID 737980) is returned with intriguing similarity to proteins that contain both domains. These proteins are observed in higher organisms as part of the estrogen, androgen and C21-steroid hormone metabolism pathways.
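The set logic described in panel (a) can be sketched in a few lines of Python. This is only a toy illustration, not the system's implementation: the protein IDs, set memberships, similarity scores, and the helper names `extend_by_similarity` and `fuzzy_search` are all invented for the example, and the scores are assumed to be E-values where smaller means more similar.

```python
# Toy data, invented for illustration; none of it comes from the real database.
structures = {"p1", "p2", "p5"}      # proteins with known structures
enzyme_family = {"p1", "p3"}         # proteins in the enzyme family of interest
interactions = {"p2", "p4"}          # proteins involved in known interactions

# Hypothetical pairwise similarity scores (assumed E-values: smaller = more similar).
similarity = {
    ("p3", "p6"): 1e-120,
    ("p1", "p2"): 1e-60,
}


def extend_by_similarity(members, similarity, threshold):
    """Return members plus every protein similar to a member at or below threshold."""
    extended = set(members)
    for (a, b), evalue in similarity.items():
        if evalue <= threshold:
            if a in members:
                extended.add(b)
            if b in members:
                extended.add(a)
    return extended


def fuzzy_search(threshold=None):
    """Intersect the three sets, optionally extending the enzyme set by similarity."""
    family = enzyme_family
    if threshold is not None:
        family = extend_by_similarity(enzyme_family, similarity, threshold)
    return structures & family & interactions


exact = fuzzy_search()           # no similarity: pairwise overlaps exist, but no common one
strict = fuzzy_search(1e-100)    # enzyme set grows, yet the intersection stays empty
relaxed = fuzzy_search(1e-50)    # a looser threshold pulls in a protein shared by all three
```

In this toy setup the exact search and the 1e-100 search both return an empty set, while the 1e-50 search returns a single protein, mirroring the progression the caption describes. Extending a second set in the same way would correspond to the multi-step case in panel (b).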