Using commutative hashes to optimize database queries
Categories: Technical | Tags: Database Hash Optimization
Using commutative hashes to optimize database queries
On a simple graph-database, structured as two tables of nodes and edges, we may optimize the database querying using hashes. For example we take a graph representing the word-relations in a text corpora. The database is declared as follows:

Two nodes are referenced by a single edge. The SQL-Query for picking up a specific edge by given nodes X and Y looks like this:
SELECT * FROM EDGE
WHERE( FIRST_NODE=X AND SECOND_NODE=Y )
OR
( FIRST_NODE=Y AND SECOND_NODE=X )
On larger databases the executing of this query may become to slow. The simplest way to optimize the query is to build a hash value from the both nodes and query the edge by hash.
SELECT * FROM EDGE WHERE HASH=?
In this case the hash must be commutative
H(X,Y) = H(Y,X) := H(X) ^ H(Y) = H(Y) ^ H(X)
For example a java stub for calculating a commutative hash
public static int hashCode(Object first, Object second){return first.hashCode() ^ second.hashCode();}
Using hashes this way, allowes to speeding up the query execution on large databases.
July 28, 2011 | Share:





