Using commutative hashes to optimize database queries



 On a simple graph-database, structured as two tables of nodes and edges, we may optimize the database querying using hashes. For example we take a graph representing the word-relations in a text corpora. The database is declared as follows:

 

graphERM.png

 Two nodes are referenced by a single edge. The SQL-Query for picking up a specific edge by given nodes X and Y looks like this:

 
 SELECT * FROM EDGE 
  WHERE

    ( FIRST_NODE=X AND SECOND_NODE=Y  )   

OR  

( FIRST_NODE=Y AND SECOND_NODE=X  )

 On larger databases the executing of this query may become to slow. The simplest way to optimize the query is to build a hash value from the both nodes and query the edge by hash.

 
SELECT * FROM EDGE WHERE HASH=?
 
In this case the hash must be commutative
  H(X,Y) = H(Y,X) := H(X) ^ H(Y) = H(Y) ^ H(X)
 
For example a java stub for calculating a commutative hash
public static int hashCode(Object first, Object second){
   return first.hashCode() ^ second.hashCode();
}

Using hashes this way, allowes to speeding up the query execution on large databases.