Tuesday 4 September 2007

Grokking hierarchical queries in Oracle

Oracle provides a very powerful SQL extension to evaluate hierarchical queries (transitive closures) on tables. This recently saved my bacon in a system which processed a table modelling a tree containing several hundred thousand rows. The alternative would have been to do some very ugly and inefficient iterative querying.

There's a few tutorials about Oracle hierarchical queries on the Web, but I didn't find any of them gave me a good mental model of how these queries are evaluated. In particular, they didn't really help me in figuring out how to traverse a tree structure either upwards or downwards. So here's my attempt at explaining some patterns for using hierarchical queries. (This isn't a complete tutorial - for that check the Oracle 10g documentation)

For modelling tree-structured data, the usual pattern is to have a table with id and parent_id columns. In a hierarchical query you may wish to traverse the tree either upwards (towards the root(s)) or downwards (towards the leaves). But what query syntax should be used to produce the desire direction?

The general syntax for CONNECT BY is:

CONNECT BY [PRIOR] col1 = [PRIOR] col2

A further constraint is that the PRIOR keyword can appear once only, but it can appear on either side of the equality condition. Since the equality condition is symmetric, there are really only two possibilities:
  1. CONNECT BY PRIOR parent_id = id (which is equivalent to CONNECT BY id = PRIOR parent_id)
  2. CONNECT BY PRIOR id = parent_id (which is equivalent to CONNECT BY parent_id = PRIOR id)
Oracle evaluates the hierarchical query in the following way:

1. The result rowset is initialized with the rows determined by the START WITH clause
2. All rows for which the CONNECT_BY condition is true are added to the result rowset. The PRIOR keyword determines which one of the condition expressions is evaluated in the context of the result rowset rows.
3. Step 2 is repeated until no further rows match.

Formally, this procedure computes the transitive closure of the relation defined by the initial START WITH set and the CONNECT BY condition.

Given this evaluation rule, we obtain the following rule-of-thumb for writing a query to traverse in a desired direction:
  • To traverse upwards, use form #1
  • To traverse downwards, use form #2
There's several other useful parameters for defining hierarchical queries in Oracle, such as NOCYCLE, LEVEL, CONNECT_BY_ROOT, CONNECT_BY_ISLEAF, etc. It's well worth studying how to use these to improve query power and performance.

No comments: