April 20, 2014

Moving Subtrees in Closure Table Hierarchies

Many software developers find they need to store hierarchical data, such as threaded comments, personnel org charts, or nested bill-of-materials. Sometimes it’s tricky to do this in SQL and still run efficient queries against the data. I’ll be presenting a webinar for Percona on February 28 at 9am PST. I’ll describe several solutions for storing and querying trees in an SQL database, including the design I call Closure Table.

In Closure Table, we store every path in a tree, not only direct parent-child references, but also grandparent-grandchild, and every other path, no matter how long. We even store paths of length zero, which means a node is its own parent. So if A is a parent of B, and B is a parent of C and C is a parent of D, we need to store the following paths: A-A, A-B, A-C, A-D, B-B, B-C, B-D, C-C, C-D, D-D. This makes it easy to query for all descendants of A, or all ancestors of D, or many other common queries that are difficult if you store hierarchies according to textbook solutions.

ancestor CHAR(1) NOT NULL,
descendant CHAR(1) NOT NULL,
PRIMARY KEY (ancestor, descendant)

Because there isn’t much written about using the Closure Table design, I periodically get questions about how to solve certain problems. Here’s one I got this week (paraphrased):

I’m using Closure Table, which I learned about from your book and your presentations. I can neither find nor invent a pure-SQL solution for moving a subtree to a new position in my tree. Right now, I’m reading the nodes of the subtree into my host script, deleting them and re-inserting them one by one, which is awful. Are you aware of a more efficient solution for moving a subtree in this design?

Moving subtrees can be tricky in both Closure Table and the Nested Sets model.  It can be easier with the Adjacency List design, but in that design you don’t have a convenient way to query for all nodes of a tree. Everything has tradeoffs.

In Closure Table, remember that adding a node involves copying some of the existing paths, while changing the endpoint for descendant.  When you’re inserting a single new node, you just have one descendant to add, joined to all the paths of its ancestors.

Here’s a diagram of a skinny tree lying on its side.  A is the root of the tree, and C is the leaf.  We want to add new node D.

A -> B -> C …add… D

The Closure Table already has paths A-A, A-B, A-C, B-B, B-C, C-C.  You want new paths A-D, B-D, C-D, D-D.  Basically you need to copy any path terminating with C (the immediate parent of D), and change the endpoint of that path to D.

That is, copying A-C, B-C, C-C and changing the right C to D gives A-D, B-D, C-D.  Then add your reflexive path D-D manually.

INSERT INTO TreePaths (ancestor, descendant, length)
SELECT t.ancestor, 'D', t.length+1
FROM TreePaths AS t
WHERE t.descendant = 'C'
SELECT 'D', 'D', 0;

The above is the simple case of adding a single new node.

When you insert a subtree, you’re inserting multiple nodes, and you want to create new paths for each one, as many new paths as the number of ancestors of the new location times the number of nodes in the subtree.  Suppose A has another child F, and below F we have a subtree D -> E (see illustration):

We want to move the subtree starting at D, and place it under B.

Before we move the subtree, we need to disconnect it from its old position in the tree. Delete the outdated paths for the old location of the D subtree (but not the intra-subtree paths). Conceptually, we might try the following SQL statement:

WHERE descendant IN (SELECT descendant FROM TreePaths WHERE ancestor = 'D')
AND ancestor NOT IN (SELECT descendant FROM TreePaths WHERE ancestor = 'D');

But MySQL complains: “You can’t specify target table ‘TreePaths’ for update in FROM clause.” We can’t DELETE and SELECT from the same table in a single query in MySQL. But we can use MySQL’s multi-table DELETE syntax, to find any path that terminate at a descendant of D, but only if the path’s start isn’t also a descendant of D. We do this by using an other join to try to find a third path, starting from D and terminating at the start of the path we want to delete. Only delete when no such third path is found.

DELETE a FROM TreePaths AS a
JOIN TreePaths AS d ON a.descendant = d.descendant
LEFT JOIN TreePaths AS x
ON x.ancestor = d.ancestor AND x.descendant = a.ancestor
WHERE d.ancestor = 'D' AND x.ancestor IS NULL;

That deletes paths that terminate within the subtree (descendants of D), but not paths that begin within the subtree (paths whose ancestor is D or any of descendants of D). So it deletes A-D, A-E, F-D, F-E, but not D-D, D-E, E-E.

To insert the subtree under its new location, we want new paths A-D, B-D, C-D, D-D as well as A-E, B-E, C-E, and D-E.  We already have D-E, you just have to not lose that path in the process of moving (same applies to any other intra-subtree paths if it’s larger than just this simple example).  We also already have D-D and E-E and any other reflexive paths.

If we were inserting just a single new node, we would hardcode ‘D’ as the id of the new node.  But now we need to insert all the nodes of the subtree.  We use a Cartesian join between the ancestors of B (going up) and the descendants of D (going down).

INSERT INTO TreePaths (ancestor, descendant, length)
SELECT supertree.ancestor, subtree.descendant,
FROM TreePaths AS supertree JOIN TreePaths AS subtree
WHERE subtree.ancestor = 'D'
AND supertree.descendant = 'B';

This inserts A-D, B-D, A-E, B-E.  We already have D-D, D-E, E-E since we’re moving an existing subtree. See the second illustration for the result.

Basically, for both deleting outdated paths and adding new paths, we can imagine a kind of boundary separating the subtree from the rest of the tree.  You’re concerned with changing paths that cross over that boundary.  Everything else takes care of itself.

I also wrote about Closure Table in my book, SQL Antipatterns: Avoiding the Pitfalls of Database Programming.

About Bill Karwin

Bill Karwin has been a software professional for over 20 years. He's helped thousands of developers with SQL technology. Bill authored the book "SQL Antipatterns," collecting frequent blunders and showing better solutions.


  1. Hi Bill,

    I’m actually currently researching how a closure tree will compare to a nested set tree for a reasonably sized tree, which led me to your blog post. You’re right about the solution being non-trivial, and your solution looks fairly similar to the one that you commented on here: http://jdobbie.blogspot.com/2009/07/closure-trees.html . While certainly not revolutionary, moving subtrees wasn’t mentioned in the paper, and I couldn’t find anyone who’d written code for it before me, so a link back would be greatly appreciated.

    Jonathan Dobbie

  2. Vihren says:

    You solved one of my biggest problems in just a couple of minutes reading and modifying the queries respecting my tables. It works perfect in any case (if the element that is moved has children or doesn’t have).

    Great article!

    Many thanks for sharing the idea!

  3. Thanks for your comments Jonathan and Vihren. I’m glad my article was helpful for you!

  4. alejandro says:

    hi bill,

    first of all thanks a lot for all your articules about closure table.Also for your book. I´m learning quite a lot and i have started to use it.However after a couple of weeks coding i have crossed with changes in the design of my data structure. Now we are not going to use a tree but a graph.

    I have been reading some info about how to implement graphs in relational DBs, like this one that use mix of adjacent list with helper tables: http://www.oreillynet.com/pub/a/network/2002/11/27/bioconf.html

    I have been doing some changes in our closure table design to make it behave like a graph but still i dont get the full relationship to represent a graph with a closure table. Can you point me out to read some info about it,please?

    i will really appreciate any help in this particular moment since i would not like to get rid of all the work i have done to use closure table model in our project.

    many thanks

  5. Ricardo Correa says:

    Hi Bill,

    I’m considering using a closure table to represent a hierarchical model allowing for multiple parents (say E as descendant of D as well as C in your last diagram).
    There is however a problem I’ve hit which is moving or deleting a node that is present in two branches of the tree. As an example, let’s say we have a tree similar to the one in your last illustration, but with an extra E node below C: A-B, B-C, C-E, B-D, D-E, A-F (just the immediate connections), and we want to delete the node E descending from D (D-E). In this case we cannot delete the connections to all the other ancestors of E because those connections would still be valid for the node E descending from C. How can I implement a check to make sure we delete A-E, B-E, E-E if and only if they’re the last remaining links to E? Have you already worked on this problem?.

    Ricardo Correa

  6. Hi Ricardo,

    What you’re describing is a more general-case of a directed acyclic graph (http://en.wikipedia.org/wiki/Directed_acyclic_graph), which is a superset of trees. Working with graphs gets a lot more complicated than working with trees, as you’ve found out. I haven’t tested the Closure Table techniques with DAG structures, since these are used less commonly than trees.

    I’m still working on puzzles like how to select two branches independently or the problem you have, which is how to delete one link without deleting all the ancestors’ links if there’s another branch.

    It might be more clear to use the Path Enumeration design for your purposes.

    I’ve seen only one book that investigates methods of using SQL to work with DAGs: “SQL Design Patterns” by Vadim Tropashko.

  7. Annie says:

    Hi Bill,

    I’m using the closure table hierarchy to store regions in a database using MySQL and I’m wondering how can I create a unordered list with indentation (for each level) with the datas. I know the level of each branch (with the length column) but they are different for each and can be more or less in the future. How can reordered the receive data correctly to do so ? I saw the article with the concatenate path of each entry but the order is not what I need to put it in html. With the tree traversal, it is easy to do so because you’re “walking the path”.

    I’m using MySQL, PHP or Coldfusion (I have to do it in both languages so if you have any exemple in one of them) so maybe a query to reorder datas and then put it in an array or structure to loop over…

    Thanks, Annie.

  8. Annie says:

    I succeed to get my query in correct order but I want to order each level (or path length) by name (because it is country and regions, etc.)

    I used this query :

    And I also used the tree traversal way to display my unordered list.

    Thanks, Annie.

  9. Annie, the example you link to shows sorting by a ‘breadcrumbs’ string formed by concatenating the id of a given node’s ancestors. To support a user-defined sibling order, you would have to create a ‘breadcrumbs’ string by concatenating some other labels, e.g. name in your case. You should store the name outside the closure table. And you should pad each name to a fixed-length string.

  10. Annie says:

    Thanks for the answer! I’m not sure how to do this though to make sure each level of, in my case, regions is in alphabetical order. I tried several things but it was never in the correct order. Here is my query :

    SELECT *, GROUP_CONCAT(crumbs.ancestor ORDER BY crumbs.level DESC) AS breadcrumbs

    FROM region__region RR
    INNER JOIN region__tree RTR
    ON RTR.descendant = RR.region_id
    INNER JOIN region__tree crumbs
    ON crumbs.descendant = RTR.descendant

    WHERE RTR.ancestor = 1 AND RTR.level > 0
    GROUP BY RR.region_id
    ORDER BY breadcrumbs ASC

    And I list my regions with this code (in Coldfusion):

    #RepeatString(”, currDepth-level)#


    #RepeatString(”, currDepth-1)#

    Thank you for your help! Keep up the good work, you’re very interesting to read! I would like to attend one of your conference but a bit far from where I live in Canada.

  11. Annie says:

    #RepeatString(”, currDepth-level)#


    #RepeatString(”, currDepth-1)#

  12. Annie, if you want to sort by your regionName (which I assume is in RR), make breadcrumbs from that instead of the numeric ancestor id.

    SELECT *, GROUP_CONCAT(RRcrumbs.regionName ORDER BY crumbs.level DESC) AS breadcrumbs

    FROM region__region RR
    INNER JOIN region__tree RTR
    ON RTR.descendant = RR.region_id
    INNER JOIN region__tree crumbs
    ON crumbs.descendant = RTR.descendant
    INNER JOIN region__region RRcrumbs
    ON crumbs.ancestor = RRcrumbs.region_id

    WHERE RTR.ancestor = 1 AND RTR.level > 0
    GROUP BY RR.region_id
    ORDER BY breadcrumbs ASC

    You may need to use a SEPARATOR character that doesn’t occur in any regionName, and you may need to use LPAD() to make sure the regionName string is of fixed length, or you could get the wrong sort order.

  13. Annie says:

    Thank you! Working pretty well now!

  14. Wolfgang Doll says:

    Hi Bill,
    I just try to write a reusable Django app named “django-ct”. Your book and this blog post has helped me to create the specification. Is that what I have written in your spirit?
    Please take a look: https://django-ct.readthedocs.org/

  15. Hans says:

    > It can be easier with the Adjacency List design,
    > but in that design you don’t have a convenient way to query for all nodes of a tree

    It is very easy using recursive queries which are supported by nearly all DBMS nowadays – MySQL and SQLite being the only ones not to.

  16. Hi Hans,

    Yes, you’re right, SQL-99 defines a “common table expression” syntax, and many other RDBMS implementations support this.

    AFAIK, Oracle 11g, Microsoft SQL Server 2005, IBM DB2 v7.2, PostgreSQL 8.4, and the H2 database for Java support recursive CTE syntax.

    MySQL does not support it in current versions, and this has been a long-standing feature request:
    “SQL-99 Derived table WITH clause (CTE)” http://bugs.mysql.com/bug.php?id=16244 (Jan 2006)

    CTE and recursive CTE features have also been requested by these other bugs:

    http://bugs.mysql.com/bug.php?id=20712 (Jun 2006)
    http://bugs.mysql.com/bug.php?id=21893 (Aug 2006)
    http://bugs.mysql.com/bug.php?id=23156 (Oct 2006)
    http://bugs.mysql.com/bug.php?id=26251 (Feb 2007)
    http://bugs.mysql.com/bug.php?id=48610 (Nov 2009)
    http://bugs.mysql.com/bug.php?id=60940 (Apr 2011)

    Unfortunately, I don’t see any matching worklog tasks at http://dev.mysql.com/worklog/, so we have no expectation that CTE’s or recursive CTE’s will be implemented in MySQL any time soon.

  17. Ethan Collins says:

    Hi Bill,

    I see you only defining the primary key over (ancestor, descendant). However, there are many accesses based on descendant. Since this table will also potentially be very long, don’t we create a KEY on descendant as well?

  18. Hi Ethan,

    Yes indeed, depending on the queries we use, the closure table may benefit from additional indexes. I was trying to focus on the logic required for the query, since that’s complex enough for one blog post. :-)

  19. Nate Starner says:


    Thank you for taking the time to write about this. The whole concept of a closure table and its implementation in MySQL is brilliant – it has saved me so much time and has allowed me to do things I would have otherwise not have been able to do.

    Thanks so much!!

Speak Your Mind