By Gijs Van Tulder
http://www.sitepoint.com/article/hierarchical-data-database/
April 30th 2003
Reader Rating: 9.2
Whether you want to build your own forum, publish the messages from a mailing list on your Website, or write your own cms: there will be a moment that you'll want to store hierarchical data in a database. And, unless you're using a XML-like database, tables aren't hierarchical; they're just a flat list. You'll have to find a way to translate the hierarchy in a flat file.
Storing trees is a common problem, with multiple solutions. There are two major approaches: the adjacency list model, and the modified preorder tree traversal algorithm.
In this article, we'll explore these two methods of saving hierarchical data. I'll use the tree from a fictional online food store as an example. This food store organizes its food by category, by colour and by type. The tree looks like this:
This article contains a number of code examples that show how to save and retrieve data. Because I use that language myself, and many other people use or know that language too, I chose to write the examples in PHP. You can probably easily translate them to your own language of choice.
The Adjacency List Model
The first, and most elegant, approach we'll try is called the ‘adjacency list model' or the ‘recursion method'. It's an elegant approach because you'll need just one, simple function to iterate through your tree. In our food store, the table for an adjacency list looks like this:
As you can see, in the adjacency list method, you save the ‘parent' of each node. We can see that ‘Pear' is a child of ‘Green', which is a child of ‘Fruit' and so on. The root node, ‘Food', doesn't have a parent value. For simplicity, I've used the ‘title' value to identify each node. Of course, in a real database, you'd use the numerical id of each node.
Give Me the Tree
Now that we've inserted our tree in the database, it's time to write a display function. This function will have to start at the root node -- the node with no parent -- and should then display all children of that node. For each of these children, the function should retrieve and display all the child nodes of that child. For these children, the function should again display all children, and so on.
As you might have noticed, there's a regular pattern in the description of this function. We can simply write one function, which retrieves the children of a certain parent node. That function should then start another instance of itself for each of these children, to display all their children. This is the recursive mechanism that gives the ‘recursion method' its name.
<?php
// $parent is the parent of the children we want to see
// $level is increased when we go deeper into the tree,
// used to display a nice indented tree
function display_children($parent, $level) {
// retrieve all children of $parent
$result = mysql_query('SELECT title FROM tree '.
'WHERE parent="'.$parent.'";');
// display each child
while ($row = mysql_fetch_array($result)) {
// indent and display the title of this child
echo str_repeat(' ',$level).$row['title']."\n";
// call this function again to display this
// child's children
display_children($row['title'], $level+1);
}
}
?>
To display our whole tree, we'll run the function with an empty string as $parent
and $level = 0: display_children('',0);
For our food store tree, the function returns:
Food
Fruit
Red
Cherry
Yellow
Banana
Meat
Beef
Pork
Note that if you just want to see a subtree, you can tell the function to start with another node. For example, to display the ‘Fruit' subtree, you would run display_children('Fruit',0);
The Path to a Node
With almost the same function, it's possible to look up the path to a node if you only know the name or id of that node. For instance, the path to ‘Cherry' is ‘Food' > ‘Fruit' > ‘Red'. To get this path, our function will have to start at the deepest level: ‘Cherry'. It then looks up the parent of this node and adds this to the path. In our example, this would be ‘Red'. If we know that ‘Red' is the parent of ‘Cherry', we can calculate the path to ‘Cherry' by using the path to ‘Red'. And that's given by the function we've just used: by recursively looking up parents, we'll get the path to any node in the tree.
<?php
// $node is the name of the node we want the path of
function get_path($node) {
// look up the parent of this node
$result = mysql_query('SELECT parent FROM tree '.
'WHERE title="'.$node.'";');
$row = mysql_fetch_array($result);
// save the path in this array
$path = array();
// only continue if this $node isn't the root node
// (that's the node with no parent)
if ($row['parent']!='') {
// the last part of the path to $node, is the name
// of the parent of $node
$path[] = $row['parent'];
// we should add the path to the parent of this node
// to the path
$path = array_merge(get_path($row['parent']), $path);
}
// return the path
return $path;
}
?>
This function now returns the path to a given node. It returns that path as an array, so to display the path we can use print_r(get_path('Cherry'));
If you do this for 'Cherry', you'll see:
Array
(
[0] => Food
[1] => Fruit
[2] => Red
)
Disadvantages
As we've just seen, this is a great method. It's easy to understand, and the code we need is simple, too. What then, are the downsides of the adjacency list model? In most programming languages, it's slow and inefficient. This is mainly caused by the recursion. We need one database query for each node in the tree.
As each query takes some time, this makes the function very slow when dealing with large trees.
The second reason this method isn't that fast, is the programming language you'll probably use. Unlike languages such as Lisp, most languages aren't designed for recursive functions. For each node, the function starts another instance of itself. So, for a tree with four levels, you'll be running four instances of the function at the same time. As each function occupies a slice of memory and takes some time to initiate, recursion is very slow when applied to large trees.