Trees in SQL with Prime NumbersBy Enzo Calamia
This article presents an original way to implement trees in SQL. The technique works well, but I would not advise anyone to use it in production. It is intended as a fun way to do things!
Suppose we want to implement a commenting system. Each comment can
have one reply (or more), which is also a comment. An reply being a
comment, it can have replies as well.
In other words: we have a tree of comments.
A possible approach is to keep a reference of the direct parent. This approach is straightforward. However, traversals are problematic (at least, without recursive CTEs):
- Retrieving all the parents requires as many self-
JOINs as there are parents.
- We cannot
COUNT()the parents of a comment in a comfortable manner.
This article presents an arithmetical approach that is reference-free, thus making traversal operations much smoother.
Here are sample data for our nested comments:
id | parents | text ---+----------+---------------------------------------- 2 | 1 | 3 | 2 | First comment 5 | 2 | Second comment 7 | 2 | Third comment 11 | 10 | First comment on the second comment 13 | 6 | First comment on the first comment 17 | 6 | Second comment on the first comment 19 | 14 | Comment on third comment 23 | 266 | Comment on comment on third comment 29 | 10 | Second comment on the second comment 31 | 290 | Comment on the second comment on the second comment 37 | 8990 | Comment on comment on the second comment on the second comment 41 | 332630 | Second comment on comment on comment on the second comment on the second comment 43 | 13637830 | Comment on second comment on comment on comment on the second comment on the second comment 47 | 332630 | First comment on comment on comment on the second comment on the second comment
Each node has a pair of two integers:
The root has the pair
(2, 1) and is empty.
Any child with a pair
(id, parents) must follow two rules:
idis a prime number, and it is unique.
parentsis the product
id × parentsof its direct parent.
The critical thing to note is that each
parents value is a product of
a prime number and previous
parents value. The previous
parents value is also
a product of a prime number and a previous
parents value until
(2, 1) is reached.
It is essential because it means that any
parents has a prime decomposition
by which we can retrieve all the parents'
Here is the representation of the tree we have modelized in our
Retrieving the parents of a node
Let's suppose we want comment #47 and its parents.
Comment #47 has a
parents = 332630.
332630 = 2 × 5 × 29 × 31 × 37. That is, its parents are the nodes with respectively IDs 37, 31, 29, 5 and 2. We can now retrieve the parents with a simple SQL query:
SELECT * FROM comment WHERE id IN (2, 5, 29, 31, 37); id | parents | text ---+---------+---------------------------------------------------------------- 2 | 1 | 5 | 2 | Second comment 29 | 10 | Second comment on the second comment 31 | 290 | Comment on the second comment on the second comment 37 | 8990 | Comment on comment on the second comment on the second comment
(Typically, the factorization method is implemented in the language from which we call the database.
But, of course, we could have implemented a factorization routine right into the database and done
something like this:
SELECT * FROM comment WHERE id IN PrimeFactors(332630);.)
Counting parents is pretty easy too, and we don't even need to ask the database. We simply count how many prime factors we have. Here, we have 4 factors: 5, 29, 31 and 37 (we don't count 2 because it is the root).
Retrieving the children of a node
Children of a given node with the pair
(id, parents) are all nodes in the form
(_,id × parents).
Again, the SQL query to retrieve them is pretty simple:
SELECT * from comment WHERE parents = 5 * 2; id | parents | text ---+---------+-------------------------------------- 11 | 10 | First comment on the second comment 29 | 10 | Second comment on the second comment
SELECT count(*) FROM comment WHERE parents = 5 * 2; count ------- 2
To insert a new node, we need the parent's ID and its
We multiply them so we can set our
We also need a new unique and prime ID. Here is a safe manner to insert a new node: We have a table containing as many prime numbers as needed (here, ~10k). Then, we can insert a new node in a single transaction like this:
WITH newid AS ( SELECT n FROM primes WHERE n > (SELECT max(id) FROM comment) LIMIT 1; ) INSERT INTO comment (id, parents, text) VALUES ((SELECT n FROM newid), :parents, :text);
Assuming we have a
primes table looking like this:
SELECT * FROM primes; n ---- 2 3 5 7 11 13 17 ... 99929 99961 99971 99989 99991 (9592 rows)
One of the greatest strength of an RDBMS is its consistency model. By implementing the method presented
in this article, we lost the ensured consistency of our "references" because we can put any number
in the row
parents and the DB won't check if it corresponds to something. That check would be
granted with a reference-based approach.
It is possible to implement triggers on
UPDATEs to overcome this problem, by
programmatically checking consistency properties. However, this would bring a pretty significant amount
Serhiy Morozov, Hossein Saiedian and Hanzhang Wang (January 2014). Reusable Prime Number Labeling Scheme for Hierarchical Data Representation in Relational Database. Journal of Computing and Information Technology. p.3, sect. 2. (DOI: 10.2498/cit.1002310)