concerned about performance It's hard to address performance without knowing anything about the characteristics, any of the 6V's of your data, or what's currently set up, what you're starting with or migrating from. Give the native, built-in with recursive approach a go, run some tests and see if that's good enough for you. Is it better to use recursive queries ( WITH RECURSIVE ) with the parent_id approach, or should I consider PostgreSQL extensions like ltree to handle trees more efficiently? If you have the time and budget to consider things, add Apache AGE and pgrouting to that list. You need to have some test cases established, some target throughput/latencies you're comfortable with to properly compare anything. You can find some examples here on SO tags: pgrouting tag - I think most of these, at least the recent ones, show a comparison between RCTEs and pgr_ functions, at least those I answered to. apache-age tag
痛点为 AI 基于上游原始证据的初步提炼;未包含额外中国市场检索。
用户在 PostgreSQL 中建模和查询树形层级数据(如分类与子分类)时,面临性能与数据完整性之间的权衡。使用基本的邻接表(parent_id)配合递归查询(WITH RECURSIVE)在深层级查询所有后代时性能堪忧,而 ltree 等扩展虽可能提升效率,但存在数据易被意外损坏的弱点。用户需要一种既高效又保证数据一致性的方案,但现有方法要么查询慢,要么易出错,导致开发者在选型时陷入决策困难,不得不自行测试多种方案(如闭包表、Apache AGE、pgrouting),耗费大量时间与精力。
Stack Overflow question
I am working on a project where I need to represent a tree structure (categories and subcategories) in PostgreSQL. I know the basic "Adjacency List" approach, where each row has a parent_id pointing to its parent node. However, I am concerned about performance when I need to query all descendants of a node at deep levels. Is it better to use recursive queries ( WITH RECURSIVE ) with the parent_id approach, or should I consider PostgreSQL extensions like ltree to handle trees more efficiently? Thanks for the help!
Question details
- View count
- 47
- Answer count
- 2
- Last activity
- 2026/05/28
Answers
Best practice for the sake of data integrity is to use Adjacency List, because it resists data anomalies. I have not tested PostgreSQL ltree , but I have compared adjacency list, nested sets, materialized path, and a design which I call closure table. Closure table has better performance than the other approaches for many types of queries you may want to do on hierarchical data. I tested with a hierarchy of over 500k nodes, which is probably larger than any tree you intend to query. You can see my old answer about closure table here on SO: What is the most efficient way to parse a flat table into a tree? I gave a couple of presentations about comparing methods: Models for hierarchical data Recursive Query Throwdown And I write more about closure table in one chapter of my book: SQL Antipatterns, Volume 1: Avoiding the Pitfalls of Database Programming Ltree uses a design like materialized path, then indexes it with multiple types of index (btree, gist, hash) to support different types of queries. It's possible that this will help its performance. To be sure, you'll have to test it yourself with the queries and the data you intend to use. But materialized path still has a weakness with respect to data integrity. That is, you can accidentally corrupt your data by setting paths in a way that doesn't match the true hierarchy.
源数据· Raw Archive
- source
- Stack Overflow
- upstream_source
- stackoverflow
- upstream_item_id
- 79947682
- daily_ranking_item_id
- 34ea950b-e7fc-46b0-b77b-8e26ae2eaf3c
- rank_date
- 2026-05-29
- rank
- 9
- name
- What is the best practice to model and query a tree (hierarchical) data structure in PostgreSQL
- tagline
- postgresql, tree, recursive-query
- description
- I am working on a project where I need to represent a tree structure (categories and subcategories) in PostgreSQL. I know the basic "Adjacency List" approach, where each row has a parent_id pointing to its parent node. However, I am concerned about performance when I need to query all descendants of a node at deep levels. Is it better to use recursive queries ( WITH RECURSIVE ) with the parent_id approach, or should I consider PostgreSQL extensions like ltree to handle trees more efficiently? Thanks for the help!
- votes_count
- 2
- comments_count
- 2
- created_at_on_source
- 2026-05-28T04:48:07.000Z
{
"stackoverflow": {
"score": 2,
"view_count": 47,
"is_answered": true,
"top_answers": [
{
"body": "concerned about performance It's hard to address performance without knowing anything about the characteristics, any of the 6V's of your data, or what's currently set up, what you're starting with or migrating from. Give the native, built-in with recursive approach a go, run some tests and see if that's good enough for you. Is it better to use recursive queries ( WITH RECURSIVE ) with the parent_id approach, or should I consider PostgreSQL extensions like ltree to handle trees more efficiently? If you have the time and budget to consider things, add Apache AGE and pgrouting to that list. You need to have some test cases established, some target throughput/latencies you're comfortable with to properly compare anything. You can find some examples here on SO tags: pgrouting tag - I think most of these, at least the recent ones, show a comparison between RCTEs and pgr_ functions, at least those I answered to. apache-age tag",
"score": 1,
"answer_id": 79947811,
"is_accepted": false
},
{
"body": "Best practice for the sake of data integrity is to use Adjacency List, because it resists data anomalies. I have not tested PostgreSQL ltree , but I have compared adjacency list, nested sets, materialized path, and a design which I call closure table. Closure table has better performance than the other approaches for many types of queries you may want to do on hierarchical data. I tested with a hierarchy of over 500k nodes, which is probably larger than any tree you intend to query. You can see my old answer about closure table here on SO: What is the most efficient way to parse a flat table into a tree? I gave a couple of presentations about comparing methods: Models for hierarchical data Recursive Query Throwdown And I write more about closure table in one chapter of my book: SQL Antipatterns, Volume 1: Avoiding the Pitfalls of Database Programming Ltree uses a design like materialized path, then indexes it with multiple types of index (btree, gist, hash) to support different types of queries. It's possible that this will help its performance. To be sure, you'll have to test it yourself with the queries and the data you intend to use. But materialized path still has a weakness with respect to data integrity. That is, you can accidentally corrupt your data by setting paths in a way that doesn't match the true hierarchy.",
"score": 0,
"answer_id": 79947972,
"is_accepted": false
}
],
"answer_count": 2,
"accepted_answer_id": null,
"last_activity_date": 1779983250
}
}{
"stats": {
"score": 2,
"view_count": 47,
"is_answered": true,
"answer_count": 2,
"creation_date": 1779943687,
"last_edit_date": null,
"accepted_answer_id": null,
"last_activity_date": 1779983250
},
"api_wrapper": {
"backoff": null,
"has_more": true,
"page_size": 8,
"quota_max": 300,
"quota_remaining": 294
},
"question_id": 79947682,
"answer_fetch": {
"has_more": false,
"answers_fetched": 2,
"answer_page_size": 3
},
"snapshot_version": "stackoverflow_question_v1"
}{
"id": "da6bfd25-fbb8-4219-9a0e-e8971e6ccd9b",
"daily_ranking_item_id": "34ea950b-e7fc-46b0-b77b-8e26ae2eaf3c",
"source": "stackoverflow",
"external_id": "79947682",
"fetched_at": "2026-05-28T22:02:15.509Z",
"question_raw": {
"body": "<p>I am working on a project where I need to represent a tree structure (categories and subcategories) in PostgreSQL.</p>\n<p>I know the basic "Adjacency List" approach, where each row has a <code>parent_id</code> pointing to its parent node. However, I am concerned about performance when I need to query all descendants of a node at deep levels.</p>\n<p>Is it better to use recursive queries (<code>WITH RECURSIVE</code>) with the <code>parent_id</code> approach, or should I consider PostgreSQL extensions like <code>ltree</code> to handle trees more efficiently?</p>\n<p>Thanks for the help!</p>\n",
"link": "https://stackoverflow.com/questions/79947682/what-is-the-best-practice-to-model-and-query-a-tree-hierarchical-data-structur",
"tags": [
"postgresql",
"tree",
"recursive-query"
],
"owner": {
"link": "https://stackoverflow.com/users/32768720/kathya-sofia-melgar-marroquin",
"user_id": 32768720,
"user_type": "registered",
"account_id": 46416045,
"reputation": 1,
"display_name": "KATHYA SOFIA MELGAR MARROQUIN",
"profile_image": "https://www.gravatar.com/avatar/8b7ac6d1370741fbea20ada9379bdb91?s=256&d=identicon&r=PG&f=y&so-version=2"
},
"score": 2,
"title": "What is the best practice to model and query a tree (hierarchical) data structure in PostgreSQL",
"view_count": 47,
"is_answered": true,
"question_id": 79947682,
"answer_count": 2,
"creation_date": 1779943687,
"content_license": "CC BY-SA 4.0",
"last_activity_date": 1779983250
},
"answers_raw": [
{
"body": "<blockquote>\n<p>concerned about performance</p>\n</blockquote>\n<p>It's hard to address performance without knowing anything about the <a href=\"https://en.wikipedia.org/wiki/Big_data#Characteristics\" rel=\"nofollow noreferrer\">characteristics, any of the 6V's</a> of your data, or what's currently set up, what you're starting with or migrating from. Give the native, built-in <code>with recursive</code> approach a go, run some tests and see if that's good enough for you.</p>\n<blockquote>\n<p>Is it better to use recursive queries (<code>WITH RECURSIVE</code>) with the <code>parent_id</code> approach, or should I consider PostgreSQL extensions like <code>ltree</code> to handle trees more efficiently?</p>\n</blockquote>\n<p>If you have the time and budget to consider things, add <a href=\"https://age.apache.org/age-manual/master/clauses/match.html#example\" rel=\"nofollow noreferrer\"><code>Apache AGE</code></a> and <a href=\"https://access.crunchydata.com/documentation/pgrouting/latest/pgr_dijkstra.html#one-to-one\" rel=\"nofollow noreferrer\"><code>pgrouting</code></a> to that list. You need to have some test cases established, some target throughput/latencies you're comfortable with to properly compare anything.</p>\n<p>You can find some examples here on SO tags:</p>\n<ul>\n<li><a href=\"https://stackoverflow.com/questions/tagged/pgrouting\"><code>pgrouting</code> tag</a> - I think most of these, at least the recent ones, show a comparison between RCTEs and <code>pgr_</code> functions, at least those I answered to.</li>\n<li><a href=\"https://stackoverflow.com/questions/tagged/apache-age\"><code>apache-age</code> tag</a></li>\n</ul>\n",
"owner": {
"link": "https://stackoverflow.com/users/5298879/zegarek",
"user_id": 5298879,
"user_type": "registered",
"account_id": 6897470,
"reputation": 31336,
"display_name": "Zegarek",
"profile_image": "https://i.sstatic.net/Qv2LD.png?s=256"
},
"score": 1,
"answer_id": 79947811,
"is_accepted": false,
"question_id": 79947682,
"creation_date": 1779961376,
"content_license": "CC BY-SA 4.0",
"last_activity_date": 1779961376
},
{
"body": "<p>Best practice for the sake of data integrity is to use Adjacency List, because it resists data anomalies.</p>\n<p>I have not tested PostgreSQL <code>ltree</code>, but I have compared adjacency list, nested sets, materialized path, and a design which I call closure table.</p>\n<p>Closure table has better performance than the other approaches for many types of queries you may want to do on hierarchical data. I tested with a hierarchy of over 500k nodes, which is probably larger than any tree you intend to query.</p>\n<p>You can see my old answer about closure table here on SO:</p>\n<ul>\n<li><a href=\"https://stackoverflow.com/questions/192220/what-is-the-most-efficient-way-to-parse-a-flat-table-into-a-tree/192462#192462\">What is the most efficient way to parse a flat table into a tree?</a></li>\n</ul>\n<p>I gave a couple of presentations about comparing methods:</p>\n<ul>\n<li><a href=\"https://www.slideshare.net/slideshow/models-for-hierarchical-data/4179181\" rel=\"nofollow noreferrer\">Models for hierarchical data</a></li>\n<li><a href=\"https://www.slideshare.net/slideshow/recursive-query-throwdown/75838632\" rel=\"nofollow noreferrer\">Recursive Query Throwdown</a></li>\n</ul>\n<p>And I write more about closure table in one chapter of my book:</p>\n<ul>\n<li><a href=\"https://pragprog.com/titles/bksap1/sql-antipatterns-volume-1/\" rel=\"nofollow noreferrer\">SQL Antipatterns, Volume 1: Avoiding the Pitfalls of Database Programming</a></li>\n</ul>\n<p>Ltree uses a design like materialized path, then indexes it with multiple types of index (btree, gist, hash) to support different types of queries. It's possible that this will help its performance. To be sure, you'll have to test it yourself with the queries and the data you intend to use.</p>\n<p>But materialized path still has a weakness with respect to data integrity. That is, you can accidentally corrupt your data by setting paths in a way that doesn't match the true hierarchy.</p>\n",
"owner": {
"link": "https://stackoverflow.com/users/20860/bill-karwin",
"user_id": 20860,
"user_type": "registered",
"account_id": 10929,
"reputation": 568664,
"accept_rate": 92,
"display_name": "Bill Karwin",
"profile_image": "https://i.sstatic.net/iavmP.jpg?s=256"
},
"score": 0,
"answer_id": 79947972,
"is_accepted": false,
"question_id": 79947682,
"creation_date": 1779983250,
"content_license": "CC BY-SA 4.0",
"last_activity_date": 1779983250
}
],
"tags_raw": [
"postgresql",
"tree",
"recursive-query"
],
"stats_raw": {
"score": 2,
"view_count": 47,
"is_answered": true,
"answer_count": 2,
"creation_date": 1779943687,
"last_edit_date": null,
"accepted_answer_id": null,
"last_activity_date": 1779983250
},
"selection_meta": {
"site": "stackoverflow",
"api_wrapper": {
"backoff": null,
"has_more": true,
"page_size": 8,
"quota_max": 300,
"quota_remaining": 294
},
"answer_fetch": {
"backoff": null,
"has_more": false,
"answers_fetched": 2,
"quota_remaining": 262,
"answer_page_size": 3
},
"snapshot_version": "stackoverflow_question_v1",
"selection_strategy": "tag_whitelist_unanswered_high_score_recent_active"
},
"created_at": "2026-05-28T22:02:15.797Z",
"updated_at": "2026-05-28T22:02:15.797Z"
}