返回 Discover
Field DispatchStack Overflow9 · 2026-05-28

Best practice for revision control of data in SQL databases

Tags
mysqlsqlpostgresql
Score
2
Answers
2
Views
1,607
Answered
Yes
Accepted
Yes
痛点分析发布于 2026/05/27

痛点为 AI 基于上游原始证据的初步提炼;未包含额外中国市场检索。

痛点

用户在维护SQL数据库时,需要记录数据的修订历史,但现有流程中直接修改数据会导致无法追溯错误变更。用户尝试通过创建额外的_revisions表来存储历史版本,但担心每次查询都检查修订表会带来性能开销。这种手动设计版本控制方案的方式,不仅增加了数据库结构的复杂性(需要同步修改主表和修订表),还可能导致查询性能下降、维护成本高,以及因结构变更不同步而引发数据不一致的风险。

§ Dossier

Stack Overflow question

My entire database occasionally has entries which are wrong, but instead of altering the data directly I'd like the ability to keep a revision of changes. These changes occur very rarely. Ideally something like this: - (original table fields) | revision_version | origin | user | timestamp So say I had a table called posts with the following schema: - title | description | timestamp | author An additional table called posts_revisions would be created thusly: - title | description | timestamp | author | revision_version | origin | user | timestamp origin being the source of the change, be it a bot, user generated or what have you. As you can imagine this is a rather large change to the existing database, my current concern is the performance hit of checking the _revisions tables for every query. Is this best practice for this sort of thing?

§ Dossier

Question details

View count
1,607
Answer count
2
Last activity
2026/05/27
§ Dossier

Answers

For this type of problem, I keep a current table and a history table. The history table has the following additional columns: HistoryID EffectiveDate EndDate VersionNumber CreatedBy CreatedAt The effective and end dates are the time span where the values are valid. The version is just incremented every time there is a change for a record. The id , CreatedAt , and CreatedBy are columns I put into almost every table in the database. Generally, I keep the history table up-to-date with nightly jobs, that compare the tables and then use MERGE to combine the data. An alternative is to wrap all changes in stored procedures, and to update both tables there. Another alternative is to use triggers, that detect when a change occurs. However, I shy away from triggers, preferring the first two alternatives. I must admit that disk space is not a big consideration for these tables. So, there is no problem storing the data twice, once in the results once in the history. It would be just a minor tweak to store only history in the history table, with the current records in the "current" table. One downside to this approach is changing the structure of the base table. If you want to add a column, you need to add it to the history table as well as the base table.

评论作者信息不可用Accepted2 votes

If the tables are used for summary purposes (especially by business users if they have some SQL access) I think it is best to remove the data and place it into another table. While flags and revisions are sometimes fine, when you have to do something along the lines of select sum(select someVar where revision_version=max(revision_version and someID=ID)) then it really gets beyond simple. If you have a table that is being used for quick and nasty data collection, replace the data and if needed, place the old data into a revisions table. If only some application will access it AND it isn't a performance issue then keep it in the main table.

评论作者信息不可用1 votes
源数据· Raw Archive
source
Stack Overflow
upstream_source
stackoverflow
upstream_item_id
11777820
daily_ranking_item_id
9a1dcff5-c9d9-405d-9a4d-37f0ffb998b5
rank_date
2026-05-28
rank
9
name
Best practice for revision control of data in SQL databases
tagline
mysql, sql, postgresql
description
My entire database occasionally has entries which are wrong, but instead of altering the data directly I'd like the ability to keep a revision of changes. These changes occur very rarely. Ideally something like this: - (original table fields) | revision_version | origin | user | timestamp So say I had a table called posts with the following schema: - title | description | timestamp | author An additional table called posts_revisions would be created thusly: - title | description | timestamp | author | revision_version | origin | user | timestamp origin being the source of the change, be it a bot, user generated or what have you. As you can imagine this is a rather large change to the existing database, my current concern is the performance hit of checking the _revisions tables for every query. Is this best practice for this sort of thing?
votes_count
2
comments_count
2
created_at_on_source
2012-08-02T12:51:38.000Z
topics
mysqlsqlpostgresql
media / source-specific data
{
  "stackoverflow": {
    "score": 2,
    "view_count": 1607,
    "is_answered": true,
    "top_answers": [
      {
        "body": "For this type of problem, I keep a current table and a history table. The history table has the following additional columns: HistoryID EffectiveDate EndDate VersionNumber CreatedBy CreatedAt The effective and end dates are the time span where the values are valid. The version is just incremented every time there is a change for a record. The id , CreatedAt , and CreatedBy are columns I put into almost every table in the database. Generally, I keep the history table up-to-date with nightly jobs, that compare the tables and then use MERGE to combine the data. An alternative is to wrap all changes in stored procedures, and to update both tables there. Another alternative is to use triggers, that detect when a change occurs. However, I shy away from triggers, preferring the first two alternatives. I must admit that disk space is not a big consideration for these tables. So, there is no problem storing the data twice, once in the results once in the history. It would be just a minor tweak to store only history in the history table, with the current records in the \"current\" table. One downside to this approach is changing the structure of the base table. If you want to add a column, you need to add it to the history table as well as the base table.",
        "score": 2,
        "answer_id": 11778895,
        "is_accepted": true
      },
      {
        "body": "If the tables are used for summary purposes (especially by business users if they have some SQL access) I think it is best to remove the data and place it into another table. While flags and revisions are sometimes fine, when you have to do something along the lines of select sum(select someVar where revision_version=max(revision_version and someID=ID)) then it really gets beyond simple. If you have a table that is being used for quick and nasty data collection, replace the data and if needed, place the old data into a revisions table. If only some application will access it AND it isn't a performance issue then keep it in the main table.",
        "score": 1,
        "answer_id": 11777928,
        "is_accepted": false
      }
    ],
    "answer_count": 2,
    "accepted_answer_id": 11778895,
    "last_activity_date": 1779892910
  }
}
raw_payload
{
  "stats": {
    "score": 2,
    "view_count": 1607,
    "is_answered": true,
    "answer_count": 2,
    "creation_date": 1343911898,
    "last_edit_date": null,
    "accepted_answer_id": 11778895,
    "last_activity_date": 1779892910
  },
  "api_wrapper": {
    "backoff": null,
    "has_more": true,
    "page_size": 8,
    "quota_max": 300,
    "quota_remaining": 205
  },
  "question_id": 11777820,
  "answer_fetch": {
    "has_more": false,
    "answers_fetched": 2,
    "answer_page_size": 3
  },
  "snapshot_version": "stackoverflow_question_v1"
}
source_raw_snapshot
{
  "id": "45e5e978-b36f-4c78-bf7e-67b0ac65410d",
  "daily_ranking_item_id": "9a1dcff5-c9d9-405d-9a4d-37f0ffb998b5",
  "source": "stackoverflow",
  "external_id": "11777820",
  "fetched_at": "2026-05-27T22:01:45.075Z",
  "question_raw": {
    "body": "<p>My entire database occasionally has entries which are wrong, but instead of altering the data directly I'd like the ability to keep a revision of changes. </p>\n\n<p>These changes occur very rarely.</p>\n\n<p>Ideally something like this: -</p>\n\n<pre><code> (original table fields) | revision_version | origin | user | timestamp\n</code></pre>\n\n<p>So say I had a table called <strong>posts</strong> with the following schema: -</p>\n\n<pre><code>title | description | timestamp | author\n</code></pre>\n\n<p>An additional table called <strong>posts_revisions</strong> would be created thusly: -</p>\n\n<pre><code>title | description | timestamp | author | revision_version | origin | user | timestamp\n</code></pre>\n\n<ul>\n<li><strong>origin</strong> being the source of the change, be it a bot, user generated or what have you.</li>\n</ul>\n\n<p>As you can imagine this is a rather large change to the existing database, my current concern is the performance hit of checking the _revisions tables for every query. Is this best practice for this sort of thing?</p>\n",
    "link": "https://stackoverflow.com/questions/11777820/best-practice-for-revision-control-of-data-in-sql-databases",
    "tags": [
      "mysql",
      "sql",
      "postgresql"
    ],
    "owner": {
      "link": "https://stackoverflow.com/users/1014991/jason",
      "user_id": 1014991,
      "user_type": "registered",
      "account_id": 999864,
      "reputation": 369,
      "accept_rate": 78,
      "display_name": "Jason",
      "profile_image": "https://www.gravatar.com/avatar/ef4e648586c9d0848fe3ab4eef7a18f8?s=256&d=identicon&r=PG"
    },
    "score": 2,
    "title": "Best practice for revision control of data in SQL databases",
    "view_count": 1607,
    "closed_date": 1779895413,
    "is_answered": true,
    "question_id": 11777820,
    "answer_count": 2,
    "closed_reason": "Opinion-based",
    "creation_date": 1343911898,
    "accepted_answer_id": 11778895,
    "last_activity_date": 1779892910
  },
  "answers_raw": [
    {
      "body": "<p>For this type of problem, I keep a current table and a history table.</p>\n<p>The history table has the following additional columns:</p>\n<ul>\n<li>HistoryID</li>\n<li>EffectiveDate</li>\n<li>EndDate</li>\n<li>VersionNumber</li>\n<li>CreatedBy</li>\n<li>CreatedAt</li>\n</ul>\n<p>The effective and end dates are the time span where the values are valid.  The version is just incremented every time there is a change for a record.  The <code>id</code>, <code>CreatedAt</code>, and <code>CreatedBy</code> are columns I put into almost every table in the database.</p>\n<p>Generally, I keep the history table up-to-date with nightly jobs, that compare the tables and then use <code>MERGE</code> to combine the data.  An alternative is to wrap all changes in stored procedures, and to update both tables there.  Another alternative is to use triggers, that detect when a change occurs.  However, I shy away from triggers, preferring the first two alternatives.</p>\n<p>I must admit that disk space is not a big consideration for these tables.  So, there is no problem storing the data twice, once in the results once in the history.  It would be just a minor tweak to store only history in the history table, with the current records in the &quot;current&quot; table.</p>\n<p>One downside to this approach is changing the structure of the base table.  If you want to add a column, you need to add it to the history table as well as the base table.</p>\n",
      "owner": {
        "link": "https://stackoverflow.com/users/1144035/gordon-linoff",
        "user_id": 1144035,
        "user_type": "registered",
        "account_id": 1165580,
        "reputation": 1276912,
        "display_name": "Gordon Linoff",
        "profile_image": "https://www.gravatar.com/avatar/e514b017977ebf742a418cac697d8996?s=256&d=identicon&r=PG"
      },
      "score": 2,
      "answer_id": 11778895,
      "is_accepted": true,
      "question_id": 11777820,
      "creation_date": 1343915548,
      "last_edit_date": 1779892910,
      "content_license": "CC BY-SA 4.0",
      "last_activity_date": 1779892910
    },
    {
      "body": "<p>If the tables are used for summary purposes (especially by business users if they have some SQL access) I think it is best to remove the data and place it into another table. While flags and revisions are sometimes fine, when you have to do something along the lines of <code>select sum(select someVar where revision_version=max(revision_version and someID=ID))</code> then it really gets beyond simple.</p>\n\n<p>If you have a table that is being used for quick and nasty data collection, replace the data and if needed, place the old data into a revisions table. If only some application will access it <em>AND it isn't a performance issue</em> then keep it in the main table.</p>\n",
      "owner": {
        "link": "https://stackoverflow.com/users/1450077/fluffeh",
        "user_id": 1450077,
        "user_type": "registered",
        "account_id": 1559965,
        "reputation": 33587,
        "display_name": "Fluffeh",
        "profile_image": "https://www.gravatar.com/avatar/45daf5e1eea889ba1893440d645aa83a?s=256&d=identicon&r=PG"
      },
      "score": 1,
      "answer_id": 11777928,
      "is_accepted": false,
      "question_id": 11777820,
      "creation_date": 1343912266,
      "content_license": "CC BY-SA 3.0",
      "last_activity_date": 1343912266
    }
  ],
  "tags_raw": [
    "mysql",
    "sql",
    "postgresql"
  ],
  "stats_raw": {
    "score": 2,
    "view_count": 1607,
    "is_answered": true,
    "answer_count": 2,
    "creation_date": 1343911898,
    "last_edit_date": null,
    "accepted_answer_id": 11778895,
    "last_activity_date": 1779892910
  },
  "selection_meta": {
    "site": "stackoverflow",
    "api_wrapper": {
      "backoff": null,
      "has_more": true,
      "page_size": 8,
      "quota_max": 300,
      "quota_remaining": 205
    },
    "answer_fetch": {
      "backoff": null,
      "has_more": false,
      "answers_fetched": 2,
      "quota_remaining": 172,
      "answer_page_size": 3
    },
    "snapshot_version": "stackoverflow_question_v1",
    "selection_strategy": "tag_whitelist_unanswered_high_score_recent_active"
  },
  "created_at": "2026-05-27T22:01:45.559Z",
  "updated_at": "2026-05-27T22:01:45.559Z"
}