Zach Angell

Lead Software Engineer

Engineering

Database Partitioning Best Practices

Partitioning Postgres in Prod Without Downtime

November 01, 2023

Zach Angell

Lead Software Engineer

When the database powering the entirety of Prefect Cloud needed some love, the biggest challenge was making sure customers were not impacted. The road ahead: making structural changes to a 400 million row table that is relied upon for critical customer operations.

When researching how to do this, the team came up short on finding relevant articles. Even GPT-4 came up short. All the existing guides worked in theory, in a vacuum, in development. Not on a table with almost half a billion rows powering an application that takes a company of 60+ people to support and sell.

The reality is: typically you realize you’ve got a database performance issue when you already have large tables backing an important application. Editing anything about the living heart of an application (the database) is hard.

Partitioning is an extreme example of this challenge. Partitioning tables involves rebuilding them from the ground up and you probably realize you need table partitions when you have large tables and a critical highly reliable application.

This post exists to help all others moving forward on how to partition tables in a massive Postgres database without incurring significant downtime in a production environment.

Background on partitioning

If you’ve ever used a web application and a page or chart is slow to load, it’s probably a database performance issue. Today more than ever, efficiently storing and fetching data is the core problem facing many backend application engineers. Database selection and schema design are critical to building successful applications, and that is no different for the engineering team at Prefect.

We discovered a cheat code for Postgres performance that works almost as well as R1, R1, CIRCLE, R2, UP, DOWN, UP, DOWN, UP, DOWN: partitioning.

👉 Definition
Partitioning Postgres means splitting up a large logical table into a bunch of smaller physical tables.

For example, let’s say you have a table full of log entries for the past 30 days, logs_table. A partitioned design for logs_table might involve 30 smaller physical tables that each contain a single day’s worth of log data. As you can imagine, this takes effort to accomplish. And if done improperly, runs the risk of taking down your whole application for some time. It's a necessary step towards scalability and a hard one to do without impacting your customers.

Performance drives the need for database partitioning

While prefect is an open source package, it is elevated in Prefect Cloud, a fully featured workflow orchestration solution for scheduling and observability. Backing Prefect Cloud are several large Postgres databases, while our engineering team spends a lot of time ensuring queries are running efficiently.

Beginning in March, Prefect Cloud saw a significant increase in volume and traffic driven by both new user growth as well as existing teams scaling up — by most measures we were doubling approximately every two months. But there was a problem…

Large database tables can exhibit poor query performance and latency due to indexes that are too large to fit into memory and unreliable table statistics. Partitioning a table essentially breaks the table up into many smaller virtual tables that the query planner can take advantage of, helping to avoid these issues. At the time of our decision, our fastest-growing table had 400 million rows and we expected it to have over 1 billion rows by the time the partitioning work would be completed.

If that large of a table influenced application performance, imagine how long it would take to actually create the partition. And, what it would take to not impact the application while partitioning.

Challenges of database partitioning

The best time to partition a table is when it’s created. That ship had sailed for us, as it does when you think about tables being too large. We needed to partition our existing tables with significant technical constraints.

Minimizing downtime and customer-facing impact. Our customers rely on Prefect Cloud for production critical data workflows. We did not have the luxury of taking our database and thus platform offline for an extended period of time in order to partition existing tables. If we could take tables offline, partitioning a table is a simple matter of creating a copy, backfilling, and swapping out the copy.

Preserving data integrity. During the database partitioning process, all INSERT/UPDATE/DELETEs should give the same guarantees as before. Loss of data or inconsistent data was unacceptable. Changes had to occur in real time. Without this constraint, we would take tables offline or accept some risk of data loss or inconsistency.

Avoiding excessive application code changes. We were making changes to our most important tables, which our codebase interacts with heavily. It was too large a burden to make our entire codebase (and all the engineers working on that codebase) directly aware of the database partitioning process.

Database partitioning strategy with no downtime

The hardest part about partitioning is doing so without any downtime. Partitioning itself isn't complex—many databases have a one or two line command to do so. With large tables, however, this process can take minutes if not hours. Under no circumstances can critical applications be down for that long.

The following steps will allow you to partition a table without any application downtime. Before we dive into details, let’s outline our high level strategy:

Create a new, partitioned copy of the original table in Postgres
Create a VIEW in Postgres that unites the new and old tables with the following logic: INSERTS go in the new table; UPDATES and DELETES go to both tables; SELECTS return the union of both tables
Copy data between the old and new table in Postgres atomically, ensuring a row only exists in one table or the other but not both
After all data is copied over, replace the VIEW with the new partitioned table

This strategy allows us to present data from the original and new partitioned tables to the application seamlessly and preserve data integrity without altering the application code. In particular, we do all the steps that take the most amount of time in the background—and only point our application to the new tables once partitioning is complete.

That's how you achieve partitioned tables with no downtime.

Postgres partitioning: your detailed guide

For this venture, we'll use large_table as a demonstration.

Before diving into any schema changes we recommend:

Auditing application code to ensure database queries do not query across more than one partition. Queries across multiple partitions will perform much worse than the same query on one large table.
Removing all foreign key constrains pointing to the original database table. This may necessitate application code changes for proper foreign key handling. Doing this before moving any data is crucial, otherwise you may lose data when moving rows from the original to partitioned table.
Ensuringplan_cache_mode is set toforce_custom_plan. For more on this, read the story about how we discovered this tip: More Memory, More Problems.

Now onto the Postgres schema changes, which must be performed in a single transaction:

1. First, create the partitioned table new_large_table and all its partitions. Because this table will be actively used, also create all of the large_table indexes on new_large_table.

1
2-- Step 1: Create the new partitioned table
3CREATE TABLE new_large_table
4// TODO - example columns
5PARTITION BY RANGE (workspace_id);
6
7-- Create partitions of new_large_table
8-- Create indexes on new_large_table

2. Next, rename the original table

1-- Step 2: Rename the old table
2ALTER TABLE large_table RENAME TO large_table_orig;

3. Next, create a VIEW that makes the old and new Postgres tables invisible to the application and forwards operations to those two tables.

SELECTs read from both the combined tables. Note that because rows exist strictly in one table or the other, we can use the more efficient UNION ALL instead of UNION.

1-- Step 3a: Create a view that combines the two tables
2CREATE OR REPLACE VIEW large_table AS (
3    SELECT * FROM new_large_table
4    UNION ALL
5    SELECT * FROM large_table_orig
6);

INSERTs go to the new_large_table.

1-- Step 3b: Forward INSERTs to the new table
2CREATE OR REPLACE RULE insert_rule_large_table AS ON INSERT TO large_table
3  DO INSTEAD (
4  INSERT INTO new_large_table (id, workspace_id, data)
5  VALUES (NEW.id, NEW.workspace_id, NEW.data)
6  RETURNING new_large_table.*;
7);

DELETEs and UPDATEs propagate to both tables using a rule and trigger. Importantly, these operations must still return the correct amount of rows affected no matter which table the data is in.

1-- Step 3c: UPDATE and DELETE triggers and rules.
2-- Note: Any operations on new_large_table MUST contain a partition key to prevent performance issues.
3CREATE OR REPLACE FUNCTION large_table_delete_trigger()
4  RETURNS TRIGGER AS $$
5  BEGIN
6      DELETE FROM large_table_orig WHERE id = OLD.id AND workspace_id = OLD.workspace_id;
7      IF NOT FOUND THEN
8          DELETE FROM new_large_table WHERE id = OLD.id AND workspace_id = OLD.workspace_id;
9      END IF;
10      RETURN NEW;
11  END;
12  $$
13  LANGUAGE plpgsql;
14
15CREATE TRIGGER delete_large_table_trigger
16	INSTEAD OF DELETE ON large_table
17	FOR EACH ROW EXECUTE PROCEDURE large_table_delete_trigger();
18
19CREATE OR REPLACE FUNCTION large_table_update_trigger()
20  RETURNS TRIGGER AS $$
21  BEGIN
22      UPDATE large_table_orig SET
23          id = NEW.id,
24          workspace_id = NEW.workspace_id,
25          data = NEW.data
26          WHERE id = OLD.id AND workspace_id = OLD.workspace_id;
27      IF NOT FOUND THEN
28          UPDATE new_large_table SET
29          id = NEW.id,
30          workspace_id = NEW.workspace_id,
31          data = NEW.data
32          WHERE id = OLD.id AND workspace_id = OLD.workspace_id;
33      END IF;
34      RETURN NEW;
35  END;
36  $$
37  LANGUAGE plpgsql;
38
39CREATE TRIGGER update_large_table_trigger
40  INSTEAD OF UPDATE ON large_table
41  FOR EACH ROW EXECUTE PROCEDURE large_table_update_trigger();

The application remains none the wiser, with the VIEW handling all data access.

Data migration and cleanup

Move data in batches with this SQL code run on a loop until the old Postgres table is emptied:

1BEGIN;
2WITH batch AS (
3	DELETE from large_table_orig
4	WHERE id IN (
5		SELECT id FROM large_table_orig
6		LIMIT 1000
7	)
8	RETURNING *
9)
10INSERT INTO new_large_table SELECT * from batch;
11COMMIT;

Finally, delete the view and rename the new table.

1DROP VIEW IF EXISTS large_table;
2ALTER TABLE new_large_table RENAME TO large_table;
3DROP TABLE IF EXISTS large_table_orig;

And that, ladies and gentlemen, is how you partition a Postgres database table without anyone noticing, including, quite possibly, the table itself. In theory, at least.

Level setting expectations

In practice, admittedly, the process is not so clean.

Performing the above for 4 large database tables took 2-3 months including planning, testing, and implementation. In theory this process leads to no downtime. Despite our best efforts to avoid customer disruption, we had a few days of degraded service and brief periods of outages.

Throughout the process, we communicated actively with our customers and the rest of the organization to convey both the future benefits of the work and any ongoing issues. This is for the longevity of Prefect Cloud, which continues to grow and release new features.

Happy partitioning!

Prefect makes complex workflows simpler, not harder. Try Prefect Cloud for free for yourself, download our open source package, join our Slack community, or talk to one of our engineers to learn more.