Mastering PostgreSQL Table Partitioning

Mastering PostgreSQL Table Partitioning

Featured on Hashnode

Table partitioning is a highly effective technique used to improve the performance of very large database tables. By dividing the table's content into smaller sub-tables, known as partitions, the overall size of the table is reduced, leading to significant performance enhancements.

When should you partition a table?

  • If your table is too big for your server's RAM.
    You should consider partitioning it: when a table reaches a few gigabytes in size, it's time to split it up.

  • If you're working with massive amounts of data
    Don't bother partitioning until you've got millions of records. Otherwise, you won't see much of a performance boost.

  • If your table can be logically broken down into smaller chunks, Example: you have a table full of server logs. You could split them up by date, so all the logs from the same day are in one single partition. This makes it much easier to do tasks like deleting old logs by just dropping the partition.

Available partitioning types

Postgres has built-in support for three types of partitioning covering the most common use cases.

Partition by Range

The table is partitioned into “ranges” defined by a key column or a set of columns, with no overlap between the ranges of values assigned to different partitions

In the following example, the people table will be partitioned by birth_date

CREATE TABLE people (
    id int not null,
    birth_date date not null,
    country_code character(2) not null,
    name text
) PARTITION BY RANGE (birth_date);

CREATE TABLE people_y2000 PARTITION OF people
    FOR VALUES FROM ('2000-01-01') TO ('2001-01-01');

CREATE TABLE people_y2001 PARTITION OF people
    FOR VALUES FROM ('2001-01-01') TO ('2002-01-01');

CREATE TABLE people_y2002 PARTITION OF people
    FOR VALUES FROM ('2002-01-01') TO ('2003-01-01');

Let’s try it:

INSERT INTO people (id, birth_date, country_code, name) VALUES
   (1, '2000-01-01', 'US', 'John'),
   (2, '2000-02-02', 'IT', 'Jane'),
   (3, '2001-03-03', 'FR', 'Bob');
> INSERT 0 3

SELECT schemaname,relname,n_live_tup 
   FROM pg_stat_user_tables 
   ORDER BY n_live_tup DESC;

schemaname  |   relname    | n_live_tup 
------------+--------------+------------
 public     | people_y2000 |          2
 public     | people_y2001 |          1
 public     | people_y2002 |          0

As you can see we inserted three records into the master table, people. Since the table is partitioned by birth_date, two records have been added to partition people_y2000, one into people_y2001 while people_y2002 is still empty.

Partition by List

The table is partitioned by explicitly listing which key values appear in each partition.

Taking the same example, let’s add a country_code column and use it as the partitioning key

CREATE TABLE people (
    id int not null,
    birth_date date not null,
    country_code character(2) not null,
    name text
) PARTITION BY LIST (country_code);

-- Partition for people living in Europe
CREATE TABLE people_EU PARTITION OF people
    FOR VALUES IN ('AT', 'DE', 'IT', 'FR', 'ES', ..... );

-- Partition for people living in United States
CREATE TABLE people_US PARTITION OF people
    FOR VALUES IN ('US');

Let’s try it:

INSERT INTO people (id, birth_date, country_code, name) VALUES
   (1, '2000-01-01', 'US', 'John'),
   (2, '2000-02-02', 'IT', 'Jane'),
   (3, '2001-03-03', 'FR', 'Bob');
> INSERT 0 3

SELECT schemaname,relname,n_live_tup 
   FROM pg_stat_user_tables 
   ORDER BY n_live_tup DESC;

 schemaname |  relname  | n_live_tup 
------------+-----------+------------
 public     | people_eu |          2
 public     | people_us |          1

Again, PostgreSQL moved every row to the correct partition.

Partition by Hash

The table is partitioned by specifying a modulus and a remainder for each partition. Each partition will hold the rows for which the hash value of the partition key divided by the specified modulus will produce the specified remainder.

This type is useful when we can’t logically divide our data, but we can only reduce the table size by spreading rows into many smaller partitions.

The following SQL will divide people into three tables, every table will contain (almost) the same number of rows.

CREATE TABLE people (
    id int not null,
    birth_date date not null,
    country_code character(2) not null,
    name text
) PARTITION BY HASH (id);

CREATE TABLE people_1 PARTITION OF people
    FOR VALUES WITH (MODULUS 3, REMAINDER 0);

CREATE TABLE people_2 PARTITION OF people
    FOR VALUES WITH (MODULUS 3, REMAINDER 1);

CREATE TABLE people_3 PARTITION OF people
    FOR VALUES WITH (MODULUS 3, REMAINDER 2);

Let’s try it:

INSERT INTO people (id, birth_date, country_code, name) VALUES
   (1, '2000-01-01', 'US', 'John'),
   (2, '2000-02-02', 'IT', 'Jane'),
   (3, '2001-03-03', 'FR', 'Bob');
> INSERT 0 3

SELECT schemaname,relname,n_live_tup 
   FROM pg_stat_user_tables 
   ORDER BY n_live_tup DESC;

 schemaname | relname  | n_live_tup 
------------+----------+------------
 public     | people_1 |          1
 public     | people_2 |          1
 public     | people_3 |          1

As you can see, the three records have been evenly split across all the partitions available.

Default partition

What happens when you try to insert a record that can’t fit into any partition?

Let’s go back to the people table defined in the list partitioning chapter and try to add Linda, from Canada:

INSERT INTO people (id, birth_date, country_code, name) VALUES
   (4, '2002-04-04', 'CA', 'Linda');

ERROR:  no partition of relation "people" found for rowDETAILS: Partition key of the failing row contains (country_code) = (CA).

The INSERT will fail because PostgreSQL doesn’t know where to add that record.

The most obvious solution would be to add a new partition, but if we have to do it for every country in the world, we would end up with hundreds of tables with a very small number of records. Not really nice.

Luckily, it’s possible to define a DEFAULT partition!

CREATE TABLE people_default PARTITION OF people DEFAULT;

Trying the same inserts again, it will result in:

INSERT INTO people (id, birth_date, country_code, name) VALUES
   (1, '2000-01-01', 'US', 'John'),
   (2, '2000-02-02', 'IT', 'Jane'),
   (3, '2001-03-03', 'FR', 'Bob'),
   (4, '2002-04-04', 'CA', 'Linda');
> INSERT 0 4

schemaname |    relname     | n_live_tup 
------------+----------------+------------
 public     | people_eu      |          2
 public     | people_us      |          1
 public     | people_default |          1

As you can see, Linda has now been added to people_default.

Sub Partitioning

A single partition can also be a partitioned table!

Back to the LIST example, we can imagine that people_EU will contain a lot of records, so we may want to subdivide it by hash:

CREATE TABLE people (
    id int not null,
    birth_date date not null,
    country_code character(2) not null,
    name text
) PARTITION BY LIST (country_code);

CREATE TABLE people_US PARTITION OF people
    FOR VALUES IN ('US');

CREATE TABLE people_EU PARTITION OF people
    FOR VALUES IN ('AT', 'DE', 'IT', 'FR', 'ES', ..... )
    PARTITION BY HASH (id);

CREATE TABLE people_EU_1 PARTITION OF people_EU
    FOR VALUES WITH (MODULUS 3, REMAINDER 0);

CREATE TABLE people_EU_2 PARTITION OF people_EU
    FOR VALUES WITH (MODULUS 3, REMAINDER 1);

CREATE TABLE people_EU_3 PARTITION OF people_EU
    FOR VALUES WITH (MODULUS 3, REMAINDER 2);

This will result in:

INSERT INTO people (id, birth_date, country_code, name) VALUES
   (1, '2000-01-01', 'US', 'John'),
   (2, '2000-02-02', 'IT', 'Jane'),
   (3, '2001-03-03', 'FR', 'Bob');
> INSERT 0 3

SELECT schemaname,relname,n_live_tup 
   FROM pg_stat_user_tables 
   ORDER BY n_live_tup DESC;

schemaname |   relname   | n_live_tup 
------------+-------------+------------
 public     | people_eu_2 |          1
 public     | people_eu_1 |          1
 public     | people_us   |          1
 public     | people_eu_3 |          0

Partition operations

Attaching and detaching partitions

As we have seen, single partitions can be created and dropped whenever we want, but what if we want to exclude some records from the master table without deleting them?

The answer is: DETACH

ALTER TABLE people DETACH PARTITION people_us;

A detached partition will act as a normal table, so it will be possible to insert records that would violate the partition constraints.

The reverse operation, ATTACH, is as easy as:

ALTER TABLE people ATTACH PARTITION people_us FOR VALUES IN ('US');

Indexing

Too bad, PostgreSQL doesn’t allow to create a single index covering every partition of the table, but you have to create an index for every partition.

The bad news about this is that the primary key, or any other unique index, must include the columns used on the partition by statement.

-- THIS WON'T WORK

CREATE UNIQUE INDEX idx_uniq ON people (id);
> ERROR:  insufficient columns in UNIQUE constraint definition
> DETAILS: UNIQUE constraint on table "people" lacks column "country_code" which is part of the partition key.


-- THIS WORKS!

CREATE UNIQUE INDEX idx_uniq ON people (id, country_code);
> CREATE INDEX

The reason behind this is the fact that every partition is treated as an independent table, so adding the partition key to the index is the only way to grant the uniqueness of a record across the whole table.

Note that creating an index on the master table will automatically replicate it to every attached partition:

CREATE UNIQUE INDEX idx_uniq ON people (id, country_code);


-- Check created indexes

SELECT tablename, indexname FROM pg_indexes
WHERE schemaname = 'public' ORDER BY tablename, indexname;

tablename      | indexname
---------------+-------------------------------------
people         | idx_uniq
people_default | people_default_id_country_code_idx
people_eu      | people_eu_id_country_code_idx

And that's all you need to know!
Follow me if you liked this article or leave a comment.

Did you find this article valuable?

Support Matteo Crosta by becoming a sponsor. Any amount is appreciated!