PostgreSQL Multicolumn Indexes
This article describes how to create multicolumn indexes in PostgreSQL.
As the name suggests, a multicolumn index is an index defined on multiple columns of a table. Multi-column index, also known as compound index or composite index.
In PostgreSQL, only the B-tree, GIST, GIN, and BRIN index types support multicolumn indexes. A multicolumn index supports up to 32 columns.
PostgreSQL Multicolumn Index Rules
When defining a multi-column index, you should place frequently used columns in the WHERE
clause at the beginning of the column list, and less frequently used columns in subsequent conditions. Otherwise, the PostgreSQL optimizer might not use the index.
For example, the following statement defines an index on the a
, b
and c
columns:
CREATE INDEX index_name
ON table_name(a, b, c);
In the above syntax, the PostgreSQL optimizer considers the use of an index in the following cases:
WHERE a = v1 and b = v2 and c = v3;
or
WHERE a = v1 and b = v2;
or
WHERE a = v1;
However, indexes are not considered in the following cases:
WHERE c = v3;
or
WHERE b = v2 and c = v3;
PostgreSQL multicolumn index Examples
Let’s demonstrate a PostgreSQL multicolumn index on the customer
table in the Sakila sample database.
View information about the customer
table:
\d customer
Table "public.customer"
Column | Type | Collation | Nullable | Default
-------------+-----------------------------+-----------+----------+-----------------------------------------------
customer_id | integer | | not null | nextval('customer_customer_id_seq'::regclass)
store_id | smallint | | not null |
first_name | character varying(45) | | not null |
last_name | character varying(45) | | not null |
email | character varying(50) | | |
address_id | smallint | | not null |
activebool | boolean | | not null | true
create_date | date | | not null | 'now'::text::date
last_update | timestamp without time zone | | | now()
active | integer | | |
Indexes:
"customer_pkey" PRIMARY KEY, btree (customer_id)
"idx_fk_address_id" btree (address_id)
"idx_fk_store_id" btree (store_id)
"idx_last_name" btree (last_name)
...
There are already two indexes in the customer
table, in order to demonstrate the multi-column index, let’s drop the index named idx_last_name
first:
DROP INDEX idx_last_name;
Create an index on the last_name
and first_name
columns using the following statement :
CREATE INDEX ON customer (last_name, first_name);
View the customer
table:
\d customer
Table "public.customer"
Column | Type | Collation | Nullable | Default
-------------+-----------------------------+-----------+----------+-----------------------------------------------
customer_id | integer | | not null | nextval('customer_customer_id_seq'::regclass)
store_id | smallint | | not null |
first_name | character varying(45) | | not null |
last_name | character varying(45) | | not null |
email | character varying(50) | | |
address_id | smallint | | not null |
activebool | boolean | | not null | true
create_date | date | | not null | 'now'::text::date
last_update | timestamp without time zone | | | now()
active | integer | | |
Indexes:
"customer_pkey" PRIMARY KEY, btree (customer_id)
"customer_last_name_first_name_idx" btree (last_name, first_name)
"idx_fk_address_id" btree (address_id)
"idx_fk_store_id" btree (store_id)
We find that customer_last_name_first_name_idx
indexe is defined on last_name
and first_name
columns.
Use the EXPLAIN
statement to see if the following statement uses an index:
EXPLAIN
SELECT * FROM customer
WHERE last_name = 'A'
AND first_name = 'B';
QUERY PLAN
---------------------------------------------------------------------------------------------------
Index Scan using customer_last_name_first_name_idx on customer (cost=0.28..8.29 rows=1 width=70)
Index Cond: (((last_name)::text = 'A'::text) AND ((first_name)::text = 'B'::text))
(2 rows)
Here, the PostgreSQL optimizer chooses to use the index when both last_name
and first_name
in the WHERE
condition .
Use the EXPLAIN
statement to see if the following statement uses an index:
EXPLAIN
SELECT * FROM customer
WHERE last_name = 'A';
QUERY PLAN
---------------------------------------------------------------------------------------------------
Index Scan using customer_last_name_first_name_idx on customer (cost=0.28..8.29 rows=1 width=70)
Index Cond: ((last_name)::text = 'A'::text)
(2 rows)
Here, the PostgreSQL optimizer chooses to use the index when only last_name
in the WHERE
condition. This is because last_name
column is the first column in the index.
Use the EXPLAIN
statement to see if the following statement uses an index:
EXPLAIN
SELECT * FROM customer
WHERE first_name = 'B';
QUERY PLAN
----------------------------------------------------------
Seq Scan on customer (cost=0.00..15.49 rows=1 width=70)
Filter: ((first_name)::text = 'B'::text)
(2 rows)
Here, the PostgreSQL optimizer does not choose to use the index when only first_name
in the WHERE
condition. This is because first_name
column is not the first column of the index.
Conclusion
This article discusses PostgreSQL multicolumn indexes and how the order of multicolumn indexes affects the PostgreSQL optimizer.
When you define a multicolumn index, you should always consider the business context to determine which columns are frequently used for lookups, and place those columns at the beginning of the column list when defining the index.