PostgreSQL GROUP BY - Grouping and Aggregate
This article describes how to use the GROUP BY
clause in a PostgreSQL SELECT
statement to group rows according to a specified column or expression.
In PostgreSQL, the GROUP BY
clause is used to group rows according to a specified column or expression.
Sometimes, we need to aggregate the result set according to a certain dimension. This is often used in statistics, consider the following scenarios:
- Average grades by class.
- Aggregate total scores by student.
- Sales by year or month.
- Count the number of users by country or region.
PostgreSQL GROUP BY
syntax
The GROUP BY
clause is an optional clause of a SELECT
statement. To group rows in a SELECT
statement, use the GROUP BY
clause with the following syntax:
SELECT column1[, column2, ...], aggregate_function(ci)
FROM table
[WHERE clause]
GROUP BY column1[, column2, ...];
[HAVING clause]
Explanation:
-
The
column1[, column2, ...]
is the colums to group by, at least one column, but multiple columns. -
The
aggregate_function(ci)
is an aggregate function, used to aggregate. This is optional. You can use one or more of the following aggregate functions:sum()
: Calculate the sum of the specified values within the groupavg()
: Calculate the mean of the specified values within the groupmax()
: Calculate the maximum value of the specified values in the groupMIN()
: Calculate the minimum value of the specified values in the groupcount()
: Count the number of the specified values in a group
-
The columns of
SELECT
must be the columns in the grouping clause. -
The
WHERE
clause is optional and is used to filter rows before grouping. -
The
HAVING
clause is optional and is used to filter groups.
PostgreSQL GROUP BY
Examples
We will use the tables in the Sakila sample database for demonstration, please install the Sakila sample database in PostgreSQL first.
Simple Example for GROUP BY
The following statement uses the GROUP BY
clause to find a list of last names from the actor
table.
SELECT last_name
FROM actor
GROUP BY last_name;
last_name
--------------
AKROYD
BRIDGES
HUNT
GIBSON
ALLEN
SUVARI
HESTON
MONROE
...
WILSON
(121 rows)
In this example, the GROUP BY
clause groups all rows by the last_name
column.
The output of this example is exactly the same as the following statement with DISTINCT
:
SELECT DISTINCT last_name FROM actor;
Examples with GROUP BY
and Aggregate Functions
The following statement uses the GROUP BY
clause and aggregate function count()
to find the list of last names and the number of each last name from the actor
table.
SELECT last_name, count(*)
FROM actor
GROUP BY last_name
ORDER BY count(*) DESC;
last_name | count
--------------+-------
KILMER | 5
TEMPLE | 4
NOLTE | 4
WILLIAMS | 3
PECK | 3
HOPKINS | 3
DAVIS | 3
HARRIS | 3
DEGENERES | 3
...
CLOSE | 1
(121 rows)
In this example, the steps of execution is as follows:
- First, the
GROUP BY
clause groups all rows in theactor
table by last name. - Then, use the aggregate function
count(*)
to aggregate the number of rows for each last name. - Finally, use the
ORDER BY
clause to sort in descending order bycount(*)
, and the surnames with the most numbers come first.
Likewise, if you want to find the number of films for each film rating from the film
table, use the following statement:
SELECT rating, count(*)
FROM film
GROUP BY rating
ORDER BY count(*) DESC;
rating | count
--------+-------
PG-13 | 223
NC-17 | 210
R | 195
PG | 194
G | 178
(5 rows)
Examples with GROUP BY
, LIMIT
, Aggregate Functions
The following example uses the GROUP BY
clause, LIMIT
clause, and the sum()
function to find the top 10 customers from the payment
table.
SELECT customer_id, sum(amount) total
FROM payment
GROUP BY customer_id
ORDER BY total DESC
LIMIT 10;
customer_id | total
-------------+--------
526 | 221.55
148 | 216.54
144 | 195.58
178 | 194.61
137 | 194.61
459 | 186.62
469 | 177.60
468 | 175.61
236 | 175.58
181 | 174.66
(10 rows)
In this example, the steps of execution is as follows:
- First, the
GROUP BY
clause groups all rows by thecustomer_id
column. - Then, use the aggregate function
sum(amount)
to sum theamount
columns of all rows in each group and usetotal
as the column alias . - Then, use the
ORDER BY
clause to sort groups in descending order bytotal
. - Finally, use the
LIMIT 10
clause to return the first 10 record rows.
Examples with GROUP BY
and HAVING
The following example uses GROUP BY
clause, HAVING
clause, and the sum()
function to find customers with a total spend of more than $180 from the payment
table.
SELECT customer_id, sum(amount) total
FROM payment
GROUP BY customer_id
HAVING sum(amount) > 180
ORDER BY total DESC;
customer_id | total
-------------+--------
526 | 221.55
148 | 216.54
144 | 195.58
178 | 194.61
137 | 194.61
459 | 186.62
(6 rows)
In this example, the steps of execution is as follows:
- First, the
GROUP BY
clause group all rows by thecustomer_id
column. - Then, use the aggregate function
sum(amount)
to sum theamount
columns of all rows in each group and usetotal
as the column alias. - Then, use the
HAVING
clause to filter the groups that meetssum(amount)
is greater than180
. - Finally, use the
ORDER BY
clause to sort groups in descending order.
The HAVING
clause is used to filter the grouped data and it requires a logical expression as a condition, where the column name or expression in the logical expression can only be the column, expression, or aggregate function applied in the group clause.
Conclusion
In this article, we introduced the use of the GROUP BY
clause group rows by specified columns or expressions.