PostgreSQL cume_dist() Function
The PostgreSQL cume_dist()
function returns the cumulative distribution of the current row, that is, the ratio of the number of rows from the first row to the last row with the same value as the current row in the total number of rows in the partition.
The PostgreSQL cume_dist()
function is often used to display the highest or lowest percentage of records in a recordset. For example, the best 5% of students in this exam, etc.
cume_dist()
Syntax
Here is the syntax of the PostgreSQL cume_dist()
function:
cume_dist()
OVER (
[PARTITION BY partition_column_list]
[ORDER BY order_column_list]
)
Parameters
partition_column_list
-
List of columns for partitioning.
partition_column_list
-
List of columns for sorting.
Return value
The PostgreSQL cume_dist()
function returns a value greater than 0 and less than or equal to 1, which is the cumulative distribution of the current row. Its calculation formula is:
(the number of rows from the first row to the last row with the same value) / the total number of rows in the partition
cume_dist()
Examples
Preparing Data
Use the following CREATE TABLE
statement to create a table named student_grade
to store grades of students:
CREATE TABLE student_grade (
id SERIAL PRIMARY KEY,
name VARCHAR(50) NOT NULL,
class CHAR(1) NOT NULL,
subject VARCHAR(20) NOT NULL,
grade INT NOT NULL
);
This student_grade
table has 5 columns as following:
id
- The row ID, primary key.name
- The name of a student.class
- The class a student is in.subject
- The name of a subject.grade
- The grades for a subject and a student.
Insert some rows into the student_grade
table using the following INSERT
statement:
INSERT INTO student_grade
(name, class, subject, grade)
VALUES
('Tim', 'A', 'Math', 9),
('Tom', 'A', 'Math', 7),
('Jim', 'A', 'Math', 8),
('Tim', 'A', 'English', 7),
('Tom', 'A', 'English', 8),
('Jim', 'A', 'English', 7),
('Lucy', 'B', 'Math', 8),
('Jody', 'B', 'Math', 6),
('Susy', 'B', 'Math', 9),
('Lucy', 'B', 'English', 6),
('Jody', 'B', 'English', 7),
('Susy', 'B', 'English', 8);
Use the following SELECT
statement to show all rows of this table:
SELECT * FROM student_grade;
id | name | class | subject | grade
----+------+-------+---------+-------
1 | Tim | A | Math | 9
2 | Tom | A | Math | 7
3 | Jim | A | Math | 8
4 | Tim | A | English | 7
5 | Tom | A | English | 8
6 | Jim | A | English | 7
7 | Lucy | B | Math | 8
8 | Jody | B | Math | 6
9 | Susy | B | Math | 9
10 | Lucy | B | English | 6
11 | Jody | B | English | 7
12 | Susy | B | English | 8
(12 rows)
Example 1
The following statement uses a PostgreSQL cume_dist()
function to divide all rows into 2 partitions and calculate the cumulative distribution of each student’s grades:
SELECT *,
cume_dist() OVER (
PARTITION BY subject
ORDER BY grade
)
FROM student_grade;
id | name | class | subject | grade | cume_dist
----+------+-------+---------+-------+---------------------
10 | Lucy | B | English | 6 | 0.16666666666666666
11 | Jody | B | English | 7 | 0.6666666666666666
6 | Jim | A | English | 7 | 0.6666666666666666
4 | Tim | A | English | 7 | 0.6666666666666666
5 | Tom | A | English | 8 | 1
12 | Susy | B | English | 8 | 1
8 | Jody | B | Math | 6 | 0.16666666666666666
2 | Tom | A | Math | 7 | 0.3333333333333333
7 | Lucy | B | Math | 8 | 0.6666666666666666
3 | Jim | A | Math | 8 | 0.6666666666666666
9 | Susy | B | Math | 9 | 1
1 | Tim | A | Math | 9 | 1
(8 rows)
Note that the window function in the SQL statement above:
cume_dist() OVER (
PARTITION BY subject
ORDER BY grade
)
In the OVER
clause,
- The
PARTITION BY subject
partitions all rows by subject - The
ORDER BY grade
sorts all rows within each partition in ascending order by grade - The
cume_dist()
calculate the cumulative distribution for each row.
Note that within each subject, rows with the same grade have the same cumulative distribution.