How the CUME_DIST() function works in Mariadb?
The MariaDB CUME_DIST()
function is used to calculate the cumulative distribution of a value within a group of values.
The MariaDB CUME_DIST()
function is used to calculate the cumulative distribution of a value within a group of values. It returns the relative position of a specified value in a group of values, ranging from 0 to 1.
Syntax
The syntax for the MariaDB CUME_DIST()
function is as follows:
CUME_DIST() OVER (
PARTITION BY expression1
ORDER BY expression2 [ASC | DESC]
)
PARTITION BY expression1
(optional): This clause is used to divide the result set into partitions or groups based on the specified expression.ORDER BY expression2 [ASC | DESC]
: This clause is required and is used to sort the values within each partition based on the specified expression. The sort order can be either ascending (ASC
) or descending (DESC
).
The CUME_DIST()
function returns a value between 0 and 1, representing the cumulative distribution of the current row’s value within its partition.
Examples
Example 1: Basic Usage
This example demonstrates the basic usage of the CUME_DIST()
function.
DROP TABLE IF EXISTS employees;
CREATE TABLE employees (
id INT PRIMARY KEY,
name VARCHAR(50),
salary INT
);
INSERT INTO employees (id, name, salary) VALUES
(1, 'John', 5000),
(2, 'Jane', 6000),
(3, 'Bob', 4000),
(4, 'Alice', 6000),
(5, 'Mike', 7000);
SELECT
name,
salary,
CUME_DIST() OVER (ORDER BY salary) AS cume_dist
FROM employees;
The output will be:
+-------+--------+--------------+
| name | salary | cume_dist |
+-------+--------+--------------+
| Bob | 4000 | 0.2000000000 |
| John | 5000 | 0.4000000000 |
| Alice | 6000 | 0.8000000000 |
| Jane | 6000 | 0.8000000000 |
| Mike | 7000 | 1.0000000000 |
+-------+--------+--------------+
This example shows the cumulative distribution of salaries for each employee. The CUME_DIST()
function calculates the relative position of each salary value within the overall sorted set of salaries.
Example 2: Partition by Department
This example demonstrates how to use the PARTITION BY
clause to partition the data by department.
DROP TABLE IF EXISTS employees;
CREATE TABLE employees (
id INT PRIMARY KEY,
name VARCHAR(50),
salary INT,
dept VARCHAR(50)
);
INSERT INTO employees (id, name, salary, dept) VALUES
(1, 'John', 5000, 'Sales'),
(2, 'Jane', 6000, 'Marketing'),
(3, 'Bob', 4000, 'Sales'),
(4, 'Alice', 6000, 'Marketing'),
(5, 'Mike', 7000, 'Sales');
SELECT
name,
dept,
salary,
CUME_DIST() OVER (PARTITION BY dept ORDER BY salary) AS cume_dist
FROM employees;
The output will be:
+-------+-----------+--------+--------------+
| name | dept | salary | cume_dist |
+-------+-----------+--------+--------------+
| Jane | Marketing | 6000 | 1.0000000000 |
| Alice | Marketing | 6000 | 1.0000000000 |
| Bob | Sales | 4000 | 0.3333333333 |
| John | Sales | 5000 | 0.6666666667 |
| Mike | Sales | 7000 | 1.0000000000 |
+-------+-----------+--------+--------------+
In this example, the CUME_DIST()
function calculates the cumulative distribution within each department, partitioning the data by the dept
column and sorting by the salary
column.
Example 3: Descending Order
This example demonstrates how to use the DESC
keyword to sort the values in descending order.
SELECT
name,
salary,
CUME_DIST() OVER (ORDER BY salary DESC) AS cume_dist
FROM employees;
The output will be:
+-------+--------+--------------+
| name | salary | cume_dist |
+-------+--------+--------------+
| Mike | 7000 | 0.2000000000 |
| Alice | 6000 | 0.6000000000 |
| Jane | 6000 | 0.6000000000 |
| John | 5000 | 0.8000000000 |
| Bob | 4000 | 1.0000000000 |
+-------+--------+--------------+
In this example, the CUME_DIST()
function calculates the cumulative distribution based on the descending order of salaries.
Example 4: Tie Values
This example shows how the CUME_DIST()
function handles tie values.
DROP TABLE IF EXISTS students;
CREATE TABLE students (
id INT PRIMARY KEY,
name VARCHAR(50),
score INT
);
INSERT INTO students (id, name, score) VALUES
(1, 'John', 80),
(2, 'Jane', 90),
(3, 'Bob', 85),
(4, 'Alice', 90),
(5, 'Mike', 80);
SELECT
name,
score,
CUME_DIST() OVER (ORDER BY score) AS cume_dist
FROM students;
The output will be:
+-------+-------+--------------+
| name | score | cume_dist |
+-------+-------+--------------+
| John | 80 | 0.4000000000 |
| Mike | 80 | 0.4000000000 |
| Bob | 85 | 0.6000000000 |
| Alice | 90 | 1.0000000000 |
| Jane | 90 | 1.0000000000 |
+-------+-------+--------------+
In this example, the CUME_DIST()
function assigns the same cumulative distribution value to rows with tie values (in this case, ‘John’ and ‘Mike’ have the same score of 80).
Example 5: Multiple Partitions and Order By
This example demonstrates how to use multiple partitions and order by clauses.
DROP TABLE IF EXISTS sales;
CREATE TABLE sales (
id INT PRIMARY KEY,
product VARCHAR(50),
region VARCHAR(50),
revenue INT
);
INSERT INTO sales (id, product, region, revenue) VALUES
(1, 'Product A', 'North', 1000),
(2, 'Product A', 'South', 2000),
(3, 'Product B', 'North', 1500),
(4, 'Product B', 'South', 2500),
(5, 'Product A', 'North', 1200);
SELECT
product,
region,
revenue,
CUME_DIST() OVER (PARTITION BY product, region ORDER BY revenue) AS cume_dist
FROM sales;
The output will be:
+-----------+--------+---------+--------------+
| product | region | revenue | cume_dist |
+-----------+--------+---------+--------------+
| Product A | North | 1000 | 0.5000000000 |
| Product A | North | 1200 | 1.0000000000 |
| Product A | South | 2000 | 1.0000000000 |
| Product B | North | 1500 | 1.0000000000 |
| Product B | South | 2500 | 1.0000000000 |
+-----------+--------+---------+--------------+
In this example, the CUME_DIST()
function calculates the cumulative distribution within each product and region partition, sorted by the revenue column.
Related Functions
The following are a few functions related to the MariaDB CUME_DIST()
function:
- MariaDB
RANK()
function is used to assign a rank to each row within a partition, with ties receiving the same rank. - MariaDB
DENSE_RANK()
function is used to assign a rank to each row within a partition, with ties receiving consecutive ranks. - MariaDB
PERCENT_RANK()
function is used to calculate the relative rank of a value within a group of values, ranging from 0 to 1.
Conclusion
The MariaDB CUME_DIST()
function is a powerful tool for calculating the cumulative distribution of values within groups or partitions. It can be used in various scenarios, such as analyzing sales data, academic performance, or any other data that requires ranking or distribution analysis. By understanding how to use the CUME_DIST()
function effectively, you can gain valuable insights into your data and make informed decisions based on the relative positions of values within their respective groups.