How the NTILE() function works in Mariadb?

The NTILE() function is a window function that divides a result set into a specified number of groups of approximately equal size.

Posted on

The NTILE() function is a window function that divides a result set into a specified number of groups of approximately equal size. It assigns each row in the result set a group number, starting from 1. The group number indicates which group the row belongs to.

Syntax

The syntax of the NTILE() function is as follows:

NTILE(number_of_groups) OVER (
  [PARTITION BY partition_expression]
  [ORDER BY sort_expression [ASC | DESC]]
)

The NTILE() function accepts one argument:

  • number_of_groups: The number of groups to divide the result set into. It must be a positive integer value.

The NTILE() function requires an OVER clause that defines the window specification. The window specification can include the following optional clauses:

  • PARTITION BY: This clause divides the result set into partitions based on the values of the partition_expression. The NTILE() function is applied to each partition separately. If this clause is omitted, the whole result set is treated as a single partition.
  • ORDER BY: This clause specifies the order of the rows within each partition. The NTILE() function assigns group numbers based on this order. The sort_expression can be a column name, an expression, or a combination of both. The optional ASC or DESC keywords specify the ascending or descending order, respectively. The default order is ascending.

Examples

Let’s look at some examples of using the NTILE() function in Mariadb.

Example 1: Simple usage

Suppose we have a table called employees that stores the information of the employees in a company. The table has the following columns:

  • id: The employee ID
  • name: The employee name
  • salary: The employee salary

The table contains the following data:

SELECT * FROM employees;
+----+---------+--------+
| id | name    | salary |
+----+---------+--------+
|  1 | Alice   |  5000  |
|  2 | Bob     |  6000  |
|  3 | Charlie |  7000  |
|  4 | David   |  8000  |
|  5 | Eve     |  9000  |
|  6 | Frank   | 10000  |
|  7 | Grace   | 11000  |
|  8 | Harry   | 12000  |
|  9 | Irene   | 13000  |
| 10 | Jack    | 14000  |
+----+---------+--------+

We want to divide the employees into four groups based on their salaries, and assign each employee a group number. We can use the NTILE() function as follows:

SELECT id, name, salary, NTILE(4) OVER (ORDER BY salary) AS group_number
FROM employees;
+----+---------+--------+--------------+
| id | name    | salary | group_number |
+----+---------+--------+--------------+
|  1 | Alice   |  5000  |            1 |
|  2 | Bob     |  6000  |            1 |
|  3 | Charlie |  7000  |            1 |
|  4 | David   |  8000  |            2 |
|  5 | Eve     |  9000  |            2 |
|  6 | Frank   | 10000  |            2 |
|  7 | Grace   | 11000  |            3 |
|  8 | Harry   | 12000  |            3 |
|  9 | Irene   | 13000  |            3 |
| 10 | Jack    | 14000  |            4 |
+----+---------+--------+--------------+

The result shows that the NTILE() function divides the employees into four groups of equal size (three rows per group), and assigns each employee a group number based on their salary order. The lowest salary group has the group number 1, and the highest salary group has the group number 4.

Example 2: Using PARTITION BY clause

Suppose we have another table called sales that stores the sales data of the products in a company. The table has the following columns:

  • id: The sale ID
  • product: The product name
  • quantity: The quantity sold
  • price: The unit price
  • date: The sale date

The table contains the following data:

SELECT * FROM sales;
+----+---------+----------+-------+------------+
| id | product | quantity | price | date       |
+----+---------+----------+-------+------------+
|  1 | A       |       10 |    50 | 2024-01-01 |
|  2 | B       |       20 |    40 | 2024-01-02 |
|  3 | C       |       30 |    30 | 2024-01-03 |
|  4 | A       |       15 |    50 | 2024-01-04 |
|  5 | B       |       25 |    40 | 2024-01-05 |
|  6 | C       |       35 |    30 | 2024-01-06 |
|  7 | A       |       20 |    50 | 2024-01-07 |
|  8 | B       |       30 |    40 | 2024-01-08 |
|  9 | C       |       40 |    30 | 2024-01-09 |
| 10 | A       |       25 |    50 | 2024-01-10 |
+----+---------+----------+-------+------------+

We want to divide the sales of each product into two groups based on the quantity sold, and assign each sale a group number. We can use the NTILE() function with the PARTITION BY clause as follows:

SELECT id, product, quantity, price, date, NTILE(2) OVER (PARTITION BY product ORDER BY quantity) AS group_number
FROM sales;
+----+---------+----------+-------+------------+--------------+
| id | product | quantity | price | date       | group_number |
+----+---------+----------+-------+------------+--------------+
|  1 | A       |       10 |    50 | 2024-01-01 |            1 |
|  4 | A       |       15 |    50 | 2024-01-04 |            1 |
|  7 | A       |       20 |    50 | 2024-01-07 |            2 |
| 10 | A       |       25 |    50 | 2024-01-10 |            2 |
|  2 | B       |       20 |    40 | 2024-01-02 |            1 |
|  5 | B       |       25 |    40 | 2024-01-05 |            1 |
|  8 | B       |       30 |    40 | 2024-01-08 |            2 |
|  3 | C       |       30 |    30 | 2024-01-03 |            1 |
|  6 | C       |       35 |    30 | 2024-01-06 |            1 |
|  9 | C       |       40 |    30 | 2024-01-09 |            2 |
+----+---------+----------+-------+------------+--------------+

The result shows that the NTILE() function divides the sales of each product into two groups of equal size (two rows per group), and assigns each sale a group number based on their quantity order. The lowest quantity group has the group number 1, and the highest quantity group has the group number 2.

Example 3: Using special parameters

The NTILE() function can also accept special parameters, such as NULL, 0, or negative values. However, these parameters may cause unexpected results or errors. Let’s see some examples of using special parameters with the NTILE() function.

Example 3.1: Using NULL as the parameter

If we use NULL as the parameter of the NTILE() function, the function will return NULL for all rows, regardless of the window specification. For example:

SELECT id, name, salary, NTILE(NULL) OVER (ORDER BY salary) AS group_number
FROM employees;

Example 3.2: Using 0 as the parameter

If we use 0 as the parameter of the NTILE() function, the function will cause a division by zero error, and the query will fail. For example:

SELECT id, name, salary, NTILE(0) OVER (ORDER BY salary) AS group_number
FROM employees;
ERROR 1365 (22012): Division by zero

Example 3.3: Using negative values as the parameter

If we use negative values as the parameter of the NTILE() function, the function will behave as if the absolute value of the parameter is used. For example, using -4 as the parameter is equivalent to using 4 as the parameter. For example:

SELECT id, name, salary, NTILE(-4) OVER (ORDER BY salary) AS group_number
FROM employees;
+----+---------+--------+--------------+
| id | name    | salary | group_number |
+----+---------+--------+--------------+
|  1 | Alice   |  5000  |            1 |
|  2 | Bob     |  6000  |            1 |
|  3 | Charlie |  7000  |            1 |
|  4 | David   |  8000  |            2 |
|  5 | Eve     |  9000  |            2 |
|  6 | Frank   | 10000  |            2 |
|  7 | Grace   | 11000  |            3 |
|  8 | Harry   | 12000  |            3 |
|  9 | Irene   | 13000  |            3 |
| 10 | Jack    | 14000  |            4 |
+----+---------+--------+--------------+

The result is the same as using 4 as the parameter.

The NTILE() function is one of the many window functions that Mariadb supports. Window functions are functions that operate on a set of rows and return a single value for each row from the underlying query. Window functions can be used to perform various calculations or operations on the result set, such as ranking, aggregation, or analytics.

Some of the related window functions to the NTILE() function are:

  • RANK(): This function assigns a rank to each row within a partition based on the order of the rows. The rank of a row is one plus the number of ranks that come before it. Rows with equal values have the same rank, and leave gaps in the ranking sequence.
  • DENSE_RANK(): This function is similar to the RANK() function, except that it does not leave gaps in the ranking sequence. Rows with equal values have the same rank, and the next rank is assigned without any gaps.
  • ROW_NUMBER(): This function assigns a sequential number to each row within a partition based on the order of the rows. The row number starts from 1 and does not have any gaps. Rows with equal values have different row numbers.
  • PERCENT_RANK(): This function calculates the relative rank of each row within a partition based on the order of the rows. The percent rank of a row is the ratio of its rank minus one to the total number of rows in the partition minus one. The percent rank ranges from 0 to 1, and can be used to measure the percentile of each row.

Here are some examples of using these related window functions with the employees table:

SELECT id, name, salary,
  RANK() OVER (ORDER BY salary) AS rank,
  DENSE_RANK() OVER (ORDER BY salary) AS dense_rank,
  ROW_NUMBER() OVER (ORDER BY salary) AS row_number,
  PERCENT_RANK() OVER (ORDER BY salary) AS percent_rank
FROM employees;
+----+---------+--------+------+------------+------------+--------------+
| id | name    | salary | rank | dense_rank | row_number | percent_rank |
+----+---------+--------+------+------------+------------+--------------+
|  1 | Alice   |  5000  |    1 |          1 |          1 |         0.00 |
|  2 | Bob     |  6000  |    2 |          2 |          2 |         0.11 |
|  3 | Charlie |  7000  |    3 |          3 |          3 |         0.22 |
|  4 | David   |  8000  |    4 |          4 |          4 |         0.33 |
|  5 | Eve     |  9000  |    5 |          5 |          5 |         0.44 |
|  6 | Frank   | 10000  |    6 |          6 |          6 |         0.56 |
|  7 | Grace   | 11000  |    7 |          7 |          7 |         0.67 |
|  8 | Harry   | 12000  |    8 |          8 |          8 |         0.78 |
|  9 | Irene   | 13000  |    9 |          9 |          9 |         0.89 |
| 10 | Jack    | 14000  |   10 |         10 |         10 |         1.00 |
+----+---------+--------+------+------------+------------+--------------+

The result shows the different values returned by each window function for each employee based on their salary order.

Conclusion

In this article, we have learned how the NTILE() function works in Mariadb. We have seen the syntax of the function, and how to use it with different parameters and window specifications. We have also looked at some examples of using the function with different data sets, and some related window functions that can be used for similar purposes.

The NTILE() function is a useful tool for dividing a result set into groups of equal or approximate size, and assigning each row a group number. It can be used for various scenarios, such as data analysis, reporting, or segmentation. However, we should also be aware of the potential errors or unexpected results that may occur when using special parameters, such as NULL, 0, or negative values.