SQL Server NTILE() Function
In SQL Server, the NTILE()
function can divide a result set into a specified number of buckets and assign each row to one of the buckets. This function is commonly used to calculate percentiles or to group data into a fixed number of groups.
Syntax
The syntax for the NTILE()
function is as follows:
NTILE (integer_expression) OVER (
[PARTITION BY partition_expression, ... ]
ORDER BY sort_expression [ASC|DESC], ...
)
Here, integer_expression
is the number of buckets to divide the data into, PARTITION BY
and ORDER BY
are optional and can be set as needed.
Usage
The NTILE()
function is commonly used in the following scenarios:
- Grouping: dividing data into a specified number of groups.
- Percentile calculation: for example, dividing data into 4 buckets can calculate the 25th, 50th, and 75th percentiles.
Examples
Suppose we have the following Employees
table:
EmployeeID | FirstName | LastName | Salary |
---|---|---|---|
1 | John | Doe | 50000 |
2 | Jane | Smith | 60000 |
3 | Bob | Johnson | 70000 |
4 | Mike | Lee | 80000 |
5 | Sarah | Brown | 90000 |
6 | Tom | Davis | 100000 |
7 | Karen | Wilson | 110000 |
8 | Amy | Anderson | 120000 |
9 | James | Taylor | 130000 |
10 | Lisa | Jackson | 140000 |
The following example divides the data in the Employees
table into 3 groups:
SELECT EmployeeID, FirstName, LastName, Salary,
NTILE(3) OVER (ORDER BY Salary) AS GroupNumber
FROM Employees;
After executing the SQL statement above, we will get the following result:
EmployeeID | FirstName | LastName | Salary | GroupNumber |
---|---|---|---|---|
1 | John | Doe | 50000 | 1 |
2 | Jane | Smith | 60000 | 1 |
3 | Bob | Johnson | 70000 | 1 |
4 | Mike | Lee | 80000 | 2 |
5 | Sarah | Brown | 90000 | 2 |
6 | Tom | Davis | 100000 | 2 |
7 | Karen | Wilson | 110000 | 3 |
8 | Amy | Anderson | 120000 | 3 |
9 | James | Taylor | 130000 | 3 |
10 | Lisa | Jackson | 140000 | 3 |
We can see that the NTILE()
function divides the data into 3 buckets and assigns each row to one of the buckets. The first bucket contains the lowest 3 salaries, the second bucket contains the middle 3 salaries, and the third bucket contains the highest 4 salaries.
Conclusion
The NTILE()
function can divide a dataset into a specified number of groups, with each group containing roughly equal numbers of rows. It is frequently used in scenarios such as data analysis and report creation, particularly when data needs to be grouped and presented or grouped and calculated. By utilizing the NTILE()
function, we can more flexibly control the number of groups and the number of data points within each group, thus better meeting our needs.