SQL Server PERCENTILE_DISC() Function

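PERCENTILE_DISC() is an analytic function in SQL Server that computes the specified percentile over the sorted values of a rowset, or of each partition within it. It returns a discrete value: the smallest value in the set whose cumulative distribution (CUME_DIST) is greater than or equal to the requested percentile. The result is therefore always a value that actually occurs in the data, and NULLs in the data are ignored.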

Syntax

The syntax for the PERCENTILE_DISC() function is as follows:

PERCENTILE_DISC(percentile) WITHIN GROUP (ORDER BY expression [ASC|DESC])
OVER ([PARTITION BY partition_expression1, partition_expression2, ...])

where:

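  • percentile: Required. A numeric literal between 0.0 and 1.0 (inclusive) that specifies the percentile to calculate.
  • expression: Required. The column or expression whose values are sorted and from which the percentile is taken. Only one ORDER BY expression is allowed.
  • ASC|DESC: Optional. Specifies ascending or descending sort order. The default is ASC.
  • PARTITION BY: Optional. Divides the rows into partitions; the percentile is computed separately for each partition. The OVER clause itself is required, so use OVER () to treat the whole result set as a single partition.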
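
To illustrate how the returned value relates to the cumulative distribution, here is a minimal sketch that shows CUME_DIST() next to PERCENTILE_DISC() for a small set of values. The inline data and the names v, val, cume_dist and p60 are hypothetical and only serve the illustration:

```sql
-- Hypothetical inline data; any single-column rowset would do.
SELECT val,
       -- Fraction of rows with a value less than or equal to val
       CUME_DIST() OVER (ORDER BY val) AS cume_dist,
       -- Smallest val whose cume_dist is >= 0.6
       PERCENTILE_DISC(0.6) WITHIN GROUP (ORDER BY val) OVER () AS p60
FROM (VALUES (10), (20), (30), (40)) AS v(val)
ORDER BY val;
```

With these four values the cumulative distribution is 0.25, 0.5, 0.75 and 1.0, so p60 is 30 on every row: 30 is the first value whose cumulative distribution reaches 0.6.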

Usage

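The PERCENTILE_DISC() function is typically used in the following scenarios:

  • For large datasets, we want to find a specific percentile of a set of data, such as the 90th percentile.
  • We need a percentile, such as the median (PERCENTILE_DISC(0.5)), that is guaranteed to be a value actually present in the data rather than an interpolated one. Note that the function does not compute the mode.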
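
A per-group percentile is a common variant of the first scenario. The sketch below assumes hypothetical inline data with a dept and a salary column, and uses PARTITION BY so that the 90th percentile is computed separately for each department:

```sql
-- Hypothetical inline data: two departments with a few salaries each.
SELECT DISTINCT
       dept,
       -- 90th percentile within each department
       PERCENTILE_DISC(0.9) WITHIN GROUP (ORDER BY salary)
           OVER (PARTITION BY dept) AS p90_salary
FROM (VALUES ('A', 100), ('A', 200), ('A', 300),
             ('B', 50),  ('B', 70)) AS e(dept, salary);
```

DISTINCT is used here because the function produces a value for every input row; without it, each department's percentile would be repeated once per row in that department.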

Examples

Example 1

Assuming we have the following employees table:

ID  Name   Salary
1   John   3000
2   Mike   2500
3   Alice  4000
4   Tom    5000
5   Jane   6000
6   Bob    3500

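We can use the following query to calculate the discrete median of the Salary column. Because PERCENTILE_DISC() is evaluated for every row, DISTINCT is used to collapse the repeated values into a single row:

SELECT DISTINCT PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY Salary) OVER () AS MedianSalary
FROM employees;

Executing the above SQL statement will yield the following result:

MedianSalary
3500

The sorted salaries are 2500, 3000, 3500, 4000, 5000, 6000. The value 3500 is the smallest one whose cumulative distribution (3/6 = 0.5) reaches 0.5. The interpolated median, 3750, is what PERCENTILE_CONT(0.5) would return instead.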
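
The difference between a discrete and an interpolated percentile can be seen by running both functions over the same rows. The following is a minimal sketch, assuming the employees data above is supplied inline through a VALUES constructor rather than read from a real table; the column aliases are illustrative:

```sql
-- Employees data from the example, supplied inline for a self-contained run.
SELECT DISTINCT
       -- Discrete: always one of the Salary values present in the data
       PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY Salary) OVER () AS DiscreteMedian,
       -- Continuous: may interpolate between two neighbouring Salary values
       PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY Salary) OVER () AS ContinuousMedian
FROM (VALUES (1, 'John', 3000), (2, 'Mike', 2500), (3, 'Alice', 4000),
             (4, 'Tom', 5000),  (5, 'Jane', 6000), (6, 'Bob', 3500))
     AS employees(ID, Name, Salary);
```

With an even number of rows the two results generally differ, since the continuous form averages across the two middle values while the discrete form picks one of them.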

Example 2

Now suppose we have the following scores table:

ID  Name   Score
1   John   80
2   Mike   70
3   Alice  90
4   Tom    85
5   Jane   95
6   Bob    85

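We can use the following query to calculate the discrete median of the Score column:

SELECT DISTINCT PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY Score) OVER () AS MedianScore
FROM scores;

Executing the above SQL statement will yield the following result:

MedianScore
85

The sorted scores are 70, 80, 85, 85, 90, 95. The cumulative distribution of 80 is 2/6 and that of 85 is 4/6, so 85 is the smallest value that reaches 0.5. In this table 85 also happens to be the most frequent score, but that is a coincidence of the data: PERCENTILE_DISC() returns a percentile, not the mode.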
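
When the most frequent value (the mode) is what is needed rather than a percentile, a grouping query is one common approach. The following is a minimal sketch, assuming the scores data above is supplied inline; the alias Occurrences is illustrative:

```sql
-- Scores data from the example, supplied inline for a self-contained run.
SELECT TOP (1) WITH TIES
       Score,
       COUNT(*) AS Occurrences   -- how many rows share this score
FROM (VALUES (1, 'John', 80), (2, 'Mike', 70), (3, 'Alice', 90),
             (4, 'Tom', 85),  (5, 'Jane', 95), (6, 'Bob', 85))
     AS scores(ID, Name, Score)
GROUP BY Score
ORDER BY COUNT(*) DESC;          -- most frequent score first
```

WITH TIES keeps every score that shares the highest count, so a dataset with several modes returns all of them.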

Conclusion

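The PERCENTILE_DISC() function calculates a specific percentile of a set of data and always returns a value that actually occurs in that data. This makes it suitable for finding a discrete median or any other percentile where an interpolated result is not wanted. It does not calculate the mode; use PERCENTILE_CONT() when an interpolated percentile is required.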