A window function operates over a specified set of
It lets us perform calculations across any number of table rows that are linked to the current row. It can be defined by combining partitioning, ordering, and framing clauses within the OVER()
SELECT
, WHERE
, HAVING
, and ORDER BY
and can either be predefined or user-defined.
Window functions in SQL can be broadly categorized into the following three categories:
Analytic window functions
Analytic window functions are also known as value window functions.
In contrast, a regular function operates on individual rows and returns a single result for each row.
It can be predefined or user-defined. Predefined functions are built into SQL databases to perform specific tasks, such as mathematical calculations, text manipulation, and date operations. The commonly used predefined functions include ABS()
, ROUND()
, UPPER()
, SUBSTRING()
, GETDATE()
, and DATEADD()
.
On the other hand, an aggregate function operates on a dataset and usually returns a single value for the dataset. It can be predefined or user-defined. The commonly used functions include SUM()
, MIN()
, MAX()
, AVG()
, and COUNT()
.
Let’s explore window functions in more detail.
In SQL, a window refers to the set of rows on which the window function operates. Within a window, a window frame specifies the range of rows to consider for our calculations. Partitions are used to group rows together before applying the window function.
A partition can be defined using the PARTITION BY
clause. The window functions can be predefined or user-defined.
Window: The set of rows on which the window function operates. Within a window, a
Window frame: It specifies the range of rows using OVER()
to consider for our calculations within a window.
Partition: It groups rows together before applying the window function. A partition can be defined using the PARTITION BY
clause.
The following is the syntax for a window function:
window_function ()OVER ( [PARTITION BY clause][ORDER BY clause][window_frame clause])
The window_function()
represents the window function. We replace it with the actual window function to operate.
We create partitions by defining the column/condition in PARTITION BY clause
. This clause becomes optional if there’s no need to create partitions. The window function will treat the dataset as a single window in such cases.
The ORDER BY clause
specifies the order within each partition.
The window_frame clause
is used to define the range of rows considered for the calculations.
In SQL, aggregate window functions are used to return a single scalar value after performing calculations on a set of values. These functions include SUM()
, MIN()
, MAX()
, AVG()
, and COUNT()
.
The SUM()
function: The SUM()
function returns a scalar value, the total sum of a numeric value column.
The MIN()
function: The MIN()
function returns the smallest value of the selected column.
The MAX()
function: The MAX()
function returns the most significant value of the selected column.
The AVG()
function: The AVG()
function returns a scalar value that is the average value of a numeric value column.
The COUNT()
function: The COUNT()
function returns a scalar value for the number of rows after calculating a set of values.
In SQL, ranking window functions assign a rank to each row in the result set based on specified criteria. It helps in determining the row’s relative position. These functions include RANK()
, DENSE_RANK()
, ROW_NUMBER()
, NTILE()
, and PERCENT_RANK()
.
Ranking window functions in SQL offer powerful tools for analyzing and processing data. These functions provide various ways to assign ranks, row numbers, and percentile ranks to rows in a result set.
The ROW_NUMBER()
function: The ROW_NUMBER()
function assigns a unique row number to each row within a result set. It starts from 1
and increments by 1 for each subsequent row. It is essential for various tasks, including ranking, data partitioning, and paging. ROW_NUMBER()
assigns a unique number to each row without considering duplicates.
The RANK()
and DENSE_RANK()
functions: The R
ANK()
and DENSE_RANK()
functions assign a rank to each row in the result set.
In case of duplicate values, the RANK()
function assigns the same rank to duplicate values and the next rank is calculated after incrementing the current rank value. On the other hand, the DENSE_RANK()
function assigns the same value to all the duplicate values.
The next unique value will get the next consecutive number as a rank.
The NTILE()
function: The NTILE()
function divides the result set into groups known as buckets or tiles. It then assigns a bucket number to each row based on the specified number within the function.
The PERCENT_RANK()
function: The PERCENT_RANK()
computes a relative ranking of the value based on the rows in a dataset. It returns a value ranging from 0
to 1
(inclusive). If there are duplicate values, PERCENT_RANK
assigns the same percentile rank to each duplicate value, considering the highest ranking value has no ties.
In SQL, analytic window functions assign values to the rows in the result set based on the values in the other rows. Analytic window functions are also called value window functions. These functions include LAG()
, LEAD()
, FIRST_VALUE()
, LAST_VALUE()
, and NTH_VALUE()
.
In conclusion, analytic window functions in SQL offer powerful tools for analyzing and processing data. These functions provide various ways to find specific values in a result set.
The LAG()
function: The LAG()
function assigns the previous value to each row within a result set until the last row in the dataset. There are two optional parameters available with the LAG()
function, OFFSET
and DEFAULT
.
The LEAD()
function: The LEAD()
function assigns the next value to each row within a result set until the last row in the dataset. Similar to LAG()
function, there are two optional parameters available with the LEAD()
function, OFFSET
and DEFAULT
.
The FIRST_VALUE()
function: The window function FIRST_VALUE()
returns the first value in the dataset. It returns the value from the first row in the window frame based on the specified order.
The LAST_VALUE()
function: The window function LAST_VALUE()
returns the last value in the dataset. It returns the value in the last row in the window frame based on the specified order.
The NTH_VALUE()
function: The window function NTH_VALUE()
returns the Nth value in the dataset. It returns the value in the nth row in the window frame based on the specified order.
Attempt the following quiz to check your understanding of the window functions.
Let’s say we have a table StudentGrades
in which we have the columns StudentID
, StudentName
, Subject
, and Marks
. Select the correct option to find the count of all the students who got more than 85 marks:
SELECT SUM(Marks) AS Above85
FROM StudentGrades
WHERE Marks > 85;
SELECT COUNT(*) AS Above85
FROM StudentGrades
WHERE Marks >= 85;
SELECT COUNT(*) AS Above85
FROM StudentGrades
WHERE Marks > 85;
SELECT COUNT(*) AS Above85
FROM StudentGrades
WHERE Marks < 85;
In conclusion, window functions provide us with more control over the analysis.
We discussed different types of window functions, including aggregate, ranking, and value window functions. Window functions provide us with the ability to transform complex data analysis into more manageable tasks. The classification of window functions is as follows:
Aggregate | Ranking | Value |
AVG() | ROW_NUMBER() | LAG() |
MAX() | RANK() | LEAD() |
MIN() | DENSE_RANK() | FIRST_VALUE() |
SUM() | PERCENT_RANK() | LAST_VALUE() |
COUNT() | NTILE() | NTH_VALUE() |
Free Resources