What are analytic window functions in SQL?

A window function lets you perform calculations across any number of table rows that are linked to the current row. Window functions in SQL can be broadly categorized into the following three categories:

Aggregate window functions
Ranking window functions
Analytic window functions

In SQL, analytic window functions assign values to each row in the result set based on the values in the other rows within the same partition. The result set is the output table generated by a query. Analytic window functions, also known as value window functions, include LAG(), LEAD(), FIRST_VALUE(), LAST_VALUE(), and NTH_VALUE().

The window_function() represents the window function. We replace it with the actual window function to perform the operation.
We create partitions by defining the column/condition in PARTITION BY clause. This clause becomes optional if there’s no need to create partitions. The window function will treat the dataset as a single window in such cases.
The ORDER BY clause specifies the order within each partition.
The window_frame clause is used to define the range of rows considered for the calculations.

Note: The syntax for window functions is consistent across all types, including aggregate, analytic, and ranking window functions.

Understanding analytic window functions

This section explores the core concepts and functionality of analytic window functions in SQL, providing insights into how these functions assign values and help in data analysis.

Example database

We have some data on the student’s exam results in the Exam_result table that we can view by using the SELECT query.

Note: In the query, we have used only the records for the students of Mathematics subject.

Lines 1–4: We use the SELECT statement to read specific columns from the Exam_result table, including std_name, marks, and subject_name.
Lines 5–7: The LAG() window function is used to assign the previous value to each row of the marks column. In the LAG() function, we have used 1 as OFFSET and 0 as DEFAULT. We set the alias of the new column as LAG_default. The function is applied to each row within the partition, and the result will be presented in the LAG_default column in descending order as written in ORDER BY clause.
Line 8: The FROM clause specifies the source table Exam_result from which the data is being queried.

The `LEAD()` function

The LEAD() function assigns the value from the next row within a result set until it reaches the end of the dataset. Similar to LAG() function, there are two optional parameters available with the LEAD() function:

OFFSET: Specifies how many rows ahead to look from the current row. The default value for this parameter is 1.
DEFAULT: Provides a value to use when the OFFSET goes beyond the available rows. The default value for this parameter is 0.

Note: In the query, we have used only the records for the students of Mathematics subject.

Lines 1–4: We use the SELECT statement to read specific columns from the Exam_result table, including std_name, marks, and subject_name.
Lines 5–7: The LEAD() window function is used to assign the next row value to each row of the marks column. In the LEAD() function, we have used 1 as OFFSET and 0 as DEFAULT. We set the alias of the new column as LEAD_default. The function is applied to each row within the partition, and the result will be presented in the LEAD_default column in descending order as written in ORDER BY clause.
Line 8: The FROM clause specifies the source table Exam_result from which the data is being queried.

Note: The LAG() and LEAD() functions are window functions that allow us to access data from previous and subsequent rows within a specified window, respectively.

The `FIRST_VALUE()` function

The window function FIRST_VALUE() returns the first value in the dataset. It returns the value from the first row in the window frame based on the specified order.

Lines 1–4: We use the SELECT statement to read specific columns from the Exam_result table, including std_id, std_name, marks, and subject_name.
Lines 5–9: The FIRST_VALUE() window function is used to calculate the first value of the marks column in each partition defined by the subject_name column. The function is applied to each row within the partition, and the result will be presented in the FIRST_VALUE(marks) column in ascending order as written in ORDER BY clause.
Line 10: The FROM clause specifies the source table Exam_result from which the data is being queried.

The `LAST_VALUE()` function

The window function LAST_VALUE() returns the last value in the dataset. It returns the value in the last row in the window frame based on the specified order.

Lines 1–4: We use the SELECT statement to read specific columns from the Exam_result table, including std_id, std_name, marks, and subject_name.
Lines 5–12: The LAST_VALUE() window function is used to calculate the last value of the marks column in each partition defined by the subject_name column. The function is applied to each row within the partition, and the result is stored in the LAST_VALUE(marks) column. The output will be presented in ascending order of marks column. The ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING clause specifies that the window frame for each row in the partition includes all rows from the first row to the last row of the partition, without regard to the actual values of the rows but based on their order. This means that for each row, the function considers all rows in the partition for calculating the result.
Lines 13: The FROM clause specifies the source table Exam_result from which the data is being queried.

The `NTH_VALUE()` function

The window function NTH_VALUE() returns the Nth value in the dataset. It returns the value in the nth row in the window frame based on the specified order.

Lines 1–4: We use the SELECT statement to read specific columns from the Exam_result table, including std_id, std_name, marks, and subject_name.
Lines 5–12: The NTH_VALUE() window function is used to calculate the value of the marks column in each partition defined by the subject_name column. The function is applied to each row within the partition, and the result will be presented in the NTH_VALUE(marks) column. The output will be presented in descending order of marks column. The ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING clause specifies that the window frame for each row in the partition includes all rows from the first row to the last row of the partition. The output changes the value of NULL record. You may try removing this clause to see how the output looks without this clause.
Lines 13: The FROM clause specifies the source table Exam_result from which the data is being queried.

Conclusion

In conclusion, analytic window functions in SQL offer powerful tools for analyzing and processing data. These functions provide various ways to find specific values in a result set. We saw each function with the coding examples. We learned how these functions work within partitions and how understanding and utilizing window functions can greatly enhance our data analysis capabilities in SQL.

Free AI Mock Interviews

Coding Interview

Coding PatternsFree Interview

Gain insights and practical experience with coding patterns through targeted MCQs and coding problems, designed to match and challenge your expertise level.

System Design

YouTubeFree Interview

Learn to design a video streaming platform like YouTube by tackling functional and non-functional requirements, core components, and high-level to detailed design challenges.

Free Resources