Window Functions in MySQL 8
Window functions are a new feature in MySQL 8. Window functions are not as simple as the other built-in MySQL functions. While most functions operate only on the data pertaining to a single row of data, Window functions operate over a “window” of data when performing a SQL query, and also for each query row.
What Is a Window Function?
A Window function is a function that performs a calculation over a window, or set, of rows related to the current query row. The window calculation is performed for each row of data. Below is a simple example to illustrate Window functions.
A Simple Example
Consider the sales data for magazines from different publishers stored in a table magazine_data is to be queried. A single publisher could be publishing multiple magazines. You may be interested in total magazine sales from all publishers, or you could be interested in magazine sales per publisher. To start off, the query SELECT SUM(sales) AS total_sales FROM magazine_data; returns total sales for all magazines. A Window function is not used yet. If you are interested in finding sales for a particular publisher your query would be SELECT publisher, SUM(sales) AS publisher_sales FROM magazine_data WHERE publisher=”Some Publisher”;. Still a Window function is not used.
But before introducing Window functions, you must become familiar with two clauses that are used to define the window of data.
The OVER and WINDOW Clauses
The OVER clause specifies the window of data to perform a related calculation over. The OVER clause could make use of a PARTITION BY clause to group rows over to perform a related calculation. The OVER clause may select a subset within the current partition using a frame of data defined by ROWS, or RANGE clauses. The OVER clause could specify a named window of data. If a named window of data is used, the WINDOW clause is used to define that window of data.
Getting Back to the Example
Having discussed how a Window function selects the window of data to perform a calculation over for the data related to the current row, the magazine sales example may be used to illustrate Window functions. This time, we are interested in sales generated for each magazine by each publisher in addition to the total magazine sales. But we are not only interested in comparing individual magazine sales over total magazine sales from all publishers. We are also interested in being able to compare individual magazine sales with other magazines published by the same publisher. That is where a Window function becomes useful. A query making use of a Window function would be the following:
SELECT publisher, magazine_name, sales AS magazine_sales, SUM(sales) OVER() AS total_sales, SUM(sales) OVER(PARTITION BY publisher) AS publisher_sales FROM magazine_data ORDER BY publisher, magazine_name;
The first OVER() clause selects the complete query set as a single partition. The second OVER() clause however partitions rows by publisher and performs a calculation over the partition returning sales sum by publisher. A partial result table may be something like the following:
| publisher | magazine_name |magazine_sales |total_sales | publisher_sales |
| publisher1| magazine1 | 30 | 1000 | 100 |
| publisher1| magazine2 | 60 | 1000 | 100 |
| publisher1| magazine3 | 10 | 1000 | 100 |
Window functions could be classified as non-aggregate window functions and aggregate window functions. Most aggregate functions like SUM() and AVG() can be used as Window functions by including the OVER clause. Not all aggregate functions can be used as Window functions; only the AVG(), BIT_AND(), BIT_OR(), BIT_XOR(), COUNT(), JSON_ARRAYAGG(), JSON_OBJECTAGG(), MAX(), MIN(), STDDEV_POP(), STDDEV(), STD(), STDDEV_SAMP(), SUM(),VAR_POP(), VARIANCE(), and VAR_SAMP() functions can be used as Window functions. Non aggregate window functions include CUME_DIST(), DENSE_RANK(), FIRST_VALUE(), LAG(), LAST_VALUE(), LEAD(), NTH_VALUE(), NTILE(), PERCENT_RANK(), RANK(), and ROW_NUMBER(). An essential difference to note when an aggregate function is used as a Window function vis-à-vis when it is not, is that when used as a Window function a result is produced per row, whereas when used as an aggregate grouping function the result is a single aggregate result row.