Using SQL Window and Offset Window Functions



Hello There!

Today I was faced with a task of diseminating sequence of events in a report, so I thought I would share how I approached the task.

The Rexy can be found here : https://rextester.com/AVTOWD47990



Using window functions such as ROW_NUMBER(), RANK(), DENSE_RANK() and NTILE() we can gain insight into how data are ranked amoung a result set. Each of these functions have specific use cases, rank and dense rank treat ties amoungst data slightly differently while ROW_NUMBER() is useful for assigning a ID field to a result set if for some reason one is not already present.



Have a look at the data below which has 4 fields, OrderId,CustId,Val and OrderDate. Typically you will be comparing values based on a sequence of a date field. In this example we will apply first the window functions and secondly offset window functions.



OrderId CustId Val OrderDate
10782 12 18 2017-01-01
10807 12 23 2016-01-01
10586 12 28 2018-01-01
10767 28 30 2017-02-01
10898 28 33 2016-02-01
10900 28 36 2018-02-01
10883 36 36 2017-03-01
11051 36 40 2016-03-01
10815 36 45 2018-03-01



Lets first apply the window function query to our data, this will show how each treats data slightly differently.



SELECT 
ROW_NUMBER() OVER(ORDER BY VAL) AS ROWNUM,
RANK() OVER(ORDER BY VAL) AS RANK,
DENSE_RANK() OVER(ORDER BY VAL) AS DENSE_RANK,
NTILE(5) OVER(ORDER BY VAL) AS NTILE,
VAL

FROM #TEMP_ORDERVALUES
ORDER BY VAL



This will produce the result set



RowNum Rank Dense_Rank Ntile Val
1 1 1 1 18
2 2 2 1 23
3 3 3 2 28
4 4 4 2 30
5 5 5 3 33
6 6 6 3 36
7 6 6 3 36
8 8 7 4 40
9 9 8 5 45



Lets Examine row numbers 6,7 and 8 to gain a better understanding of the window functions.



RowNum Rank Dense_Rank Ntile Val
1 1 1 1 18
2 2 2 1 23
3 3 3 2 28
4 4 4 2 30
5 5 5 3 33
6 6 6 3 36
7 6 6 3 36
8 8 7 4 40
9 9 8 5 45



Firstly lets cover NTILE(), NTILE accepts an integer parameter which is the number of groups to partition by. For example I choose 5, so we can mod the number of rows by that parameter 9 % 5 = 1, we will have 4 equal groups (with 2 in this case) and 1 odd group with only one.



Next Rank, Rank will honor the ties in our data for example for rows 6 and 7 we have val of 36. Notice that rank honors the tie by assigning 6 to both rows but then skips the next rank and jumps straight to 8. This is essentially the different between RANK() and DENSE_RANK()



Lastly Dense_Rank will honor the ties in data and keep the sequence of rank instead of jumping to the next value. So instead of jumping to rank 8 for row number 8, it is assigned a dense_rank of 7, dense_rank will always have contigious values.



Last but not least is the LAG() and LEAD() functions, these are referred to as OFFSET WINDOW functions, so just imagine putting little windows inside of your result sets and that allow you to view the next logical iteration of a row. This iterations are decided by the PARTITION BY statement in the function, each call to this function will need a PARTITION BY and ORDER BY inside of the over clause. The Complete syntax looks as follows:



SELECT TOP 9999 CUSTID,
ORDERID,
VAL,
LAG(VAL) OVER(PARTITION BY CUSTID ORDER BY ORDERDATE,ORDERID) AS PREVVAL,
LEAD(VAL) OVER(PARTITION BY CUSTID ORDER BY ORDERDATE,ORDERID) AS NEXTVAL
FROM #TEMP_ORDERVALUES



Here we are saying, lets select the next VAL field, per CustId and order by OrderDate and OrderId (Ascending is assumed if you do not specify ASC or DESC). Lag Works the same way, a common hang up is the Order By Clause inside of the Partition, usually you can play around and figure out the correct combination of ASC DESC and PARTITION BY to achieve your reporting needs.

Here is the result set from the query above, notice that when the custID is the first in window the PREVVAL field is NULL and when it is the final VAL in window the NEXTVAL is NULL. This confirms that our window is setup correctly for CUSTID.



CustID OrderID Val PrevVal NextVal
12 10807 23 NULL 18
12 10782 18 23 28
12 10586 28 18 NULL
28 10898 33 NULL 30
28 10767 30 33 36
28 10900 36 30 NULL
36 11051 40 NULL 36
36 10883 36 40 45
36 10815 45 36 NULL



#https://rextester.com/PHHX28464

Ian Fogelman

Ian Fogelman

My name is Ian Fogelman. I like to develop data driven solutions with SQL Server, Python, .NET and predictive analytics.