Skip to content

Tilda Aggregate And Window Functions

Laurent Hasson edited this page Jun 22, 2022 · 1 revision

<-- View Columns

Standard Use Case

When defining a view, one of the most powerful, and most often used, concept is that of aggregates. This is especially important for analytical workloads. The simplest example is to think about an Orders table containing customer orders. A typical design is to have a date, a total amount and a customer id for example. In practice, there would be of course additional fields such as the store where the purchase occurred, and a way to list the individual items purchased. But for the purpose of this example, let's just simplify to look at a main orders table.

orderRefnum dt amt
111 2022-04-05 100.00
112 2022-04-08 150.00
113 2022-04-10 126.00
114 2022-04-16 98.00
115 2022-04-20 300.00
222 2022-05-09 175.00
223 2022-05-10 300.00
224 2022-05-23 265.00

A common task is to use aggregates to simply get the number of orders, total sales and average order amt per month.

select date_trunc('month', dt)::DATE, count(*), sum(amt), avg(amt)
  from Orders
 group by 1 
 order by 1

This will return 2 rows, one per month:

m count sum avg
2022-04 5 774.00 154.80
2022-05 3 740.00 246.67

Aggregate Vs. Window functions

Postgres has a nice write-up here https://www.postgresql.org/docs/current/tutorial-window.html explaining this difference. In very simplistic ways, aggregates group multiple rows together as one final row, whereas window functions take values from multiple rows but flattens those values for every original row.

If you use a window function, it is as if the calculation happened separately and is then added to every row. Contrary to the first "aggregate" example above, let's say we want to be able to see all orders along with the monthly amount average:

select orderRefnum, dt, amt
     , avg(amt) over(partition by date_trunc('month', dt)) month_avg
  from Orders
 order by 2

There is no group by and logically, it's replaced by the "partition by" clause. The results are as follows:

orderRefnum dt amt month_avg
111 2022-04-05 100.00 154.80
112 2022-04-08 150.00 154.80
113 2022-04-10 126.00 154.80
114 2022-04-16 98.00 154.80
115 2022-04-20 300.00 154.80
222 2022-05-09 175.00 246.67
223 2022-05-10 300.00 246.67
224 2022-05-23 265.00 246.67

Tilda Syntax

With Tilda views, you can use aggregate/window functions in simple and powerful ways:

// An aggregate view column
{ "sameAs":"someSchema.SomeTable.someColumnInt" 
 ,"name": "someColumnIntX"
 ,"description":"ColumnInt X blah blah blah"
 ,"aggregate": "SUM"|"AVG"|"MIN"|"FIRST"|"MAX"|"LAST"
              |"DEV"|"VAR"|"COUNT"|"ARRAY"|"ARRAYCAT"
              |"ROW_NUMBER"|"RANK"|"PERCENT_RANK"
              |"LEAD"|"LAG"|"NTH_VALUE"
 ,"orderBy":["col1","col2"]
 ,"partitionBy":["col3","col4"]
 ,"range": "((UNBOUNDED|\\d+|CURRENT)\\s+(PRECEDING|FOLLOWING|ROW))\\s+AND\\s+((UNBOUNDED|\\d+|CURRENT)\\s+(PRECEDING|FOLLOWING|ROW))"
 ,"distinct": true|false
 ,"filter": "\"colA\" > 10"
}

Aggregate and Window functions have been unified syntactically. The way "lag" or "max" are defined is the same, but semantically, they are quite different. In Tilda, a function is interpreted as an aggregates or a window by the following conventions:

  • The functions SUM, AVG, MIN, FIRST, MAX, LAST, DEV, VAR, COUNT, ARRAY and ARRAYCAT are by default normal aggregates.
  • The functions ROW_NUMBER, RANK, PERCENT_RANK, LEAD, LAG and NTH_VALUE are always window functions and will require "partitionBy" to be specified, and optionally, "range".
  • If you specify the "partitionBy" attribute, the function used will always be so as a window function.
  • If you specify the "partitionBy" attribute, the "range" attribute is optional and obviously, you cannot specify a range without a partition.

For the two examples above, the Tilda syntax would then be as follows:

    // Aggregate function
    { "name":"Order_View"
     ,"description":"blah blah"
     ,"columns":[
          { "sameas":"Orders.dt"          , "expression":"date_trunc('month', ?)", "type":"DATE" }
         ,{ "sameas":"Orders.amt"         , "name":"monthAvgAmt"
                                          , "aggregate":"AVG"
                                          , "description":"blah blah."
          }
       ]
    }

    // Window function
    { "name":"Order_View"
     ,"description":"blah blah"
     ,"columns":[
          { "sameas":"Orders.orderRefnum"  }
         ,{ "sameas":"Orders.dt"           }
         ,{ "sameas":"Orders.amt"          }
         ,{ "sameas":"Orders.amt"         , "name":"monthAvgAmt"
                                          , "aggregate":"AVG", "partitionBy":["date_trunc('month', dt)"]
                                          , "description":"blah blah."
          }
       ]
    }

Additional attributes

Additionally, you can define aggregate attributes:

  • aggregate: SUM|AVG|MIN|FIRST|MAX|LAST|DEV|VAR|COUNT|ARRAY|ARRAYCAT|ROW_NUMBER|RANK|PERCENT_RANK|LEAD|LAG|NTH_VALUE. Some aggregates may not be supported by some databases by default and may need to be created. For example, FIRST and LAST are not supported out of the box in Postgres and are defined in the supporting Tilda SQL functions. ARRAY for example is not supported in SQLServer and may instead return a comma-separated concatenation of the data retrieved.
  • partitionBy: the partitioning columns.
  • range: an expression denoting the range (aka frame) of the window, which is documented in most database documentation (e.g., https://www.postgresql.org/docs/current/sql-expressions.html#SYNTAX-WINDOW-FUNCTIONS for Postgres). Possible values are:
    • UNBOUNDED PRECEDING AND CURRENT: the default if unspecified, implements a running sum for example, including the current row
    • 10 PRECEDING and 10 FOLLOWING: applies the aggregate to the previous and following 10 rows
    • 30 PRECEDING and 1 PRECEDING: applies the aggregate to the previous 30 rows, not including the current one
  • orderBy: an optional array of 1 or more columns to order the results in the aggregate if applicable, e.g., first, last, array.
  • distinct: true|false, false by default, indicating whether the results in the aggregate should be unique if applicable, e.g., first, last, array.
  • filter: a boolean expression using SQL syntax that can be used to filter the results for an aggregate, for example, "\"colA\" > 10". The expression provided is used as is and can therefore make use of database-specific functionality. The filter feature is in and of itself not supported in all databases: it is in Postgres but not in SQLServer for example. Note that column names must be escaped properly.

🎈 NOTE: Since you can use any valid code in the filter, Tilda can't validate it. If incorrect, you will get an error when migrating your schema and the view is created.

🎈 NOTE: If you define an expression, it takes precedence, and any aggregate will operate over the final value.

🎈 NOTE: Coalesce happens at the final level and will consume any value returned by an expression or an aggregate (or combination).

🎈 NOTE: If you use aggregates, these must be defined last in the list of columns. Tilda will enforce this and will automatically handle any group-by clause in the final SQL based on the non-aggregate columns provided.

🎈 NOTE: Window functions such as ROW_NUMBER, RANK, PERCENT_RANK, LEAD, LAG and NTH_VALUE are new to Tilda 2.1.

🎈 NOTE: Although it is encouraged to use the default names for each function, some habits die hard. Some of those functions are not standardized and have different names across different databases. As such, there are aliases available for those who have amultiple database personality disorder that they experience daily:

  • ROW_NUMBER: ROW_NUM, ROWNUMBER, ROWNUM.
  • ARRAY_CAT: ARRAYCAT.
  • PERCENT_RANK: PERCENTRANK, PCT_RANK, PCTRANK, RANK_PERCENT, RANKPERCENT, RANK_PCT, RANKPCT.
  • NTH_VALUE: NTH.
Clone this wiki locally