Defining pre-aggregates
Pre-aggregates are defined under the pre_aggregates key in your model configuration.
If you’re using Lightdash YAML instead of dbt model YAML, see the Lightdash YAML syntax guide for the surrounding model structure.
dbt v1.9 and earlier
dbt v1.10+ and Fusion
Lightdash YAML
models:
- name: orders
meta:
pre_aggregates:
- name: orders_daily_by_status
dimensions:
- status
metrics:
- total_order_amount
- average_order_size
filters:
- order_date: inThePast 52 weeks
time_dimension: order_date
granularity: day
models:
- name: orders
config:
meta:
pre_aggregates:
- name: orders_daily_by_status
dimensions:
- status
metrics:
- total_order_amount
- average_order_size
filters:
- order_date: inThePast 52 weeks
time_dimension: order_date
granularity: day
type: model
name: orders
pre_aggregates:
- name: orders_daily_by_status
dimensions:
- status
metrics:
- total_order_amount
- average_order_size
filters:
- order_date: inThePast 52 weeks
time_dimension: order_date
granularity: day
Configuration reference
| Property | Required | Description |
|---|
name | Yes | Unique identifier for the pre-aggregate. Must contain only letters, numbers, and underscores. |
dimensions | Yes | List of dimension names to include. Must contain at least one dimension. |
metrics | Yes | List of metric names to include. Must contain at least one metric. |
filters | No | Static filters applied when materializing the pre-aggregate. Matching queries must include an equivalent or narrower filter to use this pre-aggregate. |
time_dimension | No | A time-based dimension for date grouping. Must be paired with granularity. |
granularity | No | Time granularity for the time_dimension. Valid values: hour, day, week, month, quarter, year. Must be paired with time_dimension. |
max_rows | No | Maximum number of rows to store in the materialization. If the aggregation exceeds this limit, the result is truncated. Must be a positive integer. |
refresh | No | Schedule configuration for automatic re-materialization. See Scheduling refreshes. |
materialization_role | No | Fixed user attributes to use when materializing the pre-aggregate. See Materialization role. |
If you specify time_dimension, you must also specify granularity, and vice versa.
Filtered pre-aggregates
Use filters when you want a pre-aggregate to materialize only a subset of the source data.
For example, this pre-aggregate only stores data for the last 52 weeks:
dbt v1.9 and earlier
dbt v1.10+ and Fusion
Lightdash YAML
models:
- name: orders
meta:
pre_aggregates:
- name: recent_orders_daily
dimensions:
- status
metrics:
- total_order_amount
- order_count
filters:
- order_date: inThePast 52 weeks
time_dimension: order_date
granularity: day
models:
- name: orders
config:
meta:
pre_aggregates:
- name: recent_orders_daily
dimensions:
- status
metrics:
- total_order_amount
- order_count
filters:
- order_date: inThePast 52 weeks
time_dimension: order_date
granularity: day
type: model
name: orders
pre_aggregates:
- name: recent_orders_daily
dimensions:
- status
metrics:
- total_order_amount
- order_count
filters:
- order_date: inThePast 52 weeks
time_dimension: order_date
granularity: day
This is useful when a rolling time window is queried frequently and deserves its own smaller materialization.
How query matching works with filters
Filtered pre-aggregates are only used when the query filters are compatible with the pre-aggregate definition:
- A query with the same or narrower filter can use the pre-aggregate
- A query without the filter, or with a broader or incompatible filter, falls back to another pre-aggregate or the warehouse
For the example above:
order_date inThePast 12 weeks can use the pre-aggregate
order_date inThePast 52 weeks can use the pre-aggregate
order_date inThePast 104 weeks cannot use the pre-aggregate
- no
order_date filter: cannot use the pre-aggregate
If a field is only used for filtering, you should still include it in the pre-aggregate’s dimensions list so Lightdash can match and re-aggregate queries correctly.
Multiple pre-aggregates per model
You can define multiple pre-aggregates on the same model, each targeting different query patterns. It is better to have multiple small, focused pre-aggregates rather than a single one containing all metrics and dimensions. Including too many dimensions increases the number of unique combinations, which generates large materialization files — this defeats the purpose of pre-aggregates, since they are meant to be smaller and faster than querying the warehouse directly.
For example, you might want a fine-grained daily pre-aggregate for detailed dashboards and a coarser monthly one for summary views:
models:
- name: orders
config:
meta:
pre_aggregates:
- name: orders_daily_by_status
dimensions:
- status
metrics:
- total_order_amount
- order_count
time_dimension: order_date
granularity: day
- name: orders_monthly_summary
dimensions:
- status
metrics:
- total_order_amount
time_dimension: order_date
granularity: month
max_rows: 1000000
When a query matches multiple pre-aggregates, Lightdash picks the smallest one.
Scheduling refreshes
By default, pre-aggregates are materialized when your dbt project compiles. You can also schedule automatic refreshes using cron expressions, using your project’s configured timezone (defaults to UTC):
pre_aggregates:
- name: orders_daily_by_status
dimensions:
- status
metrics:
- total_order_amount
time_dimension: order_date
granularity: day
refresh:
cron: "0 6 * * *" # Every day at 6:00 AM UTC
Materialization triggers
Pre-aggregates can be materialized through four different triggers:
| Trigger | When it happens |
|---|
| Compile | Automatically when your dbt project is compiled |
| Cron | On the schedule you define in refresh.cron |
| Manual | When you trigger a refresh from the Lightdash UI |
Row limits
You can set max_rows to cap the size of a materialization. If the aggregation produces more rows than the limit, the result is truncated.
When max_rows is applied, some data is excluded from the materialization. Queries that match the pre-aggregate may return incomplete results. Use this setting carefully and monitor for the “max rows applied” warning in the monitoring UI.
Materialization role
If your model uses user attributes in SQL, the user who triggers the materialization can affect what gets stored in the pre-aggregate.
For example, if the model has sql_filter: "customers.region IN (${ld.attr.allowed_regions})", the materialized table will only contain the regions that the admin who triggered the materialization happens to have access to. If that admin has allowed_regions: ["EMEA"], the pre-aggregate will only contain EMEA rows.
Use materialization_role to define a fixed set of user attributes for materializing the pre-aggregate. At query time, Lightdash ignores materialization_role and uses the real viewer user’s attributes instead, so a user with allowed_regions: ["EMEA"] still only sees EMEA rows.
dbt v1.9 and earlier
dbt v1.10+ and Fusion
models:
- name: orders
meta:
joins:
- join: customers
sql_on: ${customers.customer_id} = ${orders.customer_id}
sql_filter: "customers.region IN (${ld.attr.allowed_regions})"
pre_aggregates:
- name: orders_daily_by_region
dimensions:
- customers.region
metrics:
- total_order_amount
time_dimension: order_date
granularity: day
materialization_role:
email: materialize@acme.com
attributes:
allowed_regions:
- EMEA
- APAC
- NA
models:
- name: orders
config:
meta:
joins:
- join: customers
sql_on: ${customers.customer_id} = ${orders.customer_id}
sql_filter: "customers.region IN (${ld.attr.allowed_regions})"
pre_aggregates:
- name: orders_daily_by_region
dimensions:
- customers.region
metrics:
- total_order_amount
time_dimension: order_date
granularity: day
materialization_role:
email: materialize@acme.com
attributes:
allowed_regions:
- EMEA
- APAC
- NA
With this setup, the materialized table always contains EMEA, APAC, and NA rows regardless of who triggered the materialization. When a viewer queries the pre-aggregate, Lightdash applies their own user attributes to the query, not the materialization_role.
Complete example
Here’s a full model definition with a pre-aggregate, including joins, scheduling, and row limits:
dbt v1.9 and earlier
dbt v1.10+ and Fusion
Lightdash YAML
models:
- name: orders
meta:
joins:
- join: customers
sql_on: ${customers.customer_id} = ${orders.customer_id}
pre_aggregates:
- name: orders_daily_by_status
dimensions:
- status
- customers.country
metrics:
- total_order_amount
- average_order_size
filters:
- status: completed
time_dimension: order_date
granularity: day
max_rows: 5000000
refresh:
cron: "0 6 * * *"
columns:
- name: order_date
meta:
dimension:
type: date
- name: status
meta:
dimension:
type: string
- name: amount
meta:
metrics:
total_order_amount:
type: sum
average_order_size:
type: average
models:
- name: orders
config:
meta:
joins:
- join: customers
sql_on: ${customers.customer_id} = ${orders.customer_id}
pre_aggregates:
- name: orders_daily_by_status
dimensions:
- status
- customers.country
metrics:
- total_order_amount
- average_order_size
filters:
- status: completed
time_dimension: order_date
granularity: day
max_rows: 5000000
refresh:
cron: "0 6 * * *"
columns:
- name: order_date
config:
meta:
dimension:
type: date
- name: status
config:
meta:
dimension:
type: string
- name: amount
config:
meta:
metrics:
total_order_amount:
type: sum
average_order_size:
type: average
type: model
name: orders
joins:
- join: customers
sql_on: ${customers.customer_id} = ${orders.customer_id}
pre_aggregates:
- name: orders_daily_by_status
dimensions:
- status
- customers.country
metrics:
- total_order_amount
- average_order_size
filters:
- status: completed
time_dimension: order_date
granularity: day
max_rows: 5000000
refresh:
cron: "0 6 * * *"
dimensions:
- name: order_date
type: date
- name: status
type: string
metrics:
total_order_amount:
type: sum
sql: ${TABLE}.amount
average_order_size:
type: average
sql: ${TABLE}.amount
With this pre-aggregate, the following queries would be served from materialized data:
- Total order amount by status, grouped by day, week, month, or year
- Average order size by status, grouped by month
- Total order amount filtered to completed orders
- Order amount by customer country, grouped by quarter
These queries would not match and would query the warehouse directly:
- Queries including
count_distinct metrics
- Queries grouped by a dimension not in the pre-aggregate (for example,
customer_id)
- Queries with hourly granularity (finer than the pre-aggregate’s
day)
- Queries without
status = completed or with a broader status filter
- Queries with custom dimensions, custom metrics, or table calculations