Skip to main content

Defining pre-aggregates

Pre-aggregates are defined under the pre_aggregates key in your model configuration. If you’re using Lightdash YAML instead of dbt model YAML, see the Lightdash YAML syntax guide for the surrounding model structure.
models:
  - name: orders
    meta:
      pre_aggregates:
        - name: orders_daily_by_status
          dimensions:
            - status
          metrics:
            - total_order_amount
            - average_order_size
          filters:
            - order_date: inThePast 52 weeks
          time_dimension: order_date
          granularity: day

Configuration reference

PropertyRequiredDescription
nameYesUnique identifier for the pre-aggregate. Must contain only letters, numbers, and underscores.
dimensionsYesList of dimension names to include. Must contain at least one dimension.
metricsYesList of metric names to include. Must contain at least one metric.
filtersNoStatic filters applied when materializing the pre-aggregate. Matching queries must include an equivalent or narrower filter to use this pre-aggregate.
time_dimensionNoA time-based dimension for date grouping. Must be paired with granularity.
granularityNoTime granularity for the time_dimension. Valid values: hour, day, week, month, quarter, year. Must be paired with time_dimension.
max_rowsNoMaximum number of rows to store in the materialization. If the aggregation exceeds this limit, the result is truncated. Must be a positive integer.
refreshNoSchedule configuration for automatic re-materialization. See Scheduling refreshes.
materialization_roleNoFixed user attributes to use when materializing the pre-aggregate. See Materialization role.
If you specify time_dimension, you must also specify granularity, and vice versa.

Filtered pre-aggregates

Use filters when you want a pre-aggregate to materialize only a subset of the source data. For example, this pre-aggregate only stores data for the last 52 weeks:
models:
  - name: orders
    meta:
      pre_aggregates:
        - name: recent_orders_daily
          dimensions:
            - status
          metrics:
            - total_order_amount
            - order_count
          filters:
            - order_date: inThePast 52 weeks
          time_dimension: order_date
          granularity: day
This is useful when a rolling time window is queried frequently and deserves its own smaller materialization.

How query matching works with filters

Filtered pre-aggregates are only used when the query filters are compatible with the pre-aggregate definition:
  • A query with the same or narrower filter can use the pre-aggregate
  • A query without the filter, or with a broader or incompatible filter, falls back to another pre-aggregate or the warehouse
For the example above:
  • order_date inThePast 12 weeks can use the pre-aggregate
  • order_date inThePast 52 weeks can use the pre-aggregate
  • order_date inThePast 104 weeks cannot use the pre-aggregate
  • no order_date filter: cannot use the pre-aggregate
If a field is only used for filtering, you should still include it in the pre-aggregate’s dimensions list so Lightdash can match and re-aggregate queries correctly.

Multiple pre-aggregates per model

You can define multiple pre-aggregates on the same model, each targeting different query patterns. It is better to have multiple small, focused pre-aggregates rather than a single one containing all metrics and dimensions. Including too many dimensions increases the number of unique combinations, which generates large materialization files — this defeats the purpose of pre-aggregates, since they are meant to be smaller and faster than querying the warehouse directly. For example, you might want a fine-grained daily pre-aggregate for detailed dashboards and a coarser monthly one for summary views:
models:
  - name: orders
    config:
      meta:
        pre_aggregates:
          - name: orders_daily_by_status
            dimensions:
              - status
            metrics:
              - total_order_amount
              - order_count
            time_dimension: order_date
            granularity: day
          - name: orders_monthly_summary
            dimensions:
              - status
            metrics:
              - total_order_amount
            time_dimension: order_date
            granularity: month
            max_rows: 1000000
When a query matches multiple pre-aggregates, Lightdash picks the smallest one.

Scheduling refreshes

By default, pre-aggregates are materialized when your dbt project compiles. You can also schedule automatic refreshes using cron expressions, using your project’s configured timezone (defaults to UTC):
pre_aggregates:
  - name: orders_daily_by_status
    dimensions:
      - status
    metrics:
      - total_order_amount
    time_dimension: order_date
    granularity: day
    refresh:
      cron: "0 6 * * *"  # Every day at 6:00 AM UTC

Materialization triggers

Pre-aggregates can be materialized through four different triggers:
TriggerWhen it happens
CompileAutomatically when your dbt project is compiled
CronOn the schedule you define in refresh.cron
ManualWhen you trigger a refresh from the Lightdash UI

Row limits

You can set max_rows to cap the size of a materialization. If the aggregation produces more rows than the limit, the result is truncated.
When max_rows is applied, some data is excluded from the materialization. Queries that match the pre-aggregate may return incomplete results. Use this setting carefully and monitor for the “max rows applied” warning in the monitoring UI.

Materialization role

If your model uses user attributes in SQL, the user who triggers the materialization can affect what gets stored in the pre-aggregate. For example, if the model has sql_filter: "customers.region IN (${ld.attr.allowed_regions})", the materialized table will only contain the regions that the admin who triggered the materialization happens to have access to. If that admin has allowed_regions: ["EMEA"], the pre-aggregate will only contain EMEA rows. Use materialization_role to define a fixed set of user attributes for materializing the pre-aggregate. At query time, Lightdash ignores materialization_role and uses the real viewer user’s attributes instead, so a user with allowed_regions: ["EMEA"] still only sees EMEA rows.
models:
  - name: orders
    meta:
      joins:
        - join: customers
          sql_on: ${customers.customer_id} = ${orders.customer_id}
      sql_filter: "customers.region IN (${ld.attr.allowed_regions})"
      pre_aggregates:
        - name: orders_daily_by_region
          dimensions:
            - customers.region
          metrics:
            - total_order_amount
          time_dimension: order_date
          granularity: day
          materialization_role:
            email: materialize@acme.com
            attributes:
              allowed_regions:
                - EMEA
                - APAC
                - NA
With this setup, the materialized table always contains EMEA, APAC, and NA rows regardless of who triggered the materialization. When a viewer queries the pre-aggregate, Lightdash applies their own user attributes to the query, not the materialization_role.

Complete example

Here’s a full model definition with a pre-aggregate, including joins, scheduling, and row limits:
models:
  - name: orders
    meta:
      joins:
        - join: customers
          sql_on: ${customers.customer_id} = ${orders.customer_id}
      pre_aggregates:
        - name: orders_daily_by_status
          dimensions:
            - status
            - customers.country
          metrics:
            - total_order_amount
            - average_order_size
          filters:
            - status: completed
          time_dimension: order_date
          granularity: day
          max_rows: 5000000
          refresh:
            cron: "0 6 * * *"
    columns:
      - name: order_date
        meta:
          dimension:
            type: date
      - name: status
        meta:
          dimension:
            type: string
      - name: amount
        meta:
          metrics:
            total_order_amount:
              type: sum
            average_order_size:
              type: average
With this pre-aggregate, the following queries would be served from materialized data:
  • Total order amount by status, grouped by day, week, month, or year
  • Average order size by status, grouped by month
  • Total order amount filtered to completed orders
  • Order amount by customer country, grouped by quarter
These queries would not match and would query the warehouse directly:
  • Queries including count_distinct metrics
  • Queries grouped by a dimension not in the pre-aggregate (for example, customer_id)
  • Queries with hourly granularity (finer than the pre-aggregate’s day)
  • Queries without status = completed or with a broader status filter
  • Queries with custom dimensions, custom metrics, or table calculations