Speed up dashboards and reduce warehouse costs by serving queries from pre-computed, materialized summaries.
Availability: Pre-aggregates are an Early Access feature available on Enterprise plans only.
Pre-aggregates let you define materialized summaries of your data directly in your dbt YAML. When a user runs a query in Lightdash, the system checks whether the query can be answered from a pre-aggregate instead of querying your warehouse. If it matches, the query is served from the pre-computed results, making it significantly faster and reducing warehouse load.

This is especially useful for dashboards with high traffic or for expensive aggregations that don't need real-time data.

Any query that goes through the Lightdash semantic layer can hit a pre-aggregate: this includes the Lightdash app, the API, MCP, AI agents, the Embed SDK, and the React SDK.
Getting started
Define pre-aggregates in your dbt project and configure scheduling.
Monitoring and debugging
Track materialization status, debug query matching, and view hit/miss stats.
Define — You add a pre_aggregates block to your dbt model YAML, specifying which dimensions and metrics to include.
Materialize — Lightdash runs the aggregation query against your warehouse and stores the results. This happens automatically on compile, on a cron schedule you define, or when you trigger it manually.
Match — When a user runs a query, Lightdash checks if every requested dimension, metric, and filter is covered by a pre-aggregate.
Serve — If a match is found, the query is served from the materialized data instead of hitting your warehouse.
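For orientation, here's roughly what the first step might look like for the orders example used in this guide. This is an illustrative sketch only: the exact keys (the placement under `meta`, `pre_aggregates`, `dimensions`, `metrics`, `cron`) are assumptions, so check the Lightdash reference docs for the real schema.

```yaml
models:
  - name: orders
    meta:
      pre_aggregates:
        # Hypothetical field names; verify against the Lightdash docs.
        orders_daily:
          dimensions:
            - status
            - order_date        # day granularity on the time dimension
          metrics:
            - total_amount      # sum: safe to re-aggregate
            - order_count       # count: safe to re-aggregate
          cron: "0 3 * * *"     # materialize nightly (assumed scheduling key)
```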
Suppose you have an orders table with thousands of rows, and you define a pre-aggregate with the dimension status, the metrics total_amount (sum) and order_count (count), and day granularity on order_date.

Your warehouse data:
| order_date | status | customer | amount |
| --- | --- | --- | --- |
| 2024-01-15 | shipped | Alice | $100 |
| 2024-01-15 | shipped | Bob | $50 |
| 2024-01-15 | pending | Charlie | $75 |
| 2024-01-16 | shipped | Alice | $200 |
| 2024-01-16 | pending | Charlie | $30 |
| … | … | … | … |
Lightdash materializes this into a pre-aggregate:
| order_date_day | status | total_amount | order_count |
| --- | --- | --- | --- |
| 2024-01-15 | shipped | $150 | 2 |
| 2024-01-15 | pending | $75 | 1 |
| 2024-01-16 | shipped | $200 | 1 |
| 2024-01-16 | pending | $30 | 1 |
Now when a user queries “total amount by status, grouped by month”, Lightdash re-aggregates from the daily pre-aggregate instead of scanning the full table:
| order_date_month | status | total_amount |
| --- | --- | --- |
| January 2024 | shipped | $350 |
| January 2024 | pending | $105 |
This works because sum can be re-aggregated — summing daily sums gives the correct monthly sum.
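The re-aggregation property is easy to see in a small sketch (plain Python, not Lightdash code): rolling the daily pre-aggregate rows up to a month reproduces exactly what a full-table scan would compute.

```python
from collections import defaultdict

# Daily pre-aggregate rows: (order_date_day, status, total_amount, order_count)
daily = [
    ("2024-01-15", "shipped", 150, 2),
    ("2024-01-15", "pending", 75, 1),
    ("2024-01-16", "shipped", 200, 1),
    ("2024-01-16", "pending", 30, 1),
]

# Re-aggregate to month: summing daily sums (and daily counts) is exact.
monthly = defaultdict(lambda: [0, 0])
for day, status, amount, count in daily:
    month = day[:7]  # "2024-01-15" -> "2024-01"
    monthly[(month, status)][0] += amount
    monthly[(month, status)][1] += count

print(monthly[("2024-01", "shipped")])  # [350, 3]
```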
When a user runs a query, Lightdash automatically checks if a pre-aggregate can serve the results. A pre-aggregate matches when all of the following are true:
Every dimension in the query is included in the pre-aggregate
Every metric in the query is included in the pre-aggregate
Every dimension used in filters is included in the pre-aggregate
If the pre-aggregate itself defines filters, the query must include an equivalent or narrower filter
The query does not contain custom dimensions, custom metrics, or table calculations
If the query uses a time dimension, the requested granularity must be equal to or coarser than the pre-aggregate’s granularity
A day pre-aggregate serves day, week, month, quarter, and year queries. A month pre-aggregate serves month, quarter, and year, but not day or week, since those need finer-grained data.
When multiple pre-aggregates match a query, Lightdash picks the smallest one (fewest dimensions, then fewest metrics as tiebreaker).
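The matching rules above can be sketched as a single boolean check. This is illustrative Python, not the actual Lightdash implementation, and the field names are assumptions:

```python
# A pre-aggregate can serve any granularity equal to or coarser than its own.
GRAIN_ORDER = {"day": 0, "week": 1, "month": 2, "quarter": 3, "year": 4}

def matches(query, pre_agg):
    """Illustrative sketch of the matching rules; not Lightdash source."""
    return (
        set(query["dimensions"]) <= set(pre_agg["dimensions"])
        and set(query["metrics"]) <= set(pre_agg["metrics"])
        and set(query["filter_dimensions"]) <= set(pre_agg["dimensions"])
        and GRAIN_ORDER[query["granularity"]] >= GRAIN_ORDER[pre_agg["granularity"]]
    )

query = {
    "dimensions": ["status"],
    "metrics": ["total_amount"],
    "filter_dimensions": ["status"],
    "granularity": "month",
}
daily_pre_agg = {
    "dimensions": ["status", "order_date"],
    "metrics": ["total_amount", "order_count"],
    "granularity": "day",
}
print(matches(query, daily_pre_agg))  # True: month is coarser than day
```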
Pre-aggregates can define static filters in their YAML definition. This lets you materialize a smaller slice of data for a common query pattern, such as status = completed or order_date: inThePast 52 weeks.

When a pre-aggregate has definition filters:
Matching queries must include the same filter or a narrower one
Queries without the filter, or with a broader or incompatible filter, fall back to another pre-aggregate or the warehouse
This prevents Lightdash from serving incomplete results from a filtered materialization.
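The "equivalent or narrower" rule can be illustrated for simple value filters (plain Python sketch with an assumed filter shape; the real matcher handles more operator types):

```python
def filter_is_narrower_or_equal(query_filter, pre_agg_filter):
    """Sketch for simple 'column in values' filters only; not Lightdash source."""
    if pre_agg_filter is None:
        return True  # an unfiltered pre-aggregate covers every query
    # The query must constrain the same column to a subset of the
    # pre-aggregate's values; otherwise the materialization may be
    # missing rows the query needs.
    if query_filter is None or query_filter["column"] != pre_agg_filter["column"]:
        return False
    return set(query_filter["values"]) <= set(pre_agg_filter["values"])

pre_agg = {"column": "status", "values": ["completed"]}
same = {"column": "status", "values": ["completed"]}
print(filter_is_narrower_or_equal(same, pre_agg))  # True: equivalent filter
print(filter_is_narrower_or_equal(None, pre_agg))  # False: broader query
```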
Pre-aggregates are not compatible with personal warehouse connections. Materialization always runs under a single user’s credentials, so warehouse-level access rules are not applied per viewer. If you rely on personal warehouse connections to enforce data access, use results caching instead.
Not all metrics work this way. Consider count_distinct with the same daily pre-aggregate from above. If a daily pre-aggregate stores “2 distinct customers on 2024-01-15” and “1 distinct customer on 2024-01-16”, you can’t sum those to get the monthly distinct count — Alice ordered on both days and would be counted twice:
| order_date_day | status | distinct_customers |
| --- | --- | --- |
| 2024-01-15 | shipped | 2 (Alice, Bob) |
| 2024-01-16 | shipped | 1 (Alice) |
Re-aggregating gives 2 + 1 = 3, but the correct monthly answer is 2 (Alice and Bob). The pre-aggregate lost track of which customers were counted.

We're investigating support for count_distinct through approximation algorithms. Follow this issue for updates.

For similar reasons, metric types that can't be recomputed from partial results, such as median and percentile, are also not supported.
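The failure mode takes only a few lines to reproduce (plain Python): once the pre-aggregate has collapsed each day to a count, the overlap between days is gone.

```python
# Raw orders: (day, customer)
orders = [
    ("2024-01-15", "Alice"),
    ("2024-01-15", "Bob"),
    ("2024-01-16", "Alice"),
]

# The daily pre-aggregate stores only the distinct *count* per day.
daily_distinct = {
    day: len({c for d, c in orders if d == day})
    for day in {d for d, _ in orders}
}

naive_monthly = sum(daily_distinct.values())  # 2 + 1 = 3 (wrong: Alice counted twice)
true_monthly = len({c for _, c in orders})    # 2 (Alice, Bob)
print(naive_monthly, true_monthly)  # 3 2
```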
Lightdash has two independent systems for speeding up queries: results caching and pre-aggregates. They work differently and are designed to be used together, not as replacements for each other.
Results caching stores the exact result of any query that runs through Lightdash, keyed by a hash of the generated SQL. The first time a query runs, Lightdash executes it against your warehouse and caches the result in S3. Subsequent identical queries are served from the cache until it expires (24 hours by default).

Any change to the query — a different filter, column, limit, or user attribute — produces a new SQL hash, a new cache entry, and another warehouse query. Results caching covers every query shape, including custom metrics, table calculations, and SQL runner queries.

See the caching guide for details.
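The keying behaviour can be sketched as follows (illustrative Python; Lightdash's actual hashing details aren't specified here). Any textual change to the generated SQL yields a different key, and therefore a separate cache entry:

```python
import hashlib

def cache_key(sql: str) -> str:
    # Sketch: key the cache on a hash of the generated SQL text.
    return hashlib.sha256(sql.encode()).hexdigest()

a = cache_key("SELECT status, SUM(amount) FROM orders GROUP BY status")
b = cache_key("SELECT status, SUM(amount) FROM orders "
              "WHERE status = 'shipped' GROUP BY status")
print(a == b)  # False: adding a filter changes the SQL, so a new entry is created
```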
Pre-aggregates are summary tables you define in your dbt YAML. Lightdash materializes them on a schedule (or on compile, or manually) and stores the results in S3. When a user query matches the pre-aggregate's dimensions, metrics, filters, and granularity, Lightdash serves the query from the materialized data using in-memory DuckDB workers. The warehouse is not touched at query time, even on the first query.

A single pre-aggregate can serve many different queries. A daily pre-aggregate with five dimensions can answer day, week, month, quarter, and year queries across any subset of those dimensions and with any narrower filter. Results caching, in contrast, needs one cache entry per unique SQL.

Use pre-aggregates when:
You have high-traffic dashboards with predictable query patterns
You want to reduce warehouse cost or improve latency on the first query, not just repeat visits
The metrics are re-aggregatable (sum, count, min, max, average)
You’re willing to design and schedule the materializations
Use results caching when:
Query patterns are ad-hoc or unpredictable
You need count_distinct, median, percentile, custom SQL metrics, table calculations, or custom dimensions/metrics
You’re using the SQL runner
You don’t want upfront configuration work
In most cases, both should be enabled. Pre-aggregates handle your heaviest, most predictable workloads. Results caching is the safety net for everything else.
When both systems are enabled, they act as two layers of caching. A query that matches a pre-aggregate is served from the materialized data by DuckDB workers. The result of that DuckDB query can then be stored in the results cache, so subsequent identical requests skip even the DuckDB step and return the cached result directly. This means pre-aggregates eliminate the warehouse hit, and results caching eliminates repeated computation on top of that.
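The two layers can be pictured as a lookup chain (illustrative Python with stubbed helpers; not Lightdash source):

```python
# Illustrative stubs; the real implementations live inside Lightdash.
def find_matching_pre_aggregate(query, pre_aggs):
    return next((p for p in pre_aggs
                 if set(query["metrics"]) <= set(p["metrics"])), None)

def query_duckdb(pre_agg, query):
    return f"served from pre-aggregate {pre_agg['name']}"

def query_warehouse(query):
    return "served from warehouse"

def run_query(query, results_cache, pre_aggs):
    """Sketch of the layered path: results cache -> pre-aggregate -> warehouse."""
    key = repr(sorted(query.items()))          # stand-in for the SQL hash
    if key in results_cache:                   # layer 1: exact-result cache
        return results_cache[key]
    pre_agg = find_matching_pre_aggregate(query, pre_aggs)
    result = (query_duckdb(pre_agg, query)     # layer 2: DuckDB over materialized data
              if pre_agg else query_warehouse(query))
    results_cache[key] = result                # identical requests later skip both layers
    return result

cache = {}
pre_aggs = [{"name": "orders_daily", "metrics": ["total_amount", "order_count"]}]
q = {"metrics": ("total_amount",)}
print(run_query(q, cache, pre_aggs))  # served from pre-aggregate orders_daily
```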