Skip to main content

Cache common queries

The dbt Semantic Layer allows you to cache common queries in order to speed up performance and reduce compute on expensive queries.

There are two different types of caching:

While you can use caching to speed up your queries and reduce compute time, knowing the difference between the two depends on your use case: result caching happens automatically by leveraging your data platform's cache, while declarative caching allows you to 'declare' the queries you specifically want to cache. With declarative caching, you need to anticipate which queries you want to cache.

Prerequisites

Result caching

Result caching leverages your data platform’s built-in caching layer and features. MetricFlow generates the same SQL for multiple query requests, this means it can take advantage of your data platform’s cache. Double-check your data platform's specifications.

Here's how caching works, using Snowflake as an example, and should be similar across other data platforms:

  1. Run from cold cache When you run a semantic layer query from your BI tool that hasn't been executed in the past 24 hours, the query scans the entire dataset and doesn't use the cache.
  2. Run from warm cache If you rerun the same query after 1 hour, the SQL generated and executed on Snowflake remains the same. On Snowflake, the result cache is set per user for 24 hours, which allows the repeated query to use the cache and return results faster.

Different data platforms might have different caching layers and cache invalidation rules. Here's a list of resources on how caching works on some common data platforms:

Declarative caching

Declarative caching enables you to pre-warm the cache using saved queries by setting the cache config to true in your saved_queries settings. This is useful for optimizing performance for key dashboards or common ad-hoc query requests. For configuration details, refer to Declarative caching setup.

📹 Check out this video demo to see how declarative caching works!

This video demonstrates the concept of declarative caching, how to run it using the dbt Cloud scheduler, and how fast your dashboards load as a result.

How declarative caching works:

  • Make sure your saved queries YAML configuration file has exports defined.
  • Running a saved query triggers the dbt Semantic Layer to:
    • Build a cached table from a saved query, with exports defined, into your data platform.
    • Make sure any query requests that match the saved query's inputs use the cache, returning data more quickly.
    • Automatically invalidates the cache when it detects new and fresh data in any upstream models related to the metrics in your cached table.
    • Refreshes (or rebuilds) the cache the next time you run the saved query.

Refer to the following diagram, which illustrates what happens when the dbt Semantic Layer receives a query request:

Overview of the declarative cache query flowOverview of the declarative cache query flow

Declarative caching setup

To populate the cache, you need to configure an export in your saved query YAML file configuration and set the cache config to true. You can't cache a saved query without an export defined.

semantic_model.yml
saved_queries:
- name: my_saved_query
... # Rest of the saved queries configuration.
config:
cache:
enabled: true # Set to true to enable, defaults to false.
exports:
- name: order_data_key_metrics
config:
export_as: table

To enable saved queries at the project level, you can set the saved-queries configuration in the dbt_project.yml file. This saves you time in configuring saved queries in each file:

dbt_project.yml
saved-queries:
my_saved_query:
config:
+cache:
enabled: true

Run your declarative cache

After setting up declarative caching in your YAML configuration, you can now run exports with the dbt Cloud job scheduler to build a cached table from a saved query into your data platform.

  • Use exports to set up a job to run a saved query dbt Cloud.
  • The dbt Semantic Layer builds a cache table in your data platform in a dedicated dbt_sl_cache schema.
  • The cache schema and tables are created using your deployment credentials. You need to grant read access to this schema for your Semantic Layer user.
  • The cache refreshes (or rebuilds) on the same schedule as the saved query job.
Overview of the cache creation flow.Overview of the cache creation flow.

After a successful job run, you can go back to your dashboard to experience the speed and benefits of declarative caching.

Cache management

dbt Cloud uses the metadata from your dbt model runs to intelligently manage cache invalidation. When you start a dbt job, it keeps track of the last model runtime and checks the freshness of the metrics upstream of your cache.

If an upstream model has data in it that was created after the cache was created, dbt Cloud invalidates the cache. This means queries won't use outdated cases and will instead query directly from the source data. Stale, outdated cache tables are periodically dropped and dbt Cloud will write a new cache the next time your saved query runs.

You can manually invalidate the cache through the dbt Semantic Layer APIs using the InvalidateCacheResult field.

0