Caching expensive WP API requests

A while back I needed to set up a custom WordPress REST API endpoint to provide a headless app with data that required some expensive database queries. The use case for the application was such that it would likely be used and reused many times by the end user, requiring them to hit the endpoint each time.

So, how can we take a situation like this and make it as efficient and performant as possible?

One thing to consider is a multi-layered caching approach.

The first layer should be some kind of in-memory storage like Redis.

Lots of managed WordPress hosting services are set up to allow object caching. If your server supports it, a Redis server can be set up quite easily using a plugin like this. Usually these object cache plugins will automatically store expensive database queries and WordPress Transients in memory, but we're going to use the functions wp_cache_get and wp_cache_set to interact with WP_Object_Cache. This gives us more explicit control over our cached data, and the data will persist between requests if you use a plugin that supports persistent caching.
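
To illustrate the basic pattern, here's a minimal sketch of wp_cache_get and wp_cache_set — the key, group, and run_my_expensive_query() helper are placeholders for this example, not part of the final code:

// Try the object cache first; fall back to the expensive work on a miss.
$data = wp_cache_get( 'my_expensive_data', 'my_group' );

if ( false === $data ) {
    // Cache miss: do the expensive work once...
    $data = run_my_expensive_query(); // Hypothetical helper.

    // ...then store the result for subsequent requests (persists if the cache backend is persistent).
    wp_cache_set( 'my_expensive_data', $data, 'my_group' );
}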

The second layer is in the user's browser, storing data and determining if it has gone stale and needs to be refreshed from the server.

The server cache

So, first step: we'll need to get our data into the cache in the first place.

Let's say we are getting some data from 3 different post types - recipes, cook_methods, and ingredients.

First we'll need a function to get the modified times of posts from those post types and sort them to get the time of the most recent modification:

/**
 * Returns the latest modified Unix timestamp of all posts in the provided post types.
 *
 * @param array|string $post_types Array (multiple) or string (singular) of post types to get timestamps from.
 *
 * @return string|int|false The latest modified timestamp, or false if no posts were found.
 */
function get_last_modified_time( array|string $post_types ): string|int|false {
    // Query arguments, get all the posts of the relevant post types.
    // https://10up.github.io/Engineering-Best-Practices/php/
    $args = [
        'post_type'              => $post_types,
        'posts_per_page'         => 5000, // phpcs:ignore
        'no_found_rows'          => true,
        'update_post_term_cache' => false,
        'update_post_meta_cache' => false,
    ];

    // Array to store all the modified times.
    $modified_times = [];

    // Start query.
    $last_mod_query = new WP_Query( $args );
    if ( $last_mod_query->have_posts() ) {
        while ( $last_mod_query->have_posts() ) {
            $last_mod_query->the_post();

            // 'U' gives us the modified time as a Unix timestamp.
            $modified_times[] = get_post_modified_time( 'U' );
        }
    }

    // Restore the global post data after our custom loop.
    wp_reset_postdata();

    // Sort modified times in DESC order.
    rsort( $modified_times );

    // Return the first one (last modified time), or false if there were no posts.
    return $modified_times[0] ?? false;
}

Now we're going to need to create WP REST API responses with the data that our endpoint provides. It's worth abstracting this into its own function, since we're going to use it to return either cached or uncached data, and setting the headers each time gets a bit repetitious.

/**
 * Returns an instance of WP_REST_Response constructed with the provided output and custom ETag and Last-Modified headers.
 *
 * @param array      $data               Array of data to be output as the REST response data and hashed for use as the ETag header.
 * @param string|int $last_modified_time Timestamp to be converted and used for the Last-Modified header.
 *
 * @return WP_REST_Response Data endpoint response.
 */
function create_REST_response( array $data, string|int $last_modified_time ): WP_REST_Response {
    // Format timestamp for the Last-Modified header.
    $last_modified_formatted = gmdate( 'D, d M Y H:i:s', $last_modified_time ) . ' GMT';
    // Hash the data for the ETag header.
    $etag_value = md5( wp_json_encode( $data ) );
    // Wrap in double quotes for the ETag header format.
    $etag = '"' . $etag_value . '"';

    // Create the response object.
    $response = new WP_REST_Response( $data );
    // Set the Last-Modified header.
    $response->header( 'Last-Modified', $last_modified_formatted );
    // Set the ETag header.
    $response->header( 'ETag', $etag );

    return $response;
}

We'll also need a main function for the endpoint itself. This function should do the following:

  • Get the last modified times for our post types we want to query.
  • Check that against the timestamp from our Redis cache.
  • If it's the same - return the cached data and don't bother with the expensive query.
  • If a post has been modified since the cache was built, run the expensive query again and replace the cached data with the new results.

/**
 * Returns an instance of WP_REST_Response with either fresh or cached content.
 *
 * @return WP_REST_Response Endpoint response.
 */
function custom_endpoint(): WP_REST_Response {
    $cache_last_modified_key = 'my_cache_last_modified_key';
    $cache_key               = 'my_cache_key';

    /**
     * Get the real last modified time of posts in our post types, and compare that with the cached one.
     * If they match (no new uncached data), just send the cached data.
     */

    $last_modified_time = intval( // Ensure integer.
        get_last_modified_time( [ 'recipes', 'cook_methods', 'ingredients' ] )
    );

    $cached_last_modified_time = intval( wp_cache_get( $cache_last_modified_key ) );

    if ( $cached_last_modified_time === $last_modified_time ) {
        // Get the cached data.
        $data = wp_cache_get( $cache_key );

        // Only use it if the cache actually had something for us.
        if ( false !== $data ) {
            // Return the response.
            return create_REST_response( $data, $last_modified_time );
        }
    }

    /**
     * Otherwise (there are new changes since the cache was built), do the expensive query, rebuild the cache, send the response.
     */

    // Do your expensive query or operation.
    $expensive_query_result = ...

    // Cache the result.
    wp_cache_set( $cache_key, $expensive_query_result );
    wp_cache_set( $cache_last_modified_key, $last_modified_time );

    // Return the REST response.
    return create_REST_response( $expensive_query_result, $last_modified_time );
}
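
For completeness, that callback also needs to be registered as a REST API route. Here's a minimal sketch of how the registration might look — the namespace and route name are just placeholders for this example:

// Register the custom endpoint when the REST API initialises.
add_action( 'rest_api_init', function () {
    register_rest_route( 'my-app/v1', '/data', [
        'methods'             => 'GET',
        'callback'            => 'custom_endpoint',
        'permission_callback' => '__return_true', // Public endpoint; lock this down if needed.
    ] );
} );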

Now we have the functionality for:

  • Getting our data via our expensive query.
  • Storing it in a fast cache and rebuilding that cache only when necessary due to content changes.
  • Setting the appropriate headers on the REST API response to allow our client side application to determine whether there's new data.

Let's move on to the frontend.

The client cache

Remember, the idea for this app is that it's something people are likely to reuse. With that in mind, here's the goal for the client-side application.

The first time the user uses the app, it will need to hit WP for the data.

The data and last modified time will then be stored in the user's browser. localStorage will suffice here.

The next time the user opens the app, the app will make a special kind of request to the WordPress endpoint called a HEAD request. A HEAD request is a way of fetching only the headers from the server and not the full response body, so it's much faster and only a tiny amount of data is transferred.

Since we have the last modified time returned by our endpoint in the HTTP headers, we can use that to determine whether the content in the browser is stale and therefore needs us to run the full request.

// Checks HTTP headers from the endpoint to determine if there's content on the server we don't have.
async function isDataStale() {
  const localLastModified = localStorage.getItem("apiDataLastModified")
  const localData = localStorage.getItem("apiData")

  // localStorage items haven't been set up yet, consider stale.
  if (!localLastModified || !localData) {
    return true
  }

  // Check headers for the last modified date.
  const responseHead = await fetch("https://example.com/my-endpoint", {
    method: "HEAD",
  })
  const remoteLastModified = new Date(
    responseHead.headers.get("Last-Modified")
  ).getTime()

  // Last modified dates don't match, needs updating.
  if (Number(localLastModified) !== remoteLastModified) {
    return true
  }

  // Otherwise, we're ok to use the stored data.
  return false
}

This function is all we need to determine if the data we have stored in the browser is out of date.

If the value returned by isDataStale is false, then there's no point running the full request to the server because we know for a fact that we already have the most recent data in localStorage.
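
To show how this might fit together, here's a rough sketch of a load function that only runs the full request when needed — the endpoint URL and localStorage keys match the ones assumed above, and the exact shape of the stored data is up to you:

async function loadData() {
  // If our stored copy is still current, use it and skip the network round trip.
  if (!(await isDataStale())) {
    return JSON.parse(localStorage.getItem("apiData"))
  }

  // Otherwise fetch fresh data from the endpoint.
  const response = await fetch("https://example.com/my-endpoint")
  const data = await response.json()

  // Store the data and its last modified time for next time.
  const lastModified = new Date(
    response.headers.get("Last-Modified")
  ).getTime()
  localStorage.setItem("apiData", JSON.stringify(data))
  localStorage.setItem("apiDataLastModified", String(lastModified))

  return data
}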

Summary

So to summarise, the main concepts here are:

  • Manually caching expensive query results on the server, using a persistent object cache backed by something like Redis, Memcached or SQLite.
  • Using an inexpensive query to validate whether the cache needs to be refreshed.
  • Using HTTP headers to provide the client with the information it needs to implement its own caching.
  • Using an in-browser storage solution like localStorage to store data locally on the user's device.
  • Using a HEAD request to determine whether the local data is stale and needs refreshing from the server.

With a setup like this we can be sure that most of our users will either be getting the data from our server's in-memory cache (fast), or from their browser's localStorage (faster). Only occasionally will a single user be the unlucky one who triggers the full query and causes the cache to be rebuilt. With some creativity and extra work this could also be avoided by serving the stale content to that user and triggering the cache rebuild in the background if it suits your use case.
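
As a rough illustration of that idea, a sketch using WP-Cron might look something like this — rebuild_expensive_cache() is a hypothetical helper that runs the expensive query and calls wp_cache_set:

// Register the rebuild handler (e.g. in your plugin's main file).
add_action( 'my_app_rebuild_cache', 'rebuild_expensive_cache' ); // Hypothetical helper.

// In the endpoint: serve the stale cached data immediately, but queue a background rebuild.
if ( ! wp_next_scheduled( 'my_app_rebuild_cache' ) ) {
    wp_schedule_single_event( time(), 'my_app_rebuild_cache' );
}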

There are many other caching strategies that differ from what I've outlined here, for example you could use stale-while-revalidate on the client side, or Transients if you want your cached data to be invalidated at regular intervals. It's definitely worth exploring different caching strategies and knowing how to create your own caching setup when needed. Ultimately the right approach depends on your specific situation.