Varnish 4.0: Allow Clients to Specify Cache TTL
Varnish is an in-memory cache tailored for use with HTTP servers. Perhaps appropriately for such a narrow purpose, the Varnish Configuration Language (VCL) used to specify server behavior is opinionated and restrictive in its structure. You can always drop down into inline C code inside VCL documents when you want to do something complicated, but that is rarely a good plan unless you know exactly what you are doing: it requires a solid understanding of Varnish internals, and C code will always be more fragile than plain VCL. So if you must create something out of the ordinary, it is worth spending a little time to see whether you can twist the VCL into accomplishing your goal.
For me, a recent example of something out of the ordinary for Varnish was the need to allow client requests to specify an acceptable time to live (TTL) for cached content on a request-by-request basis. This makes no sense in the standard use case for Varnish, where it sits in front of a web server and the owner controls TTL settings in order to tune load and cache characteristics. But there are any number of other scenarios in which Varnish can be a useful piece of equipment, provided that the client can override the default server TTL. So how do we go about doing this?
Considering the Constraints
At the time of writing, Varnish 4.0 has only the standard modules available: the rich ecosystem of additions and convenient extra features available for Varnish 3.* has yet to catch up. Here are a few important things to understand about vanilla VCL:
- Writing a VCL configuration file is a matter of defining standard functions that will be invoked at various points in the process of serving a cached request.
- A set of built-in objects with typed attributes is available in the scope of these functions, but not all of them are available to any given function.
- There are no variables and no ability to create other objects.
- There is no ability to create new attributes on existing objects anywhere other than as request headers, and those headers must be strings.
Thus trying to build non-standard behaviors in Varnish is something of a puzzle game: the developer versus the constraints of the VCL, especially when it comes to passing typed information from one function definition to another.
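To make the string-header constraint concrete, here is a hypothetical Python sketch (not VCL) of the round trip that this forces: a typed duration value is serialized into a string header in one function and parsed back out, with a fallback, in another. The parser below is a simplified stand-in for the behavior of std.duration(header, fallback); the exact unit grammar accepted by the real VMOD is broader than shown here.

```python
# Hypothetical sketch of the type round trip forced by VCL's string-only
# headers. A duration becomes a string in one function and must be parsed
# back (with a fallback) in another, loosely mirroring std.duration.

def parse_duration(value, fallback_seconds):
    """Parse a VCL-style duration string such as '300s' or '2m'.

    Returns fallback_seconds when the value is missing or malformed,
    just as std.duration(header, fallback) falls back to its default.
    """
    units = {"s": 1, "m": 60, "h": 3600, "d": 86400}
    if not value:
        return fallback_seconds
    value = value.strip()
    if value and value[-1] in units:
        try:
            return float(value[:-1]) * units[value[-1]]
        except ValueError:
            return fallback_seconds
    return fallback_seconds

# A well-formed header wins; a missing or garbled one falls back.
print(parse_duration("300s", 300.0))     # 300.0
print(parse_duration("2m", 300.0))       # 120.0
print(parse_duration(None, 300.0))       # 300.0
print(parse_duration("garbage", 300.0))  # 300.0
```

The fallback is what lets the VCL below treat the header as optional: clients that send nothing simply get the server default.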
Example VCL
One approach is to accept a TTL as a client request header and then compare it against the age of cached responses. E.g. via the command line:

curl -H "X-Cache-TTL: 300s" http://localhost
In order to achieve this end, the example below takes over the obj.keep property. It no longer serves its intended purpose in a standard Varnish setup, but instead acts as a reference value that allows us to calculate the length of time that a cached object has been present in memory. Time spent in cache is not normally directly available in vanilla Varnish, but it is needed here in order to compare against the client's provided TTL.
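The arithmetic behind the obj.keep trick is easy to lose in the VCL below, so here is a hypothetical Python sketch of it (not VCL): both keep and ttl start at the fetching client's requested TTL; ttl then counts down as the object ages while keep stays fixed, so (keep - ttl) recovers the age.

```python
# Hypothetical model of the obj.keep trick, with durations as plain seconds.

def cache_age(keep, ttl_remaining):
    # keep was set to the original requested TTL at fetch time and never
    # changes; ttl_remaining counts down, so the difference is the age.
    return keep - ttl_remaining

def is_fresh_enough(keep, ttl_remaining, requested_ttl):
    # The vcl_hit test: deliver only if the object's age is no greater
    # than the TTL this particular client will accept.
    return cache_age(keep, ttl_remaining) <= requested_ttl

# An object stored with a 300s TTL, 120 seconds ago:
keep, ttl_remaining = 300.0, 180.0
print(cache_age(keep, ttl_remaining))             # 120.0
print(is_fresh_enough(keep, ttl_remaining, 60))   # False: client wants <= 60s old
print(is_fresh_enough(keep, ttl_remaining, 300))  # True
```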
#
# Example Varnish VCL file.
#
# This is not a complete file, but rather only illustrates how to allow clients to
# specify an acceptable TTL on a request by request basis. If any client triggers
# a refetch by providing a short enough TTL, then the cache is of course updated
# for all clients.
#

# Tell the VCL compiler that this is the new 4.0 format.
vcl 4.0;

# We need access to functions in the standard VMOD library, particularly those
# used to convert between value types.
import std;

# ---------------------------------------------------------------------------
# Backends.
# ---------------------------------------------------------------------------

# A simple example backend definition: connect to another server on this same
# machine.
backend server {
  .host = "127.0.0.1";
  .port = "8080";
  .max_connections = 300;
  .first_byte_timeout = 10s;
  .connect_timeout = 5s;
  .between_bytes_timeout = 2s;
  .probe = {
    .url = "/status";
    .interval = 5s;
    .timeout = 5s;
    .window = 5;
    .threshold = 3;
  }
}

# ---------------------------------------------------------------------------
# vcl_recv
# ---------------------------------------------------------------------------

# Invoked before the server checks to see whether a response is cached
# already.
sub vcl_recv {
  #
  # A lot of boilerplate is inserted into vcl_recv in most Varnish configuration
  # files: acceptable request methods to cache, sorting out cookies, changing
  # behavior by path, and so on. For the sake of clarity we'll skip that here.
  #
  # See this for some examples:
  #
  # https://github.com/mattiasgeniar/varnish-4.0-configuration-templates
  #

  # Insert the standard Varnish boilerplate here.

  # Create a status page to use for health checks made against this server.
  # The page will look a little strange, but that doesn't matter so long as it
  # provides a 200 response.
  if (req.url ~ "^/varnish-status") {
    return (synth(200, "OK."));
  }

  # In this example there is only one backend.
  set req.backend_hint = server;

  # Sort out the TTL for caching based on the request headers.
  #
  # Note that req.http.* is copied into bereq.http.*, and we make use of this
  # in carrying the TTL information through to the caching process.
  #
  # But first we need to convert the string type header into a duration type.
  set req.ttl = std.duration(req.http.X-Cache-TTL, 300s);

  # Note that there is an implicit cast going on here that converts the
  # duration into a string representation of a float.
  set req.http.X-Cache-TTL-Requested = req.ttl;

  # If we've got this far, then treat this as a cacheable request.
  return (hash);
}

# ---------------------------------------------------------------------------
# vcl_hash
# ---------------------------------------------------------------------------

# Invoked after vcl_recv to create a hash value for the request. This is used
# as a key to look up the object in Varnish.
sub vcl_hash {
  # Most uses of Varnish will add more than just this to the hash.
  hash_data(req.url);
}

# ---------------------------------------------------------------------------
# vcl_hit
# ---------------------------------------------------------------------------

# Invoked when a cache lookup is successful.
sub vcl_hit {
  # Set up some debugging headers that will be passed back out to the client
  # in the HTTP response.
  set req.http.X-Cache-Keep = obj.keep;
  set req.http.X-Cache-TTL-Remaining = obj.ttl;
  set req.http.X-Cache-Age = obj.keep - obj.ttl;

  # Different requests ask for the same object but with different TTLs. We are
  # repurposing obj.keep to store the original requested TTL. Thus we have
  # (obj.keep - obj.ttl) as the time spent in storage.

  # A straightforward cache hit, so deliver it.
  if (obj.keep - obj.ttl <= req.ttl) {
    set req.http.X-Cache-Result = "hit";
    return (deliver);
  }

  # If it isn't a straightforward cache hit, things become more complex. See:
  #
  # https://www.varnish-cache.org/docs/trunk/users-guide/vcl-grace.html
  #
  # When several clients are requesting the same page, Varnish will send one
  # request to the backend and place the others on hold while fetching one
  # copy from the backend. In some products this is called request coalescing,
  # and Varnish does this automatically.
  #
  # If you are serving thousands of hits per second, the queue of waiting
  # requests can get huge. There are two potential problems: one is a
  # thundering herd problem - suddenly releasing a thousand threads to serve
  # content might send the load sky high. Secondly, nobody likes to wait. To
  # deal with this we can instruct Varnish to keep objects in cache beyond
  # their TTL and to serve the waiting requests somewhat stale content.

  # Since we have no fresh cached object, let us look at the stale ones.
  if (std.healthy(req.backend_hint)) {
    # The backend is healthy, so use the cache but limit the additional stale
    # age to 10 seconds.
    if (obj.keep - obj.ttl - 10s <= req.ttl) {
      set req.http.X-Cache-Result = "hit-with-slight-grace";
      return (deliver);
    }
    # No candidate for grace. Fetch a fresh object.
    else {
      set req.http.X-Cache-Result = "stale-hit-so-fetch";
      return (fetch);
    }
  }
  # The backend is unhealthy, so use full grace.
  else {
    if (obj.keep - obj.ttl - obj.grace <= req.ttl) {
      set req.http.X-Cache-Result = "hit-with-full-grace";
      return (deliver);
    }
    else {
      # This will of course result in an error response since the backend is
      # unhealthy.
      set req.http.X-Cache-Result = "stale-hit-so-fetch";
      return (fetch);
    }
  }
}

# ---------------------------------------------------------------------------
# vcl_miss
# ---------------------------------------------------------------------------

# Invoked after a cache lookup if the requested document was not found in the
# cache. Its purpose is to decide whether or not to attempt to retrieve the
# document from the backend.
sub vcl_miss {
  # We can end up here after a hit and then a fetch.
  if (!req.http.X-Cache-Result) {
    set req.http.X-Cache-Result = "miss-so-fetch";
  }

  return (fetch);
}

# ---------------------------------------------------------------------------
# vcl_backend_response
# ---------------------------------------------------------------------------

# Invoked after the response headers have been successfully retrieved from
# the backend.
sub vcl_backend_response {
  # Allow stale content, in case the backend goes down. Tell Varnish to keep
  # all objects for 6 hours beyond their TTL.
  set beresp.grace = 6h;

  # We are abusing beresp.keep as a way to keep track of the actual requested
  # TTL and thus later derive the age of the cached response.
  #
  # Note that std.duration supplies a default value when the X-Cache-TTL
  # header is absent, and that req headers are automatically copied into
  # bereq, which is why we can use the header here. The req object itself
  # cannot be referenced in this function.
  set beresp.keep = std.duration(bereq.http.X-Cache-TTL, 300s);
  set beresp.ttl = beresp.keep;

  return (deliver);
}

# ---------------------------------------------------------------------------
# vcl_deliver
# ---------------------------------------------------------------------------

# Invoked when we deliver the HTTP response to the client. This is the last
# chance to modify the headers sent to the client.
sub vcl_deliver {
  # Various headers set earlier for debugging purposes.
  if (req.http.X-Cache-Keep) {
    set resp.http.X-Cache-Keep = req.http.X-Cache-Keep;
  }
  if (req.http.X-Cache-TTL-Remaining) {
    set resp.http.X-Cache-TTL-Remaining = req.http.X-Cache-TTL-Remaining;
  }
  if (req.http.X-Cache-Age) {
    set resp.http.X-Cache-Age = req.http.X-Cache-Age;
  }
  if (req.http.X-Cache-TTL-Requested) {
    set resp.http.X-Cache-TTL-Requested = req.http.X-Cache-TTL-Requested;
  }
  if (req.http.X-Cache-Result) {
    set resp.http.X-Cache-Result = req.http.X-Cache-Result;
  }

  # Note that obj.hits behaviour changed in 4.0: it now counts per objecthead,
  # not per object. Also obj.hits may not be reset in some cases where bans
  # are in use. So take hits with a grain of salt. All in all, this isn't
  # useful for debugging in 4.0.
  #
  # set resp.http.X-Cache-Hits = obj.hits;

  return (deliver);
}
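The branching in vcl_hit is the subtlest part of the file, so here is a hypothetical Python model of it (not VCL) that makes the decision tree testable in isolation. Durations are plain seconds, hit_decision is an invented helper, and SLIGHT_GRACE mirrors the hard-coded 10s window used when the backend is healthy.

```python
# Hypothetical model of the vcl_hit decision tree above.

SLIGHT_GRACE = 10.0  # mirrors the 10s stale-age allowance for healthy backends

def hit_decision(keep, ttl_remaining, grace, requested_ttl, backend_healthy):
    # keep - ttl_remaining is the object's age, per the obj.keep trick.
    age = keep - ttl_remaining
    if age <= requested_ttl:
        return ("deliver", "hit")
    if backend_healthy:
        if age - SLIGHT_GRACE <= requested_ttl:
            return ("deliver", "hit-with-slight-grace")
        return ("fetch", "stale-hit-so-fetch")
    # Unhealthy backend: fall back on the full grace window.
    if age - grace <= requested_ttl:
        return ("deliver", "hit-with-full-grace")
    return ("fetch", "stale-hit-so-fetch")

# Object stored 120s ago with keep=300s; the client will accept 115s of age.
print(hit_decision(300, 180, 21600, 115, True))   # ('deliver', 'hit-with-slight-grace')
# Same object at 200s old, with an unhealthy backend: full grace applies.
print(hit_decision(300, 100, 21600, 115, False))  # ('deliver', 'hit-with-full-grace')
# Healthy backend and a client demanding at most 50s of age: refetch.
print(hit_decision(300, 100, 21600, 50, True))    # ('fetch', 'stale-hit-so-fetch')
```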
Function Invocation Order
It is useful to keep in mind the implicit order in which the functions defined above are invoked in various circumstances. Remember that there are numerous other functions that simply happen not to be used here, such as vcl_pipe and vcl_synth:
- Request that is not yet cached:
- vcl_recv
- vcl_hash
- vcl_miss
- vcl_backend_response
- vcl_deliver
- Request reading from cache:
- vcl_recv
- vcl_hash
- vcl_hit
- vcl_deliver
- Request reading from cache but where the vcl_hit code requires a cache update:
- vcl_recv
- vcl_hash
- vcl_hit
- vcl_miss
- vcl_backend_response
- vcl_deliver