Caching

Jeremy Echols edited this page Jun 9, 2020 · 5 revisions

RAIS Caching

info.json responses

We've implemented a simple LRU cache for info.json responses, which holds 10,000 entries by default. The cached data is extremely small, making this a very efficient cache: the data saved is under 50 bytes per info response.
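To make the eviction behavior concrete, here is a minimal LRU sketch in Python. It is not RAIS's implementation. The class name, the method names, and the (width, height) tuple stored as the value are all assumptions made for illustration.

```python
from collections import OrderedDict


class LRUCache:
    """Tiny least-recently-used cache; illustrative only, not RAIS's code."""

    def __init__(self, capacity=10_000):
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key):
        """Return the cached value, or None on a miss."""
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        """Store a value, evicting the least recently used entry if full."""
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the oldest entry


# Hypothetical usage: identifiers map to (width, height) tuples.
cache = LRUCache(capacity=2)
cache.put("a.jp2", (4000, 6000))
cache.put("b.jp2", (2000, 3000))
cache.get("a.jp2")                # "a.jp2" is now the most recently used
cache.put("c.jp2", (1000, 1500))  # evicts "b.jp2"
```

Each lookup moves the entry to the "recent" end, so entries that nobody asks for drift toward eviction.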

The info.json data is very easy to generate, so the value of caching may seem questionable, but it can slightly reduce local file IO when traffic is heavy, and more importantly network IO if you use an object store for your images.

Image responses

The server can optionally cache generated tiles under specific circumstances, but doesn't inherently cache the other images such as thumbnails. Tiles which are requested at a width and height of 1024 or below, in JPG format, can be cached by setting TileCacheLen in /etc/rais.toml, or the RAIS_TILECACHELEN environment variable. This is disabled by default.

Setting the tile cache length to anything greater than zero will enable the cache.
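The eligibility rules above can be summarized in a short Python sketch. This only restates the conditions described here; the function and constant names are hypothetical, the function assumes the request has already been identified as a tile, and the actual checks inside RAIS may differ in detail.

```python
MAX_TILE_DIMENSION = 1024  # from the text: width and height of 1024 or below


def tile_is_cacheable(width, height, fmt, tile_cache_len):
    """Return True if a tile request would qualify for the tile cache.

    tile_cache_len stands in for the TileCacheLen setting; zero (the
    default) means the cache is disabled.
    """
    if tile_cache_len <= 0:
        return False
    if fmt.lower() != "jpg":
        return False
    return width <= MAX_TILE_DIMENSION and height <= MAX_TILE_DIMENSION
```

For example, a 1024x1024 JPG tile qualifies once the cache length is set to 1000, but nothing qualifies while the length is left at zero.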

Tile caching is generally only recommended for systems with a small number of images or systems that expect a lot of traffic to hit a small subset of the collection, such as might be the case if there are a few featured images. On our Historic Oregon Newspapers site, we have a 1,000-item cache (just in case of a large influx of traffic to a particular newspaper), and it typically gets a cache hit on only 2% of all requests.

If you don't have a lot of extra RAM and your collection usage is fairly random, it's best to avoid a cache. But if you have some extra RAM, it can be valuable to create a small tile cache even on large collections just to better handle an unexpected influx of traffic to a small number of images, such as you might expect if part of your collection gets featured in an online exhibit.
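When deciding whether you have the RAM to spare, a rough upper bound helps. The sketch below multiplies entry count by entry size. The info.json figures (10,000 entries, under 50 bytes each) come from this page; the average tile size is purely an assumption for illustration, so measure your own tiles before relying on it.

```python
def cache_upper_bound_bytes(entries, bytes_per_entry):
    """Rough payload size of a full cache, ignoring per-entry overhead."""
    return entries * bytes_per_entry


# info.json cache: figures taken from this page.
info_cache_bytes = cache_upper_bound_bytes(10_000, 50)

# Tile cache: 1,000 entries as on our site; the average tile size is a
# hypothetical value, not a measurement.
ASSUMED_AVG_TILE_BYTES = 40 * 1024
tile_cache_bytes = cache_upper_bound_bytes(1_000, ASSUMED_AVG_TILE_BYTES)

print(f"info cache: about {info_cache_bytes / 1_000_000:.1f} MB")
print(f"tile cache: about {tile_cache_bytes / 1_000_000:.1f} MB (assumed tile size)")
```

Under these assumptions the tile cache dominates, which is why it is the one worth sizing carefully.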

Thumbnails

For resize requests such as thumbnails, caching is very beneficial, but for now RAIS doesn't try to accommodate this. For our needs, Apache handles this well enough, and if we needed something more powerful, we'd probably look at dedicated cache systems like Varnish.

Note that RAIS returns a valid Last-Modified header based on the last time the JP2 file changed, which a cache can use to determine if RAIS should be hit.
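As a rough illustration of how a cache might use that header, the Python sketch below compares the Last-Modified value stored with a cached copy against the value currently reported for the image. The function name is hypothetical, and real caches such as Apache or Varnish apply more involved freshness rules than this simple comparison.

```python
from email.utils import parsedate_to_datetime


def cached_copy_is_current(cached_last_modified, origin_last_modified):
    """Return True if the cached copy is at least as new as the origin.

    Both arguments are HTTP date strings, as found in a Last-Modified
    header.
    """
    cached = parsedate_to_datetime(cached_last_modified)
    origin = parsedate_to_datetime(origin_last_modified)
    return origin <= cached


stored = "Tue, 09 Jun 2020 12:00:00 GMT"
unchanged = "Tue, 09 Jun 2020 12:00:00 GMT"
replaced = "Wed, 10 Jun 2020 08:30:00 GMT"

print(cached_copy_is_current(stored, unchanged))  # JP2 untouched since caching
print(cached_copy_is_current(stored, replaced))   # JP2 changed after caching
```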

Quick refresher: a IIIF URL looks like this:

{scheme}://{server}{/prefix}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}

For thumbnail requests, the "region" is typically "full", and "size" is typically going to be some small width followed by a comma, and an empty height. For instance:

{scheme}://{server}{/prefix}/{identifier}/full/1000,/{rotation}/{quality}.{format}
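If you want to recognize thumbnail-style requests in your own tooling, the URL template can be split into its parts with a few lines of Python. This is a sketch only: the IIIFRequest type, the function names, and the example host are made up for illustration, and the "width followed by a comma" test is just the convention described above, not a rule enforced by RAIS.

```python
from typing import NamedTuple
from urllib.parse import unquote, urlsplit


class IIIFRequest(NamedTuple):
    prefix: str
    identifier: str
    region: str
    size: str
    rotation: str
    quality: str
    format: str


def parse_iiif_url(url):
    """Split an IIIF image URL into the components of the template."""
    parts = urlsplit(url).path.strip("/").split("/")
    if len(parts) < 5:
        raise ValueError(f"not an IIIF image URL: {url}")
    # The last five segments are fixed; anything before them is the prefix.
    *prefix_parts, identifier, region, size, rotation, last = parts
    quality, dot, fmt = last.rpartition(".")
    if not dot or not quality or not fmt:
        raise ValueError(f"missing quality or format: {url}")
    prefix = "/" + "/".join(prefix_parts) if prefix_parts else ""
    # Identifiers may contain encoded slashes (%2F), so decode only now.
    return IIIFRequest(prefix, unquote(identifier), region, size,
                       rotation, quality, fmt)


def looks_like_thumbnail(request):
    """Full region, with a size given as a width and an empty height."""
    width, comma, height = request.size.partition(",")
    return (request.region == "full" and width.isdigit()
            and comma == "," and height == "")


req = parse_iiif_url(
    "https://example.org/images/iiif/page1.jp2/full/1000,/0/default.jpg")
```

Splitting the path before decoding keeps an identifier with encoded slashes in one piece.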

It should be fairly easy to cache a URL like this in a dedicated cache application, though we haven't actually done this ourselves.

Apache has a very simple cache module, mod_disk_cache (renamed mod_cache_disk in Apache 2.4). However, it caches by prefix, meaning all IIIF URLs or none. We got around this by giving our thumbnail requests a different prefix from the rest of the IIIF requests. Once that was in place, we configured a simple mod_disk_cache in Apache:

# Cache thumbnails (and only thumbnails)
CacheRoot /var/cache/httpd/mod_disk_cache
CacheEnable disk /images/resize

# Allow a total of 4096 content directories at two levels so we never have
# more than 64 directories in any other directory.  If we cache a million
# thumbnails, we'll still only end up with about 250 files per content
# directory.
CacheDirLength 1
CacheDirLevels 2

# Change !RAIS_HOST! below to serve tiles and thumbnails from RAIS
AllowEncodedSlashes NoDecode
ProxyPassMatch ^/images/resize/([^/]*)/full/([0-6][0-9][0-9],.*jpg)$ http://!RAIS_HOST!:12415/images/iiif/$1/full/$2 nocanon
ProxyPassMatch ^/images/iiif/(.*(jpg|info\.json))$ http://!RAIS_HOST!:12415/images/iiif/$1 nocanon

This setup splits thumbnail requests (three-digit widths up to 699 pixels, as matched by the first ProxyPassMatch pattern) from tile requests, letting us cache thumbnails on disk for a much longer time than RAIS would store any tile in memory.
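To see which URLs the thumbnail rule actually catches, the same regular expression can be tried out in Python. This is only a test harness for the pattern from the Apache configuration above; the function name and the sample identifiers are hypothetical, and Apache itself does the real matching and proxying.

```python
import re

# Same pattern as the first ProxyPassMatch line in the Apache config.
THUMBNAIL_PATTERN = re.compile(
    r"^/images/resize/([^/]*)/full/([0-6][0-9][0-9],.*jpg)$")


def rewrite_thumbnail_path(path):
    """Return the RAIS path a thumbnail request maps to, or None."""
    match = THUMBNAIL_PATTERN.match(path)
    if match is None:
        return None
    return f"/images/iiif/{match.group(1)}/full/{match.group(2)}"


for sample in (
    "/images/resize/page1.jp2/full/200,/0/default.jpg",   # matches
    "/images/resize/page1.jp2/full/700,/0/default.jpg",   # too wide
    "/images/resize/page1.jp2/full/1000,/0/default.jpg",  # four digits
    "/images/resize/page1.jp2/full/50,/0/default.jpg",    # two digits
):
    print(sample, "->", rewrite_thumbnail_path(sample))
```

Note that the pattern requires exactly three digits before the comma, so widths below 100 or above 699 do not match the thumbnail rule.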

This won't be the smartest cache, but it will help when search results pages are used on large collections. We strongly recommend running the htcacheclean tool alongside the Apache cache directives to keep the disk cache from growing without bound, and the Apache caching guide is worth reading.
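The directory arithmetic in the comments of the Apache configuration above can be checked with a few lines of Python. The figure of 64 possible characters per directory-name position is taken from that comment; the variable names are made up for this sketch.

```python
CHARS_PER_POSITION = 64   # per the config comment: 64 directories per level
CACHE_DIR_LENGTH = 1      # CacheDirLength
CACHE_DIR_LEVELS = 2      # CacheDirLevels

dirs_per_level = CHARS_PER_POSITION ** CACHE_DIR_LENGTH
leaf_dirs = dirs_per_level ** CACHE_DIR_LEVELS

cached_thumbnails = 1_000_000
entries_per_leaf = cached_thumbnails / leaf_dirs

print(f"{dirs_per_level} directories per level, {leaf_dirs} leaf directories")
print(f"about {entries_per_leaf:.0f} cached entries per leaf directory")
```

Raising CacheDirLength or CacheDirLevels multiplies the number of leaf directories, which only becomes worthwhile for caches far larger than a million entries.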