Caching

Overview

Caching can greatly improve the performance of a website, but using it effectively requires a solid understanding of the topic.

There are only two hard things in Computer Science: cache invalidation and naming things.
— Phil Karlton

Technically, caching in the context of websites can be divided into server side caching, shared proxy caching and private browser caching. In all cases, additional storage is used to keep information that has already been computed or rendered, so that the work does not have to be done again, and content can be viewed faster by the users.

With server side caching, when a web page is requested, the server first checks whether it already has the fully rendered page in its cache. If it does, this is called a cache hit, and the server can serve the page directly from the cache without loading content from another source or spending computing time on rendering the HTML markup. If the page is not yet available in the cache, this is called a cache miss. In this case, the server prepares the page to be served and also stores it in the cache to speed up future requests.

How the server performs the cache lookup depends heavily on the project setup, but the lookup key usually includes information such as the URL of the web page as well as authentication data (if applicable).
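The hit and miss behaviour described above can be illustrated with a minimal in-memory sketch. It is not how any Gentics Portal implements its cache. The names `PageCache`, `get_page` and `render_page` are hypothetical, and the cache key (URL plus an optional user identifier) is only one possible choice.

```python
from typing import Callable, Dict, Optional, Tuple

# Assumption: the lookup key is the URL plus an optional user identifier.
CacheKey = Tuple[str, Optional[str]]


class PageCache:
    """Hypothetical in-memory cache for fully rendered pages."""

    def __init__(self, render: Callable[[str, Optional[str]], str]) -> None:
        self._render = render
        self._store: Dict[CacheKey, str] = {}
        self.hits = 0
        self.misses = 0

    def get_page(self, url: str, user: Optional[str] = None) -> str:
        key = (url, user)
        if key in self._store:
            # Cache hit: serve the stored markup without rendering again.
            self.hits += 1
            return self._store[key]
        # Cache miss: render the page, then store it for future requests.
        self.misses += 1
        html = self._render(url, user)
        self._store[key] = html
        return html


def render_page(url: str, user: Optional[str]) -> str:
    # Stand-in for the expensive step (loading content, rendering HTML).
    return f"<html><body>{url} for {user or 'anonymous'}</body></html>"


cache = PageCache(render_page)
cache.get_page("/home")           # miss: rendered and stored
cache.get_page("/home")           # hit: served from the cache
cache.get_page("/home", "alice")  # miss: different authentication data
```

Including the user in the key keeps personalised pages from being served to other users, at the cost of a lower hit rate. A real cache would also need eviction and invalidation, which this sketch leaves out.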

For client side caching, a user’s browser can also store elements that have been loaded from the server and are reused across several pages. Examples are stylesheets, JavaScript files or images. Ideally, the browser only requests those files from the server again if it has reason to believe that their content has changed.

Shared proxy caches act as a gateway between the user and the source server, storing (or caching) the server’s resources. They will not be covered in more detail in this document.

Server Side Caching

Web Pages

While server side caching usually means serving a complete rendered web page directly from a cache, the portals in Gentics Content Management Platform internally use caching in different manners to reduce requests to Gentics Mesh or other data sources (e.g. by caching responses to GraphQL requests which load navigation information).

Details vary depending on which Gentics Portal is being used: 

Gentics Portal | java

    will cache rendered pages in a disk-backed cache (meaning the cache will persist through restarts of the portal), unless caching is explicitly disabled in the configuration.

Gentics Portal | php

    a web cache can be configured using the caching capabilities of the web server used to run the page (e.g. Apache’s mod_cache).

In both cases, a page that has been requested before is not rendered again but served directly from the cache.

Images

Images resized by the Gentics Image Store are also cached by the server, so that subsequent requests do not have to compute the resized image again.

Client Side Caching

A web server delivering content to a client may indicate that the user’s browser is allowed to store the content and reuse it under certain conditions for a specific amount of time. This is done with the Cache-Control, Last-Modified and ETag HTTP headers, as described below:

Controlling Freshness

The server can tell the client for how long a resource is valid and may be used without checking. For example, setting Cache-Control: max-age=3600 tells the browser that the response can be reused without checking for the next hour, before it becomes stale.[1]
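As a rough illustration of the freshness rule, the following sketch checks whether a stored response may still be reused. The helper names `parse_max_age` and `is_fresh` are hypothetical. The sketch looks only at the max-age directive and ignores others (such as no-cache or no-store) as well as the Expires header, so it is a simplification of what browsers actually do.

```python
import re
from typing import Optional


def parse_max_age(cache_control: str) -> Optional[int]:
    """Return the max-age value in seconds, or None if it is absent."""
    for directive in cache_control.split(","):
        match = re.fullmatch(r"max-age=(\d+)", directive.strip().lower())
        if match:
            return int(match.group(1))
    return None


def is_fresh(cache_control: str, age_seconds: float) -> bool:
    """A stored response is fresh while its age is below max-age."""
    max_age = parse_max_age(cache_control)
    return max_age is not None and age_seconds < max_age


# A response sent with max-age=3600 and stored 30 minutes ago is still
# fresh; after a full hour it is stale and needs validation.
fresh_after_30_min = is_fresh("public, max-age=3600", 30 * 60)
fresh_after_60_min = is_fresh("public, max-age=3600", 60 * 60)
```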

To configure the Cache-Control header for static files, binaries and rendered pages, please consult the documentation of your Gentics Portal.

Controlling Validation

Once a resource is stale (because more than max-age seconds have passed since the response was generated), the browser can ask the server to validate that the resource is still valid. This requires the server to have set either the Last-Modified or the ETag[2] HTTP header on the original response.

When a client such as a web browser needs to validate that a resource is still valid, it can send a so-called conditional request with an If-Modified-Since or If-None-Match header respectively. In the first case the browser says "Send me the resource again if it has changed since the Last-Modified timestamp", and in the second case it says "Send me the resource again if its ETag is no longer the one you sent last time".

In either case, if the resource is still valid, the server responds with the status code 304 Not Modified and an empty body (usually resetting the freshness information). Otherwise, it sends the changed resource.
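The server side of such a conditional request might look like the sketch below. The functions `make_etag` and `respond` are hypothetical and not part of any Gentics API. Deriving the ETag from a hash of the body is just one possible strategy, and the comparison is simplified (for example, weak ETags with a W/ prefix are not handled).

```python
import hashlib
from datetime import timezone
from email.utils import formatdate, parsedate_to_datetime
from typing import Dict, Tuple


def make_etag(body: bytes) -> str:
    # Assumption: the ETag is a short hash of the content, in quotes.
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(
    request_headers: Dict[str, str], body: bytes, last_modified_ts: int
) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, body) for a possibly conditional request."""
    etag = make_etag(body)
    headers = {
        "ETag": etag,
        "Last-Modified": formatdate(last_modified_ts, usegmt=True),
        "Cache-Control": "max-age=3600",  # resets the freshness information
    }
    if_none_match = request_headers.get("If-None-Match")
    if_modified_since = request_headers.get("If-Modified-Since")

    not_modified = False
    if if_none_match is not None:
        # If-None-Match takes precedence over If-Modified-Since.
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        not_modified = "*" in candidates or etag in candidates
    elif if_modified_since is not None:
        try:
            since = parsedate_to_datetime(if_modified_since)
            if since.tzinfo is None:
                since = since.replace(tzinfo=timezone.utc)
            not_modified = last_modified_ts <= since.timestamp()
        except (TypeError, ValueError):
            not_modified = False  # unparsable date: send the full resource

    if not_modified:
        return 304, headers, b""  # 304 Not Modified, empty body
    return 200, headers, body


page = b"<html>hello</html>"
status, headers, _ = respond({}, page, 1_700_000_000)
revalidated, _, payload = respond({"If-None-Match": headers["ETag"]}, page, 1_700_000_000)
```

Here the first call returns the full page along with its validators, and the second call, which presents the ETag from the first response, is answered with 304 and an empty body.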

In our experience, rules for validation are highly specific to the project and the content type, so setting these headers cannot always be done automatically by Gentics Content Management Platform. Instead, we support the whole variety of reasonable use cases. Please refer to the documentation of your Gentics Portal for further information.

Summary

The Gentics Content Management Platform offers server side caching configuration options and supports providing the necessary HTTP headers to enable client side caching. It also takes care of many use cases automatically. Please refer to the documentation of your Gentics Portal for details.


1. The Expires HTTP header serves a similar purpose, but is ignored when a Cache-Control: max-age directive is present.
2. An ETag is a relatively short string that changes whenever the content of a resource changes.