Rework query results caching docs, add generated examples
Partially apply changes to 3.11
Simran-B committed Nov 29, 2024
1 parent ba1110d commit 10d8c88
Showing 19 changed files with 1,229 additions and 458 deletions.
@@ -21,48 +21,48 @@ are not part of a cluster setup.

The cache can be operated in the following modes:

- `off`: The cache is disabled. No query results are stored.
- `on`: The cache stores the results of all AQL queries unless the `cache`
query option is set to `false`.
- `demand`: The cache stores the results of AQL queries that have the
`cache` query option set to `true` but ignores all others.

The mode can be set at server startup as well as at runtime, see
[Global configuration](#global-configuration).

## Query eligibility

The query results cache considers two queries identical if they have exactly the
same query string and the same bind variables. Any deviation in terms of whitespace,
capitalization etc. is considered a difference. The query string is hashed
and used as the cache lookup key. If a query uses bind parameters, these are also
hashed and used as part of the cache lookup key.

Even if the query strings of two queries are identical, the query results cache
treats them as different queries if they have different bind parameter
values. Other components that become part of a query's cache key are the
`count`, `fullCount`, and `optimizer` attributes.
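
The following is a minimal sketch of how such a cache key could be derived, written as plain Node.js so it runs outside arangosh. It is not ArangoDB's implementation: the `cacheKey` and `canonicalJson` functions, the use of SHA-256, and the sorting of bind parameter keys are assumptions made for illustration. It only shows which inputs make two queries count as different.

```js
// Hypothetical model of a query results cache key (not ArangoDB's code).
const crypto = require("crypto");

// Serialize a value as JSON with sorted object keys, so that equal bind
// parameter objects always produce the same string (an assumption).
function canonicalJson(value) {
  if (Array.isArray(value)) {
    return "[" + value.map(canonicalJson).join(",") + "]";
  }
  if (value !== null && typeof value === "object") {
    const keys = Object.keys(value).sort();
    const parts = keys.map((k) => JSON.stringify(k) + ":" + canonicalJson(value[k]));
    return "{" + parts.join(",") + "}";
  }
  return JSON.stringify(value);
}

// Combine the exact query string, the bind parameters, and the
// count/fullCount/optimizer settings into one lookup key.
function cacheKey(query, bindVars = {}, options = {}) {
  const { count = false, fullCount = false, optimizer = {} } = options;
  return crypto
    .createHash("sha256")
    .update(query) // whitespace and capitalization matter
    .update("\u0000" + canonicalJson(bindVars))
    .update("\u0000" + canonicalJson({ count, fullCount, optimizer }))
    .digest("hex");
}

const q = "FOR u IN users FILTER u.age > @age RETURN u";
console.log(cacheKey(q, { age: 30 }) === cacheKey(q, { age: 30 })); // true
console.log(cacheKey(q, { age: 30 }) === cacheKey(q, { age: 40 })); // false
console.log(cacheKey(q) === cacheKey(q.toLowerCase()));             // false
```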

If the cache is enabled, the server checks at the very start of processing a query
request whether the cache has a result ready for this particular query. If this is
the case, the query result is served directly from the cache, which is normally
very efficient. If the query cannot be found in the cache, it is executed
as usual.

If the query is eligible for caching and the cache is enabled, the query
result is stored in the query results cache so it can be used for subsequent
executions of the same query.

A query is eligible for caching only if all of the following conditions are met:

- The server the query executes on is a single server (i.e. not part of a cluster).
- The query is a read-only query and does not modify data in any collection.
- No warnings were produced while executing the query.
- The query is deterministic and only uses deterministic functions whose results
are marked as cacheable.
- The size of the query result does not exceed the cache's configured maximal
size for individual cache results or cumulated results.
- The query is not executed using a streaming cursor (`"stream": true` query option).
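
The conditions above can be read as a single predicate. The sketch below expresses them over a hypothetical query descriptor; the `isEligibleForCaching` function and all field names (`isCluster`, `isReadOnly`, `warnings`, `isDeterministic`, `resultSize`, `stream`) are illustrative and not part of any ArangoDB API, and the two limits stand in for the configured per-entry and cumulated maximum sizes.

```js
// Hypothetical eligibility check mirroring the listed conditions.
function isEligibleForCaching(query, limits) {
  return (
    !query.isCluster &&                          // single server only
    query.isReadOnly &&                          // no data modification
    query.warnings.length === 0 &&               // no warnings produced
    query.isDeterministic &&                     // only deterministic, cacheable functions
    query.resultSize <= limits.maxEntrySize &&   // fits the per-entry limit
    query.resultSize <= limits.maxResultsSize && // fits the cumulated limit
    !query.stream                                // not a streaming cursor
  );
}

const limits = { maxEntrySize: 1024 * 1024, maxResultsSize: 8 * 1024 * 1024 };
const readQuery = {
  isCluster: false,
  isReadOnly: true,
  warnings: [],
  isDeterministic: true,
  resultSize: 2048,
  stream: false,
};
console.log(isEligibleForCaching(readQuery, limits));                          // true
console.log(isEligibleForCaching({ ...readQuery, stream: true }, limits));     // false
console.log(isEligibleForCaching({ ...readQuery, warnings: ["W1"] }, limits)); // false
```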

The usage of non-deterministic functions leads to a query not being cacheable.
This is intentional to avoid caching of function results which should rather

@@ -85,8 +85,8 @@ remove, truncate operations as well as AQL data-modification queries).

**Example**

If the result of the following query is present in the query results cache,
then either modifying data in the `users` or `organizations` collection
removes the already computed result from the cache:

```aql
FOR user IN users
@@ -95,42 +95,42 @@ FOR user IN users
RETURN { user: user, organization: organization }
```

Modifying data in other unrelated collections does not lead to this
query result being removed from the cache.
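
A minimal sketch of this invalidation rule, assuming each cached entry records the collections its query reads. The `QueryResultsCacheModel` class and its methods are hypothetical and only model the behavior described above. It indexes cache keys by collection so that an invalidation only visits the keys recorded for the modified collection.

```js
// Hypothetical model of collection-based invalidation (not ArangoDB's code).
class QueryResultsCacheModel {
  constructor() {
    this.entries = new Map();      // cache key -> cached result
    this.byCollection = new Map(); // collection name -> Set of cache keys
  }

  store(key, result, collections) {
    this.entries.set(key, result);
    for (const name of collections) {
      if (!this.byCollection.has(name)) this.byCollection.set(name, new Set());
      this.byCollection.get(name).add(key);
    }
  }

  lookup(key) {
    return this.entries.get(key); // undefined means a cache miss
  }

  // Called after a write to `collection`. Only keys recorded for that
  // collection are visited. Returns the number of removed entries.
  invalidate(collection) {
    const keys = this.byCollection.get(collection) || new Set();
    let removed = 0;
    for (const key of keys) {
      if (this.entries.delete(key)) removed++;
    }
    this.byCollection.delete(collection);
    return removed;
  }
}

const model = new QueryResultsCacheModel();
model.store("users-with-orgs", [{ user: "alice", organization: "acme" }], ["users", "organizations"]);
console.log(model.invalidate("products"));                  // 0: unrelated collection
console.log(model.lookup("users-with-orgs") !== undefined); // true: still cached
console.log(model.invalidate("organizations"));             // 1: entry removed
console.log(model.lookup("users-with-orgs"));               // undefined
```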

## Performance considerations

The query results cache is organized as a hash table, so looking up whether a query result
is present in the cache is fast. Still, the query string and the bind
parameters used in the query need to be hashed. This is a slight overhead that
is not present if the cache is disabled or a query is marked as not cacheable.

Additionally, storing query results in the cache and fetching results from the
cache requires locking via a read/write lock. While many threads can read in parallel from
the cache, there can only be a single modifying thread at any given time. Modifications
of the query cache contents are required when a query result is stored in the cache
or during cache invalidation after data-modification operations. Cache invalidation
requires time proportional to the number of cached items that need to be invalidated.

There may be workloads in which enabling the query results cache leads to a performance
degradation. It is not recommended to enable the query results cache in workloads that only
modify data, or that modify data more often than they read it. Enabling the cache
also provides no benefit if queries are very diverse and do not repeat often.
In read-only or read-mostly workloads, the cache is beneficial if the same
queries are repeated many times.
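
To illustrate why write-heavy workloads gain little, the hypothetical `countCacheHits` function below replays a sequence of operations against a single cached query: `R` executes the query and `W` writes to a collection the query reads, which invalidates the cached result. It is a toy model under these assumptions, not a benchmark.

```js
// Toy model: count how many reads of one query are served from the cache.
// "R" = execute the query, "W" = write to a collection the query reads.
function countCacheHits(ops) {
  let cached = false;
  let hits = 0;
  for (const op of ops) {
    if (op === "W") {
      cached = false; // the write invalidates the cached result
    } else if (cached) {
      hits++;         // served directly from the cache
    } else {
      cached = true;  // executed as usual, then stored in the cache
    }
  }
  return hits;
}

console.log(countCacheHits("RRRRRRRRRW")); // 8: read-mostly workload
console.log(countCacheHits("RWRWRWRWRW")); // 0: every result is invalidated before reuse
```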

In general, the query results cache provides the biggest improvements for queries with
small result sets that take long to calculate. If query results are very big and
most of the query time is spent on copying the result from the cache to the client,
then the cache does not provide much benefit.

## Global configuration

The query results cache can be configured at server start with the
[`--query.cache-mode`](../../components/arangodb-server/options.md#--querycache-mode)
startup option.

The cache mode can also be changed at runtime using the JavaScript API as follows:

```js
require("@arangodb/aql/cache").properties({ mode: "on" });
```

@@ -139,10 +139,10 @@ require("@arangodb/aql/cache").properties({ mode: "on" });

The maximum number and size of cached results for each database can be configured
at server start using the following startup options:

- `--query.cache-entries`: The maximum number of results in the query results cache per database
- `--query.cache-entries-max-size`: The maximum cumulated size of results in the query results cache per database
- `--query.cache-entry-max-size`: The maximum size of an individual result entry in the query results cache
- `--query.cache-include-system-collections`: Whether to include system collection queries in the query results cache

These parameters can be used to put an upper bound on the number and size of query
results in each database's query cache and thus restrict the cache's memory consumption.

@@ -158,44 +158,47 @@ require("@arangodb/aql/cache").properties({

```js
});
```

The above settings limit the number of cached results in the query results cache to 200
results per database, and to 8 MiB cumulated query result size per database. The maximum
size of each query cache entry is restricted to 1 MiB. Queries that involve system
collections are excluded from caching.
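
The sketch below models how the three size limits interact, using the values described above (200 results, 8 MiB cumulated, 1 MiB per entry). The `BoundedResultsCache` class is hypothetical; in particular, evicting the oldest entries to make room is an assumption of this sketch, not a description of ArangoDB's eviction policy. It only tracks result sizes, not the results themselves.

```js
// Hypothetical per-database cache enforcing entry count and size limits.
class BoundedResultsCache {
  constructor({ maxResults, maxResultsSize, maxEntrySize }) {
    this.maxResults = maxResults;
    this.maxResultsSize = maxResultsSize;
    this.maxEntrySize = maxEntrySize;
    this.entries = new Map(); // key -> result size in bytes, oldest first
    this.totalSize = 0;
  }

  // Returns true if the result was stored, false if it is too large.
  store(key, size) {
    if (size > this.maxEntrySize || size > this.maxResultsSize) {
      return false;
    }
    if (this.entries.has(key)) {
      this.totalSize -= this.entries.get(key);
      this.entries.delete(key);
    }
    // Evict the oldest entries until the new one fits both limits.
    while (
      this.entries.size > 0 &&
      (this.entries.size >= this.maxResults ||
        this.totalSize + size > this.maxResultsSize)
    ) {
      const [oldestKey, oldestSize] = this.entries.entries().next().value;
      this.entries.delete(oldestKey);
      this.totalSize -= oldestSize;
    }
    this.entries.set(key, size);
    this.totalSize += size;
    return true;
  }
}

const MiB = 1024 * 1024;
const limited = new BoundedResultsCache({
  maxResults: 200,
  maxResultsSize: 8 * MiB,
  maxEntrySize: 1 * MiB,
});
console.log(limited.store("small", 512 * 1024)); // true
console.log(limited.store("large", 2 * MiB));    // false: exceeds the 1 MiB entry limit
```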

You can also change the configuration at runtime with the
[HTTP API](../../develop/http-api/queries/aql-query-results-cache.md).

## Per-query configuration

When a query is sent to the server for execution and the cache is set to `on` or `demand`,
the query executor checks the query's `cache` option. If the query cache mode is
`on`, then not setting this query option or setting it to anything but `false` makes the
query executor consult the query results cache. If the query cache mode is `demand`, then setting
the `cache` option to `true` makes the executor look for the query in the query results cache.
When the query cache mode is `off`, the executor does not look for the query in the cache.
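
These rules can be condensed into one decision function. The sketch below is hypothetical (the `consultsCache` name is illustrative, not an ArangoDB function); `mode` stands for the global cache mode and `cacheOption` for the per-query `cache` option, which is `undefined` when not set.

```js
// Hypothetical decision function mirroring the rules above.
function consultsCache(mode, cacheOption) {
  switch (mode) {
    case "on":
      return cacheOption !== false; // unset, or anything but false
    case "demand":
      return cacheOption === true;  // only when explicitly requested
    case "off":
      return false;                 // the cache is never consulted
    default:
      throw new Error(`unknown cache mode: ${mode}`);
  }
}

console.log(consultsCache("on", undefined));     // true
console.log(consultsCache("on", false));         // false
console.log(consultsCache("demand", undefined)); // false
console.log(consultsCache("demand", true));      // true
console.log(consultsCache("off", true));         // false
```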

The `cache` query option can be set as follows via the `db._createStatement()` function:

```js
var stmt = db._createStatement({
  query: "FOR doc IN users LIMIT 5 RETURN doc",
  options: {
    cache: true
  }
});

stmt.execute();
```

When using the `db._query()` function, the `cache` query option can be set as follows:

```js
db._query("FOR doc IN users LIMIT 5 RETURN doc", {}, { cache: true });
```

You can also set the `cache` query option in the
[HTTP API](../../develop/http-api/queries/aql-queries.md#create-a-cursor).

Each query result returned contains a `cached` attribute. It is set to `true`
if the result was retrieved from the query results cache, and `false` otherwise. Clients can use
this attribute to check if a specific query was served from the cache or not.

## Query results cache inspection
@@ -207,7 +210,7 @@ The contents of the query results cache can be checked at runtime using the cach

```js
require("@arangodb/aql/cache").toArray();
```

This returns a list of all query results stored in the current database's query
results cache.

The query results cache for the current database can be cleared at runtime using the

@@ -221,5 +224,5 @@ require("@arangodb/aql/cache").clear();

Query results that are returned from the query results cache may contain execution statistics
stemming from the initial, uncached query execution. This means that for cached query results,
the `extra.stats` attribute may contain stale data, especially in terms of the `executionTime`
and `profile` attribute values.

50 changes: 24 additions & 26 deletions in `site/content/3.11/aql/how-to-invoke-aql/with-arangosh.md`

@@ -195,26 +195,6 @@ db._query(

```js
).toArray(); // Each batch needs to be fetched within 5 seconds
```

#### `memoryLimit`

To set a memory limit for the query, pass `options` to the `_query()` method.

@@ -274,12 +254,30 @@ don't need to set it on a per-query level.

#### `cache`

Whether the [AQL query results cache](../execution-and-performance/caching-query-results.md)
shall be used for adding as well as for retrieving results.

If the query cache mode is set to `demand` and you set the `cache` query option
to `true` for a query, then its query result is cached if it's eligible for
caching. If the query cache mode is set to `on`, query results are automatically
cached if they are eligible for caching unless you set the `cache` option to `false`.

If you set the `cache` option to `false`, then any query cache lookup is skipped
for the query. If you set it to `true`, the query cache is checked for a cached result
**if** the query cache mode is either set to `on` or `demand`.

```js
---
name: 02_workWithAQL_cache
description: ''
---
var resultCache = require("@arangodb/aql/cache");
resultCache.properties({ mode: "demand" });
~resultCache.clear();
db._query("FOR i IN 1..5 RETURN i", {}, { cache: true }); // Adds result to cache
db._query("FOR i IN 1..5 RETURN i", {}, { cache: true }); // Retrieves result from cache
db._query("FOR i IN 1..5 RETURN i", {}, { cache: false }); // Bypasses the cache
```

#### `fillBlockCache`
