How does fio measure latency

Along with bandwidth and IOPS, latency is a frequently reported performance metric. How quickly a storage device can respond to requests has implications for data center deployments. Because of how consequential these latency measurements can be, it is useful to have a clear understanding of how exactly fio measures latency.

Fio reports up to three latency measurements. The table below lists them and paraphrases the descriptions from the documentation.

Measurement	Abbreviation	Description
Submission latency	slat	time required to submit an I/O
Completion latency	clat	time required to complete an I/O after it has been submitted
Total latency	lat	time from when the I/O was created to when it was completed

How does fio actually compute these measurements? This discussion will be based on fio 3.30. The starting point for latency measurement is struct io_u which fio uses to keep track of information for each I/O unit created. The first two members of this structure are timestamps start_time and issue_time:

/*
 * The io unit
 */
struct io_u {
        struct timespec start_time;
        struct timespec issue_time;
<snip>
}

These timestamps are used in a straightforward way to calculate the three latency measurements:

Figure 1 Fio latency definitions

Submission latency is the difference between issue_time and start_time.
Completion latency is the difference between completion time and issue_time.
Total latency is the difference between completion time and start_time.
There is no need to persistently store completion time because it is not needed after completion and total latency are calculated.

Latency measurement for asynchronous I/O

There are many variations of how fio uses these timestamps to produce latency measurements, but let us begin with the steps for an asynchronous I/O. Below is a selective call tree for latency measurement for an asynchronous I/O:

do_io
   get_io_u				# start_time set
   io_u_submit
      td_io_queue
         io_ops->queue			# call ioengine's queue()
         td_io_commit 			# if submission batch size met
            io_ops->commit		# if ioengine's commit() is defined
               record issue_time	# not all async ioengines do this
               io_u_queued		# record submission latency for ioengines recording issue_time
         record issue_time
   io_queue_event
      io_u_queued			# record submission latency only for ioengines *without* commit function
   wait_for_completions			# only called when it is time to begin reaping I/O
      io_u_queued_complete
         td_io_getevents
            io_ops->getevents		# get completions from ioengine
         init_icd			# record completion time
         ios_completed
            io_completed
               account_io_completion	# calculate completion and total latency

When fio's main I/O worker function do_io() creates an I/O for submission, it calls get_io_u() which sets the io_u's start_time member. After some housekeeping related to error handling and verification, io_u_submit() calls td_io_queue() which hands off the io_u to the ioengine's queue function. Asynchronous ioengines have either a two-step queue and commit I/O submission process or they submit I/Os with a single-step queue function call.

Asynchronous ioengines with commit functions

For asynchronous ioengines with commit functions (e.g., io_uring, libaio), the commit function is called to actually submit the I/O to the operating system. For the io_uring, io_uring_cmd, and libaio ioengines, the io_u's issue_time is then recorded. These ioengines also call io_u_queued() to calculate and record the submission latency measurement. If the batch submit size is greater than one, multiple I/Os may be queued before the ioengine's commit function is called.

However, some asynchronous ioengines with commit functions do not set issue_time within their commit functions (e.g., nfs, null). These ioengines have issue_time recorded within td_io_queue(). Also, submission latency is not calculated or reported for these ioengines.

Asynchronous ioengines without commit functions

For asynchronous ioengines without commit functions, where submission is accomplished in a single step in the ioengine's queue function (e.g., rados, rbd, windowsaio, xvnme), submission latency is calculated in a slightly different way. The io_u's issue_time is recorded within td_io_queue() (in the same way as this is done for ioengines with commit functions that do not record issue_time) and eventually the submission latency measurement is calculated and recorded within io_u_queued().

When it is time to begin checking for I/O completions, inside do_io()'s main loop is a call to the aptly named wait_for_completions(). This sets off a chain of function calls where the ioengine's getevents handler is called to obtain completed I/Os, init_icd() is called to record the completion time, and finally account_io_completion() is called to calculate and record the completion and total latencies.

Latency measurement for synchronous I/O

Synchronous ioengines begin at the same point as asynchronous ioengines with do_io() creating an I/O for submission by calling get_io_u() which sets the io_u's start_time. A selective call tree for synchronous ioengine latency measurement is below:

do_io
  get_io_u			# start_time set
  io_u_submit
    td_io_queue
      record issue_time
      io_ops->queue		# call ioengine's queue()
  io_queue_event
    io_u_sync_complete
      init_icd			# record completion time
      io_completed
        account_io_completion	# calculate completion and total latency

Once the io_u is initialized fio does some housekeeping and then records the io_u's issue_time shortly before handing off the I/O to the synchronous ioengine's queue handler. Notice that issue_time is recorded before calling the ioengine's queue function whereas for asynchronous ioengines issue_time is recorded after the call to queue (and if it exists a call to the ioengine's commit). After the synchronous ioengine has submitted the I/O to the operating system and control has returned to fio, execution reaches io_queue_event(). Then eventually init_icd() is called to record a completion time stamp and then io_completed() calls account_io_completion() to calculate completion and total latency.

This latency measurement pathway is much simpler than it was for asynchronous ioengines because I/Os have already been completed by the time the ioengine's queue function returns. Note that submission latency is never calculated for synchronous ioengines because it is not reported and submission latency would only encompasses fio's housekeeping activities. Also note that the above call tree describes only the most common case where there are no errors and a request is fully completed.

Latency measurement for built-in ioengines

The table below classifies each of the built-in ioengines by how latency is measured. Ioengines are either synchronous, asynchronous with no commit, asynchronous with commit and submission latency reported, or asynchronous with commit and submission latency not measured.

	Latency measurement
ioengine	sync	async
cpuio	sync
dev-dax	sync
dfs		without commit
e4defrag	sync
exec	sync
falloc	sync
filecreate	sync
filedelete	sync
filestat	sync
ftruncate	sync
gfapi	sync
gfapi_async		without commit
http	sync
ime_psync	sync
ime_psyncv	sync
ime_aio		with commit, without slat
io_uring		with commit and slat
io_uring_cmd		with commit and slat
libaio		with commit and slat
libcufile	sync
libhdfs	sync
libpmem	sync
librpma_apm_client		with commit and slat
librpma_apm_server	sync
librpma_gpspm_client		with commit and slat
librpma_gpspm_server	sync
libzbc	sync
mmap	sync
mtd	sync
nbd		without commit
net	sync
netsplice	sync
nfs		with commit, without slat
null	sync for qd 1	async for qd > 1: with commit, without slat
cpp_null		with commit, without slat
pmemblk	sync
posixaio		without commit
rados		without commit
rdma		with commit and slat
sg	sync
solarisaio		without commit
splice	sync
sync	sync
psync	sync
vsync	sync
pvsync	sync
pvsync2	sync
windowsaio		without commit
xnvme		without commit

Summary

The diagram below summarizes fio's latency measurement pathways. It contains the main branches for the different paths and illustrates the different points where timestamps are recorded and latencies calculated. This diagram has been updated to include the two outcomes for asynchronous ioengines with commit functions that do and do not record issue_time.

Figure 2 Fio latency summary

Observations

(Updated) For io_uring, io_uring_cmd, and libaio, issue_time is actually assigned twice, once inside the ioengine code and a second time inside td_io_queue(). This appears to be unintentional and I will be submitting a patch to clean this up. Update: patches 1 and 2 were merged.

Based on how submission, completion, and total latency are calculated the sum of mean submission latency and mean completion latency should be equal to the mean total latency, but if fio sets issue_time twice then this equality does not hold.

The listing below is the result of issuing a single I/O, and when we evaluate the relationship among submission, completion, and total latency we see that 61424 + 242709 != 308346. The sum is actually 304133 which means that about 4000ns were lost.

user@ubuntu:~/fio-dev$ fio-canonical/fio --name=test --ioengine=io_uring --number_ios=1 --rw=randread --size=1M
test: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=io_uring, iodepth=1
fio-3.30-48-g26fa
Starting 1 process

test: (groupid=0, jobs=1): err= 0: pid=172145: Mon Jun  6 16:12:42 2022
  read: IOPS=1000, BW=4000KiB/s (4096kB/s)(4096B/1msec)
    slat (nsec): min=61424, max=61424, avg=61424.00, stdev= 0.00
    clat (nsec): min=242709, max=242709, avg=242709.00, stdev= 0.00
     lat (nsec): min=308346, max=308346, avg=308346.00, stdev= 0.00
<snip>

The listing below has issue_time filled in only once via the ioengine code and now the equality does hold: 53701 + 259566 = 313267.

user@ubuntu:~/fio-dev$ fio-latency/fio --name=test --ioengine=io_uring --number_ios=1 --rw=randread --size=1M
test: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=io_uring, iodepth=1
fio-3.30-48-g26fa-dirty
Starting 1 process

test: (groupid=0, jobs=1): err= 0: pid=172220: Mon Jun  6 16:12:47 2022
  read: IOPS=1000, BW=4000KiB/s (4096kB/s)(4096B/1msec)
    slat (nsec): min=53701, max=53701, avg=53701.00, stdev= 0.00
    clat (nsec): min=259566, max=259566, avg=259566.00, stdev= 0.00
     lat (nsec): min=313267, max=313267, avg=313267.00, stdev= 0.00
<snip>

Some asynchronous ioengines (io_uring, libaio) support trim commands but trim commands are synchronous operations. We have the flag FIO_ASYNCIO_SYNC_TRIM to identify these ioengines and fio follows the synchronous latency measurement path when trim commands are handled by these ioengines. This leads to ugly code like:

if (!td_ioengine_flagged(td, FIO_SYNCIO) &&
        (!td_ioengine_flagged(td, FIO_ASYNCIO_SYNC_TRIM) ||
         io_u->ddir != DDIR_TRIM)) {

We should have a helper to identify when these ioengines are handling trim commands to make this code easier to read. Update: patch was merged.

We should update the documentation to make explicit that the timer begins counting when fio initializes an io_u. Submission latency for asynchronous ioengines is more than the time the operating system takes to queue an I/O. Submission latency also includes housekeeping that fio does. Update: patch was merged.
The documentation for completion latency says that it is very close to zero for synchronous ioengines. This appears to be a typographical error and is likely meant to convey instead that submission latency is very close to zero. Update: patch was merged.
cscope is immensely useful for navigating fio's codebase. While writing this blog post it was very helpful to use cscope to be able to quickly see all of the places where start_time and issue_time were touched.
How is latency measured with options like io_uring's sqthread_poll? This option has a polling thread in the kernel monitor the submission queue for new requests. We should think about how to handle this case appropriately for submission latency measurement and confirm that fio is doing the right thing.
Update since fio 3.30: The nfs ioengine was changed to eliminate its unneeded commit function. So it now reports submission latency. Patch is here.
Update since fio 3.30: The null ioengine when it is asynchronous due to a queue depth greater than one was changed to record issue_time like the io_uring and libaio ioengines do. Patch is here.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly