-
Notifications
You must be signed in to change notification settings - Fork 2
How does fio measure latency
Along with bandwidth and IOPS, latency is a frequently reported performance metric. How quickly a storage device can respond to requests has implications for data center deployments. Because of how consequential these latency measurements can be, it is useful to have a clear understanding of how exactly fio measures latency.
Fio reports up to three latency measurements. The table below lists them and paraphrases the descriptions from the documentation.
Measurement | Abbreviation | Description |
---|---|---|
Submission latency | slat | time required to submit an I/O |
Completion latency | clat | time required to complete an I/O after it has been submitted |
Total latency | lat | time from when the I/O was created to when it was completed |
How does fio actually compute these measurements? This discussion will be based
on fio 3.30. The starting point
for latency measurement is struct io_u
which fio uses to keep track of
information for each I/O unit created. The first two members of this structure
are timestamps start_time
and issue_time
:
/*
* The io unit
*/
struct io_u {
struct timespec start_time;
struct timespec issue_time;
<snip>
}
These timestamps are used in a straightforward way to calculate the three latency measurements:
- Submission latency is the difference between
issue_time
andstart_time
. - Completion latency is the difference between completion time and
issue_time
. - Total latency is the difference between completion time and
start_time
. - There is no need to persistently store completion time because it is not needed after completion and total latency are calculated.
There are many variations of how fio uses these timestamps to produce latency measurements, but let us begin with the steps for an asynchronous I/O. Below is a selective call tree for latency measurement for an asynchronous I/O:
do_io
get_io_u # start_time set
io_u_submit
td_io_queue
io_ops->queue # call ioengine's queue()
td_io_commit # if submission batch size met
io_ops->commit # if ioengine's commit() is defined
record issue_time # not all async ioengines do this
io_u_queued # record submission latency for ioengines recording issue_time
record issue_time
io_queue_event
io_u_queued # record submission latency only for ioengines *without* commit function
wait_for_completions # only called when it is time to begin reaping I/O
io_u_queued_complete
td_io_getevents
io_ops->getevents # get completions from ioengine
init_icd # record completion time
ios_completed
io_completed
account_io_completion # calculate completion and total latency
When fio's main I/O worker function do_io()
creates an I/O for submission, it
calls get_io_u()
which sets the io_u
's start_time
member. After some
housekeeping related to error handling and verification, io_u_submit()
calls
td_io_queue()
which hands off the io_u
to the ioengine's queue
function. Asynchronous ioengines have either a two-step queue and commit I/O
submission process or they submit I/Os with a single-step queue function call.
For asynchronous ioengines with commit functions (e.g., io_uring, libaio), the
commit function is called to actually submit the I/O to the operating system.
For the io_uring, io_uring_cmd, and libaio ioengines, the io_u
's issue_time
is then recorded. These ioengines also call io_u_queued()
to calculate and
record the submission latency measurement. If the batch submit size is greater
than one, multiple I/Os may be queued before the ioengine's commit function is
called.
However, some asynchronous ioengines with commit functions do not set
issue_time
within their commit functions (e.g., nfs, null). These
ioengines have issue_time recorded within td_io_queue()
. Also, submission
latency is not calculated or reported for these ioengines.
For asynchronous ioengines without commit functions, where submission is
accomplished in a single step in the ioengine's queue function (e.g., rados,
rbd, windowsaio, xvnme), submission latency is calculated in a slightly
different way. The io_u
's issue_time
is recorded within td_io_queue()
(in
the same way as this is done for ioengines with commit functions that do not
record issue_time
) and eventually the submission latency measurement is
calculated and recorded within io_u_queued()
.
When it is time to begin checking for I/O completions, inside do_io()
's main
loop is a call to the aptly named wait_for_completions()
. This sets off a
chain of function calls where the ioengine's getevents handler is called to
obtain completed I/Os, init_icd()
is called to record the completion time,
and finally account_io_completion()
is called to calculate and record the
completion and total latencies.
Synchronous ioengines begin at the same point as asynchronous ioengines with
do_io()
creating an I/O for submission by calling get_io_u()
which sets the
io_u
's start_time
. A selective call tree for synchronous ioengine latency
measurement is below:
do_io
get_io_u # start_time set
io_u_submit
td_io_queue
record issue_time
io_ops->queue # call ioengine's queue()
io_queue_event
io_u_sync_complete
init_icd # record completion time
io_completed
account_io_completion # calculate completion and total latency
Once the io_u
is initialized fio does some housekeeping and then records the
io_u
's issue_time
shortly before handing off the I/O to the synchronous
ioengine's queue handler. Notice that issue_time
is recorded before calling
the ioengine's queue function whereas for asynchronous ioengines issue_time
is recorded after the call to queue (and if it exists a call to the ioengine's
commit). After the synchronous ioengine has submitted the I/O to the operating
system and control has returned to fio, execution reaches io_queue_event()
.
Then eventually init_icd()
is called to record a completion time stamp and
then io_completed()
calls account_io_completion()
to calculate completion
and total latency.
This latency measurement pathway is much simpler than it was for asynchronous ioengines because I/Os have already been completed by the time the ioengine's queue function returns. Note that submission latency is never calculated for synchronous ioengines because it is not reported and submission latency would only encompasses fio's housekeeping activities. Also note that the above call tree describes only the most common case where there are no errors and a request is fully completed.
The table below classifies each of the built-in ioengines by how latency is measured. Ioengines are either synchronous, asynchronous with no commit, asynchronous with commit and submission latency reported, or asynchronous with commit and submission latency not measured.
Latency measurement | ||
---|---|---|
ioengine | sync | async |
cpuio | sync | |
dev-dax | sync | |
dfs | without commit | |
e4defrag | sync | |
exec | sync | |
falloc | sync | |
filecreate | sync | |
filedelete | sync | |
filestat | sync | |
ftruncate | sync | |
gfapi | sync | |
gfapi_async | without commit | |
http | sync | |
ime_psync | sync | |
ime_psyncv | sync | |
ime_aio | with commit, without slat | |
io_uring | with commit and slat | |
io_uring_cmd | with commit and slat | |
libaio | with commit and slat | |
libcufile | sync | |
libhdfs | sync | |
libpmem | sync | |
librpma_apm_client | with commit and slat | |
librpma_apm_server | sync | |
librpma_gpspm_client | with commit and slat | |
librpma_gpspm_server | sync | |
libzbc | sync | |
mmap | sync | |
mtd | sync | |
nbd | without commit | |
net | sync | |
netsplice | sync | |
nfs | with commit, without slat | |
null | sync for qd 1 | async for qd > 1: with commit, without slat |
cpp_null | with commit, without slat | |
pmemblk | sync | |
posixaio | without commit | |
rados | without commit | |
rdma | with commit and slat | |
sg | sync | |
solarisaio | without commit | |
splice | sync | |
sync | sync | |
psync | sync | |
vsync | sync | |
pvsync | sync | |
pvsync2 | sync | |
windowsaio | without commit | |
xnvme | without commit |
The diagram below summarizes fio's latency measurement pathways. It contains
the main branches for the different paths and illustrates the different points
where timestamps are recorded and latencies calculated. This diagram has been
updated to include the two outcomes for asynchronous ioengines with commit
functions that do and do not record issue_time
.
- (Updated) For io_uring, io_uring_cmd, and libaio,
issue_time
is actually assigned twice, once inside the ioengine code and a second time insidetd_io_queue()
. This appears to be unintentional and I will be submitting a patch to clean this up. Update: patches 1 and 2 were merged.
Based on how submission, completion, and total latency are calculated the sum
of mean submission latency and mean completion latency should be equal to the
mean total latency, but if fio sets issue_time
twice then this equality does
not hold.
The listing below is the result of issuing a single I/O, and when we evaluate the relationship among submission, completion, and total latency we see that 61424 + 242709 != 308346. The sum is actually 304133 which means that about 4000ns were lost.
user@ubuntu:~/fio-dev$ fio-canonical/fio --name=test --ioengine=io_uring --number_ios=1 --rw=randread --size=1M
test: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=io_uring, iodepth=1
fio-3.30-48-g26fa
Starting 1 process
test: (groupid=0, jobs=1): err= 0: pid=172145: Mon Jun 6 16:12:42 2022
read: IOPS=1000, BW=4000KiB/s (4096kB/s)(4096B/1msec)
slat (nsec): min=61424, max=61424, avg=61424.00, stdev= 0.00
clat (nsec): min=242709, max=242709, avg=242709.00, stdev= 0.00
lat (nsec): min=308346, max=308346, avg=308346.00, stdev= 0.00
<snip>
The listing below has issue_time
filled in only once via the ioengine code
and now the equality does hold: 53701 + 259566 = 313267.
user@ubuntu:~/fio-dev$ fio-latency/fio --name=test --ioengine=io_uring --number_ios=1 --rw=randread --size=1M
test: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=io_uring, iodepth=1
fio-3.30-48-g26fa-dirty
Starting 1 process
test: (groupid=0, jobs=1): err= 0: pid=172220: Mon Jun 6 16:12:47 2022
read: IOPS=1000, BW=4000KiB/s (4096kB/s)(4096B/1msec)
slat (nsec): min=53701, max=53701, avg=53701.00, stdev= 0.00
clat (nsec): min=259566, max=259566, avg=259566.00, stdev= 0.00
lat (nsec): min=313267, max=313267, avg=313267.00, stdev= 0.00
<snip>
- Some asynchronous ioengines (io_uring, libaio) support trim commands but
trim commands are synchronous operations. We have the flag
FIO_ASYNCIO_SYNC_TRIM
to identify these ioengines and fio follows the synchronous latency measurement path when trim commands are handled by these ioengines. This leads to ugly code like:
if (!td_ioengine_flagged(td, FIO_SYNCIO) &&
(!td_ioengine_flagged(td, FIO_ASYNCIO_SYNC_TRIM) ||
io_u->ddir != DDIR_TRIM)) {
We should have a helper to identify when these ioengines are handling trim commands to make this code easier to read. Update: patch was merged.
-
We should update the documentation to make explicit that the timer begins counting when fio initializes an
io_u
. Submission latency for asynchronous ioengines is more than the time the operating system takes to queue an I/O. Submission latency also includes housekeeping that fio does. Update: patch was merged. -
The documentation for completion latency says that it is very close to zero for synchronous ioengines. This appears to be a typographical error and is likely meant to convey instead that submission latency is very close to zero. Update: patch was merged.
-
cscope is immensely useful for navigating fio's codebase. While writing this blog post it was very helpful to use cscope to be able to quickly see all of the places where
start_time
andissue_time
were touched. -
How is latency measured with options like io_uring's
sqthread_poll
? This option has a polling thread in the kernel monitor the submission queue for new requests. We should think about how to handle this case appropriately for submission latency measurement and confirm that fio is doing the right thing. -
Update since fio 3.30: The nfs ioengine was changed to eliminate its unneeded commit function. So it now reports submission latency. Patch is here.
-
Update since fio 3.30: The null ioengine when it is asynchronous due to a queue depth greater than one was changed to record
issue_time
like the io_uring and libaio ioengines do. Patch is here.