Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fdp: support scheme placement id selection #1757

Merged
merged 2 commits into from
May 21, 2024

Conversation

parkvibes
Copy link
Contributor

@parkvibes parkvibes commented May 9, 2024

fdp: support scheme placement id (index) selection

Add a new placement id selection method called scheme. It allows
users to assign a placement ID (index) depending on the offset range.
The strategy of the scheme is specified in the file by user and
is applicable using the option dp_scheme.

Signed-off-by: Hyunwoo Park dshw.park@samsung.com

t/nvmept_fdp: add tests(302,303,400,401) for fdp scheme

  • 302/303: invalid options tests
  • 400/401: check whether fdp scheme works properly

Signed-off-by: Hyunwoo Park dshw.park@samsung.com

@vincentkfu
Copy link
Collaborator

Hyunwoo, although Ankit and I have shared feedback on your patch, it's not appropriate to include our Signed-off-by tags on your scheme patch since we are not co-authors.

Please update the commit message to remove the tags for Ankit and me.

For patches accepted via the mailing list maintainers add a Signed-off-by after accepting the patch but those should not be added by the patch authors.

For more details see https://www.kernel.org/doc/html/latest/process/submitting-patches.html?highlight=signed%20off.

@parkvibes parkvibes force-pushed the enable-dataplacement-scheme branch from b07d572 to d70c81c Compare May 9, 2024 23:55
@parkvibes
Copy link
Contributor Author

Vincent, I updated messages for commit as well as PR in accordance with your guide.

t/nvmept_fdp.py Outdated
"bs": 4096,
"io_size": 4096,
"verify": "crc32c",
"fdp": 1,
Copy link
Collaborator

@vincentkfu vincentkfu May 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is identical to the previous one.
"fdp": 1," should be "dataplacement": "fdp"
"fdp_pli": 3 should be "plids": 3

@vincentkfu
Copy link
Collaborator

When I run the test against a QEMU FDP device I see:

root@localhost:~/fio-dev/1757/fio# python3 t/nvmept_fdp.py -f ./fio --dut /dev/ng0n1 --debug -o 400 401
Artifact directory is nvmept-fdp-test-20240513-154330
fio path is /root/fio-dev/1757/fio/fio
Test 1 SKIPPED (User request or override)
Test 2 SKIPPED (User request or override)
Test 3 SKIPPED (User request or override)
Test 4 SKIPPED (User request or override)
Test 5 SKIPPED (User request or override)
Test 6 SKIPPED (User request or override)
Test 7 SKIPPED (User request or override)
Test 8 SKIPPED (User request or override)
Test 9 SKIPPED (User request or override)
Test 10 SKIPPED (User request or override)
Test 11 SKIPPED (User request or override)
Test 12 SKIPPED (User request or override)
Test 13 SKIPPED (User request or override)
Test 14 SKIPPED (User request or override)
Test 100 SKIPPED (User request or override)
Test 101 SKIPPED (User request or override)
Test 102 SKIPPED (User request or override)
Test 103 SKIPPED (User request or override)
Test 200 SKIPPED (User request or override)
Test 201 SKIPPED (User request or override)
Test 202 SKIPPED (User request or override)
Test 203 SKIPPED (User request or override)
Test 300 SKIPPED (User request or override)
Test 301 SKIPPED (User request or override)
Test 302 SKIPPED (User request or override)
Test 303 SKIPPED (User request or override)
DEBUG:root:Test 400: return code: 0
DEBUG:root:plid_list: [1, 2, 3]
DEBUG:root:plid_index_set_in_scheme: {0, 1, 2}
DEBUG:root:plid_list_referenced_by_scheme: [1, 2, 3]
DEBUG:root:pid 0 should not be touched by the scheme. ruamw of it 32 equals to 32
DEBUG:root:pid 8192 should not be touched by the scheme. ruamw of it 32 equals to 32
DEBUG:root:pid 16384 should not be touched by the scheme. ruamw of it 32 equals to 32
DEBUG:root:pid 24576 should not be touched by the scheme. ruamw of it 32 equals to 32
ERROR:root:pid 1 should not be touched by the scheme. ruamw of it(32) should equals to 32
DEBUG:root:pid 8193 should not be touched by the scheme. ruamw of it 32 equals to 32
DEBUG:root:pid 16385 should not be touched by the scheme. ruamw of it 32 equals to 32
DEBUG:root:pid 24577 should not be touched by the scheme. ruamw of it 32 equals to 32
ERROR:root:pid 2 should not be touched by the scheme. ruamw of it(32) should equals to 32
DEBUG:root:pid 8194 should not be touched by the scheme. ruamw of it 32 equals to 32
DEBUG:root:pid 16386 should not be touched by the scheme. ruamw of it 32 equals to 32
DEBUG:root:pid 24578 should not be touched by the scheme. ruamw of it 32 equals to 32
ERROR:root:pid 3 should not be touched by the scheme. ruamw of it(32) should equals to 32
DEBUG:root:pid 8195 should not be touched by the scheme. ruamw of it 32 equals to 32
DEBUG:root:pid 16387 should not be touched by the scheme. ruamw of it 32 equals to 32
DEBUG:root:pid 24579 should not be touched by the scheme. ruamw of it 32 equals to 32
DEBUG:root:Test 400: stderr:

DEBUG:root:Test 400: stdout:

Test 400 FAILED:  400
DEBUG:root:Test 401: return code: 0
DEBUG:root:plid_list: [1, 2, 3]
DEBUG:root:plid_index_set_in_scheme: {0, 1, 2}
DEBUG:root:plid_list_referenced_by_scheme: [1, 2, 3]
DEBUG:root:pid 0 should not be touched by the scheme. ruamw of it 32 equals to 32
DEBUG:root:pid 8192 should not be touched by the scheme. ruamw of it 32 equals to 32
DEBUG:root:pid 16384 should not be touched by the scheme. ruamw of it 32 equals to 32
DEBUG:root:pid 24576 should not be touched by the scheme. ruamw of it 32 equals to 32
ERROR:root:pid 1 should not be touched by the scheme. ruamw of it(32) should equals to 32
DEBUG:root:pid 8193 should not be touched by the scheme. ruamw of it 32 equals to 32
DEBUG:root:pid 16385 should not be touched by the scheme. ruamw of it 32 equals to 32
DEBUG:root:pid 24577 should not be touched by the scheme. ruamw of it 32 equals to 32
ERROR:root:pid 2 should not be touched by the scheme. ruamw of it(32) should equals to 32
DEBUG:root:pid 8194 should not be touched by the scheme. ruamw of it 32 equals to 32
DEBUG:root:pid 16386 should not be touched by the scheme. ruamw of it 32 equals to 32
DEBUG:root:pid 24578 should not be touched by the scheme. ruamw of it 32 equals to 32
ERROR:root:pid 3 should not be touched by the scheme. ruamw of it(32) should equals to 32
DEBUG:root:pid 8195 should not be touched by the scheme. ruamw of it 32 equals to 32
DEBUG:root:pid 16387 should not be touched by the scheme. ruamw of it 32 equals to 32
DEBUG:root:pid 24579 should not be touched by the scheme. ruamw of it 32 equals to 32
DEBUG:root:Test 401: stderr:

DEBUG:root:Test 401: stdout:

Test 401 FAILED:  401
0 test(s) passed, 2 failed, 26 skipped

This is the relevant part of my QEMU invocation:

        -device "nvme-subsys,id=nvme-subsys0,fdp=on,fdp.runs=128K,fdp.nrg=4,fdp.nruh=8" \
        -device "nvme,id=nvme0,serial=deadbeef,subsys=nvme-subsys0" \
        -drive "id=nvm-1,file=nvme.img,format=raw,if=none,discard=unmap,media=disk" \
        -device "nvme-ns,id=nvm-1,drive=nvm-1,bus=nvme0,nsid=1,logical_block_size=4096,physical_block_size=4096,fdp.ruhs=0;5;6;7" \

t/nvmept_fdp.py Outdated
"rw": 'randwrite',
"bs": 4096,
"size": 1048576,
"io_size": 1073741824,
Copy link
Collaborator

@vincentkfu vincentkfu May 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider making this just a very simple test with a single write to each of the PLIDs.

Something along these lines:

"io_size": 3*4096,
"rw": `write:524288`,

Fio will issue sequential write operations with a 524288-byte hole between each write. Then you should end up with a single write to each PLID.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Simplifying the test would be great.

FYI, there was an issue which gets me to make tests a bit complex. I figured out that the test assumes that ruamw decreases whenever a write occurs on plids. In my case, I ran the test against a real device supporting FDP. But the device did not work in that way. (※ ruamw decreases after multiple writes occur and the FIO_FDP_MAX_RUAMW was large enough, 1,571,328)

In your test environment, ruamw of 1, 2, and 3 may be updated (io_size/bs=262,144, multiple of 32) times in total. The result value of ruamw, 32, can be accepted as a wrong value even though the value is correct. Simplifying the test can also resolve the failure of the test.

@parkvibes
Copy link
Contributor Author

@vincentkfu, 400 & 401 tests are simplified including some bug-fixes.

@vincentkfu
Copy link
Collaborator

vincentkfu commented May 17, 2024

Please squash the final two patches.

The test still does not work. The code assumes that each of the listed PLIDs will be touched but that's not the case.

Fio will write 64K with holes of size 64K between each write. With these contents for lba.scheme,

0, 131072, 0
131072, 524288, 2
524288, 1048576, 1

plid 0 will receive one write but plid 2 will receive two writes.

# /root/fio-dev/1757/fio/fio  --name=nvmept-fdp  --ioengine=io_uring_cmd  --cmd_type=nvme  --randrepeat=0  --filename=/dev/ng0n1  --rw=write:65536  --bs=65536  --fdp=1  --fdp_pli=1,2,3  --fdp_pli_select=scheme  --dp_scheme=lba.scheme  --number_ios=3 --debug=io
fio: set debug option io
nvmept-fdp: (g=0): rw=write, bs=(R) 64.0KiB-64.0KiB, (W) 64.0KiB-64.0KiB, (T) 64.0KiB-64.0KiB, ioengine=io_uring_cmd, iodepth=1
fio-3.37
Starting 1 process
io       14824 invalidate not supported /dev/ng0n1
io       14824 dtype set to 0x2, dspec set to 0x2000
io       14824 fill: io_u 0x55e475352c40: off=0x0,len=0x10000,ddir=1,file=/dev/ng0n1
io       14824 prep: io_u 0x55e475352c40: off=0x0,len=0x10000,ddir=1,file=/dev/ng0n1
io       14824 prep: io_u 0x55e475352c40: ret=0
io       14824 queue: io_u 0x55e475352c40: off=0x0,len=0x10000,ddir=1,file=/dev/ng0n1
io       14824 calling ->commit(), depth 1
io       14824 io_u_queued_complete: min=1
io       14824 getevents: 1
io       14824 complete: io_u 0x55e475352c40: off=0x0,len=0x10000,ddir=1,file=/dev/ng0n1
io       14824 dtype set to 0x2, dspec set to 0x6000
io       14824 fill: io_u 0x55e475352c40: off=0x20000,len=0x10000,ddir=1,file=/dev/ng0n1
io       14824 prep: io_u 0x55e475352c40: off=0x20000,len=0x10000,ddir=1,file=/dev/ng0n1
io       14824 prep: io_u 0x55e475352c40: ret=0
io       14824 queue: io_u 0x55e475352c40: off=0x20000,len=0x10000,ddir=1,file=/dev/ng0n1
io       14824 calling ->commit(), depth 1
io       14824 io_u_queued_complete: min=1
io       14824 getevents: 1
io       14824 complete: io_u 0x55e475352c40: off=0x20000,len=0x10000,ddir=1,file=/dev/ng0n1
io       14824 dtype set to 0x2, dspec set to 0x6000
io       14824 fill: io_u 0x55e475352c40: off=0x40000,len=0x10000,ddir=1,file=/dev/ng0n1
io       14824 prep: io_u 0x55e475352c40: off=0x40000,len=0x10000,ddir=1,file=/dev/ng0n1
io       14824 prep: io_u 0x55e475352c40: ret=0
io       14824 queue: io_u 0x55e475352c40: off=0x40000,len=0x10000,ddir=1,file=/dev/ng0n1
io       14824 calling ->commit(), depth 1
io       14824 io_u_queued_complete: min=1
io       14824 getevents: 1
io       14824 complete: io_u 0x55e475352c40: off=0x40000,len=0x10000,ddir=1,file=/dev/ng0n1
io       14824 close ioengine io_uring_cmd
io       14824 free ioengine io_uring_cmd

nvmept-fdp: (groupid=0, jobs=1): err= 0: pid=14824: Fri May 17 17:00:30 2024
  write: IOPS=1500, BW=93.8MiB/s (98.3MB/s)(192KiB/2msec); 0 zone resets
    slat (usec): min=45, max=125, avg=77.05, stdev=42.69
    clat (usec): min=297, max=561, avg=409.81, stdev=136.51
     lat (usec): min=343, max=687, avg=486.86, stdev=179.04
    clat percentiles (usec):
     |  1.00th=[  297],  5.00th=[  297], 10.00th=[  297], 20.00th=[  297],
     | 30.00th=[  297], 40.00th=[  371], 50.00th=[  371], 60.00th=[  371],
     | 70.00th=[  562], 80.00th=[  562], 90.00th=[  562], 95.00th=[  562],
     | 99.00th=[  562], 99.50th=[  562], 99.90th=[  562], 99.95th=[  562],
     | 99.99th=[  562]
  lat (usec)   : 500=66.67%, 750=33.33%
  cpu          : usr=0.00%, sys=100.00%, ctx=3, majf=0, minf=15
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,3,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=93.8MiB/s (98.3MB/s), 93.8MiB/s-93.8MiB/s (98.3MB/s-98.3MB/s), io=192KiB (197kB), run=2-2msec

@vincentkfu
Copy link
Collaborator

Oh, I actually didn't appreciate this originally. Wouldn't it be easier for the scheme file to provide the placement indices directly? I now see that the scheme file provides indices to be used to select from the placement ids listed in the plids option. This seems overly complicated and the documentation is not clear that this is how the placement IDs are actually chosen.

Why not just use the indices in the scheme file directly as placement ID indices? Then when plid_select=scheme, fio can just ignore the plids option.

@parkvibes
Copy link
Contributor Author

parkvibes commented May 20, 2024

In fact, from the point of view of using the index, I was in the same position as you, but I chose this method because I copied the methods of random & RR. As you said, I agree that dealing with the index may cause confusion in the case of the scheme.

I'll update the code by directly assigning dspec from the value of scheme file. Also, plids/fdp_pli options will be ignored when plid_select/fdp_pli_select=scheme.

lba.scheme will be also refined for all plids to be touched.

@parkvibes parkvibes force-pushed the enable-dataplacement-scheme branch from cfecd30 to 246231a Compare May 20, 2024 04:22
@vincentkfu
Copy link
Collaborator

There actually is a problem with write:65536

root@localhost:~/fio-dev/1757/fio# python3 t/nvmept_fdp.py --dut /dev/ng0n1 --fio ./fio -o 400 401
Artifact directory is nvmept-fdp-test-20240520-172623
fio path is /root/fio-dev/1757/fio/fio
Test 1 SKIPPED (User request or override)
Test 2 SKIPPED (User request or override)
Test 3 SKIPPED (User request or override)
Test 4 SKIPPED (User request or override)
Test 5 SKIPPED (User request or override)
Test 6 SKIPPED (User request or override)
Test 7 SKIPPED (User request or override)
Test 8 SKIPPED (User request or override)
Test 9 SKIPPED (User request or override)
Test 10 SKIPPED (User request or override)
Test 11 SKIPPED (User request or override)
Test 12 SKIPPED (User request or override)
Test 13 SKIPPED (User request or override)
Test 14 SKIPPED (User request or override)
Test 100 SKIPPED (User request or override)
Test 101 SKIPPED (User request or override)
Test 102 SKIPPED (User request or override)
Test 103 SKIPPED (User request or override)
Test 200 SKIPPED (User request or override)
Test 201 SKIPPED (User request or override)
Test 202 SKIPPED (User request or override)
Test 203 SKIPPED (User request or override)
Test 300 SKIPPED (User request or override)
Test 301 SKIPPED (User request or override)
Test 302 SKIPPED (User request or override)
Test 303 SKIPPED (User request or override)
ERROR:root:Unhandled rw value write:65536
Test 400 FAILED:  400
ERROR:root:Unhandled rw value write:65536
Test 401 FAILED:  401
0 test(s) passed, 2 failed, 26 skipped

The problem is in _check_result():

        if self.fio_opts['rw'] in ['read', 'randread']:
            self.passed = self.check_all_ddirs(['read'], job)
        elif self.fio_opts['rw'] in ['write', 'randwrite']:
            if 'verify' not in self.fio_opts:
                self.passed = self.check_all_ddirs(['write'], job)
            else:
                self.passed = self.check_all_ddirs(['read', 'write'], job)
        elif self.fio_opts['rw'] in ['trim', 'randtrim']:
            self.passed = self.check_all_ddirs(['trim'], job)
        elif self.fio_opts['rw'] in ['readwrite', 'randrw']:
            self.passed = self.check_all_ddirs(['read', 'write'], job)
        elif self.fio_opts['rw'] in ['trimwrite', 'randtrimwrite']:
            self.passed = self.check_all_ddirs(['trim', 'write'], job)
        else:
            logging.error("Unhandled rw value %s", self.fio_opts['rw'])
            self.passed = False

The original code did not anticipate this usage of the rw option. Probably the easiest thing to do would be to strip the :65536 from fio_opts['rw'] and then carry out the comparison using the stripped value of the rw option.

dataplacement.c Outdated
unsigned long long offset = io_u->offset;
int i;

for (i = 0; i < ruhs_scheme->nr_schemes; i++){
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

space between ) and {

dataplacement.c Outdated

for (i = 0; i < ruhs_scheme->nr_schemes; i++){
if (offset >= ruhs_scheme->scheme_entries[i].start_offset &&
offset < ruhs_scheme->scheme_entries[i].end_offset){
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

space between ) and {

fio.1 Outdated
@@ -2307,6 +2312,31 @@ jobs. If you want fio to use placement identifier only at indices 0, 2 and 5
specify, you would set `plids=0,2,5`. For streams this should be a
comma-separated list of Stream IDs.
.TP
.BI (io_uring_cmd,xnvme)\fR\fBdp_scheme\fP=filename
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be dp_scheme=str instead of filename.

options.c Outdated
.name = "dp_scheme",
.lname = "Data Placement Scheme",
.type = FIO_OPT_STR_STORE,
.cb = str_dp_scheme_cb,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please line up the = with the others ones.

HOWTO.rst Outdated
@@ -2541,6 +2545,26 @@ with the caveat that when used on the command line, they must come after the
identifiers only at indices 0, 2 and 5 specify ``plids=0,2,5``. For
streams this should be a comma-separated list of Stream IDs.

.. option:: dp_scheme=filename : [io_uring_cmd] [xnvme]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be dp_scheme=str instead of filename.

@parkvibes parkvibes force-pushed the enable-dataplacement-scheme branch 2 times, most recently from 3a5c4e6 to 96566b0 Compare May 21, 2024 11:57
Add a new placement id selection method called scheme. It allows
users to assign a placement ID (index) depending on the offset range.
The strategy of the scheme is specified in the file by user and
is applicable using the option dp_scheme.

Signed-off-by: Hyunwoo Park <dshw.park@samsung.com>
- 302/303: invalid options tests
- 400/401: check whether fdp scheme works properly

Signed-off-by: Hyunwoo Park <dshw.park@samsung.com>
@axboe axboe merged commit d3bcdd3 into axboe:master May 21, 2024
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants