
common: benchmarking utility and workflow #6069

Merged: 4 commits into master, Mar 28, 2024

Conversation

@janekmi (Contributor) commented Mar 27, 2024

@janekmi janekmi added the sprint goal (This pull request is part of the ongoing sprint) and no changelog (Add to skip the changelog check on your pull request) labels on Mar 27, 2024
@janekmi janekmi requested review from grom72 and osalyk March 27, 2024 12:46
@janekmi janekmi force-pushed the pmem-benchmark branch 2 times, most recently from 1e6b179 to 3aec78f Compare March 27, 2024 13:44
@grom72 (Contributor) left a comment

Reviewed 1 of 1 files at r1, 2 of 2 files at r2, 2 of 2 files at r4, 2 of 2 files at r5, 4 of 4 files at r6, 2 of 2 files at r7, 2 of 2 files at r8, all commit messages.
Reviewable status: all files reviewed, 17 unresolved discussions (waiting on @janekmi and @osalyk)


.github/actions/pmem_benchmark_run/action.yml line 7 at r5 (raw file):

    description: The root directory of the repository designated as runtime
    required: true
  reference:

Let's make the argument names more informative.

Suggestion:

reference_path

.github/actions/pmem_benchmark_run/action.yml line 10 at r5 (raw file):

    description: LD_LIBRARY_PATH where the first version of PMDK is built
    required: true
  rival:

.
Both options are acceptable ;)

Suggestion:

 rival_LD_LIBRARY_PATH

.github/actions/pmem_benchmark_run/action.yml line 31 at r5 (raw file):

        DEV=$(mount | grep ${{ inputs.pmem_path }} | cut -d' ' -f1 | cut -d'/' -f 3)
        NUMA=$(ndctl list -v | jq ".[] | select(.blockdev==\"$DEV\" and .mode==\"fsdax\") | .numa_node")
        echo "node=$NUMA" >> $GITHUB_OUTPUT

Shall this be a part of ./utils/benchmarks/run_and_combine.py?
That helps a lot to use this workflow locally.


.github/workflows/pmem_benchmark.yml line 12 at r5 (raw file):

        type: string
        default: stable-2.0
  push: # XXX

Can we ALSO run this regularly (daily) and store artifacts with a short retention period?
retention-days: 2
We can have a fresh result on a daily basis without any additional effort.

Code quote:

  workflow_dispatch:
    inputs:
      reference_ref:
        type: string
        default: master
      rival_ref:
        type: string
        default: stable-2.0
  push: # XXX

.github/workflows/pmem_benchmark.yml line 15 at r5 (raw file):

# XXX missing in Ansible files
# sudo dnf install numactl python3-pip -y

Please fix this in ansible scripts opensuse-setup.yml and rocky-setup.yml
There is already a place for that:

    - name: Install benchmarks dependencies (optional)

and for pip see https://docs.ansible.com/ansible/latest/collections/ansible/builtin/pip_module.html

Code quote:

# sudo dnf install numactl python3-pip -y

.github/workflows/pmem_benchmark.yml line 26 at r5 (raw file):

        include:
          - ROLE: reference
            REF: master # XXX ${{ inputs.reference_ref }}

github_ref or checkout_ref

Suggestion:

GITHUB_REF

.github/workflows/pmem_benchmark.yml line 28 at r5 (raw file):

            REF: master # XXX ${{ inputs.reference_ref }}
          - ROLE: rival
            REF: stable-2.0 # XXX ${{ inputs.rival_ref }}

Suggestion:

          - ROLE: reference
            REF: stable-2.0 # XXX ${{ inputs.reference_ref }}
          - ROLE: rival
            REF: master # XXX ${{ inputs.rival_ref }}

.github/workflows/pmem_benchmark.yml line 30 at r5 (raw file):

            REF: stable-2.0 # XXX ${{ inputs.rival_ref }}
          - ROLE: runtime
            REF: ''

Combining reference and rival with runtime decreases the clarity of the solution.
I would suggest having a separate step for runtime.

Code quote:

      matrix:
        include:
          - ROLE: reference
            REF: master # XXX ${{ inputs.reference_ref }}
          - ROLE: rival
            REF: stable-2.0 # XXX ${{ inputs.rival_ref }}
          - ROLE: runtime
            REF: ''

.github/workflows/pmem_benchmark.yml line 65 at r5 (raw file):

          rival: ../rival/src/nondebug
          config: perf
          scenario: ${{ matrix.SCENARIO }}

Shall we just check out runtime as the root folder and only have separate subfolders for reference and rival?

Suggestion:

    steps:
      - name: Benchmark
        uses: ./.github/actions/pmem_benchmark_run
        with:
          runtime_dir: .
          reference: ./reference/src/nondebug
          rival: ./rival/src/nondebug
          config: perf
          scenario: ${{ matrix.SCENARIO }}

src/benchmarks/obj_pmalloc.cpp line 239 at r6 (raw file):

	int ret = pmalloc(ob->pop, &ob->offs[i], ob->sizes[i], 0, 0);
	if (ret) {
		fprintf(stderr, "pmalloc ret: %d\n", ret);

perror() is already used in other places.

Suggestion:

perror("pmalloc");

utils/benchmarks/run_and_combine.py line 24 at r1 (raw file):

PARSER.add_argument('-p', '--pmem_path', required=True,
        help='PMEM-mounted directory to use')
PARSER.add_argument('-n', '--numa_node', required=True, help='NUMA node to use')

Redundant argument.
NUMA can be identified based on the given pmem_path.
See pmem_benchmark_run/action.yml:

        DEV=$(mount | grep ${{ inputs.pmem_path }} | cut -d' ' -f1 | cut -d'/' -f 3)
        NUMA=$(ndctl list -v | jq ".[] | select(.blockdev==\"$DEV\" and .mode==\"fsdax\") | .numa_node")
        echo "node=$NUMA" >> $GITHUB_OUTPUT

utils/benchmarks/run_and_combine.py line 38 at r1 (raw file):

        """Run PMEMBENCH according to the provided parameters"""
        config = f'src/benchmarks/{args.config}.cfg'
        file = f'{args.pmem_path}/testfile.obj'

Every test creates its own folder on pmem; let's do the same with pmembench.

Suggestion:

file = f'{args.pmem_path}/pmembench/testfile.obj'

utils/benchmarks/run_and_combine.py line 58 at r1 (raw file):

COLUMNS_COMBINE = [

Shall we put this table before all functions?

Code quote:

COLUMNS_COMBINE

utils/benchmarks/run_and_combine.py line 60 at r1 (raw file):

COLUMNS_COMBINE = [
        'total-avg[sec]',
        'ops-per-second[1/sec]',

Can we start with one well-known metric?

Suggestion:

        'ops-per-second[1/sec]',
        'total-avg[sec]',    

utils/benchmarks/run_and_combine.py line 75 at r1 (raw file):

COLUMNS_COPY = [

.

Code quote:

COLUMNS_COPY 

utils/benchmarks/run_and_combine.py line 89 at r1 (raw file):

        'operation',
        'lib'
        'min-size',

min-size appears twice.

Suggestion:

        'min-size',
        'type-number',
        'operation',
        'lib',
        'min-rsize',

utils/benchmarks/run_and_combine.py line 109 at r1 (raw file):

        Output data files:
        - combined - contains data from both ref and riv, and a normalized

Suggestion:

contains data from ref and riv together with a normalized
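
To illustrate the "combined" output being described, here is a minimal sketch in plain Python (the actual script uses pandas; the `combine` function, the `threads` key, and the column naming here are hypothetical):

```python
def combine(ref_rows, riv_rows, key="threads", metric="ops-per-second[1/sec]"):
    """Join reference (ref) and rival (riv) result rows on 'key' and
    append a normalized riv/ref column for 'metric'."""
    ref_by_key = {row[key]: row for row in ref_rows}
    combined = []
    for riv in riv_rows:
        ref = ref_by_key[riv[key]]
        combined.append({
            key: riv[key],
            f"ref-{metric}": ref[metric],
            f"riv-{metric}": riv[metric],
            # > 1.0 means rival is faster on a throughput-like metric
            f"riv/ref-{metric}": riv[metric] / ref[metric],
        })
    return combined
```

The normalized column is what makes the two runs directly comparable at a glance.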

@janekmi janekmi force-pushed the pmem-benchmark branch 2 times, most recently from 3c21371 to 38a0d6f Compare March 27, 2024 15:23
@janekmi (Contributor, Author) left a comment

Reviewed 1 of 1 files at r1, 2 of 2 files at r7, 2 of 2 files at r8, 1 of 5 files at r9, 2 of 2 files at r10, 2 of 2 files at r11, 2 of 2 files at r12, 1 of 7 files at r13, 2 of 2 files at r14, 2 of 2 files at r15, 2 of 2 files at r16, all commit messages.
Reviewable status: all files reviewed, 17 unresolved discussions (waiting on @grom72 and @osalyk)


.github/actions/pmem_benchmark_run/action.yml line 7 at r5 (raw file):

Previously, grom72 (Tomasz Gromadzki) wrote…

Lets make argument names more informative

Done.


.github/actions/pmem_benchmark_run/action.yml line 10 at r5 (raw file):

Previously, grom72 (Tomasz Gromadzki) wrote…

.
Both options are acceptable ;)

Done.


.github/actions/pmem_benchmark_run/action.yml line 31 at r5 (raw file):

Previously, grom72 (Tomasz Gromadzki) wrote…

Shall this be a part of ./utils/benchmarks/run_and_combine.py?
That helps a lot to use this workflow locally.

The thing is, determining the NUMA node of a PMEM device is easy for a human operator but requires some sophistication in software, as the example above shows. We can consider doing it in the second round if we can afford it.


.github/workflows/pmem_benchmark.yml line 12 at r5 (raw file):

Previously, grom72 (Tomasz Gromadzki) wrote…

Can we ALSO run this regularly (daily) and store artifacts with a short retention period?
retention-days: 2
We can have a fresh result on a daily basis without any additional effort.

Yes, we can. But IMHO it is pointless. Without automatic validation of the results, it will just burn hardware time. And I can bet nobody will view the results every other day anyway.


.github/workflows/pmem_benchmark.yml line 15 at r5 (raw file):

Previously, grom72 (Tomasz Gromadzki) wrote…

Please fix this in ansible scripts opensuse-setup.yml and rocky-setup.yml
There is already a place for that:

    - name: Install benchmarks dependencies (optional)

and for pip see https://docs.ansible.com/ansible/latest/collections/ansible/builtin/pip_module.html

Done.


.github/workflows/pmem_benchmark.yml line 26 at r5 (raw file):

Previously, grom72 (Tomasz Gromadzki) wrote…

github_ref or checkout_ref

Done.


.github/workflows/pmem_benchmark.yml line 28 at r5 (raw file):

            REF: master # XXX ${{ inputs.reference_ref }}
          - ROLE: rival
            REF: stable-2.0 # XXX ${{ inputs.rival_ref }}

These values will be provided from inputs. The current values are only for testing until this lands on the master branch.


.github/workflows/pmem_benchmark.yml line 30 at r5 (raw file):

Previously, grom72 (Tomasz Gromadzki) wrote…

Combining reference and rival with runtime decreases the clarity of the solution.
I would suggest to have a separate step for runtime

Can you elaborate on where you lost track?


.github/workflows/pmem_benchmark.yml line 65 at r5 (raw file):

Previously, grom72 (Tomasz Gromadzki) wrote…

Shall we just checkout runtime as a root folder and only have separate subfolders for reference and rival?

It is messy. A directory inside a repository is a part of it.


src/benchmarks/obj_pmalloc.cpp line 239 at r6 (raw file):

Previously, grom72 (Tomasz Gromadzki) wrote…

perror() is already used in other places.

Done.


utils/benchmarks/run_and_combine.py line 24 at r1 (raw file):

Previously, grom72 (Tomasz Gromadzki) wrote…

Redundant argument.
NUMA can be identified based on the given pmem_path.
See pmem_benchmark_run/action.yml:

        DEV=$(mount | grep ${{ inputs.pmem_path }} | cut -d' ' -f1 | cut -d'/' -f 3)
        NUMA=$(ndctl list -v | jq ".[] | select(.blockdev==\"$DEV\" and .mode==\"fsdax\") | .numa_node")
        echo "node=$NUMA" >> $GITHUB_OUTPUT

Doing this thing:

  1. Human operator (easy)
  2. Bash (medium)
  3. Python (nasty complex)
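
For context, the jq selection itself translates compactly to Python; the plumbing around it (parsing the mount table, invoking ndctl) is where the extra complexity lives. A sketch, assuming the `ndctl list -v` JSON shape used above; the function name and sample values are hypothetical:

```python
import json

def numa_node_for_blockdev(ndctl_json, dev):
    """Return the NUMA node of the fsdax namespace backing block device
    'dev' (e.g. 'pmem0'), given the JSON printed by 'ndctl list -v'."""
    for ns in json.loads(ndctl_json):
        if ns.get("blockdev") == dev and ns.get("mode") == "fsdax":
            return ns.get("numa_node")
    return None

# Sample shaped like 'ndctl list -v' output (hypothetical values).
SAMPLE = json.dumps([
    {"dev": "namespace0.0", "mode": "fsdax", "blockdev": "pmem0", "numa_node": 0},
    {"dev": "namespace1.0", "mode": "devdax", "numa_node": 1},
])
```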

utils/benchmarks/run_and_combine.py line 38 at r1 (raw file):

Previously, grom72 (Tomasz Gromadzki) wrote…

Every test creates its own folder on pmem, let's do the same with the pmembench.

I don't like this idea. Currently the test file is created and deleted by pmembench. A directory created by this script is just a liability:

  1. It has to be created.
  2. Error cases on creation have to be handled.
  3. It has to be deleted.
  4. Error cases on deletion have to be handled.

And this script is not about creating directories and files.

At the same time, the provided directory is hand-picked by the user, who can create a suitable directory structure with all of this in mind.

IMHO unnecessary complexity. Nay.
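
For the record, the create/clean-up cycle being declined would look roughly like this as a context manager (a sketch only; `pmembench_dir` is hypothetical and not part of run_and_combine.py):

```python
import shutil
from contextlib import contextmanager
from pathlib import Path

@contextmanager
def pmembench_dir(pmem_path):
    """Create <pmem_path>/pmembench, yield it, and remove it afterwards."""
    bench_dir = Path(pmem_path) / "pmembench"
    try:
        bench_dir.mkdir(parents=True, exist_ok=False)  # 1. has to create it
    except OSError as e:                               # 2. creation errors
        raise SystemExit(f"cannot create {bench_dir}: {e}")
    try:
        yield bench_dir
    finally:
        try:
            shutil.rmtree(bench_dir)                   # 3. has to delete it
        except OSError as e:                           # 4. deletion errors
            print(f"warning: cannot remove {bench_dir}: {e}")
```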


utils/benchmarks/run_and_combine.py line 58 at r1 (raw file):

Previously, grom72 (Tomasz Gromadzki) wrote…

Shall we put this table before all functions?

Done.


utils/benchmarks/run_and_combine.py line 60 at r1 (raw file):

Previously, grom72 (Tomasz Gromadzki) wrote…

Can we start with one well-known metric?

Done.


utils/benchmarks/run_and_combine.py line 75 at r1 (raw file):

Previously, grom72 (Tomasz Gromadzki) wrote…

.

Done.


utils/benchmarks/run_and_combine.py line 89 at r1 (raw file):

Previously, grom72 (Tomasz Gromadzki) wrote…

min_size two times.

Done.


utils/benchmarks/run_and_combine.py line 109 at r1 (raw file):

        Output data files:
        - combined - contains data from both ref and riv, and a normalized

Done.

@janekmi (Contributor, Author) left a comment

Reviewed 2 of 2 files at r18, all commit messages.
Reviewable status: all files reviewed, 17 unresolved discussions (waiting on @grom72 and @osalyk)


codecov bot commented Mar 27, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 70.12%. Comparing base (d4bd6f2) to head (449e075).
Report is 4 commits behind head on master.

❗ Current head 449e075 differs from pull request most recent head 264bd8c. Consider uploading reports for the commit 264bd8c to get more accurate results

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #6069   +/-   ##
=======================================
  Coverage   70.12%   70.12%           
=======================================
  Files         133      133           
  Lines       19568    19568           
  Branches     3261     3261           
=======================================
+ Hits        13722    13723    +1     
+ Misses       5846     5845    -1     

@janekmi janekmi added this to the 2.1.1 milestone Mar 27, 2024
@grom72 (Contributor) left a comment

Reviewed 6 of 7 files at r13, 1 of 2 files at r18.
Reviewable status: all files reviewed, 5 unresolved discussions (waiting on @janekmi and @osalyk)


.github/actions/pmem_benchmark_run/action.yml line 31 at r5 (raw file):

Previously, janekmi (Jan Michalski) wrote…

The thing is getting to know the NUMA node of a PMEM device is easy for a human operator but requires some sophistication from software as the example above showed. We can consider doing it in the second round if we can afford it.

What about having a simple bash script that does it and returns just a number?


utils/benchmarks/run_and_combine.py line 24 at r1 (raw file):

Previously, janekmi (Jan Michalski) wrote…

Doing this thing:

  1. Human operator (easy)
  2. Bash (medium)
  3. Python (nasty complex)

What about having a simple bash script that does it?

@grom72 (Contributor) left a comment

Reviewable status: all files reviewed, 6 unresolved discussions (waiting on @janekmi and @osalyk)

a discussion (no related file):

git clone https://github.com/pmem/pmdk.git reference
cd reference
git checkout stable-2.0
make -j
cd ..

works perfectly and is ready for comparison between the stable and the current working branch.
No jams, no errors, etc.



utils/benchmarks/run_and_combine.py line 17 at r18 (raw file):

        help='LD_LIBRARY_PATH where the first version of PMDK is built')
PARSER.add_argument('--rival', metavar='LD_LIBRARY_PATH', required=True,
        help='LD_LIBRARY_PATH where the second version of PMDK is built')

Can we make it optional with the default value ./src/nondebug or something similar?

Code quote:

PARSER.add_argument('--rival', metavar='LD_LIBRARY_PATH', required=True,
        help='LD_LIBRARY_PATH where the second version of PMDK is built')

@janekmi (Contributor, Author) left a comment

Reviewed 1 of 7 files at r19, 2 of 2 files at r20, 2 of 2 files at r21, 2 of 2 files at r22, all commit messages.
Reviewable status: all files reviewed, 6 unresolved discussions (waiting on @grom72 and @osalyk)


.github/actions/pmem_benchmark_run/action.yml line 31 at r5 (raw file):

Previously, grom72 (Tomasz Gromadzki) wrote…

What about having a simple bash script that does it and returns just a number?

Done.


utils/benchmarks/run_and_combine.py line 24 at r1 (raw file):

Previously, grom72 (Tomasz Gromadzki) wrote…

What about having a simple bash script that does it?

Done.


utils/benchmarks/run_and_combine.py line 17 at r18 (raw file):

Previously, grom72 (Tomasz Gromadzki) wrote…

Can we make it optional with the default value ./src/nondebug or something similar?

I would prefer not to. Note that this path is not preserved in the output, so you would have no idea what these results are.

@janekmi (Contributor, Author) left a comment

Reviewed 2 of 2 files at r24, all commit messages.
Reviewable status: all files reviewed, 5 unresolved discussions (waiting on @grom72 and @osalyk)

a discussion (no related file):

Previously, grom72 (Tomasz Gromadzki) wrote…
git clone https://github.com/pmem/pmdk.git reference
cd reference
git checkout stable-2.0
make -j
cd ..

works perfectly and is ready for comparison between the stable and the current working branch.
No jams, no errors, etc.

Done.


@janekmi janekmi modified the milestones: 2.1.1, 2.1.0 Mar 27, 2024
Signed-off-by: Jan Michalski <jan.michalski@intel.com>
Signed-off-by: Jan Michalski <jan.michalski@intel.com>
- numactl
- pandas

Signed-off-by: Jan Michalski <jan.michalski@intel.com>
@grom72 (Contributor) left a comment

Reviewed 7 of 7 files at r19, 2 of 2 files at r20, 2 of 2 files at r21, 2 of 2 files at r23, 2 of 2 files at r24, all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @janekmi and @osalyk)

a discussion (no related file):
There is one small issue with the "rival - reference" concept.
There is no information in the final results on what was compared.
Can we have a file that contains the branches' names as one of the output artefacts?



.github/workflows/pmem_benchmark.yml line 40 at r24 (raw file):

            GITHUB_REF: master # XXX ${{ inputs.reference_ref }}
          - ROLE: rival
            GITHUB_REF: stable-2.0 # XXX ${{ inputs.rival_ref }}

Who is the rival and who is the reference?

Suggestion:

          - ROLE: reference
            GITHUB_REF: stable-2.0 # XXX ${{ inputs.reference_ref }}
          - ROLE: rival
            GITHUB_REF: master # XXX ${{ inputs.rival_ref }}

@janekmi (Contributor, Author) left a comment

Reviewed 1 of 2 files at r26, 1 of 1 files at r27, 1 of 1 files at r28, all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @grom72 and @osalyk)

a discussion (no related file):

Previously, grom72 (Tomasz Gromadzki) wrote…

There is one small issue with the "rival - reference" concept.
There is no information in the final results on what was compared.
Can we have a file that contains the branches' names as one of the output artefacts?

Done.



.github/workflows/pmem_benchmark.yml line 40 at r24 (raw file):

Previously, grom72 (Tomasz Gromadzki) wrote…

Who is the rival and who is the reference?

Done.

@janekmi (Contributor, Author) left a comment

Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @grom72 and @osalyk)

a discussion (no related file):
I need to use this PR to tune the scenarios which seem unstable because of the short execution time. Bear with me.


@grom72 (Contributor) left a comment

Reviewed 2 of 2 files at r25, 1 of 2 files at r26, 1 of 1 files at r27, 1 of 1 files at r28, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @osalyk)

@grom72 (Contributor) left a comment

Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @osalyk)

@grom72 (Contributor) left a comment

:lgtm:

Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @osalyk)

@grom72 (Contributor) left a comment

Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @janekmi and @osalyk)

a discussion (no related file):

Previously, janekmi (Jan Michalski) wrote…

I need to use this PR to tune the scenarios which seem unstable because of the short execution time. Bear with me.

Shall we close this and tune in a separate one?


@janekmi janekmi force-pushed the pmem-benchmark branch 2 times, most recently from 60567c4 to 264bd8c Compare March 28, 2024 14:25
@janekmi (Contributor, Author) left a comment

Reviewed 1 of 3 files at r29, 2 of 2 files at r30, all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @osalyk)

@janekmi janekmi merged commit d9c6ba8 into master Mar 28, 2024
7 checks passed
Signed-off-by: Jan Michalski <jan.michalski@intel.com>
@janekmi janekmi deleted the pmem-benchmark branch April 16, 2024 13:10