Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Copy avoidance in SWAP* insertion ranges #1106

Draft
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

kkarbowiak
Copy link
Contributor

Issue

This is a draft of proposed change for #657.

The prepostview is a template class representing a lightweight view over contents of a vector (or a part of it) with an additional element in front or at back. It has just enough facilities to allow it being used in a range-based for loop.

This is work in progress and needs some polishing before it can be merged, but should work and allow to run tests and benchmarks.

Basically, we are trading a copy + trivial iteration over copied range for no copy but slightly slower iteration (each dereference contains some logic).

Tasks

  • ...
  • Update docs/API.md (remove if irrelevant)
  • Update CHANGELOG.md (remove if irrelevant)
  • review

@jcoupey
Copy link
Collaborator

jcoupey commented May 15, 2024

I just ran a few tests on this PR versus the parent master commit. Using the usual CVRP benchmarks as I would expect the change to have a higher impact on those.

I've only run locally with small exploration levels. Using -x 0, average computing time goes from 213ms to 212ms. Using -x 1, average computing time goes from 902ms to 904ms. When looking at timing for individual instances, the differences can go in both directions but always with slight changes so it looks like what I'm measuring here is more the noise across runs rather than a significant difference.

I want to look at instance-level profiling/flamegraph to dig a bit deeper before drawing any conclusion.

@jcoupey
Copy link
Collaborator

jcoupey commented Jul 10, 2024

I want to look at instance-level profiling/flamegraph to dig a bit deeper before drawing any conclusion.

So I finally got around to doing more thorough profiling. I've compared 176f57a (parent on master) and current HEAD of this PR at 463320b on several instances.

I've been using perf to monitor the share of ls::compute_best_swap_star_choice in the overall computation (this function takes a significant part in ls::Operator::gain computations). The change in number of samples and share is not representative of any significant improvement with the PR.

@kkarbowiak I guess my initial intuition from #657 was flawed, much as suggested by your comment.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants