Replay Proto-X #32

wangpatrick57 · 2024-04-27T14:37:13Z

Summary: Can now replay a full tuning run from Proto-X. This is used to see how each step of tuning would have performed without query timeouts and without Boot enabled.

Demo:
The image shows the data of a replayed run of TPC-H SF0.01 without Boot enabled during tuning. For each step of tuning, the replay shows the # of queries executed during the original run (which may be < 22 if the workload timed out), the # of queries that timed out during the original run, and whether the workload timed out. It also shows the same information for the replay. You can see that the replayed times are always >= the original times, which makes sense because the replay runs without query timeouts. Whenever the original run is "22,0,False" (i.e. 22 executed, 0 timed out, workload didn't time out), the replayed time matches closely.
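To make the "22,0,False" triple concrete, here is a minimal sketch of how one step's data could be represented and checked. The class name, field names, and helper are hypothetical and are not taken from the Proto-X code; only the three logged quantities and the 22-query TPC-H workload come from the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StepRunData:
    """Hypothetical record of what one tuning step logs (names are assumptions)."""

    num_executed: int         # queries that were executed in this step
    num_timed_out: int        # queries that hit the per-query timeout
    workload_timed_out: bool  # whether the whole workload hit its timeout

    def summary(self) -> str:
        # Renders the triple in the same "executed,timed_out,workload_timed_out" form.
        return f"{self.num_executed},{self.num_timed_out},{self.workload_timed_out}"


def ran_cleanly(step: StepRunData, num_queries: int = 22) -> bool:
    """True if every query ran and nothing timed out (TPC-H has 22 queries)."""
    return (
        step.num_executed == num_queries
        and step.num_timed_out == 0
        and not step.workload_timed_out
    )
```

For steps where `ran_cleanly()` holds for the original run, the replayed time would be expected to match the original time closely.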

Details:

Migrated the pipeline that pickles actions (DBMS configuration changes) during tuning and re-applies them during replay to get the DBMS into the correct state.
Related to the above, modified the pipeline to store both the best per-query knobs found during execute_variations() as well as all per-query knob variations tried. We can either replay the best variation or all variations. This is especially useful if the workload timed out in the original run, in which case the "best" variation is a misnomer as it is simply an arbitrary variation.
Made Proto-X log additional information while tuning: the # of executed queries, the # of timed out queries, and whether the workload timed out. Replay can now use all of this information.
Fixed a bug where reset() was overwriting the logged replay information for a step, leading to a mismatch between the dumped action.pkl file (which contains the DBMS configuration state) and the run.raw.csv file (which contains the runtime information of the workload during that state).
Refactored all symlinks to have the .link extension to fix a subtle bug where a replay would overwrite the output.log file of the original run.
Standardized whether the page cache is dumped during tuning and replay (it's now not dumped in either case).
Made CLI options for the time to run the RL agent and whether Boot is enabled more fine-grained such that these values can differ between HPO and tune.
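The pickle-and-replay flow described in the first two items can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: the function names and the dict layout inside the pickle are hypothetical, while the `action.pkl` file name and the `replay_all_variations` option name come from this PR.

```python
import pickle
from pathlib import Path
from typing import Any, List


def dump_action(step_dpath: Path, best_variation: Any, all_variations: List[Any]) -> Path:
    """Store both the best variation and every variation tried for one step.

    The {"best": ..., "all": ...} layout is an assumption made for this sketch.
    """
    step_dpath.mkdir(parents=True, exist_ok=True)
    action_fpath = step_dpath / "action.pkl"
    with open(action_fpath, "wb") as f:
        pickle.dump({"best": best_variation, "all": all_variations}, f)
    return action_fpath


def load_variations_to_replay(action_fpath: Path, replay_all_variations: bool) -> List[Any]:
    """Return the variations a replay should execute for this step."""
    with open(action_fpath, "rb") as f:
        action = pickle.load(f)
    # If the original workload timed out, "best" is just an arbitrary variation,
    # so replaying all variations can be more informative.
    return action["all"] if replay_all_variations else [action["best"]]
```

Keeping both the best variation and the full list in the same file means a replay can choose either mode without re-running the tuning step.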

17zhangw

Just got a few questions, mostly around dumping the page cache.

tune/protox/agent/hpo.py

tune/protox/env/mqo/mqo_wrapper.py

tune/protox/env/pg_env.py

tune/protox/agent/replay.py

17zhangw

Just one more change.

17zhangw · 2024-05-29T04:59:53Z

tune/protox/agent/off_policy_algorithm.py

- if self.logger:
+ # We only stash the results if we're not doing HPO, or else the results from concurrent HPO would get
+ # stashed in the same directory and potentially cause a race condition.
+ if self.logger and not tuning_mode == TuningMode.HPO:


Can we also use the ray_trial_id to stash the results during HPO?

Thanks for catching that!
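One way to act on the suggestion above is to derive the stash directory from the trial ID, so that concurrent HPO trials never write to the same place. The sketch below is hypothetical: the function name and the directory naming scheme are assumptions, and only the `tuning_steps` name and the idea of using `ray_trial_id` come from this PR.

```python
from pathlib import Path
from typing import Optional


def get_tuning_steps_dpath(base_dpath: Path, ray_trial_id: Optional[str]) -> Path:
    """Pick a per-trial directory during HPO and a plain one otherwise.

    The "tuning_steps_<trial id>" naming scheme is an assumption for this sketch.
    """
    if ray_trial_id is None:
        # Not running under HPO: there is only one writer, so no suffix is needed.
        return base_dpath / "tuning_steps"
    # Under HPO: each concurrent trial gets its own directory, avoiding the race.
    return base_dpath / f"tuning_steps_{ray_trial_id}"
```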

17zhangw

LGTM.

wangpatrick57 added 30 commits April 15, 2024 15:21

copied replay_mythril.py over

5ea0984

added replay function

1664890

Baseilne -> Baseline

6dce17a

pqt -> query_timeout

59c3d34

repository -> tuning_steps

ae99b21

removed logdir entirely

08829d8

got rid of output_log_dir entirely

cd92d82

ray results now in dbgym workspace

207097a

now linking hpo-ed params in symlinks

93b0988

now linking tuning steps

d2bb709

replay main working

7cd260f

wrote extract_from_task_run_fordpath

aa9b98f

now finding all replay dirs

caf0d6c

added all configs to replay

64aaf15

added replayargs and deleted front of replay_step()

6ece74d

now copying params.json directly into data/

e341956

now copying params.json into tuning_steps

a531c2b

merged with integrate-boot

b5f1e8e

renamed boot_config_fpath to hpo_boot_config_fpath

4471787

added hpo config fpath config to tune

315a69f

fixed bugs so that hpo runs

4e9bde6

fixed some comments

17dece3

made it past first output.log loop

2f554e2

now only reading folders in first loop

f035e01

fixed threshold limit

19310a3

can now build PostgresEnv

e084cc7

now resetting and getting min reward

0c3c146

single to double quotes

bac5238

maximal fixed

79cee72

num lines

86acc80

wangpatrick57 added 16 commits April 23, 2024 16:55

removed noop index dead code

7320e06

removed dead var

e1c3f07

renamed BestQueryRun.timeout to timed_out

9c45bf7

renamed stop_running to workload_timed_out

509f7dc

refactored execute_workload() to separately return whether the workload or the query timed out

22617e0

replaced workload_runtime_accum with compute_total_workload_runtime()

bf5fe73

now seeing whether workload or query timed out in replay

6d237ec

now logging this_step_run_data before validity checks

5bd43c6

added replay_all_variations option

c6b15dd

added comments to _mutilate_action_with_metrics

d0ed37f

added comment about best observed in replay.py

b64fda2

changed bool of queries timed out to an actual num

4fab4f2

added info for num executed queries

a35a576

reset now doesn't overwrite the results from step

6016334

wrote load_per_machine_envvars.sh

4736315

added build_space_good_for_boot option

9849a99

wangpatrick57 marked this pull request as ready for review April 27, 2024 18:50

wangpatrick57 requested review from lmwnshn and 17zhangw April 27, 2024 18:50

17zhangw reviewed Apr 27, 2024

wangpatrick57 added 3 commits April 28, 2024 17:57

resolved some PR comments

d2fb275

added comment about tune

af33bc7

different tune trials during hpo now name their tuning_steps dir differently

474d7ee

wangpatrick57 requested a review from 17zhangw May 27, 2024 00:49

17zhangw requested changes May 29, 2024

now logging during HPO for both baseline and tuning steps

a6e00b9

wangpatrick57 requested a review from 17zhangw May 30, 2024 03:42

17zhangw approved these changes May 30, 2024

17zhangw merged commit 3aecdd1 into cmu-db:main May 30, 2024
1 check passed

wangpatrick57 deleted the replay-protox branch July 6, 2024 02:44
