[Performance] consolidate TDs in ParallelEnv without buffers #2231

Merged on Jun 28, 2024 (4 commits)


@vmoens commented Jun 14, 2024

Improves the performance of ParallelEnv when buffers are disabled (`use_buffers=False`).

On my machine (local run), throughput for CartPole-v1 with gym on 4 processes goes from 660 fps to 1000 fps, measured with the script below.

Based on pytorch/tensordict#814.
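
To make the idea in the title concrete, here is a minimal, standard-library-only sketch of what consolidating a nested data container before sending it to another process can look like. It assumes (the PR text does not spell this out) that consolidation means packing every leaf into one contiguous payload plus a small metadata table, so that a single object crosses the process boundary instead of many small ones. The names `consolidate` and `unpack` and the sample data are hypothetical; this is not the tensordict or torchrl API.

```python
from array import array


def consolidate(named_arrays):
    """Pack several named arrays into one contiguous bytes payload.

    Returns (meta, payload) where meta records, for each entry,
    its name, typecode, byte offset and byte length in the payload.
    Hypothetical helper for illustration only.
    """
    meta = []
    chunks = []
    offset = 0
    for name, arr in named_arrays.items():
        raw = arr.tobytes()
        meta.append((name, arr.typecode, offset, len(raw)))
        chunks.append(raw)
        offset += len(raw)
    return meta, b"".join(chunks)


def unpack(meta, payload):
    """Rebuild the named arrays from the metadata table and the payload."""
    view = memoryview(payload)
    out = {}
    for name, typecode, offset, nbytes in meta:
        arr = array(typecode)
        arr.frombytes(view[offset:offset + nbytes])
        out[name] = arr
    return out


if __name__ == "__main__":
    # Made-up data shaped like one environment step.
    step_data = {
        "observation": array("f", [0.1, 0.2, 0.3, 0.4]),
        "reward": array("f", [1.0]),
        "done": array("b", [0]),
    }
    meta, payload = consolidate(step_data)
    restored = unpack(meta, payload)
    print(len(meta), "entries packed into", len(payload), "bytes")
    print("round trip ok:", restored == step_data)
```

The design point is that the receiver needs only the metadata table and one buffer, so the per-entry serialization overhead is paid once per message rather than once per leaf.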

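import time

import tqdm
from torchrl.envs import ParallelEnv, GymEnv

if __name__ == "__main__":
    # 4 worker processes, each running CartPole-v1, with buffers disabled.
    env = ParallelEnv(4, lambda: GymEnv("CartPole-v1"), use_buffers=False)
    # Warm-up rollout, excluded from the timing below.
    env.rollout(1000, break_when_any_done=False)

    pbar = tqdm.tqdm(range(100))
    t0 = time.time()
    for i in pbar:
        env.rollout(1000, break_when_any_done=False)
        # Running average of rollout steps per second since t0.
        pbar.set_description(f"fps: {(1000 * (i+1)) / (time.time() - t0): 4.4f}")

    # Shut down the worker processes.
    env.close()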
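
For reference, the arithmetic behind the reported figures can be sketched as follows. The function names are hypothetical; the formula mirrors the one in the progress-bar description above, and the 660 and 1000 values are the numbers quoted in this PR.

```python
def steps_per_second(rollout_len, n_rollouts, elapsed_s):
    """Same formula as the script: total rollout steps divided by elapsed time."""
    return rollout_len * n_rollouts / elapsed_s


def speedup(new_fps, old_fps):
    """Ratio of the new throughput to the old one."""
    return new_fps / old_fps


if __name__ == "__main__":
    before, after = 660.0, 1000.0  # fps values quoted in the PR description
    ratio = speedup(after, before)
    print(f"speedup: {ratio:.2f}x")                 # speedup: 1.52x
    print(f"relative gain: {100 * (ratio - 1):.0f}%")  # relative gain: 52%
```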


pytorch-bot bot commented Jun 14, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/2231

Note: Links to docs will display an error until the docs builds have been completed.

❌ 4 New Failures, 1 Unrelated Failure

As of commit 18d10e8 with merge base eb6c85d:

NEW FAILURES - The following jobs have failed:

BROKEN TRUNK - The following job failed but was also failing on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot added the CLA Signed label Jun 14, 2024

github-actions bot commented Jun 14, 2024

⚠ Warning: Result of CPU Benchmark Tests

Total Benchmarks: 91. Improved: 3. Worsened: 5.

Detailed results:

| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
| --- | --- | --- | --- | --- | --- |
| test_single | 0.1201s | 59.2226ms | 16.8855 Ops/s | 18.0148 Ops/s | **-6.27%** |
| test_sync | 38.4063ms | 32.0311ms | 31.2196 Ops/s | 29.0733 Ops/s | **+7.38%** |
| test_async | 50.0934ms | 29.3137ms | 34.1138 Ops/s | 34.5825 Ops/s | -1.36% |
| test_simple | 0.3748s | 0.3741s | 2.6734 Ops/s | 2.6661 Ops/s | +0.27% |
| test_transformed | 0.5311s | 0.5291s | 1.8901 Ops/s | 1.8087 Ops/s | +4.50% |
| test_serial | 1.3361s | 1.2767s | 0.7832 Ops/s | 0.7908 Ops/s | -0.96% |
| test_parallel | 1.1401s | 1.0794s | 0.9265 Ops/s | 0.9117 Ops/s | +1.62% |
| test_step_mdp_speed[True-True-True-True-True] | 0.1376ms | 22.9070μs | 43.6547 KOps/s | 44.7474 KOps/s | -2.44% |
| test_step_mdp_speed[True-True-True-True-False] | 42.2590μs | 13.5831μs | 73.6208 KOps/s | 74.8991 KOps/s | -1.71% |
| test_step_mdp_speed[True-True-True-False-True] | 40.0550μs | 13.1530μs | 76.0285 KOps/s | 76.3698 KOps/s | -0.45% |
| test_step_mdp_speed[True-True-True-False-False] | 26.8700μs | 7.9158μs | 126.3296 KOps/s | 128.7063 KOps/s | -1.85% |
| test_step_mdp_speed[True-True-False-True-True] | 59.8520μs | 24.1592μs | 41.3920 KOps/s | 41.9946 KOps/s | -1.43% |
| test_step_mdp_speed[True-True-False-True-False] | 39.0230μs | 14.6738μs | 68.1487 KOps/s | 68.7577 KOps/s | -0.89% |
| test_step_mdp_speed[True-True-False-False-True] | 42.4000μs | 14.4505μs | 69.2018 KOps/s | 69.6389 KOps/s | -0.63% |
| test_step_mdp_speed[True-True-False-False-False] | 36.6290μs | 9.1559μs | 109.2198 KOps/s | 111.4671 KOps/s | -2.02% |
| test_step_mdp_speed[True-False-True-True-True] | 61.9360μs | 25.4166μs | 39.3444 KOps/s | 39.5727 KOps/s | -0.58% |
| test_step_mdp_speed[True-False-True-True-False] | 36.4580μs | 15.9838μs | 62.5632 KOps/s | 62.1967 KOps/s | +0.59% |
| test_step_mdp_speed[True-False-True-False-True] | 59.0600μs | 14.2372μs | 70.2387 KOps/s | 70.0893 KOps/s | +0.21% |
| test_step_mdp_speed[True-False-True-False-False] | 30.1360μs | 9.0054μs | 111.0440 KOps/s | 110.7412 KOps/s | +0.27% |
| test_step_mdp_speed[True-False-False-True-True] | 57.6280μs | 26.8344μs | 37.2656 KOps/s | 37.5487 KOps/s | -0.75% |
| test_step_mdp_speed[True-False-False-True-False] | 49.2720μs | 16.8908μs | 59.2039 KOps/s | 58.6740 KOps/s | +0.90% |
| test_step_mdp_speed[True-False-False-False-True] | 41.1470μs | 15.3864μs | 64.9924 KOps/s | 63.8370 KOps/s | +1.81% |
| test_step_mdp_speed[True-False-False-False-False] | 30.1470μs | 9.7631μs | 102.4260 KOps/s | 99.2617 KOps/s | +3.19% |
| test_step_mdp_speed[False-True-True-True-True] | 54.3310μs | 25.6537μs | 38.9807 KOps/s | 39.4914 KOps/s | -1.29% |
| test_step_mdp_speed[False-True-True-True-False] | 41.4180μs | 16.1330μs | 61.9846 KOps/s | 62.3461 KOps/s | -0.58% |
| test_step_mdp_speed[False-True-True-False-True] | 43.4220μs | 16.6663μs | 60.0012 KOps/s | 59.9750 KOps/s | +0.04% |
| test_step_mdp_speed[False-True-True-False-False] | 34.0140μs | 10.3120μs | 96.9748 KOps/s | 97.9057 KOps/s | -0.95% |
| test_step_mdp_speed[False-True-False-True-True] | 74.3790μs | 26.7348μs | 37.4044 KOps/s | 37.8008 KOps/s | -1.05% |
| test_step_mdp_speed[False-True-False-True-False] | 68.8480μs | 17.1990μs | 58.1430 KOps/s | 58.5675 KOps/s | -0.72% |
| test_step_mdp_speed[False-True-False-False-True] | 44.8940μs | 17.4556μs | 57.2882 KOps/s | 55.8906 KOps/s | +2.50% |
| test_step_mdp_speed[False-True-False-False-False] | 32.8120μs | 11.2573μs | 88.8314 KOps/s | 87.6021 KOps/s | +1.40% |
| test_step_mdp_speed[False-False-True-True-True] | 59.9020μs | 27.1851μs | 36.7848 KOps/s | 36.4322 KOps/s | +0.97% |
| test_step_mdp_speed[False-False-True-True-False] | 56.2720μs | 18.2955μs | 54.6583 KOps/s | 54.6852 KOps/s | -0.05% |
| test_step_mdp_speed[False-False-True-False-True] | 54.1610μs | 18.0791μs | 55.3123 KOps/s | 55.5742 KOps/s | -0.47% |
| test_step_mdp_speed[False-False-True-False-False] | 42.7690μs | 11.5075μs | 86.9000 KOps/s | 85.9589 KOps/s | +1.09% |
| test_step_mdp_speed[False-False-False-True-True] | 42.8000μs | 29.5732μs | 33.8144 KOps/s | 34.1094 KOps/s | -0.86% |
| test_step_mdp_speed[False-False-False-True-False] | 52.9890μs | 19.3028μs | 51.8061 KOps/s | 51.6715 KOps/s | +0.26% |
| test_step_mdp_speed[False-False-False-False-True] | 42.0390μs | 18.7533μs | 53.3238 KOps/s | 55.5204 KOps/s | -3.96% |
| test_step_mdp_speed[False-False-False-False-False] | 35.3760μs | 12.2324μs | 81.7502 KOps/s | 79.8214 KOps/s | +2.42% |
| test_values[generalized_advantage_estimate-True-True] | 12.4540ms | 9.3418ms | 107.0453 Ops/s | 103.7001 Ops/s | +3.23% |
| test_values[vec_generalized_advantage_estimate-True-True] | 51.2840ms | 36.3196ms | 27.5334 Ops/s | 28.1175 Ops/s | -2.08% |
| test_values[td0_return_estimate-False-False] | 0.2418ms | 0.1661ms | 6.0215 KOps/s | 6.0943 KOps/s | -1.19% |
| test_values[td1_return_estimate-False-False] | 39.7834ms | 24.5024ms | 40.8124 Ops/s | 42.5957 Ops/s | -4.19% |
| test_values[vec_td1_return_estimate-False-False] | 38.2765ms | 35.6386ms | 28.0595 Ops/s | 28.6149 Ops/s | -1.94% |
| test_values[td_lambda_return_estimate-True-False] | 37.1308ms | 33.7669ms | 29.6148 Ops/s | 29.6019 Ops/s | +0.04% |
| test_values[vec_td_lambda_return_estimate-True-False] | 37.3138ms | 34.8774ms | 28.6719 Ops/s | 28.1043 Ops/s | +2.02% |
| test_gae_speed[generalized_advantage_estimate-False-1-512] | 8.6799ms | 8.2614ms | 121.0455 Ops/s | 119.1485 Ops/s | +1.59% |
| test_gae_speed[vec_generalized_advantage_estimate-True-1-512] | 1.9017ms | 1.7977ms | 556.2692 Ops/s | 565.1140 Ops/s | -1.57% |
| test_gae_speed[vec_generalized_advantage_estimate-False-1-512] | 0.5059ms | 0.3495ms | 2.8612 KOps/s | 2.8799 KOps/s | -0.65% |
| test_gae_speed[vec_generalized_advantage_estimate-True-32-512] | 49.0552ms | 45.6233ms | 21.9186 Ops/s | 24.7836 Ops/s | **-11.56%** |
| test_gae_speed[vec_generalized_advantage_estimate-False-32-512] | 3.7911ms | 3.0204ms | 331.0837 Ops/s | 327.6142 Ops/s | +1.06% |
| test_dqn_speed | 6.6529ms | 1.3431ms | 744.5365 Ops/s | 753.4103 Ops/s | -1.18% |
| test_ddpg_speed | 3.5361ms | 2.8391ms | 352.2260 Ops/s | 361.3506 Ops/s | -2.53% |
| test_sac_speed | 9.3380ms | 8.5319ms | 117.2076 Ops/s | 118.9792 Ops/s | -1.49% |
| test_redq_speed | 15.9461ms | 13.9783ms | 71.5397 Ops/s | 70.1969 Ops/s | +1.91% |
| test_redq_deprec_speed | 14.8197ms | 13.4359ms | 74.4276 Ops/s | 74.2289 Ops/s | +0.27% |
| test_td3_speed | 8.7786ms | 8.4958ms | 117.7052 Ops/s | 120.3391 Ops/s | -2.19% |
| test_cql_speed | 38.0057ms | 36.8549ms | 27.1334 Ops/s | 27.4417 Ops/s | -1.12% |
| test_a2c_speed | 8.8432ms | 7.4467ms | 134.2869 Ops/s | 137.7526 Ops/s | -2.52% |
| test_ppo_speed | 8.9104ms | 7.7297ms | 129.3714 Ops/s | 130.3205 Ops/s | -0.73% |
| test_reinforce_speed | 7.6338ms | 6.6577ms | 150.2028 Ops/s | 150.8066 Ops/s | -0.40% |
| test_iql_speed | 34.2873ms | 32.9241ms | 30.3729 Ops/s | 30.6301 Ops/s | -0.84% |
| test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 5.3115ms | 3.4144ms | 292.8735 Ops/s | 295.6828 Ops/s | -0.95% |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 0.7406ms | 0.4909ms | 2.0373 KOps/s | 1.7878 KOps/s | **+13.96%** |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 3.8945ms | 0.4650ms | 2.1505 KOps/s | 2.1407 KOps/s | +0.46% |
| test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 5.2828ms | 3.4229ms | 292.1469 Ops/s | 299.4209 Ops/s | -2.43% |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 0.8917ms | 0.4837ms | 2.0675 KOps/s | 2.0649 KOps/s | +0.12% |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 0.8149ms | 0.4615ms | 2.1670 KOps/s | 2.1961 KOps/s | -1.33% |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] | 2.4840ms | 1.7286ms | 578.4966 Ops/s | 600.2405 Ops/s | -3.62% |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] | 1.9854ms | 1.6469ms | 607.1953 Ops/s | 623.0809 Ops/s | -2.55% |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 3.9718ms | 3.5345ms | 282.9246 Ops/s | 293.1909 Ops/s | -3.50% |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 1.2341ms | 0.6247ms | 1.6007 KOps/s | 1.6093 KOps/s | -0.54% |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 0.9066ms | 0.6029ms | 1.6585 KOps/s | 1.6668 KOps/s | -0.49% |
| test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 5.1618ms | 3.4044ms | 293.7374 Ops/s | 292.5491 Ops/s | +0.41% |
| test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 0.9672ms | 0.4864ms | 2.0559 KOps/s | 2.0427 KOps/s | +0.65% |
| test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 0.7326ms | 0.4732ms | 2.1133 KOps/s | 2.1586 KOps/s | -2.10% |
| test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 5.2156ms | 3.3891ms | 295.0678 Ops/s | 291.4815 Ops/s | +1.23% |
| test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 0.5746ms | 0.4813ms | 2.0778 KOps/s | 2.0613 KOps/s | +0.80% |
| test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 3.7276ms | 0.4662ms | 2.1450 KOps/s | 2.1779 KOps/s | -1.51% |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 4.3662ms | 3.5405ms | 282.4472 Ops/s | 283.0629 Ops/s | -0.22% |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 1.1784ms | 0.6273ms | 1.5940 KOps/s | 1.6064 KOps/s | -0.77% |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 0.8300ms | 0.6013ms | 1.6631 KOps/s | 1.6637 KOps/s | -0.03% |
| test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] | 0.1383s | 6.4829ms | 154.2520 Ops/s | 117.6903 Ops/s | **+31.07%** |
| test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] | 0.1597s | 15.4608ms | 64.6797 Ops/s | 80.8929 Ops/s | **-20.04%** |
| test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] | 3.6666ms | 1.1401ms | 877.0875 Ops/s | 972.0960 Ops/s | **-9.77%** |
| test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] | 0.1230s | 6.1727ms | 162.0040 Ops/s | 174.2111 Ops/s | **-7.01%** |
| test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] | 15.4546ms | 12.5859ms | 79.4541 Ops/s | 83.0276 Ops/s | -4.30% |
| test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] | 1.5154ms | 1.0415ms | 960.1140 Ops/s | 956.8717 Ops/s | +0.34% |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] | 0.1091s | 6.0538ms | 165.1862 Ops/s | 163.0899 Ops/s | +1.29% |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] | 15.4251ms | 12.8259ms | 77.9673 Ops/s | 80.1762 Ops/s | -2.76% |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] | 4.3216ms | 1.2789ms | 781.8920 Ops/s | 815.5396 Ops/s | -4.13% |
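
The Change column above appears to be the relative difference between the "Ops" and "Ops on Repo HEAD" columns. That formula is inferred from the numbers, not stated by the benchmark bot, and the helper name `relative_change_pct` is hypothetical. A minimal sketch that reproduces the first three CPU rows:

```python
def relative_change_pct(ops_pr, ops_head):
    """Relative change of the PR throughput versus repo HEAD, in percent."""
    return 100.0 * (ops_pr - ops_head) / ops_head


# (Ops, Ops on Repo HEAD) copied from the first three rows of the CPU table.
rows = {
    "test_single": (16.8855, 18.0148),
    "test_sync": (31.2196, 29.0733),
    "test_async": (34.1138, 34.5825),
}

if __name__ == "__main__":
    for name, (ops_pr, ops_head) in rows.items():
        print(f"{name}: {relative_change_pct(ops_pr, ops_head):+.2f}%")
```

A positive value means the PR branch completed more operations per second than HEAD.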


github-actions bot commented Jun 14, 2024

⚠ Warning: Result of GPU Benchmark Tests

Total Benchmarks: 94. Improved: 4. Worsened: 5.

Detailed results:

| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
| --- | --- | --- | --- | --- | --- |
| test_single | 0.1175s | 0.1173s | 8.5256 Ops/s | 8.5332 Ops/s | -0.09% |
| test_sync | 0.1058s | 0.1033s | 9.6775 Ops/s | 10.1744 Ops/s | -4.88% |
| test_async | 0.1988s | 0.1003s | 9.9675 Ops/s | 10.4190 Ops/s | -4.33% |
| test_single_pixels | 0.1284s | 0.1278s | 7.8234 Ops/s | 7.7698 Ops/s | +0.69% |
| test_sync_pixels | 89.7415ms | 84.0924ms | 11.8917 Ops/s | 12.3087 Ops/s | -3.39% |
| test_async_pixels | 0.1601s | 69.8413ms | 14.3182 Ops/s | 15.0135 Ops/s | -4.63% |
| test_simple | 0.8794s | 0.8180s | 1.2225 Ops/s | 1.2426 Ops/s | -1.62% |
| test_transformed | 1.1395s | 1.0780s | 0.9277 Ops/s | 0.9309 Ops/s | -0.35% |
| test_serial | 2.5408s | 2.4798s | 0.4033 Ops/s | 0.4022 Ops/s | +0.25% |
| test_parallel | 2.4179s | 2.3604s | 0.4236 Ops/s | 0.4234 Ops/s | +0.06% |
| test_step_mdp_speed[True-True-True-True-True] | 0.1015ms | 34.6777μs | 28.8370 KOps/s | 29.3974 KOps/s | -1.91% |
| test_step_mdp_speed[True-True-True-True-False] | 44.8620μs | 19.8730μs | 50.3195 KOps/s | 50.5187 KOps/s | -0.39% |
| test_step_mdp_speed[True-True-True-False-True] | 38.7300μs | 19.7522μs | 50.6274 KOps/s | 51.3364 KOps/s | -1.38% |
| test_step_mdp_speed[True-True-True-False-False] | 27.4200μs | 11.3384μs | 88.1957 KOps/s | 88.4101 KOps/s | -0.24% |
| test_step_mdp_speed[True-True-False-True-True] | 51.9310μs | 36.5485μs | 27.3609 KOps/s | 28.1476 KOps/s | -2.79% |
| test_step_mdp_speed[True-True-False-True-False] | 39.7310μs | 21.8642μs | 45.7369 KOps/s | 47.0170 KOps/s | -2.72% |
| test_step_mdp_speed[True-True-False-False-True] | 53.2600μs | 21.2967μs | 46.9557 KOps/s | 47.0611 KOps/s | -0.22% |
| test_step_mdp_speed[True-True-False-False-False] | 38.0010μs | 13.2619μs | 75.4038 KOps/s | 75.8870 KOps/s | -0.64% |
| test_step_mdp_speed[True-False-True-True-True] | 58.0410μs | 37.9796μs | 26.3299 KOps/s | 26.7121 KOps/s | -1.43% |
| test_step_mdp_speed[True-False-True-True-False] | 40.3810μs | 23.5614μs | 42.4423 KOps/s | 42.4101 KOps/s | +0.08% |
| test_step_mdp_speed[True-False-True-False-True] | 39.8710μs | 21.8132μs | 45.8439 KOps/s | 47.2785 KOps/s | -3.03% |
| test_step_mdp_speed[True-False-True-False-False] | 31.9600μs | 13.2042μs | 75.7336 KOps/s | 75.2700 KOps/s | +0.62% |
| test_step_mdp_speed[True-False-False-True-True] | 67.2010μs | 40.0701μs | 24.9562 KOps/s | 25.4958 KOps/s | -2.12% |
| test_step_mdp_speed[True-False-False-True-False] | 44.9910μs | 24.9882μs | 40.0190 KOps/s | 39.2739 KOps/s | +1.90% |
| test_step_mdp_speed[True-False-False-False-True] | 41.2300μs | 22.7438μs | 43.9680 KOps/s | 43.5983 KOps/s | +0.85% |
| test_step_mdp_speed[True-False-False-False-False] | 31.1700μs | 14.9397μs | 66.9358 KOps/s | 67.0887 KOps/s | -0.23% |
| test_step_mdp_speed[False-True-True-True-True] | 54.3300μs | 37.9407μs | 26.3569 KOps/s | 26.7114 KOps/s | -1.33% |
| test_step_mdp_speed[False-True-True-True-False] | 41.6620μs | 23.8722μs | 41.8898 KOps/s | 42.6934 KOps/s | -1.88% |
| test_step_mdp_speed[False-True-True-False-True] | 0.1915ms | 25.7233μs | 38.8753 KOps/s | 39.2559 KOps/s | -0.97% |
| test_step_mdp_speed[False-True-True-False-False] | 34.8600μs | 15.1624μs | 65.9527 KOps/s | 67.3926 KOps/s | -2.14% |
| test_step_mdp_speed[False-True-False-True-True] | 59.5110μs | 40.1862μs | 24.8842 KOps/s | 25.4845 KOps/s | -2.36% |
| test_step_mdp_speed[False-True-False-True-False] | 43.4910μs | 25.4521μs | 39.2894 KOps/s | 39.6933 KOps/s | -1.02% |
| test_step_mdp_speed[False-True-False-False-True] | 47.2310μs | 27.4193μs | 36.4707 KOps/s | 36.6633 KOps/s | -0.53% |
| test_step_mdp_speed[False-True-False-False-False] | 40.9400μs | 16.9128μs | 59.1270 KOps/s | 60.0393 KOps/s | -1.52% |
| test_step_mdp_speed[False-False-True-True-True] | 64.3710μs | 41.7882μs | 23.9302 KOps/s | 24.3642 KOps/s | -1.78% |
| test_step_mdp_speed[False-False-True-True-False] | 45.7700μs | 27.3276μs | 36.5930 KOps/s | 36.7907 KOps/s | -0.54% |
| test_step_mdp_speed[False-False-True-False-True] | 47.8410μs | 27.6957μs | 36.1067 KOps/s | 37.5462 KOps/s | -3.83% |
| test_step_mdp_speed[False-False-True-False-False] | 33.8410μs | 16.7383μs | 59.7431 KOps/s | 60.3449 KOps/s | -1.00% |
| test_step_mdp_speed[False-False-False-True-True] | 64.7610μs | 44.2091μs | 22.6198 KOps/s | 22.9041 KOps/s | -1.24% |
| test_step_mdp_speed[False-False-False-True-False] | 50.3410μs | 29.3589μs | 34.0612 KOps/s | 34.0527 KOps/s | +0.02% |
| test_step_mdp_speed[False-False-False-False-True] | 50.0500μs | 28.6516μs | 34.9021 KOps/s | 35.3558 KOps/s | -1.28% |
| test_step_mdp_speed[False-False-False-False-False] | 35.6300μs | 18.5333μs | 53.9570 KOps/s | 54.7167 KOps/s | -1.39% |
| test_values[generalized_advantage_estimate-True-True] | 24.0645ms | 23.5009ms | 42.5516 Ops/s | 42.2931 Ops/s | +0.61% |
| test_values[vec_generalized_advantage_estimate-True-True] | 89.6213ms | 2.6805ms | 373.0600 Ops/s | 364.9216 Ops/s | +2.23% |
| test_values[td0_return_estimate-False-False] | 89.8410μs | 64.6989μs | 15.4562 KOps/s | 15.4430 KOps/s | +0.09% |
| test_values[td1_return_estimate-False-False] | 53.5278ms | 53.0963ms | 18.8337 Ops/s | 18.7936 Ops/s | +0.21% |
| test_values[vec_td1_return_estimate-False-False] | 1.3287ms | 1.0662ms | 937.9255 Ops/s | 935.3004 Ops/s | +0.28% |
| test_values[td_lambda_return_estimate-True-False] | 87.0180ms | 84.6711ms | 11.8104 Ops/s | 11.7714 Ops/s | +0.33% |
| test_values[vec_td_lambda_return_estimate-True-False] | 1.4353ms | 1.0650ms | 938.9514 Ops/s | 937.3132 Ops/s | +0.17% |
| test_gae_speed[generalized_advantage_estimate-False-1-512] | 23.8513ms | 23.6582ms | 42.2687 Ops/s | 41.5246 Ops/s | +1.79% |
| test_gae_speed[vec_generalized_advantage_estimate-True-1-512] | 0.9291ms | 0.7006ms | 1.4272 KOps/s | 1.4051 KOps/s | +1.58% |
| test_gae_speed[vec_generalized_advantage_estimate-False-1-512] | 0.7490ms | 0.6543ms | 1.5284 KOps/s | 1.5082 KOps/s | +1.34% |
| test_gae_speed[vec_generalized_advantage_estimate-True-32-512] | 1.4879ms | 1.4543ms | 687.6153 Ops/s | 687.1469 Ops/s | +0.07% |
| test_gae_speed[vec_generalized_advantage_estimate-False-32-512] | 0.6928ms | 0.6689ms | 1.4949 KOps/s | 1.4940 KOps/s | +0.06% |
| test_dqn_speed | 2.8332ms | 1.4510ms | 689.1869 Ops/s | 696.2505 Ops/s | -1.01% |
| test_ddpg_speed | 3.0951ms | 2.9606ms | 337.7639 Ops/s | 343.2048 Ops/s | -1.59% |
| test_sac_speed | 8.5514ms | 8.3353ms | 119.9712 Ops/s | 118.7998 Ops/s | +0.99% |
| test_redq_speed | 0.1024s | 11.7741ms | 84.9325 Ops/s | 93.7374 Ops/s | **-9.39%** |
| test_redq_deprec_speed | 12.3327ms | 11.6910ms | 85.5359 Ops/s | 87.4485 Ops/s | -2.19% |
| test_td3_speed | 8.5480ms | 8.3577ms | 119.6506 Ops/s | 119.6194 Ops/s | +0.03% |
| test_cql_speed | 26.0323ms | 25.3865ms | 39.3911 Ops/s | 39.0816 Ops/s | +0.79% |
| test_a2c_speed | 6.9041ms | 5.6856ms | 175.8842 Ops/s | 181.4542 Ops/s | -3.07% |
| test_ppo_speed | 6.2317ms | 5.9792ms | 167.2457 Ops/s | 172.1726 Ops/s | -2.86% |
| test_reinforce_speed | 5.3672ms | 4.6173ms | 216.5748 Ops/s | 216.4199 Ops/s | +0.07% |
| test_iql_speed | 20.2436ms | 19.5416ms | 51.1729 Ops/s | 50.8615 Ops/s | +0.61% |
| test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 4.8099ms | 4.6532ms | 214.9078 Ops/s | 216.0321 Ops/s | -0.52% |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 0.1075s | 0.6938ms | 1.4413 KOps/s | 1.6835 KOps/s | **-14.38%** |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 0.7240ms | 0.5731ms | 1.7449 KOps/s | 1.7403 KOps/s | +0.26% |
| test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 4.8178ms | 4.6115ms | 216.8501 Ops/s | 217.6854 Ops/s | -0.38% |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 1.2738ms | 0.5908ms | 1.6926 KOps/s | 1.7065 KOps/s | -0.82% |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 0.7381ms | 0.5680ms | 1.7607 KOps/s | 1.7712 KOps/s | -0.59% |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] | 4.4607ms | 2.0773ms | 481.4035 Ops/s | 482.2085 Ops/s | -0.17% |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] | 2.1255ms | 1.9622ms | 509.6276 Ops/s | 510.7211 Ops/s | -0.21% |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 4.8990ms | 4.7772ms | 209.3255 Ops/s | 212.0965 Ops/s | -1.31% |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 0.8643ms | 0.7451ms | 1.3422 KOps/s | 1.2744 KOps/s | **+5.32%** |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 4.4420ms | 0.7319ms | 1.3664 KOps/s | 1.3008 KOps/s | **+5.04%** |
| test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 4.7710ms | 4.6340ms | 215.7942 Ops/s | 216.6947 Ops/s | -0.42% |
| test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 0.7285ms | 0.5969ms | 1.6753 KOps/s | 1.6578 KOps/s | +1.06% |
| test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 0.7261ms | 0.5724ms | 1.7471 KOps/s | 1.7329 KOps/s | +0.82% |
| test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 4.7748ms | 4.5687ms | 218.8797 Ops/s | 219.1000 Ops/s | -0.10% |
| test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 0.7603ms | 0.5936ms | 1.6845 KOps/s | 1.6813 KOps/s | +0.19% |
| test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 0.1391s | 0.7809ms | 1.2806 KOps/s | 1.7543 KOps/s | **-27.00%** |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 4.9728ms | 4.7924ms | 208.6652 Ops/s | 209.9341 Ops/s | -0.60% |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 0.9044ms | 0.7494ms | 1.3343 KOps/s | 1.3435 KOps/s | -0.68% |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 0.9021ms | 0.7263ms | 1.3769 KOps/s | 1.3921 KOps/s | -1.09% |
| test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] | 0.1285s | 7.4243ms | 134.6924 Ops/s | 135.5177 Ops/s | -0.61% |
| test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] | 18.0941ms | 15.6439ms | 63.9227 Ops/s | 63.0259 Ops/s | +1.42% |
| test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] | 1.3607ms | 1.2820ms | 780.0296 Ops/s | 753.6734 Ops/s | +3.50% |
| test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] | 0.1243s | 7.3313ms | 136.4023 Ops/s | 102.9270 Ops/s | **+32.52%** |
| test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] | 0.1382s | 18.2585ms | 54.7690 Ops/s | 62.3783 Ops/s | **-12.20%** |
| test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] | 7.5703ms | 1.4118ms | 708.3106 Ops/s | 769.1381 Ops/s | **-7.91%** |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] | 0.1245s | 7.5022ms | 133.2945 Ops/s | 133.7177 Ops/s | -0.32% |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] | 18.3551ms | 15.8477ms | 63.1008 Ops/s | 63.2428 Ops/s | -0.22% |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] | 2.5474ms | 1.4761ms | 677.4635 Ops/s | 636.8191 Ops/s | **+6.38%** |
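
The "Improved" and "Worsened" counts in the summary line seem to match the emphasized (bold) entries in the Change column, which in both tables are exactly those with a magnitude of at least 5%. The 5% threshold is an assumption read off the data, not something the bot states, and `tally` is a hypothetical helper. A small sketch using the nine emphasized GPU values plus three non-emphasized ones for contrast:

```python
THRESHOLD_PCT = 5.0  # assumed cut-off, inferred from which rows are emphasized


def tally(changes_pct, threshold=THRESHOLD_PCT):
    """Count changes at or beyond +threshold (improved) and -threshold (worsened)."""
    improved = sum(1 for c in changes_pct if c >= threshold)
    worsened = sum(1 for c in changes_pct if c <= -threshold)
    return improved, worsened


# Change values (percent) copied from the GPU table: the nine emphasized
# entries, followed by three ordinary ones that stay below the threshold.
gpu_changes = [
    -9.39, -14.38, +5.32, +5.04, -27.00, +32.52, -12.20, -7.91, +6.38,
    -0.09, -4.88, -4.33,
]

if __name__ == "__main__":
    improved, worsened = tally(gpu_changes)
    print(f"Improved: {improved}. Worsened: {worsened}.")  # Improved: 4. Worsened: 5.
```

Every GPU row not listed here has a change smaller than 5% in magnitude, so adding them would not alter the counts under this assumed threshold.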

@vmoens added the performance label Jun 21, 2024
@vmoens merged commit 443620f into main Jun 28, 2024
53 of 58 checks passed
@vmoens deleted the consolidate branch June 28, 2024 13:20