
[Feature] Split-trajectories and represent as nested tensor #2043

Merged: vmoens merged 10 commits into main from nested-tensor-splittraj on Jun 28, 2024


@vmoens (Contributor) commented Mar 27, 2024

TODO:

  • Doc


pytorch-bot bot commented Mar 27, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/2043

Note: Links to docs will display an error until the docs builds have been completed.

❌ 5 New Failures

As of commit fedf22f with merge base 1083b35:

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot added the CLA Signed label Mar 27, 2024 (this label is managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed).

github-actions bot commented Mar 27, 2024

⚠ Warning: Result of CPU Benchmark Tests

Total Benchmarks: 91. Improved: 1. Worsened: 7.

| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
|---|---|---|---|---|---|
| test_single | 0.1177s | 58.9363ms | 16.9675 Ops/s | 18.3733 Ops/s | **-7.65%** |
| test_sync | 31.5695ms | 30.6515ms | 32.6249 Ops/s | 29.7085 Ops/s | **+9.82%** |
| test_async | 46.6053ms | 29.3146ms | 34.1128 Ops/s | 35.3836 Ops/s | -3.59% |
| test_simple | 0.3800s | 0.3790s | 2.6388 Ops/s | 2.6668 Ops/s | -1.05% |
| test_transformed | 0.5327s | 0.5278s | 1.8948 Ops/s | 1.8815 Ops/s | +0.70% |
| test_serial | 1.3411s | 1.2790s | 0.7818 Ops/s | 0.7976 Ops/s | -1.97% |
| test_parallel | 1.1584s | 1.0889s | 0.9184 Ops/s | 0.9335 Ops/s | -1.62% |
| test_step_mdp_speed[True-True-True-True-True] | 64.6610μs | 22.6804μs | 44.0908 KOps/s | 44.3902 KOps/s | -0.67% |
| test_step_mdp_speed[True-True-True-True-False] | 42.6700μs | 13.2889μs | 75.2506 KOps/s | 74.4898 KOps/s | +1.02% |
| test_step_mdp_speed[True-True-True-False-True] | 38.0410μs | 13.3258μs | 75.0424 KOps/s | 75.6954 KOps/s | -0.86% |
| test_step_mdp_speed[True-True-True-False-False] | 29.0850μs | 7.8911μs | 126.7251 KOps/s | 128.1920 KOps/s | -1.14% |
| test_step_mdp_speed[True-True-False-True-True] | 71.1240μs | 24.2011μs | 41.3205 KOps/s | 41.4742 KOps/s | -0.37% |
| test_step_mdp_speed[True-True-False-True-False] | 41.1270μs | 14.5619μs | 68.6722 KOps/s | 67.8641 KOps/s | +1.19% |
| test_step_mdp_speed[True-True-False-False-True] | 36.6890μs | 14.5813μs | 68.5809 KOps/s | 68.1302 KOps/s | +0.66% |
| test_step_mdp_speed[True-True-False-False-False] | 56.2950μs | 9.1432μs | 109.3713 KOps/s | 110.0159 KOps/s | -0.59% |
| test_step_mdp_speed[True-False-True-True-True] | 67.1760μs | 25.3777μs | 39.4047 KOps/s | 39.5498 KOps/s | -0.37% |
| test_step_mdp_speed[True-False-True-True-False] | 46.3470μs | 15.8971μs | 62.9047 KOps/s | 62.0207 KOps/s | +1.43% |
| test_step_mdp_speed[True-False-True-False-True] | 42.2480μs | 14.7273μs | 67.9010 KOps/s | 69.1462 KOps/s | -1.80% |
| test_step_mdp_speed[True-False-True-False-False] | 58.4060μs | 9.1567μs | 109.2101 KOps/s | 110.1174 KOps/s | -0.82% |
| test_step_mdp_speed[True-False-False-True-True] | 59.3020μs | 26.6697μs | 37.4958 KOps/s | 37.5527 KOps/s | -0.15% |
| test_step_mdp_speed[True-False-False-True-False] | 55.4640μs | 17.1608μs | 58.2725 KOps/s | 57.4144 KOps/s | +1.49% |
| test_step_mdp_speed[True-False-False-False-True] | 48.0500μs | 15.7957μs | 63.3083 KOps/s | 63.9247 KOps/s | -0.96% |
| test_step_mdp_speed[True-False-False-False-False] | 60.1020μs | 10.3215μs | 96.8853 KOps/s | 97.4295 KOps/s | -0.56% |
| test_step_mdp_speed[False-True-True-True-True] | 61.6550μs | 25.6210μs | 39.0304 KOps/s | 39.3234 KOps/s | -0.75% |
| test_step_mdp_speed[False-True-True-True-False] | 62.4970μs | 15.9503μs | 62.6946 KOps/s | 61.5153 KOps/s | +1.92% |
| test_step_mdp_speed[False-True-True-False-True] | 56.5460μs | 17.0852μs | 58.5303 KOps/s | 59.5770 KOps/s | -1.76% |
| test_step_mdp_speed[False-True-True-False-False] | 44.2630μs | 10.3670μs | 96.4600 KOps/s | 97.4146 KOps/s | -0.98% |
| test_step_mdp_speed[False-True-False-True-True] | 88.5060μs | 26.7586μs | 37.3711 KOps/s | 37.4992 KOps/s | -0.34% |
| test_step_mdp_speed[False-True-False-True-False] | 51.0450μs | 17.1904μs | 58.1722 KOps/s | 58.2807 KOps/s | -0.19% |
| test_step_mdp_speed[False-True-False-False-True] | 0.1257ms | 18.0186μs | 55.4981 KOps/s | 55.1389 KOps/s | +0.65% |
| test_step_mdp_speed[False-True-False-False-False] | 38.0410μs | 11.6208μs | 86.0524 KOps/s | 87.2033 KOps/s | -1.32% |
| test_step_mdp_speed[False-False-True-True-True] | 60.1720μs | 28.2893μs | 35.3491 KOps/s | 35.9096 KOps/s | -1.56% |
| test_step_mdp_speed[False-False-True-True-False] | 80.4410μs | 18.7299μs | 53.3906 KOps/s | 53.5972 KOps/s | -0.39% |
| test_step_mdp_speed[False-False-True-False-True] | 53.1300μs | 18.1974μs | 54.9528 KOps/s | 54.8551 KOps/s | +0.18% |
| test_step_mdp_speed[False-False-True-False-False] | 74.7900μs | 11.5849μs | 86.3192 KOps/s | 86.3038 KOps/s | +0.02% |
| test_step_mdp_speed[False-False-False-True-True] | 40.3260μs | 29.5916μs | 33.7934 KOps/s | 33.8547 KOps/s | -0.18% |
| test_step_mdp_speed[False-False-False-True-False] | 76.9240μs | 19.5963μs | 51.0301 KOps/s | 50.9637 KOps/s | +0.13% |
| test_step_mdp_speed[False-False-False-False-True] | 50.2430μs | 19.1918μs | 52.1056 KOps/s | 52.2626 KOps/s | -0.30% |
| test_step_mdp_speed[False-False-False-False-False] | 70.1720μs | 12.5586μs | 79.6268 KOps/s | 79.0255 KOps/s | +0.76% |
| test_values[generalized_advantage_estimate-True-True] | 10.1794ms | 9.8567ms | 101.4536 Ops/s | 106.9063 Ops/s | **-5.10%** |
| test_values[vec_generalized_advantage_estimate-True-True] | 37.9128ms | 35.2077ms | 28.4029 Ops/s | 28.2846 Ops/s | +0.42% |
| test_values[td0_return_estimate-False-False] | 0.2802ms | 0.1759ms | 5.6851 KOps/s | 5.9580 KOps/s | -4.58% |
| test_values[td1_return_estimate-False-False] | 27.3622ms | 24.4041ms | 40.9768 Ops/s | 42.8545 Ops/s | -4.38% |
| test_values[vec_td1_return_estimate-False-False] | 36.8570ms | 35.2215ms | 28.3917 Ops/s | 28.4969 Ops/s | -0.37% |
| test_values[td_lambda_return_estimate-True-False] | 35.2837ms | 34.3686ms | 29.0963 Ops/s | 29.6989 Ops/s | -2.03% |
| test_values[vec_td_lambda_return_estimate-True-False] | 36.3559ms | 35.2005ms | 28.4087 Ops/s | 28.4779 Ops/s | -0.24% |
| test_gae_speed[generalized_advantage_estimate-False-1-512] | 11.3994ms | 8.5052ms | 117.5755 Ops/s | 120.6351 Ops/s | -2.54% |
| test_gae_speed[vec_generalized_advantage_estimate-True-1-512] | 2.2766ms | 1.9438ms | 514.4468 Ops/s | 548.2940 Ops/s | **-6.17%** |
| test_gae_speed[vec_generalized_advantage_estimate-False-1-512] | 0.5433ms | 0.3575ms | 2.7972 KOps/s | 2.8465 KOps/s | -1.73% |
| test_gae_speed[vec_generalized_advantage_estimate-True-32-512] | 49.1113ms | 46.7246ms | 21.4020 Ops/s | 21.5345 Ops/s | -0.62% |
| test_gae_speed[vec_generalized_advantage_estimate-False-32-512] | 3.6236ms | 3.0659ms | 326.1632 Ops/s | 329.1358 Ops/s | -0.90% |
| test_dqn_speed | 6.7809ms | 1.3468ms | 742.5062 Ops/s | 760.2178 Ops/s | -2.33% |
| test_ddpg_speed | 3.7809ms | 2.8642ms | 349.1335 Ops/s | 357.2617 Ops/s | -2.28% |
| test_sac_speed | 9.6893ms | 8.7114ms | 114.7921 Ops/s | 115.2295 Ops/s | -0.38% |
| test_redq_speed | 16.0194ms | 14.0789ms | 71.0282 Ops/s | 72.7380 Ops/s | -2.35% |
| test_redq_deprec_speed | 0.1098s | 15.8387ms | 63.1363 Ops/s | 74.2895 Ops/s | **-15.01%** |
| test_td3_speed | 9.3109ms | 8.5299ms | 117.2347 Ops/s | 118.7561 Ops/s | -1.28% |
| test_cql_speed | 37.9490ms | 37.0601ms | 26.9832 Ops/s | 27.2012 Ops/s | -0.80% |
| test_a2c_speed | 8.9949ms | 7.5480ms | 132.4858 Ops/s | 134.0009 Ops/s | -1.13% |
| test_ppo_speed | 9.0382ms | 7.7944ms | 128.2966 Ops/s | 128.0878 Ops/s | +0.16% |
| test_reinforce_speed | 7.5464ms | 6.7332ms | 148.5175 Ops/s | 148.8444 Ops/s | -0.22% |
| test_iql_speed | 33.9415ms | 32.9847ms | 30.3171 Ops/s | 30.6347 Ops/s | -1.04% |
| test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 3.8827ms | 3.6417ms | 274.5975 Ops/s | 280.5338 Ops/s | -2.12% |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 0.8811ms | 0.5118ms | 1.9540 KOps/s | 2.0330 KOps/s | -3.89% |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 0.1158s | 0.5446ms | 1.8361 KOps/s | 2.1393 KOps/s | **-14.17%** |
| test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 4.0428ms | 3.6020ms | 277.6263 Ops/s | 284.9831 Ops/s | -2.58% |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 0.9856ms | 0.4965ms | 2.0141 KOps/s | 2.0469 KOps/s | -1.60% |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 0.7152ms | 0.4691ms | 2.1317 KOps/s | 2.1311 KOps/s | +0.03% |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] | 1.9284ms | 1.7411ms | 574.3502 Ops/s | 585.5708 Ops/s | -1.92% |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] | 2.4610ms | 1.6527ms | 605.0795 Ops/s | 618.3040 Ops/s | -2.14% |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 4.1398ms | 3.7605ms | 265.9232 Ops/s | 266.2985 Ops/s | -0.14% |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 1.1373ms | 0.6414ms | 1.5592 KOps/s | 1.6012 KOps/s | -2.62% |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 0.8847ms | 0.6115ms | 1.6354 KOps/s | 1.6477 KOps/s | -0.74% |
| test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 3.9224ms | 3.6566ms | 273.4809 Ops/s | 273.2645 Ops/s | +0.08% |
| test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 1.0165ms | 0.5034ms | 1.9863 KOps/s | 2.0044 KOps/s | -0.90% |
| test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 0.7725ms | 0.4840ms | 2.0660 KOps/s | 2.1101 KOps/s | -2.09% |
| test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 4.6145ms | 3.7768ms | 264.7765 Ops/s | 281.7038 Ops/s | **-6.01%** |
| test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 0.5905ms | 0.4976ms | 2.0095 KOps/s | 2.0354 KOps/s | -1.27% |
| test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 3.8632ms | 0.4846ms | 2.0636 KOps/s | 2.1277 KOps/s | -3.01% |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 4.0916ms | 3.8134ms | 262.2308 Ops/s | 269.1923 Ops/s | -2.59% |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 1.1724ms | 0.6419ms | 1.5579 KOps/s | 1.5725 KOps/s | -0.93% |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 0.8348ms | 0.6122ms | 1.6335 KOps/s | 1.5979 KOps/s | +2.23% |
| test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] | 0.1248s | 8.3273ms | 120.0869 Ops/s | 115.9236 Ops/s | +3.59% |
| test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] | 15.6027ms | 12.7353ms | 78.5221 Ops/s | 79.6281 Ops/s | -1.39% |
| test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] | 4.4455ms | 1.2317ms | 811.9131 Ops/s | 961.6608 Ops/s | **-15.57%** |
| test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] | 0.1044s | 5.7754ms | 173.1487 Ops/s | 167.9057 Ops/s | +3.12% |
| test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] | 15.1134ms | 12.7796ms | 78.2498 Ops/s | 80.4645 Ops/s | -2.75% |
| test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] | 1.7512ms | 1.0750ms | 930.2172 Ops/s | 901.3093 Ops/s | +3.21% |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] | 0.1130s | 6.1175ms | 163.4648 Ops/s | 167.5368 Ops/s | -2.43% |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] | 15.3955ms | 12.8828ms | 77.6227 Ops/s | 79.1766 Ops/s | -1.96% |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] | 1.7104ms | 1.2235ms | 817.3240 Ops/s | 837.3199 Ops/s | -2.39% |
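The Ops and Change columns can be recomputed from the rest of the table. The relations below are inferred from the numbers themselves, not from the benchmark bot's source: Ops appears to be the reciprocal of Mean, and Change appears to be the relative difference between Ops and Ops on Repo HEAD. The flagged (bold) rows are consistent with an assumed ±5% threshold, which also reproduces the "Improved: 1, Worsened: 7" summary. The helper names and the threshold are hypothetical.

```python
# Sketch of how the benchmark columns appear to relate. Inferred from the table
# values; this is NOT the benchmark bot's actual code.

def ops_per_second(mean_seconds: float) -> float:
    """Throughput implied by a mean runtime: Ops = 1 / Mean."""
    return 1.0 / mean_seconds


def percent_change(ops: float, ops_head: float) -> float:
    """Relative throughput change of this PR versus repo HEAD, in percent."""
    return (ops - ops_head) / ops_head * 100.0


def classify(change: float, threshold: float = 5.0) -> str:
    """Assumed flagging rule: only changes beyond +/- threshold percent count."""
    if change > threshold:
        return "improved"
    if change < -threshold:
        return "worsened"
    return "unchanged"


# test_single (CPU): Mean = 58.9363 ms, Ops = 16.9675 Ops/s, HEAD = 18.3733 Ops/s
print(round(ops_per_second(58.9363e-3), 4))        # 16.9675
print(round(percent_change(16.9675, 18.3733), 2))  # -7.65
print(classify(percent_change(16.9675, 18.3733)))  # worsened
```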


github-actions bot commented Mar 27, 2024

⚠ Warning: Result of GPU Benchmark Tests

Total Benchmarks: 94. Improved: 1. Worsened: 3.

| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
|---|---|---|---|---|---|
| test_single | 0.1176s | 0.1173s | 8.5286 Ops/s | 8.6411 Ops/s | -1.30% |
| test_sync | 0.1044s | 0.1020s | 9.7994 Ops/s | 9.6976 Ops/s | +1.05% |
| test_async | 0.1865s | 93.3000ms | 10.7181 Ops/s | 10.1942 Ops/s | **+5.14%** |
| test_single_pixels | 0.1286s | 0.1284s | 7.7877 Ops/s | 7.8551 Ops/s | -0.86% |
| test_sync_pixels | 85.2218ms | 81.0271ms | 12.3416 Ops/s | 12.2743 Ops/s | +0.55% |
| test_async_pixels | 0.1592s | 68.3696ms | 14.6264 Ops/s | 14.1283 Ops/s | +3.53% |
| test_simple | 0.8864s | 0.8225s | 1.2157 Ops/s | 1.2485 Ops/s | -2.63% |
| test_transformed | 1.1363s | 1.0760s | 0.9293 Ops/s | 0.9378 Ops/s | -0.91% |
| test_serial | 2.5594s | 2.5035s | 0.3994 Ops/s | 0.4053 Ops/s | -1.44% |
| test_parallel | 2.4174s | 2.3687s | 0.4222 Ops/s | 0.4213 Ops/s | +0.21% |
| test_step_mdp_speed[True-True-True-True-True] | 81.4730μs | 34.9127μs | 28.6429 KOps/s | 29.6697 KOps/s | -3.46% |
| test_step_mdp_speed[True-True-True-True-False] | 43.5710μs | 19.9456μs | 50.1363 KOps/s | 50.5212 KOps/s | -0.76% |
| test_step_mdp_speed[True-True-True-False-True] | 44.0710μs | 19.7299μs | 50.6845 KOps/s | 51.7963 KOps/s | -2.15% |
| test_step_mdp_speed[True-True-True-False-False] | 43.6920μs | 11.3212μs | 88.3297 KOps/s | 89.1527 KOps/s | -0.92% |
| test_step_mdp_speed[True-True-False-True-True] | 81.4140μs | 36.4597μs | 27.4275 KOps/s | 28.2447 KOps/s | -2.89% |
| test_step_mdp_speed[True-True-False-True-False] | 42.3420μs | 22.0446μs | 45.3625 KOps/s | 46.0232 KOps/s | -1.44% |
| test_step_mdp_speed[True-True-False-False-True] | 45.5810μs | 21.5452μs | 46.4140 KOps/s | 47.2947 KOps/s | -1.86% |
| test_step_mdp_speed[True-True-False-False-False] | 32.3020μs | 13.1193μs | 76.2234 KOps/s | 77.3478 KOps/s | -1.45% |
| test_step_mdp_speed[True-False-True-True-True] | 65.2320μs | 38.4936μs | 25.9783 KOps/s | 26.6012 KOps/s | -2.34% |
| test_step_mdp_speed[True-False-True-True-False] | 44.5720μs | 23.6613μs | 42.2632 KOps/s | 42.5413 KOps/s | -0.65% |
| test_step_mdp_speed[True-False-True-False-True] | 87.7330μs | 21.5618μs | 46.3784 KOps/s | 47.4728 KOps/s | -2.31% |
| test_step_mdp_speed[True-False-True-False-False] | 36.2210μs | 13.1605μs | 75.9850 KOps/s | 76.6381 KOps/s | -0.85% |
| test_step_mdp_speed[True-False-False-True-True] | 64.2230μs | 39.7497μs | 25.1574 KOps/s | 25.3262 KOps/s | -0.67% |
| test_step_mdp_speed[True-False-False-True-False] | 58.1620μs | 25.3218μs | 39.4917 KOps/s | 40.2167 KOps/s | -1.80% |
| test_step_mdp_speed[True-False-False-False-True] | 44.9020μs | 23.0887μs | 43.3113 KOps/s | 44.5463 KOps/s | -2.77% |
| test_step_mdp_speed[True-False-False-False-False] | 41.0310μs | 15.0042μs | 66.6480 KOps/s | 67.2305 KOps/s | -0.87% |
| test_step_mdp_speed[False-True-True-True-True] | 0.1196ms | 38.3130μs | 26.1008 KOps/s | 27.1732 KOps/s | -3.95% |
| test_step_mdp_speed[False-True-True-True-False] | 46.3920μs | 23.8450μs | 41.9376 KOps/s | 42.8070 KOps/s | -2.03% |
| test_step_mdp_speed[False-True-True-False-True] | 49.3720μs | 25.6041μs | 39.0562 KOps/s | 40.4646 KOps/s | -3.48% |
| test_step_mdp_speed[False-True-True-False-False] | 41.4420μs | 14.9649μs | 66.8228 KOps/s | 67.9087 KOps/s | -1.60% |
| test_step_mdp_speed[False-True-False-True-True] | 66.0020μs | 39.8486μs | 25.0950 KOps/s | 25.6664 KOps/s | -2.23% |
| test_step_mdp_speed[False-True-False-True-False] | 52.5420μs | 25.3191μs | 39.4958 KOps/s | 39.8862 KOps/s | -0.98% |
| test_step_mdp_speed[False-True-False-False-True] | 51.1220μs | 27.4297μs | 36.4568 KOps/s | 37.0942 KOps/s | -1.72% |
| test_step_mdp_speed[False-True-False-False-False] | 37.6920μs | 16.7233μs | 59.7967 KOps/s | 60.5128 KOps/s | -1.18% |
| test_step_mdp_speed[False-False-True-True-True] | 70.3820μs | 42.2538μs | 23.6665 KOps/s | 24.2283 KOps/s | -2.32% |
| test_step_mdp_speed[False-False-True-True-False] | 47.2110μs | 27.1649μs | 36.8122 KOps/s | 37.0314 KOps/s | -0.59% |
| test_step_mdp_speed[False-False-True-False-True] | 54.1920μs | 27.4027μs | 36.4928 KOps/s | 37.3710 KOps/s | -2.35% |
| test_step_mdp_speed[False-False-True-False-False] | 36.9320μs | 16.6156μs | 60.1844 KOps/s | 60.6175 KOps/s | -0.71% |
| test_step_mdp_speed[False-False-False-True-True] | 75.9230μs | 44.2131μs | 22.6177 KOps/s | 22.6856 KOps/s | -0.30% |
| test_step_mdp_speed[False-False-False-True-False] | 53.8920μs | 28.9919μs | 34.4924 KOps/s | 34.2442 KOps/s | +0.72% |
| test_step_mdp_speed[False-False-False-False-True] | 52.9430μs | 28.8510μs | 34.6608 KOps/s | 35.2580 KOps/s | -1.69% |
| test_step_mdp_speed[False-False-False-False-False] | 38.7810μs | 18.6746μs | 53.5487 KOps/s | 54.9798 KOps/s | -2.60% |
| test_values[generalized_advantage_estimate-True-True] | 25.2021ms | 24.8311ms | 40.2720 Ops/s | 39.7264 Ops/s | +1.37% |
| test_values[vec_generalized_advantage_estimate-True-True] | 88.3040ms | 2.6594ms | 376.0255 Ops/s | 366.2518 Ops/s | +2.67% |
| test_values[td0_return_estimate-False-False] | 98.8340μs | 67.4336μs | 14.8294 KOps/s | 14.8048 KOps/s | +0.17% |
| test_values[td1_return_estimate-False-False] | 56.6738ms | 55.5031ms | 18.0170 Ops/s | 17.7036 Ops/s | +1.77% |
| test_values[vec_td1_return_estimate-False-False] | 1.4084ms | 1.0807ms | 925.3255 Ops/s | 921.6397 Ops/s | +0.40% |
| test_values[td_lambda_return_estimate-True-False] | 89.3289ms | 87.6255ms | 11.4122 Ops/s | 11.2256 Ops/s | +1.66% |
| test_values[vec_td_lambda_return_estimate-True-False] | 1.4349ms | 1.0764ms | 929.0151 Ops/s | 923.6470 Ops/s | +0.58% |
| test_gae_speed[generalized_advantage_estimate-False-1-512] | 25.0212ms | 24.8030ms | 40.3177 Ops/s | 39.6899 Ops/s | +1.58% |
| test_gae_speed[vec_generalized_advantage_estimate-True-1-512] | 0.9848ms | 0.7359ms | 1.3590 KOps/s | 1.3764 KOps/s | -1.27% |
| test_gae_speed[vec_generalized_advantage_estimate-False-1-512] | 0.8133ms | 0.6676ms | 1.4978 KOps/s | 1.4892 KOps/s | +0.58% |
| test_gae_speed[vec_generalized_advantage_estimate-True-32-512] | 1.6255ms | 1.4681ms | 681.1580 Ops/s | 678.9041 Ops/s | +0.33% |
| test_gae_speed[vec_generalized_advantage_estimate-False-32-512] | 0.8253ms | 0.6812ms | 1.4680 KOps/s | 1.4567 KOps/s | +0.77% |
| test_dqn_speed | 1.7603ms | 1.4728ms | 678.9696 Ops/s | 685.7335 Ops/s | -0.99% |
| test_ddpg_speed | 3.3497ms | 3.0052ms | 332.7536 Ops/s | 336.4349 Ops/s | -1.09% |
| test_sac_speed | 9.1082ms | 8.5960ms | 116.3328 Ops/s | 117.6222 Ops/s | -1.10% |
| test_redq_speed | 0.1069s | 12.1196ms | 82.5108 Ops/s | 92.0605 Ops/s | **-10.37%** |
| test_redq_deprec_speed | 12.4419ms | 11.7893ms | 84.8226 Ops/s | 84.4612 Ops/s | +0.43% |
| test_td3_speed | 8.7309ms | 8.5920ms | 116.3880 Ops/s | 117.1166 Ops/s | -0.62% |
| test_cql_speed | 27.0314ms | 26.1488ms | 38.2426 Ops/s | 38.4593 Ops/s | -0.56% |
| test_a2c_speed | 6.0498ms | 5.8214ms | 171.7797 Ops/s | 172.9682 Ops/s | -0.69% |
| test_ppo_speed | 6.4352ms | 6.1271ms | 163.2092 Ops/s | 163.7845 Ops/s | -0.35% |
| test_reinforce_speed | 5.5939ms | 4.7160ms | 212.0442 Ops/s | 207.6656 Ops/s | +2.11% |
| test_iql_speed | 20.5470ms | 19.9865ms | 50.0337 Ops/s | 49.6300 Ops/s | +0.81% |
| test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 4.8105ms | 4.6420ms | 215.4254 Ops/s | 213.5458 Ops/s | +0.88% |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 0.1099s | 0.6996ms | 1.4294 KOps/s | 1.6726 KOps/s | **-14.54%** |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 0.7821ms | 0.5864ms | 1.7053 KOps/s | 1.7402 KOps/s | -2.00% |
| test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 4.9853ms | 4.6096ms | 216.9374 Ops/s | 217.0073 Ops/s | -0.03% |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 1.2792ms | 0.5996ms | 1.6677 KOps/s | 1.6956 KOps/s | -1.64% |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 0.7616ms | 0.5737ms | 1.7431 KOps/s | 1.7679 KOps/s | -1.41% |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] | 2.7313ms | 2.1553ms | 463.9803 Ops/s | 466.5706 Ops/s | -0.56% |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] | 2.2744ms | 2.0619ms | 484.9785 Ops/s | 486.2744 Ops/s | -0.27% |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 5.5436ms | 4.8277ms | 207.1364 Ops/s | 209.4074 Ops/s | -1.08% |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 0.9844ms | 0.7568ms | 1.3214 KOps/s | 1.3367 KOps/s | -1.15% |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 4.6345ms | 0.7344ms | 1.3617 KOps/s | 1.3771 KOps/s | -1.12% |
| test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 4.8444ms | 4.6798ms | 213.6837 Ops/s | 215.0638 Ops/s | -0.64% |
| test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 0.7430ms | 0.6108ms | 1.6372 KOps/s | 1.6706 KOps/s | -2.00% |
| test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 0.7716ms | 0.5902ms | 1.6943 KOps/s | 1.7472 KOps/s | -3.03% |
| test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 4.9094ms | 4.6319ms | 215.8931 Ops/s | 218.3186 Ops/s | -1.11% |
| test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 0.7928ms | 0.6032ms | 1.6579 KOps/s | 1.6991 KOps/s | -2.42% |
| test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 0.1460s | 0.8157ms | 1.2260 KOps/s | 1.7468 KOps/s | **-29.81%** |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 5.0297ms | 4.8186ms | 207.5310 Ops/s | 209.6702 Ops/s | -1.02% |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 0.9552ms | 0.7624ms | 1.3116 KOps/s | 1.3381 KOps/s | -1.98% |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 0.9191ms | 0.7403ms | 1.3509 KOps/s | 1.3751 KOps/s | -1.76% |
| test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] | 0.1324s | 7.5654ms | 132.1816 Ops/s | 132.8713 Ops/s | -0.52% |
| test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] | 18.6826ms | 16.1127ms | 62.0629 Ops/s | 63.0198 Ops/s | -1.52% |
| test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] | 1.4540ms | 1.3174ms | 759.0558 Ops/s | 777.1983 Ops/s | -2.33% |
| test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] | 0.1296s | 7.5428ms | 132.5770 Ops/s | 133.4412 Ops/s | -0.65% |
| test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] | 0.1442s | 18.6245ms | 53.6927 Ops/s | 54.5586 Ops/s | -1.59% |
| test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] | 7.6811ms | 1.4398ms | 694.5408 Ops/s | 705.5199 Ops/s | -1.56% |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] | 0.1318s | 7.7436ms | 129.1395 Ops/s | 130.9102 Ops/s | -1.35% |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] | 18.5651ms | 16.3021ms | 61.3418 Ops/s | 62.8656 Ops/s | -2.42% |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] | 2.5999ms | 1.5263ms | 655.1975 Ops/s | 672.5030 Ops/s | -2.57% |

@vmoens added the enhancement (New feature or request) label Mar 27, 2024
@vmoens marked this pull request as draft March 27, 2024 15:36
Comment on lines 62 to 68
as_nested (bool, optional): whether to return the results as nested
tensors. Defaults to ``False``.

.. note:: Using ``split_trajectories(tensordict, as_nested=True).to_padded_tensor(mask=mask_key)``
should give exactly the same result as ``as_nested=False``. Since this is an experimental
feature that relies on nested tensors, whose API may change in the future, we made this
an optional feature. The runtime should be faster with ``as_nested=True``.
@vmoens (Contributor Author), Mar 28, 2024
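The docstring note states that padding the nested output should reproduce the ``as_nested=False`` result. Below is a minimal, framework-free sketch of that idea, assuming each step carries a trajectory id and that the steps of one trajectory are stored contiguously. Plain Python lists stand in for tensors, and the helper names (`split_trajectories_sketch`, `pad_with_mask`, `unpad`) are hypothetical; this is not torchrl's implementation.

```python
from itertools import groupby


def split_trajectories_sketch(values, traj_ids):
    """Group consecutive steps that share a trajectory id into ragged lists.

    This ragged layout plays the role of the nested representation
    (as_nested=True). Assumes each trajectory's steps are contiguous.
    """
    return [
        [value for value, _ in group]
        for _, group in groupby(zip(values, traj_ids), key=lambda pair: pair[1])
    ]


def pad_with_mask(ragged, padding=0):
    """Pad every trajectory to the longest one and build a validity mask.

    This plays the role of the padded representation (as_nested=False).
    """
    max_len = max(len(traj) for traj in ragged)
    padded = [traj + [padding] * (max_len - len(traj)) for traj in ragged]
    mask = [[True] * len(traj) + [False] * (max_len - len(traj)) for traj in ragged]
    return padded, mask


def unpad(padded, mask):
    """Drop the padded entries, recovering the ragged layout."""
    return [
        [value for value, keep in zip(row, row_mask) if keep]
        for row, row_mask in zip(padded, mask)
    ]


# Six steps from three trajectories (ids 0, 1, 2) stored back to back.
values = [1, 2, 3, 4, 5, 6]
traj_ids = [0, 0, 0, 1, 1, 2]

ragged = split_trajectories_sketch(values, traj_ids)  # [[1, 2, 3], [4, 5], [6]]
padded, mask = pad_with_mask(ragged)                  # [[1, 2, 3], [4, 5, 0], [6, 0, 0]]
assert unpad(padded, mask) == ragged  # padding plus mask loses no information
```

The ragged form stores only valid steps, while the padded form allocates `num_trajectories * max_len` slots and needs the mask to tell real steps from padding.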

@vmoens marked this pull request as ready for review March 28, 2024 13:26

@vmoens (Contributor Author) commented Jun 28, 2024

cc @jbschlosser
Interestingly, padding a nested tensor seems to be faster than padding a bunch of non-contiguous tensors!

@vmoens merged commit a563c5e into main Jun 28, 2024
42 of 47 checks passed
@vmoens deleted the nested-tensor-splittraj branch June 28, 2024 08:58
Labels: CLA Signed, enhancement