Why is my training FPS so low with the same 4 GPUs? It mostly stays around 2000 #49

Open
mfxiaosheng opened this issue Jul 20, 2022 · 7 comments

Comments

@mfxiaosheng

[INFO:1052 dmc:233 2022-07-20 17:39:38,765] After 1632000 (L:556800 U:528000 D:547200) frames: @ 1918.7 fps (avg@ 2318.1 fps) (L:0.0 U:0.0 D:1918.7) Stats:
{'loss_landlord': 1.9155352115631104,
'loss_landlord_down': 2.5349276065826416,
'loss_landlord_up': 2.1095376014709473,
'mean_episode_return_landlord': 0.08421196788549423,
'mean_episode_return_landlord_down': -0.08074238896369934,
'mean_episode_return_landlord_up': -0.06534682214260101}
[INFO:1052 dmc:233 2022-07-20 17:39:43,769] After 1648000 (L:563200 U:537600 D:547200) frames: @ 3197.8 fps (avg@ 2398.1 fps) (L:1279.1 U:1918.7 D:0.0) Stats:
{'loss_landlord': 2.3213179111480713,
'loss_landlord_down': 2.5349276065826416,
'loss_landlord_up': 2.6052844524383545,
'mean_episode_return_landlord': 0.09171878546476364,
'mean_episode_return_landlord_down': -0.08074238896369934,
'mean_episode_return_landlord_up': -0.08009536564350128}
[INFO:1052 dmc:233 2022-07-20 17:39:48,773] After 1654400 (L:569600 U:537600 D:547200) frames: @ 1279.1 fps (avg@ 2398.1 fps) (L:1279.1 U:0.0 D:0.0) Stats:
{'loss_landlord': 2.185067892074585,
'loss_landlord_down': 2.5349276065826416,
'loss_landlord_up': 2.6052844524383545,
'mean_episode_return_landlord': 0.09759927541017532,
'mean_episode_return_landlord_down': -0.08074238896369934,
'mean_episode_return_landlord_up': -0.08009536564350128}
[INFO:1052 dmc:233 2022-07-20 17:39:53,779] After 1673600 (L:576000 U:540800 D:556800) frames: @ 3836.1 fps (avg@ 2344.8 fps) (L:1278.7 U:639.4 D:1918.1) Stats:
{'loss_landlord': 1.77787184715271,
'loss_landlord_down': 2.7444241046905518,
'loss_landlord_up': 2.508575677871704,
'mean_episode_return_landlord': 0.10005713254213333,
'mean_episode_return_landlord_down': -0.09260766953229904,
'mean_episode_return_landlord_up': -0.08521360903978348}
[INFO:1052 dmc:233 2022-07-20 17:39:58,781] After 1680000 (L:576000 U:547200 D:556800) frames: @ 1279.5 fps (avg@ 2398.1 fps) (L:0.0 U:1279.5 D:0.0) Stats:
{'loss_landlord': 1.77787184715271,
'loss_landlord_down': 2.7444241046905518,
'loss_landlord_up': 2.264894723892212,
'mean_episode_return_landlord': 0.10005713254213333,
'mean_episode_return_landlord_down': -0.09260766953229904,
'mean_episode_return_landlord_up': -0.08965221047401428}

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 495.29.05    Driver Version: 495.29.05    CUDA Version: 11.5     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A100-SXM...  On   | 00000000:0A:00.0 Off |                    0 |
| N/A   31C    P0    96W / 400W |  66690MiB / 81251MiB |     99%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA A100-SXM...  On   | 00000000:45:00.0 Off |                    0 |
| N/A   32C    P0    95W / 400W |  66704MiB / 81251MiB |     98%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   2  NVIDIA A100-SXM...  On   | 00000000:4B:00.0 Off |                    0 |
| N/A   34C    P0    95W / 400W |  66700MiB / 81251MiB |     98%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   3  NVIDIA A100-SXM...  On   | 00000000:84:00.0 Off |                    0 |
| N/A   39C    P0    66W / 400W |   2653MiB / 81251MiB |      2%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+

The FPS stays low the whole time. It occasionally drops to 0 and occasionally jumps to 5000. Is this a normal training speed?
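To put numbers on the fluctuation, the per-interval FPS can be recomputed from the log itself. The following is a minimal sketch, assuming the log format quoted in this issue and assuming the reported figure is the frame delta divided by the elapsed time (the numbers in the log are consistent with that). The names `parse_log` and `recomputed_fps` are hypothetical helpers, not part of the project.

```python
import re
from datetime import datetime

# Pattern for the "After N ... frames: @ X fps" lines quoted in this issue.
LOG_PATTERN = re.compile(
    r"\[INFO:\d+ dmc:\d+ (?P<ts>[\d-]+ [\d:,]+)\] "
    r"After (?P<frames>\d+) .*? frames: @ (?P<fps>[\d.]+) fps"
)

# The five log header lines from the report, copied verbatim.
SAMPLE_LOG = [
    "[INFO:1052 dmc:233 2022-07-20 17:39:38,765] After 1632000 (L:556800 U:528000 D:547200) frames: @ 1918.7 fps (avg@ 2318.1 fps) (L:0.0 U:0.0 D:1918.7) Stats:",
    "[INFO:1052 dmc:233 2022-07-20 17:39:43,769] After 1648000 (L:563200 U:537600 D:547200) frames: @ 3197.8 fps (avg@ 2398.1 fps) (L:1279.1 U:1918.7 D:0.0) Stats:",
    "[INFO:1052 dmc:233 2022-07-20 17:39:48,773] After 1654400 (L:569600 U:537600 D:547200) frames: @ 1279.1 fps (avg@ 2398.1 fps) (L:1279.1 U:0.0 D:0.0) Stats:",
    "[INFO:1052 dmc:233 2022-07-20 17:39:53,779] After 1673600 (L:576000 U:540800 D:556800) frames: @ 3836.1 fps (avg@ 2344.8 fps) (L:1278.7 U:639.4 D:1918.1) Stats:",
    "[INFO:1052 dmc:233 2022-07-20 17:39:58,781] After 1680000 (L:576000 U:547200 D:556800) frames: @ 1279.5 fps (avg@ 2398.1 fps) (L:0.0 U:1279.5 D:0.0) Stats:",
]


def parse_log(lines):
    """Return (timestamp, total_frames, reported_fps) for each matching line."""
    records = []
    for line in lines:
        m = LOG_PATTERN.search(line)
        if m:
            ts = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S,%f")
            records.append((ts, int(m["frames"]), float(m["fps"])))
    return records


def recomputed_fps(records):
    """Frame delta divided by elapsed seconds for each consecutive pair."""
    return [
        (f1 - f0) / (t1 - t0).total_seconds()
        for (t0, f0, _), (t1, f1, _) in zip(records, records[1:])
    ]


if __name__ == "__main__":
    records = parse_log(SAMPLE_LOG)
    pairs = zip(records, records[1:], recomputed_fps(records))
    for (_, f0, _), (_, f1, reported), fps in pairs:
        # Print the frame delta next to the recomputed and reported FPS.
        print(f"+{f1 - f0} frames  recomputed={fps:.1f}  reported={reported}")
```

In the excerpt, the total frame count advances in multiples of 3200 per roughly 5-second interval, so the instantaneous figure can only land near multiples of 640. The swings between about 1279 and 3836 are therefore at least partly a sampling effect, and the `avg@` value is probably the steadier indicator of throughput.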

@1978mountain
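
When you train on multiple GPUs, have you run into this problem? Every spawned actor process takes up the same amount of memory on GPU 0, so after a few actors have started, GPU 0 runs out of memory and a CUDA error is raised.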
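One general way to keep a worker process from allocating anything on GPU 0 is to restrict which devices it can see before CUDA is initialized. The sketch below is not taken from this project's code. It assumes workers are launched as separate Python processes, and `env_for_device` and `launch_worker` are hypothetical helper names. `CUDA_VISIBLE_DEVICES` is the standard CUDA environment variable.

```python
import os
import subprocess
import sys


def env_for_device(device_index, base_env=None):
    """Return a copy of the environment that exposes only one physical GPU."""
    env = dict(os.environ if base_env is None else base_env)
    # Must be set before the worker initializes CUDA. Inside the worker,
    # the selected GPU then appears as device 0.
    env["CUDA_VISIBLE_DEVICES"] = str(device_index)
    return env


def launch_worker(device_index, code):
    """Run `code` in a fresh Python process restricted to one GPU (hypothetical helper)."""
    return subprocess.run(
        [sys.executable, "-c", code],
        env=env_for_device(device_index),
        capture_output=True,
        text=True,
        check=True,
    )


if __name__ == "__main__":
    # Each worker only reports which device it was given.
    probe = "import os; print(os.environ['CUDA_VISIBLE_DEVICES'])"
    for gpu in range(4):
        print(launch_worker(gpu, probe).stdout.strip())
```

Whether this applies here depends on how the actor processes are created. If they are started from a parent that has already initialized CUDA, the variable has to be set in the child before any CUDA call is made.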

@zgz682000

Why do I only get a little over 600 FPS on an A100 rented on Alibaba Cloud? What command-line arguments did you use? Could you share them?

@daochenzha
Collaborator

@zgz682000 Have you tried other GPU models?

@zgz682000

> @zgz682000 Have you tried other GPU models?

Yes. The GPU in my own PC is a 1060, and it gets over 1000 FPS.

@daochenzha
Collaborator

@zgz682000 I don't know why this happens either. You could try a different GPU.

@Cyclones-Y

@mfxiaosheng Hi, I ran into the same problem as you. Did you manage to solve it afterwards? Did the training speed ever improve?

@aishxi

aishxi commented Nov 29, 2023

I've run into the same problem too. Could anyone help?

6 participants