
[Bug]: DAAC trained on MultiBinary envs but returns floats when doing inference? #21

Disastorm opened this issue Aug 2, 2023 · 4 comments
Labels: bug (Something isn't working)



Disastorm commented Aug 2, 2023

🐛 Bug

Note: this is with the current pip version; I haven't tried the git repo version.
Also note that I have newer versions of gymnasium and matplotlib than this repo specifies:

rllte-core 0.0.1b3 has requirement gymnasium[accept-rom-license]==0.28.1, but you have gymnasium 0.29.0.
rllte-core 0.0.1b3 has requirement matplotlib==3.6.0, but you have matplotlib 3.7.1.

I'll start out by explaining the shapes in the training env.

import numpy as np
from gymnasium.vector import AsyncVectorEnv
from gymnasium.wrappers import RecordEpisodeStatistics, TransformReward
# import path for the torch wrapper may vary by rllte version
from rllte.env.utils import TorchVecEnvWrapper

...
print("SHAPE1 " + repr(env.action_space))
envs = [make_env(env_id, seed + i) for i in range(num_envs)]
envs = AsyncVectorEnv(envs)
envs = RecordEpisodeStatistics(envs)
envs = TransformReward(envs, lambda reward: np.sign(reward))

print("SHAPE2 " + repr(envs.action_space))
return TorchVecEnvWrapper(envs, device)

The printout of the above is:

SHAPE1 MultiBinary(12)
SHAPE2 Box(0, 1, (8, 12), int8)

The above code snippet is the return value of a function called make_retro_env that I created.
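
For reference, the SHAPE1 → SHAPE2 change itself is expected: gymnasium's vector envs batch the per-env MultiBinary(12) space across num_envs=8 into a Box. A minimal sketch (assuming gymnasium >= 0.28) that reproduces the batching on its own:

# Minimal check of vector-env space batching (assumes gymnasium >= 0.28).
from gymnasium.spaces import MultiBinary
from gymnasium.vector.utils import batch_space

single = MultiBinary(12)
print(batch_space(single, n=8))  # Box(0, 1, (8, 12), int8), matching SHAPE2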

After training using

from rllte.agent import DAAC

env = make_retro_env("SuperHangOn-Genesis", "junior-map", num_envs=8, distributed=False)
eval_env = make_retro_env("SuperHangOn-Genesis", "junior-map", num_envs=1, distributed=False)
print("SHAPE3 " + repr(env.action_space))

model = DAAC(env=env,
             eval_env=eval_env,
             device='cuda')
model.train(num_train_steps=1000000)

Note this prints out "SHAPE3 MultiBinary(12)"

When I load the .pth that was automatically saved during training, using

import torch as th

agent = th.load("./logs/default/2023-08-02-12-49-12/model/agent.pth", map_location=th.device('cuda'))
action = agent(obs)
print("action " + repr(action))

The action tensor looks like this:

action tensor([[ 9.9971e-02, -2.7629e-01,  4.2010e-03,  3.1142e-02, -1.2863e-01,
          3.5272e-04,  1.9941e-01,  2.8625e-01,  3.2863e-01, -6.0946e-01,
         -1.7830e-01,  1.3129e-01]], device='cuda:0')

I'm not sure whether I did something wrong, whether this is perhaps fixed in the current repo, or whether it's related to my library versions. If that's the case, let me know.
Thanks.

To Reproduce

No response

Relevant log output / Error message

No response

System Info

- OS: Windows-10-10.0.19045-SP0 10.0.19045
- Python: 3.8.16
- Stable-Baselines3: 2.0.0
- PyTorch: 2.0.0
- GPU Enabled: True
- Numpy: 1.23.5
- Cloudpickle: 2.2.1
- Gymnasium: 0.29.0
- OpenAI Gym: 0.26.2

Checklist

  • I have checked that there is no similar issue in the repo
  • I have read the documentation
  • I have provided a minimal working example to reproduce the bug
  • I've used the markdown code blocks for both code and stack traces.

Disastorm commented Aug 17, 2023

Any update on this? Can you confirm that if I am training on a "MultiBinary(12)" environment, the predicted tensors should all just be 0 and 1, or am I misunderstanding how it's supposed to work? Am I supposed to use some kind of wrapper or conversion on the tensors that come out of the .pth file?

*edit
Is this what I'm supposed to do?

from torch.distributions import Bernoulli  # assuming torch's Bernoulli here

next_action = agent(obs)
dist = Bernoulli(next_action)
actions = dist.mean

It looks like this is sort of working, but the results are fairly bad, so maybe there is still something else I need to do on top of this?

*edit: I tried sample() instead of mean and it seems better. Is this what I should be using? And how do I get deterministic actions?

@yuanmingqi (Contributor) commented:

Sorry for the late reply!

For discrete actions like these, inference outputs the logits rather than the raw actions.

An example:

from rllte.env import make_multibinary_env
from rllte.xplore.distribution import Bernoulli
import torch as th

# build a toy vectorized MultiBinary environment
envs = make_multibinary_env(num_envs=8)
print(envs.observation_space, envs.action_space)

from rllte.agent import PPO
agent = PPO(env=envs)
agent.train(num_train_steps=10000)

# load the saved agent; the forward pass returns logits, not actions
agent = th.load("./agent.pth", map_location="cpu")
obs, _ = envs.reset()
print(agent(obs))
# wrap the logits in a Bernoulli distribution; .mode gives deterministic 0/1 actions
print(Bernoulli(logits=agent(obs)).mode)

And you will get the following output:

tensor([[ 0.2005, -0.0301,  0.0761],
        [ 0.1581, -0.2660,  0.0513],
        [-0.1131,  0.0438, -0.0546],
        [-0.0622, -0.0994, -0.0372],
        [ 0.1347,  0.2222,  0.0127],
        [-0.0412, -0.0021, -0.0644],
        [ 0.1762,  0.0138,  0.0673],
        [-0.1900, -0.0133, -0.1364]], grad_fn=<AddmmBackward0>)
tensor([[1., 0., 1.],
        [1., 0., 1.],
        [0., 1., 0.],
        [0., 0., 0.],
        [1., 1., 1.],
        [0., 0., 0.],
        [1., 1., 1.],
        [0., 0., 0.]])
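
As a follow-up on the deterministic question above: once you have the logits, .mode gives deterministic actions and .sample() gives stochastic ones. A short sketch reusing agent and obs from the example above:

from rllte.xplore.distribution import Bernoulli

# logits -> distribution; pick deterministic or stochastic actions
dist = Bernoulli(logits=agent(obs))
greedy = dist.mode       # deterministic: 1 where the logit is positive, else 0
sampled = dist.sample()  # stochastic: each bit ~ Bernoulli(sigmoid(logit))

Use .mode for evaluation rollouts and .sample() when you want exploration.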

@Disastorm (Author) commented:

I see, thanks. You can close this then.

@yuanmingqi (Contributor) commented:

Since we are going to publish a formal release soon, we recommend using the latest repo code for more stable performance.
