IPPO actions almost always the same #240

ginesmoratalla · 2024-12-16T18:45:47Z

Hello, I am new to this framework and am trying to train IPPO on a custom environment. I do not know whether the problem lies in training or in execution: when I run the trained actor network in the environment, each agent always picks the same action (e.g., agent_0 always picks action 4). Since the script below covers only the execution part, I would like to know if anyone can give me tips on how to debug this. Could the problem come from training the network, or is it somewhere in this execution script?

This is my first training run in this environment; it ran for 240k steps with 4 agents. (Execution script below.)

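import torch
import torch.nn as nn
from skrl.envs.wrappers.torch import wrap_env
from skrl.models.torch import CategoricalMixin, Model

# MARL_SFE is my custom PettingZoo environment (its import is omitted here)
env = MARL_SFE(4)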
env = wrap_env(env, wrapper="pettingzoo")

# Two devices (cuda and cpu) were being mixed when computing the logits below, so execution stays on cpu
#device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
device = torch.device("cpu")

# Neural Network for the policy (actor)
class ActorPolicyNetwork(CategoricalMixin, Model):
    def __init__(self, observation_space, action_space, device, unnormalized_log_prob=True):
        #print("Actor Policy Network Init Device: ", device)
        Model.__init__(self, observation_space, action_space, device)
        CategoricalMixin.__init__(self, unnormalized_log_prob)

        self.actor_nn = nn.Sequential(nn.Linear(self.num_observations, 128),
                                 nn.ReLU(),
                                 nn.Linear(128, 128),
                                 nn.ReLU(),
                                 nn.Linear(128, 128),
                                 nn.ReLU(),
                                 nn.Linear(128, self.num_actions))

    def compute(self, inputs, role):
        return self.actor_nn(inputs["states"]), {}


# instantiate the agent's model network
# https://skrl.readthedocs.io/en/latest/api/multi_agents/ippo.html#models
models = {}
networks = torch.load("runs/torch/IPPO_MARL_SFE/IPPO_4_agents_240000_timestep_run/checkpoints/agent_240000.pt")
for agent_name in env.possible_agents:
    models[agent_name] = {}
    models[agent_name]["policy"] = ActorPolicyNetwork(env.observation_space(agent_name), env.action_space(agent_name), device)

    # This filter works around the complaint that the checkpoint dictionary has more entries than expected (beyond the neural net weights)
    policy_dictionary = {k: v for k, v in networks[agent_name]['policy'].items() if 'actor_nn' in k}
    models[agent_name]["policy"].load_state_dict(policy_dictionary)
    models[agent_name]["policy"].eval()

observations, _ = env.reset()
game_over = False
with torch.no_grad():
    while not game_over:
        actions = {}
        for agent_name in env.possible_agents:
            observation = observations[agent_name]
            observation_tensor = observation.clone().detach().to(device).unsqueeze(0).float()
            logits, _ = models[agent_name]["policy"].compute({"states": observation_tensor}, role="policy")
            #action = torch.distributions.Categorical(logits=logits).sample()
            action = torch.argmax(logits)

            if agent_name == "agent_0":
                print(f"AGENT {agent_name}\n LOGITS {logits} action = {action}\nOBSERVATION: {observation}\n")

            actions[agent_name] = action

        print(f"ACTIONS {actions}\n")
        observations, _, terminations, truncations, _ = env.step(actions)
        env.render()

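        # Check both dicts: `a.values() or b.values()` returns only the first non-empty view,
        # so truncations were never inspected.
        if any(terminations.values()) or any(truncations.values()):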
            game_over = True
