Shared observation of custom vectorized environment for MAPPO #102
-
Similarly, as we discussed in #97 (comment), it is not necessary to wrap a multi-agent environment as long as it returns the variables and types required by the skrl trainers (as shown in the figure on the Wrapping (multi-agents) page in skrl's docs). For example (for PyTorch), note that the environment does not inherit from any Gymnasium class:

```python
import gymnasium as gym


class CustomEnv:
    def __init__(self):
        self.shared_observation_spaces = ...  # dictionary of Gymnasium spaces (keys are self.possible_agents)
        self.observation_spaces = ...  # dictionary of Gymnasium spaces (keys are self.possible_agents)
        self.action_spaces = ...  # dictionary of Gymnasium spaces (keys are self.possible_agents)
        self.num_envs = ...  # int
        self.num_agents = ...  # int
        self.device = ...  # torch.device or str
        self.possible_agents = ...  # list of str

    def step(self, actions):
        # actions: dictionary of tensors with shape (self.num_envs, ACTION_SPACE_SIZE)
        ...
        # observations: dictionary of tensors with shape (self.num_envs, OBSERVATION_SPACE_SIZE)
        # rewards: dictionary of tensors with shape (self.num_envs, 1)
        # terminated: dictionary of tensors with shape (self.num_envs, 1)
        # truncated: dictionary of tensors with shape (self.num_envs, 1)
        # infos: dictionary of any information
        observations = {uid: OBSERVATION for uid in self.possible_agents}
        rewards = {uid: REWARD for uid in self.possible_agents}
        terminated = {uid: TERMINATED for uid in self.possible_agents}
        truncated = {uid: TRUNCATED for uid in self.possible_agents}
        infos = {uid: ANY for uid in self.possible_agents}
        # shared observation
        infos["shared_states"] = {uid: SHARED_OBSERVATION for uid in self.possible_agents}
        return observations, rewards, terminated, truncated, infos

    def reset(self):
        ...
        # observations: dictionary of tensors with shape (self.num_envs, OBSERVATION_SPACE_SIZE)
        # infos: dictionary of any information
        observations = {uid: OBSERVATION for uid in self.possible_agents}
        infos = {uid: ANY for uid in self.possible_agents}
        # shared observation
        infos["shared_states"] = {uid: SHARED_OBSERVATION for uid in self.possible_agents}
        return observations, infos

    def render(self, *args, **kwargs):
        pass

    def close(self):
        pass

    def shared_observation_space(self, uid):
        return self.shared_observation_spaces[uid]

    def observation_space(self, uid):
        return self.observation_spaces[uid]

    def action_space(self, uid):
        return self.action_spaces[uid]
```

For building the
Regarding the modification of a vectorized Gymnasium-based environment for multi-agents, it can be a bit tricky since the APIs are different from each other (as discussed in Farama-Foundation/SuperSuit#43 (comment)). A possible solution would be to embed the vectorized environment within the code shown above and convert the values between the two APIs (see the sketch below).
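To make that idea concrete, here is a minimal sketch (not from the original answer) of how a Gymnasium vectorized environment could be embedded in the interface shown above, converting NumPy arrays to the per-agent tensor dictionaries. The agent name `"agent_0"`, the reuse of the agent's own observation as the shared observation, and the `SyncVectorEnv` example are illustrative assumptions only; it demonstrates the API conversion, not a full multi-agent setup.

```python
# Minimal sketch: embed a Gymnasium vectorized environment and expose the
# multi-agent interface described above (assumes a single agent, "agent_0",
# and uses the agent's own observation as the shared observation).
import gymnasium as gym
import numpy as np
import torch


class MultiAgentVecEnvAdapter:
    def __init__(self, vec_env, device="cpu"):
        self._env = vec_env  # e.g. a gym.vector.SyncVectorEnv instance
        self.device = device
        self.num_envs = vec_env.num_envs
        self.num_agents = 1
        self.possible_agents = ["agent_0"]
        # per-agent spaces (keys are self.possible_agents)
        self.observation_spaces = {"agent_0": vec_env.single_observation_space}
        self.action_spaces = {"agent_0": vec_env.single_action_space}
        # here the shared observation space is just the observation space
        self.shared_observation_spaces = {"agent_0": vec_env.single_observation_space}

    def _to_tensor(self, array, dtype=torch.float32):
        # convert NumPy output of the vectorized env to a tensor on self.device
        return torch.as_tensor(np.asarray(array), dtype=dtype, device=self.device)

    def reset(self):
        obs, info = self._env.reset()
        observations = {"agent_0": self._to_tensor(obs)}
        infos = {"agent_0": info}
        infos["shared_states"] = {"agent_0": self._to_tensor(obs)}
        return observations, infos

    def step(self, actions):
        # actions: dictionary of tensors with shape (num_envs, ACTION_SPACE_SIZE)
        np_actions = actions["agent_0"].detach().cpu().numpy()
        obs, reward, terminated, truncated, info = self._env.step(np_actions)
        observations = {"agent_0": self._to_tensor(obs)}
        rewards = {"agent_0": self._to_tensor(reward).view(self.num_envs, 1)}
        terminateds = {"agent_0": self._to_tensor(terminated, torch.bool).view(self.num_envs, 1)}
        truncateds = {"agent_0": self._to_tensor(truncated, torch.bool).view(self.num_envs, 1)}
        infos = {"agent_0": info}
        infos["shared_states"] = {"agent_0": self._to_tensor(obs)}
        return observations, rewards, terminateds, truncateds, infos

    def render(self, *args, **kwargs):
        pass

    def close(self):
        self._env.close()

    def shared_observation_space(self, uid):
        return self.shared_observation_spaces[uid]

    def observation_space(self, uid):
        return self.observation_spaces[uid]

    def action_space(self, uid):
        return self.action_spaces[uid]
```

For instance, `MultiAgentVecEnvAdapter(gym.vector.SyncVectorEnv([lambda: gym.make("Pendulum-v1")] * 4))` would expose 4 parallel sub-environments through the dictionary-based interface above.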
-
Thank you so much for the detailed explanation! I have a question. https://skrl.readthedocs.io/en/latest/api/envs/multi_agents_wrapping.html
-
And, may I know why we overwrite the
Thank you.
-
Hi @Toni-SM ,
Thank you again for the great library.
I'm trying to use MAPPO for my custom vectorized environment (i.e. num_envs > 1).
In the skrl example, which has `env.shared_observation_spaces`: could you please give me some ideas on how to modify a vectorized Gymnasium-based environment so that it includes `shared_observation_spaces` to use MAPPO?
I've already read `load_bidexhands_env(task_name="ShadowHandOver")` and `wrap_env(env, wrapper="bidexhands")` of skrl, and the PettingZoo example on multi-agent custom environment creation, but I still don't know how to create the environment to use MAPPO.
Thanks,
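For reference, the bi-DexHands usage mentioned above looks roughly like the sketch below. The import paths are an assumption (these helpers have moved between skrl releases; older versions exposed them under `skrl.envs.torch`), so check the docs for the skrl version in use.

```python
# Rough sketch of the bi-DexHands loading/wrapping referenced above.
# Assumption: import paths as in recent skrl releases.
from skrl.envs.loaders.torch import load_bidexhands_env
from skrl.envs.wrappers.torch import wrap_env

# load the Bi-DexHands task and wrap it so that it exposes per-agent
# observation_spaces, action_spaces and shared_observation_spaces
env = load_bidexhands_env(task_name="ShadowHandOver")
env = wrap_env(env, wrapper="bidexhands")
```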