Shared observation of custom vectorized environment for MAPPO #102
-
Similarly, as we discussed in #97 (comment), it is not necessary to wrap a multi-agent environment as long as it returns the variables and types required by the skrl trainers (as shown in the figure on the Wrapping (multi-agents) page in skrl's docs). For example (for PyTorch), note that the environment does not inherit from any Gymnasium class:

```python
import gymnasium as gym


class CustomEnv:
    def __init__(self):
        self.shared_observation_spaces = ...  # dictionary of Gymnasium spaces (keys are self.possible_agents)
        self.observation_spaces = ...  # dictionary of Gymnasium spaces (keys are self.possible_agents)
        self.action_spaces = ...  # dictionary of Gymnasium spaces (keys are self.possible_agents)
        self.num_envs = ...  # int
        self.num_agents = ...  # int
        self.device = ...  # torch.device or str
        self.possible_agents = ...  # list of str

    def step(self, actions):
        # actions: dictionary of tensors with shape (self.num_envs, ACTION_SPACE_SIZE)
        ...
        # observations: dictionary of tensors with shape (self.num_envs, OBSERVATION_SPACE_SIZE)
        # rewards: dictionary of tensors with shape (self.num_envs, 1)
        # terminated: dictionary of tensors with shape (self.num_envs, 1)
        # truncated: dictionary of tensors with shape (self.num_envs, 1)
        # infos: dictionary of any information
        observations = {uid: OBSERVATION for uid in self.possible_agents}
        rewards = {uid: REWARD for uid in self.possible_agents}
        terminated = {uid: TERMINATED for uid in self.possible_agents}
        truncated = {uid: TRUNCATED for uid in self.possible_agents}
        infos = {uid: ANY for uid in self.possible_agents}
        # shared observation
        infos["shared_states"] = {uid: SHARED_OBSERVATION for uid in self.possible_agents}
        return observations, rewards, terminated, truncated, infos

    def reset(self):
        ...
        # observations: dictionary of tensors with shape (self.num_envs, OBSERVATION_SPACE_SIZE)
        # infos: dictionary of any information
        observations = {uid: OBSERVATION for uid in self.possible_agents}
        infos = {uid: ANY for uid in self.possible_agents}
        # shared observation
        infos["shared_states"] = {uid: SHARED_OBSERVATION for uid in self.possible_agents}
        return observations, infos

    def render(self, *args, **kwargs):
        pass

    def close(self):
        pass

    def shared_observation_space(self, uid):
        return self.shared_observation_spaces[uid]

    def observation_space(self, uid):
        return self.observation_spaces[uid]

    def action_space(self, uid):
        return self.action_spaces[uid]
```

For building the
Regarding the modification of a vectorized Gymnasium-based environment for multi-agents, it can be a bit tricky since the APIs are different from each other (as discussed in Farama-Foundation/SuperSuit#43 (comment)). A possible solution would be to embed the vectorized environment within the code shown above and convert the values between the two APIs (see the sketch below).
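To make that idea concrete, here is a minimal sketch (not from the original answer) of how a Gymnasium vectorized environment could be embedded in the interface shown above, converting NumPy arrays to the per-agent tensor dictionaries. The agent name `"agent_0"`, the reuse of the agent's own observation as the shared observation, and the `SyncVectorEnv` example are illustrative assumptions only; it demonstrates the API conversion, not a full multi-agent setup.

```python
# Minimal sketch: embed a Gymnasium vectorized environment and expose the
# multi-agent interface described above (assumes a single agent, "agent_0",
# and uses the agent's own observation as the shared observation).
import gymnasium as gym
import numpy as np
import torch


class MultiAgentVecEnvAdapter:
    def __init__(self, vec_env, device="cpu"):
        self._env = vec_env  # e.g. a gym.vector.SyncVectorEnv instance
        self.device = device
        self.num_envs = vec_env.num_envs
        self.num_agents = 1
        self.possible_agents = ["agent_0"]
        # per-agent spaces (keys are self.possible_agents)
        self.observation_spaces = {"agent_0": vec_env.single_observation_space}
        self.action_spaces = {"agent_0": vec_env.single_action_space}
        # here the shared observation space is just the observation space
        self.shared_observation_spaces = {"agent_0": vec_env.single_observation_space}

    def _to_tensor(self, array, dtype=torch.float32):
        # convert NumPy output of the vectorized env to a tensor on self.device
        return torch.as_tensor(np.asarray(array), dtype=dtype, device=self.device)

    def reset(self):
        obs, info = self._env.reset()
        observations = {"agent_0": self._to_tensor(obs)}
        infos = {"agent_0": info}
        infos["shared_states"] = {"agent_0": self._to_tensor(obs)}
        return observations, infos

    def step(self, actions):
        # actions: dictionary of tensors with shape (num_envs, ACTION_SPACE_SIZE)
        np_actions = actions["agent_0"].detach().cpu().numpy()
        obs, reward, terminated, truncated, info = self._env.step(np_actions)
        observations = {"agent_0": self._to_tensor(obs)}
        rewards = {"agent_0": self._to_tensor(reward).view(self.num_envs, 1)}
        terminateds = {"agent_0": self._to_tensor(terminated, torch.bool).view(self.num_envs, 1)}
        truncateds = {"agent_0": self._to_tensor(truncated, torch.bool).view(self.num_envs, 1)}
        infos = {"agent_0": info}
        infos["shared_states"] = {"agent_0": self._to_tensor(obs)}
        return observations, rewards, terminateds, truncateds, infos

    def render(self, *args, **kwargs):
        pass

    def close(self):
        self._env.close()

    def shared_observation_space(self, uid):
        return self.shared_observation_spaces[uid]

    def observation_space(self, uid):
        return self.observation_spaces[uid]

    def action_space(self, uid):
        return self.action_spaces[uid]
```

For instance, `MultiAgentVecEnvAdapter(gym.vector.SyncVectorEnv([lambda: gym.make("Pendulum-v1")] * 4))` would expose 4 parallel sub-environments through the dictionary-based interface above.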
-
Thank you so much for the detailed explanation! I have a question. https://skrl.readthedocs.io/en/latest/api/envs/multi_agents_wrapping.html
-
And, may I know why we overwrite the
Thank you.
-
Hi @Toni-SM ,
Thank you again for the great library.
I'm trying to use MAPPO for my custom vectorized environment (i.e. num_envs > 1).
In the skrl example, which has `env.shared_observation_spaces`: could you please give me some ideas on how to modify a vectorized Gymnasium-based environment so that it includes `shared_observation_spaces` to use MAPPO?
I've already read `load_bidexhands_env(task_name="ShadowHandOver")` and `wrap_env(env, wrapper="bidexhands")` of skrl, and the PettingZoo example on multi-agent custom environment creation, but I still don't know how to create the environment to use MAPPO.
Thanks,
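For reference, the bi-DexHands usage mentioned above looks roughly like the sketch below. The import paths are an assumption (these helpers have moved between skrl releases; older versions exposed them under `skrl.envs.torch`), so check the docs for the skrl version in use.

```python
# Rough sketch of the bi-DexHands loading/wrapping referenced above.
# Assumption: import paths as in recent skrl releases.
from skrl.envs.loaders.torch import load_bidexhands_env
from skrl.envs.wrappers.torch import wrap_env

# load the Bi-DexHands task and wrap it so that it exposes per-agent
# observation_spaces, action_spaces and shared_observation_spaces
env = load_bidexhands_env(task_name="ShadowHandOver")
env = wrap_env(env, wrapper="bidexhands")
```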