A Python environment for multi-agent training in MuJoCo simulations.
Explore the wiki docs »
Report Bug · Request Feature
In this repository, we publish a Python wrapper that can be used to train agents with reinforcement learning in a large variety of different environments.
Clone this repository, navigate into it with your terminal, and execute the following steps.
Install the required dependencies:
pip install -r requirements.txt
To use the environment, you have to install this repository as a pip package. Alternatively, you can open a branch of this repository and implement changes directly in this repo.
- Navigate to the repository with your terminal.
- Install the repository as a pip package
pip install .
- Check whether the installation was successful
python -c "import MuJoCo_Gym"
The basic multi agent environment can be imported and used like this:
First, the path for the environment has to be set. Additionally, you need to provide a list of agent names within the environment. Those names correspond to the top-level bodies of your agents within the XML file. The JSON file containing additional information is optional.
from MuJoCo_Gym.mujoco_rl import MuJoCoRL
environment_path = "Examples/Environment/MultiEnvs.xml" # File containing the mujoco environment
info_path = "Examples/Environment/info_example.json" # File containing additional environment information
agents = ["agent1", "agent2"] # List of agents (body names) within the environment
This information has to be stored in a dictionary. This is necessary to make the environment compatible with Ray.
config_dict = {"xmlPath":environment_path, "infoJson":info_path, "agents":agents}
environment = MuJoCoRL(config_dict)
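Since the whole setup lives in a single config dictionary, the environment can also be registered with Ray's RLlib. The snippet below is only a minimal sketch, assuming MuJoCoRL takes the config dictionary as its sole constructor argument (as shown above); the registry name "mujoco_gym_env" is arbitrary and the remaining RLlib training setup depends on your Ray version:
from ray.tune.registry import register_env

# Hypothetical registration; adapt the creator function to your training setup
register_env("mujoco_gym_env", lambda env_config: MuJoCoRL(config_dict))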
Reset the environment to start the simulation.
observation, infos = environment.reset()
Store the action of each agent in a dictionary with the agent names as keys. Each array has to match the shape of the agent's action space, and its values have to lie within the action range.
actions = {"agent1":np.array([]), "agent2":np.array([])}
observations, rewards, terminations, truncations, infos = environment.step(actions)
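A short rollout then simply repeats this step call. The loop below is a minimal sketch that reuses the placeholder actions from above and assumes the termination and truncation dictionaries are keyed by agent name, as the multi-agent API shown here suggests:
observations, infos = environment.reset()
for _ in range(100):
    observations, rewards, terminations, truncations, infos = environment.step(actions)
    # Restart the episode once every agent is terminated or truncated
    if all(terminations.values()) or all(truncations.values()):
        observations, infos = environment.reset()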
To use a language channel, you have to implement it as an environment dynamic. Each environment dynamic has its own observation and action space, which will be forwarded to the agents. Note that at the moment each agent gets all environment dynamics, and each dynamic is executed once per agent during every timestep.
A basic implementation of a language channel in the environment. Note that every environment dynamic needs to implement an __init__(self, mujoco_gym) and a dynamic(self, agent, actions) method.
import numpy as np

class Language():
    def __init__(self, mujoco_gym):
        self.mujoco_gym = mujoco_gym
        self.observation_space = {"low": [0], "high": [3]}
        self.action_space = {"low": [0], "high": [3]}
        # The data store is used to store and preserve data over one or multiple timesteps
        self.dataStore = {}

    def dynamic(self, agent, actions):
        # At timestep 0, the utterance field has to be initialized
        if "utterance" not in self.mujoco_gym.data_store[agent].keys():
            self.mujoco_gym.data_store[agent]["utterance"] = 0
        # Extract the utterance from the agent's action
        utterance = int(actions[0])
        # Store the utterance in the data store for the environment
        self.mujoco_gym.data_store[agent]["utterance"] = utterance
        otherAgent = [other for other in self.mujoco_gym.agents if other != agent][0]
        # Check whether the other agent has "spoken" yet (not at timestep 0)
        if "utterance" in self.mujoco_gym.data_store[otherAgent]:
            utteranceOtherAgent = self.mujoco_gym.data_store[otherAgent]["utterance"]
            return 0, np.array([utteranceOtherAgent])
        else:
            return 0, np.array([0])
The environment dynamic has to be added to the environment config.
config_dict = {"xmlPath":environment_path, "infoJson":info_path, "agents":agents, "environmentDynamics":[Language]}
environment = MuJoCoRL(config_dict)
A reference implementation of a reward function that returns a positive reward if the agent gets closer to a target object. All possible target objects are filtered by tags at the beginning. Those tags are set in the info JSON file, which is handed over in the config dict at the beginning.
import random

def reward_function(mujoco_gym, agent):
    # Creates all the necessary fields to store the needed data within the data store at timestep 0
    if "targets" not in mujoco_gym.data_store[agent].keys():
        mujoco_gym.data_store[agent]["targets"] = mujoco_gym.filter_by_tag("target")
        # Pick a random target object and remember the initial distance to it
        targets = mujoco_gym.data_store[agent]["targets"]
        mujoco_gym.data_store[agent]["current_target"] = targets[random.randint(0, len(targets) - 1)]["name"]
        distance = mujoco_gym.distance(agent, mujoco_gym.data_store[agent]["current_target"])
        mujoco_gym.data_store[agent]["distance"] = distance
        new_reward = 0
    else:
        # Calculates the distance between the agent and the current target
        distance = mujoco_gym.distance(agent, mujoco_gym.data_store[agent]["current_target"])
        new_reward = mujoco_gym.data_store[agent]["distance"] - distance
        mujoco_gym.data_store[agent]["distance"] = distance
    reward = new_reward * 10
    return reward
The done function ends the current training run if the agent gets closer than 1 distance unit to the target.
def done_function(mujoco_gym, agent):
    if mujoco_gym.data_store[agent]["distance"] <= 1:
        return True
    else:
        return False
Both of them have to be included in the config dictionary.
config_dict = {"xmlPath":environment_path, "infoJson":info_path, "agents":agents, "rewardFunctions":[reward_function], "doneFunctions":[done_function]}
environment = MuJoCoRL(config_dict)
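Environment dynamics, reward functions, and done functions are independent entries in the config dictionary, so all of the examples above can also be combined into a single configuration (a sketch based on the keys shown in this README):
config_dict = {
    "xmlPath": environment_path,
    "infoJson": info_path,
    "agents": agents,
    "environmentDynamics": [Language],
    "rewardFunctions": [reward_function],
    "doneFunctions": [done_function],
}
environment = MuJoCoRL(config_dict)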
For more examples, please refer to the Wiki.
Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement". Don't forget to give the project a star! Thanks again!
- Fork the Project
- Create your Feature Branch (git checkout -b feature/AmazingFeature)
- Commit your Changes (git commit -m 'Add some AmazingFeature')
- Push to the Branch (git push origin feature/AmazingFeature)
- Open a Pull Request
Distributed under the MIT License. See LICENSE.txt for more information.
Cornelius Wolff - cowolff@uos.de