This is the implementation part of the bachelor's thesis Natural Language Instruction-Following in a Simulated 3D World using microcosm.ai, submitted to Osnabrück University in March 2024. It contains code for generating a curriculum, training an agent, and testing it. The experiment, and major parts of the model, stem from Chaplot et al. (2018) [2]. The code is based on microcosm.ai [1] packages that are still in development, which may cause compatibility issues; the Issues section expands on this.
- Asynchronous Advantage Actor-Critic (A3C) with LSTM and Gated Attention (GA): The agent is trained with the A3C algorithm; the policy network combines an LSTM with a GA unit (see the first sketch after this list).
- Multiprocessing: We use Python's multiprocessing module to train or test the model across multiple processes in parallel.
- Flexible Training and Testing: The same script handles both training and testing; the mode is selected with the --evaluate argument, and other parameters, such as the learning rate, discount factor, or number of training processes, can also be set from the command line (see the second sketch after this list).
- Model Loading: The script can load a pre-trained model from a specified path, allowing training to resume across levels and models to be evaluated from checkpoints.
- Curriculum Learning: The script supports curriculum learning, where the model is trained on a series of tasks of increasing difficulty. The curriculum is defined in a specified directory.
- Environment Visualization: The script includes an option to visualize the environment, which can be useful for debugging and understanding the model's behavior. Note that visualization does not work when running with multiple processes.
- Logging: We use Tensorboard for logging and visualization.
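The heart of the GA unit is a channel-wise gating of the image features by the instruction embedding, via a Hadamard product. Below is a minimal sketch of that mechanism as described by Chaplot et al. [2]; the class and parameter names are illustrative, not the exact modules used in this repository.

```python
import torch
import torch.nn as nn

class GatedAttention(nn.Module):
    """Minimal Gated-Attention fusion sketch (after Chaplot et al. [2]).

    The instruction embedding is projected to one sigmoid gate per image
    feature map; the gates scale the conv features via a Hadamard product.
    """

    def __init__(self, instr_dim: int, num_channels: int):
        super().__init__()
        self.gate = nn.Linear(instr_dim, num_channels)

    def forward(self, image_feats: torch.Tensor, instr_emb: torch.Tensor) -> torch.Tensor:
        # image_feats: (batch, channels, H, W); instr_emb: (batch, instr_dim)
        g = torch.sigmoid(self.gate(instr_emb))   # (batch, channels)
        g = g.unsqueeze(-1).unsqueeze(-1)         # (batch, channels, 1, 1)
        return image_feats * g                    # broadcast over H and W

# Example: gate 64 feature maps with a 256-dimensional instruction embedding.
ga = GatedAttention(instr_dim=256, num_channels=64)
fused = ga(torch.randn(1, 64, 8, 8), torch.randn(1, 256))
```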
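For the multiprocessing and command-line handling described above, the overall pattern looks roughly like the following. Only the --evaluate/-e flag is documented here; the other flag names, default values, and the placeholder train/test functions are assumptions for illustration, not the repository's exact interface.

```python
import argparse
import multiprocessing as mp

def train(rank: int, args: argparse.Namespace) -> None:
    # Placeholder worker: in the actual code, each A3C worker steps its
    # own environment and updates a shared model.
    print(f"worker {rank}: lr={args.lr}, gamma={args.gamma}")

def test(args: argparse.Namespace) -> None:
    # Placeholder evaluation loop, selected with --evaluate / -e.
    print("evaluating a saved model checkpoint")

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("-e", "--evaluate", type=int, default=0,
                        help="0: train, 1: evaluate")
    parser.add_argument("--num-processes", type=int, default=4)
    parser.add_argument("--lr", type=float, default=1e-4)     # learning rate (illustrative default)
    parser.add_argument("--gamma", type=float, default=0.99)  # discount factor (illustrative default)
    args = parser.parse_args()

    if args.evaluate:
        test(args)
    else:
        # One training process per worker, as A3C prescribes.
        workers = [mp.Process(target=train, args=(rank, args))
                   for rank in range(args.num_processes)]
        for w in workers:
            w.start()
        for w in workers:
            w.join()
```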
- Copy Repository to Local Device:
- Clone the repository to your local device using Git:
git clone https://github.com/microcosmAI/instruction-following
- Install Conda Environment from environment.yml:
- Navigate to the root directory of the cloned repository.
- Run the following command to create a Conda environment from the provided environment.yml file (an illustrative sketch of such a file follows below):
conda env create -f environment.yml
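For orientation, an environment.yml generally has the following shape. The actual file in the repository is authoritative; the name, channels, and packages below are assumptions for illustration only.

```yaml
# Illustrative structure only; see the repository's environment.yml
# for the real dependency list.
name: instruction-following  # hypothetical environment name
channels:
  - conda-forge
dependencies:
  - python=3.10
  - pip
  - pip:
      - torch
      - tensorboard
```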
- Download and Install mujoco-environment Repository:
- Download and install the mujoco-environment repository according to the installation instructions provided in that repository; the required version may be on the dev branch.
- PITA Algorithm:
- A version of the PITA algorithm is included in this repository. However, depending on your system and use case, you may need to install a more up-to-date version, which could require adaptations to the provided code. PITA can be found here.
- To train an agent on a curriculum, the curriculum must first be generated by running
python curriculum_generation.py
- This generates a set of 6 levels in the default curriculum directory. Depending on your system, this may take a while (in our testing, Linux was faster than Windows). If you want more or fewer levels, adjust the settings in curriculum_generation.py.
- You can start training by calling
python a3c_main.py
which uses the default parameters. You can adjust the number of processes with the num-processes flag.
- To test a trained model, wait until training has saved a model checkpoint, terminate training or let it finish, and run a3c_main.py with the -e=1 flag. The commands below summarize this workflow.
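Putting these steps together, a typical session looks like the following (the -e=1 flag is as documented above; the spelling --num-processes and the worker count of 4 are assumptions):

```
# 1. Generate the default 6-level curriculum (may take a while)
python curriculum_generation.py

# 2. Train with default parameters, here with an adjusted number of worker processes
python a3c_main.py --num-processes 4

# 3. Once a checkpoint has been saved and training has stopped, evaluate the model
python a3c_main.py -e=1
```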
You can reach me at kvatankhahba@uos.de
- Development was initially done on Ubuntu but had to be ported to Windows because an image-processing step used during multiprocessing was incompatible with Linux's X server. At the time, the program would run under Ubuntu when used with only one process, but since development has progressed since then, we cannot guarantee compatibility.
- Testing on Apple devices has shown that sensor data (e.g. camera resolution) may have to be defined differently. We have also observed some further incompatibilities.
- mujoco-env and PITA are both taken from their respective development branches, which has occasionally caused issues when using PITA output as mujoco-env input. Both versions are also very much subject to change.
- In summary, this project incorporates experimental packages and is highly OS-dependent. If you have questions or run into issues, please open an issue in this repository or contact me.
[1] Mayer, J. (2024). microcosm.ai. Retrieved March 19, 2024, from https://microcosm.ai/
[2] Chaplot, D. S., Sathyendra, K. M., Pasumarthi, R. K., Rajagopal, D., & Salakhutdinov, R. (2018). Gated-Attention Architectures for Task-Oriented Language Grounding. arXiv.