This project implements a simulation of the 10-arm testbed, a problem commonly used in reinforcement learning to demonstrate the ε-greedy algorithm. Several ε-values are tested to show how each affects the agent's balance between exploration and exploitation, and optimistic-initial-values and UCB agents are included for comparison.
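For readers new to the problem, the testbed itself can be sketched in a few lines. This is a minimal sketch that assumes the classic setup from Sutton and Barto's textbook: each arm's true value q*(a) is drawn from N(0, 1), and pulling arm a returns a reward drawn from N(q*(a), 1). The `Testbed` class and its methods are hypothetical names, not taken from this repository.

```python
from typing import Optional

import numpy as np


class Testbed:
    """Hypothetical k-armed testbed (illustrative names, not this repo's code).

    Assumes the classic setup: true values q*(a) ~ N(0, 1) and
    rewards ~ N(q*(a), 1) for the pulled arm a.
    """

    def __init__(self, k: int = 10, seed: Optional[int] = None) -> None:
        self.k = k
        self.rng = np.random.default_rng(seed)
        # True action values, fixed for the lifetime of this testbed.
        self.q_true = self.rng.normal(0.0, 1.0, size=k)

    def optimal_arm(self) -> int:
        # Index of the arm with the highest true value.
        return int(np.argmax(self.q_true))

    def pull(self, arm: int) -> float:
        # Noisy reward centred on the chosen arm's true value.
        return float(self.rng.normal(self.q_true[arm], 1.0))
```

For example, `Testbed(seed=0).pull(3)` returns one noisy reward from arm 3; the agent never sees `q_true` directly and must estimate it from such rewards.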
The project consists of three files:
- main.py: The main script to run simulations. It sets up the environment, initializes agents with different ε-values, and runs the simulations.
- agent.py: Defines the Agent class, which encapsulates the behavior of an ε-greedy agent.
- visualization.py: Contains functions to visualize the results of the simulations using Seaborn and Matplotlib for better aesthetic appeal.
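The ε-greedy behavior that the Agent class encapsulates could look roughly like the sketch below. It assumes sample-average value estimates updated incrementally, Q(a) ← Q(a) + (R − Q(a)) / N(a), with random tie-breaking among equally valued arms. The constructor parameters (`k`, `epsilon`, `initial_value`, `seed`) and the method names `select_action` and `update` are hypothetical and may not match the repository's actual `Agent` class.

```python
from typing import Optional

import numpy as np


class Agent:
    """Sketch of an epsilon-greedy agent; the real agent.py may differ.

    Value estimates are sample averages, updated incrementally:
        Q(a) <- Q(a) + (R - Q(a)) / N(a)
    """

    def __init__(self, k: int = 10, epsilon: float = 0.1,
                 initial_value: float = 0.0, seed: Optional[int] = None) -> None:
        self.k = k
        self.epsilon = epsilon
        self.rng = np.random.default_rng(seed)
        self.q_est = np.full(k, initial_value, dtype=float)  # Q(a)
        self.counts = np.zeros(k, dtype=int)                 # N(a)

    def select_action(self) -> int:
        # Explore with probability epsilon: pick any arm uniformly at random.
        if self.rng.random() < self.epsilon:
            return int(self.rng.integers(self.k))
        # Otherwise exploit, breaking ties between equal estimates at random.
        best = np.flatnonzero(self.q_est == self.q_est.max())
        return int(self.rng.choice(best))

    def update(self, action: int, reward: float) -> None:
        # Incremental sample-average update of the chosen arm's estimate.
        self.counts[action] += 1
        self.q_est[action] += (reward - self.q_est[action]) / self.counts[action]
```

With `epsilon=0.0` the agent is purely greedy; settings such as 0.01 and 0.1 give up a little exploitation in exchange for continued exploration, which is the trade-off the ε comparison in this project is meant to expose.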
Before running the simulation, make sure you have Python installed on your system. You will also need the following Python packages:
- NumPy
- Matplotlib
- Seaborn
You can install these packages using pip:
pip install numpy matplotlib seaborn
To run the simulation, execute main.py from the command line:
python main.py
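As a rough picture of what a run produces for the average-reward plot, the self-contained sketch below averages the per-step reward of an ε-greedy learner over many independently generated testbeds. The function name `average_rewards`, the default run and step counts, and the testbed distribution (q*(a) ~ N(0, 1), rewards ~ N(q*(a), 1)) are assumptions for illustration, not the project's actual settings.

```python
import numpy as np


def average_rewards(epsilon: float, runs: int = 200, steps: int = 1000,
                    k: int = 10, seed: int = 0) -> np.ndarray:
    """Hypothetical helper: mean reward at each step, averaged over runs.

    Each run draws a fresh testbed (q*(a) ~ N(0, 1), rewards ~ N(q*(a), 1))
    and a fresh epsilon-greedy learner with sample-average estimates.
    """
    rng = np.random.default_rng(seed)
    totals = np.zeros(steps)
    for _ in range(runs):
        q_true = rng.normal(0.0, 1.0, size=k)
        q_est = np.zeros(k)
        counts = np.zeros(k)
        for t in range(steps):
            if rng.random() < epsilon:
                a = rng.integers(k)  # explore: uniform random arm
            else:
                # exploit: greedy arm, ties broken at random
                a = rng.choice(np.flatnonzero(q_est == q_est.max()))
            r = rng.normal(q_true[a], 1.0)
            counts[a] += 1
            q_est[a] += (r - q_est[a]) / counts[a]
            totals[t] += r
    return totals / runs


if __name__ == "__main__":
    # Small demo; larger runs/steps give smoother curves but take longer.
    for eps in (0.0, 0.01, 0.1):
        curve = average_rewards(eps, runs=20, steps=500)
        print(f"epsilon={eps}: mean reward over last 100 steps = {curve[-100:].mean():.3f}")
```

Averaging over many runs matters because a single run's curve is dominated by reward noise; the averaged curve is what makes differences between ε-values visible.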
This plot shows the average reward over episodes for different agents.
This grouped bar chart visualizes the number of times each arm was selected by different agents.
This plot compares the average reward over episodes for the optimistic initial values agent and the UCB agent.
- Average Reward vs. Episodes:
  - The UCB agent consistently achieves a higher average reward than the ε-greedy agents.
  - The optimistic initial values agent starts strong but converges to performance similar to that of the ε = 0.1 agent.
- Selections of Each Arm:
  - The UCB agent spreads its selections across the arms more uniformly than the other agents.
  - The ε = 0.01 agent exploits more, showing a preference for a particular arm.
- Comparison between Optimistic and UCB Agents:
  - The UCB agent outperforms the optimistic initial values agent in terms of average reward.
  - The optimistic agent starts with a higher initial reward but is eventually surpassed by the UCB agent.
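The optimistic and UCB agents compared above differ from ε-greedy only in how estimates are initialized and how actions are chosen. The functions below are a hedged sketch of both ideas as described in Sutton and Barto, where UCB picks A_t = argmax_a [Q_t(a) + c·sqrt(ln t / N_t(a))]. The names `ucb_action` and `optimistic_estimates`, the exploration constant `c = 2.0`, and the initial value `5.0` are assumptions, not values taken from this project.

```python
import numpy as np


def ucb_action(q_est: np.ndarray, counts: np.ndarray, t: int, c: float = 2.0) -> int:
    """Upper-confidence-bound action selection (illustrative sketch).

    t is the 1-based time step. Arms never tried are treated as maximizing
    and are selected first, so the log term is only used once N(a) > 0.
    """
    untried = np.flatnonzero(counts == 0)
    if untried.size > 0:
        return int(untried[0])
    # Exploration bonus shrinks as an arm is selected more often.
    bonus = c * np.sqrt(np.log(t) / counts)
    return int(np.argmax(q_est + bonus))


def optimistic_estimates(k: int = 10, initial_value: float = 5.0) -> np.ndarray:
    """Optimistic initial values: start every estimate well above likely rewards.

    Under greedy selection, each disappointing reward lowers the chosen arm's
    estimate, which pushes the agent to try the other arms early on.
    """
    return np.full(k, initial_value, dtype=float)
```

In the textbook version, optimistic initial values are paired with a constant step size (α = 0.1) rather than sample averages, because a sample-average update replaces the optimistic estimate entirely after an arm's first pull.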
Feel free to fork this project. Enjoy exploring reinforcement learning with this 10-arm testbed simulation!
This project is open-source and available under the MIT License.