This project designs a simulation in which a robotic agent, "Mr. Krabs," uses a Double Deep Q-Network (DDQN) to locate and collect a forgotten stash of money inside the Krusty Krabs restaurant. The agent learns to navigate a dynamic environment containing static and dynamic obstacles using reinforcement learning.
- Installation
- Objectives
- Environment Design
- Key Features
- Implementation Details
- Challenges and Solutions
- Testing and Results
Make sure you have Python installed, then follow these steps to set up the environment and run the application:

- Clone the Repository:

  ```bash
  git clone https://github.com/Sambonic/krusty-krabs-navigator
  cd krusty-krabs-navigator
  ```

- Create a Python Virtual Environment:

  ```bash
  python -m venv env
  ```

- Activate the Virtual Environment:

  - On Windows:

    ```bash
    env\Scripts\activate
    ```

  - On macOS and Linux:

    ```bash
    source env/bin/activate
    ```

- Ensure Pip is Up-to-Date:

  ```bash
  python -m pip install --upgrade pip
  ```

- Install Dependencies:

  ```bash
  pip install .
  ```

  Or simply run the `pip install` line in the notebook.
- Implement a DDQN algorithm to train the agent to navigate the environment efficiently.
- Develop a reward system that incentivizes the agent to reach the target quickly while avoiding obstacles.
- Create a lore-accurate environment based on the Krusty Krabs restaurant for simulation purposes.
- The environment and assets were designed using Ibis Paint X to resemble the Krusty Krabs restaurant.
- Static and dynamic obstacles, including customers, were added to create realistic challenges for the agent.
- Utilized Double Deep Q-Network (DDQN) for stability and to avoid Q-value overestimation.
- Employed a target and policy network with periodic updates for stable training.
- Network Architecture: A feedforward neural network with three hidden layers of 64 neurons each, using ReLU activation (see the sketch after this list).
- Target Network Update Frequency: Every 100 episodes.
- Replay Memory Size: 100,000.
- Batch Size: 32.
- Learning Rate: 0.0001.
- Exploration Strategy: Epsilon-greedy approach (decay from 1.0 to 0.01 with a rate of 0.998).
- Reward: Distance-based and success-based incentives (e.g., reaching the target yields a reward of 50.0).
- Penalty: Distance and time-based penalties, with a harsh penalty (-5.0) for invalid moves.
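To make the setup above concrete, here is a minimal sketch of the network and the double-Q target computation. It assumes PyTorch (the deep-learning framework is not named in this README), and the names `QNetwork`, `select_action`, and `ddqn_targets`, as well as the discount factor `gamma`, are illustrative rather than the project's actual identifiers; only the layer sizes, activation, and epsilon-greedy scheme come from the list above.

```python
import random
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Feedforward Q-network: three hidden layers of 64 units with ReLU activations."""
    def __init__(self, state_dim: int, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    def forward(self, x):
        return self.net(x)

def select_action(policy_net, state, epsilon, n_actions):
    """Epsilon-greedy exploration: random action with probability epsilon."""
    if random.random() < epsilon:
        return random.randrange(n_actions)
    with torch.no_grad():
        return policy_net(state.unsqueeze(0)).argmax(dim=1).item()

def ddqn_targets(policy_net, target_net, rewards, next_states, dones, gamma=0.99):
    """Double DQN target: the policy net selects the next action, the target net evaluates it.

    Decoupling selection from evaluation is what curbs Q-value overestimation.
    `dones` is a float tensor of 0/1 flags; `gamma` is an assumed discount factor.
    """
    with torch.no_grad():
        next_actions = policy_net(next_states).argmax(dim=1, keepdim=True)
        next_q = target_net(next_states).gather(1, next_actions).squeeze(1)
        return rewards + gamma * next_q * (1.0 - dones)
```

In a training loop matching the parameters above, transitions would be sampled in batches of 32 from a 100,000-transition replay memory, the policy network optimized with a learning rate of 0.0001, epsilon decayed by a factor of 0.998 per episode from 1.0 down to 0.01, and the target network synchronized with the policy network every 100 episodes.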
- Python with PyGame for simulation
- NumPy for occupancy grid management
- Priority Queues for pathfinding algorithms
- Custom asset design with Ibis Paint X
- DDQN class for agent behavior
- Environment class to house reward-penalty logic
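As a rough illustration of the reward-penalty logic such an Environment class might house, the sketch below combines the incentives listed earlier. The method name `compute_reward`, the distance shaping, and the per-step penalty value are assumptions; only the +50.0 target reward and the -5.0 invalid-move penalty come from the figures above.

```python
def compute_reward(self, reached_target: bool, valid_move: bool,
                   prev_distance: float, new_distance: float) -> float:
    """Illustrative reward shaping for the environment (names and shaping terms are hypothetical)."""
    if not valid_move:
        return -5.0                          # harsh penalty for invalid moves
    if reached_target:
        return 50.0                          # success-based incentive for reaching the target
    reward = prev_distance - new_distance    # distance-based: reward progress toward the target
    reward -= 0.1                            # small time-based penalty per step (assumed value)
    return reward
```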
- Complex Environment: Improved exploration strategy and reward system.
- Unstable Convergence: Enhanced neural network architecture with additional layers.
- Overfitting: Adopted DDQN, whose decoupled action selection and evaluation curb Q-value overestimation and stabilize learning.
- Comparison of Algorithms:

| Algorithm | Time Elapsed (s) |
|-----------|------------------|
| A*        | 5.47             |
| Dijkstra  | 5.54             |
| DDQN      | 5.37             |
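For context on the two baselines, A* and Dijkstra are classic priority-queue searches over the occupancy grid. The sketch below is illustrative rather than the project's implementation: it assumes a NumPy occupancy grid with 0 marking free cells, the function name `astar` is hypothetical, and dropping the heuristic recovers Dijkstra.

```python
import heapq
import numpy as np

def astar(grid: np.ndarray, start: tuple, goal: tuple):
    """Minimal A* over a 4-connected occupancy grid (0 = free, nonzero = obstacle)."""
    def h(cell):  # Manhattan-distance heuristic; returning 0 here yields Dijkstra
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    frontier = [(h(start), 0, start)]          # priority queue of (f, g, cell)
    best_g = {start: 0}
    came_from = {}
    while frontier:
        _, g, cell = heapq.heappop(frontier)
        if cell == goal:
            path = [cell]
            while cell in came_from:           # reconstruct the path back to start
                cell = came_from[cell]
                path.append(cell)
            return path[::-1]
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (cell[0] + dr, cell[1] + dc)
            if not (0 <= nxt[0] < grid.shape[0] and 0 <= nxt[1] < grid.shape[1]):
                continue
            if grid[nxt] != 0:                 # skip occupied cells
                continue
            ng = g + 1
            if ng < best_g.get(nxt, float("inf")):
                best_g[nxt] = ng
                came_from[nxt] = cell
                heapq.heappush(frontier, (ng + h(nxt), ng, nxt))
    return None                                 # no path found
```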
- The DDQN agent completed the task slightly faster than the traditional search algorithms (A* and Dijkstra).