Reinforcement Learning (RL) is a type of machine learning in which an agent learns to make decisions by performing actions in an environment to maximize some notion of cumulative reward. Unlike supervised learning, where the model is trained on a fixed dataset, RL involves learning through trial and error: the agent interacts with the environment, receives feedback, and adjusts its policy accordingly.
OpenAI Gym (now maintained by the Farama Foundation as Gymnasium) is a toolkit for developing and comparing reinforcement learning algorithms. It provides a variety of environments, ranging from classic control tasks to complex simulations, which serve as benchmarks for evaluating RL algorithms. For our use case of autonomous driving and car racing, it offers the "CarRacing-v2" environment, a challenging task in which an agent must learn to drive a car around a track efficiently.
- Benchmarking: OpenAI Gym provides standardized environments, allowing us to benchmark our RL algorithms against established baselines.
- Complexity: The "CarRacing-v2" environment offers a complex and dynamic scenario that simulates real-world driving challenges, making it an ideal testbed for developing robust autonomous driving agents.
- Flexibility: OpenAI Gym's flexible interface allows us to experiment with different RL algorithms and adapt the environment's parameters to test various scenarios, enhancing the agent's learning and generalization capabilities.
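As a concrete starting point, here is a minimal sketch of creating the environment with the Gymnasium API (the maintained successor to OpenAI Gym); the discrete-action and RGB-rendering arguments shown are illustrative rather than the project's exact configuration.

```python
import gymnasium as gym

# Discrete action space and RGB frames, matching the configuration used for
# the CNN-based agents (illustrative values, not the exact project setup).
env = gym.make("CarRacing-v2", continuous=False, render_mode="rgb_array")

obs, info = env.reset(seed=0)
for _ in range(100):
    action = env.action_space.sample()  # random action, just to exercise the env
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, info = env.reset()
env.close()
```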
This repository contains an advanced reinforcement learning system designed to train agents using Deep Q-Network (DQN) and Proximal Policy Optimization (PPO) algorithms within the "CarRacing-v2" environment. The system incorporates several self-adaptive features to improve training efficiency and agent performance.
- Self-Adaptive Training: Utilizes dynamic learning rate adjustments based on agent performance.
- Evaluation Callback: Periodically evaluates and saves the best performing model.
- Domain Randomization: Introduces variability in environment parameters to enhance agent robustness.
- GPU Acceleration: Speeds up training using GPU resources.
- Stable Baselines3 Integration: Leverages state-of-the-art reinforcement learning algorithms.
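The snippet below sketches how these features fit together with Stable Baselines3: CNN policies for the image observations, both DQN and PPO, and automatic GPU selection. The hyperparameter values and file names are placeholders, not the project's tuned settings.

```python
import gymnasium as gym
from stable_baselines3 import DQN, PPO

# Discrete actions so the same environment works for both DQN and PPO.
env = gym.make("CarRacing-v2", continuous=False)

# CNN policies operate directly on the 96x96 RGB observations;
# device="auto" uses the GPU when one is available.
ppo_model = PPO("CnnPolicy", env, learning_rate=3e-4, device="auto", verbose=1)
dqn_model = DQN("CnnPolicy", env, learning_rate=1e-4, buffer_size=100_000, device="auto", verbose=1)

ppo_model.learn(total_timesteps=100_000)
ppo_model.save("ppo_carracing")  # hypothetical file name
```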
- Environment Setup: Initialization of the "CarRacing-v2" environment with domain randomization and discrete mode.
- Agent Training: Training agents using DQN and PPO algorithms to navigate the environment with CNN policies and custom learning rate schedules.
- Domain Randomization: Varies environment parameters across episodes (see the wrapper sketch after this list).
- RGB Rendering: Environment rendered in RGB format.
- Evaluation Callback: Periodic performance evaluation and model saving.
- Model Saving: Stores the best model for future use.
- Animation Generation: Creates animations of agents' performance for visualization.
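One way to realize the domain-randomization step above is a Gymnasium wrapper. The sketch below re-seeds the track and applies a per-episode brightness factor as an illustrative form of randomization; the parameters actually varied in the project may differ.

```python
import random

import gymnasium as gym
import numpy as np


class DomainRandomizationWrapper(gym.Wrapper):
    """Illustrative per-episode randomization: a fresh random track seed plus
    a random brightness factor applied to the RGB observations."""

    def reset(self, **kwargs):
        kwargs.setdefault("seed", random.randrange(2**31))  # new random track layout
        self._brightness = np.random.uniform(0.8, 1.2)      # per-episode visual variation
        obs, info = self.env.reset(**kwargs)
        return self._augment(obs), info

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        return self._augment(obs), reward, terminated, truncated, info

    def _augment(self, obs):
        scaled = obs.astype(np.float32) * self._brightness
        return np.clip(scaled, 0, 255).astype(np.uint8)


env = DomainRandomizationWrapper(
    gym.make("CarRacing-v2", continuous=False, render_mode="rgb_array")
)
```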
- Environment Setup: Initialization with domain randomization, discrete mode, and RGB rendering.
- Agent Training: Agents trained with DQN and PPO algorithms using CNN policies, GPU acceleration, and custom learning rate schedules.
- Domain Randomization: Enhances agent's generalization ability.
- RGB Rendering: Suitable for visual-based RL agents.
- GPU Acceleration: Speeds up training.
- Custom Learning Rate Schedule: Adjusts the learning rate dynamically during training.
- Evaluation Callback: Evaluates and saves the best model periodically (see the sketch after this list).
- Model Saving: Preserves the best model for future use.
- Animation Generation: Visualizes agent's performance.
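The custom learning-rate schedule, evaluation callback, and model saving listed above map onto standard Stable Baselines3 usage. The sketch below uses a simple linear decay and SB3's `EvalCallback`; the decay shape, evaluation frequency, and save paths are assumptions rather than the project's exact settings.

```python
import gymnasium as gym
from stable_baselines3 import PPO
from stable_baselines3.common.callbacks import EvalCallback


def linear_schedule(initial_lr: float):
    """SB3 calls the schedule with progress_remaining, which decays 1.0 -> 0.0."""
    def schedule(progress_remaining: float) -> float:
        return initial_lr * progress_remaining
    return schedule


train_env = gym.make("CarRacing-v2", continuous=False)
eval_env = gym.make("CarRacing-v2", continuous=False)

# Periodically evaluate the agent and keep the best-performing checkpoint.
eval_callback = EvalCallback(
    eval_env,
    best_model_save_path="./best_model/",  # hypothetical save location
    eval_freq=10_000,
    n_eval_episodes=5,
    deterministic=True,
)

model = PPO(
    "CnnPolicy",
    train_env,
    learning_rate=linear_schedule(3e-4),
    device="auto",
    verbose=1,
)
model.learn(total_timesteps=1_000_000, callback=eval_callback)
model.save("ppo_carracing_final")  # hypothetical file name
```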
We conducted extensive training using the PPO algorithm, with agents trained for over 2 million episodes. These runs allow us to observe the agent's performance under different conditions and draw meaningful conclusions. Here are some visualizations of the agent's performance:
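A hedged sketch of how such rollout animations can be produced: frames rendered by the environment are collected while a trained policy drives, then written to a GIF. The `imageio` dependency and the file names are assumptions, not necessarily the project's tooling.

```python
import gymnasium as gym
import imageio
from stable_baselines3 import PPO

env = gym.make("CarRacing-v2", continuous=False, render_mode="rgb_array")
model = PPO.load("ppo_carracing")  # hypothetical path to a trained model

frames = []
obs, info = env.reset()
done = False
while not done:
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, info = env.step(action)
    frames.append(env.render())  # RGB frame of the current state
    done = terminated or truncated

imageio.mimsave("carracing_rollout.gif", frames, fps=30)
env.close()
```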
| Quality Attribute | Refinement | Priority/Difficulty (H, M, L) | Scenario |
|---|---|---|---|
| Performance | Training efficiency | M | The agent should learn effectively within a reasonable number of episodes to navigate the CarRacing-v2 environment efficiently. |
| Performance | Model saving | L | The model should be saved quickly and efficiently after training to minimize downtime and ensure the learned knowledge is preserved. |
| Usability | Visualization | M | The visualization tools, including animations, should be easy to use and provide clear insights into the agent's performance and behavior. |
| Reliability | Robustness to domain randomization | H | The agent should perform well and generalize across different scenarios introduced by domain randomization. |
| Maintainability | Custom learning rate schedule | M | The custom learning rate schedule should be easily adjustable and maintainable to allow for experimentation and optimization. |
| Efficiency | GPU acceleration | H | The use of GPU acceleration should significantly speed up the training process without compromising the quality of the learned policy. |
This project exemplifies a self-adaptive system in reinforcement learning, focusing on feedback mechanisms, dynamic adjustments, and iterative improvements.
- Adaptation to Environmental Variability: Challenges the agent to generalize across varied conditions.
- Dynamic Adjustment Based on Performance: Optimizes learning efficiency by adjusting the learning rate (a sketch of such a mechanism follows this list).
- Feedback Loop for Improvement: Identifies and retains the most effective policies.
- Iterative Learning and Adaptation: Uses visualization tools for continuous improvement.
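As an illustration of the performance-based adjustment described above, the callback below monitors the rolling mean episode reward and halves the learning rate when it stops improving. This is only a sketch built on Stable Baselines3 internals (`ep_info_buffer`, `lr_schedule`); the check frequency and drop factor are assumed values, not the project's actual mechanism.

```python
import numpy as np
from stable_baselines3.common.callbacks import BaseCallback


class AdaptiveLRCallback(BaseCallback):
    """Halve the learning rate whenever the rolling mean episode reward
    fails to improve between checks (illustrative thresholds)."""

    def __init__(self, check_freq: int = 50_000, drop_factor: float = 0.5, verbose: int = 0):
        super().__init__(verbose)
        self.check_freq = check_freq
        self.drop_factor = drop_factor
        self.best_mean_reward = -np.inf
        self.current_lr = None

    def _on_step(self) -> bool:
        if self.n_calls % self.check_freq == 0 and len(self.model.ep_info_buffer) > 0:
            if self.current_lr is None:
                # progress_remaining=1.0 returns the initial learning rate
                self.current_lr = self.model.lr_schedule(1.0)
            mean_reward = float(np.mean([ep["r"] for ep in self.model.ep_info_buffer]))
            if mean_reward <= self.best_mean_reward:
                self.current_lr *= self.drop_factor
                # Replace the schedule so the next policy update uses the reduced rate.
                self.model.lr_schedule = lambda _progress, lr=self.current_lr: lr
            else:
                self.best_mean_reward = mean_reward
        return True
```

Such a callback would be passed to `model.learn(..., callback=AdaptiveLRCallback())`, alongside the evaluation callback.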
"Car Racing," Farama Gymnasium, https://gymnasium.farama.org/environments/box2d/car_racing/