
Fleet-Scheduling-using-MADDPG-Multi-Agent-RL

Goal:

  • To develop a Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm and apply it to two multi-agent environments: a custom Vehicle Scheduling environment and Simple Adversary, an OpenAI multi-agent particle environment.

Multi-Agent Environment

  • Two cars in a 4x4 grid-world environment (a minimal environment sketch follows this list)
    • 1st car – goal: reach the top-right corner of the grid
    • 2nd car – goal: reach the top-left corner of the grid
    • State space: 16 states {s0, s1, s2, ..., s15}
    • Action space: {0: down, 1: up, 2: right, 3: left, 4: no move}
    • Reward structure:
      • Moves towards its target: +1
      • Moves away from its target: -3
      • Stays in the same position: -5
      • Reaches its target: +100
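
Below is a minimal sketch of this grid world, assuming states are numbered row-major with s0 in the top-left corner (so the top-left target is state 0 and the top-right target is state 3). The indexing and helper names are illustrative and may differ from the repo code.

```python
# Minimal sketch of the 4x4 two-car grid world described above.
# Assumption (not taken from the repo): states are numbered row-major,
# s0 = top-left, so the 1st car's target is state 3 and the 2nd car's is state 0.

ACTIONS = {0: (1, 0), 1: (-1, 0), 2: (0, 1), 3: (0, -1), 4: (0, 0)}  # down, up, right, left, no move
SIZE = 4

def to_state(row, col):
    return row * SIZE + col

def step_car(state, action, target):
    """Move one car and return (next_state, reward, done)."""
    row, col = divmod(state, SIZE)
    drow, dcol = ACTIONS[action]
    new_row = min(max(row + drow, 0), SIZE - 1)
    new_col = min(max(col + dcol, 0), SIZE - 1)
    next_state = to_state(new_row, new_col)

    if next_state == target:
        return next_state, 100, True    # reaches its target
    if next_state == state:
        return next_state, -5, False    # stays in the same position (includes hitting a wall)
    trow, tcol = divmod(target, SIZE)
    old_dist = abs(row - trow) + abs(col - tcol)
    new_dist = abs(new_row - trow) + abs(new_col - tcol)
    return next_state, (1 if new_dist < old_dist else -3), False

# Example: the 1st car (target = top-right corner, state 3) moves right from state 1.
print(step_car(1, 2, target=3))  # -> (2, 1, False)
```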

Simple Adversary - OpenAI Multi Agent particle environment

  • 3 agents – 1 adversary and 2 good agents (physical deception)
  • Environment – 2 landmarks (green: target landmark, black: dummy landmark)
  • Rewards (a reward sketch follows this list):
    • For the good agents:
      • Positive reward – based on the distance of the closest good agent to the target landmark
      • Negative reward – based on the distance of the adversary to the target landmark
    • For the adversary:
      • Positive reward – based on the distance of the adversary to the target landmark (closer is better)
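
The listed rewards can be paraphrased as below; this is an illustrative sketch rather than the exact reward code of the particle environment, and the function names and use of Euclidean distance are assumptions.

```python
import numpy as np

def good_agent_reward(agent_positions, adversary_position, target_landmark):
    # Rewarded when the closest good agent is near the target,
    # penalized when the adversary is near the target.
    closest = min(np.linalg.norm(p - target_landmark) for p in agent_positions)
    adversary_dist = np.linalg.norm(adversary_position - target_landmark)
    return -closest + adversary_dist

def adversary_reward(adversary_position, target_landmark):
    # Rewarded for getting close to the target landmark.
    return -np.linalg.norm(adversary_position - target_landmark)

# Example: 2 good agents, 1 adversary, target landmark at the origin.
agents = [np.array([0.5, 0.0]), np.array([1.0, 1.0])]
adversary = np.array([2.0, 0.0])
target = np.zeros(2)
print(good_agent_reward(agents, adversary, target), adversary_reward(adversary, target))
```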

Implementation:

  • Implemented Q-learning and MADDPG on both the Vehicle Scheduling and Simple Adversary environments (a tabular Q-learning update sketch is shown below)
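
For reference, the Q-learning side of the grid world boils down to the standard tabular update; the hyperparameters below are illustrative, not the values used in this repo.

```python
import numpy as np

N_STATES, N_ACTIONS = 16, 5
alpha, gamma, epsilon = 0.1, 0.95, 0.1        # illustrative hyperparameters
Q = np.zeros((N_STATES, N_ACTIONS))           # one Q-table per car

def select_action(state):
    if np.random.rand() < epsilon:            # explore
        return np.random.randint(N_ACTIONS)
    return int(np.argmax(Q[state]))           # exploit

def q_update(state, action, reward, next_state, done):
    target = reward if done else reward + gamma * np.max(Q[next_state])
    Q[state, action] += alpha * (target - Q[state, action])
```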

MADDPG:

  • Every agent has (a network and soft-update sketch follows this list):
    • Actor network:
      • Inputs: the agent's own observation (state)
      • Outputs: action probabilities
    • Critic network:
      • Inputs: the states and actions of all agents
      • Outputs: Q-value
  • To avoid chasing moving targets, target networks with slowly updated (effectively frozen) weights are used
    • Target actor network (updated via soft updates)
    • Target critic network (updated via soft updates)
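
A minimal PyTorch sketch of the per-agent networks and the soft target update is shown below. The layer sizes, the softmax output for the discrete grid-world actions, the one-hot observation size, and the value of tau are assumptions, not taken from the repo.

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Maps the agent's own observation to action probabilities."""
    def __init__(self, obs_dim, n_actions, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions), nn.Softmax(dim=-1),
        )

    def forward(self, obs):
        return self.net(obs)

class Critic(nn.Module):
    """Maps the joint states and actions of all agents to a single Q value."""
    def __init__(self, joint_obs_dim, joint_act_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(joint_obs_dim + joint_act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, joint_obs, joint_actions):
        return self.net(torch.cat([joint_obs, joint_actions], dim=-1))

def soft_update(target_net, online_net, tau=0.01):
    """Move the target network a small step towards the online network."""
    for t_param, param in zip(target_net.parameters(), online_net.parameters()):
        t_param.data.copy_((1.0 - tau) * t_param.data + tau * param.data)

# Each agent keeps target copies initialized with the same weights
# (obs_dim=16 assumes a one-hot encoding of the 16 grid states).
actor = Actor(obs_dim=16, n_actions=5)
target_actor = Actor(obs_dim=16, n_actions=5)
target_actor.load_state_dict(actor.state_dict())
soft_update(target_actor, actor, tau=0.01)
```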

Improved Version of MADDPG

  • I developed an improved version of MADDPG that keeps an ε-greedy exploration step even after noise has been added to the actions chosen by the deterministic policy (a sketch of this action selection follows).
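
Below is a sketch of this action selection, under the same assumptions as the MADDPG sketch above (the actor outputs action probabilities for the discrete grid-world actions); the noise scale and ε value are illustrative.

```python
import numpy as np
import torch

def select_action(actor, obs, n_actions, epsilon=0.1, noise_scale=0.1):
    # ε-greedy step kept on top of the noisy deterministic policy.
    if np.random.rand() < epsilon:
        return np.random.randint(n_actions)           # fully random exploratory action
    probs = actor(torch.as_tensor(obs, dtype=torch.float32)).detach().numpy()
    noisy = probs + noise_scale * np.random.randn(n_actions)
    return int(np.argmax(noisy))                      # greedy w.r.t. the noisy actor output
```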

Observations:

  • Q-learning does not work well for the Vehicle Scheduling environment.
  • The MADDPG algorithm performs better than the Q-learning algorithm.
  • Care must be taken when implementing MADDPG, since the Critic network can over-estimate Q-values.
  • MADDPG works well in a continuous state and action space environment (i.e., Simple Adversary).
