Deep reinforcement learning for traffic light control in transportation networks
This repository provides a deep reinforcement learning model for traffic light control. The model aims to improve traffic light performance by combining reinforcement learning with deep neural networks. An ineffective traffic light causes travel delays, wasted energy, air pollution, and vehicle accidents.
Most existing traffic lights follow a fixed schedule and ignore real-time traffic. Others use sensors in the ground to count the cars waiting at the light. These lights have low efficiency: during a football match or peak traffic hours, for example, most of these systems are paralyzed, and the police have to control the traffic manually.
Observing how a police officer directs traffic motivates the design of a system that works the same way. Such a system needs an eye to observe the real-time traffic at the intersection and a brain to process it. The eye recognizes the number of cars, as well as where and when they are waiting. Reinforcement learning serves as the brain. It has three parts that we must model for use in the deep network: states, actions, and rewards.
The state, action, and reward are defined as follows.
State: We define the state based on the speed and location of the vehicles at the intersection. The traffic light takes a picture of the intersection and divides it into small, equal squares. A square that contains a vehicle has the value 1; a square with no vehicle has the null value.
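The grid encoding described above can be sketched as follows. This is a minimal illustration, not the repository's implementation: the function name `occupancy_grid`, the use of metric (x, y) vehicle coordinates, and the choice of 0 as the null value are all assumptions made for the example, and only the location part of the state is shown.

```python
def occupancy_grid(vehicles, width, height, cell):
    """Build a 2D grid over the intersection.

    vehicles: iterable of (x, y) positions (assumed to be in meters).
    width, height: size of the observed area (hypothetical parameters).
    cell: side length of each small square.
    Returns a list of rows; a cell is 1 if it contains a vehicle,
    otherwise 0 (0 stands in for the "null value" here, an assumption).
    """
    cols = int(width // cell)
    rows = int(height // cell)
    grid = [[0] * cols for _ in range(rows)]
    for x, y in vehicles:
        col = int(x // cell)
        row = int(y // cell)
        # Ignore vehicles that fall outside the observed area.
        if 0 <= row < rows and 0 <= col < cols:
            grid[row][col] = 1
    return grid


# Example: a 50 m x 50 m area split into 5 m squares (a 10 x 10 grid).
grid = occupancy_grid([(2.0, 3.0), (12.5, 3.0), (47.0, 49.9)], 50, 50, 5)
```

A speed grid could be built in the same way by storing each vehicle's speed in its cell instead of 1.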
Action: The action selects the phase durations for the next cycle. Each ring in the figure below represents the durations of the 4 phases in one cycle of the traffic light. We use a time step of 5 seconds, shown in the figure as adding 5 seconds to, or subtracting 5 seconds from, a phase.
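One way to picture this action space is the sketch below. It assumes that each action changes a single phase by one 5-second step, that a "keep" action leaves all durations unchanged, and that durations are clamped to a range; the names `enumerate_actions`, `apply_action`, `min_s`, and `max_s` are hypothetical and not taken from this repository.

```python
STEP_SECONDS = 5  # size of one adjustment, as described in the text


def enumerate_actions(num_phases=4):
    """List the actions: keep everything, or +/- one step on one phase."""
    actions = [None]  # None means "keep the current durations" (assumption)
    for phase in range(num_phases):
        actions.append((phase, +STEP_SECONDS))
        actions.append((phase, -STEP_SECONDS))
    return actions


def apply_action(durations, action, min_s=5, max_s=60):
    """Return the phase durations for the next cycle.

    durations: list of the current phase durations in seconds.
    action: None, or a (phase_index, delta_seconds) pair.
    min_s, max_s: hypothetical bounds; an action that would leave
    this range is ignored and the durations stay unchanged.
    """
    new_durations = list(durations)
    if action is None:
        return new_durations
    phase, delta = action
    candidate = new_durations[phase] + delta
    if min_s <= candidate <= max_s:
        new_durations[phase] = candidate
    return new_durations


# Example: lengthen the second phase by one step.
next_cycle = apply_action([20, 10, 20, 10], (1, +STEP_SECONDS))
```

With 4 phases this encoding gives 9 discrete actions, which is a convenient size for the output layer of a deep Q-network.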
Reward: Our goal is to increase efficiency at the intersection. One way to measure efficiency is the waiting time of vehicles, so the reward of this system is based on the waiting time between two neighboring cycles, as shown in the following relationships: