Main goal of the project was to design a decision-making mechanism to enable autonomous an autonomous agent to take advantage of the opportunities offered by crowdsourcing in a navigation task. The project is exactly focused on designing a navigation algorithm for an autonomous rover that needs to navigate an uncertain environment.
The setup is as follows: an agent wants to navigate a 3-square wide grid, from bottom left to top right. The fields are numbered from 0 to 9. The middle square is an obstacle, represented by a very negative reward. The goal is to compare sources, coded as probability mass functions (pmfs), in order to select the best one at each step. For convenience, instead of using the whole state space as support for the pmfs, we reduce it to five possible actions: stay in place, move left, upwards, right downwards. The same principle was applied to reward lists.
The source code also contains the DKL function, or Kullback-Leibler divergence, which is a measure of how one probability distribution is different from a second, reference probability distribution.
The problem is the uncertain environment in which the robot moves, containing obstacles. The agent must be able to collect information from third parties and re-use such information to achieve the destination. The agent must be also able to avoid obstacles in an uncertain surface. A greedy algorithm was used to program the robot, which, in order to determine the solution at each step, makes the most promising partial solution choice at that time. It used the sources collected, compared with the target behaviour and analysed taking into account the cost function.
The Greedy algorithm is not a predictive algorithm, but in the case of a small grid it proved sufficient for the agent to reach the target, regardless of the location of the obstacle. To illustrate results performed a numerical and graphical simulation, library "Pygame" was used to represent the graphical results.
Extended verison of the project allows the user to generate an arbitrarily large grid and random obstacle’s positions or declare obstacles on specific fields. A size of grid is defined by user. In this version of the algorithm, the return_target(x) function has been modified to be universal, regardless of the grid size, but considers the grid as square. It uses the new target_direction() function, which contains all the calculations. Also modified the return_reward() function to be independent of the number of obstacles.
In this part of the project the receding horizon version of the algorithm is implemented, which is a predictive algorithm. A property of predictive control is its repetitive mode of operation. It is possible to determine a sequence of future controls, but only the first step of this sequence is used, so that in the next step the whole calculation is repeated. In the main loop, we calculate the weights for control through a receding horizon and choose a policy. The idea is that we restric the state space to the relative states available to us and use crowdsourcing on this new state space.
- Python language
- Jupyter Notebook
- Pygame library