This project is an attempt to train a neural network to drive a car in an end-to-end fashion. This approach to autonomous driving is not new and has been proven by NVIDIA's DAVE-2. The difference we try to make here is to help our model learn a sequence of steering angles rather than just one steering angle, as has been done in a number of publications.

The major benefit of learning a sequence of steering angles is that it can help deliver the two pillars of the driving task, namely the vehicle's steering angle and velocity. While the former can be learned from front-facing images, there is no way to infer velocity from static images, which makes it impossible for a network to learn. It can be argued that velocity could also be learned end-to-end from video inputs and recurrent layers capturing the input's temporal relations. However, velocity is, in essence, not meant to be learned end-to-end. Velocity is a dynamic quantity which depends on the vehicle's surrounding environment (mainly moving obstacles). As a result, velocity should be calculated from sensory input. The motion planning literature suggests that a velocity profile can be generated by timestamping a geometric path, which in the case of car-like motion is just a sequence of steering angles. With this in mind, if a network can learn such a sequence, a velocity profile can be produced, hence completing the driving task.
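To make the timestamping idea concrete, the sketch below derives a curvature-limited velocity profile from a sequence of steering angles spaced 2 meters apart. This is only an illustration, not part of this repository: the wheelbase, lateral-acceleration limit, speed cap, and function name are all assumptions.

```python
import numpy as np

def timestamp_path(steering_angles_deg, step=2.0, wheelbase=2.85,
                   a_lat_max=2.0, v_max=25.0):
    """Sketch: derive a velocity profile by timestamping a geometric path.

    steering_angles_deg: one angle every `step` meters along the path.
    wheelbase, a_lat_max, v_max are assumed vehicle/comfort parameters.
    Kinematic bicycle model: curvature kappa = tan(delta) / wheelbase;
    the lateral-acceleration limit caps speed at sqrt(a_lat_max / kappa).
    """
    delta = np.deg2rad(np.asarray(steering_angles_deg, dtype=float))
    kappa = np.maximum(np.abs(np.tan(delta)) / wheelbase, 1e-9)
    v = np.minimum(np.sqrt(a_lat_max / kappa), v_max)   # speed per segment
    t = np.concatenate([[0.0], np.cumsum(step / v)])    # waypoint timestamps
    return v, t

# Straight segments allow v_max; tighter turns slow the vehicle down.
v, t = timestamp_path([0.0, 1.5, 4.0, 4.0, 1.0])
print(v, t)
```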
Clone the repository and install the dependencies:

```
pip install -r requirement.txt
```
The trained models can be downloaded from this Google Drive folder. After downloading, put them in the `best_weights` folder.
The dataset used to train our model is the Udacity dataset CH2. In its raw form, the dataset is a collection of 6 bag files which need to be preprocessed to extract camera images and the associated steering angles (as well as other helpful information such as GPS coordinates). The preprocessing is done with udacity-dataset-reader. After preprocessing, a number of folders and files are created; however, only the `center` folder and `interpolated.csv` are needed. They respectively contain the images captured by the front-facing camera and detailed per-frame information (e.g. timestamp, filename, steering angle, GPS coordinate).
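For a quick look at the preprocessed data, `interpolated.csv` can be inspected with pandas. The snippet below is a minimal sketch; the column names (`frame_id`, `filename`, `angle`) follow the typical udacity-dataset-reader output and should be verified against your own extraction.

```python
import pandas as pd

# Minimal sketch: inspect the preprocessed Udacity CH2 data.
# Column names are assumed from the udacity-dataset-reader output.
df = pd.read_csv("./data/training_data/interpolated.csv")

# Keep only rows coming from the front-facing (center) camera.
center = df[df["frame_id"] == "center_camera"]
print(center[["timestamp", "filename", "angle"]].head())
```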
The repository structure should be organized as follows:

```
project
│   README.md
│
└───best_weights
│
└───data
│   │
│   └───training_data
│       │   interpolated.csv
│       └───center
│           │   timestamp_1.jpeg
│           │   timestamp_2.jpeg
│           │   ...
```
To enable learning a geometric path, the model is shown an image of the environment in front of the car and is meant to output a sequence of steering angles in which the first angle applies directly at the image's location and each subsequent angle is taken 2 meters farther along the path. To serve this purpose, the data needs to incorporate the distance between consecutive labels.
Furthermore, to increase the reliability of the model's predictions, the training problem is formulated as a classification problem. In detail, the recorded steering angle spectrum is discretized into intervals of length 1 degree. Then, instead of directly predicting a steering angle, the model predicts the class that the steering angle belongs to. The advantage of this approach is that the model's performance during training can also be measured by the accuracy metric, in addition to the cross-entropy loss. Moreover, with sufficiently small bins, the resulting model can outperform regression-based ones (e.g. DroNet).
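As a sketch of this discretization (the exact bin edges and angle range live in `./data/dataset_preparation.ipynb`; the ±25 degree range below is an assumption for illustration):

```python
import numpy as np

# Assumed steering range of +/-25 degrees with 1-degree bins; the real
# range is defined in ./data/dataset_preparation.ipynb.
ANGLE_MIN, ANGLE_MAX = -25.0, 25.0
bin_edges = np.arange(ANGLE_MIN, ANGLE_MAX + 1.0, 1.0)

def angle_to_class(angle_deg):
    """Map a steering angle (degrees) to its 1-degree class index."""
    clipped = np.clip(angle_deg, ANGLE_MIN, ANGLE_MAX - 1e-6)
    return int(np.digitize(clipped, bin_edges) - 1)

def class_to_angle(class_idx):
    """Map a class index back to the center of its bin."""
    return bin_edges[class_idx] + 0.5

print(angle_to_class(3.2), class_to_angle(angle_to_class(3.2)))  # 28 3.5
```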
These preparation phases are done in `./data/dataset_preparation.ipynb`. An example of a training sample is shown in Fig.1.
After executing `dataset_preparation.ipynb`, 2 `.csv` files will be created. One file contains the information of the input to the model, the other contains the expected output. To transform these files into 2 tensors `X` and `y`, which in turn can be used for training or evaluation, use `csv_to_npy.py` with the syntax:

```
python csv_to_npy.py --csv_file_path --csv_type --model_type
```

An example of `csv_file_path` is `./data/CH2_training.csv`. The `csv_type` takes one of 2 values: `training` or `validation`. The `model_type` takes `classifier` or `regressor`.
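For instance, a plausible invocation to convert the training CSV for the classification model (assuming each flag takes its value directly, which should be checked against the script's argument parsing) is:

```
python csv_to_npy.py --csv_file_path ./data/CH2_training.csv --csv_type training --model_type classifier
```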
The architecture used in this project (Fig.2) is adapted from DroNet: Learning to Fly by Driving. The body is made of 3 ResNet blocks. The original output layer, with one neuron for performing regression on the steering angle and another for predicting the collision probability, is replaced by an array of classifiers, each comprised of one Dense layer activated by ReLU and another Dense layer activated by Softmax to output the one-hot representation of the steering angle class.
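A minimal Keras sketch of this head arrangement is shown below. The ResNet body is abstracted away by a stand-in, and the hidden width, number of classes, and input shape are assumptions; the actual architecture is defined in this repository's model code.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Assumed sizes for illustration only.
NUM_HEADS = 5      # one classifier per predicted steering angle
NUM_CLASSES = 50   # number of 1-degree steering bins
HIDDEN_UNITS = 64

def build_multi_head_classifier(body: keras.Model) -> keras.Model:
    """Attach NUM_HEADS (Dense+ReLU, Dense+Softmax) classifiers to a body."""
    features = body.output
    outputs = []
    for i in range(NUM_HEADS):
        h = layers.Dense(HIDDEN_UNITS, activation="relu",
                         name=f"head_{i}_hidden")(features)
        out = layers.Dense(NUM_CLASSES, activation="softmax",
                           name=f"head_{i}_angle_class")(h)
        outputs.append(out)
    return keras.Model(inputs=body.input, outputs=outputs)

# Example with a stand-in body (the real one is 3 ResNet blocks):
inp = keras.Input(shape=(200, 200, 1))
x = layers.Conv2D(32, 5, strides=2, activation="relu")(inp)
x = layers.GlobalAveragePooling2D()(x)
body = keras.Model(inp, x)

model = build_multi_head_classifier(body)
model.compile(optimizer="adam",
              loss=["categorical_crossentropy"] * NUM_HEADS,
              metrics=["accuracy"])
```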
After training for 100 epochs, our model's performance is better than DroNet and some other models on the Root Mean Square Error (RMSE) and Explained Variance (EVA) metrics. Since our model employs 5 classifiers (called Heads) to predict 5 steering angles, these two metrics are calculated for each classifier.
| Model | RMSE | EVA |
|---|---|---|
| Constant baseline | 0.2129 | 0 |
| DroNet | 0.1090 | 0.7370 |
| Head_0 | 0.0869 | 0.8933 |
| Head_1 | 0.0920 | 0.8781 |
| Head_2 | 0.1052 | 0.8382 |
| Head_3 | 0.0820 | 0.9012 |
| Head_4 | 0.0851 | 0.8943 |
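For reference, RMSE and explained variance can be computed per head as in the sketch below (a generic illustration with hypothetical arrays, not this repository's evaluation code):

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root Mean Square Error between true and predicted angles."""
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def explained_variance(y_true, y_pred):
    """EVA = 1 - Var(residuals) / Var(ground truth)."""
    return 1.0 - np.var(y_true - y_pred) / np.var(y_true)

# Hypothetical angles for one head:
y_true = np.array([0.01, -0.03, 0.12, 0.00, -0.08])
y_pred = np.array([0.02, -0.01, 0.10, 0.01, -0.07])
print(rmse(y_true, y_pred), explained_variance(y_true, y_pred))
```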
The quality of classification is shown in the confusion matrix below. This matrix features a strong main diagonal, which means that the majority of predicted angle classes match the true class.
In addition, in an attempt to understand what the model has learned, the activation of each ResNet block is shown below. It can be seen that the first block recognizes lane markings and car-like objects, the second block seems to segment the drivable area, and the last block learns an identity mapping.
Fig.4 Activation of each ResNet block
The path predicted by the model, compared against the ground truth, is shown in Fig.5, which indicates a good match. An extended video of our model's predictions on the data from Udacity dataset Challenge 2 can be found at https://youtu.be/X2fi2xVr2jE. The interpretation of a sequence of steering angles as a path is implemented in `./demo/udacity_dataset_demo_2.py`. Before executing this file, please use `csv_to_npy.py` to convert `./data/demo_dataset.csv` into 2 npy files: one containing the input images and the other containing the true sequence of steering angles for each image.
Fig.5 Predicted paths compared to their ground truth
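To give an idea of how a sequence of steering angles maps to a path (the actual conversion is in `./demo/udacity_dataset_demo_2.py`), the sketch below integrates a kinematic bicycle model over 2 m steps; the wheelbase value is an assumption:

```python
import numpy as np

def angles_to_path(steering_angles_deg, step=2.0, wheelbase=2.85):
    """Integrate a kinematic bicycle model to turn steering angles into
    (x, y) waypoints spaced `step` meters apart. Wheelbase is assumed."""
    x, y, heading = 0.0, 0.0, 0.0
    path = [(x, y)]
    for delta in np.deg2rad(steering_angles_deg):
        x += step * np.cos(heading)
        y += step * np.sin(heading)
        heading += step * np.tan(delta) / wheelbase  # yaw change over the step
        path.append((x, y))
    return np.array(path)

print(angles_to_path([0.0, 2.0, 4.0, 4.0, 2.0]))
```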
If you use this code in an academic context, please cite the following paper:
M. Q. Dao, D. Lanza and V. Frémont, "End-to-End Deep Neural Network Design for Short-term Path Planning", 11th IROS Workshop on Planning, Perception, Navigation for Intelligent Vehicle (PPNIV 2019 - IROS 2019), Macau, China, November 2019.
```
@inproceedings{Dao_PPNIV_2019,
  author    = {M. Q. {Dao} and D. {Lanza} and V. {Fremont}},
  booktitle = {11th IROS Workshop on Planning, Perception, Navigation for Intelligent Vehicle (PPNIV 2019)},
  title     = {End-to-End Deep Neural Network Design for Short-term Path Planning},
  year      = {2019},
  month     = {November},
}
```