Given: data/drive.mp4, split into 8616 frames in data/IMG; each frame is 640 (w) x 840 (h) x 3 (RGB). Ground truth is given in drive.json as a [time, speed] pair for each of the 8616 frames.
Check out the medium article
- VideoToDataset.ipynb (this is what I used to write the ground-truth data to a dataframe and store my images separately; this helped with testing. A minimal sketch of this step follows the file list below.)
- NvidiaModel-OpticalFlowDense_kerasnew.ipynb (this is how I trained the model and demonstrated the MSE; I also processed the dataset into a video, which is shown inline in HTML, and notes on how I did certain things are in here)
- test.py
- model.py
- opticalHelpers.py
- model-weights-Vtest.h5 (trained on 10 epochs, MSE ~ 10)
- model-weights-Vtest2.h5 (trained on 15 epochs, MSE ~ 5.6) (preloaded)
- setupstuff.sh
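A minimal sketch of the VideoToDataset step, assuming drive.json is a flat list of [time, speed] pairs; the frame filename pattern and the output CSV name here are hypothetical:

```python
import json
import pandas as pd

# Load the per-frame [time, speed] ground truth into a DataFrame
# (assumes drive.json is a flat list of [time, speed] entries, one per frame)
with open('data/drive.json') as f:
    ground_truth = json.load(f)

df = pd.DataFrame(ground_truth, columns=['time', 'speed'])

# Attach a path for each extracted frame (filename pattern is hypothetical)
df['image_path'] = ['data/IMG/frame_%05d.jpg' % i for i in range(len(df))]

df.to_csv('driving.csv', index=False)  # output name is illustrative
```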
To test the model (a consolidated run is sketched after these steps):
1. Run ./setupstuff.sh. This creates the necessary folders and files (driving_test.csv, test_IMG, test_predict). Point lines 21 and 22 inside test.py at your own data.json and movie.mp4 files.
2. Run python test.py. This logs the MSE for a given sample size (you pick the sample size on line 14; the weights file should be prespecified on line 13).
3. Run python makeVideo.py. This creates a video with the prediction values overlaid on top of each image (requires moviepy). Feel free to delete the ./data/predict folder afterwards.
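Putting the steps above together, a typical run looks like this (assuming lines 13, 14, 21, and 22 of test.py are already set up):

```bash
./setupstuff.sh       # creates driving_test.csv, test_IMG, test_predict
python test.py        # logs the MSE for the chosen sample size
python makeVideo.py   # writes the prediction-overlay video (requires moviepy)
```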
Dense Optical Flow network feeding:
Method 1: Append optical-flow channels so the 3rd dimension gains an angle and a magnitude layer. In NvidiaModel-OpticalFlowDense I changed my generator to yield (66, 220, 5) inputs, i.e. (Height, Width) with channels (R, G, B, Ang, Mag). The angles and magnitudes come from computing dense optical flow with Farneback parameters (a sketch of this preprocessing is shown below). This did not help: my MSE was still ~20 and I did not observe any special results.
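A minimal sketch of how such a 5-channel input can be assembled with OpenCV; the Farneback parameters and the resize order here are assumptions for illustration (the real preprocessing lives in opticalHelpers.py and the notebook):

```python
import cv2
import numpy as np

def flow_augmented_frame(prev_bgr, curr_bgr, size=(220, 66)):
    # size is (width, height) as expected by cv2.resize; the network sees (66, 220, ...)
    prev = cv2.resize(prev_bgr, size)
    curr = cv2.resize(curr_bgr, size)
    prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
    curr_gray = cv2.cvtColor(curr, cv2.COLOR_BGR2GRAY)

    # Farneback dense optical flow: one (dx, dy) vector per pixel
    flow = cv2.calcOpticalFlowFarneback(prev_gray, curr_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])

    # Stack R, G, B, Ang, Mag into a single (66, 220, 5) input tensor
    rgb = cv2.cvtColor(curr, cv2.COLOR_BGR2RGB).astype(np.float32)
    return np.dstack([rgb, ang.astype(np.float32), mag.astype(np.float32)])
```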
Method 2: Convert the optical-flow angles and magnitudes to an HSV image, convert that to RGB, and pass it into the network as (66, 220, 3) RGB values (sketched below).
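Under the same assumptions, a sketch of the flow-to-RGB conversion: hue encodes flow direction and value encodes normalized magnitude.

```python
import cv2
import numpy as np

def flow_to_rgb(prev_bgr, curr_bgr, size=(220, 66)):
    # size is (width, height) as expected by cv2.resize
    prev_gray = cv2.cvtColor(cv2.resize(prev_bgr, size), cv2.COLOR_BGR2GRAY)
    curr_gray = cv2.cvtColor(cv2.resize(curr_bgr, size), cv2.COLOR_BGR2GRAY)

    flow = cv2.calcOpticalFlowFarneback(prev_gray, curr_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])

    # Hue encodes flow direction, value encodes normalized flow magnitude
    hsv = np.zeros((size[1], size[0], 3), dtype=np.uint8)
    hsv[..., 0] = (ang * 180 / np.pi / 2).astype(np.uint8)   # OpenCV hue range is 0-179
    hsv[..., 1] = 255                                        # full saturation
    hsv[..., 2] = cv2.normalize(mag, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)

    return cv2.cvtColor(hsv, cv2.COLOR_HSV2RGB)              # (66, 220, 3) RGB image
```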
Hyperparameter selection: I trained the model with 400 samples per epoch and a batch size of 32, so I sent roughly 16,000 images through the generator, resulting in about 8k optical-flow differentials. I used the Adam optimizer and ELU activation functions because they lead to faster convergence.
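For reference, a hedged sketch of a PilotNet-style regression model compiled with Adam and ELU activations; the layer sizes and learning rate here are illustrative, and the actual architecture is defined in model.py:

```python
from keras.models import Sequential
from keras.layers import Conv2D, Flatten, Dense
from keras.optimizers import Adam

def build_model(input_shape=(66, 220, 3)):
    # PilotNet-style stack of strided convolutions with ELU activations,
    # regressing a single speed value per optical-flow image
    model = Sequential([
        Conv2D(24, (5, 5), strides=(2, 2), activation='elu', input_shape=input_shape),
        Conv2D(36, (5, 5), strides=(2, 2), activation='elu'),
        Conv2D(48, (5, 5), strides=(2, 2), activation='elu'),
        Conv2D(64, (3, 3), activation='elu'),
        Flatten(),
        Dense(100, activation='elu'),
        Dense(50, activation='elu'),
        Dense(1),
    ])
    model.compile(optimizer=Adam(lr=1e-4), loss='mse')
    return model
```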
Method 2 was the winner. I guess there was just too much noise when doing a simple image_1 (RGB) - image_2 (RGB) difference. The network held up because I converted the optical-flow parameters to an RGB image, as you can see in the video above.
Other approaches:
- Nvidia Model: PilotNet-based implementation that takes the difference between the two images, sends it through the network, and performs regression on that image difference
- DeepVO: AlexNet-like implementation that performs parallel convolutions on two images and then merges them later in the pipeline to extract spatial features between them
  - I grabbed the DeepVO model from this paper: https://arxiv.org/pdf/1611.06069.pdf
  - You can drag the train_vo.prototxt file onto http://ethereon.github.io/netscope/#/editor to see the network model and all its intricacies
- DeepFlow: Large displacement optical flow with deep matching (link)
  - I considered using DeepFlow to implement dense optical flow analysis and get the optical flow for each pixel, as seen in this example
Twitter: @jonathancmitch
Github: github.com/jonathancmitchell
Dependencies:
- NumPy
- OpenCV 3
- Python
- Pandas
- TensorFlow
- Matplotlib
- scikit-learn
- Keras
- moviepy
- tqdm