Ever since I picked up computer vision, all I've wanted to do in my daily routine is code, code, and code. I'm currently teaching myself Deep Learning and Machine Learning, and I'm intrigued by the large potential they have in every occupation. Inspired by Adrian Colyer and Patrick Lui, I brought myself to read papers to keep up to date on the development of AI. And just like them, I will summarize every paper I read so that not only do I understand the paper, but you can understand the topic without reading it in depth. I would love to share my paper notes with those who are curious but just need an overview of the core concepts. With all that said, whether you are a researcher, practitioner, or undergraduate student, I hope you can take something away from my exploration of AI.
While AI holds the prospect of better technology (e.g. self-driving cars, customer service, financial analysis, robots, etc.), it still faces many problems. First, neural networks are very sensitive to any change in their weights, which makes learning new tasks difficult; this leads to what we call catastrophic forgetting, where a model tends to forget all the tasks it previously learned when learning a new one. Second, training AI is computationally expensive and unsustainable for the environment. Third, AI doesn't stop to think about how to solve the problem; it only tries to learn from its inputs and output the best possible predictions. I recommend reading the articles below to understand the problems that AI faces.
- The Turbulent Past and Uncertain Future of Artificial Intelligence
- How DeepMind Is Reinventing the Robot
- 7 Revealing Ways AIs Fail
When I began learning the theory and application of Deep Learning, I read the book Deep Learning with Python by Francois Chollet, the creator of Keras. The book is great for getting started with TensorFlow and Keras, but some concepts didn't seem intuitive (neural networks in general are just difficult to understand). To pair with the book, I took the Deep Learning Specialization by Andrew Ng, founder of deeplearning.ai. Andrew Ng does a great job describing hyperparameters, convolutional neural networks (CNNs), optimization methods, backpropagation, and the other concepts essential for understanding neural networks. I personally recommend checking these two out as a starting point:
The two resources above require a basic understanding of Calculus (derivatives specifically) and Linear Algebra. To get deep into the bones of Deep Learning, you need a strong grasp of Linear Algebra (35%), Probability and Statistics (25%), Multivariate Calculus (15%), Algorithms and Data Structures (15%), and Others (10%). Others refers to Real and Complex Analysis, Fourier Transforms, Information Theory, and other topics not covered by the top four. To get started, I recommend reading Deep Learning by Ian Goodfellow et al., the inventor of Generative Adversarial Networks. He covers the math concepts and their applications to machine learning and deep learning all in one book. I will also provide other sources, such as the minibook Mathematics for Machine Learning.
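To make it concrete why derivatives matter so much, here is a toy sketch of my own (not taken from either book) of gradient descent, the workhorse behind neural network training, minimizing a one-variable function:

```python
# Minimize f(w) = (w - 3)^2 using only its derivative f'(w) = 2 * (w - 3).
def f(w):
    return (w - 3) ** 2

def df(w):
    return 2 * (w - 3)

w = 0.0    # initial guess for the weight
lr = 0.1   # learning rate (step size)

for step in range(50):
    w -= lr * df(w)  # step against the gradient, i.e. downhill

print(round(w, 4))   # converges toward 3.0, the minimizer of f
```

Training a neural network is the same idea, only with millions of weights and gradients computed by backpropagation instead of by hand.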
There are many resources for getting started with Machine Learning and Deep Learning. To explore more of the sources available to you, check out TensorFlow, which provides a guide on how to go from beginner to expert in both theory and practice. Other popular deep learning frameworks include Keras, Caffe, and PyTorch; TensorFlow, Keras, and PyTorch have been the most popular frameworks for building NNs. TensorFlow 2.0 actually integrated Keras into its API, making the user experience much simpler and easier for building NNs, and on the other side, Caffe2 (Caffe's successor) has been merged into PyTorch. Still, do explore them all and see which framework floats your boat.
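To show what the Keras integration in TensorFlow 2 looks like in practice, here is a minimal sketch (my own illustrative example, not from any particular guide) that defines and compiles a tiny image classifier with the tf.keras API:

```python
import tensorflow as tf  # TensorFlow 2.x ships Keras as tf.keras

# A small fully connected classifier for 28x28 grayscale images (e.g. MNIST).
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10, activation="softmax"),
])

model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

model.summary()  # prints the layer-by-layer architecture
# Training is then a single call, e.g. model.fit(x_train, y_train, epochs=5).
```

The PyTorch equivalent is similarly compact, so the choice really does come down to which API feels most natural to you.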
When you begin studying the different Neural Networks (NN) and their applications, you may need a deep understanding of the NN architecture and the math behind it. In most resources, you'll be presented with popular neural network architectures, ranging from the ones that sparked the Deep Learning revolution to contemporary designs. Below are the NNs mentioned most often in the literature today, and I encourage you to read these papers (a short sketch after the list shows how to load one of these architectures in code):
- AlexNet [notes]
- ZF Net [notes]
- VGG16 [notes]
- GoogLeNet [notes]
- Inception-v2
- Inception-v3
- Inception-ResNet
- Microsoft ResNet
- R-CNN [notes]
- Fast R-CNN
- Faster R-CNN
- Xception
- Generative Adversarial Networks
- Generating Image Descriptions
- Spatial Transformer Networks
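If you want to experiment with one of these architectures rather than reimplement it from the paper, the major frameworks ship pretrained versions. As a sketch (assuming a recent TensorFlow 2 install; the image path is a placeholder), this loads VGG16 pretrained on ImageNet and classifies a single image:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input, decode_predictions

# Load VGG16 with ImageNet weights (downloads the weights on first use).
model = VGG16(weights="imagenet")

# "example.jpg" is a placeholder; point it at any image on your disk.
img = tf.keras.utils.load_img("example.jpg", target_size=(224, 224))
x = tf.keras.utils.img_to_array(img)
x = preprocess_input(np.expand_dims(x, axis=0))  # shape (1, 224, 224, 3)

preds = model.predict(x)
print(decode_predictions(preds, top=3)[0])  # top-3 (class id, name, probability)
```

Reading the paper and then poking at the corresponding pretrained model is, in my experience, the fastest way to make an architecture stick.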
It is always nice to write your code in a notebook, an IDE, or a text editor (most of the code you will be writing will be in Python). There are a couple of notebooks you can use: Google Colab or Jupyter Notebook. Notebooks are great for organizing your code into blocks, fast prototyping, and re-running a specific block of code. Google Colab is a personal favorite because you can save your notebooks in your Google Drive, and you get free access to powerful GPUs and TPUs. GPUs and TPUs drastically reduce training and testing time, especially when working with images and CNNs. With a local Jupyter Notebook you can work alongside the rest of your tools, but you only get GPU or TPU acceleration if you have the hardware yourself. Either way, I encourage you to explore both notebooks.
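Before kicking off a long training run, it's worth checking that the accelerator is actually visible. A minimal sketch, assuming TensorFlow 2 in either environment:

```python
import tensorflow as tf

# List the GPUs TensorFlow can see in this runtime.
gpus = tf.config.list_physical_devices("GPU")
print("GPUs available:", gpus)

# In Colab: Runtime -> Change runtime type -> GPU, then re-run this cell.
# On a local Jupyter install without a GPU, the list will simply be empty.
# (Colab TPUs are set up differently, via tf.distribute's TPUClusterResolver.)
```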
Some of the IDEs you can use are PyCharm (developed by JetBrains), Microsoft Visual Studio Code (VS Code), and Spyder. PyCharm supports virtual environments, allowing you to install scientific packages for a specific project without touching the system-wide Python installation. My favorite feature of PyCharm is its scientific mode, which lets you install scientific packages with just a click; it is only available in the Professional Edition, though students have the opportunity to get a year-long license. On the other hand, VS Code and Spyder are free IDEs, and you can make use of Anaconda to simplify package management and deployment.
Papers I have read so far, with their topic in parentheses:

- Energy and Policy Considerations for Deep Learning in NLP (Sustainability)
- Green AI (Sustainability)
- Generative Adversarial Networks (Computer Vision and Pattern Recognition)
- Probabilistic Object Detection: Definition and Evaluation [notes]
- Localization Recall Precision (LRP): A New Performance Metric for Object Detection [notes]
- You Only Look Once: Unified, Real-Time Object Detection [notes]
- YOLO9000: Better, Faster, Stronger [notes]
- YOLOv3: An Incremental Improvement [notes]
List of articles that I will read:
- Learning to Map Vehicles into Bird's Eye View
- VPGNet: Vanishing Point Guided Network for Lane and Road Marking Detection and Recognition
- The Matrix Calculus You Need For Deep Learning
- Soft-NMS -- Improving Object Detection With One Line of Code [Github]
- Weakly- and Semi-Supervised Learning of a DCNN for Semantic Image Segmentation
- Fully Convolutional Networks for Semantic Segmentation [dataset]
- U-Net: Convolutional Networks for Biomedical Image Segmentation
- The One Hundred Layers Tiramisu: Fully Convolutional DenseNets for Semantic Segmentation
- Multi-Scale Context Aggregation by Dilated Convolutions
- DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs
- Rethinking Atrous Convolution for Semantic Image Segmentation
- Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation
- FastFCN: Rethinking Dilated Convolution in the Backbone for Semantic Segmentation
- Improving Semantic Segmentation via Video Propagation and Label Relaxation
- Gated-SCNN: Gated Shape CNNs for Semantic Segmentation
- When Does Label Smoothing Help?
- YOLACT: Real-time Instance Segmentation
- YOLACT++: Better Real-time Instance Segmentation
- Understanding deep learning requires rethinking generalization
- Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning
- Dropout Sampling for Robust Object Detection in Open-Set Conditions
- MIT Advanced Vehicle Technology Study: Large-Scale Naturalistic Driving Study of Driver Behavior and Interaction with Automation
- Multitarget tracking performance metric: deficiency aware subpattern assignment
- Deep Residual Learning for Image Recognition
- Predicting the Generalization Gap in Deep Networks with Margin Distributions
- Adam: A Method for Stochastic Optimization
- MMDetection: Open MMLab Detection Toolbox and Benchmark [Github]
- Spiking-YOLO: Spiking Neural Network for Energy-Efficient Object Detection
- Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression
- Gaussian YOLOv3: An Accurate and Fast Object Detector Using Localization Uncertainty for Autonomous Driving
- EfficientDet: Scalable and Efficient Object Detection
- EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks [Github]
- M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid Network [code]
- Learning the Depths of Moving People by Watching Frozen People [Github]
- Reasoning-RCNN: Unifying Adaptive Global Reasoning Into Large-Scale Object Detection [Github]
- Fixing the train-test resolution discrepancy [Github]
- Local Aggregation for Unsupervised Learning of Visual Embeddings [Github]
- End to End Learning for Self-Driving Cars
- Towards End-to-End Lane Detection: an Instance Segmentation Approach
- FaceNet: A Unified Embedding for Face Recognition and Clustering [Github]
- Deep Hough Voting for 3D Object Detection in Point Clouds [Github]