
Popular Optimization Algorithms

Normalization Methods

  • BatchNorm [Link]
  • Weight Norm [Link]
  • Spectral Norm [Link]
  • Cosine Normalization [Link]
  • L2 Regularization versus Batch and Weight Normalization Link
  • WHY GRADIENT CLIPPING ACCELERATES TRAINING: A THEORETICAL JUSTIFICATION FOR ADAPTIVITY Link
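
For orientation, a minimal NumPy sketch of the batch-normalization forward pass in training mode (running statistics, the backward pass, and inference behaviour are omitted):

```python
# Minimal batch-normalization forward pass (training mode), NumPy sketch.
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Normalize activations x of shape (batch, features) per feature."""
    mu = x.mean(axis=0)                  # per-feature mean over the batch
    var = x.var(axis=0)                  # per-feature variance over the batch
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta          # learnable scale and shift

x = np.random.randn(32, 8)
out = batch_norm_forward(x, gamma=np.ones(8), beta=np.zeros(8))
print(out.mean(axis=0).round(6), out.std(axis=0).round(3))  # ~0 mean, ~1 std
```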

On Convexity and Generalization of Neural Networks

  • Convex Neural Networks [Link]
  • Breaking the Curse of Dimensionality with Convex Neural Networks [Link]
  • UNDERSTANDING DEEP LEARNING REQUIRES RETHINKING GENERALIZATION [Link]
  • Optimal Control Via Neural Networks: A Convex Approach. [Link]
  • Input Convex Neural Networks [Link]
  • A New Concept of Convex based Multiple Neural Networks Structure. [Link]
  • SGD Converges to Global Minimum in Deep Learning via Star-convex Path [Link]
  • A Convergence Theory for Deep Learning via Over-Parameterization Link

Continuation Methods and Curriculum Learning

  • Curriculum Learning [Link]
  • SOLVING RUBIK’S CUBE WITH A ROBOT HAND Link
  • Noisy Activation Function [Link]
  • Mollifying Networks [Link]
  • Curriculum Learning by Transfer Learning: Theory and Experiments with Deep Networks Link Talk
  • Automated Curriculum Learning for Neural Networks Link
  • On The Power of Curriculum Learning in Training Deep Networks Link
  • On-line Adaptative Curriculum Learning for GANs Link
  • Parameter Continuation with Secant Approximation for Deep Neural Networks and Step-up GAN Link
  • HashNet: Deep Learning to Hash by Continuation. [Link]
  • Learning Combinations of Activation Functions. [Link]
  • Learning and development in neural networks: The importance of starting small (1993) Link
  • Flexible shaping: How learning in small steps helps Link
  • Curriculum Labeling: Self-paced Pseudo-Labeling for Semi-Supervised Learning Link
  • RETHINKING CURRICULUM LEARNING WITH INCREMENTAL LABELS AND ADAPTIVE COMPENSATION Link
  • Parameter Continuation Methods for the Optimization of Deep Neural Networks Link
  • Denoising Neural Machine Translation Training with Trusted Data and Online Data Selection [Link](https://www.aclweb.org/anthology/W18-6314.pdf)
  • Reinforcement Learning based Curriculum Optimization for Neural Machine Translation Link
  • EVOLUTIONARY POPULATION CURRICULUM FOR SCALING MULTI-AGENT REINFORCEMENT LEARNING Link
  • ENTROPY-SGD: BIASING GRADIENT DESCENT INTO WIDE VALLEYS Link
  • NEIGHBOURHOOD DISTILLATION: ON THE BENEFITS OF NON END-TO-END DISTILLATION Link
  • LEARNING TO EXECUTE Link
  • Cyclical Annealing Schedule: A Simple Approach to Mitigating KL Vanishing Link
  • Data Parameters: A New Family of Parameters for Learning a Differentiable Curriculum Link
  • Breaking the Curse of Space Explosion: Towards Efficient NAS with Curriculum Search Link
  • Continuation Methods and Curriculum Learning for Learning to Rank Link
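
Many of the curriculum papers above share one recipe: score examples by difficulty, sort them from easy to hard, and grow the training pool with a pacing function. A hypothetical sketch, with all names and the linear pacing rule chosen purely for illustration:

```python
# Hypothetical curriculum pacing sketch: train on progressively harder examples.
import numpy as np

def pacing(step, total_steps, n_examples, start_frac=0.2):
    """Linear pacing: size of the (easy-to-hard sorted) pool available at `step`."""
    frac = start_frac + (1.0 - start_frac) * min(step / total_steps, 1.0)
    return max(1, int(frac * n_examples))

# difficulty[i] = some score for example i (e.g. loss under a pretrained model)
difficulty = np.random.rand(1000)
order = np.argsort(difficulty)           # easiest first

for step in range(100):
    pool = order[:pacing(step, total_steps=100, n_examples=len(order))]
    batch = np.random.choice(pool, size=32)   # sample a minibatch from the current pool
    # ... one optimizer update on `batch` ...
```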

On Loss Surfaces and Generalization of Deep Neural Networks

  • Exact solutions to the nonlinear dynamics of learning in deep linear neural networks Link
  • QUALITATIVELY CHARACTERIZING NEURAL NETWORK OPTIMIZATION PROBLEMS [Link]
  • The Loss Surfaces of Multilayer Networks [Link]
  • Visualizing the Loss Landscape of Neural Nets [Link]
  • The Loss Surface Of Deep Linear Networks Viewed Through The Algebraic Geometry Lens [Link]
  • How regularization affects the critical points in linear networks. [Link]
  • Local minima in training of neural networks [Link]
  • Necessary and Sufficient Geometries for Gradient Methods Link
  • Fine-grained Optimization of Deep Neural Networks Link
  • SCORE-BASED GENERATIVE MODELING THROUGH STOCHASTIC DIFFERENTIAL EQUATIONS Link

Dynamics, Bifurcations, and the Difficulty of Training RNNs

  • Deep Equilibrium Models Link
  • Bifurcations of Recurrent Neural Networks in Gradient Descent Learning [Link]
  • On the difficulty of training recurrent neural networks [Link]
  • Understanding and Controlling Memory in Recurrent Neural Networks [Link]
  • Dynamics and Bifurcation of Neural Networks [Link]
  • Context Aware Machine Learning [Link]
  • The trade-off between long-term memory and smoothness for recurrent networks [Link]
  • Dynamical complexity and computation in recurrent neural networks beyond their fixed point [Link]
  • Bifurcations in discrete-time neural networks: controlling complex network behaviour with inputs [Link]
  • Interpreting Recurrent Neural Networks Behaviour via Excitable Network Attractors [Link]
  • Bifurcation analysis of a neural network model Link
  • A Differentiable Physics Engine for Deep Learning in Robotics Link
  • Deep learning for universal linear embeddings of nonlinear dynamics Link
  • Deep Hidden Physics Models: Deep Learning of Nonlinear Partial Differential Equations Link
  • Analysis of gradient descent learning algorithms for multilayer feedforward neural networks Link
  • A dynamical model for the analysis and acceleration of learning in feedforward networks Link
  • A bio-inspired bistable recurrent cell allows for long-lasting memory Link
  • Equilibrium Propagation: Bridging the Gap between Energy-Based Models and Backpropagation [Link](https://www.frontiersin.org/articles/10.3389/fncom.2017.00024/full)

Poor Local Minima and Sharp Minima

  • Adding One Neuron Can Eliminate All Bad Local Minima Link
  • Deep Learning without Poor Local Minima Link
  • Elimination of All Bad Local Minima in Deep Learning Link
  • How to escape saddle points efficiently. Link
  • Depth with Nonlinearity Creates No Bad Local Minima in ResNets Link
  • Sharp Minima Can Generalize For Deep Nets Link
  • Asymmetric Valleys: Beyond Sharp and Flat Local Minima Link
  • A Reparameterization-Invariant Flatness Measure for Deep Neural Networks Link
  • A Simple Weight Decay Can Improve Generalization Link
  • Finding Critical and Gradient-Flat Points of Deep Neural Network Loss Functions Link
  • The Loss Surface Of Deep Linear Networks Viewed Through The Algebraic Geometry Lens Link
  • Theoretical Issues in Deep Networks: Approximation, Optimization and Generalization Link
  • Flatness is a False Friend Link
  • Are Saddles Good Enough for Deep Learning Link

Initialization of Neural Networks

  • Deep learning course notes Link
  • On the importance of initialization and momentum in deep learning Link
  • The Break-Even Point on Optimization Trajectories of Deep Neural Networks Link
  • THE EARLY PHASE OF NEURAL NETWORK TRAINING Link
  • One ticket to win them all: generalizing lottery ticket initializations across datasets and optimizers Link
  • PCA-Initialized Deep Neural Networks Applied To Document Image Analysis Link
  • Understanding the difficulty of training deep feedforward neural networks Link
  • Unitary Evolution of RNNs Link
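
As a quick reference for the Glorot/Xavier scheme analysed in "Understanding the difficulty of training deep feedforward neural networks", a minimal NumPy sketch of the uniform variant:

```python
# Glorot/Xavier-style initialization (uniform variant), NumPy sketch.
import numpy as np

def glorot_uniform(fan_in, fan_out, rng=np.random.default_rng()):
    """Draw weights uniformly from [-limit, limit] with limit = sqrt(6 / (fan_in + fan_out))."""
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

W = glorot_uniform(256, 128)   # weight matrix for a 256 -> 128 layer
```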

Momentum in Optimization

  • RETHINKING THE HYPERPARAMETERS FOR FINE-TUNING Link
  • Momentum Residual Neural Networks Link
  • Smooth momentum: improving Lipschitzness in gradient descent Link
  • Momentum-based Weight Interpolation of Strong Zero-Shot Models for Continual Learning Link
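
For context, a minimal NumPy sketch of the classical (heavy-ball) momentum update on a toy quadratic; the hyperparameters are illustrative:

```python
# Classical (heavy-ball) momentum update, NumPy sketch.
import numpy as np

def momentum_step(w, grad, velocity, lr=0.1, beta=0.9):
    velocity = beta * velocity - lr * grad   # exponentially decaying sum of past gradients
    return w + velocity, velocity

w, v = np.array([1.0, -2.0]), np.zeros(2)
for _ in range(200):
    grad = 2 * w                             # gradient of f(w) = ||w||^2
    w, v = momentum_step(w, grad, v)
print(w)                                     # approaches the minimizer at the origin
```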

Batch Size Optimization

  • ON LARGE-BATCH TRAINING FOR DEEP LEARNING: GENERALIZATION GAP AND SHARP MINIMA Link
  • Revisiting Small Batch Training for Deep Neural Networks Link
  • LARGE BATCH TRAINING OF CONVOLUTIONAL NETWORKS Link
  • Large Batch Optimization for Deep Learning: Training BERT in 76 minutes Link
  • DON’T DECAY THE LEARNING RATE, INCREASE THE BATCH SIZE Link
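
A toy sketch of two heuristics discussed in the papers above, with hypothetical numbers: scale the learning rate linearly with the batch size, and grow the batch size on a schedule instead of decaying the learning rate:

```python
# Linear-scaling and batch-size-schedule heuristics, illustrative sketch.
def scaled_lr(base_lr, base_batch, batch):
    """When the batch size grows by a factor k, scale the base learning rate by k."""
    return base_lr * batch / base_batch

def batch_size_schedule(epoch, base_batch=256, milestones=(30, 60, 80)):
    """Double the batch size at each milestone instead of decaying the learning rate."""
    return base_batch * 2 ** sum(epoch >= m for m in milestones)

print(scaled_lr(base_lr=0.1, base_batch=256, batch=1024))   # 0.4
print(batch_size_schedule(65))                              # 1024
```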

Degeneracy of Neural Networks

  • Exact solutions to the nonlinear dynamics of learning in deep linear neural networks Link
  • Avoiding pathologies in very deep networks Link
  • Resurrecting the sigmoid in deep learning through dynamical isometry: theory and practice Link
  • SKIP CONNECTIONS ELIMINATE SINGULARITIES Link
  • How degenerate is the parametrization of neural networks with the ReLU activation function? Link
  • Theory of Deep Learning III: explaining the non-overfitting puzzle Link
  • Provable Benefit of Orthogonal Initialization in Optimizing Deep Linear Networks Link
  • Understanding Deep Learning: Expected Spanning Dimension and Controlling the Flexibility of Neural Networks Link
  • The Loss Surface Of Deep Linear Networks Viewed Through The Algebraic Geometry Lens Link
  • PYHESSIAN: Neural Networks Through the Lens of the Hessian Link

Convergence Analysis in Deep Learning

  • A CONVERGENCE ANALYSIS OF GRADIENT DESCENT FOR DEEP LINEAR NEURAL NETWORKS Link
  • A Convergence Theory for Deep Learning via Over-Parameterization Link
  • Convergence Analysis of Homotopy-SGD for Non-Convex Optimization Link

Multi-Task Learning with Curricula

  • Learning the Curriculum with Bayesian Optimization for Task-Specific Word Representation Learning. Link
  • Learning a Multitask Curriculum for Neural Machine Translation. Link
  • Self-paced Curriculum Learning. Link
  • Curriculum Learning of Multiple Tasks. Link

Constrained Optimization for Deep Learning

  • A Primal-Dual Formulation for Deep Learning with Constraints Link
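
A hedged sketch of the general primal-dual idea (not the paper's exact formulation): descend on the weights and ascend on a Lagrange multiplier to enforce an inequality constraint. The toy objective and constraint are made up for illustration:

```python
# Primal-dual (Lagrangian) sketch for minimizing loss(w) subject to g(w) <= 0.
import numpy as np

def loss(w):        return np.sum((w - 3.0) ** 2)    # toy objective
def loss_grad(w):   return 2 * (w - 3.0)
def g(w):           return np.sum(w) - 1.0           # toy constraint: sum(w) <= 1
def g_grad(w):      return np.ones_like(w)

w, lam = np.zeros(2), 0.0
for _ in range(2000):
    w = w - 0.01 * (loss_grad(w) + lam * g_grad(w))   # primal descent on the Lagrangian
    lam = max(0.0, lam + 0.01 * g(w))                 # dual ascent, projected to lam >= 0
print(w, g(w))   # w satisfies the constraint approximately (sum(w) ~ 1)
```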

Reinforcement Learning and Curriculum

  • Object-Oriented Curriculum Generation for Reinforcement Learning Link
  • Teacher-Student Curriculum Learning Link

Tutorials, Surveys and Blogs

  • Curriculum Learning: A Survey Link
  • A Comprehensive Survey on Curriculum Learning Link
  • https://www.offconvex.org/
  • An overview of gradient descent optimization algorithms [Link]
  • Review of second-order optimization techniques in artificial neural networks backpropagation Link
  • Linear Algebra and data Link
  • Why Momentum Really Works [Blog]
  • Optimization [Book]
  • Optimization for deep learning: theory and algorithms Link
  • Generalization Error in Deep Learning Link
  • Automatic Differentiation in Machine Learning: a Survey Link
  • Curriculum Learning for Reinforcement Learning Domains: A Framework and Survey Link
  • Automatic Curriculum Learning For Deep RL: A Short Survey Link
  • The Generalization Mystery: Sharp vs Flat Minima Link

Contributing

If you've found any informative resources that you think belong here, be sure to submit a pull request or create an issue!

If you find this repository helpful, consider buying me a coffee :)