- The First Law of Complexodynamics
- The Unreasonable Effectiveness of Recurrent Neural Networks
- Understanding LSTM Networks
- Recurrent Neural Network Regularization
- Keeping Neural Networks Simple by Minimizing the Description Length of the Weights
- Pointer Networks
- ImageNet Classification with Deep Convolutional Neural Networks
- Order Matters: Sequence to Sequence for Sets
- GPipe: Easy Scaling with Micro-Batch Pipeline Parallelism
- Deep Residual Learning for Image Recognition
- Multi-Scale Context Aggregation by Dilated Convolutions
- Neural Message Passing for Quantum Chemistry
- Attention Is All You Need
- Neural Machine Translation by Jointly Learning to Align and Translate
- Identity Mappings in Deep Residual Networks
- A Simple Neural Network Module for Relational Reasoning
- Variational Lossy Autoencoder
- Relational Recurrent Neural Networks
- Quantifying the Rise and Fall of Complexity in Closed Systems: The Coffee Automaton
- Neural Turing Machines
- Deep Speech 2: End-to-End Speech Recognition in English and Mandarin
- Scaling Laws for Neural Language Models
- A Tutorial Introduction to the Minimum Description Length Principle
- Machine Super Intelligence
- Kolmogorov Complexity and Algorithmic Randomness
- Stanford’s CS231n Convolutional Neural Networks for Visual Recognition
- Better & Faster Large Language Models via Multi-token Prediction
- Dense Passage Retrieval for Open-Domain Question Answering
- Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
- Zephyr: Direct Distillation of LM Alignment
- Lost in the Middle: How Language Models Use Long Contexts