Sharing a curated list of machine learning papers to get you started in the world of machine learning research.
If you want to collaborate on the repo, reach me @ Mail
- It's understandable to feel overwhelmed by this much volume, but take it at your own pace. Working through these papers will take a long time, especially if you're new to the space, but hang in there and trust the process. Along the way you'll build both your understanding and a method for reading research papers. Best wishes, and I hope you enjoy this long journey.
- Deep Learning in Neural Networks: An Overview
- An overview of gradient descent optimization algorithms
- On the Opportunities and Risks of Foundation Models
- Multimodal Foundation Models: From Specialists to General-Purpose Assistants
- Large Multimodal Models: Notes on CVPR 2023 Tutorial
- Towards Generalist Biomedical AI
- A Comprehensive Survey on Pretrained Foundation Models: A History from BERT to ChatGPT
- Interactive Natural Language Processing
- Towards Reasoning in Large Language Models: A Survey
- Recurrent Neural Networks (RNNs): A gentle Introduction and Overview
- Long Short-Term Memory
- Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling
- Highway Networks
- Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank
- Sequence to Sequence Learning with Neural Networks
- ImageNet Classification with Deep Convolutional Neural Networks
- An Introduction to Convolutional Neural Networks
- U-Net: Convolutional Networks for Biomedical Image Segmentation
- Deep Residual Learning for Image Recognition
- Densely Connected Convolutional Networks
- Aggregated Residual Transformations for Deep Neural Networks
- Sequence to Sequence Learning with Neural Networks
- Thumbs up? Sentiment Classification using Machine Learning Techniques
- A survey of named entity recognition and classification
- Teaching Machines to Read and Comprehend
- Deep neural networks for acoustic modeling in speech recognition
- A Neural Attention Model for Sentence Summarization
- Microsoft COCO: Common Objects in Context
- Rich feature hierarchies for accurate object detection and semantic segmentation
- Fully Convolutional Networks for Semantic Segmentation
- DeepFace: Closing the Gap to Human-Level Performance in Face Verification
- DeepPose: Human Pose Estimation via Deep Neural Networks
- Neural Machine Translation by Jointly Learning to Align and Translate
- Show, Attend and Tell: Neural Image Caption Generation with Visual Attention
- Attention-Based Models for Speech Recognition
- Attention Is All You Need (see the minimal attention sketch at the end of this list)
- An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
- Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
- EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
- Training data-efficient image transformers & distillation through attention
- Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention
- FNet: Mixing Tokens with Fourier Transforms
- Random Feature Attention
- ETC: Encoding Long and Structured Inputs in Transformers
- Longformer: The Long-Document Transformer
- Generating Long Sequences with Sparse Transformers
- Linformer: Self-Attention with Linear Complexity
- Parameter-Efficient Transfer Learning for NLP
- LoRA: Low-Rank Adaptation of Large Language Models
- The Power of Scale for Parameter-Efficient Prompt Tuning
- It's Not Just Size That Matters: Small Language Models Are Also Few-Shot Learners
- Making Pre-trained Language Models Better Few-shot Learners
- Deep contextualized word representations
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
- Unified Language Model Pre-training for Natural Language Understanding and Generation
- ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
- RoBERTa: A Robustly Optimized BERT Pretraining Approach
- ERNIE: Enhanced Language Representation with Informative Entities
- ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators
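Many of the architectures listed above, from BERT to the vision transformers, build on the scaled dot-product attention introduced in "Attention Is All You Need". As a quick orientation before you read that paper, here is a minimal NumPy sketch of single-head self-attention; the toy shapes and the usage at the bottom are assumptions for illustration, not code from any listed paper.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Single-head scaled dot-product attention (Vaswani et al., 2017).

    Q: (n_queries, d_k), K: (n_keys, d_k), V: (n_keys, d_v) -> (n_queries, d_v)
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                      # query-key similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)       # softmax over the keys
    return weights @ V                                   # weighted sum of values

# Toy usage: 4 tokens with d_k = d_v = 8, attending to themselves (self-attention).
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
print(scaled_dot_product_attention(x, x, x).shape)       # (4, 8)
```

A full Transformer runs this operation in parallel over several learned projections of Q, K, and V (multi-head attention) and stacks it with feed-forward layers, residual connections, and layer normalization.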
Large Language Models
- Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
- Language Models are Few-Shot Learners
- Language Models are Unsupervised Multitask Learners
- Evaluating Large Language Models Trained on Code
- PaLM: Scaling Language Modeling with Pathways
- Llama 2: Open Foundation and Fine-Tuned Chat Models
- Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
- GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding
- CoLT5: Faster Long-Range Transformers with Conditional Computation
- BERT Loses Patience: Fast and Robust Inference with Early Exit
- Fast Inference from Transformers via Speculative Decoding
- Training language models to follow instructions with human feedback
- Finetuned Language Models Are Zero-Shot Learners
- Multitask Prompted Training Enables Zero-Shot Task Generalization
- LIMA: Less Is More for Alignment
- Direct Preference Optimization: Your Language Model is Secretly a Reward Model
- Zephyr: Direct Distillation of LM Alignment
- Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation
- RoFormer: Enhanced Transformer with Rotary Position Embedding
- Fast Transformer Decoding: One Write-Head is All You Need
- GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints
- FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
- Efficient Memory Management for Large Language Model Serving with PagedAttention
- Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding
- 8-bit Optimizers via Block-wise Quantization
- ZeRO: Memory Optimizations Toward Training Trillion Parameter Models
- QLoRA: Efficient Finetuning of Quantized LLMs
- BitNet: Scaling 1-bit Transformers for Large Language Models
- Maximum Likelihood Training of Score-Based Diffusion Models
- Denoising Diffusion Implicit Models
- Denoising Diffusion Probabilistic Models
- DPM-Solver: A Fast ODE Solver for Diffusion Probabilistic Model Sampling in Around 10 Steps
- High-Resolution Image Synthesis with Latent Diffusion Models
- Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding
- Scaling Autoregressive Models for Content-Rich Text-to-Image Generation
- Hierarchical Text-Conditional Image Generation with CLIP Latents
- PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis
- Learning Transferable Visual Models From Natural Language Supervision
- BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
- CoCa: Contrastive Captioners are Image-Text Foundation Models
- Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
- Sigmoid Loss for Language Image Pre-Training
- Toolformer: Language Models Can Teach Themselves to Use Tools
- ReAct: Synergizing Reasoning and Acting in Language Models
- AgentBench: Evaluating LLMs as Agents
- ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs
- ToolkenGPT: Augmenting Frozen Language Models with Massive Tools via Tool Embeddings
- REALM: Retrieval-Augmented Language Model Pre-Training
- Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
- Improving language models by retrieving from trillions of tokens
- Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection
- REPLUG: Retrieval-Augmented Black-Box Language Models
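The retrieval papers just above (REALM, RAG, RETRO, Self-RAG, REPLUG) all share a retrieve-then-generate loop: embed the query, fetch the most similar documents, and condition the language model on them. The sketch below is a toy illustration of that loop under assumed placeholders, not any paper's actual pipeline; the hash-based `embed` and the echo-style `generate` stand in for a learned retriever and a real language model.

```python
import numpy as np

documents = [
    "The Transformer architecture relies entirely on attention.",
    "LoRA fine-tunes large models by learning low-rank weight updates.",
    "Diffusion models generate images by iteratively denoising noise.",
]

def embed(text, dim=64):
    """Hashed bag-of-words vector; a placeholder for a learned dense encoder."""
    vec = np.zeros(dim)
    for token in text.lower().split():
        vec[hash(token) % dim] += 1.0
    return vec / (np.linalg.norm(vec) + 1e-8)

def retrieve(query, k=2):
    """Return the k documents most similar to the query by cosine similarity."""
    q = embed(query)
    scores = [float(q @ embed(d)) for d in documents]
    return [documents[i] for i in np.argsort(scores)[::-1][:k]]

def generate(prompt):
    """Placeholder for a language model call; it just echoes the prompt here."""
    return f"[LM output conditioned on]\n{prompt}"

query = "How does LoRA adapt large language models?"
context = "\n".join(retrieve(query))
print(generate(f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"))
```

In the real systems, the retriever is a trained dense (or sparse) index over a large corpus, and the generator is a large language model whose context window receives the retrieved passages.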