I wanted a list of everything that Yannic Kilcher has included in the Helpful Things section of his ML News videos. I didn't check whether anyone else had already compiled one; this is mostly for my personal reference.
- TensorFlow Decision Forests
- Habitat
- Falken
- Brax
- AlphaFold Protein Structure Database
- Triton
- A Fast Library for Automated Machine Learning & Tuning (FLAML)
- Italian CLIP
- Melting Pot
- Multi-Joint Dynamics with Contact (MuJoCo)
- Gym
- PyTorch Profiler (https://pytorch.org/blog/pytorch-profiler-1.9-released/)
- robomimic
- Droidlet
- Unidentified Video Objects dataset
- C4_200M Synthetic Dataset for Grammatical Error Correction
- Ultimate Volleyball
- Maze Applied Reinforcement Learning Framework (MazeRL)
- Wanderer 2 HackerNews Search
- Nimble: Physics Engine for Deep Learning
- Large Vocabulary Instance Segmentation dataset (LVIS)
- Benchmark for Everyday Household Activities in Virtual, Interactive, and Ecological Environments (BEHAVIOR)
- A Gentle Introduction to Graph Neural Networks
- Understanding Convolutions on Graphs
- img2dataset
- Vision Library for Self-Supervised Learning (VISSL)
- PyTorch Geometric (https://github.com/pyg-team/pytorch_geometric)
- Amazon S3 plugin for PyTorch
- Infinity
- Text-based Noun Phrase Enrichment (TNE) dataset
- Transformer-based Optical Character Recognition with Pre-trained Models (TrOCR)
- KaoKore dataset
- Real-world Annotated Few-shot Tasks (RAFT) benchmark
- EDGAR-CORPUS dataset
- Pictures Without Humans for Self-Supervised Pretraining (PASS) dataset (https://www.robots.ox.ac.uk/~vgg/research/pass/ and https://zenodo.org/record/5570664)
- TorchData
- A Heterogeneous Benchmark for Information Retrieval (BEIR)
- Bayesian Optimization Book
- Imaginaire
- ControlFlag
- SaLinA
- ydata-synthetic
- Aim
- RobustBench
- Pywick
- LEXA benchmark
- Crafter benchmark
- Lightweight Hyperparameter Optimization (mle-hyperopt)
- MobileViT Keras Tutorial
- Image captioning demo
- bitsandbytes
- User-friendly introduction to PAC-Bayes bounds
- xFormers
- Speech Processing Universal Performance Benchmark (SUPERB)
- Common Crawl Question Answering (CCQA) dataset
- Bagua
- Treex
- PyTorch Lightning
- League of Legends game-playing dataset
- iris
- rliable
- MedMNIST
- GoEmotions dataset
- Language Interpretability Tool (LIT)
- ruDALL-E Kandinsky 12-billion-parameter checkpoint
- Kalidokit
- ruDALL-E Emojich
- PyFlow
- TensorFlow Graph Neural Networks
- PyDreamer
- CodeGenX
- Heyoh Camera for Mac
- MacBERTh
- TensorRT
- Opacus
- CLIP-guided collage
- Snowball Fight
- arxiv-sanity-lite
- Koila
- EfficientZero
- Hugging Face Transformers
- Cohere's explanation of language models
- GitHub's code search
- Data Measurements Tool
- Responsible AI Toolbox
- minitorch
- Pandas tutor
- Yuno
- Driven Data
- Ultimate Tic-Tac-Toe but Alpha-Zero
- NL-Augmenter
- 33 psychology datasets
- minDALL-E
- DALL-E Mini
- Arnheim
- Document Understanding (DUE) benchmark
- Question Answering with Long Input Texts, Yes! (QuALITY) dataset
- Perceiver IO
- Deriving convolutions from first principles
- Google's distilled datasets
- Heteroscedastic Evolutionary Bayesian Optimisation (HEBO)
- ruDALL-E XXL API
- Deepchecks
- DagsHub
- Bayesian Modeling and Computation in Python
- Machine Learning Contests
- ray-skorch
- skorch
- RumbleDB
- JAX models
- Self-Supervised Speech Pre-training and Representation Learning Toolkit (S3PRL)
- Fast Forward Computer Vision and other things (FFCV)
- Optimal Transport Tools (OTT)
- DietGPU
- Know Your Data
- CLIP weights released
- On Neural Differential Equations
- PGMax (probabilistic graphical models)
- Diambra
- Python-FHEz (fully homomorphic encryption)
- TorchMetrics
- EvoJAX
- OpenAI Gym documentation
- Evolution Gym (EvoGym)
- Stable-baselines3 in Hugging Face Hub
- Transformer Reinforcement Learning (trl)
- Rubrix
- Kubric
- Composer
- MuJoCo Python bindings
- Monte Carlo Tree Search in JAX (Mctx)
- Pipeline Abstractions for Deep Learning (PADL)
- Did it spill?
- TorchData and functorch
- STUMPY
- Fast TreeSHAP
- JaxTon
- Novelty MiniGrid (NovGrid)
- Isaac Gym
- Squirrel
- PyScript
- Big Vision
- DeepSportRadar Challenges
- HuggingNFT
- Jacob Hilton's Deep Learning Curriculum
- Pen and Paper Exercises in Machine Learning
- Quaterion
- TorchDim
- mmap.ninja
- Stanford CS25 Transformers United
- Shifts Challenge
- Visualising ML number formats
- GriddlyJS
- Grand Teton
- Hugging Face Diffusers support for JAX/Flax
- Muse
- Transformer Reinforcement Learning X (TRLX)
- RL Baselines3 Zoo
- JAXSeq
- Albumentations
- Latent Diffusion Model MRI brain scans dataset (LDM 100k)
- CodeGeeX
- AITemplate
- nerfstudio
- NerfAcc
- dstack
- OpenAI Whisper - CPU
- diffusiondb prompt dataset
- Public Prompts
- visualise.ai
- PromptSource
- Lexicap
- SD PixelArt SpriteSheet Generator
- Prompt Extend
- Fine-Tune Whisper
- Dream Textures
- Reinforcement Learning Fundamentals
- Lovely Tensors
- LlamaIndex
- Stable Diffusion Upscaler
- DagsHub Direct Data Access
- Build an End-2-End Active Learning Pipeline
- GPU Environment Management (genv)
- mxeval
- tsai
- ColoDiffusion
- TAP-Vid
- SuperGradients
- Shumai
- safetensors
- Versatile Learned Optimizers (VeLO)
- Merlin Dataloader
- Lexicographical Order Descent Assembly (LODA)
- numga
- Massive Text Embedding Benchmark leaderboard
- natbot
- Swift Diffusers: Fast Stable Diffusion for Mac
- pyribs
- PimEyes
- CACTI framework
- ControlNet
- Learn Prompting (https://learnprompting.org/)
- NeMo Guardrails
- LMQL
- Pandas AI
- Lamini
- DeepFloyd IF
- Shimmy
- Embeddings for Wikipedia
- A Cookbook of Self-Supervised Learning
- h2oGPT
- H2O LLM Studio
- Amateur Drawings Dataset
- CAMEL
- Paella
- Sebastian Raschka's blog
- Pick a Pic
- AugLy
- DeepLab2
- VoxPopuli dataset
- Common Objects in 3D (CO3D) dataset
- WarpDrive
- LAION-400M image dataset
- H3 hexagonal coordinate system
- AlphaZero explanation
- LAION-5B image-text pairs
- OpenCLIP
- Salesforce CodeGen
- BLOOM
- YaLM
- data2vec 2.0
- Casual Conversations v2 dataset
- GLIGEN
- StarCoder
- RedPajama
- OpenLLaMA
- MPT-7B
- YOLO-NAS