Repo to complement my *10 Data Science Uberhacks to Turbocharge Your Workflow* presentation.
It contains my slides, this README with a bunch of links in it, and a basic Makefile.
Awesome curated content. Great for researching and finding more uberhacks.
Some good awesome lists:
- Awesome Production Machine Learning
- Awesome Python
- Awesome Data Science
- Awesome Data Engineering
- Awesome Machine Learning
Eyeball data easily. Use it instead of MS Excel.
EDA as a Service.
Fuzzy String Matching.
So. Much. Data.
Easy AI Visualisation.
AI Explainability
Non-discriminatory AI.
Easy Pipelines.
CLI made easier.
- Check the Makefile in this repo: it contains some basic recipes for creating a conda environment and running a `main.py` file, but you can add stuff like Docker, cloud tasks, etc. to it... anything and everything involving the CLI. (There's a minimal sketch after this list.)
- The Makefile is linked to the `.env` file. If you specify a variable in `.env`, Make will read it and use it.
- Using Make is simple: `make <command>`. You can type `make help` or just `make` for a list of commands.
- For the `create-environment` command, Make will install everything in `requirements-conda.txt` and then everything in `requirements-pip.txt`.
- I've recently stopped trying to `conda install` anything. The general consensus is that it's broken because of how long it takes to solve environments. As such, all requirements are in the `requirements-pip.txt` file. I'll still use `conda` as my package manager because it's interoperable with cloud platforms and MLflow, but yeah, `conda install` sucks now. I hope they find a way to fix it 😫.
- Feel free to use this Makefile and setup as a base. I can't claim credit for it: I stole the Makefile from Yuxiang Gong's tutorial, which is also probably a good place to start for learning more about it.
- Remember... Make is pre-installed on Linux, available via Xcode on macOS, and via Chocolatey (`choco`) on Windows!
- Last thing, I promise! `nbstripout` is awesome: it strips notebook output cells before they reach git, so you don't commit sensitive data. EVERYONE should be using it! In hindsight, it should have made the top 10. Maybe next time...
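For reference, here's a minimal sketch of the kind of Makefile described above. It is not the repo's actual Makefile: the `ENV_NAME` variable, the Python version, and the `nbstripout` recipe are assumptions added for illustration.

```make
# Minimal sketch of the kind of Makefile described above, NOT a copy of the one
# in this repo. ENV_NAME, the Python version, and the nbstripout recipe are
# illustrative assumptions; adjust to match your own setup.

# Pull in any variables defined in .env (e.g. ENV_NAME) and export them.
-include .env
export

ENV_NAME ?= uberhacks

.DEFAULT_GOAL := help

.PHONY: help create-environment run nbstripout

help:  ## List available commands
	@grep -E '^[a-zA-Z_-]+:.*##' Makefile | awk -F':.*## ' '{printf "%-20s %s\n", $$1, $$2}'

create-environment:  ## Create the conda env, then pip-install the requirements
	conda create -y -n $(ENV_NAME) python=3.10
	# conda install -y -n $(ENV_NAME) --file requirements-conda.txt  # skipped; see the note above
	conda run -n $(ENV_NAME) pip install -r requirements-pip.txt

run:  ## Run the entry point inside the environment
	conda run -n $(ENV_NAME) python main.py

nbstripout:  ## Register nbstripout so notebook outputs never reach git
	conda run -n $(ENV_NAME) pip install nbstripout
	conda run -n $(ENV_NAME) nbstripout --install
```

With something like this in place, `make` or `make help` lists the commands, `make create-environment` builds the environment, and `make run` executes `main.py`.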
These didn't make the list for one reason or another, but are still worth checking out!
- polars: High-performance DataFrames in Python
- dask: Like Spark but on your machine
- splink: Data linking at scale
- ydata-synthetic: Synthetic data generator
- python-dp: Differential privacy in Python
- scikit-plot: AI explanation & visualisation
- lime: Quicker explainable AI
- pyLDAvis: Interactive topic model visualisation
- mlflow: AI tracking & serving
- nbstripout: Remove Jupyter output cells from git
- loguru: Easy, colourful logging
- Excalidraw: Online collaborative whiteboarding
- GitHub Copilot ($): Predictive coding AI
- DuckDB: SQLite for analytics
- Streamlit: R Shiny! In Python
- SparseSC: Synthetic Control & A/B testing