Repo to complement my *10 Data Science Uberhacks to Turbocharge Your Workflow* presentation.
It contains my slides, this README with a bunch of links in it, and a basic Makefile.
Awesome curated content. Great for researching and finding more uberhacks.
Some good awesome lists:
- Awesome Production Machine Learning
- Awesome Python
- Awesome Data Science
- Awesome Data Engineering
- Awesome Machine Learning
Eyeball data easily. Use it instead of MS Excel.
EDA as a Service.
Fuzzy String Matching.
So. Much. Data.
Easy AI Visualisation.
AI Explainability
Non-discriminatory AI.
Easy Pipelines.
CLI made easier.
- Check the Makefile in this repo: it contains some basic recipes for creating a conda environment and running a `main.py` file, but you can add stuff like Docker, cloud tasks, etc. to it... anything and everything involving the CLI. (There's a minimal sketch after this list.)
- The Makefile is linked to the `.env` file. If you specify a variable in `.env`, Make will read it and use it.
- Using Make is simple: `make <command>`. You can type `make help` or just `make` for a list of commands.
- For the `create-environment` command, Make will install everything in `requirements-conda.txt` and then everything in `requirements-pip.txt`.
- I've recently stopped trying to `conda install` anything. The general consensus is that it's broken because of how long it takes to solve environments. As such, all requirements are in the `requirements-pip.txt` file. I'll still use `conda` as my package manager because it's interoperable with cloud platforms and MLflow, but yeah, `conda install` sucks now. I hope they find a way to fix it 😫.
- Feel free to use this Makefile and setup as a base. I can't claim credit for it: I stole the Makefile from Yuxiang Gong's tutorial, which is also probably a good place to start for learning more about it.
- Remember... Make is pre-installed on Linux, available via Xcode on macOS, and via Chocolatey (`choco`) on Windows!
- Last thing, I promise! `nbstripout` is awesome: it strips notebook output cells before they reach git, so you don't commit sensitive data. EVERYONE should be using it! In hindsight, it should have made the top 10. Maybe next time...
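For reference, here's a minimal sketch of the kind of Makefile described above. It is not the repo's actual Makefile: the `ENV_NAME` variable, the Python version, and the `nbstripout` recipe are assumptions added for illustration.

```make
# Minimal sketch of the kind of Makefile described above, NOT a copy of the one
# in this repo. ENV_NAME, the Python version, and the nbstripout recipe are
# illustrative assumptions; adjust to match your own setup.

# Pull in any variables defined in .env (e.g. ENV_NAME) and export them.
-include .env
export

ENV_NAME ?= uberhacks

.DEFAULT_GOAL := help

.PHONY: help create-environment run nbstripout

help:  ## List available commands
	@grep -E '^[a-zA-Z_-]+:.*##' Makefile | awk -F':.*## ' '{printf "%-20s %s\n", $$1, $$2}'

create-environment:  ## Create the conda env, then pip-install the requirements
	conda create -y -n $(ENV_NAME) python=3.10
	# conda install -y -n $(ENV_NAME) --file requirements-conda.txt  # skipped; see the note above
	conda run -n $(ENV_NAME) pip install -r requirements-pip.txt

run:  ## Run the entry point inside the environment
	conda run -n $(ENV_NAME) python main.py

nbstripout:  ## Register nbstripout so notebook outputs never reach git
	conda run -n $(ENV_NAME) pip install nbstripout
	conda run -n $(ENV_NAME) nbstripout --install
```

With something like this in place, `make` or `make help` lists the commands, `make create-environment` builds the environment, and `make run` executes `main.py`.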
These didn't make the list for one reason or another, but are still worth checking out!
- polars: High-performance DataFrames in Python
- dask: Like Spark but on your machine
- splink: Data linking at scale
- ydata-synthetic: Synthetic data generator
- python-dp: Differential privacy in Python
- scikit-plot: AI explanation & visualisation
- lime: Quicker explainable AI
- pyLDAvis: Interactive topic model visualisation
- mlflow: AI tracking & serving
- nbstripout: Remove Jupyter output cells from git
- loguru: Easy, colourful logging
- Excalidraw: Online collaborative whiteboarding
- GitHub Copilot ($): Predictive coding AI
- DuckDB: SQLite for analytics
- Streamlit: R Shiny! In Python
- SparseSC: Synthetic Control & A/B testing