Bluprint is a command-line utility for creating data science project templates. It gives R and Jupyter notebooks seamless access to configuration, data, and shared code in a project structured like this:
my_project
├── conf
│   └── data.yaml          # YAML config with data paths
├── data                   # Store smaller data
│   ├── emailed
│   │   └── messy.xlsx
│   └── user_processed.csv
├── notebooks              # Notebooks
│   └── process.ipynb
└── my_project             # Local Python package used by my_project
    └── shared_code.py
The configuration file conf/data.yaml contains either absolute paths or paths relative to my_project/data/:
emailed:
  messy: 'emailed/messy.xlsx'
user:
  processed: 'user_processed.csv'
Notebooks can then easily import my_project.shared_code and file paths:
from bluprint.config import load_data_yaml

data = load_data_yaml()  # By default loads conf/data.yaml

# Load data in a portable manner
import pandas as pd

# pandas reads .xlsx files with read_excel (there is no pd.read_xlsx)
messy_df = pd.read_excel(data.emailed.messy)
# Assumes conf/data.yaml also defines a remote.extras entry (not shown above)
extras_df = pd.read_excel(data.remote.extras)

# Load shared code functions as Python modules
# in any notebook anywhere in this project.
from my_project.shared_code import transform_data

transformed_df = transform_data(messy_df, extras_df)

# Save output
transformed_df.to_csv(data.user.processed)
For a working demonstration of a shareable project see https://github.com/igor-sb/bluprint-demo/.
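The path rule described above (absolute paths kept as-is, relative paths resolved against my_project/data/) can be illustrated with a short standard-library sketch. This is not Bluprint's implementation: `resolve_paths` is a hypothetical helper, the config is passed in as an already-parsed dict rather than read from YAML, and the `/home/me/my_project/data` directory is an assumed location. It only shows how attribute-style access such as `data.emailed.messy` could map to full file paths.

```python
from pathlib import Path
from types import SimpleNamespace


def resolve_paths(config: dict, data_dir: Path) -> SimpleNamespace:
    """Hypothetical helper: convert a nested dict of path strings into
    an attribute-accessible namespace of Path objects.

    Relative paths are joined onto data_dir; absolute paths are kept as-is.
    """
    resolved = {}
    for key, value in config.items():
        if isinstance(value, dict):
            # Recurse into nested sections such as "emailed" or "user"
            resolved[key] = resolve_paths(value, data_dir)
        else:
            path = Path(value)
            resolved[key] = path if path.is_absolute() else data_dir / path
    return SimpleNamespace(**resolved)


# Same structure as the conf/data.yaml example, already parsed into a dict
config = {
    "emailed": {"messy": "emailed/messy.xlsx"},
    "user": {"processed": "user_processed.csv"},
}

# Assumed data directory, for illustration only
data = resolve_paths(config, Path("/home/me/my_project/data"))
print(data.emailed.messy)
print(data.user.processed)
```

The sketch treats every value as a local file path, so remote locations such as URLs would need separate handling.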
- Write portable notebooks by separating code from configuration: file paths live in YAML configs and are loaded with load_data_yaml() and load_config_yaml()
- R/Python packages are version-locked with renv and uv
- Import packaged code as Python modules
- Packaged code can be shared across different projects with pip install
- Use both Python and R notebooks in a single project (see Python/R projects)
- Share entire projects by copying a project directory and running uv venv && uv sync
- Works with common data science IDEs (RStudio, VSCode) and with notebook tools for linting (nbqa), version control (nbstripout), and workflows (Ploomber)
Full documentation is available at https://igor-sb.github.io/bluprint/.
Install uv 0.4.12, the last confirmed working version, then run uv tool install bluprint.
Creating Bluprint projects with R support also requires the renv R package.
Bluprint integrates:
Bluprint is inspired by these resources:
- Cookiecutter Data Science
- RStudio Projects
- Ploomber
- Microsoft Team Data Science Process
- R for Data Science (2e): 6. Workflow: scripts and projects
- Vincent D. Warmerdam: Untitled12.ipynb | PyData Eindhoven 2019
Bluprint is released under the MIT license.