Skip to content

sayantikabanik/DataJourney

Repository files navigation

License Code of Conduct
CI github-repo-stats Deploy DataJourney Stats Lint prose

DJ rocks

🚌 What's DataJourney?

DataJourney demonstrates how organizations can effectively manage and utilize data by harnessing the power of open-source technologies. It's designed to help navigate the complex landscape of data tools, offering a structured approach to building scalable, and reproducible data workflows.

Built on open-source principles, the framework guides users through essential steps—from identifying goals and selecting tools to testing and customising workflows. With its flexible, modular design, DataJourney can be tailored to individual needs, making it an invaluable toolkit for data professionals.

🧱 Design Philosophy

A mesh with additive, subtractive capabilities glued with open source.
{More...coming soon}

🛠 Current workflows covered

{✨= Experimental, ✅ = Implemented}

Python Packaging framework design principles
GitHub actions configured
Vale.sh configured at PR level
Pre-commit hooks configured for code linting/formatting
✅ Environment management via pixi
✅ Reading data from online sources using intake
✅ Sample pipeline built using Dagster
✅ Building Dashboard using holoviews + panel
✅ Exploratory data analysis (EDA) using mito
✅ Web UI build on Flask
✅ Web UI re-done and expanded with FastHTML
✅ Leverage AI models to analyse data GitHub AI models Beta

📊 Repository stats

⚙️ Managed by GitHub Action: https://github.com/jgehrcke/github-repo-stats
⏳ Configured to run daily at 23:55:00 IST
📬 Checkout daily reports generated: DataJourney Stats on Web

Dataset metadata/citations

van Woesik, R., Burkepile, D. (2022) Bleaching and environmental data for global coral reef sites from 1980-2020. Biological and Chemical Oceanography Data Management Office (BCO-DMO). (Version 2) Version Date 2022-10-14 [if applicable, indicate subset used]. doi:10.26008/1912/bco-dmo.773466.2 [access date]
Terms of Use
This dataset is licensed under Creative Commons Attribution 4.0 (https://creativecommons.org/licenses/by/4.0/)

Environment setup using pixi:

Installing pixi & getting started

  • Download pixi : prefix.dev
  • Activate env: pixi shell
  • List all the tasks: pixi task list
  • Execute a task from the list: pixi run <TASK>
  • Execute a task with verbosity enabled: pixi run -v <TASK>

current tasks present under DJ

  • GIT_TOKEN_CHECK
  • DJ_package
  • DJ_pre_commit
  • DJ_dagster
  • DJ_fasthtml_app
  • DJ_flask_app
  • DJ_mito_app
  • DJ_panel_app
  • DJ_ai_models

Install the package locally

pixi run DJ_package

🔌 About pre-commit-hooks and activating

Just like the name suggests, pre-commit-hooks are designed to format the code based on PEP standards before committing. More details

pixi run DJ_pre_commit

Commands to run modules under DataJourney

Dagster UI

pixi run DJ_dagster

Dagit UI output

Panel app

pixi run DJ_panel_app

NOTE: The dashboard generated is exported into HTML format and saved as stock_price_twilio_dashboard

Panel app output

Mito

To explore further visit trymito.io

pixi run DJ_mito_app
mito_output mito_output

Display all data sources present via web UI

# Run FastHTML app
pixi run DJ_fasthtml_app

data_sources_fasthtml.png

Executing LLM script to analyse data

pixi run DJ_ai_models