A deep dive into the Arrow Columnar format with pyarrow and nanoarrow

Repository for the Arrow Columnar Format Tutorial at PyCon DE & PyData Berlin 2024.

Apache Arrow has become a de facto standard for efficient in-memory columnar data representation. You might have heard about Arrow or already be using it, but do you understand the format and why it's so useful? This tutorial dives deep into the details of the Arrow columnar format, the different types and buffer layouts, and explores those details interactively using the pyarrow and nanoarrow libraries.

Setup

To run this tutorial locally, follow these steps:

Clone this repository

git clone https://github.com/voltrondata-labs/2024-arrow-format-tutorial.git

Install the necessary packages

The code examples require numpy, pyarrow and nanoarrow, plus JupyterLab (or an alternative) to run the notebooks. We need the latest (not-yet-released) version of nanoarrow, which you can install from the nightly wheels:

pip install jupyterlab numpy pandas pyarrow
pip install --extra-index-url https://pypi.fury.io/arrow-nightlies/ --pre nanoarrow

We recommend creating a dedicated environment, either with conda/mamba:

conda create -n arrow-tutorial python numpy pandas jupyterlab
conda activate arrow-tutorial
python -m pip install pyarrow
python -m pip install --extra-index-url https://pypi.fury.io/arrow-nightlies/ --pre nanoarrow

or create a virtual environment:

cd 2024-arrow-format-tutorial
python -m venv .venv
source .venv/bin/activate
pip install jupyterlab numpy pandas pyarrow
pip install --extra-index-url https://pypi.fury.io/arrow-nightlies/ --pre nanoarrow
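
To check that the environment is complete before launching the notebooks, a short script along these lines can help. It is a minimal sketch that uses only the standard library; the `REQUIRED` list and the `installed_versions` helper are names invented for this example, and the list simply mirrors the packages in the install commands above.

```python
from importlib.metadata import PackageNotFoundError, version

# Packages named in the install commands above (hypothetical constant name).
REQUIRED = ["jupyterlab", "numpy", "pandas", "pyarrow", "nanoarrow"]


def installed_versions(packages):
    """Map each package name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


versions = installed_versions(REQUIRED)
for name, ver in versions.items():
    print(f"{name}: {ver or 'NOT INSTALLED'}")
```

Run it with the environment activated; any package reported as `NOT INSTALLED` needs to be installed before the notebooks will work.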

Launch Jupyter

From the repo directory:

jupyter lab
