Skip to content

Commit

Permalink
Merge pull request #3762 from Alex-Monahan/pyodide
Browse files Browse the repository at this point in the history
Pyodide blog post!
  • Loading branch information
szarnyasg authored Oct 2, 2024
2 parents d2ca672 + 6b4e9f2 commit 6a70562
Show file tree
Hide file tree
Showing 2 changed files with 240 additions and 0 deletions.
207 changes: 207 additions & 0 deletions _posts/2024-10-02-pyodide.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,207 @@
---
layout: post
title: "DuckDB in Python in the Browser with Pyodide, PyScript, and JupyterLite"
author: "Alex Monahan"
thumb: "/images/blog/thumbs/241002.svg"
excerpt: "Run DuckDB in an in-browser Python environment to enable simple querying on remote files, interactive documentation, and easy to use training materials."
---

## Time to “Hello World”

The first time that you are using a new library, the most important thing is how quickly you can get to “Hello World”.

> Note Want to see “Hello World”?
> [Jump to the **fully interactive** examples!]({% post_url 2024-10-02-pyodide %}#pyscript-editor)
Likewise, if someone is visiting any documentation you have written, you want them to quickly and easily get your tool up and running.
When you are giving a demo, you want to avoid "demo hell" and have it work the first try!

If you want to try “expert mode”, try leading an entire conference room of people through those setup steps!
The classroom or conference workshop environment makes it far more critical that installation be bulletproof.

Python is one of our favorite ways to use DuckDB, but Python is notoriously difficult to set up – doubly so for a novice programmer.
What the heck is a virtual environment?
Are you on Windows, Linux, or Mac?
Pip or Conda?
The new kid on the block uv?

Experienced Pythonistas are not immune either!
Many, like me, have been forced to celebrate the time honored and [xkcd-chronicled](https://xkcd.com/1987/) tradition of just wiping everything related to Python and starting from scratch.

How can we make it as easy and as fast as possible to test out DuckDB in Python?

## Difficulties of Server-Side Python

One response to this challenge is to host a Python environment on a server for each of your users.
This has a number of issues.

Hosting Python on a server yourself is not free.
If you have many users, it can be far from free.

If you want to use a free solution like Google Colab, each visitor will need a Google account, and you'll need to be comfortable with Google accessing your data.
Plus, it is hard to embed within an existing web page for a seamless experience.

## Enter Pyodide

[Pyodide](https://pyodide.org/) is a way to run Python directly in your browser with no installation and no setup, thanks to the power of WebAssembly.
That makes it the easiest and fastest way to get a Python environment up and running – just load a web page!
All computation happens locally, so it can be served like any static website with tools like GitHub Pages.
No server-side Python required!

Another benefit is that Pyodide is nicely sandboxed in the browser environment.
Each user gets their own workspace, and since it is all local, it is nice and secure.

Part of what sets Pyodide apart from other in-browser-Python approaches is that it can even run libraries that are written in C, C++, or even Fortran, including much of the Python data science stack.
This means that now you can use DuckDB in Pyodide as well!
You can even combine it with NumPy, SciPy, and Pandas (in addition to many pure-Python libraries).
PyArrow and Ibis have experimental support also.

## Use Cases for Pyodide DuckDB

### Want to quickly analyze some remote data using either Python or DuckDB?

Pyodide is the fastest way to get your questions answered using Python.

### Want to quickly analyze some local data?

Pyodide can also [query local files](https://pyodide.org/en/stable/usage/accessing-files.html)!

### Want to make your documentation interactive?

Let your users test out your DuckDB-powered library with ease.
We will see an example below that demonstrates the [`magic-duckdb` Jupyter plugin](https://github.com/iqmo-org/magic_duckdb) to enable SQL cells.

### Leading a training session with DuckDB and Python?

Skip the hassles of local installation.
There is no need to work 1:1 with the 15% of folks in the audience with some quirky setup!
Everyone will get this to work on the first try, in seconds, so you can get to the content you want to teach.
Plus, it is free, with no signup required of any kind!

## Pyodide Examples

We will cover multiple ways to embed Pyodide-powered-Python directly into your site, so your users can try out your new DuckDB-backed tool with a single click!

* PyScript Editor
* An editor with nice syntax highlighting
* JupyterLite Notebook
* A classic notebook environment
* JupyterLite Lab IDE
* A full development environment

### PyScript Editor

This HTML snippet will embed a runnable PyScript editor into any page!

```html
<script type="module" src="https://pyscript.net/releases/2024.8.2/core.js"></script>
<script type="py-editor" config='{"packages":["duckdb"]}'>
import duckdb
print(duckdb.sql("SELECT '42 in an editor' AS s").fetchall())
</script>
```

Just click the play button and you can execute a DuckDB query directly in the browser.
You can edit the code, add new lines, etc.
Try it out!

{::nomarkdown}

<script type="module" src="https://pyscript.net/releases/2024.8.2/core.js"></script>
<script type="py-editor" config='{"packages":["duckdb"]}'>
import duckdb
print(duckdb.sql("SELECT '42 in an editor' AS s").fetchall())
</script>

{:/nomarkdown}

### JupyterLite Notebook

Here is an example of using an `iframe` that points to a JupyterLite environment that was deployed to GitHub Pages!

```html
<iframe src="https://alex-monahan.github.io/jupyterlite_duckdb_demo/notebooks/index.html?path=hello_duckdb.ipynb" style="height: 600px; width: 100%;"></iframe>
```

This is a fully interactive Python notebook environment, with DuckDB running inside.
Feel free to give it a run!

{::nomarkdown}

<iframe src="https://alex-monahan.github.io/jupyterlite_duckdb_demo/notebooks/index.html?path=hello_duckdb.ipynb" style="height: 600px; width: 100%;"></iframe>

{:/nomarkdown}

Configuring a full JupyterLite environment is only a few steps!
The JupyterLite folks have built a demo page that serves as a template and have some [great documentation](https://jupyterlite.readthedocs.io/en/latest/quickstart/deploy.html).
The main steps are to:

1. Use the JupyterLite Demo Template to create your own repo
2. Enable GitHub Pages for that repo
3. Add and commit a .ipynb file in the `content` folder
4. Visit `https://⟨YOUR_GITHUB_USERNAME⟩.github.io/⟨YOUR_REPOSITORY_NAME⟩/notebooks/index.html?path=⟨YOUR_NOTEBOOK.ipynb⟩`

Note that it can take a couple of minutes for GitHub Pages to deploy.
You can monitor the progress on GitHub's Actions tab.

### JupyterLite Lab IDE

After following the steps in the JupterLite Notebook setup, if you change your URL from `/notebooks/` to `/lab/`, you can have a full IDE experience instead!
This form factor is a bit harder to embed in another page, but great for interactive use.

This example uses the [`magic-duckdb` Jupyter extension](https://github.com/iqmo-org/magic_duckdb) that allows us to create SQL cells using `%%dql`.

[Follow this link to see the Lab IDE interface](https://alex-monahan.github.io/jupyterlite_duckdb_demo/lab/index.html?path=magic_duckdb.ipynb), or experiment with the Notebook-style version below.

```html
<iframe src="https://alex-monahan.github.io/jupyterlite_duckdb_demo/notebooks/index.html?path=magic_duckdb.ipynb" style="height: 600px; width: 100%;"></iframe>
```

{::nomarkdown}

<iframe src="https://alex-monahan.github.io/jupyterlite_duckdb_demo/notebooks/index.html?path=magic_duckdb.ipynb" style="height: 600px; width: 100%;"></iframe>

{:/nomarkdown}

## Architecture of DuckDB in Pyodide

So how does DuckDB work in Pyodide exactly?
The DuckDB Python client is compiled to WebAssembly (Wasm) in its entirety.
This is different than the existing [DuckDB Wasm](https://github.com/duckdb/duckdb-wasm) approach, since that is compiling the C++ side of the library only and wrapping it with a JavaScript API.
Both approaches use the Emscripten toolchain to do the Wasm compilation.
It is DuckDB's design decision to avoid dependencies and the prior investments in DuckDB-Wasm that made this feasible to build in such a short period of time!

The Pyodide team has added DuckDB to their hosted repository of libraries, and even set up DuckDB to run as a part of their CI/CD workflow.
That is what enables JupyterLite to simply run `%pip install duckdb`, and PyScript to specify DuckDB as a package in the `py-editor config` parameter or in the `<py-config>` tag.
Pyodide then downloads the Wasm-compiled version of the DuckDB library from Pyodide's repository.
We want to send a big thank you to the Pyodide team, including [Hood Chatham](https://github.com/hoodmane) and [Gyeongjae Choi](https://github.com/ryanking13), as well as the Voltron Data team including [Phillip Cloud](https://github.com/cpcloud) for leading the effort to get this to work.

### Limitations

Running in the browser is a more restrictive environment (for security purposes), so there are some limitations when using DuckDB in Pyodide.
There is no free lunch!

* Single-threaded
* Wasm does not have the concept of threads, yet!
* A few extra steps to query remote files
* Remote files can't be accessed by DuckDB directly
* Instead, pull the files locally with Pyodide first
* DuckDB-Wasm has custom enhancements to make this possible, but these are not present in DuckDB's Python client
* No runtime-loaded extensions
* Several extensions are automatically included: `parquet`, `json`, `icu`, `tpcds`, and `tpch`.
* Release cadence aligned with Pyodide
* At the time of writing, duckdb-pyodide is at 1.0.0 rather than 1.1.1

## Conclusion

Pyodide is now the fastest way to use Python and DuckDB together!
It is also an approach that scales to an arbitrary number of users because Pyodide's computations happen entirely locally.

We have seen how to embed Pyodide in a static site in multiple ways, as well as how to read remote files.

If you are excited about DuckDB in Pyodide, feel free to join us on Discord.
We have a `#show-and-tell` channel where you can share what you build with the community.
You are also welcome to explore the [duckdb-pyodide repo](https://github.com/duckdb/duckdb-pyodide) and report any issues you find.
We would also really love some help with enabling runtime-loaded extensions – please reach out if you can help!

Happy quacking about!
33 changes: 33 additions & 0 deletions images/blog/thumbs/241002.svg
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.

0 comments on commit 6a70562

Please sign in to comment.