feat: add a module to read all Substrait plan formats #45

Draft
wants to merge 35 commits into
base: main
Changes from 8 commits
Commits
306ea96
Starting.
EpsilonPrime Feb 1, 2024
42cee0e
A working version but for some reason tests aren't working properly.
EpsilonPrime Feb 6, 2024
684d0ac
Now passes with hand-built library.
EpsilonPrime Feb 6, 2024
7287773
Corrected build instructions.
EpsilonPrime Feb 6, 2024
0f8074e
Remove some comments.
EpsilonPrime Feb 6, 2024
18b0bd8
Experiment adding the planloader to the build.
EpsilonPrime Feb 6, 2024
0f5922b
Now handles errors consistently.
EpsilonPrime Feb 6, 2024
bba04a3
Remove accidentally added test output file.
EpsilonPrime Feb 6, 2024
b06464c
Updated based on review.
EpsilonPrime Feb 9, 2024
e207987
Updated workflow.
EpsilonPrime Feb 9, 2024
50010a3
Added missing requirements to test workflow.
EpsilonPrime Feb 9, 2024
e91bb92
See if we can add a C++ library dependency.
EpsilonPrime Feb 9, 2024
7b716f5
Not that way.
EpsilonPrime Feb 9, 2024
b44d311
Another attempt at the C++ library dependency.
EpsilonPrime Feb 9, 2024
e2618b4
Not that way either.
EpsilonPrime Feb 9, 2024
09b2f30
Correct updated size type.
EpsilonPrime Feb 9, 2024
85e55a4
Always free the returned SerializedPlan.
EpsilonPrime Feb 9, 2024
006da91
Be consistent with signed int32.
EpsilonPrime Feb 10, 2024
51da6b0
Now point at latest head.
EpsilonPrime Feb 13, 2024
154e5e3
Switch to turning submodules to true following documents to see if th…
EpsilonPrime Feb 13, 2024
75178ac
Another attempt at recursing into submodules.
EpsilonPrime Feb 13, 2024
791fc14
Use pending version of substrait-cpp which uses updated substrait repo.
EpsilonPrime Feb 14, 2024
5c52ff8
Updated substrait-cpp dependency.
EpsilonPrime Feb 20, 2024
2b5a510
Add cmake dependency to rules.
EpsilonPrime Feb 20, 2024
595f7da
Switch back to the version of substrait-cpp that matches the current …
EpsilonPrime Feb 20, 2024
e32a782
See if we can add a protobuf-c dependency.
EpsilonPrime Feb 20, 2024
09bef75
Try using an apt-get action.
EpsilonPrime Feb 20, 2024
e89780c
Change package name.
EpsilonPrime Feb 20, 2024
f0f21c0
Switch to using apt-get directly (Linux only).
EpsilonPrime Feb 20, 2024
82340d1
Update substrait-cpp (provides an external protobuf dependency).
EpsilonPrime Feb 22, 2024
9ff593b
Remove apt-get call.
EpsilonPrime Feb 22, 2024
7f5c023
See if we can add curl to the environment.
EpsilonPrime Feb 22, 2024
10a91dd
See if we can add curl to the environment.
EpsilonPrime Feb 22, 2024
3044b7c
add more environment dependencies
EpsilonPrime Feb 22, 2024
b42276c
Adding curl somewhere else.
EpsilonPrime Feb 22, 2024
6 changes: 6 additions & 0 deletions .github/workflows/release.yml
@@ -25,6 +25,12 @@ jobs:
    - name: Install build dependencies
      run: |
        python -m pip install build --user
    - name: Build Substrait planloader
      run: |
        cd third_party/substrait-cpp
        make release
        cd build-Release/export/planloader
        make install
    - name: Build package
      run: |
        python -m build
3 changes: 3 additions & 0 deletions .gitmodules
@@ -1,3 +1,6 @@
[submodule "third_party/substrait"]
	path = third_party/substrait
	url = https://github.com/substrait-io/substrait
[submodule "third_party/substrait-cpp"]
	path = third_party/substrait-cpp
	url = git@github.com:substrait-io/substrait-cpp.git
10 changes: 10 additions & 0 deletions CONTRIBUTING.md
@@ -50,6 +50,16 @@ Generate the protobuf files manually. Requires protobuf `v3.20.1`.


# Build

## Build and install the textplan loader dynamic library
```commandline
pushd third_party/substrait-cpp
make release
cd build-Release/export/planloader
make install
popd
```

## Python package
Editable installation.
```
2 changes: 1 addition & 1 deletion README.md
@@ -27,7 +27,7 @@ This project is not an execution engine for Substrait Plans.
This is an experimental package that is still under development.

# Example
-At the moment, this project contains only generated Python classes for the Substrait protobuf messages. Let's use an existing Substrait producer, [Ibis](https://ibis-project.org), to provide an example using Python Substrait as the consumer.
+At the moment, this project contains generated Python classes for the Substrait protobuf messages and a library for loading and saving them in various formats. Let's use an existing Substrait producer, [Ibis](https://ibis-project.org), to provide an example using Python Substrait as the consumer.
## Produce a Substrait Plan with Ibis
```
In [1]: import ibis
1 change: 1 addition & 0 deletions pyproject.toml
@@ -17,6 +17,7 @@ test = ["pytest >= 7.0.0"]

[tool.pytest.ini_options]
pythonpath = "src"
addopts = "--ignore=third_party"

[build-system]
requires = ["setuptools>=61.0.0", "setuptools_scm[toml]>=6.2.0"]
82 changes: 82 additions & 0 deletions src/substrait/planloader/planloader.py
@@ -0,0 +1,82 @@
# SPDX-License-Identifier: Apache-2.0
"""Routines for loading and saving Substrait plans."""
import ctypes
import ctypes.util as ctutil
import enum
import substrait.gen.proto.plan_pb2 as plan_pb2
import sys


class PlanFileFormat(enum.Enum):
    BINARY = ctypes.c_int32(0)
    JSON = ctypes.c_int32(1)
    PROTOTEXT = ctypes.c_int32(2)
    TEXT = ctypes.c_int32(3)


class PlanFileException(Exception):
    pass


class SerializedPlan(ctypes.Structure):
    pass


SerializedPlan._fields_ = [
    ("buffer", ctypes.POINTER(ctypes.c_byte)),
    ("size", ctypes.c_uint32),
    ("errorMessage", ctypes.c_char_p),
]


# Load the C++ library
planloader_path = ctutil.find_library("planloader")
planloader_lib = ctypes.CDLL(planloader_path)
if planloader_lib is None:
    print('Failed to find planloader library')
    sys.exit(1)

# Declare the function signatures for the external functions.
external_load_substrait_plan = planloader_lib.load_substrait_plan
external_load_substrait_plan.argtypes = [ctypes.c_char_p]
external_load_substrait_plan.restype = ctypes.POINTER(SerializedPlan)

external_free_substrait_plan = planloader_lib.free_substrait_plan
external_free_substrait_plan.argtypes = [ctypes.POINTER(SerializedPlan)]
external_free_substrait_plan.restype = None

external_save_substrait_plan = planloader_lib.save_substrait_plan
external_save_substrait_plan.argtypes = [ctypes.c_void_p, ctypes.c_uint32, ctypes.c_char_p, ctypes.c_uint32]
external_save_substrait_plan.restype = ctypes.c_char_p


def load_substrait_plan(filename: str) -> plan_pb2.Plan:
    """
    Loads a Substrait plan (in any format) from disk.

    Returns:
        A Plan protobuf object if successful.
    Raises:
        PlanFileException if an exception occurs while converting or reading from disk.
    """
    result = external_load_substrait_plan(filename.encode('UTF-8'))
    if result.contents.errorMessage:
        raise PlanFileException(result.contents.errorMessage)
    data = ctypes.string_at(result.contents.buffer, result.contents.size)
    plan = plan_pb2.Plan()
    plan.ParseFromString(data)
Review comment (Member):
Is there any chance something goes wrong here that doesn't free the plan? For example, maybe the plan fails to deserialize into protobuf? Can you use a try / finally?

    external_free_substrait_plan(result)
    return plan
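
The reviewer's try / finally suggestion could look roughly like the sketch below. It is not the code that was merged: `load_and_parse` and its `load_fn`, `free_fn`, and `parse_fn` parameters are hypothetical stand-ins for `external_load_substrait_plan`, `external_free_substrait_plan`, and the protobuf parsing step, passed in only so the sketch runs without the native library. It also frees the result on the error-message path, which the version in the diff skips.

```python
import ctypes


class PlanFileException(Exception):
    """Raised when a plan cannot be read or converted."""


class SerializedPlan(ctypes.Structure):
    # Same layout as the struct declared in the diff.
    _fields_ = [
        ("buffer", ctypes.POINTER(ctypes.c_byte)),
        ("size", ctypes.c_uint32),
        ("errorMessage", ctypes.c_char_p),
    ]


def load_and_parse(load_fn, free_fn, parse_fn, filename: str):
    """Load a plan via load_fn, parse it, and always release the native result.

    All three callables are assumptions made for illustration; in the real
    module they would be the ctypes bindings and Plan.ParseFromString.
    """
    result = load_fn(filename.encode("UTF-8"))
    try:
        error = result.contents.errorMessage
        if error:
            # Reading a c_char_p field yields a Python bytes copy, so the
            # message remains valid after the native struct is freed.
            raise PlanFileException(error.decode("UTF-8", errors="replace"))
        # string_at copies the buffer into a Python bytes object.
        data = ctypes.string_at(result.contents.buffer, result.contents.size)
        return parse_fn(data)
    finally:
        # Runs on success, on a reported error, and if parsing raises.
        free_fn(result)
```

Because both the error message and the plan bytes are copied into Python objects before the `finally` clause runs, nothing returned to the caller points into freed native memory.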


def save_substrait_plan(plan: plan_pb2.Plan, filename: str, file_format: PlanFileFormat):
    """
    Saves the given plan to disk in the specified file format.

    Raises:
        PlanFileException if an exception occurs while converting or writing to disk.
    """
    data = plan.SerializeToString()
    err = external_save_substrait_plan(data, len(data), filename.encode('UTF-8'), file_format.value)
    if err:
        raise PlanFileException(err)
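
A note on the library-loading block in this file: `ctypes.CDLL` raises `OSError` when a library cannot be loaded rather than returning `None`, while `ctypes.util.find_library` is the call that returns `None` when nothing is found. On POSIX systems `ctypes.CDLL(None)` yields a handle to the running process, so the `planloader_lib is None` check would not fire. A minimal sketch of a stricter loader follows; `load_native_library` and `PlanLoaderNotFound` are hypothetical names, not part of this PR.

```python
import ctypes
import ctypes.util


class PlanLoaderNotFound(ImportError):
    """Hypothetical error type for a missing or unloadable native library."""


def load_native_library(name: str) -> ctypes.CDLL:
    """Locate and load a shared library, raising instead of exiting."""
    path = ctypes.util.find_library(name)
    if path is None:
        # find_library signals "not found" with None; check it before CDLL.
        raise PlanLoaderNotFound(f"Could not find the {name!r} shared library")
    try:
        return ctypes.CDLL(path)
    except OSError as exc:
        # CDLL reports load failures (bad path, missing symbols) as OSError.
        raise PlanLoaderNotFound(f"Could not load {path!r}: {exc}") from exc
```

Raising an `ImportError` subclass instead of calling `sys.exit(1)` lets callers and test runners handle a missing library without the interpreter terminating at import time.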
11 changes: 11 additions & 0 deletions tests/test_planloader.py
@@ -0,0 +1,11 @@
# SPDX-License-Identifier: Apache-2.0


from substrait.planloader import planloader


def test_main():
    print(planloader.__file__)
    print(dir(planloader))
    testplan = planloader.load_substrait_plan('tests/tpch-plan01.json')
    planloader.save_substrait_plan(testplan, 'myoutfile.splan', planloader.PlanFileFormat.TEXT.value)
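
The test passes `PlanFileFormat.TEXT.value` (a `ctypes.c_int32`), whereas `save_substrait_plan` is annotated as taking a `PlanFileFormat`. One way to make the annotation and the call agree, assuming plain integer enum values are acceptable, is to keep the enum free of ctypes objects and convert only at the native boundary. `to_native_format` is a hypothetical helper, not part of this PR.

```python
import ctypes
import enum


class PlanFileFormat(enum.IntEnum):
    # Same names and numeric values as the enum in the diff.
    BINARY = 0
    JSON = 1
    PROTOTEXT = 2
    TEXT = 3


def to_native_format(file_format) -> ctypes.c_uint32:
    """Validate a format and convert it to the type declared in argtypes."""
    # PlanFileFormat(...) accepts a member or its integer value and raises
    # ValueError for anything that is not a known format.
    return ctypes.c_uint32(int(PlanFileFormat(file_format)))
```

With this shape a caller could pass `PlanFileFormat.TEXT` directly, and an unknown value fails in Python before reaching the C++ library.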