feat: create an exportable version of plan loading #91

Merged on Feb 13, 2024 (39 commits).

Commits
0ea3c42
feat: add support for all subquery types
EpsilonPrime Aug 25, 2023
6758f3f
Ran clang tidy.
EpsilonPrime Jan 30, 2024
af4fb1e
Tidy fixes.
EpsilonPrime Jan 30, 2024
bfe2ff6
Fix for some errors occurring too often.
EpsilonPrime Jan 30, 2024
44188b6
Fixed problem with falling through to the wrong case.
EpsilonPrime Jan 30, 2024
20acf0b
Created initial external library
EpsilonPrime Feb 1, 2024
77fcd30
Now the library is in a usable state.
EpsilonPrime Feb 2, 2024
df1b42c
Switch to returning a structure instead of modifying passed in argume…
EpsilonPrime Feb 2, 2024
58bed47
Make the planloader library installable.
EpsilonPrime Feb 5, 2024
73bec1f
Remove extraneous Makefile.
EpsilonPrime Feb 5, 2024
74bf673
Added a planloader test.
EpsilonPrime Feb 6, 2024
80b2c3e
Ran clang tidy.
EpsilonPrime Feb 6, 2024
347fbe2
Remove accidentally added java library still in progress.
EpsilonPrime Feb 6, 2024
e4485db
Fixed SetComparison to require only a single relation.
EpsilonPrime Feb 6, 2024
a72a827
Handled the rest of the review notes.
EpsilonPrime Feb 7, 2024
d0c187d
Clean version
EpsilonPrime Feb 8, 2024
320053c
Cleaned merge.
EpsilonPrime Feb 8, 2024
125a8cb
Updated based on review.
EpsilonPrime Feb 8, 2024
db12b3d
Use int32 instead of uint32 for the buffer length for increased porta…
EpsilonPrime Feb 8, 2024
12b54fe
More tidiness.
EpsilonPrime Feb 9, 2024
3d1e40e
Try making everything position independent.
EpsilonPrime Feb 9, 2024
e49c11a
Be more explicit about what needs position independent code.
EpsilonPrime Feb 9, 2024
00de8bc
Try another way to force PIC.
EpsilonPrime Feb 9, 2024
e754a1c
Handle an impossible error to make a warning go away.
EpsilonPrime Feb 9, 2024
1b191ea
Overly force -fPIC everywhere for now.
EpsilonPrime Feb 9, 2024
f5ce243
Fix new/free mismatch.
EpsilonPrime Feb 9, 2024
755c4ef
Fix delete.
EpsilonPrime Feb 9, 2024
c28f6f0
Apparently delete doesn't need to worry about nullptr so removing thi…
EpsilonPrime Feb 9, 2024
dac8ea9
Removed one unnecessary fPIC setting.
EpsilonPrime Feb 9, 2024
5c3d5f2
Removed some more unnecessary fPIC settings.
EpsilonPrime Feb 9, 2024
dd6b5c3
and one more
EpsilonPrime Feb 9, 2024
8fc79fb
One last cleanup.
EpsilonPrime Feb 9, 2024
7f8a7da
Update export/planloader/planloader.h
EpsilonPrime Feb 9, 2024
fbe9231
Various review changes.
EpsilonPrime Feb 9, 2024
e487313
Added error test case.
EpsilonPrime Feb 10, 2024
8ea2556
fix tidy warning
EpsilonPrime Feb 10, 2024
0d0246d
Added error test case.
EpsilonPrime Feb 10, 2024
06cd5c9
Switch save_plan to also use int32_t.
EpsilonPrime Feb 10, 2024
9be359b
Fix path used for output files.
EpsilonPrime Feb 13, 2024
2 changes: 2 additions & 0 deletions CMakeLists.txt
@@ -10,6 +10,7 @@ message(STATUS "Build type: ${CMAKE_BUILD_TYPE}")
set(CMAKE_CXX_STANDARD 17)
set(CMAKE_CXX_STANDARD_REQUIRED True)
set(CMAKE_EXPORT_COMPILE_COMMANDS ON)
set(CMAKE_POSITION_INDEPENDENT_CODE TRUE)

option(SUBSTRAIT_CPP_SANITIZE_DEBUG_BUILD
"Turns on address and undefined memory sanitization runtime checking."
@@ -55,3 +56,4 @@ if(${SUBSTRAIT_CPP_BUILD_TESTING})
endif()

add_subdirectory(src/substrait)
add_subdirectory(export)
3 changes: 3 additions & 0 deletions export/CMakeLists.txt
@@ -0,0 +1,3 @@
# SPDX-License-Identifier: Apache-2.0

add_subdirectory(planloader)
22 changes: 22 additions & 0 deletions export/planloader/CMakeLists.txt
@@ -0,0 +1,22 @@
# SPDX-License-Identifier: Apache-2.0

if(NOT BUILD_SUBDIR_NAME STREQUAL "release")
  message(
    SEND_ERROR
    "The planloader library does not work in Debug mode due to its dependencies."
Member:
Why not? Does this mean you can't build the planloader in debug mode (annoying but ok)? Or does this mean that someone linking to the planloader cannot build in debug (probably an issue)?

Member Author:
The problem is all of the sanitization stuff that we're including in debug mode. An unfettered debug mode would be fine.

)
endif()

add_library(planloader SHARED planloader.cpp)

add_dependencies(planloader substrait_io)
target_link_libraries(planloader substrait_io)

install(
  TARGETS planloader
  LIBRARY DESTINATION ${CMAKE_INSTALL_LIBDIR}
  PRIVATE_HEADER DESTINATION ${CMAKE_INSTALL_INCLUDEDIR})

if(${SUBSTRAIT_CPP_BUILD_TESTING})
add_subdirectory(tests)
endif()
61 changes: 61 additions & 0 deletions export/planloader/planloader.cpp
@@ -0,0 +1,61 @@
/* SPDX-License-Identifier: Apache-2.0 */

#include "planloader.h"

#include <cstring>
#include <limits>
#include <string>

#include <substrait/common/Io.h>

extern "C" {

// Load a Substrait plan (in any format) from disk.
// Returns a SerializedPlan structure containing either the serialized plan or
// an error message. error_message is nullptr upon success.
SerializedPlan* load_substrait_plan(const char* filename) {
  auto* newPlan = new SerializedPlan();
  newPlan->buffer = nullptr;
  newPlan->size = 0;
  newPlan->error_message = nullptr;

  auto planOrError = io::substrait::loadPlan(filename);
  if (!planOrError.ok()) {
    // message() is a string_view, which is not guaranteed to be
    // null-terminated, so copy it and terminate explicitly.
    auto errMsg = planOrError.status().message();
    newPlan->error_message = new char[errMsg.length() + 1];
    memcpy(newPlan->error_message, errMsg.data(), errMsg.length());
    newPlan->error_message[errMsg.length()] = '\0';
    return newPlan;
  }
  std::string text = planOrError->SerializeAsString();
  // size is an int32_t; reject plans that do not fit instead of silently
  // truncating the length.
  if (text.length() >
      static_cast<size_t>(std::numeric_limits<int32_t>::max())) {
    static const char kTooLarge[] =
        "Serialized plan is too large to return as a SerializedPlan.";
    newPlan->error_message = new char[sizeof(kTooLarge)];
    memcpy(newPlan->error_message, kTooLarge, sizeof(kTooLarge));
    return newPlan;
  }
  newPlan->buffer = new unsigned char[text.length() + 1];
  memcpy(newPlan->buffer, text.data(), text.length() + 1);
  newPlan->size = static_cast<int32_t>(text.length());
  return newPlan;
}

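// Frees a SerializedPlan returned by load_substrait_plan. Passing nullptr is
// a no-op.
void free_substrait_plan(SerializedPlan* plan) {
  if (plan == nullptr) {
    return;
  }
  delete[] plan->buffer;
  delete[] plan->error_message;
  delete plan;
}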

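// Write a serialized Substrait plan to disk in the specified format.
// On error returns a non-empty error message, which remains valid until the
// next call on the same thread.
// On success a nullptr is returned.
const char* save_substrait_plan(
    const unsigned char* plan_data,
    int32_t plan_data_length,
    const char* filename,
    io::substrait::PlanFileFormat format) {
  // The returned message must outlive the local absl::Status, so keep a
  // per-thread copy instead of pointing into the status object.
  static thread_local std::string lastError;
  ::substrait::proto::Plan plan;
  if (plan_data_length < 0 ||
      !plan.ParseFromArray(plan_data, plan_data_length)) {
    lastError = "Failed to parse the serialized plan.";
    return lastError.c_str();
  }
  auto result = io::substrait::savePlan(plan, filename, format);
  if (!result.ok()) {
    lastError = std::string(result.message());
    return lastError.c_str();
  }
  return nullptr;
}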

} // extern "C"
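Returning `result.message().data()` directly would hand the caller a pointer into a local `absl::Status`, which dangles once `save_substrait_plan` returns. A minimal sketch of one way to keep the message alive, assuming a per-thread buffer is acceptable for a wrapper-only C API; `fakeSave` and `saveOrError` are hypothetical stand-ins, not part of substrait-cpp:

```cpp
#include <string>
#include <utility>

// Hypothetical stand-in for io::substrait::savePlan: returns an empty string
// on success and an error description otherwise.
std::string fakeSave(bool shouldFail) {
  return shouldFail ? std::string("Failed to write file") : std::string();
}

// Returns nullptr on success, otherwise a null-terminated message that stays
// valid until the next call to saveOrError on the same thread.
const char* saveOrError(bool shouldFail) {
  static thread_local std::string lastError;
  std::string error = fakeSave(shouldFail);
  if (error.empty()) {
    return nullptr;
  }
  lastError = std::move(error);
  return lastError.c_str();
}
```

The trade-off is that callers must copy the message before calling again on the same thread; returning a heap-allocated copy instead would need a matching free function in the exported API.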
46 changes: 46 additions & 0 deletions export/planloader/planloader.h
@@ -0,0 +1,46 @@
/* SPDX-License-Identifier: Apache-2.0 */

#pragma once

#include <cstdint>

#include <substrait/common/Io.h>

extern "C" {

// This is a C-linkage interface, so stick to C-style names for the exported
// symbols.
// NOLINTBEGIN(readability-identifier-naming)

using SerializedPlan = struct {
  // If set, contains a serialized ::substrait::proto::Plan object.
Member:
I've become used to // indicating "internal comment" and /// indicating "formal documentation". Do we want to stick to that convention?

Member Author:
I'm not sure I want to formalize this interface. If someone wants to use the C++ code they should be using the recently exposed substrait_io library. This version is for wrapping purposes only so I'd like it to be more obscure.

  unsigned char *buffer;
  // If buffer is set, this is the size of the buffer in bytes.
  int32_t size;
  // If null, the buffer is valid; otherwise this points to a null-terminated
  // error string.
  char *error_message;
};

// Load a Substrait plan (in any format) from disk.
//
// Accepts filename as a null-terminated C string.
// Returns a SerializedPlan structure containing either the serialized plan or
// an error message. This SerializedPlan should be freed using
// free_substrait_plan.
SerializedPlan* load_substrait_plan(const char* filename);
EpsilonPrime marked this conversation as resolved.

// Frees a SerializedPlan that was returned from load_substrait_plan.
void free_substrait_plan(SerializedPlan* plan);
EpsilonPrime marked this conversation as resolved.

// Write a serialized Substrait plan to disk in the specified format.
//
// plan_data is a Substrait Plan serialized into a byte array with length
// plan_data_length.
// filename is a null-terminated C string.
// On error returns a non-empty error message.
// On success a nullptr is returned.
const char* save_substrait_plan(
    const unsigned char* plan_data,
    int32_t plan_data_length,
    const char* filename,
    io::substrait::PlanFileFormat format);

// NOLINTEND(readability-identifier-naming)

} // extern "C"
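The `SerializedPlan` fields above use an `int32_t` size and a null-terminated `error_message`. A small sketch, assuming C++17, of two helpers a wrapper like this could use: a checked narrowing that rejects lengths above `INT32_MAX` instead of masking them, and a copy of a `std::string_view` into a `new[]`-allocated, null-terminated buffer that matches the `delete[]` in `free_substrait_plan`. `checkedInt32Size` and `copyToCString` are hypothetical names:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <limits>
#include <optional>
#include <string_view>

// Narrows a byte count to int32_t, or returns std::nullopt if it does not fit.
std::optional<int32_t> checkedInt32Size(std::size_t length) {
  if (length > static_cast<std::size_t>(std::numeric_limits<int32_t>::max())) {
    return std::nullopt;
  }
  return static_cast<int32_t>(length);
}

// Copies a string_view (which need not be null-terminated) into a new[]
// buffer; the caller releases it with delete[].
char* copyToCString(std::string_view text) {
  char* out = new char[text.size() + 1];
  if (!text.empty()) {
    std::memcpy(out, text.data(), text.size());
  }
  out[text.size()] = '\0';
  return out;
}
```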
33 changes: 33 additions & 0 deletions export/planloader/tests/CMakeLists.txt
@@ -0,0 +1,33 @@
# SPDX-License-Identifier: Apache-2.0

cmake_path(GET CMAKE_CURRENT_BINARY_DIR PARENT_PATH
           CMAKE_CURRENT_BINARY_PARENT_DIR)

set(CMAKE_RUNTIME_OUTPUT_DIRECTORY ${CMAKE_CURRENT_BINARY_PARENT_DIR})

add_test_case(
  planloader_test
  SOURCES
  PlanLoaderTest.cpp
  EXTRA_LINK_LIBS
  planloader
  gmock
  gtest
  gtest_main)

set(TEXTPLAN_SOURCE_DIR "${CMAKE_SOURCE_DIR}/src/substrait/textplan")

add_custom_command(
  TARGET planloader_test
  POST_BUILD
  COMMAND ${CMAKE_COMMAND} -E echo "Copying unit test data..."
  COMMAND ${CMAKE_COMMAND} -E make_directory
          "${CMAKE_RUNTIME_OUTPUT_DIRECTORY}/tests/data"
  COMMAND
    ${CMAKE_COMMAND} -E copy
    "${TEXTPLAN_SOURCE_DIR}/converter/data/q6_first_stage.json"
    "${CMAKE_RUNTIME_OUTPUT_DIRECTORY}/tests/data/q6_first_stage.json")

message(
  STATUS "test data will be here: ${CMAKE_RUNTIME_OUTPUT_DIRECTORY}/tests/data"
)
42 changes: 42 additions & 0 deletions export/planloader/tests/PlanLoaderTest.cpp
@@ -0,0 +1,42 @@
/* SPDX-License-Identifier: Apache-2.0 */

#include <gmock/gmock-matchers.h>
#include <gtest/gtest.h>

#include "../planloader.h"
#include "substrait/proto/plan.pb.h"

namespace io::substrait::textplan {
namespace {

TEST(PlanLoaderTest, LoadAndSave) {
  auto serializedPlan = load_substrait_plan("data/q6_first_stage.json");
  ASSERT_EQ(serializedPlan->error_message, nullptr);

  ::substrait::proto::Plan plan;
  bool parseStatus =
      plan.ParseFromArray(serializedPlan->buffer, serializedPlan->size);
  ASSERT_TRUE(parseStatus) << "Failed to parse the plan.";

  const char* saveStatus = save_substrait_plan(
      serializedPlan->buffer,
      serializedPlan->size,
      "outfile.splan",
      PlanFileFormat::kText);
  ASSERT_EQ(saveStatus, nullptr);

  free_substrait_plan(serializedPlan);
}

TEST(PlanLoaderTest, LoadMissingFile) {
  auto serializedPlan = load_substrait_plan("no_such_file.json");
  ASSERT_THAT(
      serializedPlan->error_message,
      ::testing::StartsWith("Failed to open file no_such_file.json"));

  free_substrait_plan(serializedPlan);
}

} // namespace
} // namespace io::substrait::textplan
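In the tests above, a failing `ASSERT_*` macro returns from the test body before `free_substrait_plan` runs, so the plan leaks. One option for C++ callers is to hold the pointer in a `std::unique_ptr` whose deleter is the free function. The sketch below is self-contained: `PlanStub`, `loadStub`, and `freeStub` are hypothetical stand-ins that only mirror the shape of `SerializedPlan`, `load_substrait_plan`, and `free_substrait_plan`.

```cpp
#include <cstdint>
#include <memory>

// Hypothetical stand-ins mirroring the exported C API.
struct PlanStub {
  unsigned char* buffer;
  int32_t size;
  char* error_message;
};

int freedCount = 0;  // Counts frees so the behavior can be checked.

PlanStub* loadStub() {
  return new PlanStub{new unsigned char[4]{1, 2, 3, 4}, 4, nullptr};
}

void freeStub(PlanStub* plan) {
  delete[] plan->buffer;
  delete[] plan->error_message;
  delete plan;
  ++freedCount;
}

// Owns a plan and calls freeStub exactly once when it goes out of scope,
// including on early returns such as a failed ASSERT_*.
using PlanHandle = std::unique_ptr<PlanStub, decltype(&freeStub)>;

PlanHandle loadHandle() {
  return PlanHandle(loadStub(), &freeStub);
}
```

With the real header the alias would read `std::unique_ptr<SerializedPlan, decltype(&free_substrait_plan)>`.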
2 changes: 2 additions & 0 deletions src/substrait/common/Io.cpp
@@ -55,6 +55,8 @@ absl::StatusOr<::substrait::proto::Plan> loadPlan(
case PlanFileFormat::kText:
return textplan::loadFromText(*contentOrError);
}
// There are no other possibilities so this can't happen.
return absl::UnimplementedError("Unexpected format encountered.");
}

absl::Status savePlan(
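The `loadPlan` hunk above adds a `return` after a `switch` over `PlanFileFormat`, even though the comment notes no other case is possible. Some compilers (GCC, for example) still warn that control can reach the end of a non-void function, because an enum object may legally hold a value outside its named enumerators. A self-contained sketch of the same pattern; `Format` and `describe` are hypothetical:

```cpp
#include <string>

enum class Format { kBinary, kText };

// Every enumerator is handled, but a Format can still hold other values
// (for example via static_cast), so the function needs a final return.
std::string describe(Format format) {
  switch (format) {
    case Format::kBinary:
      return "binary";
    case Format::kText:
      return "text";
  }
  // Only reachable for out-of-range values.
  return "unknown";
}
```

Omitting a `default:` label keeps `-Wswitch` able to flag any enumerator added later, while the trailing return covers out-of-range values.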
2 changes: 2 additions & 0 deletions third_party/antlr4/cmake/ExternalAntlr4Cpp.cmake
@@ -97,6 +97,7 @@ if(ANTLR4_ZIP_REPOSITORY)
-DCMAKE_BUILD_TYPE:STRING=${CMAKE_BUILD_TYPE}
-DWITH_STATIC_CRT:BOOL=${ANTLR4_WITH_STATIC_CRT}
-DDISABLE_WARNINGS:BOOL=ON
-DCMAKE_POSITION_INDEPENDENT_CODE:BOOL=ON
# -DCMAKE_CXX_STANDARD:STRING=17 # if desired, compile the runtime with a different C++ standard
# -DCMAKE_CXX_STANDARD:STRING=${CMAKE_CXX_STANDARD} # alternatively, compile the runtime with the same C++ standard as the outer project
INSTALL_COMMAND ""
@@ -116,6 +117,7 @@ else()
-DCMAKE_BUILD_TYPE:STRING=${CMAKE_BUILD_TYPE}
-DWITH_STATIC_CRT:BOOL=${ANTLR4_WITH_STATIC_CRT}
-DDISABLE_WARNINGS:BOOL=ON
-DCMAKE_POSITION_INDEPENDENT_CODE:BOOL=ON
# -DCMAKE_CXX_STANDARD:STRING=17 # if desired, compile the runtime with a different C++ standard
# -DCMAKE_CXX_STANDARD:STRING=${CMAKE_CXX_STANDARD} # alternatively, compile the runtime with the same C++ standard as the outer project
INSTALL_COMMAND ""