-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add api for loading plans of all types #80
Merged
Merged
Changes from all commits
Commits
Show all changes
11 commits
Select commit
Hold shift + click to select a range
2dfee3b
feat: add api for loading plans of all types
EpsilonPrime 6fa011e
Addressed some of the review notes.
EpsilonPrime cceac79
Addressed the rest of this set of review notes.
EpsilonPrime c29dbbb
feat: enable address and leak detection on all debug builds (#82)
EpsilonPrime 9aae897
chore: Add EpsilonPrime to the list of code owners for Substrait text…
EpsilonPrime f8c383a
chore: Update version of substrait core to 0.32.0. (#84)
EpsilonPrime 01ac34b
chore: Update antlr4 to version 4.13.0. (#85)
EpsilonPrime fe0ddad
feat: enable roundtrip testing of textplan conversion (#81)
EpsilonPrime 28c9e39
feat: switch datetime package to an external dependency (#83)
EpsilonPrime b0a956f
Handled review notes.
EpsilonPrime 77ecaf3
Added better error detection.
EpsilonPrime File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
Validating CODEOWNERS rules …
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,3 +1,4 @@ | ||
# SPDX-License-Identifier: Apache-2.0 | ||
|
||
* @westonpace | ||
/src/substrait/textplan @EpsilonPrime |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,60 @@ | ||
/* SPDX-License-Identifier: Apache-2.0 */ | ||
|
||
#pragma once | ||
|
||
#include <string_view> | ||
|
||
#include "absl/status/statusor.h" | ||
#include "substrait/proto/plan.pb.h" | ||
|
||
namespace io::substrait { | ||
|
||
/* | ||
* \brief The four different ways plans can be represented on disk. | ||
*/ | ||
enum class PlanFileFormat { | ||
kBinary = 0, | ||
kJson = 1, | ||
kProtoText = 2, | ||
kText = 3, | ||
}; | ||
|
||
/* | ||
* \brief Loads a Substrait plan of any format from the given file. | ||
* | ||
* loadPlan determines which file type the specified file is and then calls | ||
* the appropriate load/parse method to consume it preserving any error | ||
* messages. | ||
* | ||
* This will load the plan into memory and then convert it consuming twice the | ||
* amount of memory that it consumed on disk. | ||
* | ||
* \param input_filename The filename containing the plan to convert. | ||
* \return If loading was successful, returns a plan. If loading was not | ||
* successful this is a status containing a list of parse errors in the status's | ||
* message. | ||
*/ | ||
absl::StatusOr<::substrait::proto::Plan> loadPlan( | ||
std::string_view input_filename); | ||
|
||
/* | ||
* \brief Writes the provided plan to disk. | ||
* | ||
* savePlan writes the provided plan in the specified format to the specified | ||
* location. | ||
* | ||
* This routine will consume more memory during the conversion to the text | ||
* format as the original plan as well as the annotated parse tree will need to | ||
* reside in memory during the process. | ||
* | ||
* \param plan | ||
* \param output_filename | ||
* \param format | ||
* \return | ||
*/ | ||
absl::Status savePlan( | ||
const ::substrait::proto::Plan& plan, | ||
std::string_view output_filename, | ||
PlanFileFormat format); | ||
|
||
} // namespace io::substrait |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,9 +1,20 @@ | ||
# SPDX-License-Identifier: Apache-2.0 | ||
|
||
add_library(substrait_common Exceptions.cpp) | ||
|
||
target_link_libraries(substrait_common fmt::fmt-header-only) | ||
|
||
add_library(substrait_io Io.cpp) | ||
add_dependencies( | ||
substrait_io | ||
substrait_proto | ||
substrait_textplan_converter | ||
substrait_textplan_loader | ||
fmt::fmt-header-only | ||
absl::status | ||
absl::statusor) | ||
target_link_libraries(substrait_io substrait_proto substrait_textplan_converter | ||
substrait_textplan_loader absl::status absl::statusor) | ||
|
||
if(${SUBSTRAIT_CPP_BUILD_TESTING}) | ||
add_subdirectory(tests) | ||
endif() |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,77 @@ | ||
/* SPDX-License-Identifier: Apache-2.0 */ | ||
|
||
#include "substrait/common/Io.h" | ||
|
||
#include <regex> | ||
#include <string_view> | ||
|
||
#include "substrait/proto/plan.pb.h" | ||
#include "substrait/textplan/converter/LoadBinary.h" | ||
#include "substrait/textplan/converter/SaveBinary.h" | ||
#include "substrait/textplan/parser/LoadText.h" | ||
|
||
namespace io::substrait { | ||
|
||
namespace { | ||
|
||
const std::regex kIsJson( | ||
R"(("extensionUris"|"extension_uris"|"extensions"|"relations"))"); | ||
const std::regex kIsProtoText( | ||
R"((^|\n)((relations|extensions|extension_uris|expected_type_urls) \{))"); | ||
const std::regex kIsText( | ||
R"((^|\n) *(pipelines|[a-z]+ *relation|schema|source|extension_space) *)"); | ||
|
||
PlanFileFormat detectFormat(std::string_view content) { | ||
if (std::regex_search(content.begin(), content.end(), kIsJson)) { | ||
return PlanFileFormat::kJson; | ||
} | ||
if (std::regex_search(content.begin(), content.end(), kIsProtoText)) { | ||
return PlanFileFormat::kProtoText; | ||
} | ||
if (std::regex_search(content.begin(), content.end(), kIsText)) { | ||
return PlanFileFormat::kText; | ||
} | ||
return PlanFileFormat::kBinary; | ||
} | ||
|
||
} // namespace | ||
|
||
absl::StatusOr<::substrait::proto::Plan> loadPlan( | ||
std::string_view input_filename) { | ||
auto contentOrError = textplan::readFromFile(input_filename.data()); | ||
EpsilonPrime marked this conversation as resolved.
Show resolved
Hide resolved
|
||
if (!contentOrError.ok()) { | ||
return contentOrError.status(); | ||
} | ||
|
||
auto encoding = detectFormat(*contentOrError); | ||
absl::StatusOr<::substrait::proto::Plan> planOrError; | ||
switch (encoding) { | ||
case PlanFileFormat::kBinary: | ||
return textplan::loadFromBinary(*contentOrError); | ||
case PlanFileFormat::kJson: | ||
return textplan::loadFromJson(*contentOrError); | ||
case PlanFileFormat::kProtoText: | ||
return textplan::loadFromProtoText(*contentOrError); | ||
case PlanFileFormat::kText: | ||
return textplan::loadFromText(*contentOrError); | ||
} | ||
} | ||
|
||
absl::Status savePlan( | ||
const ::substrait::proto::Plan& plan, | ||
std::string_view output_filename, | ||
PlanFileFormat format) { | ||
switch (format) { | ||
case PlanFileFormat::kBinary: | ||
return textplan::savePlanToBinary(plan, output_filename); | ||
case PlanFileFormat::kJson: | ||
return textplan::savePlanToJson(plan, output_filename); | ||
case PlanFileFormat::kProtoText: | ||
return textplan::savePlanToProtoText(plan, output_filename); | ||
case PlanFileFormat::kText: | ||
return textplan::savePlanToText(plan, output_filename); | ||
} | ||
return absl::UnimplementedError("Unexpected format requested."); | ||
} | ||
|
||
} // namespace io::substrait |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,120 @@ | ||
/* SPDX-License-Identifier: Apache-2.0 */ | ||
|
||
#include "substrait/common/Io.h" | ||
|
||
#include <filesystem> | ||
|
||
#include <gmock/gmock-matchers.h> | ||
#include <gtest/gtest.h> | ||
#include <protobuf-matchers/protocol-buffer-matchers.h> | ||
#include <unistd.h> | ||
|
||
using ::protobuf_matchers::EqualsProto; | ||
using ::protobuf_matchers::Partially; | ||
|
||
namespace io::substrait { | ||
|
||
namespace { | ||
|
||
constexpr const char* planFileEncodingToString(PlanFileFormat e) noexcept { | ||
switch (e) { | ||
case PlanFileFormat::kBinary: | ||
return "kBinary"; | ||
case PlanFileFormat::kJson: | ||
return "kJson"; | ||
case PlanFileFormat::kProtoText: | ||
return "kProtoText"; | ||
case PlanFileFormat::kText: | ||
return "kText"; | ||
} | ||
return "IMPOSSIBLE"; | ||
} | ||
|
||
} // namespace | ||
|
||
class IoTest : public ::testing::Test {}; | ||
|
||
TEST_F(IoTest, LoadMissingFile) { | ||
auto result = ::io::substrait::loadPlan("non-existent-file"); | ||
ASSERT_FALSE(result.ok()); | ||
ASSERT_THAT( | ||
result.status().message(), | ||
::testing::ContainsRegex("Failed to open file non-existent-file")); | ||
} | ||
|
||
class SaveAndLoadTestFixture : public ::testing::TestWithParam<PlanFileFormat> { | ||
EpsilonPrime marked this conversation as resolved.
Show resolved
Hide resolved
|
||
public: | ||
void SetUp() override { | ||
testFileDirectory_ = std::filesystem::temp_directory_path() / | ||
std::filesystem::path("my_temp_dir"); | ||
|
||
if (!std::filesystem::create_directory(testFileDirectory_)) { | ||
ASSERT_TRUE(false) << "Failed to create temporary directory."; | ||
testFileDirectory_.clear(); | ||
} | ||
} | ||
|
||
void TearDown() override { | ||
if (!testFileDirectory_.empty()) { | ||
std::error_code err; | ||
std::filesystem::remove_all(testFileDirectory_, err); | ||
ASSERT_FALSE(err) << err.message(); | ||
} | ||
} | ||
|
||
static std::string makeTempFileName() { | ||
static int tempFileNum = 0; | ||
return "testfile" + std::to_string(++tempFileNum); | ||
} | ||
|
||
protected: | ||
std::string testFileDirectory_; | ||
}; | ||
|
||
TEST_P(SaveAndLoadTestFixture, SaveAndLoad) { | ||
auto tempFilename = testFileDirectory_ + "/" + makeTempFileName(); | ||
PlanFileFormat encoding = GetParam(); | ||
|
||
::substrait::proto::Plan plan; | ||
auto root = plan.add_relations()->mutable_root(); | ||
auto read = root->mutable_input()->mutable_read(); | ||
read->mutable_common()->mutable_direct(); | ||
read->mutable_named_table()->add_names("table_name"); | ||
auto status = ::io::substrait::savePlan(plan, tempFilename, encoding); | ||
ASSERT_TRUE(status.ok()) << "Save failed.\n" << status; | ||
|
||
auto result = ::io::substrait::loadPlan(tempFilename); | ||
ASSERT_TRUE(result.ok()) << "Load failed.\n" << result.status(); | ||
ASSERT_THAT( | ||
*result, | ||
Partially(EqualsProto<::substrait::proto::Plan>( | ||
R"(relations { | ||
root { | ||
input { | ||
read { | ||
common { | ||
direct { | ||
} | ||
} | ||
named_table { | ||
names: "table_name" | ||
} | ||
} | ||
} | ||
} | ||
})"))); | ||
} | ||
|
||
INSTANTIATE_TEST_SUITE_P( | ||
SaveAndLoadTests, | ||
SaveAndLoadTestFixture, | ||
testing::Values( | ||
PlanFileFormat::kBinary, | ||
PlanFileFormat::kJson, | ||
PlanFileFormat::kProtoText, | ||
PlanFileFormat::kText), | ||
[](const testing::TestParamInfo<SaveAndLoadTestFixture::ParamType>& info) { | ||
return planFileEncodingToString(info.param); | ||
}); | ||
|
||
} // namespace io::substrait |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the load and save methods return
StatusOr
then I, as a user, would not expect to get an exception. I think we still rely on exceptions in some places (maybe this isn't true?) If so, we should probably wrap these methods with a try/catch and wrap the exception in an invalid status.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've removed most of the cases where we use SUBSTRAIT_FAIL. There are about 5 remaining and most of those are for nearly impossible cases. I'd prefer to finish removing them over wrapping this code in an exception. If it turns out we're getting exceptions from other libraries then we will be forced to add it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(And the fuzz testing work will help illuminate if we need to wrap the code.)