Add optional column_order in JSON reader #17029

Merged
merged 27 commits into from
Nov 8, 2024
Changes from 4 commits
Commits
1b6ca58
add optional column_order to schema_element
karthikeyann Oct 9, 2024
e0c373a
Merge branch 'branch-24.12' into fea-json_column_order
karthikeyann Oct 21, 2024
732f234
doc fixes
karthikeyann Oct 21, 2024
ffdd817
fix ambiguous std::map call
karthikeyann Oct 21, 2024
02e8ab3
simplify schema_element interface
karthikeyann Oct 22, 2024
ac05ae9
create all null columns
karthikeyann Oct 23, 2024
f10d9c2
metadata for all null non-present columns
karthikeyann Oct 24, 2024
71b4142
Merge branch 'branch-24.12' into fea-json_column_order
karthikeyann Oct 24, 2024
c8e223d
address review commemnts, unit test
karthikeyann Oct 24, 2024
1c871a8
Merge branch 'fea-json_column_order' of github.com:karthikeyann/cudf …
karthikeyann Oct 24, 2024
9297b7e
fix empty all-null rows issue at top level
karthikeyann Oct 25, 2024
eb9f8fc
Merge branch 'branch-24.12' into fea-json_column_order
karthikeyann Nov 4, 2024
9e31b71
add validation for dtypes with column order
karthikeyann Nov 4, 2024
82cf186
cleanup
karthikeyann Nov 4, 2024
105250b
address review comments
karthikeyann Nov 4, 2024
1a7a99c
add docs to dtype_variant
karthikeyann Nov 4, 2024
fcd8e3c
fix docs
karthikeyann Nov 4, 2024
15ef1d5
Merge branch 'branch-24.12' into fea-json_column_order
ttnghia Nov 5, 2024
b2dd7cd
moved dtype_variant alias to public
karthikeyann Nov 5, 2024
2272f72
Merge branch 'branch-24.12' into fea-json_column_order
karthikeyann Nov 5, 2024
e8e1c28
Merge branch 'branch-24.12' into fea-json_column_order
karthikeyann Nov 6, 2024
0bfe85e
remove chars in string column metadata
karthikeyann Nov 6, 2024
da24f1d
fix string col metadata in unit test
karthikeyann Nov 6, 2024
7e31f91
Merge branch 'branch-24.12' into fea-json_column_order
karthikeyann Nov 6, 2024
83b979c
add missing doc
karthikeyann Nov 7, 2024
a5442b9
Merge branch 'branch-24.12' into fea-json_column_order
karthikeyann Nov 7, 2024
4b82cf6
address review comments
karthikeyann Nov 7, 2024
63 changes: 63 additions & 0 deletions cpp/include/cudf/io/json.hpp
@@ -53,6 +53,69 @@ struct schema_element {
* @brief Allows specifying this column's child columns target type
*/
std::map<std::string, schema_element> child_types;
// // Add constructors for schema_element to keep order too with initializer list.
// // templated constructor with schema_element<bool keep_order=false>
// // store the order as
/** @brief Allows specifying the order of the columns
*/
std::optional<std::vector<std::string>> column_order;

/**
* @brief Constructor to create a schema_element with child types and an optional column order
*
* @param type The type that this column should be converted to
* @param child_types Allows specifying this column's child columns target type
* @param column_order Allows specifying the order of the columns
*/
schema_element(data_type type,
std::map<std::string, schema_element> child_types,
std::optional<std::vector<std::string>> column_order = std::nullopt)
: type{std::move(type)},
child_types{std::move(child_types)},
column_order{std::move(column_order)}
{
}

/**
* @brief Constructor to create a schema_element from a data_type
*
* @param type The type that this column should be converted to
*/
schema_element(data_type type) : schema_element{type, {}, std::nullopt} {}
schema_element() = default;
/**
* @brief Constructor to create a schema_element with a specific order of columns
*
* @param type The type that this column should be converted to
* @param child_types Allows specifying this column's child columns target type and
* the order of the columns
*/
schema_element(data_type type,
std::initializer_list<std::pair<const std::string, schema_element>> child_types)
: type{type}
{
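// column_order is still std::nullopt here, so engage the optional
// before dereferencing it.
this->column_order.emplace();
this->column_order->reserve(child_types.size());
for (auto const& [key, value] : child_types) {
this->column_order->push_back(key);
}
this->child_types = child_types;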
}

schema_element(schema_element const&) = default; ///< Copy constructor
schema_element(schema_element&&) noexcept = default; ///< Move constructor
/**
* @brief Copy assignment operator
*
* @return Reference to this object
*/
schema_element& operator=(schema_element const&) = default;
/**
* @brief Move assignment operator
*
* @return Reference to this object (after transferring ownership)
*/
schema_element& operator=(schema_element&&) noexcept = default;
~schema_element() = default;
};

/**
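To illustrate what the initializer-list constructor in this hunk is meant to do, here is a minimal sketch that does not depend on cudf. `type_id` and `schema_sketch` are hypothetical stand-ins for `cudf::data_type` and `schema_element`, and `make_example_schema` is an illustrative helper, not part of the PR. The point is that `std::map` iterates in key order, so the insertion order has to be captured separately in `column_order`.

```cpp
#include <cassert>
#include <initializer_list>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for cudf::data_type, for illustration only.
enum class type_id { INT32, FLOAT64, STRING, STRUCT };

// Simplified stand-in for schema_element (assumption: not the real class).
struct schema_sketch {
  type_id type{type_id::STRUCT};
  std::map<std::string, schema_sketch> child_types;
  std::optional<std::vector<std::string>> column_order;

  schema_sketch() = default;
  schema_sketch(type_id t) : type{t} {}

  // Records the order in which children were listed, then fills the map.
  schema_sketch(type_id t,
                std::initializer_list<std::pair<const std::string, schema_sketch>> children)
    : type{t}
  {
    column_order.emplace();  // engage the optional before using it
    column_order->reserve(children.size());
    for (auto const& child : children) {
      column_order->push_back(child.first);
    }
    child_types = children;  // the map itself sorts by key
  }
};

// Illustrative helper: a struct schema whose children are listed as "b", then "a".
inline schema_sketch make_example_schema()
{
  return schema_sketch{type_id::STRUCT,
                       {{"b", schema_sketch{type_id::STRING}},
                        {"a", schema_sketch{type_id::INT32}}}};
}
```

In this sketch, `column_order` keeps `"b"` before `"a"`, while iterating `child_types` yields `"a"` first. A schema built from a bare type leaves `column_order` empty, which matches the optional nature of the field in the PR.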
2 changes: 1 addition & 1 deletion cpp/src/io/json/parser_features.cpp
@@ -32,7 +32,7 @@ std::optional<schema_element> child_schema_element(std::string const& col_name,
[col_name](std::vector<data_type> const& user_dtypes) -> std::optional<schema_element> {
auto column_index = atol(col_name.data());
return (static_cast<std::size_t>(column_index) < user_dtypes.size())
- ? std::optional<schema_element>{{user_dtypes[column_index]}}
+ ? std::optional<schema_element>{user_dtypes[column_index]}
: std::optional<schema_element>{};
},
[col_name](
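For readers unfamiliar with the index-based lookup in this hunk, the following is a rough sketch of the same bounds-check pattern in isolation. `type_for_column` is a hypothetical helper and `int` stands in for `cudf::data_type`; neither comes from the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <optional>
#include <string>
#include <vector>

// Hypothetical helper mirroring the bounds check in the hunk above.
// `int` is a stand-in for cudf::data_type.
inline std::optional<int> type_for_column(std::string const& col_name,
                                          std::vector<int> const& user_dtypes)
{
  // std::atol returns 0 when no conversion can be performed,
  // so a non-numeric name maps to index 0.
  auto const column_index = std::atol(col_name.c_str());
  // A negative index wraps to a very large std::size_t and fails the check.
  return (static_cast<std::size_t>(column_index) < user_dtypes.size())
           ? std::optional<int>{user_dtypes[column_index]}
           : std::optional<int>{};
}
```

With a single pair of braces, the optional is constructed directly from the element type, which is the form the changed line switches to.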
3 changes: 2 additions & 1 deletion cpp/tests/io/json/json_test.cpp
@@ -350,7 +350,8 @@ TEST_P(JsonReaderParamTest, JsonLinesStrings)

cudf::io::json_reader_options in_options =
cudf::io::json_reader_options::builder(cudf::io::source_info{data.data(), data.size()})
- .dtypes({{"2", dtype<cudf::string_view>()}, {"0", dtype<int32_t>()}, {"1", dtype<double>()}})
+ .dtypes(std::map<std::string, data_type>{
+ {"2", dtype<cudf::string_view>()}, {"0", dtype<int32_t>()}, {"1", dtype<double>()}})
.lines(true);

cudf::io::table_with_metadata result = cudf::io::read_json(in_options);
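The test change above spells out the map type explicitly, which lines up with the "fix ambiguous std::map call" commit. A plausible reading, sketched below under stated assumptions, is that once the schema type gains an implicit constructor from the data type, a bare braced list can convert to either map overload. `dtype_like`, `schema_like`, `options_sketch` and `make_options` are hypothetical names; the real `json_reader_options` overload set is not shown in this diff.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical stand-in for cudf::data_type.
struct dtype_like {
  int id;
};

// Hypothetical stand-in for schema_element with an implicit converting constructor.
struct schema_like {
  dtype_like type{};
  schema_like() = default;
  schema_like(dtype_like t) : type(t) {}  // implicit conversion from dtype_like
};

// Assumed overload set: one overload per map type.
struct options_sketch {
  std::size_t plain_count{0};
  std::size_t schema_count{0};

  options_sketch& dtypes(std::map<std::string, dtype_like> const& types)
  {
    plain_count = types.size();
    return *this;
  }
  options_sketch& dtypes(std::map<std::string, schema_like> const& types)
  {
    schema_count = types.size();
    return *this;
  }
};

inline options_sketch make_options()
{
  options_sketch opts;
  // A bare braced list is expected to be rejected as ambiguous here, because
  // it can initialize either map type:
  //   opts.dtypes({{"2", dtype_like{2}}, {"0", dtype_like{0}}});
  // Naming the map type selects the first overload unambiguously.
  opts.dtypes(std::map<std::string, dtype_like>{{"2", dtype_like{2}}, {"0", dtype_like{0}}});
  return opts;
}
```

Naming the map type at the call site keeps existing call sites working without making the new constructor `explicit`, at the cost of a slightly more verbose call.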