Improve Parquet reader/writer properties docs (#5863)
* Improve Parquet reader/writer properties docs

* fix

* Apply suggestions from code review

Co-authored-by: Val Lorentz <progval+github@progval.net>

* Apply suggestions from code review

Co-authored-by: Raphael Taylor-Davies <1781103+tustvold@users.noreply.github.com>

---------

Co-authored-by: Val Lorentz <progval+github@progval.net>
Co-authored-by: Raphael Taylor-Davies <1781103+tustvold@users.noreply.github.com>
3 people authored Jun 12, 2024
1 parent a20116e commit 8bee08b
Showing 3 changed files with 132 additions and 69 deletions.
18 changes: 10 additions & 8 deletions parquet/src/arrow/arrow_reader/mod.rs
@@ -258,23 +258,25 @@ impl ArrowReaderOptions {
         Self::default()
     }

-    /// Parquet files generated by some writers may contain embedded arrow
-    /// schema and metadata. This may not be correct or compatible with your system.
-    ///
-    /// For example:[ARROW-16184](https://issues.apache.org/jira/browse/ARROW-16184)
+    /// Skip decoding the embedded arrow metadata (defaults to `false`)
     ///
-    /// Set `skip_arrow_metadata` to true, to skip decoding this
+    /// Parquet files generated by some writers may contain embedded arrow
+    /// schema and metadata.
+    /// This may not be correct or compatible with your system,
+    /// for example: [ARROW-16184](https://issues.apache.org/jira/browse/ARROW-16184)
     pub fn with_skip_arrow_metadata(self, skip_arrow_metadata: bool) -> Self {
         Self {
             skip_arrow_metadata,
             ..self
         }
     }

-    /// Set this true to enable decoding of the [PageIndex] if present. This can be used
-    /// to push down predicates to the parquet scan, potentially eliminating unnecessary IO
+    /// Enable decoding of the [`PageIndex`], if present (defaults to `false`)
+    ///
+    /// The `PageIndex` can be used to push down predicates to the parquet scan,
+    /// potentially eliminating unnecessary IO, by some query engines.
     ///
-    /// [PageIndex]: https://github.com/apache/parquet-format/blob/master/PageIndex.md
+    /// [`PageIndex`]: https://github.com/apache/parquet-format/blob/master/PageIndex.md
     pub fn with_page_index(self, page_index: bool) -> Self {
         Self { page_index, ..self }
     }
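
Both reader methods in this hunk are consuming builders: each takes `self` by value, overrides one field, and copies the rest with struct update syntax (`..self`). The sketch below reproduces only that pattern so the documented defaults can be checked in isolation. `ReaderOptions` is a hypothetical stand-in with just the two fields visible in this diff; it is not the real `ArrowReaderOptions`, which lives in the `parquet` crate and may carry additional fields. The public fields are likewise an assumption made for illustration.

```rust
/// Hypothetical stand-in that mirrors the two fields touched by this hunk.
/// It is NOT the real `ArrowReaderOptions` from the `parquet` crate.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct ReaderOptions {
    /// Skip decoding the embedded arrow metadata (defaults to `false`).
    pub skip_arrow_metadata: bool,
    /// Enable decoding of the page index, if present (defaults to `false`).
    pub page_index: bool,
}

impl ReaderOptions {
    /// Start from the defaults: both flags are `false`.
    pub fn new() -> Self {
        Self::default()
    }

    /// Override one field and keep every other field from `self`.
    pub fn with_skip_arrow_metadata(self, skip_arrow_metadata: bool) -> Self {
        Self {
            skip_arrow_metadata,
            ..self
        }
    }

    /// Same pattern for the page index flag.
    pub fn with_page_index(self, page_index: bool) -> Self {
        Self { page_index, ..self }
    }
}
```

With this stand-in, `ReaderOptions::new().with_page_index(true)` enables only the page index and leaves `skip_arrow_metadata` at its default of `false`, because each call returns a new value rather than mutating in place.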
8 changes: 4 additions & 4 deletions parquet/src/arrow/arrow_writer/mod.rs
@@ -341,20 +341,20 @@ impl ArrowWriterOptions {
         Self { properties, ..self }
     }

+    /// Skip encoding the embedded arrow metadata (defaults to `false`)
+    ///
     /// Parquet files generated by the [`ArrowWriter`] contain embedded arrow schema
     /// by default.
     ///
-    /// Set `skip_arrow_metadata` to true, to skip encoding this.
+    /// Set `skip_arrow_metadata` to true, to skip encoding the embedded metadata.
     pub fn with_skip_arrow_metadata(self, skip_arrow_metadata: bool) -> Self {
         Self {
             skip_arrow_metadata,
             ..self
         }
     }

-    /// Overrides the name of the root parquet schema element
-    ///
-    /// Defaults to `"arrow_schema"`
+    /// Set the name of the root parquet schema element (defaults to `"arrow_schema"`)
     pub fn with_schema_root(self, name: String) -> Self {
         Self {
             schema_root: Some(name),
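
The writer options use the same builder pattern, with one difference: `schema_root` is stored as an `Option<String>`, so the documented default of `"arrow_schema"` has to be applied wherever the value is read. A minimal sketch of that idea follows. `WriterOptions` is a hypothetical stand-in, not the real `ArrowWriterOptions` (the `properties` field visible at the top of the hunk is omitted), and `schema_root_or_default` is an invented helper; how the real crate resolves the default is not shown in this diff.

```rust
/// Hypothetical stand-in for the writer options in this hunk.
/// It is NOT the real `ArrowWriterOptions` from the `parquet` crate.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct WriterOptions {
    /// Skip encoding the embedded arrow metadata (defaults to `false`).
    pub skip_arrow_metadata: bool,
    /// Root schema element name; `None` means "use the default".
    pub schema_root: Option<String>,
}

impl WriterOptions {
    /// Start from the defaults: metadata is encoded, no custom root name.
    pub fn new() -> Self {
        Self::default()
    }

    /// Override the flag and keep every other field from `self`.
    pub fn with_skip_arrow_metadata(self, skip_arrow_metadata: bool) -> Self {
        Self {
            skip_arrow_metadata,
            ..self
        }
    }

    /// Record a custom root name, as in the diff above.
    pub fn with_schema_root(self, name: String) -> Self {
        Self {
            schema_root: Some(name),
            ..self
        }
    }

    /// Invented helper (an assumption): fall back to the documented
    /// default name when no custom root has been set.
    pub fn schema_root_or_default(&self) -> &str {
        self.schema_root.as_deref().unwrap_or("arrow_schema")
    }
}
```

Keeping the field as an `Option` lets a caller distinguish "never set" from "explicitly set to the default name", which a plain `String` initialised to `"arrow_schema"` could not do.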