Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Write "upgrade guide" for DataFusion 44.0.0 #13702

Open
andygrove opened this issue Dec 9, 2024 · 13 comments
Open

Write "upgrade guide" for DataFusion 44.0.0 #13702

andygrove opened this issue Dec 9, 2024 · 13 comments
Assignees
Labels
enhancement New feature or request

Comments

@andygrove
Copy link
Member

Is your feature request related to a problem or challenge?

DataFusion 44.0.0 has breaking changes that require downstream projects to make code changes. Let's document these as we upgrade our own subprojects (Python, Comet, Ballista) to make life easier for other downstream projects.

Describe the solution you'd like

No response

Describe alternatives you've considered

No response

Additional context

No response

@andygrove andygrove added the enhancement New feature or request label Dec 9, 2024
@andygrove andygrove self-assigned this Dec 9, 2024
@andygrove
Copy link
Member Author

andygrove commented Dec 9, 2024

I plan on adding notes to this issue as I encounter issues while upgrading Comet to use latest DF. First couple of issues:

  • down_cast_any_ref was removed from the public API. Perhaps we should consider adding it back (and mark it as deprecated)?
  • PhysicalExpr trait no longer has dyn_hash method and it is now necessary to implement DynEq. This may cause confusion unless we explain that deriving Hash, PartialEq, Eq for structs will automatically implement these traits.

@andygrove
Copy link
Member Author

next issue:

60 |             signature: Signature::coercible(vec![DataType::Float64], Volatility::Immutable),
   |                                                  ^^^^^^^^^^^^^^^^^ expected `Arc<dyn LogicalType>`, found `DataType`

Solution was to change DataType::Float64 to Arc::new(NativeType::Float64)

@andygrove
Copy link
Member Author

We can't just derive PartialEq for expressions that contain other expressions due to rust-lang/rust#78808:

error[E0507]: cannot move out of `other.child` which is behind a shared reference
  --> spark-expr/src/list.rs:49:5
   |
47 | #[derive(Debug, Hash, PartialEq, Eq)]
   |                       --------- in this derive macro expansion

We have to implement PartialEq manually instead.

@andygrove
Copy link
Member Author

GroupValues and new_group_values were removed from the public API without being deprecated first. It does not impact Comet; I'm just pointing this out.

@andygrove
Copy link
Member Author

public function find_df_window_func was removed in #13201. It would have been better to mark it deprecated and have it return None.

@andygrove
Copy link
Member Author

andygrove commented Dec 9, 2024

error[E0308]: mismatched types
     |
1929 |         datafusion::physical_plan::windows::create_window_expr(
     |         ------------------------------------------------------ arguments to this function are incorrect
...
1934 |             sort_exprs,
     |             ^^^^^^^^^^ expected `&LexOrdering`, found `&[PhysicalSortExpr]`

fixed by changing sort_exprs to &LexOrdering::new(sort_exprs.to_vec()),

@andygrove
Copy link
Member Author

I am now running into:

Error from DataFusion: return_type_from_exprs shoud be called instead.
This was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker.

@Omega359
Copy link
Contributor

Omega359 commented Dec 10, 2024

Error from DataFusion: return_type_from_exprs shoud be called instead.

The only place that has that typo is date_part which was changed in #13466 @jayzhan211

@andygrove
Copy link
Member Author

andygrove commented Dec 10, 2024

ScalarUDFImpl.return_type is part of DataFusion's public API but returns an internal error if called, which doesn't seem correct. Should we deprecate this function and change the error type?

    fn return_type(&self, _arg_types: &[DataType]) -> Result<DataType> {
        internal_err!("return_type_from_exprs shoud be called instead")
    }

edit: it only returns this error in DatePartFunc

@andygrove
Copy link
Member Author

I created a PR to deprecate return_type and revert the breaking change - #13717

@andygrove
Copy link
Member Author

I am now running into failures in Comet tests that use miri:

error: unsupported operation: can't call foreign function `rust_psm_stack_pointer` on OS `linux`
    --> /home/runner/.cargo/registry/src/index.crates.io-6f17d22bba15001f/psm-0.1.24/src/lib.rs:319:14
     |
319  |     unsafe { rust_psm_stack_pointer() }
     |              ^^^^^^^^^^^^^^^^^^^^^^^^ can't call foreign function `rust_psm_stack_pointer` on OS `linux`
     |
     = help: if this is a basic API commonly used on this target, please report an issue with Miri
     = help: however, note that Miri does not aim to support every FFI function out there; for instance, we will not support APIs for things such as GUIs, scripting languages, or databases
     = note: BACKTRACE on thread `execution::data`:
     = note: inside `psm::stack_pointer` at /home/runner/.cargo/registry/src/index.crates.io-6f17d22bba15001f/psm-0.1.24/src/lib.rs:319:14: 319:38
     = note: inside `stacker::current_stack_ptr` at /home/runner/.cargo/registry/src/index.crates.io-6f17d22bba15001f/stacker-0.1.17/src/lib.rs:98:13: 98:33
     = note: inside `stacker::remaining_stack` at /home/runner/.cargo/registry/src/index.crates.io-6f17d22bba15001f/stacker-0.1.17/src/lib.rs:91:23: 91:42
     = note: inside `stacker::maybe_grow::<std::result::Result<datafusion_common::tree_node::Transformed<datafusion_physical_expr::tree_node::ExprContext<std::option::Option<petgraph::graph_impl::NodeIndex>>>, datafusion_common::DataFusionError>, {closure@datafusion_common::tree_node::TreeNode::transform_up::transform_up_impl<datafusion_physical_expr::tree_node::ExprContext<std::option::Option<petgraph::graph_impl::NodeIndex>>, {closure@datafusion_physical_expr::utils::build_dag<datafusion_physical_expr::intervals::cp_solver::ExprIntervalGraphNode, {closure@datafusion_physical_expr::intervals::cp_solver::ExprIntervalGraph::try_new::{closure#0}}>::{closure#0}}>::{closure#0}}>` at /home/runner/.cargo/registry/src/index.crates.io-6f17d22bba15001f/stacker-0.1.17/src/lib.rs:50:30: 50:47
     ```

@andygrove
Copy link
Member Author

The miri test failures are related to DataFusion using the recursive crate in #13310

@andygrove
Copy link
Member Author

I filed #13766 to consider making recursive an optional feature

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants