feat: add UDF to_local_time() #11347
Thank you @appletreeisyellow -- I found this PR well tested and very well documented 🏆 . Really nice
I also filed #11358 to track this particular feature so it wasn't quite as entangled in various proposals
I think the PR needs a few things before it could be merged:
- slt (end to end) tests, as suggested by @jayzhan211
- Better error handling (don't panic if some part of the conversion doesn't succeed)
While not strictly required, I think it would also be good to avoid parsing the timezone on each row.
Also, finishing up the TODOs is probably good too
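To make the two requests above concrete (return an error instead of panicking, and parse the timezone once rather than per row), here is a minimal std-only sketch. It is not the PR's code: `ConvertError`, `parse_fixed_offset`, and `to_local_seconds` are hypothetical names, and it assumes a fixed offset such as "+01:00" instead of an IANA name like "Europe/Brussels".

```rust
/// Hypothetical error type for this sketch (the PR itself reports DataFusion errors).
#[derive(Debug, PartialEq)]
pub enum ConvertError {
    BadOffset(String),
    Overflow(i64),
}

/// Parse exactly two ASCII digits, e.g. "01" -> 1.
fn two_digits(s: &str) -> Option<i64> {
    if s.len() == 2 && s.bytes().all(|b| b.is_ascii_digit()) {
        s.parse().ok()
    } else {
        None
    }
}

/// Parse a fixed offset such as "+01:00" into seconds east of UTC.
/// Assumption: only "+HH:MM" / "-HH:MM" is accepted in this sketch.
pub fn parse_fixed_offset(tz: &str) -> Result<i64, ConvertError> {
    let bad = || ConvertError::BadOffset(tz.to_string());
    let (sign, rest) = if let Some(r) = tz.strip_prefix('+') {
        (1, r)
    } else if let Some(r) = tz.strip_prefix('-') {
        (-1, r)
    } else {
        return Err(bad());
    };
    let (hh, mm) = rest.split_once(':').ok_or_else(|| bad())?;
    let hours = two_digits(hh).ok_or_else(|| bad())?;
    let minutes = two_digits(mm).ok_or_else(|| bad())?;
    if hours > 23 || minutes > 59 {
        return Err(bad());
    }
    Ok(sign * (hours * 3600 + minutes * 60))
}

/// Shift every non-null UTC epoch value (in seconds) to local wall-clock seconds.
pub fn to_local_seconds(
    values: &[Option<i64>],
    tz: &str,
) -> Result<Vec<Option<i64>>, ConvertError> {
    // The timezone string is parsed once, outside the per-row loop.
    let offset = parse_fixed_offset(tz)?;
    values
        .iter()
        .map(|v| match v {
            None => Ok(None),
            // checked_add turns overflow into an error rather than a panic.
            Some(ts) => ts
                .checked_add(offset)
                .map(Some)
                .ok_or(ConvertError::Overflow(*ts)),
        })
        .collect()
}
```

Both failure modes (an unparseable offset and an overflowing addition) surface as `Err` values that the caller can report.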
    }
}

/// This function converts a timestamp with a timezone to a timestamp without a timezone.
This is one of the clearest explanations of what a function does that I have read in a long time 💯 Nice work @appletreeisyellow
Thank you for this @appletreeisyellow 🙏 Impressive work. Can't wait to try it out once it's in influxDB.
I left a couple of very minor remarks
The clippy error was fixed in #11368
I took the liberty of merging up from main to get the CI error fix
@alamb @jayzhan211 @Abdullahsab3 -- Thank you for all your reviews and the many helpful suggestions
@alamb -- Thank you for merging the clippy error fix!
Thank you @appletreeisyellow and @Abdullahsab3. I think this PR is looking very nice now thanks to all your work and review
The other thing we should do is document this function in the function reference: https://datafusion.apache.org/user-guide/sql/scalar_functions.html#time-and-date-functions
However, we can do that as a follow on PR as this one is already quite large
I went through this PR carefully and I think it looks really nice and could be merged. I plan to leave it open for a few more hours to allow time for any additional comments that people might have
I also tested that I could use this function to get the desired answer from #10602
Input
DataFusion CLI v40.0.0
> create table t AS
VALUES
('2024-01-01T00:00:01Z'),
('2024-02-01T00:00:01Z'),
('2024-03-01T00:00:01Z'),
('2024-04-01T00:00:01Z'),
('2024-05-01T00:00:01Z'),
('2024-06-01T00:00:01Z'),
('2024-07-01T00:00:01Z'),
('2024-08-01T00:00:01Z'),
('2024-09-01T00:00:01Z'),
('2024-10-01T00:00:01Z'),
('2024-11-01T00:00:01Z'),
('2024-12-01T00:00:01Z')
;
> create view t_timezone as
select column1::timestamp AT TIME ZONE 'Europe/Brussels' as "column1"
from t;
0 row(s) fetched.
Elapsed 0.005 seconds.
> select column1 from t_timezone;
+---------------------------+
| column1                   |
+---------------------------+
| 2024-01-01T00:00:01+01:00 |
| 2024-02-01T00:00:01+01:00 |
| 2024-03-01T00:00:01+01:00 |
| 2024-04-01T00:00:01+02:00 |
| 2024-05-01T00:00:01+02:00 |
| 2024-06-01T00:00:01+02:00 |
| 2024-07-01T00:00:01+02:00 |
| 2024-08-01T00:00:01+02:00 |
| 2024-09-01T00:00:01+02:00 |
| 2024-10-01T00:00:01+02:00 |
| 2024-11-01T00:00:01+01:00 |
| 2024-12-01T00:00:01+01:00 |
+---------------------------+
12 row(s) fetched.
Elapsed 0.014 seconds.
(Bad) date_bin with timezone'd timestamps:
> select column1, date_bin(interval '1 month', column1) as month from t_timezone;
+---------------------------+---------------------------+
| column1                   | month                     |
+---------------------------+---------------------------+
| 2024-01-01T00:00:01+01:00 | 2023-12-01T01:00:00+01:00 | <-- in the wrong month!
| 2024-02-01T00:00:01+01:00 | 2024-01-01T01:00:00+01:00 |
| 2024-03-01T00:00:01+01:00 | 2024-02-01T01:00:00+01:00 |
| 2024-04-01T00:00:01+02:00 | 2024-03-01T01:00:00+01:00 |
| 2024-05-01T00:00:01+02:00 | 2024-04-01T02:00:00+02:00 |
| 2024-06-01T00:00:01+02:00 | 2024-05-01T02:00:00+02:00 |
| 2024-07-01T00:00:01+02:00 | 2024-06-01T02:00:00+02:00 |
| 2024-08-01T00:00:01+02:00 | 2024-07-01T02:00:00+02:00 |
| 2024-09-01T00:00:01+02:00 | 2024-08-01T02:00:00+02:00 |
| 2024-10-01T00:00:01+02:00 | 2024-09-01T02:00:00+02:00 |
| 2024-11-01T00:00:01+01:00 | 2024-10-01T02:00:00+02:00 |
| 2024-12-01T00:00:01+01:00 | 2024-11-01T01:00:00+01:00 |
+---------------------------+---------------------------+
12 row(s) fetched.
Elapsed 0.011 seconds.
(Good) date_bin after using to_local_time:
> select column1, date_bin(interval '1 month', to_local_time(column1)) as month from t_timezone;
+---------------------------+---------------------+
| column1                   | month               |
+---------------------------+---------------------+
| 2024-01-01T00:00:01+01:00 | 2024-01-01T00:00:00 | <-- right month
| 2024-02-01T00:00:01+01:00 | 2024-02-01T00:00:00 |
| 2024-03-01T00:00:01+01:00 | 2024-03-01T00:00:00 |
| 2024-04-01T00:00:01+02:00 | 2024-04-01T00:00:00 |
| 2024-05-01T00:00:01+02:00 | 2024-05-01T00:00:00 |
| 2024-06-01T00:00:01+02:00 | 2024-06-01T00:00:00 |
| 2024-07-01T00:00:01+02:00 | 2024-07-01T00:00:00 |
| 2024-08-01T00:00:01+02:00 | 2024-08-01T00:00:00 |
| 2024-09-01T00:00:01+02:00 | 2024-09-01T00:00:00 |
| 2024-10-01T00:00:01+02:00 | 2024-10-01T00:00:00 |
| 2024-11-01T00:00:01+01:00 | 2024-11-01T00:00:00 |
| 2024-12-01T00:00:01+01:00 | 2024-12-01T00:00:00 |
+---------------------------+---------------------+
12 row(s) fetched.
Elapsed 0.008 seconds.
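The arithmetic behind the first row of the two results above can be checked with a small std-only sketch. This is an illustration, not DataFusion's date_bin implementation: `month_bin` and `to_local` are hypothetical helpers, the calendar conversion uses the well-known days-from-civil algorithm, and a fixed +01:00 offset is assumed for Europe/Brussels in January. The instant 2024-01-01T00:00:01+01:00 is 2023-12-31T23:00:01Z, so truncating the UTC value lands in December, while truncating the offset-shifted value lands in January.

```rust
const SECS_PER_DAY: i64 = 86_400;

/// Days since 1970-01-01 -> (year, month, day), proleptic Gregorian calendar.
fn civil_from_days(days: i64) -> (i64, i64, i64) {
    let z = days + 719_468;
    let era = z.div_euclid(146_097);
    let doe = z.rem_euclid(146_097); // day of era, [0, 146096]
    let yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365; // [0, 399]
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100); // [0, 365], year starts in March
    let mp = (5 * doy + 2) / 153; // [0, 11], March = 0
    let d = doy - (153 * mp + 2) / 5 + 1;
    let m = if mp < 10 { mp + 3 } else { mp - 9 };
    let y = yoe + era * 400 + if m <= 2 { 1 } else { 0 };
    (y, m, d)
}

/// (year, month, day) -> days since 1970-01-01.
fn days_from_civil(y: i64, m: i64, d: i64) -> i64 {
    let y = if m <= 2 { y - 1 } else { y };
    let era = y.div_euclid(400);
    let yoe = y.rem_euclid(400);
    let mp = if m > 2 { m - 3 } else { m + 9 };
    let doy = (153 * mp + 2) / 5 + d - 1;
    let doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
    era * 146_097 + doe - 719_468
}

/// Truncate an epoch-seconds value to the first second of its calendar month.
pub fn month_bin(secs: i64) -> i64 {
    let (y, m, _) = civil_from_days(secs.div_euclid(SECS_PER_DAY));
    days_from_civil(y, m, 1) * SECS_PER_DAY
}

/// Shift a UTC instant to local wall-clock seconds for a fixed offset.
pub fn to_local(secs_utc: i64, offset_secs: i64) -> i64 {
    secs_utc + offset_secs
}
```

With `instant = 1_704_063_601` (2023-12-31T23:00:01Z), `month_bin(instant)` is 1_701_388_800 (2023-12-01T00:00:00Z, displayed as 2023-12-01T01:00:00+01:00), whereas `month_bin(to_local(instant, 3_600))` is 1_704_067_200, which read as a timezone-free value is 2024-01-01T00:00:00.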
let mut builder = PrimitiveBuilder::<T>::new();

let primitive_array = as_primitive_array::<T>(array)?;
for ts_opt in primitive_array.iter() {
    match ts_opt {
        None => builder.append_null(),
        Some(ts) => {
            let adjusted_ts: i64 =
                adjust_to_local_time::<T>(ts, tz)?;
            builder.append_value(adjusted_ts)
        }
    }
}

Ok(ColumnarValue::Array(Arc::new(builder.finish())))
You could also use try_unary here (that basically does the same thing as what you have here)
let primitive_array = as_primitive_array::<T>(array)?;
let ts_array = try_unary(primitive_array, |ts| {
    adjust_to_local_time::<T>(ts, tz)
})?;
Ok(ColumnarValue::Array(Arc::new(ts_array)))
I tried it locally, and using try_unary does require that adjust_to_local_time is changed to return ArrowError rather than DataFusion error
Yeah, it would be neat to use try_unary. I got the same compiling error when I used try_unary, so I rewrote the code in a for loop with PrimitiveBuilder
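The error-type mismatch discussed in this thread can be illustrated with a std-only analogue. Everything here is a stand-in and not the arrow or DataFusion API: `KernelError` plays the role of ArrowError, `EngineError` the role of the DataFusion error, and `try_unary_sketch` mimics a fallible element-wise kernel that passes nulls through and stops at the first error. One option besides changing the helper's return type, shown under those assumptions, is to convert the error at the closure boundary.

```rust
/// Stand-in for the kernel-level error type (ArrowError in the discussion above).
#[derive(Debug, PartialEq)]
pub struct KernelError(pub String);

/// Stand-in for the engine-level error type (the DataFusion error in the discussion above).
#[derive(Debug, PartialEq)]
pub struct EngineError(pub String);

impl From<EngineError> for KernelError {
    fn from(e: EngineError) -> Self {
        KernelError(e.0)
    }
}

/// Apply a fallible function to every non-null value; nulls pass through
/// unchanged and the first error aborts the whole operation.
pub fn try_unary_sketch<F>(
    values: &[Option<i64>],
    op: F,
) -> Result<Vec<Option<i64>>, KernelError>
where
    F: Fn(i64) -> Result<i64, KernelError>,
{
    values.iter().map(|v| v.map(&op).transpose()).collect()
}

/// Hypothetical per-value adjustment that reports the engine-level error.
pub fn adjust(ts: i64, offset: i64) -> Result<i64, EngineError> {
    ts.checked_add(offset)
        .ok_or_else(|| EngineError(format!("overflow adjusting {ts}")))
}

/// Bridge the two error types inside the closure instead of changing `adjust`.
pub fn adjust_all(values: &[Option<i64>], offset: i64) -> Result<Vec<Option<i64>>, KernelError> {
    try_unary_sketch(values, |ts| adjust(ts, offset).map_err(KernelError::from))
}
```

Whether such a conversion is preferable to the explicit builder loop is a judgment call; the loop keeps the original error type intact.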
{
    match converter(ts) {
        MappedLocalTime::Ambiguous(earliest, latest) => exec_err!(
            "Ambiguous timestamp in microseconds. Do you mean {:?} or {:?}",
❤️
I just realized that this is a general function -- the error message applies to microseconds, milliseconds, and seconds 😅 So I removed the phrase "in microseconds" from the error to avoid confusion.
Updated in 2c35025
@alamb Thank you for the careful review!
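As a rough illustration of the unit-neutral error handling described above, here is a std-only sketch. `Mapped` is a hypothetical stand-in that mirrors the three outcomes of mapping a local time (a single result, an ambiguous pair, or no result); `resolve` and the wording of the "does not exist" message are assumptions for this example, not the PR's code.

```rust
/// Hypothetical stand-in for the outcome of mapping a local time to an instant.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Mapped<T> {
    /// Exactly one valid result.
    Single(T),
    /// Two candidates, e.g. when clocks are set back at the end of daylight saving time.
    Ambiguous(T, T),
    /// No valid result, e.g. inside the gap when clocks are set forward.
    None,
}

/// Turn a mapping outcome into a value or an error, never a panic.
/// The message names no time unit, so it reads correctly for any resolution.
pub fn resolve(mapped: Mapped<i64>) -> Result<i64, String> {
    match mapped {
        Mapped::Single(value) => Ok(value),
        Mapped::Ambiguous(earliest, latest) => Err(format!(
            "Ambiguous timestamp. Do you mean {:?} or {:?}",
            earliest, latest
        )),
        // Assumed wording for the gap case.
        Mapped::None => Err("The local time does not exist".to_string()),
    }
}
```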
I'll have a follow-up PR to document this new function
I plan to merge this PR tomorrow morning Eastern time unless there are any other comments or anyone would like additional time to review
Here is the follow-up PR:
🚀
Thank you everyone! This is a great addition I think
* feat: add UDF `to_local_time()`
* chore: support column value in array
* chore: lint
* chore: fix conversion for us, ms, and s
* chore: add more tests for daylight savings time
* chore: add function description
* refactor: update tests and add examples in description
* chore: add description and example
* chore: doc
* chore: stop copying
* chore: fix typo
* chore: mention that the offset varies based on daylight savings time
* refactor: parse timezone once and update examples in description
* refactor: replace map..concat with flat_map
* chore: add hard code timestamp value in test
* chore: handle errors and remove panics
* chore: move some test to slt
* chore: clone time_value
* chore: typo
---------
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Which issue does this PR close?
Help with #10602
Closes #11358
Rationale for this change
This PR adds a ScalarUDF function to_local_time():
- input: Timestamp(..., *)
- output: Timestamp(..., None)
Example
This is how to use it in datafusion-cli:
Example of using to_local_time() in date_bin()
Combining to_local_time() with date_bin() will look like:
What changes are included in this PR?
New ScalarUDF function to_local_time() with tests
Are these changes tested?
Yes
Are there any user-facing changes?
No API changes.