
DateRange Timestamp conversion on windows fails with timestamps close to EPOCH #261

Open
bdemirtas opened this issue Apr 10, 2024 · 3 comments
Assignees: ronanstokes-db
Labels: potential-workaround (Unconfirmed potential workaround available), wont-fix (Issue is marked `wont-fix` due to availability of workaround or usage outside of intended platform), wontfix (This will not be worked on), workaround

Comments

@bdemirtas commented Apr 10, 2024

Expected Behavior

When you use DateRange with a starting date before 1970 (the Unix epoch), it raises an OSError:

OSError: [Errno 22] Invalid argument

Related CPython bug ticket: https://bugs.python.org/issue37527

Current Behavior

Currently it works as intended on any non-Windows OS. The workaround is to provide the datetime with a UTC timezone.
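As an illustration (my addition, independent of dbldatagen): the underlying Python behavior can be sketched with plain `datetime.timestamp()`. For a naive datetime, CPython converts via the platform's C local-time routines, which on Windows reject pre-epoch values; for a timezone-aware datetime, the conversion is done arithmetically and works on every platform.

```python
from datetime import datetime, timezone

naive = datetime.fromisoformat("1910-10-01T00:00:00")
aware = naive.replace(tzinfo=timezone.utc)

# Naive datetimes go through the C mktime/localtime path, which on
# Windows raises OSError: [Errno 22] for timestamps before the epoch:
try:
    print(naive.timestamp())
except OSError as err:
    print("naive pre-epoch conversion failed:", err)  # happens on Windows

# Aware datetimes are converted purely arithmetically and work
# everywhere, yielding a negative POSIX timestamp:
print(aware.timestamp())  # -1869868800.0
```

This is why passing timezone-aware datetimes to DateRange sidesteps the issue.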

Steps to Reproduce (for bugs)

This code will fail and raise OSError on windows.

import dbldatagen as dg

testDataSpec = (
    dg.DataGenerator(spark, name="test_data_set1", rows=1000, partitions=4)
    .withColumn(
        "purchase_date",
        "date",
        data_range=dg.DateRange("1910-10-01 00:00:00", "1950-10-06 11:55:00", "days=3"),
        random=True,
    )
)

Context

Your Environment

  • dbldatagen version used:
  • Databricks Runtime version:
  • Cloud environment used:
@ronanstokes-db (Contributor) commented Jun 17, 2024

Can you provide more details of the workaround? If there's a valid workaround, we will document it. However, since the intended runtime environment is the Databricks cloud environment, and the library is tested only in cloud environments and on local Linux or similar environments, we cannot validate it ourselves.

While we don't block it running on other environments, the intent is to support it running on a Databricks cloud environment or developing locally in preparation for use on a Databricks cloud environment.

ronanstokes-db self-assigned this Jun 27, 2024
ronanstokes-db added the wont-fix, wontfix, and potential-workaround labels Jun 27, 2024
@ronanstokes-db (Contributor) commented

The following example shows the use of datetime instances to define the range:

import dbldatagen as dg
from datetime import datetime, timezone

startingTime = datetime.fromisoformat("1910-10-01T00:00:00").replace(tzinfo=timezone.utc)
endingTime = datetime.fromisoformat("1950-10-06T11:55:00").replace(tzinfo=timezone.utc)

testDataSpec = (
    dg.DataGenerator(spark, name="test_data_set1", rows=1000, partitions=4)
    .withColumn(
        "purchase_date",
        "date",
        data_range=dg.DateRange(startingTime, endingTime, "days=3"),
        random=True,
    )
)

display(testDataSpec.build())

@bdemirtas (Author) commented Aug 25, 2024

Sorry for the late answer. Yes, that's the workaround I use; the following also works:

from datetime import datetime, timezone

now = datetime.now()
utc_now = now.astimezone(timezone.utc)
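One caveat worth noting (my note, not from the thread): `astimezone(...)` treats a naive datetime as local time and then converts, while `replace(tzinfo=...)` relabels the same wall-clock values as UTC. The two therefore denote different instants on any machine whose local timezone is not UTC.

```python
from datetime import datetime, timezone

dt = datetime.fromisoformat("2024-04-10T12:00:00")  # naive

# replace: keep the wall-clock fields, just label them as UTC
labeled_utc = dt.replace(tzinfo=timezone.utc)

# astimezone: assume dt is in the machine's local zone, then convert;
# the instant shifts by the local UTC offset
converted_utc = dt.astimezone(timezone.utc)

# Both results are aware and in UTC, but they agree only when
# the machine's local timezone is itself UTC
print(labeled_utc.isoformat())
print(converted_utc.isoformat())
```

So for reproducing a fixed historical range (as in the DateRange example above), `replace(tzinfo=timezone.utc)` is the predictable choice.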
