Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pydeephaven-ticking arrow type support inconsistent with server #6044

Open
devinrsmith opened this issue Sep 10, 2024 · 10 comments
Open

pydeephaven-ticking arrow type support inconsistent with server #6044

devinrsmith opened this issue Sep 10, 2024 · 10 comments
Assignees
Labels
bug Something isn't working core Core development tasks python-client user-reported
Milestone

Comments

@devinrsmith
Copy link
Member

There is a mismatch between server side arrow type support and pydeephaven-ticking arrow type support, as evidenced by this error:

Traceback (most recent call last):
  File "...", line 946, in __aiter__
    listen_handle.start()
  File "/deephaven/.venv/lib64/python3.9/site-packages/pydeephaven_ticking/table_listener.py", line 209, in start
    self._bp = dhc.BarrageProcessor.create(self._table.schema)
  File "src/pydeephaven_ticking/_core.pyx", line 486, in pydeephaven_ticking._core.BarrageProcessor.create
  File "src/pydeephaven_ticking/_core.pyx", line 459, in pydeephaven_ticking._core._pyarrow_schema_to_deephaven_schema
  File "src/pydeephaven_ticking/_core.pyx", line 567, in pydeephaven_ticking._core._pa_type_to_dh_type
RuntimeError: Can't convert pyarrow type date64[ms] to Deephaven type

On the server, after converting the table the arrow schema includes:

my_date: date64[ms]
  -- field metadata --
  deephaven:isSortable: 'true'
  deephaven:isRowStyle: 'false'
  deephaven:isPartitioning: 'false'
  deephaven:type: 'java.time.LocalDate'
  deephaven:isNumberFormat: 'false'
  deephaven:isStyle: 'false'
  deephaven:isDateFormat: 'false'

There may be additional types besides date64 that need to be checked.

Relevant code includes

py/server/deephaven/arrow.py:

_ARROW_DH_DATA_TYPE_MAPPING = {
...
    pa.date64(): 'java.time.LocalDate',
...
}

py/client-ticking/src/pydeephaven_ticking/_core.pyx:

cdef _equivalentTypes = [
    _EquivalentTypes.create(ElementTypeId.kChar, pa.uint16()),
    _EquivalentTypes.create(ElementTypeId.kInt8, pa.int8()),
    _EquivalentTypes.create(ElementTypeId.kInt16, pa.int16()),
    _EquivalentTypes.create(ElementTypeId.kInt32, pa.int32()),
    _EquivalentTypes.create(ElementTypeId.kInt64, pa.int64()),
    _EquivalentTypes.create(ElementTypeId.kFloat, pa.float32()),
    _EquivalentTypes.create(ElementTypeId.kDouble, pa.float64()),
    _EquivalentTypes.create(ElementTypeId.kBool, pa.bool_()),
    _EquivalentTypes.create(ElementTypeId.kString, pa.string()),
    _EquivalentTypes.create(ElementTypeId.kTimestamp, pa.timestamp("ns", "UTC"))
]

We should also investigate pydeephaven (non-ticking) completeness as well.

@devinrsmith devinrsmith added bug Something isn't working triage labels Sep 10, 2024
@devinrsmith devinrsmith added this to the Triage milestone Sep 10, 2024
@devinrsmith
Copy link
Member Author

devinrsmith commented Sep 10, 2024

A temporary workaround might be to translate the LocalDate into an Instant, either on the server, or from the client before subscribing.

my_table = (
    my_table
    .update_view(
        [
            "my_timestamp=toInstant(my_date, java.time.LocalTime.MIDNIGHT, 'America/New_York')"
        ]
    )
    .drop_columns(["my_date"])
)

@rcaudy rcaudy added core Core development tasks python-client and removed triage labels Sep 10, 2024
@rcaudy rcaudy modified the milestones: Triage, 0.37.0 Sep 10, 2024
@jmao-denver
Copy link
Contributor

I noticed at least we are missing pa.date64 in the supported arrow type list in the static Py client.

@niloc132
Copy link
Member

Would be a good follow-up for #3455 and related tickets, as we start expanding support for arrow types.

@jmao-denver
Copy link
Contributor

I am wondering if the same issue exists in the Go/R clients. @alexpeters1208 ?

@jmao-denver
Copy link
Contributor

@alexpeters1208 and I tested the R client and it suffers almost the same problem and what's different is that the error is reported by the C++ client which the R client relies on. Therefore it seems that changes are needed for both the C++ client and the Python ticking client (cython) to expand the list of supported Arrow types. Once that's achieved, the R client should just work, that would leave us only the Go client to deal with.

@alexpeters1208
Copy link
Contributor

I used the following table from @jmao-denver as the test case for the R client:

import pyarrow as pa
from datetime import datetime
from deephaven import arrow as dharrow

pa_types = [
    pa.time64('ns'),
    pa.date64(),
]
pa_data = [
    pa.array([1_000_001, 1_000_002]),
    pa.array([datetime(2022, 12, 7), datetime(2022, 12, 30)]),
]

fields = [pa.field(f"f{i}", ty) for i, ty in enumerate(pa_types)]
schema = pa.schema(fields)
pa_table = pa.table(pa_data, schema=schema)
dh_table = dharrow.to_table(pa_table)

Using the R client, I get a TableHandle for this table, which succeeds. Some of that TableHandle's methods fail with the following error:

! Can't find Deephaven mapping for arrow data type time64[ns]: static std::optional<deephaven::dhcore::ElementTypeId::Enum> deephaven::client::utility::ArrowUtil::GetElementTypeId(const arrow::DataType&, bool)@/home/alexpeters/Documents/R-Dev/src/deephaven/deephaven-core/cpp-client/deephaven/dhclient/src/utility/arrow_util.cc:110

As it turns out, all of the failing methods have something in common: they all call the TableHandle::Schema() method in the Deephaven C++ client. It seems that schema is resolved lazily, and attemting to resolve it tries to do the type conversion, which fails.

Despite this, I can still successfully convert this table from server-side Deephaven to R arrow, and it has the correct types. This is because the particular implementation of this operation does not touch the Deephaven C++ client at all (at least not in the C++ binding I wrote), but rather exclusively relies on Arrow's C++ client. Relevant code is as_arrow_table -> as_record_batch_reader -> GetArrowArrayStreamPtr.

@alexpeters1208
Copy link
Contributor

In case anyone is interested, here's a full reproducer of this behavior using the R client.

library(rdeephaven)
library(arrow)

client <- Client$new("localhost:10000")

script <- "
import pyarrow as pa
from datetime import datetime
from deephaven import arrow as dharrow

pa_types = [
  pa.time64('ns'),
  pa.date64()
]
pa_data = [
  pa.array([1_000_001, 1_000_002]),
  pa.array([datetime(2022, 12, 7), datetime(2022, 12, 30)])
]

fields = [pa.field(f'f{i}', ty) for i, ty in enumerate(pa_types)]
schema = pa.schema(fields)
pa_table = pa.table(pa_data, schema=schema)
dh_table = dharrow.to_table(pa_table)
"

# create table on the server with bad types
client$run_script(script=script)

# get table handle
th <- client$open_table("dh_table")

# most methods do not eventually invoke TableHandle::schema()
th$nrow()
new_th <- th.update("X = ii")
new_at <- as_arrow_table(new_th)

# others do call TableHandle::Schema()
th$dim()
th$ncol()
arrow_table(new_th)

@kosak
Copy link
Contributor

kosak commented Sep 15, 2024

Hi,

Thanks @alexpeters1208 for the repros. In my "opinion" supporting three time types is a bit much. I can see why someone would maybe like LocalDate, but supporting the timezoneless LocalTime seems like taking a trip straight to Sad Town. That said, implementing this in C++ is fairly straightforward. I would just have to do the mapping, as above, and then either create some new data types or find some C# data types that are close enough.

In any case, having all of our clients support the same set of types is the right thing to do.

@niloc132
Copy link
Member

I can see why someone would maybe like LocalDate, but supporting the timezoneless LocalTime seems like taking a trip straight to Sad Town.

Both LocalDate and LocalTime are inherently timezoneless, and while Instant (and maybe ZonedDateTime) types have their place in terms of "this thing happened (or will/may happen) and here's when", but the fleshy meatbags who use computers insist on all kinds of more specific timezoneless formats for all kinds of stuff...

"Set my alarm for 7am" okay, when you travel 'at hour west' (whatever that means), do you get up at 7am in the new timezone? And if you get there a week before a daylight saving time adjustment, why should I have to adjust what time I wake up, the rest of the world should fit my preferred starting time.

"My birthday is April 26" sure, but is that from 00:00 to 23:59 in CEST, or australia DST? Or maybe we only celebrate at the Instant you were born, plus N years.

I don't disagree that having more "time" types seems worse than fewer types, but having "not enough" types is an even bigger pitfall...

@kosak
Copy link
Contributor

kosak commented Sep 15, 2024

I agree with that set of observations but would add (hold on while I consult my Euler diagram) the intersection between the kinds of loosey-goosey dates humans use (birthdays etc) and the kinds of data that properly belong in a high-performance streaming join database for financial professionals might be small or empty.

jmao-denver added a commit that referenced this issue Sep 25, 2024
…ata types in the Python client (#6048)

related to #6044

The DH server is now able to convert between Arrow date64/time64 and
Java LocalDate/LocalTime. The Python client needs to include the mapping
of these two types when building Arrow schema for uploading Arrow data
to the server.
kosak added a commit that referenced this issue Sep 26, 2024
This PR adds support for `LocalDate` and `LocalTime` to
pydeephaven/ticking.

Note this PR depends on
#6105, so that one
should be merged first.

It also depends on
#6137, which also needs
to be merged first.

I also add a comment to pydeephaven/README.md to remind people to make a
venv.

Also note that this is a fix for
#6044
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working core Core development tasks python-client user-reported
Projects
None yet
Development

No branches or pull requests

7 participants