Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Utilities to assist in retrieving data from tables #2857

Open
wants to merge 45 commits into
base: main
Choose a base branch
from

Conversation

rbasralian
Copy link
Contributor

Adds utilities to make it easier to pull data from a table.

There is a KeyedRecordAdapter that leverages the ToMapListener to allow pulling data by key (e.g. USym):

final KeyedRecordAdapter<String, MyTradeHolder> keyedRecordAdapter = KeyedRecordAdapter.makeKeyedRecordAdapter(
                tradesTable,
                myTradeHolderRecordAdapterDescriptor,
                "USym",
                String.class
        );
final MyTradeHolder lastAaplTrade = keyedRecordAdapter.getRecord("AAPL");
final Map<String, MyTradeHolder> lastTradesBySymbol = keyedRecordAdapter.getRecords("AAPL", "CAT", "SPY");

And a TableToRecordListener that lets you listen to updates as objects instead of just row keys:

final Consumer<MyTradeHolder> tradeConsumer = trade -> System.out.println(trade.toString());
TableToRecordListener.create(tradesTable, myTradeHolderRecordAdapterDescriptor, tradeConsumer);

The mapping of columns to fields in whatever object is handled by a RecordAdapterDescriptor:

RecordAdapterDescriptor<MyTradeHolder> myTradeHolderRecordAdapterDescriptor = RecordAdapterDescriptorBuilder
            .create(MyTradeHolder::new)
            .addColumnAdapter("Sym", RecordUpdater.getStringUpdater(MyTradeHolder::setSym))
            .addColumnAdapter("Price", RecordUpdater.getDoubleUpdater(MyTradeHolder::setPrice))
            .addColumnAdapter("Size", RecordUpdater.getIntUpdater(MyTradeHolder::setSize))
            .addColumnAdapter("Timestamp", RecordUpdater.getReferenceTypeUpdater(DBDateTime.class, MyTradeHolder::setTimestamp))
            .build();

The RecordAdapterDescriptor can be used to create a SingleRowRecordAdapter or a MultiRowRecordAdapter. A single-row adapter creates an object (MyTradeHolder or JsonNode or whatever) and reads data from the columns directly into that object. A multi-row adapter has more steps but is intended to be efficient and chunk-oriented. It will:

  1. Create arrays to hold data for all of the columns.
  2. Read the data for all columns into those arrays (by chunk).
  3. Create an array of records (e.g. MyTradeHolder[], JsonNode[], whatever) and fill it with new empty records.
  4. Populate all of the records with the data from the arrays, one column at a time.

A record adapter instance can be generated based on the descriptor. For example, com.illumon.iris.db.util.dataadapter.rec.json.JsonRecordAdapterUtil#createJsonRecordAdapterDescriptor(t, "Col1", "Col2", "Col3").createMultiRowRecordAdapter(t) would figure out the types of "Col1"/"Col2"/"Col3" and generate a class that populates ObjectNodes with those fields.

There is also a draft Python version that lets you create data structures by passing data to either a function or straight to some type's constructor:

class StockTrade:
    sym: str
    price: float
    size: int
    exch: str

    def __init__(self, sym: str, price: float, size: int, exch: str):
        self.sym = sym
        self.price = price
        self.size = size
        self.exch = exch

keyed_record_adapter: KeyedRecordAdapter[str, StockTrade] = KeyedRecordAdapter(
    source,
    StockTrade,
    'Sym',
    ["Price", "Size", "Exch"]
)

records = keyed_record_adapter.get_records(["AAPL", None])

aapl_trade = records['AAPL']
self.assertEquals('AAPL', aapl_trade.sym)
self.assertEquals(1.1, aapl_trade.price)
self.assertEquals(10_000_000_000, aapl_trade.size)
self.assertEquals('ARCA', aapl_trade.exch)

Random notes:

  • This is a generic version of a utility we previously used for latency monitoring (that's where the "create arrays to hold data, consistently read chunks from columns and copy the chunks to the arrays, then copy the data from the arrays back into row-oriented data structures" part comes from)
  • The MultiRowRecordAdapter is responsible for the real work of pulling data out of tables; both KeyedRecordAdapter and TableToRecordListener use that. (KeyedRecordAdapter also uses SingleRowRecordAdapter when fetching single keys.)
  • The Python version (keyed_record_adapter.py/PyKeyedRecordAdapter.java) only supports strings as keys and has no listening component.
  • (The Java unit tests pass. The Python unit tests fail on primitive keys.)

@margaretkennedy
Copy link
Contributor

Copy link
Member

@chipkent chipkent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, I found this PR very tough to review in a thorough way.

  1. There is a ton of boilerplate that obscures the signal I'm trying to see.
  2. The overall high-level design for the Java code is not apparent, which makes it hard to follow.
  3. I have a feeling that the java could be simplified in some significant way, but I'm at a loss for saying what this would look like, since I'm still not fully grasping the high-level view.

@@ -0,0 +1,33 @@
plugins {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I THINK that there has been a drift to sub project folders being lower case.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

QueryUtil is a vague name that is just going to lead this to become a dumping ground. It needs a more purposeful name.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe QueryUtil should be dropped and just keep dataadaper?

Also, python has the experimental package. Should Java have the same thing? @rcaudy


import java.util.Objects;

public class ChunkRetrievalUtil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do all of the classes need to be public? Changing them would reducer our API surface area in a good way.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

turned down this & a few others

@@ -0,0 +1,101 @@
package io.deephaven.queryutil.dataadapter.datafetch.bulk.gen;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is just example code and not real code, should it go in test or somewhere else so that it doesn't clutter the codebase? @rcaudy

@@ -0,0 +1,177 @@
package io.deephaven.queryutil.dataadapter.datafetch.bulk.gen;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is generated, it should have a header indicating how it was generated, how to regenerate it, not to edit it, etc. Same for all related files.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's not generated; it handles generation

Comment on lines +176 to +180
# source = Table(_JTstTools.testRefreshingTable(
# _JTstTools.i(2, 4).copy().toTracking(),
# _JTableTools.intCol("Sym", 'AAPL', 'AAPL', None),
# _JTableTools.doubleCol("Price", 1.1, 1.2, NULL_DOUBLE),
# _JTableTools.longCol("Size", 10_000_000_000, 1_000, NULL_LONG),
# _JTableTools.col("Exch", "ARCA", 'NASDAQ', None),
# ))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kill?

Comment on lines +204 to +203
keyed_record_adapter: KeyedRecordAdapter[
List[str], StockTradeAlternateConstructor] = make_record_adapter_with_constructor(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unneeded type hint?

Comment on lines +236 to +240
# source = Table(_JTstTools.testRefreshingTable(
# _JTstTools.i(2, 4).copy().toTracking(),
# _JTableTools.intCol("Sym", 'AAPL', 'AAPL', None),
# _JTableTools.doubleCol("Price", 1.1, 1.2, NULL_DOUBLE),
# _JTableTools.longCol("Size", 10_000_000_000, 1_000, NULL_LONG),
# _JTableTools.col("Exch", "ARCA", 'NASDAQ', None),
# ))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kill?

def create_stock_trade(sym: str, exch: str, price: float, size: int):
return StockTrade(sym, price, size, exch)

keyed_record_adapter: KeyedRecordAdapter[List[str], StockTrade] = make_record_adapter(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unneeded type hint?



if __name__ == '__main__':
unittest.main()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In PartitionedTables, @rcaudy has resolved the 1- vs many-key values problem. This whole PR should strive for the same for a user experience.

@rbasralian rbasralian self-assigned this Jan 5, 2023
@rbasralian rbasralian added the ReleaseNotesNeeded Release notes are needed label Jan 11, 2023
@rbasralian rbasralian marked this pull request as ready for review June 24, 2024 16:35
…cord_adapter

# Conflicts:
#	Util/src/main/java/io/deephaven/util/type/TypeUtils.java
#	engine/table/src/test/java/io/deephaven/engine/util/TestTypeUtils.java
@rbasralian rbasralian changed the title WIP: DH-12293 Utilities to assist in retrieving data from tables feat: Utilities to assist in retrieving data from tables Aug 2, 2024
@chipkent
Copy link
Member

chipkent commented Nov 1, 2024

I have not looked at this PR recently, so I don't remember the contents. DHC has added functionality to help users extract data. I'm noting it here in case it makes this PR no longer needed.

#5595
https://deephaven.io/core/docs/how-to-guides/iterate-table-data/

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants