Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Multijoin feature to community #4170

Merged
merged 19 commits into from
Aug 4, 2023
Merged

Conversation

lbooker42
Copy link
Contributor

Port the enterprise multijoin feature to allow more efficient computation of multiple table joins.

The multiJoin operation collects the set of distinct keys from the input tables, and then joins one row from each of the input tables onto the result. Input tables need not have a match row for each key, but they may not have multiple matching rows for a given key. The operation can be thought of as a merge of the key columns, followed by a selectDistinct and then a series of iterative naturalJoin operations as follows:

private Table doIterativeMultiJoin(String [] keyColumns, List<? extends Table> inputTables) {
    final List<Table> keyTables = inputTables.stream().map(t -> t.view(keyColumns)).collect(Collectors.toList());
    final Table base = TableTools.merge(keyTables).selectDistinct(keyColumns);

    Table result = base;
    for (int ii = 0; ii < inputTables.size(); ++ii) {
        result = result.naturalJoin(inputTables.get(ii), Arrays.asList(keyColumns));
    }

    return result;
}

@lbooker42 lbooker42 self-assigned this Jul 11, 2023
@lbooker42 lbooker42 added feature request New feature or request java DocumentationNeeded ReleaseNotesNeeded Release notes are needed labels Jul 11, 2023
@lbooker42 lbooker42 added this to the July 2023 milestone Jul 11, 2023
Copy link
Member

@rcaudy rcaudy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review of just the changes since Tuesday.

Copy link
Member

@rcaudy rcaudy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WIP on review

@rcaudy
Copy link
Member

rcaudy commented Jul 28, 2023

Review should continue with MultiJoinMergedListener and the actual state managers.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not for this PR, but maybe this should look like NanosBasedTimeArraySource, and be a wrapper around a ByteArraySource.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Created #4272 to track this request.

Copy link
Member

@rcaudy rcaudy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WIP. Only incremental state managers left.

Copy link
Member

@rcaudy rcaudy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed current state of incremental state managers.

return QueryPerformanceRecorder.withNugget("multiJoin",
() -> new MultiJoinTableImpl(joinControl, multiJoinInputs));

final Table[] tables = Arrays.stream(multiJoinInputs).map(MultiJoinInput::inputTable).toArray(Table[]::new);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could probably memoize this, using the memoization cache on the first table.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can hold for another PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Created issue #4273 to track this.

mainKeySources[ii] = InMemoryColumnSource.getImmutableMemoryColumnSource(tableSize,
alternateKeySources[ii].getType(), alternateKeySources[ii].getComponentType());
mainKeySources[ii].ensureCapacity(tableSize);
}
alternateTableSize = oldTableSize;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this assignment effect anything? I think it's immaterial, since we'll ignore any hashed values that are greater than the rehashPointer (which is zero). Some part of me wants to assign it to oldTableSize in newAlternate, and back to 1 in clearAlternate, just for cleanliness.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oldTableSize is local to doRehash() and the newAlternate() signature is determined by the core hash generation code. Adding the old size as a parameter is a significant change that will affect all existing hashing routines.

The alternative is to push oldTableSize into a member variable so it can be accessed by newAlternate(). Do you want me to do this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think what you have done is good.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although I don't love that you're clearing it in clearAlternate, and assigning it after newAlternate, I think this is good enough for now.

@lbooker42 lbooker42 merged commit f15c8ff into deephaven:main Aug 4, 2023
10 checks passed
@github-actions github-actions bot locked and limited conversation to collaborators Aug 4, 2023
@deephaven-internal
Copy link
Contributor

Labels indicate documentation is required. Issues for documentation have been opened:

How-to: https://github.com/deephaven/deephaven.io/issues/2995
Reference: https://github.com/deephaven/deephaven.io/issues/2996

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
DocumentationNeeded feature request New feature or request java ReleaseNotesNeeded Release notes are needed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants