Provide framework for generic lazily evaluated operation results #1350

Draft: wants to merge 104 commits into base `master`

@RobinTF (Collaborator) commented May 18, 2024

Still WIP. Currently missing:

  • Discussion about remaining TODOs
  • Lots of unit tests
  • Most likely some functions need to be broken up into smaller pieces once everything else is found to be working "correctly".
  • Documentation of all newly introduced functions once they are somewhat "final"
  • Cold Fusion & World domination?

src/engine/Operation.cpp (outdated)
result._resultPointer->resultTable()->idTable().numColumns();
LOG(DEBUG) << "Computed result of size " << resultNumRows << " x "
<< resultNumCols << std::endl;
}
Collaborator Author

Does this debug message provide any real benefit to make it worth somehow incorporating it into lazily evaluated operations?

codecov bot commented May 18, 2024

Codecov Report

Attention: Patch coverage is 78.61446% with 142 lines in your changes missing coverage. Please review.

Project coverage is 89.35%. Comparing base (3fad814) to head (603a48b).

| Files | Patch % | Lines |
|---|---|---|
| src/engine/Result.cpp | 71.09% | 67 Missing and 7 partials ⚠️ |
| src/engine/Operation.cpp | 79.62% | 18 Missing and 4 partials ⚠️ |
| src/util/CacheableGenerator.h | 87.40% | 0 Missing and 17 partials ⚠️ |
| src/engine/IndexScan.cpp | 5.88% | 15 Missing and 1 partial ⚠️ |
| src/util/Cache.h | 84.84% | 0 Missing and 5 partials ⚠️ |
| src/engine/ExportQueryExecutionTrees.cpp | 93.33% | 0 Missing and 3 partials ⚠️ |
| src/util/ConcurrentCache.h | 70.00% | 0 Missing and 3 partials ⚠️ |
| src/engine/QueryExecutionTree.cpp | 87.50% | 0 Missing and 1 partial ⚠️ |
| src/util/IteratorWrapper.h | 88.88% | 0 Missing and 1 partial ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1350      +/-   ##
==========================================
- Coverage   89.63%   89.35%   -0.29%     
==========================================
  Files         343      345       +2     
  Lines       29951    30472     +521     
  Branches     3315     3393      +78     
==========================================
+ Hits        26847    27227     +380     
- Misses       1957     2063     +106     
- Partials     1147     1182      +35     


joka921 pushed a commit that referenced this pull request May 23, 2024
This PR contains all the changes from the infrastructure for lazy operation evaluation (#1350) that are simple and repetitive, but touch many files. In particular:

* Rename the `ResultTable` class to `Result` (a TODO suggested by @hannahbast some time ago).
* Add a new parameter `bool requestLaziness` to `Operation::computeResult`. This parameter is currently unused.
src/engine/Operation.cpp (outdated)
updateRuntimeInformationOnSuccess(
*resultAndCacheStatus._resultPointer->resultTable(),
// TODO<RobinTF> find a better representation for "unknown" than 0.
Collaborator Author

This concept doesn't 100% make sense for lazy evaluation. We could of course provide a lower bound for qlever-ui to display, for example by updating it from the `onSizeChanged` listener.

Member

For now this is not the primary concern:)

@@ -38,6 +37,8 @@ class WaitedForResultWhichThenFailedException : public std::exception {
enum struct CacheStatus {
cachedNotPinned,
cachedPinned,
// TODO<RobinTF> Rename to notCached, the name is just confusing. Can
// potentially be merged with notInCacheAndNotComputed.
Collaborator Author

Task for a follow-up PR

@joka921 (Member) left a comment

A first round of comments.
I haven't looked at the whole caching stuff yet.

src/util/CacheableGenerator.h (outdated)
return;
}
if (masterState_ == MasterIteratorState::MASTER_STARTED && !isMaster) {
conditionVariable_.wait(lock, [this, index]() {
Member

You should implement a time limit here, especially since this synchronously blocks a thread.
We need to handle the case of slow masters. I still think the easiest way to do this is to let the master also receive an `IteratorExpired`, because there is no conceptual difference between a master and a non-master.

Collaborator Author

I agree on the time limit, but as I told you in person, the master/slave system does have some benefits, so applying a time limit only to slaves would be an option. This might be a task for a follow-up PR.

Member

I still think that the non-master approach, with everybody having a timeout, is better. We can discuss this further, but the current state (blocking forever until the slow master times out) is a blocker for this PR.
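
To make the timeout suggestion concrete, here is a minimal sketch of a bounded wait on a condition variable. It is not the code from this PR: `SharedState`, `waitForBlock`, and `publishBlock` are hypothetical names, and `IteratorExpired` is only modeled after the name used in the review. The sketch applies the same timeout to every waiter, which corresponds to the non-master variant discussed above.

```cpp
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <stdexcept>

// Hypothetical exception type, named after the `IteratorExpired` mentioned
// in the review. The real type in the PR may look different.
struct IteratorExpired : std::runtime_error {
  IteratorExpired() : std::runtime_error{"Timed out waiting for a block"} {}
};

// Hypothetical shared state: `availableBlocks` counts how many blocks the
// producing iterator has published so far.
struct SharedState {
  std::mutex mutex;
  std::condition_variable conditionVariable;
  std::size_t availableBlocks = 0;
};

// Block until the block at `index` is available or `timeout` has elapsed.
// Throws `IteratorExpired` on timeout instead of blocking forever.
inline void waitForBlock(SharedState& state, std::size_t index,
                         std::chrono::milliseconds timeout) {
  std::unique_lock<std::mutex> lock{state.mutex};
  // `wait_for` with a predicate returns the final value of the predicate,
  // so `false` means the timeout expired before the block arrived.
  bool ready = state.conditionVariable.wait_for(
      lock, timeout,
      [&state, index] { return index < state.availableBlocks; });
  if (!ready) {
    throw IteratorExpired{};
  }
}

// Called by the producer after it has stored a new block.
inline void publishBlock(SharedState& state) {
  {
    std::lock_guard<std::mutex> lock{state.mutex};
    ++state.availableBlocks;
  }
  state.conditionVariable.notify_all();
}
```

A caller that catches `IteratorExpired` could then fall back to computing the result itself. Whether that is the right recovery strategy is part of the open discussion in this thread.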

sortedBy_{std::move(sortedBy)},
localVocab_{std::move(localVocab)} {}

bool isDataEvaluated() const noexcept {
Member

Isn't this rather `isMaterialized`, as opposed to `isLazy`?

Collaborator Author

The name can be changed to anything you want once everything else is settled.
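
To illustrate the materialized-versus-lazy distinction behind this naming question, here is a rough sketch of a result type that holds either a fully computed table or a source that produces rows on demand. Everything here is an assumption made for illustration: `SketchResult`, `Row`, `Table`, and `RowSource` are hypothetical and do not mirror the actual `Result` class in this PR.

```cpp
#include <functional>
#include <optional>
#include <utility>
#include <variant>
#include <vector>

// Hypothetical simplified types, the real code uses `IdTable` and friends.
using Row = std::vector<int>;
using Table = std::vector<Row>;
// Hypothetical lazy source: returns the next row, or `std::nullopt` once
// the source is exhausted.
using RowSource = std::function<std::optional<Row>()>;

class SketchResult {
  // Either the complete table or a source that still has to be consumed.
  std::variant<Table, RowSource> data_;

 public:
  explicit SketchResult(Table table) : data_{std::move(table)} {}
  explicit SketchResult(RowSource source) : data_{std::move(source)} {}

  // True if the complete table is already stored in memory.
  bool isMaterialized() const noexcept {
    return std::holds_alternative<Table>(data_);
  }

  // Consume the lazy source (if any), store the rows, and return the table.
  const Table& materialize() {
    if (auto* source = std::get_if<RowSource>(&data_)) {
      Table table;
      while (auto row = (*source)()) {
        table.push_back(std::move(*row));
      }
      // `source` must not be used after this assignment.
      data_ = std::move(table);
    }
    return std::get<Table>(data_);
  }
};
```

With this shape, `isMaterialized()` describes the stored state directly, whereas a name like `isLazy` would describe how the result was requested.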

@joka921 (Member) left a comment

Some small comments.

Comment on lines +243 to +244
if (_maxSizeSingleEntry < newSize ||
_maxSize - std::min(_totalSizePinned, _maxSize) < newSize) {
Member

This I don't understand/believe:

Why not

```
sizeDiff = newSize - oldSize;  // check for underflow...
if (totalSizePinnedPlusNonPinnedEverything + sizeDiff > _maxSize) erase(key);
```

(plus the `_maxSizeSingleEntry` check of course, that one is undisputed)?

Collaborator Author

Because `_maxSize` is of type `MemorySize`, which throws an exception instead of becoming a negative size value. I remember fiddling around with this a lot until it worked, and this is what my changes converged to. I must admit that I no longer know why I settled on these exact changes.
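
The clamping in the condition under review can be sketched as follows. This assumes a size type whose subtraction throws on underflow, as described above for `MemorySize`. `Bytes` and `exceedsUnpinnedBudget` are hypothetical stand-ins and not the real QLever API, and the sketch covers only the second half of the condition (the `_maxSizeSingleEntry` check is omitted).

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>

// Hypothetical stand-in for a size type that throws on underflow, mimicking
// the behavior described for `MemorySize` in the comment above.
class Bytes {
  std::size_t value_;

 public:
  constexpr explicit Bytes(std::size_t value) : value_{value} {}
  constexpr std::size_t value() const { return value_; }

  friend Bytes operator-(Bytes a, Bytes b) {
    if (a.value_ < b.value_) {
      throw std::underflow_error{"Size would become negative"};
    }
    return Bytes{a.value_ - b.value_};
  }
  friend constexpr bool operator<(Bytes a, Bytes b) {
    return a.value_ < b.value_;
  }
};

// True if an entry of `newSize` does not fit into the space that is not
// already taken by pinned entries.
inline bool exceedsUnpinnedBudget(Bytes maxSize, Bytes totalSizePinned,
                                  Bytes newSize) {
  // Clamp first, so the subtraction below can never underflow and throw,
  // even if the pinned entries alone exceed `maxSize`.
  Bytes pinned = std::min(totalSizePinned, maxSize);
  return (maxSize - pinned) < newSize;
}
```

For example, with a maximum of 100 and 150 pinned, the clamp leaves a budget of 0, so any non-empty entry is rejected without an exception being thrown.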

joka921 pushed a commit that referenced this pull request Aug 2, 2024
…tead of `Result` (#1433)

This is a small preparation for lazy operations (see #1350)