Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rtree #1063

Open
wants to merge 28 commits into
base: master
Choose a base branch
from
Open

Rtree #1063

wants to merge 28 commits into from

Conversation

noahnock
Copy link

Bachelorprojekt - Noah Nock

Copy link
Member

@joka921 joka921 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have left some initial meta comments for the Rtree.h and .cpp, so that you can start making this code more readable.

CMakeLists.txt Outdated Show resolved Hide resolved
CMakeLists.txt Outdated Show resolved Hide resolved
src/util/Rtree.h Outdated Show resolved Hide resolved
src/util/Rtree.h Outdated Show resolved Hide resolved
src/util/Rtree.h Outdated Show resolved Hide resolved
src/util/Rtree.h Outdated Show resolved Hide resolved
src/util/Rtree.cpp Outdated Show resolved Hide resolved
src/util/Rtree.cpp Outdated Show resolved Hide resolved
src/util/Rtree.cpp Outdated Show resolved Hide resolved
src/util/Rtree.cpp Outdated Show resolved Hide resolved
Copy link
Member

@joka921 joka921 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some additional suggestions for parts of the code.
And of course you can apply similar techniques to get rid of duplications on many more places.
I am now fairly sure, that this whole file can be brought down to at most 328 lines of code to have a precise target:)

src/util/RtreeBuild.cpp Outdated Show resolved Hide resolved
src/util/RtreeBuild.cpp Outdated Show resolved Hide resolved
src/util/RtreeBuild.cpp Outdated Show resolved Hide resolved
src/util/RtreeBuild.cpp Outdated Show resolved Hide resolved
src/util/RtreeBuild.cpp Outdated Show resolved Hide resolved
src/util/RtreeBuild.cpp Outdated Show resolved Hide resolved
src/util/RtreeBuild.cpp Outdated Show resolved Hide resolved
src/util/RtreeBuild.cpp Outdated Show resolved Hide resolved
src/util/RtreeBuild.cpp Outdated Show resolved Hide resolved
src/util/RtreeBuild.cpp Outdated Show resolved Hide resolved
Copy link
Member

@joka921 joka921 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some further comments for cleaning this up.

src/util/Rtree.h Outdated Show resolved Hide resolved
src/util/Rtree.h Outdated Show resolved Hide resolved
src/util/Rtree.h Outdated Show resolved Hide resolved
src/util/Rtree.h Outdated Show resolved Hide resolved
src/util/Rtree.h Outdated Show resolved Hide resolved
src/util/RtreeBuild.cpp Outdated Show resolved Hide resolved
src/util/RtreeBuild.cpp Outdated Show resolved Hide resolved
src/util/RtreeBuild.cpp Outdated Show resolved Hide resolved
src/util/RtreeBuild.cpp Outdated Show resolved Hide resolved
src/util/RtreeBuild.cpp Outdated Show resolved Hide resolved
Copy link
Member

@joka921 joka921 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A rough review of the files directly related to the Rtree.
There is still a lot of copy-paste-boilerplate-duplicate code that makes it hard to read, please take care of it.

#ifndef QLEVER_NODE_H
#define QLEVER_NODE_H

#include "./Rtree.h"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Move all the Rtree-related stuff into a subdirectory util/rtree

  2. Make the includes absolute (for all files:
    include "util/rtree/Rtree.h"

Comment on lines +9 to +13
ConstructionNode::ConstructionNode(uint64_t id, OrderedBoxes orderedBoxes)
: RtreeNode{id} {
this->orderedBoxes_ = orderedBoxes;
// calculate the boundingBoxes
this->boundingBox_ = orderedBoxes.GetBoundingBox();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No this-> if there's no technical reason for it (you now use the trailing_) to disambiguate.

Suggested change
ConstructionNode::ConstructionNode(uint64_t id, OrderedBoxes orderedBoxes)
: RtreeNode{id} {
this->orderedBoxes_ = orderedBoxes;
// calculate the boundingBoxes
this->boundingBox_ = orderedBoxes.GetBoundingBox();
ConstructionNode::ConstructionNode(uint64_t id, OrderedBoxes orderedBoxes)
: RtreeNode{id, orderedBoxes.GetBoundingBox()} , orderedBoxes_{std::move(orderedBoxes)}{}

* Add all children of a certain node at once.
* This is used when a leaf node is reached.
*/
if (this->GetOrderedBoxes().WorkInRam()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Never use this-> (everywhere)

Comment on lines +21 to +34
if (this->GetOrderedBoxes().WorkInRam()) {
for (RTreeValueWithOrderIndex box :
this->GetOrderedBoxes().GetRectanglesInRam()) {
RtreeNode leafNode = RtreeNode(box.id, box.box);
this->AddChild(leafNode);
}
} else {
for (const RTreeValueWithOrderIndex& element :
FileReader(this->GetOrderedBoxes().GetRectanglesOnDisk())) {
RtreeNode leafNode = RtreeNode(element.id, element.box);
this->AddChild(leafNode);
}
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make the orderedBoxes_ a std::variant<blaInRam, blaOnDisk> and then use std::visit here (more typesafety, and less duplicated code).

Comment on lines +36 to +38
OrderedBoxes& ConstructionNode::GetOrderedBoxes() {
return this->orderedBoxes_;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can just be called orderedBoxes(). And consider making the non-const getters private wherever possibly (unless you need them for some reason, I don't yet undderstand enough).

Comment on lines +102 to +104
auto match : ctre::range<
R"( *([\-|\+]?[0-9]+(?:[.][0-9]+)?) +([\-|\+]?[0-9]+(?:[.][0-9]+)?))">(
wkt)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I somewhere have some ctre helpers that allow for concatenation of regexes, then you can get rid of the duplication. Also: does the wkt format allow for arbitrary whitespace, or does it have to be the space character?

Comment on lines +124 to +127
[[nodiscard]] double MinX() const { return box.min_corner().get<0>(); }
[[nodiscard]] double MaxX() const { return box.max_corner().get<0>(); }
[[nodiscard]] double MinY() const { return box.min_corner().get<1>(); }
[[nodiscard]] double MaxY() const { return box.max_corner().get<1>(); }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, you ave this functionality, reuse it (the first MinX function in this file.).

Comment on lines +328 to +367
auto sizeLeft =
(uint64_t)(std::ceil(((double)splitResult.bestIndex - 2.0) / 2.0) *
(double)S);
uint64_t sizeRight = this->size_ - sizeLeft;
uint64_t split0ByteSize =
sizeLeft * (4 * sizeof(double) + sizeof(uint64_t) + 2 * sizeof(uint64_t));
uint64_t split1ByteSize = sizeRight * (4 * sizeof(double) + sizeof(uint64_t) +
2 * sizeof(uint64_t));
bool split0InRam = split0ByteSize * 4 < maxBuildingRamUsage;
bool split1InRam = split1ByteSize * 4 < maxBuildingRamUsage;

if (!split0InRam) {
splitBuffers.rectsD0Split0.rectangles = filePath + ".0.dim0.tmp";
splitBuffers.rectsD1Split0.rectangles = filePath + ".0.dim1.tmp";
}

if (!split1InRam) {
splitBuffers.rectsD0Split1.rectangles = filePath + ".1.dim0.tmp";
splitBuffers.rectsD1Split1.rectangles = filePath + ".1.dim1.tmp";
}

std::pair<BasicGeometry::BoundingBox, BasicGeometry::BoundingBox>
boundingBoxes =
PerformSplit(splitResult, splitBuffers, M, S, maxBuildingRamUsage);

if (!split0InRam) {
split0.SetOrderedBoxesToDisk(rectsD0Split0, rectsD1Split0, sizeLeft,
boundingBoxes.first);
} else {
split0.SetOrderedBoxesToRam(rectsD0Split0, rectsD1Split0,
boundingBoxes.first);
}

if (!split1InRam) {
split1.SetOrderedBoxesToDisk(rectsD0Split1, rectsD1Split1, sizeRight,
boundingBoxes.second);
} else {
split1.SetOrderedBoxesToRam(rectsD0Split1, rectsD1Split1,
boundingBoxes.second);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A lot of code duplication for split0 and split1 factor out the current code using one or multiple lambdas, and then just call.

Comment on lines +431 to +455
if (splitResult.bestDim == 0) {
pushSmallBoundaries(splitBuffers.rectsD0Split0.rectanglesSmall,
splitBuffers.rectsD0Split1.rectanglesSmall);

// placeholder, since we need the min and max element of the split in the
otherDimension.smallSplit0 = &splitBuffers.rectsD1Split0.rectanglesSmall;
otherDimension.smallSplit1 = &splitBuffers.rectsD1Split1.rectanglesSmall;
// first two spots
otherDimension.smallSplit0->emplace_back();
otherDimension.smallSplit0->emplace_back();
otherDimension.smallSplit1->emplace_back();
otherDimension.smallSplit1->emplace_back();
} else {
pushSmallBoundaries(splitBuffers.rectsD1Split0.rectanglesSmall,
splitBuffers.rectsD1Split1.rectanglesSmall);

// placeholder
otherDimension.smallSplit0 = &splitBuffers.rectsD0Split0.rectanglesSmall;
otherDimension.smallSplit1 = &splitBuffers.rectsD0Split1.rectanglesSmall;

otherDimension.smallSplit0->emplace_back();
otherDimension.smallSplit0->emplace_back();
otherDimension.smallSplit1->emplace_back();
otherDimension.smallSplit1->emplace_back();
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same argument here about code duplication.

Comment on lines +512 to +548
if (dim == 0) {
currentSmallList = &splitBuffers.rectsD0Split0.rectanglesSmall;
if (currentSplitInRam || workInRam) {
currentList = &std::get<multiBoxWithOrderIndex>(
splitBuffers.rectsD0Split0.rectangles);
} else {
currentList = &rectanglesOnDiskS0D0Stream.value();
}
} else {
currentSmallList = &splitBuffers.rectsD1Split0.rectanglesSmall;
if (currentSplitInRam || workInRam) {
currentList = &std::get<multiBoxWithOrderIndex>(
splitBuffers.rectsD1Split0.rectangles);
} else {
currentList = &rectanglesOnDiskS0D1Stream.value();
}
}
} else {
if (dim == 0) {
currentSmallList = &splitBuffers.rectsD0Split1.rectanglesSmall;
if (currentSplitInRam || workInRam) {
currentList = &std::get<multiBoxWithOrderIndex>(
splitBuffers.rectsD0Split1.rectangles);
} else {
currentList = &rectanglesOnDiskS1D0Stream.value();
}
} else {
currentSmallList = &splitBuffers.rectsD1Split1.rectanglesSmall;
if (currentSplitInRam || workInRam) {
currentList = &std::get<multiBoxWithOrderIndex>(
splitBuffers.rectsD1Split1.rectangles);
} else {
currentList = &rectanglesOnDiskS1D1Stream.value();
}
}
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also a lot of code duplication for the dim0 and dim1, please remove.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants