SPARQL Update Part 1: Located Triples #1379

Merged 46 commits on Jul 18, 2024
0694eb5
Extract triple location code to separate branch
Qup42 Jun 19, 2024
d5d297c
Add `LocatedTriplesTest` to build
Qup42 Jun 20, 2024
b536a2a
CodeReview
Qup42 Jun 20, 2024
2c4b4e7
Have printing function as class members in the header
Qup42 Jun 21, 2024
5419dec
Make `IdTriple` its own type
Qup42 Jun 21, 2024
1c95e4a
Remove no longer used pretty printer
Qup42 Jun 21, 2024
00add84
More concrete print function
Qup42 Jun 21, 2024
e81127e
Simply tripleLocation test
Qup42 Jun 21, 2024
e587ebd
Revert change that should not yet be done
Qup42 Jun 21, 2024
cdd40a9
Add missing test helpers
Qup42 Jun 21, 2024
f70ce8e
Commit
Qup42 Jun 21, 2024
2bceb94
Fix segfault
Qup42 Jun 21, 2024
ed2a3af
Fix
Qup42 Jun 21, 2024
cc18eef
Code Review
Qup42 Jun 21, 2024
ec6d1df
Code Review
Qup42 Jun 24, 2024
62dd444
Code Review
Qup42 Jun 24, 2024
03938b1
Template `numIndex` in `mergeTriples`
Qup42 Jun 24, 2024
77db066
Use implicit copy constructor
Qup42 Jun 24, 2024
9230326
Mark print function's purpose
Qup42 Jun 24, 2024
480bc57
Improvements
Qup42 Jun 25, 2024
1b1597d
Improvements
Qup42 Jun 26, 2024
f89d9e6
Add new method to `LocatedTriplesPerBlock`
Qup42 Jun 26, 2024
b0d4894
Add more functions to `LocatedTriples`
Qup42 Jun 29, 2024
d3aed0e
`LocatedTriple::locateTriplesInPermutation` takes the block metadata …
Qup42 Jul 1, 2024
e731418
Add additional check when writing remaining rows in block merge
Qup42 Jul 1, 2024
cfea64b
Some cleanup
Qup42 Jul 2, 2024
7587123
Execute `LocatedTriplesTest` in parallel
Qup42 Jul 2, 2024
04776be
Add interface changes from other branch
Qup42 Jul 2, 2024
e052620
Consolidate and test LocatedTriples interface
Qup42 Jul 3, 2024
e55b8a9
Dummy test for printing function
Qup42 Jul 3, 2024
e752dfe
Revert change that is no longer required at this stage
Qup42 Jul 3, 2024
bd7d4d4
Apply sonarlint suggestion
Qup42 Jul 3, 2024
9a3249a
Merge branch 'refs/heads/master' into update/locatedTriples
Qup42 Jul 5, 2024
70757a4
Add missing switch cases from 9a3249af
Qup42 Jul 5, 2024
bfd6f0d
Revert "Add missing switch cases from 9a3249af"
Qup42 Jul 5, 2024
21b1e23
Merge branch 'refs/heads/master' into update/locatedTriples
Qup42 Jul 5, 2024
355b77a
Fix sonarcloud errors
Qup42 Jul 5, 2024
a658818
Merge branch 'refs/heads/master' into update/locatedTriples
Qup42 Jul 5, 2024
3088c57
Code Review
Qup42 Jul 7, 2024
02309d2
Code Review
Qup42 Jul 7, 2024
8430b2d
Code Review
Qup42 Jul 7, 2024
ef85411
Code Review
Qup42 Jul 8, 2024
13e3f56
Fix for stdc++16
Qup42 Jul 11, 2024
21cfbba
Apply final suggestion
Qup42 Jul 16, 2024
928920d
Add another test
Qup42 Jul 16, 2024
3d6663f
Apply suggestion
Qup42 Jul 17, 2024
11 changes: 11 additions & 0 deletions src/engine/idTable/IdTable.h
@@ -751,6 +751,17 @@ class IdTableStatic
*(static_cast<Base*>(this)) = std::move(b);
return *this;
}

// This operator is only for debugging and testing. It returns a
// human-readable representation.
friend std::ostream& operator<<(std::ostream& os,
const IdTableStatic& idTable) {
os << "{ ";
std::ranges::copy(
idTable, std::ostream_iterator<columnBasedIdTable::Row<Id>>(os, " "));
os << "}";
return os;
}
};

// This was previously implemented as an alias (`using IdTable =
12 changes: 12 additions & 0 deletions src/engine/idTable/IdTableRow.h
@@ -10,6 +10,7 @@
#include <variant>
#include <vector>

#include "global/Id.h"
#include "util/Enums.h"
#include "util/Exception.h"
#include "util/Forward.h"
@@ -93,6 +94,17 @@ class Row {
std::ranges::copy(*this, result.begin());
return result;
}

// This operator is only for debugging and testing. It returns a
// human-readable representation.
friend std::ostream& operator<<(std::ostream& os, const Row& idTableRow)
requires(std::is_same_v<T, Id>) {
os << "(";
for (size_t i = 0; i < idTableRow.numColumns(); ++i) {
os << idTableRow[i] << (i < idTableRow.numColumns() - 1 ? " " : ")");
}
return os;
}
};

// The following two classes store a reference to a row in the underlying
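Note on the two debugging printers above: in the `Row` operator the closing parenthesis is written together with the last entry, so a row without columns would print only `(`. A minimal sketch of the same output shape that writes the separator before each entry instead is shown here. It is not QLever code: `SketchRow`, `formatRow`, `formatTable`, and `printingExample` are hypothetical names, and plain `int`s stand in for `Id`.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for a table row; plain `int`s replace QLever's `Id`.
using SketchRow = std::vector<int>;

// Format one row as "(a b c)". The separator is written before every entry
// except the first, so an empty row still yields a balanced "()".
std::string formatRow(const SketchRow& row) {
  std::ostringstream os;
  os << "(";
  for (std::size_t i = 0; i < row.size(); ++i) {
    if (i > 0) {
      os << " ";
    }
    os << row[i];
  }
  os << ")";
  return os.str();
}

// Format a whole table as "{ row row }", i.e. every row is followed by a
// single space, like the `std::ostream_iterator` delimiter in the diff.
std::string formatTable(const std::vector<SketchRow>& table) {
  std::ostringstream os;
  os << "{ ";
  for (const auto& row : table) {
    os << formatRow(row) << " ";
  }
  os << "}";
  return os.str();
}

// Small usage example for the two helpers.
void printingExample() {
  assert(formatRow({1, 2, 3}) == "(1 2 3)");
  assert(formatTable({{1, 2}, {3, 4}}) == "{ (1 2) (3 4) }");
}
```

Whether the zero-column case matters for a debugging-only printer is a judgment call; the sketch only illustrates one way to keep the output balanced.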
59 changes: 56 additions & 3 deletions src/global/IdTriple.h
@@ -1,12 +1,65 @@
// Copyright 2024, University of Freiburg
// Chair of Algorithms and Data Structures
-// Authors: Hannah Bast <bast@cs.uni-freiburg.de>
// Authors:
// 2023 Hannah Bast <bast@cs.uni-freiburg.de>
// 2024 Julian Mundhahs <mundhahj@tf.uni-freiburg.de>

#pragma once

#include <array>
#include <ostream>

#include "global/Id.h"
#include "index/CompressedRelation.h"

-// Should we have an own class for this? We need this at several places.
-using IdTriple = std::array<Id, 3>;
template <size_t N = 0>
struct IdTriple {
// The three IDs that define the triple.
std::array<Id, 3> ids_;
// Some additional payload of the triple, e.g. which graph it belongs to.
std::array<Id, N> payload_;

explicit IdTriple(const std::array<Id, 3>& ids) requires(N == 0)
: ids_(ids), payload_(){};

explicit IdTriple(const std::array<Id, 3>& ids,
const std::array<Id, N>& payload) requires(N != 0)
: ids_(ids), payload_(payload){};

friend std::ostream& operator<<(std::ostream& os, const IdTriple& triple) {
os << "IdTriple(";
std::ranges::copy(triple.ids_, std::ostream_iterator<Id>(os, ", "));
std::ranges::copy(triple.payload_, std::ostream_iterator<Id>(os, ", "));
os << ")";
return os;
}

// TODO: default once we drop clang16 with libc++16
std::strong_ordering operator<=>(const IdTriple& other) const {
return std::tie(ids_[0], ids_[1], ids_[2]) <=>
std::tie(other.ids_[0], other.ids_[1], other.ids_[2]);
}
bool operator==(const IdTriple& other) const = default;

template <typename H>
friend H AbslHashValue(H h, const IdTriple& c) {
return H::combine(std::move(h), c.ids_, c.payload_);
}

// Permute the IDs of this triple according to the `keyOrder` of the given
// permutation.
IdTriple<N> permute(const std::array<size_t, 3>& keyOrder) const {
std::array<Id, 3> newIds{ids_[keyOrder[0]], ids_[keyOrder[1]],
ids_[keyOrder[2]]};
if constexpr (N == 0) {
return IdTriple<N>(newIds);
} else {
return IdTriple<N>(newIds, payload_);
}
}

CompressedBlockMetadata::PermutedTriple toPermutedTriple() const
requires(N == 0) {
return {ids_[0], ids_[1], ids_[2]};
}
};
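Note on `permute`: entry `i` of the result is entry `keyOrder[i]` of the input. A minimal sketch of that indexing rule follows. It is not QLever code: `SketchTriple` and `permuteTriple` are hypothetical names, plain `int`s stand in for `Id`, and the key orders used in the usage note are assumptions chosen for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for `IdTriple<0>::ids_`; plain `int`s replace `Id`.
using SketchTriple = std::array<int, 3>;

// Return the triple whose i-th entry is the `keyOrder[i]`-th entry of the
// input. This is the same indexing rule that `permute` uses in the diff.
SketchTriple permuteTriple(const SketchTriple& triple,
                           const std::array<std::size_t, 3>& keyOrder) {
  for (std::size_t k : keyOrder) {
    assert(k < 3);  // Every entry of `keyOrder` must be a valid index.
  }
  return {triple[keyOrder[0]], triple[keyOrder[1]], triple[keyOrder[2]]};
}
```

Under the assumption that the input is ordered subject, predicate, object, the call `permuteTriple({10, 20, 30}, {1, 2, 0})` yields `{20, 30, 10}`, i.e. predicate, object, subject, and the key order `{0, 1, 2}` leaves the triple unchanged.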
2 changes: 1 addition & 1 deletion src/index/CMakeLists.txt
@@ -2,7 +2,7 @@ add_subdirectory(vocabulary)
add_library(index
Index.cpp IndexImpl.cpp IndexImpl.Text.cpp
Vocabulary.cpp VocabularyOnDisk.cpp
-Permutation.cpp TextMetaData.cpp
LocatedTriples.cpp Permutation.cpp TextMetaData.cpp
DocsDB.cpp FTSAlgorithms.cpp
PrefixHeuristic.cpp CompressedRelation.cpp
PatternCreator.cpp)
245 changes: 245 additions & 0 deletions src/index/LocatedTriples.cpp
@@ -0,0 +1,245 @@
// Copyright 2023 - 2024, University of Freiburg
// Chair of Algorithms and Data Structures
// Authors:
// 2023 Hannah Bast <bast@cs.uni-freiburg.de>
// 2024 Julian Mundhahs <mundhahj@tf.uni-freiburg.de>

#include "index/LocatedTriples.h"

#include <algorithm>

#include "absl/strings/str_join.h"
#include "index/CompressedRelation.h"
#include "util/ChunkedForLoop.h"

// ____________________________________________________________________________
std::vector<LocatedTriple> LocatedTriple::locateTriplesInPermutation(
std::span<const IdTriple<0>> triples,
std::span<const CompressedBlockMetadata> blockMetadata,
const std::array<size_t, 3>& keyOrder, bool shouldExist,
ad_utility::SharedCancellationHandle cancellationHandle) {
std::vector<LocatedTriple> out;
out.reserve(triples.size());
ad_utility::chunkedForLoop<10'000>(
0, triples.size(),
[&triples, &out, &blockMetadata, &keyOrder, &shouldExist](size_t i) {
auto triple = triples[i].permute(keyOrder);
// A triple belongs to the first block that contains at least one triple
// that is larger than or equal to the triple. See `LocatedTriples.h` for a
// discussion of the corner cases.
size_t blockIndex =
std::ranges::lower_bound(blockMetadata, triple.toPermutedTriple(),
std::less<>{},
&CompressedBlockMetadata::lastTriple_) -
blockMetadata.begin();
out.emplace_back(blockIndex, triple, shouldExist);
},
[&cancellationHandle]() { cancellationHandle->throwIfCancelled(); });

return out;
}
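Note on the block lookup above: the block index is the position of the first block whose last triple is not smaller than the searched triple. A minimal sketch of that binary search with `std::lower_bound` follows. It is not QLever code: `SketchTriple`, `SketchBlock`, and `locateBlock` are hypothetical names, plain `int`s stand in for `Id`, and the block metadata is reduced to the last triple of each block.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a permuted triple; plain `int`s replace `Id`.
using SketchTriple = std::array<int, 3>;

// Hypothetical, reduced block metadata: only the last triple of each block.
struct SketchBlock {
  SketchTriple lastTriple;
};

// Return the index of the first block whose last triple is >= `triple`.
// If `triple` is larger than the last triple of every block, the result is
// `blocks.size()`, i.e. one past the last block.
std::size_t locateBlock(const std::vector<SketchBlock>& blocks,
                        const SketchTriple& triple) {
  // Precondition for the binary search: blocks are sorted by last triple.
  assert(std::is_sorted(blocks.begin(), blocks.end(),
                        [](const SketchBlock& a, const SketchBlock& b) {
                          return a.lastTriple < b.lastTriple;
                        }));
  auto it = std::lower_bound(
      blocks.begin(), blocks.end(), triple,
      [](const SketchBlock& block, const SketchTriple& t) {
        return block.lastTriple < t;
      });
  return static_cast<std::size_t>(it - blocks.begin());
}
```

A triple that falls into the gap between two blocks is assigned to the later block by this rule, and a triple beyond the last block gets an index that does not correspond to any existing block.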

// ____________________________________________________________________________
bool LocatedTriplesPerBlock::hasUpdates(size_t blockIndex) const {
return map_.contains(blockIndex);
}

// ____________________________________________________________________________
NumAddedAndDeleted LocatedTriplesPerBlock::numTriples(size_t blockIndex) const {
// If no located triples for `blockIndex` exist, there is no entry in `map_`.
if (!hasUpdates(blockIndex)) {
return {0, 0};
}

auto blockUpdateTriples = map_.at(blockIndex);
size_t countInserts = std::ranges::count_if(
blockUpdateTriples, &LocatedTriple::shouldTripleExist_);
return {countInserts, blockUpdateTriples.size() - countInserts};
}

// ____________________________________________________________________________
// Collect the first `numIndexColumns` entries of an `IdTable` row into a
// tuple.
template <size_t numIndexColumns>
requires(numIndexColumns >= 1 && numIndexColumns <= 3)
auto tieIdTableRow(auto& row) {
return [&row]<size_t... I>(std::index_sequence<I...>) {
return std::tie(row[I]...);
}(std::make_index_sequence<numIndexColumns>{});
}

// ____________________________________________________________________________
// Collect the last `numIndexColumns` IDs of a `LocatedTriple` into a tuple.
template <size_t numIndexColumns>
requires(numIndexColumns >= 1 && numIndexColumns <= 3)
auto tieLocatedTriple(auto& lt) {
auto& ids = lt->triple_.ids_;
return [&ids]<size_t... I>(std::index_sequence<I...>) {
return std::tie(ids[3 - numIndexColumns + I]...);
}(std::make_index_sequence<numIndexColumns>{});
}

// ____________________________________________________________________________
template <size_t numIndexColumns>
IdTable LocatedTriplesPerBlock::mergeTriplesImpl(size_t blockIndex,
const IdTable& block) const {
// This method should only be called if there are located triples in the
// specified block.
AD_CONTRACT_CHECK(map_.contains(blockIndex));

AD_CONTRACT_CHECK(numIndexColumns <= block.numColumns());

auto numInsertsAndDeletes = numTriples(blockIndex);
IdTable result{block.numColumns(), block.getAllocator()};
result.resize(block.numRows() + numInsertsAndDeletes.numAdded_);

const auto& locatedTriples = map_.at(blockIndex);

auto lessThan = [](const auto& lt, const auto& row) {
return tieLocatedTriple<numIndexColumns>(lt) <
tieIdTableRow<numIndexColumns>(row);
};
auto equal = [](const auto& lt, const auto& row) {
return tieLocatedTriple<numIndexColumns>(lt) ==
tieIdTableRow<numIndexColumns>(row);
};

auto rowIt = block.begin();
auto locatedTripleIt = locatedTriples.begin();
auto resultIt = result.begin();

auto writeTripleToResult = [&result, &resultIt](auto& locatedTriple) {
for (size_t i = 0; i < numIndexColumns; i++) {
(*resultIt)[i] = locatedTriple.triple_.ids_[3 - numIndexColumns + i];
}
// Write UNDEF to any additional columns.
for (size_t i = numIndexColumns; i < result.numColumns(); i++) {
(*resultIt)[i] = ValueId::makeUndefined();
}
resultIt++;
};

while (rowIt != block.end() && locatedTripleIt != locatedTriples.end()) {
if (lessThan(locatedTripleIt, *rowIt)) {
if (locatedTripleIt->shouldTripleExist_) {
// Insertion of a non-existent triple.
writeTripleToResult(*locatedTripleIt);
}
locatedTripleIt++;
} else if (equal(locatedTripleIt, *rowIt)) {
if (!locatedTripleIt->shouldTripleExist_) {
// Deletion of an existing triple.
rowIt++;
}
locatedTripleIt++;
} else {
// The row is not deleted, so copy it.
*resultIt++ = *rowIt++;
}
}

if (locatedTripleIt != locatedTriples.end()) {
AD_CORRECTNESS_CHECK(rowIt == block.end());
std::ranges::for_each(
std::ranges::subrange(locatedTripleIt, locatedTriples.end()) |
std::views::filter(&LocatedTriple::shouldTripleExist_),
writeTripleToResult);
}
if (rowIt != block.end()) {
AD_CORRECTNESS_CHECK(locatedTripleIt == locatedTriples.end());
while (rowIt != block.end()) {
*resultIt++ = *rowIt++;
}
}

result.resize(resultIt - result.begin());
return result;
}

// ____________________________________________________________________________
IdTable LocatedTriplesPerBlock::mergeTriples(size_t blockIndex,
const IdTable& block,
size_t numIndexColumns) const {
if (numIndexColumns == 3) {
return mergeTriplesImpl<3>(blockIndex, block);
} else if (numIndexColumns == 2) {
return mergeTriplesImpl<2>(blockIndex, block);
} else {
AD_CORRECTNESS_CHECK(numIndexColumns == 1);
return mergeTriplesImpl<1>(blockIndex, block);
}
}
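Note on the merge above: it is a single pass over two sorted sequences, the rows of the block and the located triples, where inserts of missing triples are emitted, deletes of existing triples drop the row, and everything else is a no-op. A minimal sketch of that control flow follows. It is not QLever code: `SketchTriple`, `SketchUpdate`, and `mergeUpdates` are hypothetical names, plain `int`s stand in for `Id`, full triples are always compared (the `numIndexColumns` handling and the extra UNDEF columns are left out), and each triple is assumed to occur at most once in the updates.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a permuted triple; plain `int`s replace `Id`.
using SketchTriple = std::array<int, 3>;

// Hypothetical, reduced located triple: the triple plus the insert/delete
// flag.
struct SketchUpdate {
  SketchTriple triple;
  bool shouldExist;  // true: insert, false: delete
};

// Merge the sorted `rows` of a block with the sorted `updates` for it.
std::vector<SketchTriple> mergeUpdates(
    const std::vector<SketchTriple>& rows,
    const std::vector<SketchUpdate>& updates) {
  assert(std::is_sorted(rows.begin(), rows.end()));
  std::vector<SketchTriple> result;
  result.reserve(rows.size() + updates.size());
  std::size_t r = 0;
  std::size_t u = 0;
  while (r < rows.size() && u < updates.size()) {
    if (updates[u].triple < rows[r]) {
      // The update sorts before the current row: emit it if it is an
      // insert, ignore it if it deletes a triple that does not exist.
      if (updates[u].shouldExist) {
        result.push_back(updates[u].triple);
      }
      ++u;
    } else if (updates[u].triple == rows[r]) {
      // Same triple: a delete skips the row. An insert of an existing
      // triple is a no-op, the row is copied in a later iteration.
      if (!updates[u].shouldExist) {
        ++r;
      }
      ++u;
    } else {
      // The row is untouched by the current update, so copy it.
      result.push_back(rows[r++]);
    }
  }
  // At most one of the two inputs still has entries at this point.
  for (; u < updates.size(); ++u) {
    if (updates[u].shouldExist) {
      result.push_back(updates[u].triple);
    }
  }
  for (; r < rows.size(); ++r) {
    result.push_back(rows[r]);
  }
  return result;
}
```

The sketch grows the result with `push_back`, whereas the code in the diff sizes the result up front and shrinks it at the end; both approaches produce the same sequence of rows under the stated assumptions.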

// ____________________________________________________________________________
std::vector<LocatedTriples::iterator> LocatedTriplesPerBlock::add(
std::span<const LocatedTriple> locatedTriples) {
std::vector<LocatedTriples::iterator> handles;
handles.reserve(locatedTriples.size());
for (auto triple : locatedTriples) {
LocatedTriples& locatedTriplesInBlock = map_[triple.blockIndex_];
auto [handle, wasInserted] = locatedTriplesInBlock.emplace(triple);
AD_CORRECTNESS_CHECK(wasInserted == true);
AD_CORRECTNESS_CHECK(handle != locatedTriplesInBlock.end());
++numTriples_;
handles.emplace_back(handle);
}

updateAugmentedMetadata();

return handles;
}

// ____________________________________________________________________________
void LocatedTriplesPerBlock::erase(size_t blockIndex,
LocatedTriples::iterator iter) {
auto blockIter = map_.find(blockIndex);
AD_CONTRACT_CHECK(blockIter != map_.end(), "Block ", blockIndex,
" is not contained.");
auto& block = blockIter->second;
block.erase(iter);
numTriples_--;
if (block.empty()) {
map_.erase(blockIndex);
}
updateAugmentedMetadata();
}

// ____________________________________________________________________________
void LocatedTriplesPerBlock::setOriginalMetadata(
std::vector<CompressedBlockMetadata> metadata) {
originalMetadata_ = std::move(metadata);
updateAugmentedMetadata();
}

// ____________________________________________________________________________
void LocatedTriplesPerBlock::updateAugmentedMetadata() {
// TODO<C++23> use `std::views::enumerate`
size_t blockIndex = 0;
// Copy to preserve originalMetadata_.
augmentedMetadata_ = originalMetadata_;
for (auto& blockMetadata : augmentedMetadata_.value()) {
if (hasUpdates(blockIndex)) {
const auto& blockUpdates = map_.at(blockIndex);
blockMetadata.firstTriple_ =
std::min(blockMetadata.firstTriple_,
blockUpdates.begin()->triple_.toPermutedTriple());
blockMetadata.lastTriple_ =
std::max(blockMetadata.lastTriple_,
blockUpdates.rbegin()->triple_.toPermutedTriple());
}
blockIndex++;
}
}
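Note on the augmented metadata above: for every block that has updates, the first and last triple of the block are widened so that the range also covers the smallest and largest located triple. A minimal sketch of that widening follows. It is not QLever code: `SketchTriple`, `SketchBlockRange`, and `augmentRanges` are hypothetical names, plain `int`s stand in for `Id`, and a `std::map` of sorted sets stands in for `map_`.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <vector>

// Hypothetical stand-in for a permuted triple; plain `int`s replace `Id`.
using SketchTriple = std::array<int, 3>;

// Hypothetical, reduced block metadata: only the first and last triple.
struct SketchBlockRange {
  SketchTriple first;
  SketchTriple last;
};

// Return a copy of `original` in which every block with updates is widened
// to cover its smallest and largest update. The input is left untouched.
std::vector<SketchBlockRange> augmentRanges(
    const std::vector<SketchBlockRange>& original,
    const std::map<std::size_t, std::set<SketchTriple>>& updatesPerBlock) {
  std::vector<SketchBlockRange> augmented = original;
  for (const auto& entry : updatesPerBlock) {
    const std::size_t blockIndex = entry.first;
    const std::set<SketchTriple>& updates = entry.second;
    // Assumption of this sketch: a block with an entry has at least one
    // update, because empty entries are removed when erasing.
    assert(!updates.empty());
    if (blockIndex >= augmented.size()) {
      // Updates beyond the last block have no metadata to widen.
      continue;
    }
    SketchBlockRange& range = augmented[blockIndex];
    range.first = std::min(range.first, *updates.begin());
    range.last = std::max(range.last, *updates.rbegin());
  }
  return augmented;
}
```

The sketch iterates over the blocks that have updates, while the code in the diff iterates over all blocks and checks `hasUpdates` for each; which variant is cheaper depends on how many blocks are affected.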

// ____________________________________________________________________________
std::ostream& operator<<(std::ostream& os, const LocatedTriples& lts) {
os << "{ ";
std::ranges::copy(lts, std::ostream_iterator<LocatedTriple>(os, " "));
os << "}";
return os;
}

// ____________________________________________________________________________
std::ostream& operator<<(std::ostream& os, const std::vector<IdTriple<0>>& v) {
std::ranges::copy(v, std::ostream_iterator<IdTriple<0>>(os, ", "));
return os;
}