Add Path Search Feature to Qlever #1335

JoBuRo · 2024-04-30T11:16:50Z

This PR adds a path-search feature to QLever, which can be invoked via a federated query. There are currently two algorithms for computing paths: one that computes all paths and one that computes all shortest paths. PathSearch is designed so that adding further algorithms later is straightforward.

JoBuRo · 2024-04-30T11:20:12Z

Current TODOs:

Implement federated query parsing for path search
Improve estimates of the PathSearch operation
Check if building the ResultTable can be improved
Refactor PathSearch into smaller classes

codecov · 2024-04-30T12:20:00Z

Codecov Report

Attention: Patch coverage is 83.36842% with 79 lines in your changes missing coverage. Please review.

Project coverage is 89.01%. Comparing base (14d6e1c) to head (1c209fe).

Files	Patch %	Lines
src/engine/PathSearch.cpp	82.48%	42 Missing and 6 partials ⚠️
src/parser/GraphPatternOperation.cpp	82.07%	13 Missing and 6 partials ⚠️
src/parser/sparqlParser/SparqlQleverVisitor.cpp	80.00%	5 Missing and 1 partial ⚠️
src/engine/QueryPlanner.cpp	93.87%	2 Missing and 1 partial ⚠️
src/engine/CheckUsePatternTrick.cpp	0.00%	2 Missing ⚠️
src/engine/PathSearch.h	92.85%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #1335      +/-   ##
==========================================
- Coverage   89.06%   89.01%   -0.05%     
==========================================
  Files         328      330       +2     
  Lines       29294    29763     +469     
  Branches     3262     3335      +73     
==========================================
+ Hits        26090    26494     +404     
- Misses       2054     2108      +54     
- Partials     1150     1161      +11


sonarcloud · 2024-04-30T12:21:42Z

Quality Gate passed

Issues
17 New issues
0 Accepted issues

Measures
0 Security Hotspots
No data about Coverage
0.0% Duplication on New Code


joka921

A first round of comments for the parsing/planning stuff.

I still have to dig through the more complicated boost-graph-stuff.

joka921 · 2024-06-26T14:19:42Z

src/parser/GraphPatternOperation.cpp

+ auto simpleTriple = triple.getSimple();
+ TripleComponent predicate = simpleTriple.p_;
+ TripleComponent object = simpleTriple.o_;
+ AD_CORRECTNESS_CHECK(predicate.isIri());


This should be cleaned up.

1. Store the result of toStringRepresentation to get rid of the redundant code.
2. Make a lambda setVariable so that you can do something like
   `if (iriString.ends_with("start")) { setVariable(start); }  // object is captured by the lambda.`
3. You should handle the case that there are duplicate triples (e.g. something you want to set is not nullopt anymore, etc.).
4. You shouldn't do `AD_...CHECK` here, as this is not a programming error if it happens, but may happen in the query. You should throw proper exceptions with proper strings.

I'm unsure how to implement the lambda you suggest. Assuming the signature of the lambda is something like void setVariable(Variable var), I don't see how the lambda can choose the correct member of the PathQuery to set. The only two options I see are passing a pointer to the member, such as start_, or using a map, so that the variable can be set by parameter name (map[param] = object.getVariable()). I don't think the amount of duplication justifies either of these solutions.
I'm also aware that I could capture the PathQuery (this), but I'm not sure how that would help here.

I am aware that this is a lot of duplicate code. I implemented a lambda similar to what you suggested to get rid of some of the duplication. While I was at it, I also improved the error handling.


joka921 · 2024-06-26T14:37:23Z

src/engine/PathSearch.h

+ * @param endNodes A span of end nodes.
+ * @param edgePropertyLists A span of edge property lists.
+ */
+ void buildGraph(std::span<const Id> startNodes, std::span<const Id> endNodes,


Is there a good reason why this doesn't simply return the graph?

sonarcloud · 2024-07-11T01:49:47Z

Quality Gate passed

Issues
15 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code


joka921

A thorough pass on everything but the tests.

joka921 · 2024-07-11T07:54:11Z

e2e/scientists_queries.yaml

+ # - query: path_search_shortest_paths
+ # type: no-text
+ # sparql: |
+ # PREFIX pathSearch: <https://qlever.cs.uni-freiburg.de/pathSearch/>
+ # SELECT * WHERE {
+ # SERVICE pathSearch: {
+ # pathSearch: pathSearch:algorithm pathSearch:shortestPaths;
+ # pathSearch:source <Mary_Ann_Leeper>;
+ # pathSearch:target <Literature_Subject>;
+ # pathSearch:pathColumn ?path;
+ # pathSearch:edgeColumn ?edge;
+ # pathSearch:start ?start;
+ # pathSearch:end ?end;
+ # {SELECT * WHERE {
+ # ?start <is-a> ?end
+ # }}
+ # }
+ # }
+ # checks:
+ # - num_rows: 4
+ # - num_cols: 4
+ # - selected: ["?path", "?edge", "?start", "?end"]
+ # - contains_row: [0, 0, "<Mary_Ann_Leeper>", "<Biochemist>"] 
+ # - contains_row: [0, 1, "<Biochemist>", "<Literature_Subject>"] 
+ # - contains_row: [1, 0, "<Mary_Ann_Leeper>", "<Chemist>"] 
+ # - contains_row: [1, 1, "<Chemist>", "<Literature_Subject>"] 


Delete if not needed anymore.

joka921 · 2024-07-11T07:54:41Z

e2e/scientists_queries.yaml

+ checks:
+ - num_rows: 17
+ - num_cols: 4
+ - selected: ["?path", "?edge", "?start", "?end"]


If the result is deterministic, then you can check some rows here.

joka921 · 2024-07-11T07:57:31Z

src/parser/sparqlParser/SparqlQleverVisitor.cpp

+ pathQuery.childGraphPattern_ = std::move(pattern._child);
+ } else {
+ throw parsedQuery::PathSearchException(
+ "Unsupported subquery in pathSearch."


Suggested change

- "Unsupported subquery in pathSearch."
+ "Unsupported element in pathSearch."

joka921 · 2024-07-11T08:01:14Z

src/parser/GraphPatternOperation.h

+ void addParameter(const SparqlTriple& triple);
+ void fromBasicPattern(const BasicGraphPattern& pattern);
+ std::variant<Variable, std::vector<Id>> toSearchSide(
+ std::vector<TripleComponent> side, const Index::Vocab& vocab) const;
+ PathSearchConfiguration toPathSearchConfiguration(
+ const Index::Vocab& vocab) const;
+};


Please add some documentation....

joka921 · 2024-07-11T08:03:13Z

src/parser/GraphPatternOperation.cpp

+ auto getVariable = [](std::string parameter, const TripleComponent& object) {
+ if (!object.isVariable()) {
+ throw PathSearchException("The value " + object.toString() +
+ " for parameter '" + parameter +
+ "' has to be a variable");


Pass the parameter as a string_view and then use absl::StrCat to concatenate the error message.
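
A rough sketch of the shape this could take. To keep the example compilable without Abseil, `strCat` below is a small stand-in for `absl::StrCat` that only accepts string-like pieces; `requireVariable` and its `isVariable`/`objectString` parameters are hypothetical and replace the `TripleComponent` calls from the diff.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <string_view>

// Stand-in for absl::StrCat (assumption: string-like pieces only), so that
// the sketch does not depend on Abseil.
template <typename... Pieces>
std::string strCat(const Pieces&... pieces) {
  std::string result;
  (result.append(std::string_view(pieces)), ...);
  return result;
}

struct PathSearchException : std::runtime_error {
  using std::runtime_error::runtime_error;
};

// `parameter` is taken as a std::string_view, so string literals and
// std::string arguments are passed without copying.
void requireVariable(std::string_view parameter, bool isVariable,
                     std::string_view objectString) {
  if (!isVariable) {
    throw PathSearchException(strCat("The value ", objectString,
                                     " for parameter '", parameter,
                                     "' has to be a variable"));
  }
}
```

With Abseil available, the call to `strCat` would simply become `absl::StrCat` with the same arguments.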

joka921 · 2024-07-11T09:44:12Z

src/engine/PathSearch.cpp

+ while (!currentPath.empty() && edge.start_ != currentPath.end()) {
+ currentPath.pop_back();
+ }


I think it suffices to do a single pop at the end, as soon as there are no more outgoing edges that lead to unvisited nodes. Because then you know that the next edge will be a
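
One possible reading of this suggestion, as a generic sketch rather than the PR's implementation: an iterative DFS that enumerates all simple paths and pops exactly one element from the current path when the node on top has no more neighbours to try. It works on nodes instead of edges, and `Node`, `Graph`, `NodePath`, and `allSimplePaths` are hypothetical names; `source != target` is assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Node = int;
using Graph = std::vector<std::vector<Node>>;  // adjacency lists
using NodePath = std::vector<Node>;

// Enumerate all simple paths from `source` to `target` (source != target).
std::vector<NodePath> allSimplePaths(const Graph& graph, Node source,
                                     Node target) {
  std::vector<NodePath> result;
  NodePath currentPath{source};
  std::vector<bool> onPath(graph.size(), false);
  onPath[source] = true;
  // One entry per node on the current path: index of the next neighbour
  // that still has to be tried.
  std::vector<std::size_t> nextNeighbour{0};

  while (!currentPath.empty()) {
    const Node node = currentPath.back();
    const bool exhausted =
        node == target || nextNeighbour.back() == graph[node].size();
    if (exhausted) {
      // Single pop: nothing left to explore from `node`, so backtrack by
      // exactly one step.
      onPath[node] = false;
      currentPath.pop_back();
      nextNeighbour.pop_back();
      continue;
    }
    const Node neighbour = graph[node][nextNeighbour.back()++];
    if (onPath[neighbour]) {
      continue;  // would close a cycle
    }
    onPath[neighbour] = true;
    currentPath.push_back(neighbour);
    nextNeighbour.push_back(0);
    if (neighbour == target) {
      result.push_back(currentPath);
    }
  }
  return result;
}
```

Because each stack frame remembers how far it has iterated through its neighbours, the current path never has to be unwound in a loop to find the parent of the next edge.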

joka921 · 2024-07-11T09:44:31Z

src/engine/PathSearch.cpp

+ visited.insert(edge.end_.getBits());
+ }
+
+ return pathCache[source.getBits()];


I think that for this function in general we should aim for the most efficient implementation conceivable.
And that is, I think, one that stores directly to the IdTable and creates paths very cheaply. We can talk about this again if you want.

One thing we have to think about: in the case where you have pairs of [source, target], the path cache has to be invalidated when the set of targets changes. So you basically should

Okay,
I just thought of an idea:

You store the complete subtree in a deterministic and sorted way.

Then the Edge type can just be a single rowIndex into that table (or the individual spans for start and end nodes, which is the same). That way you don't really have the problem with the overhead, and the edge Properties etc, as you only have to touch all this when you actually materialize the result. This should all be really efficient (no allocations etc).
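
A minimal sketch of this idea, under the assumption that the edges are kept column-wise and sorted by (start, end). `EdgeTable`, `fromPairs`, `outgoing`, and `materialize` are hypothetical names, and `Id` is reduced to a plain integer here. Edge property columns would share the same row indices and are omitted.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

using Id = std::uint64_t;  // stand-in for QLever's Id type

// Hypothetical column-wise edge table, sorted by (start, end).
struct EdgeTable {
  std::vector<Id> starts_;
  std::vector<Id> ends_;

  static EdgeTable fromPairs(std::vector<std::pair<Id, Id>> pairs) {
    std::sort(pairs.begin(), pairs.end());
    EdgeTable table;
    for (const auto& [start, end] : pairs) {
      table.starts_.push_back(start);
      table.ends_.push_back(end);
    }
    return table;
  }

  // Half-open range [first, last) of the row indices whose start node is
  // `node`, found by binary search on the sorted start column.
  std::pair<std::size_t, std::size_t> outgoing(Id node) const {
    auto [lo, hi] = std::equal_range(starts_.begin(), starts_.end(), node);
    return {static_cast<std::size_t>(lo - starts_.begin()),
            static_cast<std::size_t>(hi - starts_.begin())};
  }
};

// During the search, an edge is just a row index and a path is a list of
// row indices.
using Path = std::vector<std::size_t>;

// Node ids (and, in a full version, edge properties) are only looked up
// when the result is materialized.
std::vector<std::pair<Id, Id>> materialize(const EdgeTable& table,
                                           const Path& path) {
  std::vector<std::pair<Id, Id>> rows;
  rows.reserve(path.size());
  for (std::size_t row : path) {
    rows.emplace_back(table.starts_[row], table.ends_[row]);
  }
  return rows;
}
```

For example, with the edges (1,2), (1,3), and (2,3), `outgoing(1)` yields the row range [0, 2), and the path {0, 2} materializes to (1,2) followed by (2,3).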

joka921 · 2024-07-11T09:45:43Z

src/engine/PathSearch.cpp

+ if (pathCache.contains(edge.end_.getBits())) {
+ for (auto subPath : pathCache[edge.end_.getBits()]) {
+ if (subPath.first() == currentPath.first()) {
+ addToCache(pathCache, currentPath, currentPath.size());
+ } else {
+ auto fullPath = currentPath.concat(subPath);
+ addToCache(pathCache, fullPath, currentPath.size());
+ }
+ }
+ continue;


You should definitely factor out the caching etc. to separate lambdas outside the core-DFS.
That way we can more easily reason about the code and refactor it if necessary.

joka921 · 2024-07-11T09:46:47Z

src/engine/PathSearch.cpp

+ }
+ for (auto source : sources) {
+ for (auto path : findPaths(source, targetSet, binSearch)) {
+ paths.push_back(path);


This implementation is the one for two separate values clauses, doing the full cartesian product of sources and targets.

joka921 · 2024-07-11T09:49:25Z

test/QueryPlannerTestHelpers.h

+ std::holds_alternative<Variable>(config.sources_)
+ ? AD_FIELD(
+ PathSearchConfiguration, sources_,
+ VariantWith<Variable>(Eq(std::get<Variable>(config.sources_))))
+ : AD_FIELD(
+ PathSearchConfiguration, sources_,
+ VariantWith<std::vector<ValueId>>(UnorderedElementsAreArray(
+ std::get<std::vector<ValueId>>(config.sources_))));


Can't this just be AD_FIELD(Config, sources_, Eq(config.sources_))?
Same for the targetMatcher.

JoBuRo added 11 commits April 18, 2024 16:13

Added PathSearch class

2c360bf

Added test class for PathSearch

b620e53

Added new sources to CMakeLists

c8f4c28

Added boilerplate code for override

23d5eb5

First draft of path search

f59ad6b

Implemented Path Search using boost

fcff8ba

Simplified visitor, added cycle test

92826a9

Added test, fixed cycles

982708e

Added pathfinding for multiple targets

3c9345e

Added edge properties

5def5eb

Added shortest path search

fce274e

JoBuRo and others added 2 commits April 30, 2024 13:21

Merge branch 'master' into path-search

9d10dec

Fixed setTextLimit error after merge

e371542

JoBuRo added 13 commits June 19, 2024 19:40

Added PathSearch parsing

178fd14

Moved visitors to new file

f228eb6

Fixed a bug where the wrong sub columns were read

3966a69

Fixed QueryPlanner PathSearch tests

1aa8350

Added documentation to PathSearch and visitors

536e5fe

Merge branch 'master' into path-search

02380c3

Rename ResultTable to Result in PathSearch

2451fd7

Format fix

d700df2

Fix the format fix

19027e2

This reverts commit d700df2.

Added PathSearch e2e tests

ec7bfd0

Reworked AllPathsVisitor

47daa2b

Format fix

765f1aa

Sonar Fixes

da1eb3a

JoBuRo marked this pull request as ready for review June 26, 2024 07:33

joka921 reviewed Jun 26, 2024


JoBuRo added 27 commits June 30, 2024 12:19

Added multisource to PathSearch

0027d7b

format fix

fbb61a1

Added mutlisource multitarget tests

17a8b39

Added createJoinWithPathSearch

b2611c9

Added tests, finished binding logic

08da81b

Added runtime info

adf85bc

Added cancellation checks

48dbab1

Fixed CacheKey

d5b513b

Removed unneeded members

49dfaa3

Format fix

f6357a7

Simplified handleSearchSide

abdc36a

New all paths implementation

1c892a2

Format fix

ae39abc

Extracted visitPathQuery method

411ba0a

Moved PathSearchConfig creation to PathQuery method

ed03651

Sonar Fixes

818e41b

Sonar fixes

acf09c8

Added PathSearchException

385f67a

Improved error handling and path query parsing

946bda3

Added docstring for PathQuery

aec3e34

Fixed typo

de33fdd

Added tests for path search exceptions

e9def11

Merge branch 'master' into path-search

6ce1494

Improved setVariable lambda in PathQuery

eea3625

Removed shortestPaths and boost BGL

ae175ac

Simplified Edge

13494b8

Refactored DFS

1c209fe

joka921 requested changes Jul 11, 2024
