-
Notifications
You must be signed in to change notification settings - Fork 244
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support standard UNION/INTERSECT syntax #248
base: master
Are you sure you want to change the base?
Conversation
Just approved to run the actions, I will review your PR in the next week. I noticed that the gcc-6 stage fails because manually cloning does not work with forked repositories, I will see if I can provide a fix for that. |
8448bd1
to
99e7d29
Compare
Thanks a lot. Since you're on it, I removed to last commit where I was trying to fix that issue, not doing a very good job at it. |
The CI stage should work now if you merge the lastest update to master. |
99e7d29
to
76d498f
Compare
Thanks. All green now! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just did a quick pass with some easy-to-fix things, e.g., code style. I will need some more time for an in-depth review because the changes are quite heavy. I appreciate that you kept the naming and grammar closer to the SQL standard, which should make it easier to review.
@@ -70,7 +70,7 @@ $(SRCPARSER)/flex_lexer.o: $(SRCPARSER)/flex_lexer.cpp $(SRCPARSER)/bison_parser | |||
$(CXX) $(LIB_CFLAGS) -c -o $@ $< -Wno-sign-compare -Wno-unneeded-internal-declaration -Wno-register | |||
|
|||
%.o: %.cpp $(PARSER_CPP) $(LIB_H) | |||
$(CXX) $(LIB_CFLAGS) -c -o $@ $< | |||
$(CXX) $(LIB_CFLAGS) $(CXXFLAGS) -c -o $@ $< |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't use additonal flags in our base repo builds. Unless there's a reason to have them here, I'd ask you to remove them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TL;DR: CFLAGS/CXXFLAGS is a Makefile convention, for example for a program
https://en.wikipedia.org/wiki/CFLAGS
https://www.gnu.org/software/make/manual/make.html#index-CFLAGS
#include <iostream>
int main() {
#ifdef FOO
std::cout << "Foo\n";
#endif
}
and, using make's default rule:
$ CXXFLAGS=-DFOO make -B cflagtest && ./cflagtest
g++ -DFOO cflagtest.cpp -o cflagtest
Foo
It's not that common to have hand-written Makefiles that call other hand-written Makefiles - you normally let a build system such as CMake or meson do that for you - but when you do, you need to make to sure you follow this convention and pass along CFLAGS/CXXFLAGS. You can of course choose your own names for them if you want, but that would be confusing IMHO.
@@ -137,7 +137,7 @@ test: $(TEST_BUILD) | |||
|
|||
$(TEST_BUILD): $(TEST_ALL) $(LIB_BUILD) | |||
@mkdir -p $(BIN)/ | |||
$(CXX) $(TEST_CFLAGS) $(TEST_CPP) -o $(TEST_BUILD) -lsqlparser -lstdc++ | |||
$(CXX) $(CXXFLAGS) $(TEST_CFLAGS) $(TEST_CPP) -o $(TEST_BUILD) -lsqlparser -lstdc++ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(see above)
@@ -63,6 +63,8 @@ | |||
// %output "bison_parser.cpp" | |||
// %defines "bison_parser.h" | |||
|
|||
%expect 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does this do?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://www.gnu.org/software/bison/manual/html_node/Expect-Decl.html
In other words, bails out with an error if there's a shift/reduce conflict.
src/parser/bison_parser.y
Outdated
$$->setOperations->back()->resultLimit = $5; | ||
$$->lockings = $6; | ||
}; | ||
select_statement: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pick here: The indentation does not match the other code. Could ypu please fix that (e.g., by calling the format.sh
script)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Btw, the formatting script has a bug where it places a space after % in a %prec
declaration (see last commit)
$$ = $2; | ||
} | ||
|
||
/* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
knit-picky: We either use C-style comments with an asterisk *
on each line or C++-style single line comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
*nit-pick ;)
Done.
src/parser/bison_parser.y
Outdated
query_term: | ||
query_primary | ||
; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
knit-picky: we usually put short declarations on a single line. I also see no reason to have a non-terminal that only forwards to another one.
src/parser/bison_parser.y
Outdated
subquery: | ||
'(' query_expression ')' | ||
{ | ||
$$ = $2; | ||
} | ||
; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We prefer the following style:
subquery: | |
'(' query_expression ')' | |
{ | |
$$ = $2; | |
} | |
; | |
subquery: '(' query_expression ')' { | |
$$ = $2; | |
}; |
(or even a one-liner)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
test/queries/queries-good.sql
Outdated
((SELECT 1)) UNION ((SELECT 1)); | ||
(((SELECT 1)) UNION ((SELECT 1))); | ||
((((SELECT 1)) UNION ((SELECT 1)))); | ||
# JOIN# JOIN |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# JOIN# JOIN | |
# JOIN |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, fixed.
@@ -18,7 +18,35 @@ SELECT * FROM t WHERE a = ? AND b = ?; | |||
SELECT City.name, Product.category, SUM(price) FROM fact INNER JOIN City ON fact.city_id = City.id INNER JOIN Product ON fact.product_id = Product.id GROUP BY City.name, Product.category; | |||
SELECT SUBSTR(a, 3, 5) FROM t; | |||
SELECT * FROM t WHERE a = DATE '1996-12-31'; | |||
# JOIN | |||
SELECT 1 UNION SELECT 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You added a lot of things that I would love to see tested more properly. Could you please add more complex examples with actual subqueries? I would also be interested in some edge cases that are not on the happy path, e.g., in queries-bad.sql
. I also encourage you to think of a thorough test case in select_tests.cpp
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, I actually broke union-in-subquery syntax. I added a new commit that addresses it.
76d498f
to
3595906
Compare
- Renamed most rules starting with select_ to the names used in the SQL standard, <query expression>, <query expression body>, <query term>, <query primary> and <subquery>, respectively. - Added %expect 0 - The previous rule for UNION/INTERSECT syntax was right-recursive, and so the parse tree is right-deep. This patch only changes the rule, you don't want right-recursion in an LALR parser, and moreover it's not standard. It doesn't change the parse tree structure, so we need an extra wrinkle to build the parse tree top-down. - Indentation follows the most common standard, found in MySQL and Postgres. Future work: INTERSECT has higher priority than UNION in the standard, but these rules give them the same priority.
refactoring, we now have to add only a single rule to support them all. Quite surprising, there are not shift/reduce conflicts, which are known to occur in both the MySQL and Postgres parsers for this construction.
syntax that worked. A union without parentheses was actually working. Now we're making the whole thing work. This introduces an ambiguity, though, so we fix that with a %prec declaration.
3595906
to
04fe8df
Compare
@dey4ss Conflicts resolved, please feel free to start the CI workflow. |
Bump. |
Hi @dinodork, we currently lack the time to review this PR properly. We would like to merge it, but not without proper reviewing and that might take some more time. Please bear with us. |
The SQL standard allows parentheses around all query expressions (SELECTs) in a UNION. The Bison parser cannot currently handle queries like
(SELECT 1) UNION (SELECT 1)
, let alone((SELECT 1) UNION (SELECT 1))
.