You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Together these pages describe the syntax of the SAMPLE and TABLESAMPLE clauses in detail.
However, it does not precisely explain where the SAMPLE or TABLESAMPLE clause fits syntactically inside the query as a whole.
The sample clause is applied right after anything in the FROM clause (i.e., after any joins, but before the WHERE clause or any aggregates).
But from what I understand now, the term "applied" does not refer to the syntactical placement but to the effect of its operation.
I think the documentation would be improved if it would more explicitly explain whether it is talking about the order of operation vs the syntactical position.
The TABLESAMPLE and USING SAMPLE clauses are identical in terms of syntax and effect, with one important difference: tablesamples sample directly from the table for which they are specified, whereas the sample clause samples after the entire from clause has been resolved.
Here, just like in the earlier explanation of the effect of the SAMPLE clause, this text seems to be more about the order of operations than about the syntactical position. In retrospect I understand now that the term "identical in terms of syntax" is about the internal syntax of the clause itself, and not about the syntactical position in the statement as a whole. I think the documentation would be improved again by making that distinction explicit.
Proposed changes:
From what I gather now, it looks like the syntactical placement of the SAMPLE-clause is after the GROUP BY-clause, but before the ORDER BY clause. I think this should be stated explicitly right before or right after the syntax graph of the clause itself.
For TABLESAMPLE, the placement is after the table expression to which it applies. If a table alias is present, the TABLESAMPLE-clause should follow after the alias. I think it would be good to state this explicitly when describing the TABLESAMPLE clause.
Right now the documentation seems to use the term "SAMPLE-clause" throughout, and its not always clear whether that refers to USING SAMPLE, TABLESAMPLE, or both. I think it would be an improvement to explicitly talk about either the USING SAMPLE-clause, or the TABLESAMPLE-clause. In those cases where you just in general want to talk about the effect of such clauses, the uncapitalized term "sampling" or "sampling operation" could be used.
The syntax diagram in the FROM-clause documentation (https://duckdb.org/docs/sql/query_syntax/from) explicitly describes the position of the TABLESAMPLE clause, but it shows it only for the first table expression in the FROM-clause, not for any Joined table expressions.
The effect of REPEATABLE sub-clause and the significance of the seed option appear not to be documented. There are a few examples of how it is used in the general feature overview page, but there is no explanation of how it works, what types of values are appropriate for the seed parameter, and how the choice of the seed parameter affects the result.
This is in relation to the issueI filed earlier: duckdb/duckdb#14773
I have read both the SQL Query syntax page on the SAMPLE clause (https://duckdb.org/docs/sql/query_syntax/sample.html) as well as the general feature overview page on samples (https://duckdb.org/docs/sql/samples.html).
Together these pages describe the syntax of the SAMPLE and TABLESAMPLE clauses in detail.
However, it does not precisely explain where the SAMPLE or TABLESAMPLE clause fits syntactically inside the query as a whole.
At first, I thought that this piece of text in the opening paragraph from https://duckdb.org/docs/sql/query_syntax/sample explained that:
But from what I understand now, the term "applied" does not refer to the syntactical placement but to the effect of its operation.
I think the documentation would be improved if it would more explicitly explain whether it is talking about the order of operation vs the syntactical position.
There is currently some text in https://duckdb.org/docs/sql/samples.html#table-samples, that reads:
Here, just like in the earlier explanation of the effect of the SAMPLE clause, this text seems to be more about the order of operations than about the syntactical position. In retrospect I understand now that the term "identical in terms of syntax" is about the internal syntax of the clause itself, and not about the syntactical position in the statement as a whole. I think the documentation would be improved again by making that distinction explicit.
Proposed changes:
From what I gather now, it looks like the syntactical placement of the SAMPLE-clause is after the GROUP BY-clause, but before the ORDER BY clause. I think this should be stated explicitly right before or right after the syntax graph of the clause itself.
For TABLESAMPLE, the placement is after the table expression to which it applies. If a table alias is present, the TABLESAMPLE-clause should follow after the alias. I think it would be good to state this explicitly when describing the TABLESAMPLE clause.
Right now the documentation seems to use the term "SAMPLE-clause" throughout, and its not always clear whether that refers to USING SAMPLE, TABLESAMPLE, or both. I think it would be an improvement to explicitly talk about either the USING SAMPLE-clause, or the TABLESAMPLE-clause. In those cases where you just in general want to talk about the effect of such clauses, the uncapitalized term "sampling" or "sampling operation" could be used.
The syntax diagram in the FROM-clause documentation (https://duckdb.org/docs/sql/query_syntax/from) explicitly describes the position of the TABLESAMPLE clause, but it shows it only for the first table expression in the FROM-clause, not for any Joined table expressions.
The effect of
REPEATABLE
sub-clause and the significance of theseed
option appear not to be documented. There are a few examples of how it is used in the general feature overview page, but there is no explanation of how it works, what types of values are appropriate for the seed parameter, and how the choice of the seed parameter affects the result.Page URL: https://duckdb.org/docs/sql/query_syntax/sample.html and https://duckdb.org/docs/sql/samples.html
The text was updated successfully, but these errors were encountered: