Exported ZetaSQL changes.
- Added support for [SQL pipe syntax](https://research.google/pubs/pub1005959/).
- Improved `execute_query` with an interactive web UI and more functionality.
- Added new and improved SQL language features.
- Improved documentation.

GitOrigin-RevId: 88446a33c3a4498dab3f5cf2a1fe92c9f56d9723
Change-Id: Ia8ba13c3131dfc37b8ca9c60c9fd92752eac941d
ZetaSQL Team authored and KimiWaRokkuWoKikanai committed Aug 15, 2024
1 parent f6df697 commit f30c319
Showing 554 changed files with 95,737 additions and 16,427 deletions.
6 changes: 6 additions & 0 deletions .bazelrc
@@ -51,3 +51,9 @@ build:g++ --cxxopt=-Wno-class-memaccess
build:g++ --cxxopt=-Wno-deprecated-declarations
# For string_fortified
build:g++ --cxxopt=-Wno-stringop-truncation

# C++17 is required to build ZetaSQL, hence `--cxxopt=-std=c++17`. On MacOS
# `--host_cxxopt=-std=c++17` is also needed.
build --cxxopt=-std=c++17 --host_cxxopt=-std=c++17
run --cxxopt=-std=c++17 --host_cxxopt=-std=c++17
test --cxxopt=-std=c++17 --host_cxxopt=-std=c++17
2 changes: 1 addition & 1 deletion .bazelversion
@@ -1 +1 @@
6.2.0
6.5.0
25 changes: 23 additions & 2 deletions Dockerfile
@@ -11,7 +11,7 @@ RUN apt-get update && apt-get -qq install -y default-jre default-jdk
RUN apt-get update && apt-get -qq install curl tar build-essential wget \
python python3 zip unzip

ENV BAZEL_VERSION=6.2.0
ENV BAZEL_VERSION=6.5.0

# Install bazel from source
RUN mkdir -p bazel && \
@@ -38,10 +38,31 @@ RUN add-apt-repository ppa:ubuntu-toolchain-r/test && \
--slave /usr/bin/g++ g++ /usr/bin/g++-11 && \
update-alternatives --set gcc /usr/bin/gcc-11


# To support file names with non-ASCII characters
RUN apt-get -qq install locales && locale-gen en_US.UTF-8
ENV LANG en_US.UTF-8

COPY . /zetasql

# Create a new user zetasql to avoid running as root.
RUN useradd -ms /bin/bash zetasql
RUN chown -R zetasql:zetasql /zetasql
USER zetasql

ENV BAZEL_ARGS="--config=g++"

# Pre-build the binary for execute_query so that users can try out zetasql
# directly. Users can modify the target in the docker file or enter the
# container and build other targets as needed.
RUN cd zetasql && \
CC=/usr/bin/gcc CXX=/usr/bin/g++ \
bazel build ${BAZEL_ARGS} ...
bazel build ${BAZEL_ARGS} -c opt //zetasql/tools/execute_query:execute_query

# Create a shortcut for execute_query.
ENV HOME=/home/zetasql
RUN mkdir -p $HOME/bin
RUN ln -s /zetasql/bazel-bin/zetasql/tools/execute_query/execute_query $HOME/bin/execute_query
ENV PATH=$PATH:$HOME/bin

WORKDIR /zetasql
221 changes: 170 additions & 51 deletions README.md
@@ -1,79 +1,198 @@
## ZetaSQL - Analyzer Framework for SQL

ZetaSQL defines a language (grammar, types, data model, and semantics) as well
as a parser and analyzer. It is not itself a database or query engine. Instead
it is intended to be used by multiple engines wanting to provide consistent
behavior for all semantic analysis, name resolution, type checking, implicit
casting, etc. Specific query engines may not implement all features in the
ZetaSQL language and may give errors if specific features are not supported. For
example, engine A may not support any updates and engine B may not support
analytic functions.

[ZetaSQL Language Guide](docs/README.md)

[ZetaSQL ResolvedAST API](docs/resolved_ast.md)

[ZetaSQL BigQuery Analysis Example](https://github.com/GoogleCloudPlatform/professional-services/tree/main/tools/zetasql-helper)

## Status of Project and Roadmap

This codebase is being open sourced in multiple phases:

1. Parser and Analyzer **Complete**
2. Reference Implementation **In Progress**
- Base capability **Complete**
- Function library **In Progress**
3. Compliance Tests **Complete**
- includes framework for validating compliance of arbitrary engines
4. Misc tooling
- Improved Formatter **Complete**
ZetaSQL defines a SQL language (grammar, types, data model, semantics, and
function library) and
implements parsing and analysis for that language as a reusable component.
ZetaSQL is not itself a database or query engine. Instead,
it's intended to be used by multiple engines, to provide consistent
language and behavior (name resolution, type checking, implicit
casting, etc.). Specific query engines may implement a subset of features,
giving errors for unsupported features.
ZetaSQL's compliance test suite can be used to validate that query engine
implementations are correct and consistent.

The ZetaSQL language is used across several of
Google's SQL products, both publicly and internally, including BigQuery,
Spanner, F1, BigTable, Dremel, Procella, and others.

ZetaSQL has been described in these publications:

* (CDMS 2022) [ZetaSQL: A SQL Language as a Component](https://cdmsworkshop.github.io/2022/Slides/Fri_C2.5_DavidWilhite.pptx) (Slides)
* (SIGMOD 2017) [Spanner: Becoming a SQL System](https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/46103.pdf) -- See section 6.
* (VLDB 2024) [SQL Has Problems. We Can Fix Them: Pipe Syntax in SQL](https://research.google/pubs/pub1005959/) -- Describes ZetaSQL's new pipe query syntax.

Some other documentation:

* [ZetaSQL Language Reference](docs/README.md)
* [ZetaSQL Resolved AST](docs/resolved_ast.md), documenting the intermediate representation produced by the ZetaSQL analyzer.
* [ZetaSQL Toolkit](https://github.com/GoogleCloudPlatform/zetasql-toolkit), a project using ZetaSQL to analyze and understand queries against BigQuery, and other ZetaSQL engines.

## Project Overview

The main components and APIs are in these directories under `zetasql/`:

* `zetasql/public`: Most public APIs are here.
* `zetasql/resolved_ast`: Defines the [Resolved AST](docs/resolved_ast.md), which the analyzer produces.
* `zetasql/parser`: The grammar and parser implementation. (Semi-public, since the parse trees are not a stable API.)
* `zetasql/analyzer`: The internal implementation of query analysis.
* `zetasql/reference_impl`: The reference implementation for executing queries.
* `zetasql/compliance`: Compliance test framework and compliance tests.
* `zetasql/public/functions`: Function implementations for engines to use.
* `zetasql/tools/execute_query`: Interactive query execution for debugging.
* `zetasql/java/com/google/zetasql`: Java APIs, implemented by calling a local RPC server.

Multiplatform support is planned for the following platforms:

- Linux (Ubuntu 20.04 is our reference platform, but others may work).
- gcc-9+ is required, recent versions of clang may work.
- MacOS (Experimental)
- Windows (version TBD)

We do not provide any guarantees of API stability and *cannot accept
contributions*.

## Running Queries with `execute_query`

The `execute_query` tool can parse, analyze and run SQL
queries using the reference implementation.

See [Execute Query](execute_query.md) for more details on using the tool.

You can run it using binaries from
[Releases](https://github.com/google/zetasql/releases), or build it using the
instructions below.

There are some runnable example queries in
[tpch examples](../zetasql/examples/tpch/README.md).

### Getting and Running `execute_query`
#### Pre-built Binaries

ZetaSQL provides pre-built binaries for `execute_query` for Linux and MacOS on
the [Releases](https://github.com/google/zetasql/releases) page. You can run
the downloaded binary like:

```bash
./execute_query_linux --web
```

Note that the pre-built binaries require GCC-9+ and tzdata. If you run into dependency
issues, you can try running `execute_query` with Docker. See the
[Run with Docker](#run-with-docker) section.

#### Running from a bazel build

You can build `execute_query` with Bazel from source and run it by:

```bash
bazel run zetasql/tools/execute_query:execute_query -- --web
```

#### Run with Docker

You can run `execute_query` using Docker. First download the pre-built Docker
image `zetasql` or build your own from Dockerfile. See the instructions in the
[Build With Docker](#build-with-docker) section.

Assuming your Docker image name is MyZetaSQLImage, run:

## Flags
ZetaSQL uses the Abseil [Flags](https://abseil.io/blog/20190509-flags) library
to handle commandline flags. Unless otherwise documented, all flags are for
debugging purposes only and may change, stop working or be removed at any time.
```bash
sudo docker run --init -it -h=$(hostname) -p 8080:8080 MyZetaSQLImage execute_query --web
```

Argument descriptions:

* `--init`: Allows `execute_query` to handle signals properly.
* `-it`: Runs the container in interactive mode.
* `-h=$(hostname)`: Makes the hostname of the container the same as that of the
host.
* `-p 8080:8080`: Sets up port forwarding.

`-h=$(hostname)` and `-p 8080:8080` together make the URL address of the
web server accessible from the host machine.

Alternatively, you can run this to start a bash shell, and then run
`execute_query` inside:

```bash
sudo docker run --init -it -h=$(hostname) -p 8080:8080 MyZetaSQLImage

# Inside the container bash shell
execute_query --web
```

## How to Build

ZetaSQL uses [bazel](https://bazel.build) for building and dependency
resolution. After installing bazel (check .bazelversion for the specific version
of bazel we test with, but other versions may work), simply run:
### Build with Bazel

ZetaSQL uses [Bazel](https://bazel.build) for building and dependency
resolution. Instructions for installing Bazel can be found at
https://bazel.build/install. The Bazel version that ZetaSQL uses is specified in
the `.bazelversion` file.
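
As a quick sanity check before building, the installed Bazel version can be compared against the pinned one. The following is a minimal sketch and not part of ZetaSQL: `check_bazel_version` is a hypothetical helper name, and the usage line assumes that `bazel --version` prints a line of the form `bazel 6.5.0`.

```bash
# Hypothetical helper (not shipped with ZetaSQL): compares the version pinned
# in a .bazelversion-style file ($1) with an installed version string ($2).
check_bazel_version() {
  # Strip the trailing newline and any stray whitespace from the pinned version.
  pinned="$(tr -d '[:space:]' < "$1")"
  installed="$2"
  if [ "$pinned" = "$installed" ]; then
    echo "ok: Bazel $installed matches $1"
  else
    echo "mismatch: $1 pins Bazel $pinned, but found $installed"
    return 1
  fi
}

# Usage from the repository root, assuming bazel is on PATH:
#   check_bazel_version .bazelversion "$(bazel --version | awk '{print $2}')"
```

A mismatch does not necessarily break the build, but matching the pinned version is the safest starting point.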

Besides Bazel, the following dependencies are also needed:

* GCC-9+ or equivalent Clang
* tzdata

`tzdata` provides the support for time zone information. It is generally
available on MacOS. If you run Linux and it is not pre-installed, you can
install it with `apt-get install tzdata`.
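
The two dependencies above can be checked with a short script. This is a sketch under stated assumptions rather than an official check: `gcc_version_ok` and `tzdata_present` are hypothetical helper names, and `/usr/share/zoneinfo` is assumed to be where time zone data is installed, which is typical but not guaranteed on every system.

```bash
# Hypothetical helper: succeeds if the given compiler version string
# (for example "11.4.0" or "9") has a major version of at least 9.
gcc_version_ok() {
  major="${1%%.*}"
  [ "$major" -ge 9 ] 2>/dev/null
}

# Hypothetical helper: succeeds if a zoneinfo directory exists.
# The default path is an assumption; pass another path to override it.
tzdata_present() {
  [ -d "${1:-/usr/share/zoneinfo}" ]
}

if command -v gcc >/dev/null 2>&1; then
  if gcc_version_ok "$(gcc -dumpversion)"; then
    echo "gcc: ok"
  else
    echo "gcc: older than 9"
  fi
else
  echo "gcc: not found"
fi

if tzdata_present; then
  echo "tzdata: ok"
else
  echo "tzdata: not found"
fi
```

The script only reports what it finds and does not install anything.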

Once the dependencies are installed, you can build or run ZetaSQL targets as
needed, for example:

```bash
# Build everything.
bazel build ...

# Build and run the execute_query tool.
bazel run //zetasql/tools/execute_query:execute_query -- --web

# The built binary can be found under bazel-bin and run directly.
bazel-bin/zetasql/tools/execute_query/execute_query --web

# Build and run a test.
bazel test //zetasql/parser:parser_set_test
```

Some Mac users may experience build issues due to the Python error
`ModuleNotFoundError: no module named 'google.protobuf'`. To resolve it, run
`pip install protobuf==<version>` to install python protobuf. The protobuf
version can be found in the `zetasql_deps_step_2.bzl` file.

### Build with Docker

ZetaSQL also provides a `Dockerfile` which configures all the dependencies so
that users can build ZetaSQL more easily across different platforms.

To build the Docker image locally (called MyZetaSQLImage here), run:

```bash
sudo docker build . -t MyZetaSQLImage -f Dockerfile
```

```bazel build ...```
Alternatively, ZetaSQL provides pre-built Docker images named `zetasql`. See the
[Releases](https://github.com/google/zetasql/releases) page. You can load the
downloaded image by:

If your Mac build fails due the python error
`ModuleNotFoundError: no module named 'google.protobuf'`, run
`pip install protobuf==<version>` to install python protobuf first. The
protobuf version can be found in the zetasql_deps_step_2.bzl file.
```bash
sudo docker load -i /path/to/the/downloaded/zetasql_docker.tar
```

## How to add as a Dependency in bazel
See the (WORKSPACE) file, as it is a little unusual.
To run builds or other commands inside the Docker environment, run this command
to open a bash shell inside the container:

### With docker
TODO: Add docker build instructions.
```bash
# Start a bash shell running inside the Docker container.
sudo docker run -it MyZetaSQLImage
```

## Example Usage
A very basic command line tool is available to run simple queries with the
reference implementation:
```bazel run //zetasql/tools/execute_query:execute_query -- "select 1 + 1;"```
Then you can run the commands from the [Build with Bazel](#build-with-bazel)
section above.

The reference implementation is not yet completely released and currently
supports only a subset of functions and types.

## Differential Privacy
For questions, documentation and examples of ZetaSQLs implementation of
For questions, documentation, and examples of ZetaSQL's implementation of
Differential Privacy, please check out
(https://github.com/google/differential-privacy).

53 changes: 53 additions & 0 deletions bazel/boost.BUILD
@@ -0,0 +1,53 @@
# Copyright 2024 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

licenses(["notice"]) # Apache v2.0


load("@rules_foreign_cc//foreign_cc:defs.bzl", "boost_build")

filegroup(
name = "all_srcs",
srcs = glob(["**"]),
visibility = ["//visibility:private"],
)

boost_build(
name = "boost",
bootstrap_options = ["--without-icu"],
lib_source = ":all_srcs",
out_static_libs = select({
"//conditions:default": [
"libboost_atomic.a",
"libboost_filesystem.a",
"libboost_program_options.a",
"libboost_regex.a",
"libboost_system.a",
"libboost_thread.a",
],
}),
user_options = [
"-j4",
"--with-filesystem",
"--with-program_options",
"--with-regex",
"--with-system",
"--with-thread",
"variant=release",
"link=static",
"threading=multi",
],
visibility = ["//visibility:public"],
)