Add BigQueryReader.to_recap
Recap can now convert BigQuery table schemas to Recap StructTypes.

Some notes:

1. Nested types are supported (RECORD, STRUCT)
2. Repeated fields (ARRAYs) are supported
3. JSON types are treated as STRING
4. STRING and BYTES have a max length of 65KiB by default

For (4), I had a hard time nailing down the exact maximum string size in BigQuery.
[This](https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types)
page says strings and bytes carry a 2-byte header, which I assume stores the
byte length. A 2-byte length tops out at 65,535 bytes, so I went with 65KiB.
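
The four notes above can be sketched roughly as follows. This is an illustration only, not the code in this commit: the schema is modeled with plain dicts whose keys mirror BigQuery field attributes (`name`, `field_type`, `mode`, `fields`, `max_length`), the output dicts are a hypothetical stand-in for Recap types, and `DEFAULT_MAX_BYTES = 65_536` is an assumed value for the "65KiB" default. Handling of NULLABLE versus REQUIRED is left out.

```python
from typing import Any

# Assumed stand-in for the "65KiB" default; the exact constant is not shown here.
DEFAULT_MAX_BYTES = 65_536

# Hypothetical scalar mapping; only a handful of BigQuery types are covered.
SCALARS = {
    "INTEGER": {"type": "int", "bits": 64},
    "INT64": {"type": "int", "bits": 64},
    "FLOAT": {"type": "float", "bits": 64},
    "FLOAT64": {"type": "float", "bits": 64},
    "BOOLEAN": {"type": "bool"},
    "BOOL": {"type": "bool"},
}


def field_to_type(field: dict[str, Any]) -> dict[str, Any]:
    """Convert one BigQuery-style field description into a type dict."""
    kind = field["field_type"].upper()
    if kind in ("RECORD", "STRUCT"):
        # Note 1: nested types recurse into a struct.
        converted = schema_to_struct(field.get("fields", []))
    elif kind in ("STRING", "JSON"):
        # Notes 3 and 4: JSON is treated as a string; length falls back to the default.
        converted = {"type": "string", "bytes": field.get("max_length") or DEFAULT_MAX_BYTES}
    elif kind == "BYTES":
        converted = {"type": "bytes", "bytes": field.get("max_length") or DEFAULT_MAX_BYTES}
    elif kind in SCALARS:
        converted = dict(SCALARS[kind])
    else:
        raise ValueError(f"Unsupported field type in this sketch: {kind}")
    if field.get("mode", "NULLABLE").upper() == "REPEATED":
        # Note 2: repeated fields become a list of the element type.
        converted = {"type": "list", "values": converted}
    converted["name"] = field["name"]
    return converted


def schema_to_struct(fields: list[dict[str, Any]]) -> dict[str, Any]:
    """Convert a list of field descriptions into a struct type dict."""
    return {"type": "struct", "fields": [field_to_type(f) for f in fields]}
```

Calling `schema_to_struct` on a schema with a `JSON` column and a repeated `STRING` column yields a string type with the default byte limit for the former and a list of strings for the latter.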

I'm also using [bigquery-emulator](https://github.com/goccy/bigquery-emulator)
to test. I've only included integration tests.
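
For anyone running the integration tests locally, a client can be pointed at the emulator along these lines. This is a sketch rather than the test code in this commit: it assumes the `google-cloud-bigquery` package is installed, that the emulator listens on port 9050 as in the CI service definition, and that the project is `test_project`. The helper names `emulator_endpoint` and `make_emulator_client` are hypothetical.

```python
from typing import Optional


def emulator_endpoint(host: str = "localhost", port: int = 9050) -> str:
    """Build the HTTP endpoint for a locally running bigquery-emulator."""
    return f"http://{host}:{port}"


def make_emulator_client(project: str = "test_project", endpoint: Optional[str] = None):
    """Create a BigQuery client that talks to the emulator instead of GCP."""
    # Imported lazily so the helper above works without the Google libraries.
    from google.api_core.client_options import ClientOptions
    from google.auth.credentials import AnonymousCredentials
    from google.cloud import bigquery

    return bigquery.Client(
        project=project,
        client_options=ClientOptions(api_endpoint=endpoint or emulator_endpoint()),
        # The emulator does not check credentials, so anonymous ones suffice.
        credentials=AnonymousCredentials(),
    )
```

With the emulator container up, `make_emulator_client()` should return a client whose table lookups go to `test_project` on the emulator.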

Closes #285
criccomini committed Jul 19, 2023
1 parent 24c97c4 commit 3eed5df
Showing 5 changed files with 769 additions and 4 deletions.
13 changes: 11 additions & 2 deletions .github/workflows/ci.yaml
@@ -53,7 +53,7 @@ jobs:
       - name: Test with pytest
         run: |
           source venv/bin/activate
-          pdm run pytest tests/unit
+          pdm run pytest tests/unit -vv
   integration-tests:
     runs-on: ubuntu-latest
@@ -74,6 +74,15 @@ jobs:
           --health-timeout 5s
           --health-retries 5
+      # TODO Waiting on https://github.com/goccy/bigquery-emulator/pull/208
+      bigquery:
+        image: ghcr.io/criccomini/bigquery-emulator:0.4.3-envvar
+        env:
+          BIGQUERY_EMULATOR_PROJECT: test_project
+          BIGQUERY_EMULATOR_DATASET: test_dataset
+        ports:
+          - 9050:9050
+
       zookeeper:
         image: confluentinc/cp-zookeeper:latest
         ports:
@@ -142,4 +151,4 @@ jobs:
       - name: Test with pytest
         run: |
           source venv/bin/activate
-          pdm run pytest tests/integration
+          pdm run pytest tests/integration -vv