Search for large data sources for performance profiling #975
aaronsteers
started this conversation in
Ideas
Replies: 2 comments 7 replies
-
Something in the 20 GB range; show extraction record by record, with SDK-based batching, and with native batching on both sides. This seems like a good resource: https://www.reddit.com/r/bigquery/wiki/datasets
5 replies
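As a rough illustration of what such a benchmark could measure, here is a minimal sketch comparing record-by-record writes against batched writes. The function names and the in-memory `sink` are hypothetical stand-ins for a real tap/target pair, not anything from the SDK itself:

```python
import time

def load_per_record(records, sink):
    """Write one record at a time (simulates record-by-record extraction)."""
    for rec in records:
        sink.append(rec)

def load_batched(records, sink, batch_size=10_000):
    """Accumulate records and flush in batches (simulates SDK-side batching)."""
    batch = []
    for rec in records:
        batch.append(rec)
        if len(batch) >= batch_size:
            sink.extend(batch)
            batch.clear()
    if batch:
        sink.extend(batch)

def timed(fn, records):
    """Run a loader against a fresh sink and return (record_count, seconds)."""
    sink = []
    start = time.perf_counter()
    fn(records, sink)
    return len(sink), time.perf_counter() - start

records = [{"id": i} for i in range(100_000)]
n1, t1 = timed(load_per_record, records)
n2, t2 = timed(load_batched, records)
```

With a real 20 GB source the interesting numbers would be `t1` vs. `t2` per loading strategy, not the in-memory timings here.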
-
Would getting good metrics logging in the SDK help with this? I imagine it would at least make it possible to compare record-by-record vs. batch extraction by looking at a record-count timeseries in e.g. Prometheus. Backpressure, for example, would become apparent.
2 replies
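To make the metrics idea concrete, here is a minimal sketch of a per-stream record counter. The `RecordCounter` class and `emit_record` hook are hypothetical; with the real `prometheus_client` library you would use a labeled `Counter` and scrape it, and backpressure would show up as a flattening rate in the resulting timeseries:

```python
from collections import defaultdict

class RecordCounter:
    """Minimal in-process stand-in for a Prometheus counter, labeled by stream."""
    def __init__(self):
        self.counts = defaultdict(int)

    def inc(self, stream):
        self.counts[stream] += 1

counter = RecordCounter()

def emit_record(stream, record, counter=counter):
    # ... serialize the record as a Singer RECORD message here ...
    counter.inc(stream)

for i in range(1000):
    emit_record("users", {"id": i})
```

Sampling `counter.counts` on an interval gives the record-count timeseries described above.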
-
Wanted to open this discussion to locate large data sources for performance benchmarking.
Related to:
A few known options to kick off the discussion:
- tap-socrata to access one or more large datasets in the public domain.
- tap-smoke-test to fabricate large randomized datasets on-demand.
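The tap-smoke-test approach can be sketched with a plain generator: fabricate records on demand rather than materializing a large source file. This is a simplified illustration in the same spirit, not tap-smoke-test's actual implementation; the field names are made up:

```python
import random
import string

def fake_records(n, seed=42):
    """Yield n deterministic pseudo-random records, one at a time,
    so arbitrarily large datasets never need to fit in memory."""
    rng = random.Random(seed)
    for i in range(n):
        yield {
            "id": i,
            "name": "".join(rng.choices(string.ascii_lowercase, k=12)),
            "amount": round(rng.uniform(0, 10_000), 2),
        }

# Stream a sample through without holding the full dataset in memory:
sample = list(fake_records(10))
```

Because the generator is seeded, repeated benchmark runs see identical data, which keeps timings comparable across SDK versions.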