Releases: bxparks/bigquery-schema-generator
1.6.1 - fix amnesia during multiple type mismatches
- 1.6.1 (2024-01-12)
    - Bug Fix: Prevent amnesia that causes multiple type mismatch warnings
        - If a data set contains multiple records with a column whose types do
          not match each other, then the old code would remove the corresponding
          internal `schema_entry` for that column, and print a warning message.
        - This means that subsequent records would recreate the `schema_entry`,
          and a subsequent mismatch would print another warning message.
        - This also meant that if there was a second record after the most
          recent mismatch, the script would output a schema entry for the
          mismatching column, corresponding to the type of the last record which
          was not marked as a mismatch.
        - The fix is to use a tombstone entry for the offending column, instead
          of deleting the `schema_entry` completely. Only a single warning
          message is printed, and the column is ignored for all subsequent
          records in the input data set.
        - See [Issue#98](https://github.com/bxparks/bigquery-schema-generator/issues/98),
          which identified this problem. It seems to have existed from the
          very beginning.
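The tombstone idea described in the 1.6.1 notes can be illustrated with a small, self-contained sketch. This is not the project's code: `TOMBSTONE`, `infer_type()` and `deduce_types()` are hypothetical names, the type rules are deliberately simplified, and only flat records are handled.

```python
# Illustrative sketch only; all names here are hypothetical.
TOMBSTONE = object()  # marks a column that has been permanently rejected


def infer_type(value):
    """Map a Python value to a simplified type name."""
    if isinstance(value, bool):
        return "BOOLEAN"
    if isinstance(value, int):
        return "INTEGER"
    if isinstance(value, float):
        return "FLOAT"
    return "STRING"


def deduce_types(records):
    """Return (schema, warnings) for a list of flat dict records."""
    schema = {}
    warnings = []
    for line_no, record in enumerate(records, start=1):
        for key, value in record.items():
            new_type = infer_type(value)
            old_type = schema.get(key)
            if old_type is TOMBSTONE:
                continue  # column already rejected: ignore it silently
            if old_type is None:
                schema[key] = new_type
            elif old_type != new_type:
                # Keep a tombstone instead of deleting the entry, so that
                # later records cannot recreate the column.
                schema[key] = TOMBSTONE
                warnings.append(
                    f"line {line_no}: ignoring field '{key}' "
                    f"({old_type} vs {new_type})"
                )
    final = {k: t for k, t in schema.items() if t is not TOMBSTONE}
    return final, warnings


records = [
    {"id": 1, "x": 1},
    {"id": 2, "x": "a"},  # mismatch: INTEGER vs STRING
    {"id": 3, "x": 2},    # would recreate 'x' if the entry had been deleted
    {"id": 4, "x": "b"},  # would warn a second time without the tombstone
]
schema, warnings = deduce_types(records)
print(schema)
print(len(warnings))
```

With the tombstone in place, the mismatching column `x` is dropped from the result and only one warning is collected, however many later records disagree.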
1.6.0 - allow NULLABLE to convert to REPEATED; add input_format=csvdictreader
- 1.6.0 (2023-04-01)
    - Allow `null` fields to convert to `REPEATED` because `bq load` seems
      to interpret null fields to be equivalent to an empty array `[]`.
      See #90.
    - Add `input_format='csvdictreader'` option. Similar to `'dict'` but
      intended to be used with the `csv.DictReader` class to read CSV and TSV
      files with various options. More documentation and discussions at:
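For readers unfamiliar with `csv.DictReader`, the sketch below shows the kind of rows it produces from CSV and TSV input, using only the Python standard library. The sample data is made up, and handing these rows to `SchemaGenerator` is intentionally not shown here; the project's README is the place to check for the exact call.

```python
import csv
import io

# Made-up sample data, held in memory so the example is self-contained.
csv_text = "name,count\nalpha,1\nbeta,2\n"
tsv_text = "name\tcount\nalpha\t1\n"

# Default dialect: comma-separated values, first row used as field names.
csv_rows = list(csv.DictReader(io.StringIO(csv_text)))

# TSV is read by passing a different delimiter to the same class.
tsv_rows = list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))

print(csv_rows[0])
print(tsv_rows[0])
```

Every value comes back as a string (`'1'`, not `1`), so any type inference has to work from the text of each field.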
1.5.1 - add examples; update documentation
- 1.5.1 (2022-12-04)
    - Add `examples/*.py` to demonstrate how to use `SchemaGenerator` as a
      library.
    - Update README.md to state that `bq load --autodetect` uses the first
      500 records. Previously, it scanned only the first 100 records.
    - This is a maintenance release with no new features or bug fixes.
v1.5 - add --preserve_input_sort_order flag
- 1.5 (2021-11-14)
    - Make the column order in the BQ schema file match the order of appearance
      in the JSON data file using the `--preserve_input_sort_order` flag.
      Thanks to kdeggelman@ in PR#75.
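The difference between "order of appearance" and a sorted column order can be seen with plain Python. This is only a sketch of the concept, assuming the alternative to input order is an alphabetical sort; it does not call the tool itself, and the sample record is made up.

```python
import json

# One made-up line of newline-delimited JSON.
line = '{"zebra": 1, "apple": "x", "mango": 2.5}'
record = json.loads(line)

# json.loads() returns a dict, which keeps keys in order of appearance.
input_order = list(record)

# Sorting the keys discards the original order.
sorted_order = sorted(record)

print(input_order)
print(sorted_order)
```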
v1.4.1 - add documentation for input_format='dict'
- 1.4.1 (2021-08-23)
    - Add documentation for the `input_format='dict'` option.
    - Add additional input format 'json' and 'dict' test cases.
    - Maintenance release, no functional change in core code.
v1.4 - input_format can be an internal Python dict; support scientific floating point numbers
- 1.4 (2020-12-09)
    - Add 'dict' as a third `input_format` when `SchemaGenerator` is used as a
      library. This can be useful when the data has already been transformed
      into a list of native Python `dict` objects (see #58, thanks to
      ZiggerZZ@).
    - Expand the pattern matchers for quoted integers and quoted floating point
      numbers to be more compatible with the patterns recognized by
      `bq load --autodetect`.
    - Add Table of Contents to README.md. Add usage info for the
      `schema_map=existing_schema_map` and the `input_format='dict'` parameters
      in the `SchemaGenerator()` constructor.
1.3 - support an existing schema file
1.2 - print JSON full path in error messages
- 1.2 (2020-10-27)
    - Print full path of nested JSON elements in error messages (See #52;
      thanks abroglesc@).
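What a "full path" for a nested JSON element might look like can be sketched as follows. The helper `iter_paths()` is hypothetical, and the dotted separator is an assumption made for illustration; the actual message format used by the tool is not stated in these notes.

```python
def iter_paths(record, prefix=""):
    """Yield (full_path, value) pairs for every leaf of a nested dict."""
    for key, value in record.items():
        # Join parent and child names with a dot (an assumed convention).
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested objects, carrying the path along.
            yield from iter_paths(value, path)
        else:
            yield path, value


record = {"user": {"address": {"zip": 12345}}, "id": 1}
paths = list(iter_paths(record))
print(paths)
```

Reporting `user.address.zip` rather than just `zip` tells the reader which nested element a message refers to when the same field name appears at several levels.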
1.1 - Add `--ignore_invalid_lines` flag
- 1.1 (2020-07-10)
    - Add `--ignore_invalid_lines` to ignore parsing errors on invalid lines
      and continue processing. Fixes #49.
    - Add GitHub actions for automated tests and flake8 validation.
    - Add package `__version__` string.
    - Update setup.py, no longer need to convert README.md markdown to RST.
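The behavior described for `--ignore_invalid_lines` (skip lines that fail to parse and keep going) can be sketched with the standard `json` module. The function `read_records()` and its parameter are hypothetical stand-ins written for this example, not the tool's implementation.

```python
import json


def read_records(lines, ignore_invalid_lines=False):
    """Parse newline-delimited JSON, optionally skipping bad lines."""
    records = []
    errors = []
    for line_no, line in enumerate(lines, start=1):
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError as exc:
            if not ignore_invalid_lines:
                raise  # strict mode: stop at the first bad line
            # Lenient mode: remember the problem and continue processing.
            errors.append(f"line {line_no}: {exc.msg}")
    return records, errors


lines = ['{"a": 1}', '{broken', '{"a": 2}']
records, errors = read_records(lines, ignore_invalid_lines=True)
print(records)
print(len(errors))
```

Collecting the errors instead of discarding them keeps the skipped lines visible to the caller, who can still decide whether the input is usable.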
1.0 - fix sanitize_name, add continuous integration
- 1.0 (2020-04-04)
    - Fix `--sanitize_names` for recursive RECORD fields (Thanks riccardomc@,
      see #43).
    - Clean up how unit tests are run, trying my best to figure out
      Python's convoluted package importing mechanism.
    - Add GitHub Actions continuous integration pipelines with flake8 checks and
      automated unit testing.