bxparks / bigquery-schema-generator Public

Releases: bxparks/bigquery-schema-generator

1.6.1 - fix amnesia during multiple type mismatches

12 Jan 23:31

bxparks

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.

GPG key ID: 4AEE18F83AFDEB23

Latest

1.6.1 (2024-01-12)
- Bug Fix: Prevent amnesia that causes multiple type mismatch warnings
  - If a data set contained multiple records whose values for a column
    did not match each other in type, the old code would remove the
    corresponding internal schema_entry for that column and print a
    warning message.
  - This meant that subsequent records would recreate the schema_entry,
    and a subsequent mismatch would print another warning message.
  - This also meant that if there was another record after the most
    recent mismatch, the script would output a schema entry for the
    mismatching column, corresponding to the type of the last record which
    was not marked as a mismatch.
  - The fix is to use a tombstone entry for the offending column instead
    of deleting the schema_entry completely. Only a single warning
    message is printed, and the column is ignored for all subsequent
    records in the input data set.
  - See
    [Issue#98](https://github.com/bxparks/bigquery-schema-generator/issues/98),
    which identified this problem. It seems to have existed from the
    very beginning.
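
The tombstone idea described above can be sketched in a few lines. This is a simplified illustration, not the library's internal code: `TOMBSTONE`, `merge_records`, and the use of Python type names in place of BigQuery types are all assumptions made for the example.

```python
# Hypothetical sketch: names are illustrative, not the library's internals.
TOMBSTONE = object()  # sentinel marking a column that has been dropped for good


def merge_records(records):
    """Infer one type name per column, ignoring columns with conflicting types."""
    schema = {}
    warnings = []
    for line_number, record in enumerate(records, start=1):
        for column, value in record.items():
            new_type = type(value).__name__
            old_type = schema.get(column)
            if old_type is TOMBSTONE:
                # Already reported once; ignore the column from now on.
                continue
            if old_type is None:
                schema[column] = new_type
            elif old_type != new_type:
                # Keep a tombstone instead of deleting the entry, so later
                # records cannot silently recreate it.
                schema[column] = TOMBSTONE
                warnings.append(
                    f"line {line_number}: ignoring '{column}': "
                    f"{old_type} vs {new_type}"
                )
    result = {c: t for c, t in schema.items() if t is not TOMBSTONE}
    return result, warnings
```

With the tombstone in place, a column that flips type several times produces one warning and is left out of the result, instead of reappearing with the type of whichever record came last.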

1.6.0 - allow NULLABLE to convert to REPEATED; add input_format=csvdictreader

01 Apr 16:46

bxparks

1.6.0 (2023-04-01)
- Allow null fields to convert to REPEATED because bq load seems
  to interpret null fields to be equivalent to an empty array [].
  See #90.
- Add input_format='csvdictreader' option. Similar to 'dict' but
  intended to be used with the csv.DictReader class to read CSV and TSV
  files with various options. More documentation and discussions at:
  - SchemaGenerator.deduce_schema() from csv.DictReader,
  - Discussion#91.
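
A minimal sketch of how the 'csvdictreader' option might be used with a tab-separated file. The helper names `open_tsv_reader` and `deduce_schema_from_reader` are hypothetical, and the `SchemaGenerator` calls follow the usage pattern shown in the project README; check them against the installed version before relying on them.

```python
import csv
import io


def open_tsv_reader(text):
    """Return a csv.DictReader configured for tab-separated input."""
    return csv.DictReader(io.StringIO(text), delimiter="\t")


def deduce_schema_from_reader(reader):
    """Pass a csv.DictReader to SchemaGenerator (the package must be installed)."""
    # Imported lazily so the reader helper above works without the package.
    from bigquery_schema_generator.generate_schema import SchemaGenerator

    generator = SchemaGenerator(input_format="csvdictreader")
    schema_map, error_logs = generator.deduce_schema(reader)
    return generator.flatten_schema(schema_map), error_logs
```

The reader is built separately so that any csv.DictReader option (delimiter, quoting, explicit fieldnames) can be set before the rows reach the schema generator.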

1.5.1 - add examples; update documentation

04 Dec 16:10

bxparks

1.5.1 (2022-12-04)
- Add examples/*.py to demonstrate how to use SchemaGenerator as a
  library.
- Update README.md to state that bq load --autodetect uses the first
  500 records. Previously, it scanned only the first 100 records.
- This is a maintenance release with no new features or bug fixes.

v1.5 - add --preserve_input_sort_order flag

14 Nov 16:31

bxparks

1.5 (2021-11-14)
- Make the column order in the BQ schema file match the order of appearance
  in the JSON data file using the --preserve_input_sort_order flag.
  Thanks to kdeggelman@ in PR#75.
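
The difference between the two orderings can be illustrated with a toy function. This is not the library's code: `column_names` is a hypothetical helper, and it assumes that without the flag the columns come out sorted alphabetically.

```python
import json


def column_names(json_lines, preserve_input_sort_order=False):
    """Collect top-level keys from newline-delimited JSON records."""
    seen = {}  # dicts keep insertion order, so this records first appearance
    for line in json_lines:
        for key in json.loads(line):
            seen.setdefault(key, None)
    names = list(seen)
    # Assumed default: alphabetical order when the flag is not set.
    return names if preserve_input_sort_order else sorted(names)
```

For the records `{"zeta": 1, "alpha": 2}` and `{"mid": 3}`, the sorted form lists `alpha`, `mid`, `zeta`, while the preserved form lists `zeta`, `alpha`, `mid`.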

v1.4.1 - add documentation for input_format='dict'

23 Aug 16:52

bxparks

1.4.1 (2021-08-23)
- Add documentation for the input_format='dict' option.
- Add additional input format 'json' and 'dict' test cases.
- Maintenance release, no functional change in core code.

v1.4 - input_format can be an internal Python dict; support scientific floating point numbers

10 Dec 05:27

bxparks

1.4 (2020-12-09)
- Add 'dict' as a third input_format when SchemaGenerator is used as a
  library. This can be useful when the data has already been transformed
  into a list of native Python dict objects (see #58, thanks to
  ZiggerZZ@).
- Expand the pattern matchers for quoted integers and quoted floating point
  numbers to be more compatible with the patterns recognized by bq load --autodetect.
- Add Table of Contents to README.md. Add usage info for the
  schema_map=existing_schema_map and the input_format='dict' parameters
  in the SchemaGenerator() constructor.
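
To give a sense of what matching quoted integers and quoted floating point numbers (including scientific notation) can look like, here is a rough sketch. The two patterns and the `classify_quoted_number` helper are hypothetical; they are not the library's actual regular expressions and are not guaranteed to agree with bq load --autodetect in every case.

```python
import re

# Hypothetical patterns for illustration only.
INTEGER_PATTERN = re.compile(r"[-+]?\d+")
FLOAT_PATTERN = re.compile(r"[-+]?(\d+\.?\d*|\.\d+)([eE][-+]?\d+)?")


def classify_quoted_number(text):
    """Guess a BigQuery type name for the contents of a quoted value."""
    if INTEGER_PATTERN.fullmatch(text):
        return "INTEGER"
    if FLOAT_PATTERN.fullmatch(text):
        return "FLOAT"
    return "STRING"
```

The integer check runs first because every integer string also satisfies the float pattern; values such as `1e5` or `-2.5E-3` fall through to the float branch.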

1.3 - support an existing schema file

05 Dec 18:53

bxparks

1.3 (2020-12-05)
- Allow an existing schema file to be specified using
  --existing_schema_path flag, so that new data can be merged into it.
  See #40, #57, and #61.
  (Thanks to abroglesc@ and bozzzzo@).

1.2 - print JSON full path in error messages

28 Oct 03:46

bxparks

1.2 (2020-10-27)
- Print full path of nested JSON elements in error messages (See #52;
  thanks abroglesc@).
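
One way to produce a full path for a nested element is to carry the parent path through the recursion. The sketch below is an illustration under assumptions: `find_type_conflicts` is a hypothetical helper, the dotted path format is a guess at what a readable message might use, and comparing Python types stands in for the library's own type inference.

```python
def find_type_conflicts(old, new, path=""):
    """Yield dotted paths where two nested records disagree on value type."""
    for key, new_value in new.items():
        full_path = f"{path}.{key}" if path else key
        if key not in old:
            continue
        old_value = old[key]
        if isinstance(old_value, dict) and isinstance(new_value, dict):
            # Descend, passing the accumulated path to the nested level.
            yield from find_type_conflicts(old_value, new_value, full_path)
        elif type(old_value) is not type(new_value):
            yield full_path
```

A conflict three levels deep is then reported as `user.address.zip` rather than just `zip`, which makes it possible to tell apart fields that share a name in different records.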

1.1 - Add `--ignore_invalid_lines` flag

10 Jul 14:48

bxparks

1.1 (2020-07-10)
- Add --ignore_invalid_lines to ignore parsing errors on invalid lines
  and continue processing. Fixes #49.
- Add GitHub actions for automated tests and flake8 validation.
- Add package __version__ string.
- Update setup.py; converting README.md from markdown to RST is no longer
  needed.
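
The behavior behind --ignore_invalid_lines can be sketched as a parse loop that records a failure and moves on instead of aborting. This is a simplified stand-in, not the library's implementation: `parse_json_lines` and the shape of the error log entries are assumptions for the example.

```python
import json


def parse_json_lines(lines, ignore_invalid_lines=False):
    """Parse newline-delimited JSON, optionally skipping lines that fail."""
    records, error_logs = [], []
    for line_number, line in enumerate(lines, start=1):
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError as e:
            if not ignore_invalid_lines:
                raise  # default: stop at the first bad line
            error_logs.append({"line_number": line_number, "msg": str(e)})
    return records, error_logs
```

Collecting the skipped lines in a log, rather than discarding them silently, keeps it possible to see afterwards how much of the input was dropped.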

1.0 - fix sanitize_name, add continuous integration

04 Apr 19:33

bxparks

1.0 (2020-04-04)
- Fix --sanitize_names for recursive RECORD fields (Thanks riccardomc@,
  see #43).
- Clean up how unit tests are run, trying my best to figure out
  Python's convoluted package importing mechanism.
- Add GitHub Actions continuous integration pipelines with flake8 checks and
  automated unit testing.
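
Sanitizing names in nested RECORD fields comes down to applying the same rule at every level of the schema. The sketch below is hypothetical: `sanitize_name` and `sanitize_schema` are illustrative helpers, and the rule used (replace any character outside letters, digits, and underscore with an underscore, then truncate to 128 characters) is an assumption about what --sanitize_names does.

```python
import re


def sanitize_name(name):
    """Assumed rule: invalid characters become '_', then truncate to 128."""
    return re.sub(r"[^a-zA-Z0-9_]", "_", name)[:128]


def sanitize_schema(fields):
    """Return a copy of a schema field list with all names sanitized."""
    sanitized = []
    for field in fields:
        clean = dict(field, name=sanitize_name(field["name"]))
        if field.get("type") == "RECORD":
            # Recurse so names inside nested records are fixed as well.
            clean["fields"] = sanitize_schema(field.get("fields", []))
        sanitized.append(clean)
    return sanitized
```

Without the recursive call, only top-level column names would be cleaned and a nested name such as `first.name` would pass through unchanged.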
