Releases: google/weather-tools
v0.3.2
`weather-mv` is much faster and now supports ingestion of Cloud-Optimized GeoTIFFs (COGs). `weather-dl` now supports MARS syntax in JSON config files and caps the maximum number of workers.
We're happy to welcome @deepgabani8 to the weather-tools dev team!
Current Status
`weather-dl`: Fixes and parser system improvements
- Fixed an error when parsing newline-separated date values.
- JSON config files now support MARS syntax.
- New syntax supported: users can now specify MARS range syntax in reverse order as well (e.g. 2020-01-01/to/2018-01-01/by/-1).
- Prevent quota exhaustion: given the downloader's current approach, the maximum number of workers is now capped at N, i.e. the number of possible simultaneous requests plus a fudge factor.
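To illustrate the reverse-order range syntax, here is a minimal sketch of how a date range such as 2020-01-03/to/2020-01-01/by/-1 could be expanded. The helper name `expand_mars_date_range` is hypothetical and this is not the weather-dl parser itself; it assumes ISO dates and a step measured in days.

```python
from datetime import date, timedelta
from typing import List


def expand_mars_date_range(spec: str) -> List[str]:
    """Expand 'START/to/END' or 'START/to/END/by/STEP' into ISO date strings.

    Hypothetical helper for illustration only; STEP is assumed to be in days.
    """
    parts = spec.split("/")
    if len(parts) not in (3, 5) or parts[1].lower() != "to":
        raise ValueError(f"not a range expression: {spec!r}")
    start = date.fromisoformat(parts[0])
    end = date.fromisoformat(parts[2])

    step = 1
    if len(parts) == 5:
        if parts[3].lower() != "by":
            raise ValueError(f"expected 'by' in range expression: {spec!r}")
        step = int(parts[4])
    if step == 0:
        raise ValueError("step must be non-zero")

    dates = []
    current = start
    # A negative step walks backwards, so the stop condition flips.
    while (step > 0 and current <= end) or (step < 0 and current >= end):
        dates.append(current.isoformat())
        current += timedelta(days=step)
    return dates
```

A range whose direction disagrees with the sign of the step (for example a later start date with a positive step) yields an empty list in this sketch.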
`weather-mv`: Performance improvements and support for COGs ingestion
- Substantial performance improvement (faster row extraction, #146).
- Added a flag to control in-memory copying of the dataset. By default the dataset is opened in memory; users can disable this by passing the `--disable_in_memory_copy` flag.
- Added validation to alert the user earlier that the BigQuery table and temp location (cloud bucket) need to be in the same region. Users can skip this validation by passing the `-s, --skip-region-validation` flag.
- Added support for ingestion of COGs into BigQuery.
- Updated the tool's documentation (README.md) to remove duplicate flags from the sample examples.
General
- Fixed typo in contribution guide (CONTRIBUTING.md).
What's Changed
- Validates non-compatible regions scenarios in weather_mv tool. by @mahrsee1997 in #155
- Changes for #75, CLI-support for user controlled in-memory copy operation by @ksic8 in #154
- Candidate implementation to speed up row extraction. by @alxmrs in #146
- Fixed typo in CONTRIBUTING.md by @mahrsee1997 in #156
- Fix parsing newline separated date values by @mahrsee1997 in #161
- Support ingestion of COGs into BigQuery by @mahrsee1997 in #158
- DL: Add support for reverse order in MARS range syntax by @deepgabani8 in #162
- DL: Add support for MARS syntax in JSON config by @deepgabani8 in #163
- Removed duplicate flag from sample examples given in Readme.md of weather_mv tool. by @mahrsee1997 in #168
- Cap max number of workers in weather-dl. by @mahrsee1997 in #170
New Contributors
- @deepgabani8 made their first contribution in #162
Full Changelog: v0.3.1...v0.3.2
v0.3.1
Improvements to the `weather-dl` parser.
- Fixed a bug where numbers with leading zeros were not parsed (useful for date ranges).
- Corrected an additional issue with singleton partition values (e.g. getting only one day of every month).
- New syntax added: users can now specify `day=all` to get all the days in a month.
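As a rough illustration of what `day=all` means for a given month, the sketch below expands it into zero-padded day values. The function name `expand_days` and the "/"-separated list form are assumptions made for this example, not the actual weather-dl implementation.

```python
import calendar
from typing import List


def expand_days(year: int, month: int, day_value: str) -> List[str]:
    """Return zero-padded day strings for a config value (hypothetical helper).

    'all' expands to every day of the given month; otherwise the value is
    treated as a '/'-separated list such as '01/15'.
    """
    if day_value.strip().lower() == "all":
        # monthrange returns (weekday of the first day, number of days).
        _, days_in_month = calendar.monthrange(year, month)
        return [f"{day:02d}" for day in range(1, days_in_month + 1)]
    # int() accepts leading zeros, so '01' and '1' are treated alike.
    return [f"{int(day):02d}" for day in day_value.split("/")]
```

Using the calendar module keeps leap years correct, so February 2020 expands to 29 days and February 2021 to 28.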
What's Changed
- New Syntax for download configs: `day=all` by @alxmrs in #150
- Proper handling of singleton partition dimensions. by @alxmrs in #151
- Incrementing weather-dl version to cover recent parser changes. by @alxmrs in #153
Full Changelog: v0.3.0...v0.3.1
v0.3.0
The weather splitter has a new API that allows for partitioning weather data by any dimension (we intentionally exclude lat/lngs). `weather-dl` now has a simpler, more Pythonic interface for expressing target paths. The `weather-mv` tool now supports dry runs and BigQuery geopoints.
We're happy to welcome @mahrsee1997 and @ksic8 to the weather-tools dev team!
Current Status
`weather-dl`: Fixes and DSL usability improvements
- Specifying templates is much simpler. Only `target_path` is needed, and we fully support Python string formatting syntax.
- A significant error was fixed, and downloads now have better skipping and retry logic.
- Log ergonomics were improved by adding timestamps and removing needless warnings (thanks, @pbattaglia!).
- Internal code refactors were included to improve maintainability.
- Data source clients (currently only ECMWF) now include important license information regarding terms of data use.
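To show what Python string formatting in a `target_path` can look like, here is a small sketch built on `str.format`. These notes do not spell out how partition values are mapped onto the template, so the sketch assumes positional fields are filled in partition-key order (named fields also work). The helper `render_target_path`, the bucket name, and the keys are made up for illustration.

```python
from typing import Dict, Sequence


def render_target_path(
    template: str, partition_keys: Sequence[str], values: Dict[str, object]
) -> str:
    """Fill a path template with partition values (hypothetical helper).

    Positional fields ('{}') receive values in partition-key order;
    named fields ('{year}') are looked up by key.
    """
    ordered = [values[key] for key in partition_keys]
    return template.format(*ordered, **values)


# Format specs such as ':02d' come for free from str.format.
example = render_target_path(
    "gs://example-bucket/era5/{}/{:02d}.nc",
    ["year", "month"],
    {"year": 2020, "month": 3},
)
```

Here `example` is `gs://example-bucket/era5/2020/03.nc`.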
`weather-mv`: Schema & usage improvements
- The default schemas were improved to include BigQuery Geography-type columns. Now, lat/lngs will be represented as `POINT`s.
- The weather mover now has dry runs! Users will be able to preview their data ingestion into BigQuery before making use of infrastructure.
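For readers unfamiliar with Geography-type values, the sketch below builds a WKT `POINT` string from a latitude/longitude pair. WKT puts longitude first, and BigQuery expects longitudes in [-180, 180], so values on a 0 to 360 grid are wrapped. The function `to_wkt_point` is a hypothetical helper; these notes do not describe how `weather-mv` builds its points internally.

```python
def to_wkt_point(lat: float, lng: float) -> str:
    """Return a WKT point, 'POINT(lng lat)', for a coordinate pair.

    Hypothetical helper: longitudes are wrapped into [-180, 180) and
    both values are rounded to 6 decimal places.
    """
    lat = float(lat)
    lng = float(lng)
    if not -90.0 <= lat <= 90.0:
        raise ValueError(f"latitude out of range: {lat}")
    # Wrap e.g. 190 degrees east to -170 (170 degrees west).
    wrapped_lng = round(((lng + 180.0) % 360.0) - 180.0, 6)
    return f"POINT({wrapped_lng} {round(lat, 6)})"
```

For example, a grid cell at latitude 45.5 and longitude 190 becomes `POINT(-170.0 45.5)`.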
`weather-sp`: Flexible splits
- A new version of the splitter was introduced to allow for flexible splits of weather data: you can now divide Grib and NetCDF data by any dimension except latitude and longitude (great work, @uhager!).
General
- Pip install instructions include debugging advice for long installs.
- We've removed open meetings from our contributing guide due to low attendance.
What's Changed
- Fixes issue (127) where append_date_dirs wasn't working properly. by @pbattaglia in #130
- Added support for dry-runs to weather-mv tool by @mahrsee1997 in #132
- CONTRIBUTING guide changes for pip install command by @ksic8 in #137
- Code-changes for #49, added S2_LOCATION column by @ksic8 in #133
- License information added in client and documentation by @ksic8 in #136
- Added logger timestamp. by @pbattaglia in #131
- Robust download from clients to VMs. by @alxmrs in #143
- Suppress urllib3 warning by @pbattaglia in #134
- Unscheduling developer meetings. by @alxmrs in #145
- Flexible splits by @uhager in #125
- Changes to make 'target_path' and 'target_filename' compliant with Python's standard string formatting by @mahrsee1997 in #144
- Converted `Config` dict into dataclass by @mahrsee1997 in #142
- Netcdf splits by @uhager in #147
New Contributors
- @mahrsee1997 made their first contribution in #132
- @ksic8 made their first contribution in #137
Full Changelog: v0.2.2...v0.3.0
v0.2.2
v0.2.1
Improvements and bugfixes for all weather tools. `weather-dl` is much faster & more robust. `weather-mv` now uses a pluggable infrastructure, which makes iterations faster. `weather-sp` is mid transition to arbitrary splits.
Thanks to our new OSS contributors, @pranay101 and @pbattaglia!
Current Status
`weather-dl`: Major fixes
- This release introduces a fix for #98, which makes the downloader faster and more robust. With this change, there is no need to override the autoscaling algorithm, so it now has fewer moving parts.
- Uploads have better retry logic. Users should experience fewer crashes from network errors in the pipeline.
- The downloader structure has been refactored to be more testable.
- Examples of JSON configs were added.
- Addressed a critical `NameError` bug that occurred during a refactor.
`weather-mv`: Refactor
- The mover has been refactored to use a pluggable infrastructure. This makes it easier to develop with local runs and to write weather data to destinations other than BigQuery.
`weather-sp`: Skipping logic
- A non-API-changing feature has been added to the splitter: data that has already been split is now skipped. Users can override this behavior with `-f, --force`.
- The documentation for the splitter has been improved.
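The skipping behavior can be pictured with the sketch below, which filters out outputs that already exist unless a force option is set. The function name `select_outputs_to_write` and the injectable `exists` check are assumptions for illustration; the splitter's real logic is not shown in these notes.

```python
import os
from typing import Callable, Iterable, List


def select_outputs_to_write(
    outputs: Iterable[str],
    force: bool = False,
    exists: Callable[[str], bool] = os.path.exists,
) -> List[str]:
    """Return the output paths that still need to be produced.

    Hypothetical helper: with force=True nothing is skipped, mirroring
    the idea behind a '--force' style override.
    """
    if force:
        return list(outputs)
    # Skip anything a previous run has already written.
    return [path for path in outputs if not exists(path)]
```

Passing `exists` as a parameter keeps the sketch testable and lets the same check work for remote storage, where `os.path.exists` would not apply.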
General
The release process now produces smaller binaries (we're now ignoring test data).
What's Changed
- Added example JSON config by @pranay101 in #52
- Refactored weather-mv to work with pluggable Data Sinks. by @alxmrs in #101
- Update weather-sp's templating system to allow users to specify level and shortname. by @alxmrs in #105
- Upload to cloud is robust to socket timeout errors. by @alxmrs in #110
- Fix wrong output file example in weather-sp readme. by @uhager in #111
- Added skipping logic to weather-sp by @alxmrs in #108
- Lower default num-requests for MARS to make it more robust. by @alxmrs in #113
- New data-oriented task distribution strategy. by @alxmrs in #116
- Downloader refactor: extracted out partitioning; tested pipeline args. by @alxmrs in #117
- Fix minor bug: main session needs to be saved. by @alxmrs in #120
- (#123) Fixed beam not being able to access global namespace + minor related bug. by @pbattaglia in #124
- Shrinking the size of the package release artifacts by @alxmrs in #122
New Contributors
- @pranay101 made their first contribution in #52
- @pbattaglia made their first contribution in #124
Full Changelog: v0.2.0...v0.2.1
v0.2.0
New version of `weather-sp`. Fixes and improvements to `weather-dl` and `weather-mv`.
Thanks to our volunteer open source contributors and Google 20%ers!
Current State
All three tools are still in their beta and alpha stages. In this release, the stability of `weather-mv` was especially improved. We've been able to execute streaming ingestion of Grib data into BigQuery. Users of `weather-sp` will now have greater control to express the output location of split files through a file pattern template.
`weather-dl`: Minor fixes
- We fixed intermittent GCS timeout issues.
- An issue with mandatory partition keys was fixed.
`weather-mv`: Major fixes for tool stability
- Grib support added.
- Row extraction is faster by loading weather data into memory.
- Log messages were improved.
- Writes to BigQuery will use the most efficient method (streaming vs file upload).
- XArray Open step is made generic.
- Several fixes were introduced.
- JSON serialization fixes.
- The Dataflow environment now has ecCodes installed so we can run cfgrib.
- Tarballs are smaller / faster to upload to Dataflow (or another Beam runner).
- BigQuery write errors were fixed.
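The JSON serialization fixes concern values that the standard `json` module cannot encode on its own, such as timedeltas and numpy scalar types. A minimal sketch of one way to handle them is a `default` hook, shown below. The helper names are hypothetical, and coercing a timedelta to seconds is an assumption, since these notes only say it is coerced to a float.

```python
import datetime
import json


def to_json_serializable(value):
    """Fallback encoder for values json.dumps cannot handle (hypothetical)."""
    if isinstance(value, datetime.timedelta):
        # Assumption: represent durations as a float number of seconds.
        return value.total_seconds()
    if hasattr(value, "item"):
        # numpy scalars expose .item(), which returns a plain Python value.
        return value.item()
    raise TypeError(f"cannot serialize {type(value).__name__}")


def dumps_row(row: dict) -> str:
    """Serialize one row, applying the fallback only where needed."""
    return json.dumps(row, default=to_json_serializable, sort_keys=True)
```

The hook is called only for objects the encoder does not already understand, so ordinary strings, ints, and floats are unaffected.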
`weather-sp`: New version
The splitter now supports flexible specification of output files.
General project improvements
- Documentation was groomed.
- Windows developer pathway was documented.
- Fixed developer scripts (we can now better dev-test different branches of the project) and slow CI.
- Announced open developer meetings.
What's Changed
- `weather-dl`: Fix GCS timeout issues the pipelines intermittently experiences. by @alxmrs in #72
- Improve grib file processing speed by @pramodg in #74
- Default behavior is better by @lakshmanok in #77
- Updating script to use new package name by @CillianFn in #79
- Better progress logs for `weather-mv`. by @alxmrs in #82
- `weather-mv` fix: Serializing all numpy float and int types to JSON. by @alxmrs in #83
- Documented windows workaround. by @alxmrs in #85
- Updated `weather-mv` install process to setup ecCodes on worker machine. by @alxmrs in #86
- Groomed documentation by @alxmrs in #88
- Coercing timedelta to float by @alxmrs in #89
- `weather-mv`: Allow users to pass in keyword arguments to xarray.open_dataset by @alxmrs in #87
- weather-splitter: allow for more flexible output files by @uhager in #65
- Fix slow test runs by @CillianFn in #92
- Add check for partition_keys when using append_date_dirs by @CillianFn in #90
- Exclude test data from tarball by @CillianFn in #93
- `weather-mv` – Fixed error writing to BigQuery: Excluding non-coordinate indexes if they don't appear in the Schema by @alxmrs in #95
- Updating tool versions in prep for release. by @alxmrs in #97
- Announcing open developer meetings. by @alxmrs in #96
New Contributors
Full Changelog: v0.1.1...v0.2.0
Hotfix for issue found in `weather-mv`.
What's Changed
Full Changelog: v0.1.0...v0.1.1
Initial Release of weather-tools
The inaugural release of `weather-tools`.
Current State
Currently, there are three tools in development: `weather-dl`, `weather-mv`, and `weather-sp`. The first tool is in its beta stage, and the latter two are in alpha. Since this is the start of the project's changelog, I will now quickly summarize the features of each tool:
`weather-dl`: the Weather Downloader
Weather Downloader ingests weather data to cloud buckets.
- Downloads weather data from ECMWF through their MARS and CDS APIs.
- Supports pipeline Dry-runs.
- Downloads are filesystem agnostic. Data can be ingested to GCS, S3, Azure Blobstore, or a local filesystem.
- Manifests of downloads are recorded in Firebase.
- A ConfigParser-based DSL lets users select data to download and control how data is sharded in a general manner.
`weather-mv`: the Weather Mover
Weather Mover loads weather data from cloud storage into Google BigQuery.
- Weather data from any filesystem can be uploaded in batch to Google BigQuery.
- Both NetCDF and Grib data are explicitly supported. Later, any XArray-readable dataset will be supported.
- All rows include an "import time" to keep track of when the data was ingested.
- Weather data can be filtered by geographic area or by variable type.
- Supports inference of the BigQuery schema from parts of the dataset.
- Streaming pipelines for ingesting real-time data into BigQuery are supported.
`weather-sp`: the Weather Splitter
Splits NetCDF and Grib files into several files by variable.
- NetCDF and Grib data splitting is supported.
- Grib data is split by variable and leveltype.
- Buckets with mixtures of data types (Grib and NetCDF) can be processed at once.
- The root of the output path is computed for you; users have control over the parent directory.
- Dry-runs of splits are supported.
Recent Changes
- Adding back an example config. by @alxmrs in #30
- Handle NaNs in data by @pramodg in #33
- Add utf-8 encoding to file read in setup by @CillianFn in #36
- Bump urllib3 from 1.25.11 to 1.26.5 in /weather_dl by @dependabot in #37
- Support un-indexed / single valued coordinates. by @pramodg in #39
- Set up empty dataset, not table by @lakshmanok in #41
- Docs fix - typo & stale links by @CillianFn in #44
- Basic support for grib files. by @pramodg in #40
- Test example configs by @CillianFn in #56
- Read the docs config by @CillianFn in #61
- `weather-mv`: Now using Streaming Inserts into BQ by @alxmrs in #62
- `weather-mv`: Implemented streaming import of data into BigQuery. by @alxmrs in #58
- Added script to help contributors test each other's updates to weather-tools. by @alxmrs in #63
- Github action to publish package by @saveriogzz in #31
- Updated python package name to `google-weather-tools`. by @alxmrs in #67
- Updated the standard example configs to use Reanalyses instead of Ensemble Means. by @alxmrs in #66
- Setting initial versions of each weather-tool. by @alxmrs in #68
New Contributors
- @pramodg made their first contribution in #33
- @CillianFn made their first contribution in #36
- @dependabot made their first contribution in #37
- @lakshmanok made their first contribution in #41
- @saveriogzz made their first contribution in #31
Full Changelog: https://github.com/google/weather-tools/commits/v0.1.0