Releases: spotify/scio
v0.2.9
v0.2.8
"Consummatum est"
Notable changes:
- Add `@description` for type-safe BigQuery #340 (see wiki)
- Support custom KryoRegistrar #343 (see FAQ)
- Bump Dataflow Java SDK version to 1.8.1
- Bump bigtable version to 0.9.4 (#362)
- Check and warn about Scio version #335
- Expose ScioResult metrics (#344)
- Hadoop dist cache should skip GCS upload with direct runner #143
- Bug fixes #364 #368
v0.2.7
"Crescat scientia vita excolatur"
Notable changes:
- Allow user to set job name with dataflow runner #341
- Delay creating local output directories until after `sc.close()` #321
- Support custom IO in JobTest #317
`ScioResult#saveMetrics` now saves Dataflow service metrics in addition to accumulators and other basic job metrics.
`BigQueryType.fromTable` now keeps the pre-formatted table string in the generated companion object so that the table name can be formatted at runtime. This is useful, for example, for BigQuery tables partitioned by date, where a fixed date is used in the annotation to supply the schema but the table actually queried at runtime is different. This brings it to parity with `BigQueryType.fromQuery`.
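The note above describes runtime formatting only in prose, so here is a plain-Scala sketch of the idea, assuming the table spec is a printf-style template with a `%s` placeholder for the date. The project, dataset and table names are hypothetical, and the sketch does not touch Scio's generated companion object, whose member names are not given here.

```scala
// Hypothetical date-partitioned table template (placeholder names).
val tableTemplate = "my-project:my_dataset.events_%s"

// A fixed date, as would be used in the annotation to supply the schema.
val schemaTable = tableTemplate.format("20161001")

// At runtime, the same template is formatted with the date actually processed.
def tableFor(date: String): String = tableTemplate.format(date)

println(schemaTable)          // my-project:my_dataset.events_20161001
println(tableFor("20161015")) // my-project:my_dataset.events_20161015
```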
`ScioContext#applyTransform` and `SCollection#applyOutputTransform` have been replaced with `ScioContext#customInput` and `SCollection#saveAsCustomOutput`.
v0.2.6
"Sensu lato"
Notable changes:
- Upgrade Dataflow Java SDK to 1.8.0
- Save Protobuf generic schema as Avro metadata #265
- Update BigQuery types, including better timestamp support #293
- Improve JobTest error message for missing test wiring and `materialize` handling #316
- Improve metrics API #296 #306
- Add SideInOutExample and test #313
- Fix `@BigQueryType.toTable` for case classes with optional arguments #311
- Fix `BigtableUtil.updateNumberOfBigtableNodes` #309
- Refactor HDFS #301
`SCollection#saveAsProtobufFile` now saves the protobuf-generic schema in Avro metadata. The schema and protobuf binary records can be read with `proto-tools` in gcs-tools. See wiki for details.
`BigQueryType` now supports standard SQL types including `DATE`, `TIME`, and `DATETIME`. See ScalaDoc for details.
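Unlike `TIMESTAMP`, which is an absolute point in time, BigQuery's `DATE`, `TIME` and `DATETIME` types carry no time zone. The sketch below illustrates that distinction with `java.time`, chosen here only for readability; the Scala types Scio actually maps these columns to are documented in its ScalaDoc and may differ. The literal values are made up.

```scala
import java.time.{LocalDate, LocalDateTime, LocalTime}

// Civil (zone-less) values, analogous to BigQuery DATE, TIME and DATETIME.
val date: LocalDate = LocalDate.parse("2016-10-01")                      // DATE
val time: LocalTime = LocalTime.parse("12:34:56.789")                    // TIME
val dateTime: LocalDateTime = LocalDateTime.parse("2016-10-01T12:34:56") // DATETIME

// A DATETIME is just a DATE plus a TIME of day, with no offset attached.
println(dateTime.toLocalDate == date) // true
```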
v0.2.5
"Imperium in imperio"
Notable changes:
- Joins now favor large LHS #290 #287 #284
- Expose Scala and Scio version in PipelineOptions #281
- Add `saveMetrics` #286
- Update to bigtable 0.9.3 #288
The biggest change in this release is improved join behavior and performance. Inner, left and outer joins now perform better when the left-hand side is larger. For example, `a` should be larger than `b` in `a.join(b)`, and `size(a) >= size(b) >= size(c) >= size(d)` in `MultiJoin(a, b, c, d)`. For more information, please see the FAQ.
v0.2.4
v0.2.3
"Aurea mediocritas"
Notable changes:
- Bump Dataflow Java SDK to 1.7.0 #253
- Update Datastore API to V1 #262
  The current v1beta2 API is being turned off on Sep 30, 2016. Please upgrade ASAP.
- Add compression to Avro sink #260
  Avro file output now defaults to the "deflate" codec with level=6.
- Use InProcessPipelineRunner #170 #246
  Local and test pipelines will now run with multiple threads.
- Add `debug` to `SCollection` #248
- Fix some edge cases with BigQueryType #251 #255
- Fix threading issue with Avro Tap #259
- Fix some flaky Algebird tests #264
v0.2.2
"Intelligenti pauca"
Notable changes:
- Support type-safe BigQuery in IntelliJ IDEA through scio-idea-plugin (see wiki)
- Add more ScalaTest matchers #233
- Add namespace option to Datastore IO #230
- Add function trait to companion objects generated by `BigQueryType.toTable` #234
- Fix ScalaCheck tests and race condition in JobTest #239
- Fix TableRow serialization
- Update java-lsh to 0.10 #231
v0.2.1
"Sedes incertae"
Notable changes:
- Scio now detects the BigQuery staging dataset location #226
  The `-Dbigquery.staging_dataset.location` flag is removed.
- Bump Bigtable dependency to 0.9.2, which fixes a bug that caused some rows to be read twice #223
- `@BigQueryType` must now be declared inside a class or object #227
  This is in preparation for further macro improvements.
- Add `allowedTimestampSkew` to `timestampBy` #219
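As we understand the Dataflow SDK's rule, a DoFn may move an element's timestamp later freely, but not earlier than its current timestamp minus the DoFn's allowed timestamp skew (zero by default); the new `allowedTimestampSkew` argument to `timestampBy` relaxes that bound. The hypothetical `isAllowed` function below models the check in plain Scala with millisecond timestamps; it is not SDK code.

```scala
// Hypothetical model of the skew check, not Dataflow SDK code.
// Timestamps and skew are in milliseconds.
def isAllowed(inputTsMillis: Long, newTsMillis: Long, allowedSkewMillis: Long): Boolean =
  // Rejected only if the new timestamp falls before (input - skew).
  newTsMillis >= inputTsMillis - allowedSkewMillis

val inputTs = 10000L
println(isAllowed(inputTs, 15000L, 0L))   // true: moving later is always allowed
println(isAllowed(inputTs, 9000L, 0L))    // false: 1 s earlier with no skew
println(isAllowed(inputTs, 9000L, 2000L)) // true: 1 s earlier is within a 2 s skew
```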
v0.2.0
"Nulli secundus"
Notable changes:
- Support both legacy and SQL syntax for BigQuery #218
  Scio now supports the new SQL 2011 syntax in BigQuery. This is backwards compatible, as Scio detects whether a query string is legacy or SQL. However, the new SQL syntax works better with type-safe BigQuery and is recommended.
- Add helpers to simplify accumulator API #215
  Some convenience methods were added to improve accumulator usability. See AccumulatorExample and AccumulatorSCollectionTest for more.