Releases: spotify/scio

v0.2.9

09 Jan 22:45

"Hoc tempore atque nunc et semper"

Notable changes:

  • Upgrade Dataflow SDK to 1.9.0 (release notes)
  • Add TensorFlow TFRecord file IO #382
  • Improve DistCache API and test #386
  • Add typed BigQuery taps #381
  • Skip nulls in BigQueryTable.toTableRow #388
  • BigQuery DateTime format fix #377

v0.2.8

20 Dec 21:02

"Consummatum est"

Notable changes:

  • Add @description for type-safe BigQuery #340 (see wiki)
  • Support custom KryoRegistrar #343 (see FAQ)
  • Bump Dataflow Java SDK version to 1.8.1
  • Bump Bigtable version to 0.9.4 (#362)
  • Check and warn about Scio version #335
  • Expose ScioResult metrics (#344)
  • Hadoop distributed cache now skips GCS upload with the direct runner #143
  • Bug fixes #364 #368

v0.2.7

21 Nov 17:44

"Crescat scientia vita excolatur"

Notable changes:

  • Allow users to set the job name with the Dataflow runner #341
  • Delay creating local output directories until after sc.close() #321
  • Support custom IO in JobTest #317

ScioResult#saveMetrics now saves Dataflow service metrics in addition to accumulators and other basic job metrics.

BigQueryType.fromTable now keeps the pre-formatted table string in the generated companion object so that the table name can be formatted at runtime. This is useful for tables partitioned by date, for example, where a fixed date in the annotation supplies the schema but the table actually queried at runtime is different. This brings fromTable to parity with BigQueryType.fromQuery.
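A minimal plain-Scala sketch of the idea, assuming a hypothetical date-sharded table: the table spec is kept as a format string, a fixed date fills it in when only the schema is needed, and the real date is substituted at runtime. `TableNameSketch` and the project, dataset, and table names are made up for illustration; this is not the code the macro generates.

```scala
// Hypothetical illustration: keep the table spec unformatted so it can be filled in later.
object TableNameSketch {
  // Pre-formatted spec, analogous to the string passed to the annotation.
  val tableTemplate: String = "my-project:my_dataset.events_%s"

  // Fixed value used only to locate a table whose schema describes all partitions.
  val schemaDate: String = "20170101"

  // Table used for the schema.
  def schemaTable: String = tableFor(schemaDate)

  // Table actually queried at runtime, e.g. for today's partition.
  def tableFor(date: String): String = tableTemplate.format(date)
}
```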

ScioContext#applyTransform and SCollection#applyOutputTransform have been replaced with ScioContext#customInput and SCollection#saveAsCustomOutput.

v0.2.6

28 Oct 15:22

"Sensu lato"

Notable changes:

  • Upgrade Dataflow Java SDK to 1.8.0
  • Save Protobuf generic schema as Avro metadata #265
  • Update BigQuery types, including better timestamp support #293
  • Improve JobTest error message for missing test wiring and materialize handling #316
  • Improve metrics API #296 #306
  • Add SideInOutExample and test #313
  • Fix @BigQueryType.toTable for case classes with optional arguments #311
  • Fix BigtableUtil.updateNumberOfBigtableNodes #309
  • Refactor HDFS #301

SCollection#saveAsProtobufFile now saves protobuf-generic schema in Avro metadata. The schema and protobuf binary records can be read with proto-tools in gcs-tools. See wiki for details.

BigQueryType now supports standard SQL types including DATE, TIME, and DATETIME. See ScalaDoc for details.

v0.2.5

15 Oct 00:42

"Imperium in imperio"

Notable changes:

  • Joins now favor large LHS #290 #287 #284
  • Expose Scala and Scio version in PipelineOptions #281
  • Add saveMetrics #286
  • Update to bigtable 0.9.3 #288

The biggest change is to join behavior and performance. Inner, left, and outer joins now perform better when the left-hand side is larger. For example, a should be larger than b in a.join(b), and size(a) >= size(b) >= size(c) >= size(d) in MultiJoin(a, b, c, d). For more information, see this FAQ.
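To illustrate why the ordering can matter, here is a minimal in-memory inner join in plain Scala, with no Scio or Dataflow dependency. It assumes a strategy in which the right-hand side is buffered per key while the left-hand side is streamed once, so the smaller dataset belongs on the right. It shows the general principle only; Scio's real joins run on a distributed cogroup and their internals may differ. `JoinSketch` and `join` are hypothetical names.

```scala
// Hypothetical sketch of a join that buffers the RHS and streams the LHS.
object JoinSketch {
  def join[K, A, B](lhs: Seq[(K, A)], rhs: Seq[(K, B)]): Seq[(K, (A, B))] = {
    // Buffer the (ideally smaller) right-hand side, grouped by key.
    val rhsByKey: Map[K, Seq[B]] =
      rhs.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2) }

    // Stream the (ideally larger) left-hand side once, pairing each value
    // with every buffered right-hand value for the same key.
    for {
      (k, a) <- lhs
      b <- rhsByKey.getOrElse(k, Seq.empty)
    } yield (k, (a, b))
  }
}
```

Keys present on only one side produce no output, as expected for an inner join.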

v0.2.4

07 Oct 17:09

"Ab imo pectore"

Notable changes:

  • Add collect() to SCollection #268
  • Upgrade Algebird to 0.12.2 #269
  • KryoAtomicCoder is now thread local #274
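Kryo instances are not thread-safe, so a common way to use one serializer from many threads is to give each thread its own instance. The sketch below shows that pattern with java.lang.ThreadLocal. `ThreadLocalSketch` and `Serializer` are hypothetical, with `Serializer` standing in for a Kryo instance; this is not KryoAtomicCoder's actual code.

```scala
// Hypothetical sketch of the thread-local pattern for a non-thread-safe helper.
object ThreadLocalSketch {
  // Stand-in for Kryo: holds mutable state, so it must not be shared across threads.
  final class Serializer {
    private val buffer = new StringBuilder

    def serialize(s: String): String = {
      buffer.setLength(0)
      buffer.append(s.length).append(':').append(s)
      buffer.toString
    }
  }

  // Each thread lazily creates its own Serializer on first access.
  private val local: ThreadLocal[Serializer] =
    ThreadLocal.withInitial(() => new Serializer)

  def instance: Serializer = local.get()

  def serialize(s: String): String = instance.serialize(s)
}
```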

v0.2.3

19 Sep 21:00

"Aurea mediocritas"

Notable changes:

  • Bump Dataflow Java SDK to 1.7.0 #253
  • Update Datastore API to V1 #262
    The current v1beta2 API is being turned off on Sep 30, 2016. Please upgrade ASAP.
  • Add compression to Avro sink #260
    Avro file output now defaults to "deflate" codec with level=6
  • Use InProcessPipelineRunner #170 #246
    Local and test pipelines will now run with multiple threads
  • Add debug to SCollection #248
  • Fix some edge cases with BigQueryType #251 #255
  • Fix threading issue with Avro Tap #259
  • Fix some flaky Algebird tests #264
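The Avro specification defines the deflate codec as raw DEFLATE (RFC 1951) data. To show what the level setting controls, here is a sketch that compresses bytes with the JDK's java.util.zip.Deflater at level 6 in raw (nowrap) mode. It does not use Avro or Scio, and `DeflateSketch` is a hypothetical name. Levels run from 0 (no compression) to 9 (best compression); 6 is zlib's default trade-off between speed and size.

```scala
import java.io.ByteArrayOutputStream
import java.util.zip.Deflater

// Hypothetical sketch: raw DEFLATE compression at a configurable level (default 6).
object DeflateSketch {
  def compress(input: Array[Byte], level: Int = 6): Array[Byte] = {
    // nowrap = true emits raw DEFLATE without the zlib header and checksum.
    val deflater = new Deflater(level, true)
    try {
      deflater.setInput(input)
      deflater.finish()
      val out = new ByteArrayOutputStream()
      val buf = new Array[Byte](4096)
      // Drain compressed output until the deflater reports it is done.
      while (!deflater.finished()) {
        val n = deflater.deflate(buf)
        out.write(buf, 0, n)
      }
      out.toByteArray
    } finally {
      deflater.end() // release native zlib resources
    }
  }
}
```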

v0.2.2

08 Sep 21:59

"Intelligenti pauca"

Notable changes:

  • Support type-safe BigQuery in IntelliJ IDEA through scio-idea-plugin (see wiki)
  • Add more ScalaTest matchers #233
  • Add namespace option to Datastore IO #230
  • Add function trait to companion object from BigQueryType.toTable #234
  • Fix ScalaCheck tests and race condition in JobTest #239
  • Fix TableRow serialization
  • Update java-lsh to 0.10 #231

v0.2.1

30 Aug 18:05

"Sedes incertae"

Notable changes:

  • Scio now detects BigQuery staging dataset location. #226
    The -Dbigquery.staging_dataset.location flag is removed.
  • Bump Bigtable dependency to 0.9.2, which fixes a bug that caused some rows to be read twice #223
  • @BigQueryType must now be declared inside a class or object #227
    This is in preparation for further macro improvements.
  • Add allowedTimestampSkew to timestampBy #219

v0.2.0

23 Aug 20:37

"Nulli secundus"

Notable changes:

  • Support both legacy and standard SQL syntax for BigQuery #218
    Scio now supports the new SQL 2011 syntax in BigQuery. This is backwards compatible, as Scio detects whether a query string is legacy or standard SQL. However, the new SQL syntax works better with type-safe BigQuery and is recommended.
  • Add helpers to simplify accumulator API #215
    Some convenience methods were added to improve accumulator usability. See AccumulatorExample and AccumulatorSCollectionTest for more.