
Releases: Azure/azure-event-hubs-spark

v2.3.5

19 Oct 19:44
1c42c2f

Release notes:

  • IoT Hub system properties are available (see the sketch below)
  • Documentation updates
  • Unit test improvements
  • Various bug fixes. Since the addition of the cached Event Hubs receiver in 2.3.3, there have been some bugs affecting the reliability of the connector. With this latest release, all known bugs have been fixed.
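The IoT Hub system properties surface through the Structured Streaming schema. A minimal sketch, assuming a systemProperties map column as described in the connector documentation; the connection string, names, and paths below are placeholders:

```scala
import org.apache.spark.eventhubs.{ ConnectionStringBuilder, EventHubsConf }
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder.appName("iothub-system-properties").getOrCreate()

// Placeholder: the Event Hubs-compatible endpoint of an IoT Hub.
val connectionString = ConnectionStringBuilder("<Event Hubs-compatible connection string>")
  .setEventHubName("<Event Hubs-compatible name>")
  .build

val ehConf = EventHubsConf(connectionString)

val df = spark.readStream
  .format("eventhubs")
  .options(ehConf.toMap)
  .load()

// systemProperties is assumed to be a map<string,string> column; IoT Hub
// places device metadata there, e.g. the device id under "iothub-connection-device-id".
val withDeviceId = df.select(
  col("systemProperties").getItem("iothub-connection-device-id").as("deviceId"),
  col("body").cast("string").as("body")
)
```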

v2.2.5

19 Oct 20:43

This is identical to v2.3.5. It's only compatible with Spark 2.1 and Spark 2.2 (whereas v2.3.5 is compatible with Spark 2.3 and up).

v2.3.4

11 Sep 18:58

This adds a critical bug fix to 2.3.3. We recommend everyone upgrade as soon as possible. Release notes:

  • Improved exception messaging in the cached receiver
  • Bug fix in the cached receiver

v2.2.4

11 Sep 18:48

This is identical to v2.3.4. It's only compatible with Spark 2.1 and Spark 2.2 (whereas v2.3.4 is compatible with Spark 2.3 and up).

v2.2.3

06 Sep 02:03
a2ccbdc

This is identical to v2.3.3. It's only compatible with Spark 2.1 and Spark 2.2.

v2.3.3

06 Sep 02:04
0733567

This release focused on bug fixes + refactoring/cleanup. Release notes:

  • Cached receiver is now async
  • Various bug fixes in schema type handling, receive calls, and client
  • Properties can now be added to events written through the EventHubsSink (see the sketch below)
  • Refactoring in send calls and management calls
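A rough sketch of the sink usage. The sink reads the event payload from a body column; the properties map column here is an assumption mirroring the source schema, and all connection details and paths are placeholders:

```scala
import org.apache.spark.eventhubs.{ ConnectionStringBuilder, EventHubsConf }
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{ col, lit, map }

val spark = SparkSession.builder.appName("eventhubs-sink-sketch").getOrCreate()

// Placeholder connection details.
val connectionString = ConnectionStringBuilder()
  .setNamespaceName("<namespace>")
  .setEventHubName("<event hub>")
  .setSasKeyName("<key name>")
  .setSasKey("<key>")
  .build

val ehConf = EventHubsConf(connectionString)

// A toy streaming source. The sink reads the payload from the 'body' column;
// the 'properties' map column is an assumption, not confirmed by these notes.
val events = spark.readStream
  .format("rate")
  .option("rowsPerSecond", 5)
  .load()
  .select(col("value").cast("string").as("body"))
  .withColumn("properties", map(lit("source"), lit("spark-demo")))

events.writeStream
  .format("eventhubs")
  .options(ehConf.toMap)
  .option("checkpointLocation", "/tmp/eventhubs-sink-checkpoint") // placeholder path
  .start()
```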

v2.2.2

06 Jul 21:19

Please see v2.3.2 for release notes. This release contains the same changes.

This version is compatible with Spark 2.2 and Spark 2.1. It is not compatible with Spark 2.3.

v2.3.2

05 Jul 21:41
3682b30

Version 2.3.2 of azure-eventhubs-spark_2.11. Release notes:

  • Bug fixes
    • Fixed data loss check in getBatch - excess warnings are no longer printed.
    • Prefetch count can no longer be set below the minimum allowed value.
    • Invalid offsets are detected in translate.
  • Enhancements
    • Cached receivers are now used. This allows receivers to be reused across batches, which dramatically improves receive times. This change
      required a move to epoch receivers, which means each Spark application requires its own consumer group in order to run properly (see the
      sketch after this list).
    • A properties field has been added to the Structured Streaming schema
    • A partition field has been added to the Structured Streaming schema
    • Check for old fromSeqNos in DStreams. If a DStream falls behind (e.g. events expire from the Event Hub before they are consumed by Spark),
      then Spark will move to the earliest valid event. Previously, the job would fail in this case.
    • Add retry mechanism for certain exceptions thrown in getRunTimeInfo. Exceptions are retried if the exception (or the inner exception) is
      an EventHubException and getIsTransient returns true.
    • All service API calls are now made asynchronously
    • A ForeachWriter implementation has been added. This ForeachWriter uses asynchronous sends and performs much better than the existing
      EventHubsSink.
    • translate has been optimized
    • Javadocs have been added
    • Only the necessary configs are sent to executors (the unneeded ones are trimmed by the driver before they're sent over)
    • ConnectionString validation added in EventHubsConf
    • Improved error messaging
    • All singletons (ClientConnectionPool and CachedEventHubsReceiver) use globally valid keys.
  • Various documentation updates
  • Various unit tests have been added
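Because the cached receivers are epoch receivers, the practical implication is one consumer group per Spark application. A minimal sketch of configuring that (all names and connection details are placeholders), which also selects the newly added partition and properties columns:

```scala
import org.apache.spark.eventhubs.{ ConnectionStringBuilder, EventHubsConf }
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("eventhubs-consumer-group").getOrCreate()

// Placeholder connection details.
val connectionString = ConnectionStringBuilder()
  .setNamespaceName("<namespace>")
  .setEventHubName("<event hub>")
  .setSasKeyName("<key name>")
  .setSasKey("<key>")
  .build

// Epoch receivers are exclusive within a consumer group, so give every
// Spark application a consumer group of its own (placeholder name below).
val ehConf = EventHubsConf(connectionString)
  .setConsumerGroup("spark-app-1")

val df = spark.readStream
  .format("eventhubs")
  .options(ehConf.toMap)
  .load()

// 'partition' and 'properties' are part of the Structured Streaming schema
// as of this release.
val slim = df.select("partition", "properties", "body")
```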

v2.3.1

23 Mar 03:32
ca17f58

Release notes:

  • Translate method optimization for quicker startup
  • Default starting position is EndOfStream
  • Added Receiver Identifier for improved tracing
  • Moved to latest Event Hubs Java Client
  • Various bug fixes

This version is only compatible with Spark 2.3. Please see the Latest Releases section in the README to see which version of the library works with your version of Spark 👍

v2.3.0

08 Mar 00:00

This tags the first release of the rewritten connector. The major changes are listed below:

  • No more progress tracker. Instead, use "checkpointLocation" in Structured Streaming or "checkpoint" in Spark Streaming (see the sketch after this list). For Spark Streaming, see "Storing Offsets" in the integration guide.
  • Switched to sequence number filtering internally. This allows us to know the start and end point of each batch deterministically. In other words, the end of a batch of 1000 events from sequence number X is simply X + 1000, whereas the end of a batch of 1000 events from byte offset Y cannot be known a priori. This (plus the removal of the progress tracker) allows us to have concurrent jobs.
  • Parallelized all API calls
  • Connection pooling of EventHub clients
  • Thread pooling per EventHub client
  • Added EventHubsSink
  • Added EventHubsRelation (batch style query)
  • Preferred location. Spark now consistently schedules partitions on the same executors across batches. This will allow us to use a prefetch queue across batches in future releases.
  • Spark 2.3 support
  • Databricks (and Azure Databricks) support
  • Added Spark core support
  • Moved to EventHubs Java Client 1.0.0
  • Java support in Spark Streaming
  • Allow users to manage their own offsets with the HasOffsetRanges trait. See the integration guide for details!
  • Per partition configuration for starting positions, ending positions, and max rates.
  • Users can start their jobs from START_OF_STREAM and END_OF_STREAM
  • EventHubs receiver timeout and operation timeout are now configurable
  • Non-public and international clouds are properly supported
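A minimal Structured Streaming sketch against this release, showing the checkpoint-based progress model, a START_OF_STREAM starting position, and a max rate per partition; connection details and paths are placeholders:

```scala
import org.apache.spark.eventhubs.{ ConnectionStringBuilder, EventHubsConf, EventPosition }
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("eventhubs-structured-streaming").getOrCreate()

// Placeholder connection details.
val connectionString = ConnectionStringBuilder()
  .setNamespaceName("<namespace>")
  .setEventHubName("<event hub>")
  .setSasKeyName("<key name>")
  .setSasKey("<key>")
  .build

val ehConf = EventHubsConf(connectionString)
  .setStartingPosition(EventPosition.fromStartOfStream) // i.e. START_OF_STREAM
  .setMaxRatePerPartition(1000)

val df = spark.readStream
  .format("eventhubs")
  .options(ehConf.toMap)
  .load()

// Progress is tracked through the checkpoint location rather than a
// separate progress tracker.
df.writeStream
  .format("parquet")
  .option("path", "/tmp/eventhubs-output")                   // placeholder path
  .option("checkpointLocation", "/tmp/eventhubs-checkpoint") // placeholder path
  .start()
```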

Additionally, the repo was improved:

  • Documentation rewrite
  • README revamp