Releases: Azure/azure-event-hubs-spark
v2.3.5
Release notes:
- IoT Hub system properties are available (see the sketch below)
- Documentation updates
- Unit test improvements
- Various bug fixes. Since the addition of the cached event hub receiver in 2.3.3, there have been some bugs affecting the reliability of the connector. With this latest release, all known bugs have been fixed.
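As a minimal sketch of reading those system properties in Structured Streaming: this assumes a SparkSession named `spark` is in scope (as in spark-shell), uses a placeholder connection string, and assumes IoT Hub populates the `systemProperties` map column with keys such as `iothub-connection-device-id`.

```scala
import org.apache.spark.eventhubs.EventHubsConf
import spark.implicits._

// Placeholder connection string for an IoT Hub's Event Hubs-compatible endpoint.
val ehConf = EventHubsConf("Endpoint=sb://...;EntityPath=...")

val df = spark.readStream
  .format("eventhubs")
  .options(ehConf.toMap)
  .load()

// systemProperties is a map-typed column; for IoT Hub it carries entries such
// as the sending device's id (the key name below is assumed, not guaranteed).
val deviceEvents = df.select(
  $"body".cast("string").as("body"),
  $"systemProperties".getItem("iothub-connection-device-id").as("deviceId")
)
```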
v2.2.5
This is identical to v2.3.5. It's only compatible with Spark 2.1 and Spark 2.2 (whereas v2.3.5 is compatible with Spark 2.3 and up).
v2.3.4
This adds a critical bug fix to 2.3.3. We recommend everyone upgrade as soon as possible. Release notes:
- Improved exception messaging in the cached receiver
- Fixed a bug in the cached receiver
v2.2.4
This is identical to v2.3.4. It's only compatible with Spark 2.1 and Spark 2.2 (whereas v2.3.4 is compatible with Spark 2.3 and up).
v2.2.3
This is identical to v2.3.3. It's only compatible with Spark 2.1 and Spark 2.2.
v2.3.3
This release focused on bug fixes, refactoring, and cleanup. Release notes:
- Cached receiver is now async
- Various bug fixes in schema type handling, receive calls, and the client
- Properties can now be added to events in the EventHubsSink
- Refactoring of send calls and management calls
v2.2.2
This is identical to v2.3.2. It's only compatible with Spark 2.1 and Spark 2.2.
v2.3.2
Version 2.3.2 of azure-eventhubs-spark_2.11. Release notes:
- Bug fixes
  - Fixed data loss check in `getBatch`
  - Excess warnings are no longer printed
  - Prefetch count can no longer be set below the minimum allowed value
  - Invalid offsets are detected in `translate`
- Enhancements
  - Cached receivers are used. This allows receivers to be reused across batches, which dramatically improves receive times. This change required a move to epoch receivers, which means each Spark application requires its own consumer group in order to run properly (see the sketch after these notes).
  - `properties` has been added to the Structured Streaming schema
  - `partition` has been added to the Structured Streaming schema (both new columns appear in the sketch after these notes)
  - Check for old `fromSeqNos` in DStreams. If a DStream falls behind (e.g. events expire from the Event Hub before they are consumed by Spark), then Spark will move to the earliest valid event. Previously, the job would fail in this case.
  - Added a retry mechanism for certain exceptions thrown in `getRunTimeInfo`. Exceptions are retried if the exception (or the inner exception) is an `EventHubException` and `getIsTransient` returns `true`.
  - All service API calls are now done asynchronously
  - A `ForeachWriter` implementation has been added. This `ForeachWriter` uses asynchronous sends and performs much better than the existing `EventHubsSink`.
  - `translate` has been optimized
  - Javadocs have been added
  - Only the necessary configs are sent to executors (the unneeded ones are trimmed by the driver before they're sent over)
  - Connection string validation added in `EventHubsConf`
  - Improved error messaging
  - All singletons (`ClientConnectionPool` and `CachedEventHubsReceiver`) use globally valid keys
- Various documentation updates
- Various unit tests have been added
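A minimal sketch tying these notes together, assuming a SparkSession named `spark` is in scope, a placeholder connection string, and a hypothetical per-application consumer group name:

```scala
import org.apache.spark.eventhubs.EventHubsConf

// With cached (epoch) receivers, give each Spark application its own
// consumer group. Connection string and group name are placeholders.
val ehConf = EventHubsConf("Endpoint=sb://...;EntityPath=...")
  .setConsumerGroup("spark-app-1")

val stream = spark.readStream
  .format("eventhubs")
  .options(ehConf.toMap)
  .load()

import spark.implicits._

// The streaming schema now includes the source partition and the event's
// application properties (a map-typed column).
val enriched = stream.select($"body".cast("string"), $"partition", $"properties")
```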
v2.3.1
Release notes:
- Translate method optimization for quicker startup
- Default starting position is EndOfStream
- Added Receiver Identifier for improved tracing
- Moved to latest Event Hubs Java Client
- Various bug fixes
This version is only compatible with Spark 2.3. Please see the Latest Releases
section on the README to see which version of the library works with your version of Spark 👍
v2.3.0
This marks the first release of the rewritten connector. The major changes are listed below:
- No more progress tracker. Instead, use "checkpointLocation" in Structured Streaming or "checkpoint" in Spark Streaming. For Spark Streaming, see "Storing Offsets" in the integration guide. (Checkpointing appears in the first sketch after this list.)
- Switched to sequence number filtering internally. This allows us to know the start and end point of each batch deterministically: a batch of 1000 events starting at sequence number X simply ends at X + 1000, whereas the end point of 1000 events starting at byte offset Y cannot be known a priori. This (plus the progress tracker being removed) allows us to have concurrent jobs.
- Parallelized all API calls
- Connection pooling of EventHub clients
- Thread pooling per EventHub client
- Added EventHubsSink
- Added EventHubsRelation (batch style query)
- Preferred location. Spark now consistently schedules partitions on the same executors across batches. This will allow us to use a prefetch queue across batches in future releases.
- Spark 2.3 support
- Databricks (and Azure Databricks) support
- Added Spark core support
- Moved to EventHubs Java Client 1.0.0
- Java support in Spark Streaming
- Allow users to manage their own offsets with the HasOffsetRanges trait. See the integration guide for details, and the second sketch after this list!
- Per-partition configuration for starting positions, ending positions, and max rates (shown in the first sketch after this list)
- Users can start their jobs from START_OF_STREAM and END_OF_STREAM
- EventHubs receiver timeout and operation timeout are now configurable
- Non-public and international clouds are properly supported
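A configuration sketch for the items above, assuming a SparkSession named `spark` is in scope; the connection string, hub name, sequence number, and paths are placeholders:

```scala
import org.apache.spark.eventhubs.{ EventHubsConf, EventPosition, NameAndPartition }

val ehConf = EventHubsConf("Endpoint=sb://...;EntityPath=...")
  .setStartingPosition(EventPosition.fromStartOfStream) // or fromEndOfStream
  .setStartingPositions(Map(
    // Per-partition override: partition 0 of the (hypothetical) hub "my-hub"
    // starts from sequence number 4200.
    NameAndPartition("my-hub", 0) -> EventPosition.fromSequenceNumber(4200L)
  ))
  .setMaxRatePerPartition(1000) // cap on events per partition per batch

val query = spark.readStream
  .format("eventhubs")
  .options(ehConf.toMap)
  .load()
  .writeStream
  .format("parquet")
  .option("path", "/tmp/eventhubs-output")
  .option("checkpointLocation", "/tmp/eventhubs-checkpoint") // replaces the old progress tracker
  .start()
```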
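And a Spark Streaming sketch of managing offsets with HasOffsetRanges, assuming an existing SparkContext `sc`, a placeholder connection string, and that the offset range fields follow the pattern in the integration guide (the field names here are assumptions):

```scala
import org.apache.spark.eventhubs.{ EventHubsConf, EventHubsUtils }
import org.apache.spark.eventhubs.rdd.HasOffsetRanges
import org.apache.spark.streaming.{ Seconds, StreamingContext }

val ssc = new StreamingContext(sc, Seconds(5))
val ehConf = EventHubsConf("Endpoint=sb://...;EntityPath=...")

val stream = EventHubsUtils.createDirectStream(ssc, ehConf)

stream.foreachRDD { rdd =>
  // The underlying RDD implements HasOffsetRanges; cast to read the
  // per-partition sequence-number ranges and persist them yourself.
  val ranges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
  ranges.foreach { r =>
    println(s"${r.nameAndPartition}: ${r.fromSeqNo} -> ${r.untilSeqNo}")
  }
}
```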
Additionally, the repo was improved:
- Documentation rewrite
- README is revamped