Releases: elastacloud/spark-excel

v0.1.13

29 Jun 12:38
5dc22b4

Welcome to the latest release of the Spark Excel reader. This release brings a few changes and bug fixes, including:

  • New parser option to disable formula evaluation. When evaluation is disabled, the formula itself is extracted from the sheet rather than being evaluated.
  • Fix for empty files. When a parsed file contains only the header record, a single null record is now returned instead of an error being raised.
  • Support for the latest Spark 3.4.x and 3.5.x releases
  • Updates to the Apache POI library and other dependencies

Another thanks goes out to @josecsotomorales for his help with this release.

v0.1.12

25 Oct 15:02
c8d97d0

It's been a while longer this time but we're back with the 0.1.12 release of the Excel data source for Apache Spark. A big thanks to @josecsotomorales who has contributed to this release.

This release introduces the following changes:

  • Update Apache POI to 5.2.3, bringing in support for new functions and features
  • Add the ability for users to specify values which should be treated as null (e.g. "N/A")
  • Handle Log4J conflicts across Spark versions
  • Spark 3.4.1 support
  • Spark 3.5.0 support

We're always looking for people to help contribute, from code changes to feature suggestions, so if you feel you can contribute then feel free to join in.

v0.1.11

29 May 09:22
c2511ab

This release brings in support for Spark 3.4.0 (the latest version of the 3.4.x series at the time of release) along with a new feature to provide a per-row flag indicating if the row matches the provided or inferred schema.

  • Support for Spark 3.4
  • schemaMatchColumnName option for indicating if each row matches the schema
  • Update of Spark versions to match the releases available from Apache Spark and the versions used by supported Databricks Runtimes and Azure Synapse Analytics

N.B. When using Databricks, please check the Spark version used by the runtime to ensure the correct version of the package is used.
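
As a rough illustration of how the per-row flag might be used from PySpark, the sketch below sets schemaMatchColumnName and splits the result on that column. Only the option name comes from these notes. The data source name com.elastacloud.spark.excel, the column name _schema_match, the helper names, and the assumption that the flag is a boolean column are all assumptions made for the example, so please check them against the README.

```python
# Assumed data source name; confirm against the project README.
EXCEL_FORMAT = "com.elastacloud.spark.excel"


def schema_match_options(column_name="_schema_match"):
    """Build reader options that request the per-row schema match flag.

    `column_name` is a hypothetical name chosen for this example.
    """
    if not column_name:
        raise ValueError("column_name must be a non-empty string")
    return {"schemaMatchColumnName": column_name}


def read_with_schema_flag(spark, path, column_name="_schema_match"):
    """Read a workbook and split its rows on the schema match flag.

    `spark` is an existing SparkSession. The flag column is assumed
    to be boolean, which these notes do not state explicitly.
    """
    df = (
        spark.read.format(EXCEL_FORMAT)
        .options(**schema_match_options(column_name))
        .load(path)
    )
    matching = df.filter(df[column_name])
    mismatched = df.filter(~df[column_name])
    return matching, mismatched
```

Keeping the mismatched rows in a separate DataFrame makes it easier to inspect or quarantine them rather than dropping them silently.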

v0.1.10

20 Aug 10:51
0e0c0a0

Another minor increment, but bringing in Spark 3.3 support (which is big, right?)

  • Support for Spark 3.3
  • Introduces the thresholdBytesForTempFiles option, an alias for maxBytesForTempFiles with a name that is clearer about what is happening

v0.1.9

02 Jul 16:34

Minor release with a couple of significant updates.

  • Update scalatest to 3.2.11
  • Update Apache POI to version 5.2.2
  • Update Apache Commons IO to 2.11.0
  • Introduce new option for maxBytesForTempFiles
    • Resolves issue for loading larger files by setting the maximum bytes before creating temp files
    • Defaults the option to 100_000_000
    • Can be overridden with options
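
A minimal sketch of overriding the threshold from PySpark is shown below. The option name and the default of 100_000_000 come from the notes above. The data source name com.elastacloud.spark.excel and the helper names are assumptions for illustration only.

```python
# Default stated in the release notes above.
DEFAULT_MAX_BYTES_FOR_TEMP_FILES = 100_000_000

# Assumed data source name; confirm against the project README.
EXCEL_FORMAT = "com.elastacloud.spark.excel"


def temp_file_options(max_bytes=None):
    """Return reader options overriding maxBytesForTempFiles.

    An empty dict is returned when `max_bytes` is None, so the
    library default stays in effect.
    """
    if max_bytes is None:
        return {}
    return {"maxBytesForTempFiles": str(max_bytes)}


def read_large_workbook(spark, path, max_bytes=None):
    """Read a workbook, optionally overriding the temp file threshold.

    `spark` is an existing SparkSession (hypothetical helper).
    """
    reader = spark.read.format(EXCEL_FORMAT)
    return reader.options(**temp_file_options(max_bytes)).load(path)
```

For example, read_large_workbook(spark, path, 2 * DEFAULT_MAX_BYTES_FOR_TEMP_FILES) would pass a threshold of 200000000 bytes to the reader.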

v0.1.8

05 Mar 21:57
22f2fee

Upgrades to version 5.2.0 of the Apache POI library, bringing general improvements and support for additional formula types such as CONCAT.

Brings in support for additional types in user-defined schemas, reducing the need to cast data once it's been read.

Adds Spark versions 3.1.3 and 3.2.1 to the build profiles.

The JAR is a little larger as additional Apache libraries need to be included to support the updated POI library. In addition, Azure Synapse uses an older version of commons-io which doesn't include features required by the updated POI library.

v0.1.7

23 Oct 10:36
7b02ed4

Provides a general clean-up of the build.sbt file and adds support for Spark 3.2.0.

v0.1.6

08 Aug 09:42
d7ce3c6

With no new ideas turning up in the last couple of weeks, I've promoted the 0.1.6-SNAPSHOT release to final. This release includes the following changes:

  • Add Spark 3.0.3
  • Fix for cells which contain data that does not match the target schema
  • Better error reporting for parser options, suggesting which option the user might have meant if one has been mis-spelt
  • Some code cleanup
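
To give a feel for the "did you mean" style of error reporting, here is a small sketch using Python's difflib. This is not the library's implementation (the library is written in Scala and its matching logic is not described in these notes). The option list holds only names mentioned elsewhere on this page, and the function names and message wording are hypothetical.

```python
import difflib

# Option names mentioned in these release notes; the real parser accepts more.
KNOWN_OPTIONS = [
    "schemaMatchColumnName",
    "maxBytesForTempFiles",
    "thresholdBytesForTempFiles",
]


def suggest_option(name, known=KNOWN_OPTIONS, cutoff=0.6):
    """Return the closest known option name, or None if nothing is similar enough."""
    matches = difflib.get_close_matches(name, known, n=1, cutoff=cutoff)
    return matches[0] if matches else None


def unknown_option_message(name):
    """Build an error message, adding a suggestion when one is found."""
    suggestion = suggest_option(name)
    if suggestion is None:
        return f"Unknown option '{name}'"
    return f"Unknown option '{name}', did you mean '{suggestion}'?"
```

The cutoff controls how similar a name must be before it is suggested. A higher value gives fewer but safer suggestions.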

v0.1.6-SNAPSHOT

19 Jul 16:43
c4fa957
Pre-release

Snapshot release for 0.1.6. This includes:

  • Fix for cells which contain data that does not match the target schema
  • Better error reporting for parser options, suggesting which option the user might have meant if one has been mis-spelt
  • Some code cleanup

v0.1.5

03 Jul 14:11

Initial public release of the Spark Excel library. Please see the README.md file for a list of current capabilities.

JARs have been attached to the release. In future, the plan is to publish them as well so that they can be installed using Maven co-ordinates.