From 37f6fc543bc704d88ca7b25ac7a2f96d44f2fd98 Mon Sep 17 00:00:00 2001
From: Xuzhou Qin
Date: Thu, 13 Feb 2020 15:55:42 +0100
Subject: [PATCH] v0.4.1

Signed-off-by: Xuzhou Qin
---
 CHANGELOG.md | 74 +++++++++++++++++++++++++++++-----------------------
 README.md    |  4 +--
 2 files changed, 44 insertions(+), 34 deletions(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index a91740c2..1e4aefcf 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,9 +1,34 @@
-## 0.4.1 (2020-01-15)
+## 0.4.1 (2020-02-13)
+Changes:
+- Changed benchmark unit of time to *seconds* (#88)
+
+Fixes:
+- The master URL of SparkSession can now be overwritten in local environment (#74)
+- `FileConnector` now lists paths correctly for nested directories (#97)
+
 New features:
 - Added [Mermaid](https://mermaidjs.github.io/#/) diagram generation to **Pipeline** (#51)
-- Added `showDiagram()` method to **Pipeline** that prints the Mermaid code and generates the
- live editor URL 🎩🐰✨ (#52)
+- Added `showDiagram()` method to **Pipeline** that prints the Mermaid code and generates the live editor URL 🎩🐰✨ (#52)
 - Added **Codecov** report and **Scala API doc**
+- Added `delete` method in `JDBCConnector` (#82)
+- Added `drop` method in `DBConnector` (#83)
+- Added support for the following two Spark configuration styles in the SETL builder (#86)
+  ```hocon
+  setl.config {
+    spark {
+      spark.app.name = "my_app"
+      spark.sql.shuffle.partitions = "1000"
+    }
+  }
+
+  setl.config_2 {
+    spark.app.name = "my_app"
+    spark.sql.shuffle.partitions = "1000"
+  }
+  ```
+
+Others:
+- Improved test coverage
 
 ## 0.4.0 (2020-01-09)
 Changes:
@@ -26,46 +51,37 @@ Others:
 - Optimized **PipelineInspector** (#33)
 
 ## 0.3.5 (2019-12-16)
-- BREAKING CHANGE: replace the Spark compatible version by the Scala compatible version in the artifact ID.
-The old artifact id **dc-spark-sdk_2.4** was changed to **dc-spark-sdk_2.11** (or **dc-spark-sdk_2.12**)
+- BREAKING CHANGE: replace the Spark compatible version by the Scala compatible version in the artifact ID. The old artifact id **dc-spark-sdk_2.4** was changed to **dc-spark-sdk_2.11** (or **dc-spark-sdk_2.12**)
 - Upgraded dependencies
 - Added Scala 2.12 support
 - Removed **SparkSession** from Connector and SparkRepository constructor (old constructors are kept but now deprecated)
 - Added **Column** type support in FindBy method of **SparkRepository** and **Condition**
-- Added method **setConnector** and **setRepository** in **Setl** that accept
-object of type Connector/SparkRepository
+- Added method **setConnector** and **setRepository** in **Setl** that accept object of type Connector/SparkRepository
 
 ## 0.3.4 (2019-12-06)
 - Added read cache into spark repository to avoid consecutive disk IO.
-- Added option **autoLoad** in the Delivery annotation so that *DeliverableDispatcher* can still handle the dependency
-injection in the case where the delivery is missing but a corresponding
-repository is present.
+- Added option **autoLoad** in the Delivery annotation so that *DeliverableDispatcher* can still handle the dependency injection in the case where the delivery is missing but a corresponding repository is present.
 - Added option **condition** in the Delivery annotation to pre-filter loaded data when **autoLoad** is set to true.
-- Added option **id** in the Delivery annotation. DeliveryDispatcher will match deliveries by the id in addition to
-the payload type. By default the id is an empty string ("").
-- Added **setConnector** method in DCContext. Each connector should be delivered with an ID.
-By default the ID will be its
-config path.
+- Added option **id** in the Delivery annotation. DeliveryDispatcher will match deliveries by the id in addition to the payload type. By default the id is an empty string ("").
+- Added **setConnector** method in DCContext. Each connector should be delivered with an ID. By default the ID will be its config path.
 - Added support of wildcard path for SparkRepository and Connector
 - Added JDBCConnector
 
 ## 0.3.3 (2019-10-22)
 - Added **SnappyCompressor**.
-- Added method **persist(persistence: Boolean)** into **Stage** and **Factory** to.
-activate/deactivate output persistence. By default the output persistence is set to *true*.
+- Added method **persist(persistence: Boolean)** into **Stage** and **Factory** to activate/deactivate output persistence. By default the output persistence is set to *true*.
 - Added implicit method `filter(cond: Set[Condition])` for Dataset and DataFrame.
 - Added `setUserDefinedSuffixKey` and `getUserDefinedSuffixKey` to **SparkRepository**.
 
 ## 0.3.2 (2019-10-14)
-- Added **@Compress** annotation. **SparkRepository** will compress all columns having this annotation by
-using a **Compressor** (the default compressor is **XZCompressor**)
+- Added **@Compress** annotation. **SparkRepository** will compress all columns having this annotation by using a **Compressor** (the default compressor is **XZCompressor**)
 ```scala
 case class CompressionDemo(@Compress col1: Seq[Int],
                            @Compress(compressor = classOf[GZIPCompressor]) col2: Seq[String])
 ```
 - Added interface **Compressor** and implemented **XZCompressor** and **GZIPCompressor**
-- Added **SparkRepositoryAdapter[A, B]**. It will allow a **SparkRepository[A]** to write/read a data store of type
- **B** by using an implicit **DatasetConverter[A, B]**
+- Added **SparkRepositoryAdapter[A, B]**. It will allow a **SparkRepository[A]** to write/read a data store of type **B** by using an implicit **DatasetConverter[A, B]**
 - Added trait **Converter[A, B]** that handles the conversion between an object of type A and an object of type **B**
 - Added abstract class **DatasetConverter[A, B]** that extends a **Converter[Dataset[A], Dataset[B]]**
 - Added auto-correction for `SparkRepository.findby(conditions)` method when we filter by case class field name instead of column name
@@ -77,8 +93,7 @@ case class CompressionDemo(@Compress col1: Seq[Int],
 - Added sequential mode in class `Stage`. Users can turn it on by setting `parallel` to *true*.
 - Added external data flow description in pipeline description
 - Added method `beforeAll` into `ConfigLoader`
-- Added new method `addStage` and `addFactory` that take a class object as input. The instantiation will be handled
- by the stage.
+- Added new method `addStage` and `addFactory` that take a class object as input. The instantiation will be handled by the stage.
 - Removed implicit argument encoder from all methods of Repository trait
 - Added new get method to **Pipeline**: `get[A](cls: Class[_ <: Factory[_]]): A`.
@@ -97,8 +112,7 @@ case class CompressionDemo(@Compress col1: Seq[Int],
 ```
 - Added an optional argument `suffix` in `FileConnector` and `SparkRepository`
 - Added method `partitionBy` in `FileConnector` and `SparkRepository`
-- Added possibility to filter by name pattern when a FileConnector is trying to read a directory.
- To do this, add `filenamePattern` into the configuration file
+- Added possibility to filter by name pattern when a FileConnector is trying to read a directory. To do this, add `filenamePattern` into the configuration file
 - Added possibility to create a `Conf` object from Map.
 ```scala
 Conf(Map("a" -> "A"))
 ```
@@ -122,15 +136,12 @@
 - Added a second argument to CompoundKey to handle primary and sort keys
 
 ## 0.2.7 (2019-06-21)
-- Added `Conf` into `SparkRepositoryBuilder` and changed all the set methods
-of `SparkRepositoryBuilder` to use the conf object
+- Added `Conf` into `SparkRepositoryBuilder` and changed all the set methods of `SparkRepositoryBuilder` to use the conf object
 - Changed package name `com.jcdecaux.setl.annotations` to `com.jcdecaux.setl.annotation`
 
 ## 0.2.6 (2019-06-18)
-- Added annotation `ColumnName`, which could be used to replace the current column name
-with an alias in the data storage.
-- Added annotation `CompoundKey`. It could be used to define a compound key for databases
-that only allow one partition key
+- Added annotation `ColumnName`, which could be used to replace the current column name with an alias in the data storage.
+- Added annotation `CompoundKey`. It could be used to define a compound key for databases that only allow one partition key
 - Added sheet name into arguments of ExcelConnector
 
 ## 0.2.5 (2019-06-12)
@@ -155,8 +166,7 @@
 ## 0.2.0 (2019-05-21)
 - Changed spark version to 2.4.3
-- Added `SparkRepositoryBuilder` that allows creation of a `SparkRepository` for a given class without creating a
-dedicated `Repository` class
+- Added `SparkRepositoryBuilder` that allows creation of a `SparkRepository` for a given class without creating a dedicated `Repository` class
 - Added Excel support for `SparkRepository` by creating `ExcelConnector`
 - Added `Logging` trait

diff --git a/README.md b/README.md
index 7423e740..872ba942 100644
--- a/README.md
+++ b/README.md
@@ -25,7 +25,7 @@ You can start working by cloning [this template project](https://github.com/qxzz
   <groupId>com.jcdecaux.setl</groupId>
   <artifactId>setl_2.11</artifactId>
-  <version>0.4.0</version>
+  <version>0.4.1</version>
 ```
@@ -42,7 +42,7 @@ To use the SNAPSHOT version, add Sonatype snapshot repository to your `pom.xml`
   <groupId>com.jcdecaux.setl</groupId>
   <artifactId>setl_2.11</artifactId>
-  <version>0.4.1-SNAPSHOT</version>
+  <version>0.4.2-SNAPSHOT</version>
 ```
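
For context only, not part of the patch above: a minimal, hypothetical quick-start sketch of how a project that picks up `setl_2.11` 0.4.1 (as in the README diff) might bootstrap SETL against a configuration written in one of the two styles documented in the 0.4.1 entry (#86). The object name, the assumed configuration file on the classpath, and the commented-out repository/factory names are illustrative assumptions, not APIs introduced by this patch.

```scala
// Illustrative sketch only. Assumes the project depends on com.jcdecaux.setl:setl_2.11:0.4.1
// and that a configuration file on the classpath declares a "setl.config" block using one of
// the two styles shown in the 0.4.1 changelog entry.
import com.jcdecaux.setl.Setl

object QuickStart extends App {

  // Build the SETL entry point; withDefaultConfigLoader() picks up the default application
  // configuration, and the SparkSession is created from the options declared under setl.config.
  val setl: Setl = Setl.builder()
    .withDefaultConfigLoader()
    .getOrCreate()

  // A typical next step would be to register repositories/connectors declared in the
  // configuration and run a pipeline; names below are placeholders for this sketch:
  // setl.setSparkRepository[MyRow]("myRepositoryConf")
  // setl.newPipeline().addStage[MyFactory]().run()
}
```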