v0.4.1
Signed-off-by: Xuzhou Qin <xuzhou.qin@jcdecaux.com>
Xuzhou Qin committed Feb 13, 2020
1 parent 40325ac commit 37f6fc5
Showing 2 changed files with 44 additions and 34 deletions.
CHANGELOG.md (42 additions, 32 deletions)
## 0.4.1 (2020-02-13)
Changes:
- Changed benchmark unit of time to *seconds* (#88)

Fixes:
- The master URL of SparkSession can now be overridden in a local environment (#74)
- `FileConnector` now lists path correctly for nested directories (#97)

New features:
- Added [Mermaid](https://mermaidjs.github.io/#/) diagram generation to **Pipeline** (#51)
- Added `showDiagram()` method to **Pipeline** that prints the Mermaid code and generates the live editor URL 🎩🐰✨ (#52)
- Added **Codecov** report and **Scala API doc**
- Added `delete` method in `JDBCConnector` (#82)
- Added `drop` method in `DBConnector` (#83)
- Added support for both of the following Spark configuration styles in the SETL builder (#86)
```hocon
setl.config {
  spark {
    spark.app.name = "my_app"
    spark.sql.shuffle.partitions = "1000"
  }
}

setl.config_2 {
  spark.app.name = "my_app"
  spark.sql.shuffle.partitions = "1000"
}
```
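
A minimal sketch of bootstrapping SETL against either of the configuration blocks above, assuming the `Setl.builder()` entry point; the `setl.spark` accessor used in the last line is an assumption for illustration only.

```scala
import com.jcdecaux.setl.Setl

object ConfigStyleDemo extends App {
  // The default config loader picks up the application configuration
  // that contains one of the two blocks shown above.
  val setl: Setl = Setl.builder()
    .withDefaultConfigLoader()
    .getOrCreate()

  // Whether the Spark options are nested under a spark {} block (setl.config)
  // or written as flat keys (setl.config_2), the resulting SparkSession
  // should carry the same settings.
  println(setl.spark.conf.get("spark.app.name"))
}
```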

Others:
- Improved test coverage

## 0.4.0 (2020-01-09)
Changes:
Others:
- Optimized **PipelineInspector** (#33)

## 0.3.5 (2019-12-16)
- BREAKING CHANGE: replaced the Spark compatibility version with the Scala compatibility version in the artifact ID. The old artifact ID **dc-spark-sdk_2.4** was changed to **dc-spark-sdk_2.11** (or **dc-spark-sdk_2.12**)
- Upgraded dependencies
- Added Scala 2.12 support
- Removed **SparkSession** from Connector and SparkRepository constructor (old constructors are kept but now deprecated)
- Added **Column** type support in the `findBy` method of **SparkRepository** and **Condition** (see the sketch after this list)
- Added method **setConnector** and **setRepository** in **Setl** that accept object of type Connector/SparkRepository
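
A minimal sketch of the Column-based `findBy` described above; the `Condition` constructor taking a Spark `Column` and the import paths are assumptions, not a verified 0.3.5 API.

```scala
import com.jcdecaux.setl.storage.Condition
import com.jcdecaux.setl.storage.repository.SparkRepository
import org.apache.spark.sql.Dataset
import org.apache.spark.sql.functions.col

case class City(name: String, country: String)

object CityQueries {
  // Filter the repository with a Column predicate instead of a raw column name.
  def frenchCities(repo: SparkRepository[City]): Dataset[City] =
    repo.findBy(Condition(col("country") === "FR"))
}
```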

## 0.3.4 (2019-12-06)
- Added a read cache to **SparkRepository** to avoid consecutive disk I/O.
- Added option **autoLoad** in the Delivery annotation so that *DeliverableDispatcher* can still handle the dependency injection in the case where the delivery is missing but a corresponding repository is present.
- Added option **condition** in the Delivery annotation to pre-filter loaded data when **autoLoad** is set to true.
- Added option **id** in the Delivery annotation. *DeliverableDispatcher* will match deliveries by the id in addition to the payload type. By default the id is an empty string ("") (see the sketch after this list).
- Added **setConnector** method in DCContext. Each connector should be delivered with an ID. By default the ID will be its config path.
- Added support for wildcard paths in SparkRepository and Connector
- Added JDBCConnector
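
A minimal sketch combining the **autoLoad**, **condition** and **id** options described above inside a factory; the `Factory` method signatures and package paths are assumptions based on the rest of this changelog, not a verified 0.3.4 API.

```scala
import com.jcdecaux.setl.annotation.Delivery
import com.jcdecaux.setl.transformation.Factory
import org.apache.spark.sql.Dataset

case class Order(id: String, year: Int, amount: Double)

class OrderFactory extends Factory[Long] {

  // If no Dataset[Order] delivery is set explicitly, autoLoad lets the
  // dispatcher fall back to a registered repository of Order; the loaded data
  // is pre-filtered with the condition, and matching also takes the id into account.
  @Delivery(autoLoad = true, condition = "year = 2019", id = "orders")
  private[this] var orders: Dataset[Order] = _

  override def read(): this.type = this
  override def process(): this.type = this
  override def write(): this.type = this
  override def get(): Long = orders.count()
}
```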

## 0.3.3 (2019-10-22)
- Added **SnappyCompressor**.
- Added method **persist(persistence: Boolean)** into **Stage** and **Factory** to activate/deactivate output persistence. By default, output persistence is set to *true* (see the sketch after this list).
- Added implicit method `filter(cond: Set[Condition])` for Dataset and DataFrame.
- Added `setUserDefinedSuffixKey` and `getUserDefinedSuffixKey` to **SparkRepository**.
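
A minimal sketch of the persistence switch described above, reusing the hypothetical `OrderFactory` sketched under 0.3.4; the `Stage` constructor, package path and fluent return types are assumptions.

```scala
import com.jcdecaux.setl.workflow.Stage

object PersistenceDemo {
  // Output persistence defaults to true; persist(false) deactivates it for this stage.
  val stage: Stage = new Stage()
    .addFactory(new OrderFactory())
    .persist(false)
}
```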

## 0.3.2 (2019-10-14)
- Added **@Compress** annotation. **SparkRepository** will compress all columns having this annotation by using a **Compressor** (the default compressor is **XZCompressor**)
```scala
case class CompressionDemo(@Compress col1: Seq[Int],
                           @Compress(compressor = classOf[GZIPCompressor]) col2: Seq[String])
```

- Added interface **Compressor** and implemented **XZCompressor** and **GZIPCompressor**
- Added **SparkRepositoryAdapter[A, B]**. It will allow a **SparkRepository[A]** to write/read a data store of type **B** by using an implicit **DatasetConverter[A, B]**
- Added trait **Converter[A, B]** that handles the conversion between an object of type **A** and an object of type **B** (see the sketch after this list)
- Added abstract class **DatasetConverter[A, B]** that extends **Converter[Dataset[A], Dataset[B]]**
- Added auto-correction for the `SparkRepository.findBy(conditions)` method when filtering by a case class field name instead of a column name
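
A hedged sketch of the converter/adapter entries above: a `DatasetConverter` that a `SparkRepositoryAdapter[User, UserRow]` could use. The method names `convertFrom`/`convertTo` and the import path are assumptions, not a verified API.

```scala
import com.jcdecaux.setl.storage.DatasetConverter
import org.apache.spark.sql.{Dataset, SparkSession}

case class User(id: String, name: String, age: Int)        // type exposed by the repository
case class UserRow(id: String, name: String, age: String)  // type actually stored

class UserRowConverter(implicit spark: SparkSession) extends DatasetConverter[User, UserRow] {

  import spark.implicits._

  override def convertFrom(rows: Dataset[UserRow]): Dataset[User] =
    rows.map(r => User(r.id, r.name, r.age.toInt))

  override def convertTo(users: Dataset[User]): Dataset[UserRow] =
    users.map(u => UserRow(u.id, u.name, u.age.toString))
}
```
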
- Added sequential mode in class `Stage`. Users can toggle it via the `parallel` setting.
- Added an external data flow description to the pipeline description
- Added method `beforeAll` into `ConfigLoader`
- Added new method `addStage` and `addFactory` that take a class object as input. The instantiation will be handled by the stage.
- Removed implicit argument encoder from all methods of Repository trait
- Added new get method to **Pipeline**: `get[A](cls: Class[_ <: Factory[_]]): A`.
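
A minimal sketch of class-based registration and typed retrieval, reusing the hypothetical `OrderFactory` from the 0.3.4 sketch above; the `Pipeline` constructor, `run()` and the package path are assumptions.

```scala
import com.jcdecaux.setl.workflow.Pipeline

object PipelineDemo {
  val pipeline = new Pipeline()

  pipeline
    .addStage(classOf[OrderFactory]) // instantiation is handled by the stage
    .run()

  // Typed retrieval of the factory's output, matching get(): Long in OrderFactory.
  val orderCount: Long = pipeline.get[Long](classOf[OrderFactory])
}
```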

- Added an optional argument `suffix` in `FileConnector` and `SparkRepository`
- Added method `partitionBy` in `FileConnector` and `SparkRepository`
- Added possibility to filter by name pattern when a FileConnector is trying to read a directory. To do this, add `filenamePattern` into the configuration file
- Added possibility to create a `Conf` object from a Map.
```scala
Conf(Map("a" -> "A"))
```
- Added a second argument to CompoundKey to handle primary and sort keys
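
A hedged sketch of the two-argument `CompoundKey`; the key-group names ("primary", "sort"), the meaning of the second (position) argument and the import path are assumptions drawn from the entry above.

```scala
import com.jcdecaux.setl.annotation.CompoundKey

// The first argument names the key group, the second its position within that group.
case class OrderKeyed(@CompoundKey("primary", "1") shopId: String,
                      @CompoundKey("primary", "2") orderId: String,
                      @CompoundKey("sort", "1") orderDate: String,
                      amount: Double)
```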

## 0.2.7 (2019-06-21)
- Added `Conf` into `SparkRepositoryBuilder` and changed all the set methods of `SparkRepositoryBuilder` to use the conf object
- Changed package name `com.jcdecaux.setl.annotations` to `com.jcdecaux.setl.annotation`

## 0.2.6 (2019-06-18)
- Added annotation `ColumnName`, which can be used to replace the current column name with an alias in the data storage (see the sketch after this list).
- Added annotation `CompoundKey`. It could be used to define a compound key for databases that only allow one partition key
- Added sheet name into arguments of ExcelConnector
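
A minimal sketch of the `ColumnName` alias annotation described in the first entry of this list; the import path uses the later package name (renamed in 0.2.7) and is an assumption for this release.

```scala
import com.jcdecaux.setl.annotation.ColumnName

// The annotated fields are written to and read from the data storage under their aliases.
case class UserRecord(@ColumnName("user_id") userId: String,
                      @ColumnName("user_name") userName: String)
```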

## 0.2.5 (2019-06-12)

## 0.2.0 (2019-05-21)
- Changed spark version to 2.4.3
- Added `SparkRepositoryBuilder` that allows creation of a `SparkRepository` for a given class without creating a dedicated `Repository` class
- Added Excel support for `SparkRepository` by creating `ExcelConnector`
- Added `Logging` trait

README.md (2 additions, 2 deletions)
```xml
<dependency>
<groupId>com.jcdecaux.setl</groupId>
<artifactId>setl_2.11</artifactId>
<version>0.4.1</version>
</dependency>
```

To use the SNAPSHOT version, add the Sonatype snapshot repository to your `pom.xml`:
```xml
<dependencies>
<dependency>
<groupId>com.jcdecaux.setl</groupId>
<artifactId>setl_2.11</artifactId>
<version>0.4.2-SNAPSHOT</version>
</dependency>
</dependencies>
```
