Commit

Release 0.4.0 (#111)
tovbinm committed Sep 23, 2018
1 parent f9a3718 commit 82df9dc
Showing 28 changed files with 369 additions and 328 deletions.
62 changes: 61 additions & 1 deletion CHANGELOG.md
@@ -1,5 +1,65 @@
# Changelog

## 0.4.0

New features and bug fixes:

- Allow specifying the formula used to compute the bin size for text features in `RawFeatureFilter` (see the `RawFeatureFilter.textBinsFormula` argument) [#99](https://github.com/salesforce/TransmogrifAI/pull/99)
- Fixed metadata on `Geolocation` and `GeolocationMap` so that they keep the column name in `descriptorValue`. [#100](https://github.com/salesforce/TransmogrifAI/pull/100)
- Local scoring (aka Sparkless) using Aardpfark. This enables loading and scoring models locally, without a Spark context, using the Aardpfark (PFA for Spark) and Hadrian libraries, which makes scoring orders of magnitude faster than with Spark. [#41](https://github.com/salesforce/TransmogrifAI/pull/41)
- Add distributions calculated in `RawFeatureFilter` to `ModelInsights` [#103](https://github.com/salesforce/TransmogrifAI/pull/103)
- Added binary sequence transformer & estimator: `BinarySequenceTransformer` and `BinarySequenceEstimator`, plus the associated base traits [#84](https://github.com/salesforce/TransmogrifAI/pull/84)
- Added `StringIndexerHandleInvalid.Keep` option into `OpStringIndexer` (same as in the underlying Spark estimator) [#93](https://github.com/salesforce/TransmogrifAI/pull/93)
- Allow numbers and underscores in feature names [#92](https://github.com/salesforce/TransmogrifAI/pull/92)
- Stable key order for map vectorizers [#88](https://github.com/salesforce/TransmogrifAI/pull/88)
- Keep raw feature distributions calculated in raw feature filter [#76](https://github.com/salesforce/TransmogrifAI/pull/76)
- Transmogrify now uses the smart text vectorizer for text types: `Text`, `TextArea`, `TextMap` and `TextAreaMap` [#63](https://github.com/salesforce/TransmogrifAI/pull/63)
- Transmogrify now uses circular date representations for date feature types: `Date`, `DateTime`, `DateMap` and `DateTimeMap` [#100](https://github.com/salesforce/TransmogrifAI/pull/100)
- Improved test coverage for utils and other modules [#50](https://github.com/salesforce/TransmogrifAI/pull/50), [#53](https://github.com/salesforce/TransmogrifAI/pull/53), [#67](https://github.com/salesforce/TransmogrifAI/pull/67), [#69](https://github.com/salesforce/TransmogrifAI/pull/69), [#70](https://github.com/salesforce/TransmogrifAI/pull/70), [#71](https://github.com/salesforce/TransmogrifAI/pull/71), [#72](https://github.com/salesforce/TransmogrifAI/pull/72), [#73](https://github.com/salesforce/TransmogrifAI/pull/73)
- Match feature type map hierarchy with regular feature types [#49](https://github.com/salesforce/TransmogrifAI/pull/49)
- Redundant and deadlock-prone end listener removal [#52](https://github.com/salesforce/TransmogrifAI/pull/52)
- OS-neutral filesystem path creation [#51](https://github.com/salesforce/TransmogrifAI/pull/51)
- Make the `Feature` class public and hide its constructor instead [#45](https://github.com/salesforce/TransmogrifAI/pull/45)
- Specify categorical variables in metadata [#120](https://github.com/salesforce/TransmogrifAI/pull/120)
- Fix fill geo location vectorizer values [#132](https://github.com/salesforce/TransmogrifAI/pull/132)
- Adding feature importance for new model types [#128](https://github.com/salesforce/TransmogrifAI/pull/128)
- Adding binary classification bin score evaluator [#119](https://github.com/salesforce/TransmogrifAI/pull/119)
- Apply `DateToUnitCircleTransformer` logic in raw feature filter transformations [#130](https://github.com/salesforce/TransmogrifAI/pull/130)
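Circular date representations (#100, #130) place a periodic date component, such as the hour of the day, on the unit circle so that values at opposite ends of the period (23:00 and 00:00) land close together. The plain-Scala sketch below only illustrates the idea; `UnitCircleSketch`, `toUnitCircle` and the choice of `(cos θ, sin θ)` as the coordinates are assumptions, not the `DateToUnitCircleTransformer` API.

```scala
// Hypothetical sketch of a circular (unit circle) encoding; not TransmogrifAI code.
object UnitCircleSketch {
  /** Maps a value in [0, period) to (cos θ, sin θ) with θ = 2π · value / period. */
  def toUnitCircle(value: Double, period: Double): (Double, Double) = {
    val theta = 2 * math.Pi * value / period
    (math.cos(theta), math.sin(theta))
  }

  /** Euclidean distance between two points in the plane. */
  def distance(a: (Double, Double), b: (Double, Double)): Double =
    math.hypot(a._1 - b._1, a._2 - b._2)
}
```

For example, `distance(toUnitCircle(23, 24), toUnitCircle(0, 24))` is 2·sin(π/24) ≈ 0.26, whereas the raw hour values differ by 23.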

Breaking changes:
- Added a case class to handle model selector metadata [#39](https://github.com/salesforce/TransmogrifAI/pull/39)
- Made `FileOutputCommitter` the default and removed `DirectMapreduceOutputCommitter` and `DirectOutputCommitter` [#86](https://github.com/salesforce/TransmogrifAI/pull/86)
- Refactored `OpVectorColumnMetadata` to allow numeric column descriptors [#89](https://github.com/salesforce/TransmogrifAI/pull/89)
- Renamed `JaccardDistance` to `JaccardSimilarity` [#80](https://github.com/salesforce/TransmogrifAI/pull/80)
- New model selector interface [#55](https://github.com/salesforce/TransmogrifAI/pull/55). The breaking changes concern the return type and the way parameters are passed into model selectors. Starting with this version, model selectors return a single result feature of type `Prediction` instead of a tuple of features `(pred, raw, prob)`. Example:
```scala
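// label and features are previously defined input features (e.g. RealNN and OPVector)
val (pred, raw, prob) = MultiClassificationModelSelector().setInput(label, features).getOutput() // won't compile anymore
val prediction = MultiClassificationModelSelector().setInput(label, features).getOutput() // ok!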
```
Another change is the way parameters are passed into model selectors. Example:
```scala
BinaryClassificationModelSelector
.withCrossValidation()
.setLogisticRegressionRegParam(0.05, 0.1) // won't compile anymore
```
Instead one should do:
```scala
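import com.salesforce.op.stages.impl.classification.{BinaryClassificationModelSelector, OpLogisticRegression}
import org.apache.spark.ml.tuning.ParamGridBuilder

val lr = new OpLogisticRegression()
// pair each model with the grid of hyperparameters to search over
val models = Seq(lr -> new ParamGridBuilder().addGrid(lr.regParam, Array(0.05, 0.1)).build())
BinaryClassificationModelSelector
.withCrossValidation(modelsAndParameters = models)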
```
For more examples of how to use the new model selectors, please refer to our documentation and the helloworld examples.
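Renaming `JaccardDistance` to `JaccardSimilarity` (#80) matches what the measure reports: Jaccard similarity is |A ∩ B| / |A ∪ B|, which is 1 for identical sets, while Jaccard distance is 1 minus that value. A minimal plain-Scala sketch of both, not the TransmogrifAI stage itself; `JaccardSketch` is a hypothetical name, and returning 1.0 for two empty sets is an assumption of this sketch.

```scala
// Hypothetical sketch of Jaccard similarity and distance; not the TransmogrifAI stage.
object JaccardSketch {
  /** Jaccard similarity |A ∩ B| / |A ∪ B|; two empty sets are treated as identical (assumption). */
  def similarity[T](a: Set[T], b: Set[T]): Double = {
    val union = a union b
    if (union.isEmpty) 1.0
    else (a intersect b).size.toDouble / union.size
  }

  /** Jaccard distance is the complement of the similarity. */
  def distance[T](a: Set[T], b: Set[T]): Double = 1.0 - similarity(a, b)
}
```

For example, `JaccardSketch.similarity(Set("a", "b", "c"), Set("b", "c", "d"))` is 2 / 4 = 0.5.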


Dependency upgrades & misc:
- CI/CD runtime improvements for CircleCI and TravisCI
- Updated Gradle to 4.10
- Updated `scala-graph` to `1.12.5`
- Updated `scalafmt` to `1.5.1`
- New `transmogrifai-local` subproject [#41](https://github.com/salesforce/TransmogrifAI/pull/41) introduces `aardpfark` and `hadrian` dependencies.


## 0.3.4
Performance improvements:
- Added featureLabelCorrOnly parameter in SanityChecker to only compute correlations between features and label (defaults to false)
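The `featureLabelCorrOnly` flag limits the `SanityChecker` to correlations between each feature and the label, instead of the broader set of correlations it computes by default. For d features that means d coefficients rather than a full correlation matrix. A minimal plain-Scala sketch of the feature-label-only computation using Pearson correlation; `LabelCorrSketch` is a hypothetical name, and this is not the `SanityChecker` implementation.

```scala
// Hypothetical sketch of feature-label-only Pearson correlations; not SanityChecker code.
object LabelCorrSketch {
  /** Pearson correlation of two equal-length samples; NaN if either sample has zero variance. */
  def pearson(x: Seq[Double], y: Seq[Double]): Double = {
    require(x.nonEmpty && x.length == y.length, "samples must be non-empty and of equal length")
    val mx = x.sum / x.length
    val my = y.sum / y.length
    val cov = x.zip(y).map { case (a, b) => (a - mx) * (b - my) }.sum
    val vx = x.map(a => (a - mx) * (a - mx)).sum
    val vy = y.map(b => (b - my) * (b - my)).sum
    cov / math.sqrt(vx * vy)
  }

  /** One coefficient per feature column, each computed against the label only. */
  def featureLabelCorr(columns: Seq[Seq[Double]], label: Seq[Double]): Seq[Double] =
    columns.map(pearson(_, label))
}
```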
@@ -17,7 +77,7 @@ New features and bug fixes:
- Pretty print model summaries
- Ensure OP Models are portable across environments
- Ignore _ in simple streaming avro file reader
- Updated evaluators so they can work with either Prediction type feature or three input featues
- Updated evaluators so they can work with either Prediction type feature or three input features
- Added Algebird kryo registrar
- Make Sure that SmartTextVectorizerModel can be serialized to/from json

14 changes: 7 additions & 7 deletions README.md
@@ -1,6 +1,6 @@
# TransmogrifAI

[ ![Download](https://api.bintray.com/packages/salesforce/maven/TransmogrifAI/images/download.svg?version=0.3.4) ](https://bintray.com/salesforce/maven/TransmogrifAI/0.3.4/link) [![Javadocs](https://www.javadoc.io/badge/com.salesforce.transmogrifai/transmogrifai-core_2.11.svg?color=blue)](https://www.javadoc.io/doc/com.salesforce.transmogrifai/transmogrifai-core_2.11) [![Spark version](https://img.shields.io/badge/spark-2.2-brightgreen.svg)](https://spark.apache.org/downloads.html) [![Scala version](https://img.shields.io/badge/scala-2.11-brightgreen.svg)](https://www.scala-lang.org/download/2.11.12.html) [![License](http://img.shields.io/:license-BSD--3-blue.svg)](./LICENSE) [![Chat](https://badges.gitter.im/salesforce/TransmogrifAI.svg)](https://gitter.im/salesforce/TransmogrifAI?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge)
[![Download](https://api.bintray.com/packages/salesforce/maven/TransmogrifAI/images/download.svg)](https://bintray.com/salesforce/maven/TransmogrifAI/_latestVersion) [![Javadocs](https://www.javadoc.io/badge/com.salesforce.transmogrifai/transmogrifai-core_2.11.svg?color=blue)](https://www.javadoc.io/doc/com.salesforce.transmogrifai/transmogrifai-core_2.11) [![Spark version](https://img.shields.io/badge/spark-2.2-brightgreen.svg)](https://spark.apache.org/downloads.html) [![Scala version](https://img.shields.io/badge/scala-2.11-brightgreen.svg)](https://www.scala-lang.org/download/2.11.12.html) [![License](http://img.shields.io/:license-BSD--3-blue.svg)](./LICENSE) [![Chat](https://badges.gitter.im/salesforce/TransmogrifAI.svg)](https://gitter.im/salesforce/TransmogrifAI?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge)


[![TravisCI Build Status](https://travis-ci.com/salesforce/TransmogrifAI.svg?token=Ex9czVEUD7AzPTmVh6iX&branch=master)](https://travis-ci.com/salesforce/TransmogrifAI) [![CircleCI Build Status](https://circleci.com/gh/salesforce/TransmogrifAI.svg?&style=shield&circle-token=e84c1037ae36652d38b49207728181ee85337e0b)](https://circleci.com/gh/salesforce/TransmogrifAI) [![Codecov](https://codecov.io/gh/salesforce/TransmogrifAI/branch/master/graph/badge.svg)](https://codecov.io/gh/salesforce/TransmogrifAI) [![CodeFactor](https://www.codefactor.io/repository/github/salesforce/transmogrifai/badge)](https://www.codefactor.io/repository/github/salesforce/transmogrifai)
@@ -124,29 +124,29 @@ You can simply add TransmogrifAI as a regular dependency to an existing project.
For Gradle in `build.gradle` add:
```gradle
repositories {
jcenter()
mavenCentral()
maven { url 'https://dl.bintray.com/salesforce/maven' }
}
dependencies {
// TransmogrifAI core dependency
compile 'com.salesforce.transmogrifai:transmogrifai-core_2.11:0.3.4'
compile 'com.salesforce.transmogrifai:transmogrifai-core_2.11:0.4.0'
// TransmogrifAI pretrained models, e.g. OpenNLP POS/NER models etc. (optional)
// compile 'com.salesforce.transmogrifai:transmogrifai-models_2.11:0.3.4'
// compile 'com.salesforce.transmogrifai:transmogrifai-models_2.11:0.4.0'
}
```

For SBT in `build.sbt` add:
```sbt
scalaVersion := "2.11.12"

resolvers += Resolver.bintrayRepo("salesforce", "maven")
resolvers += Resolver.jcenterRepo

// TransmogrifAI core dependency
libraryDependencies ++= "com.salesforce.transmogrifai" %% "transmogrifai-core" % "0.3.4"
libraryDependencies += "com.salesforce.transmogrifai" %% "transmogrifai-core" % "0.4.0"

// TransmogrifAI pretrained models, e.g. OpenNLP POS/NER models etc. (optional)
// libraryDependencies ++= "com.salesforce.transmogrifai" %% "transmogrifai-models" % "0.3.4"
// libraryDependencies += "com.salesforce.transmogrifai" %% "transmogrifai-models" % "0.4.0"
```

Then import TransmogrifAI into your code:
3 changes: 3 additions & 0 deletions build.gradle
@@ -282,6 +282,9 @@ configure(subProjs) {

jar.baseName = "$rootProject.name-$project.name"

// ignore link warning in scaladoc
scaladoc.scalaDocOptions.additionalParameters = ['-no-link-warnings']

task scaladocJar(type: Jar, dependsOn: scaladoc) {
classifier = 'javadoc'
from scaladoc.destinationDir
@@ -91,8 +91,7 @@ trait RichDateFeature {
* @param dateListPivot name of the pivot type from [[DateListPivot]] enum
* @param referenceDate reference date to compare against when [[DateListPivot]] is [[SinceFirst]] or [[SinceLast]]
* @param trackNulls option to keep track of values that were missing
* @param circularDateReps list of all the circular date representations that should be included
* feature vector
* @param circularDateReps list of all the circular date representations that should be included in feature vector
* @return result feature of type Vector
*/
def vectorize
@@ -164,8 +163,7 @@ trait RichDateFeature {
* @param dateListPivot name of the pivot type from [[DateListPivot]] enum
* @param referenceDate reference date to compare against when [[DateListPivot]] is [[SinceFirst]] or [[SinceLast]]
* @param trackNulls option to keep track of values that were missing
* @param circularDateReps list of all the circular date representations that should be included
* feature vector
* @param circularDateReps list of all the circular date representations that should be included in feature vector
* @return result feature of type Vector
*/
def vectorize
@@ -705,8 +705,7 @@ trait RichMapFeature {
* @param blackListKeys keys to blacklist
* @param trackNulls option to keep track of values that were missing
* @param referenceDate reference date to subtract off before converting to vector
* @param circularDateReps list of all the circular date representations that should be included
* feature vector
* @param circularDateReps list of all the circular date representations that should be included in feature vector
* @return result feature of type Vector
* @param others other features of the same type
* @return an OPVector feature
@@ -785,8 +784,7 @@ trait RichMapFeature {
* @param blackListKeys keys to blacklist
* @param trackNulls option to keep track of values that were missing
* @param referenceDate reference date to subtract off before converting to vector
* @param circularDateReps list of all the circular date representations that should be included
* feature vector
* @param circularDateReps list of all the circular date representations that should be included in feature vector
* @param others other features of the same type
* @return an OPVector feature
*/
@@ -94,7 +94,7 @@ E <: Estimator[_] with OpPipelineStage2[RealNN, OPVector, Prediction]]
}

@transient private[op] var bestEstimator: Option[BestEstimator[E]] = None
@transient private val modelsUse = models.map{case (e, p) =>
@transient private lazy val modelsUse = models.map{case (e, p) =>
val est = e.setOutputFeatureName(getOutputFeatureName)
val par = if (p.isEmpty) Array(new ParamMap) else p
est -> par
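The hunk above turns `modelsUse` from a `@transient private val` into a `@transient private lazy val`. The commit does not say why. Two general Scala properties may be relevant: a lazy val is evaluated on first access rather than in the constructor, and under Java serialization a `@transient val` comes back as `null`, while a `@transient lazy val` is recomputed when next accessed. The sketch below demonstrates the serialization behavior with hypothetical classes; it assumes Scala 2 lazy-val semantics (the project targets Scala 2.11).

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Hypothetical holder classes (not TransmogrifAI code) contrasting a @transient val
// with a @transient lazy val under Java serialization (Scala 2 semantics).
final class EagerHolder(val n: Int) extends Serializable {
  // Computed in the constructor and not serialized; the constructor does not run
  // on deserialization, so this field is null afterwards.
  @transient val doubled: Seq[Int] = (1 to n).map(_ * 2)
}

final class LazyHolder(val n: Int) extends Serializable {
  // Computed on first access; its initialization flag is transient as well,
  // so the value is recomputed after deserialization.
  @transient lazy val doubled: Seq[Int] = (1 to n).map(_ * 2)
}

object TransientSketch {
  /** Serializes an object to bytes and reads it back. */
  def roundTrip[T <: java.io.Serializable](obj: T): T = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    try out.writeObject(obj) finally out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    try in.readObject().asInstanceOf[T] finally in.close()
  }
}
```

After a round trip, `EagerHolder.doubled` is `null`, while `LazyHolder.doubled` is rebuilt from the serialized `n`.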
4 changes: 2 additions & 2 deletions docs/examples/Bootstrap-Your-First-Project.md
@@ -7,10 +7,10 @@ Clone the TransmogrifAI repo:
```bash
git clone https://github.com/salesforce/TransmogrifAI.git
```
Checkout the latest release branch (in this example `0.3.4`):
Checkout the latest release branch (in this example `0.4.0`):
```bash
cd ./TransmogrifAI
git checkout 0.3.4
git checkout 0.4.0
```
Build the TransmogrifAI CLI by running:
```bash
2 changes: 1 addition & 1 deletion gradle.properties
@@ -1,3 +1,3 @@
version=0.3.5-SNAPSHOT
version=0.4.0
group=com.salesforce.transmogrifai
org.gradle.caching=true
Binary file modified gradle/wrapper/gradle-wrapper.jar
Binary file not shown.
6 changes: 3 additions & 3 deletions helloworld/README.md
@@ -59,15 +59,15 @@ First, build project with `./gradlew shadowJar`.
#### Train
```shell
$SPARK_HOME/bin/spark-submit --class com.salesforce.hw.titanic.OpTitanic \
build/libs/op-helloworld-0.0.1-all.jar \
build/libs/transmogrifai-helloworld-0.0.1-all.jar \
--run-type train \
--model-location /tmp/titanic-model \
--read-location Passenger=`pwd`/src/main/resources/TitanicDataset/TitanicPassengersTrainData.csv
```
#### Score
```shell
$SPARK_HOME/bin/spark-submit --class com.salesforce.hw.titanic.OpTitanic \
build/libs/op-helloworld-0.0.1-all.jar \
build/libs/transmogrifai-helloworld-0.0.1-all.jar \
--run-type score \
--model-location /tmp/titanic-model \
--read-location Passenger=`pwd`/src/main/resources/TitanicDataset/TitanicPassengersTrainData.csv \
@@ -76,7 +76,7 @@ $SPARK_HOME/bin/spark-submit --class com.salesforce.hw.titanic.OpTitanic \
#### Evaluate
```shell
$SPARK_HOME/bin/spark-submit --class com.salesforce.hw.titanic.OpTitanic \
build/libs/op-helloworld-0.0.1-all.jar \
build/libs/transmogrifai-helloworld-0.0.1-all.jar \
--run-type evaluate \
--model-location /tmp/titanic-model \
--read-location Passenger=`pwd`/src/main/resources/TitanicDataset/TitanicPassengersTrainData.csv \
4 changes: 2 additions & 2 deletions helloworld/build.gradle
@@ -12,8 +12,8 @@ plugins {
id 'com.commercehub.gradle.plugin.avro' version '0.8.0'
}
repositories {
jcenter()
mavenCentral()
maven { url 'https://dl.bintray.com/salesforce/maven' }
}

apply plugin: 'application'
@@ -38,7 +38,7 @@ ext {
junitVersion = '4.11'
sparkVersion = '2.2.1'
scalatestVersion = '3.0.0'
transmogrifaiVersion ='0.3.4'
transmogrifaiVersion ='0.4.0'
collectionsVersion = '3.2.2'
mainClassName = "com.salesforce.dummy.DummyMain"
}
33 changes: 16 additions & 17 deletions helloworld/gradle/scalastyle-config.xml
@@ -55,28 +55,27 @@ This file is divided into 3 sections:
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions are met:
*
* 1. Redistributions of source code must retain the above copyright notice,
* this list of conditions and the following disclaimer.
* * Redistributions of source code must retain the above copyright notice, this
* list of conditions and the following disclaimer.
*
* 2. Redistributions in binary form must reproduce the above copyright notice,
* this list of conditions and the following disclaimer in the documentation
* and/or other materials provided with the distribution.
* * Redistributions in binary form must reproduce the above copyright notice,
* this list of conditions and the following disclaimer in the documentation
* and/or other materials provided with the distribution.
*
* 3. Neither the name of Salesforce.com nor the names of its contributors may
* be used to endorse or promote products derived from this software without
* specific prior written permission.
* * Neither the name of the copyright holder nor the names of its
* contributors may be used to endorse or promote products derived from
* this software without specific prior written permission.
*
* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
* AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
* ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
* LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
* CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
* SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
* INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
* CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
* ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
* POSSIBILITY OF SUCH DAMAGE.
* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
* DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
* FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
* DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
* SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
* OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
* OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/]]></parameter>
</parameters>
</check>
2 changes: 1 addition & 1 deletion helloworld/settings.gradle
@@ -1 +1 @@
rootProject.name='op-helloworld'
rootProject.name='transmogrifai-helloworld'