From 036d1fc19d8ca418189234f5498901655b1cf95a Mon Sep 17 00:00:00 2001
From: Nico de Vos
Date: Thu, 11 Jun 2020 15:34:59 -0700
Subject: [PATCH] 0.7.0 release (#481)

* Revert "Revert back to Spark 2.3 (#399)"

This reverts commit 95a77b17269a71bf0d53c54df7d76f0bfe862275.

* Update to Spark 2.4.3 and XGBoost 0.90
* special double serializer fix
* fix serialization
* fix serialization
* docs
* fixed missing value for test
* meta fix
* Updated DecisionTreeNumericMapBucketizer test to deal with the change made to decision tree pruning in Spark 2.4. If nodes are split, but both child nodes lead to the same prediction then the split is pruned away. This updates the test so this doesn't happen for feature 'b'
* fix params meta test
* Fixed failing xgboost test
* ident
* cleanup
* added dataframe reader and writer extensions
* added const
* cherrypick fixes
* added xgboost params + update models to use public predict method
* blarg
* double ser test
* update mleap and spark testing base
* Update README.md
* type fix
* bump minor version
* Update Spark version in the README
* bump version
* Update build.gradle
* Update pom.xml
* set correct json4s version
* upgrade helloworld deps
* upgrade notebook deps on TMog and Spark
* bump to version 0.7.0 for Spark update
* align helloworld dependencies
* align helloworld dependencies
* get -> getOrElse with exception
* fix helloworld compilation
* style
* WIP release notes
* TMog version bump
* update release notes
* update release notes
* updates to changelog
* updates to changelog
* updates to changelog
* updates to changelog
* updates to changelog
* updates to changelog
* fix changelog
* fix changelog
* keep helloworld on 0.6.1 until release

Co-authored-by: Matthew Tovbin
Co-authored-by: Matthew Tovbin
Co-authored-by: Christopher Suchanek
Co-authored-by: Kevin Moore
Co-authored-by: Matthew Tovbin
---
 CHANGELOG.md                               | 43 +++++++++++++++++++++-
 gradle.properties                          |  2 +-
 helloworld/notebooks/OpHousingPrices.ipynb |  2 +-
 helloworld/notebooks/OpIris.ipynb          |  2 +-
 helloworld/notebooks/OpTitanicSimple.ipynb |  2 +-
 5 files changed, 46 insertions(+), 5 deletions(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 639998430e..ad849369c4 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,5 +1,46 @@
 # Changelog
 
+## 0.7.0
+
+Bug fixes:
+- Fix flaky `ModelInsight` tests [#407](https://github.com/salesforce/TransmogrifAI/pull/407)
+- Remove logging of tokens of text fields [#420](https://github.com/salesforce/TransmogrifAI/pull/420), [#438](https://github.com/salesforce/TransmogrifAI/pull/438), [#447](https://github.com/salesforce/TransmogrifAI/pull/447), [#474](https://github.com/salesforce/TransmogrifAI/pull/474)
+- Add validation prepare call before model selection when no DAG is passed [#424](https://github.com/salesforce/TransmogrifAI/pull/424), [#429](https://github.com/salesforce/TransmogrifAI/pull/429)
+- Fix `Days.daysBetween` int overflow [#471](https://github.com/salesforce/TransmogrifAI/pull/471)
+
+New features / updates:
+- Downsample the number of training samples to `maxTrainingSample` for regression [#413](https://github.com/salesforce/TransmogrifAI/pull/413) and multi-class classification [#414](https://github.com/salesforce/TransmogrifAI/pull/414)
+- Refactor `InsightLOCOTest` [#412](https://github.com/salesforce/TransmogrifAI/pull/412)
+- Enable more loss types for `OpLinearRegression` [#421](https://github.com/salesforce/TransmogrifAI/pull/421)
+- Add property-based tests for regression model selection [#427](https://github.com/salesforce/TransmogrifAI/pull/427)
+- Add option to calculate LOCO for dates/texts by leaving out their entire vector [#418](https://github.com/salesforce/TransmogrifAI/pull/418)
+- Add Chinese and Korean examples to `TextTokenizerTest` [#442](https://github.com/salesforce/TransmogrifAI/pull/442)
+- Add support for ignoring text that looks like IDs in `SmartTextVectorizer` [#448](https://github.com/salesforce/TransmogrifAI/pull/448), [#455](https://github.com/salesforce/TransmogrifAI/pull/455)
+- Add a unary estimator for detecting names in text fields and transforming to likely gender [#445](https://github.com/salesforce/TransmogrifAI/pull/445)
+- Allow result features to be removed by raw feature filter [#458](https://github.com/salesforce/TransmogrifAI/pull/458)
+- Metadata changes for sensitive feature information [#457](https://github.com/salesforce/TransmogrifAI/pull/457)
+- Add `MinVarianceFilter` which checks that computed features have a minimum variance [#463](https://github.com/salesforce/TransmogrifAI/pull/463), [#465](https://github.com/salesforce/TransmogrifAI/pull/465)
+- Allow `TextStats` length distribution to be token-based and refactor for testability [#464](https://github.com/salesforce/TransmogrifAI/pull/464)
+- Use Spark job grouping to distinguish steps of the machine learning flow [#467](https://github.com/salesforce/TransmogrifAI/pull/467), [#468](https://github.com/salesforce/TransmogrifAI/pull/468), [#470](https://github.com/salesforce/TransmogrifAI/pull/470)
+- Add categorical detection to be coverage based in addition to unique count based [#473](https://github.com/salesforce/TransmogrifAI/pull/473)
+- Remove duplicate features using sanity checker feature to feature correlations [#476](https://github.com/salesforce/TransmogrifAI/pull/476), [#479](https://github.com/salesforce/TransmogrifAI/pull/479)
+- Lift the upper bound on number of hash features [#477](https://github.com/salesforce/TransmogrifAI/pull/477)
+- Enable Html stripping on text-like features [#478](https://github.com/salesforce/TransmogrifAI/pull/478)
+
+Dependency updates ([#402](https://github.com/salesforce/TransmogrifAI/pull/402), [#466](https://github.com/salesforce/TransmogrifAI/pull/466)):
+- Update Apache Spark version to 2.4.5
+- Avro is a built-in data source in Spark 2.4, so no longer using the spark-avro package
+- Avro to 1.8.2
+- XGBoost to 0.90
+- MLeap to 0.14.0
+- json4s to 3.5.3
+- JUnit to 4.12
+- chill to 0.9.3
+- gradle-avro-plugin to 0.16.0
+
+Miscellaneous:
+- Add ROADMAP.md [#394](https://github.com/salesforce/TransmogrifAI/pull/394)
+
 ## 0.6.1
 
 Bug fixes:
@@ -19,7 +60,7 @@ New features / updates:
 - Use compact and compressed model json by default [#375](https://github.com/salesforce/TransmogrifAI/pull/375)
 - Descale feature contribution for Linear Regression & Logistic Regression [#345](https://github.com/salesforce/TransmogrifAI/pull/345)
 
-Dependency updates:
+Dependency updates:
 - Update tika version [#382](https://github.com/salesforce/TransmogrifAI/pull/382)
 
 ## 0.6.0
diff --git a/gradle.properties b/gradle.properties
index f8fb7ad658..089cc7de03 100644
--- a/gradle.properties
+++ b/gradle.properties
@@ -1,3 +1,3 @@
-version=0.7.0-SNAPSHOT
+version=0.7.0
 group=com.salesforce.transmogrifai
 org.gradle.caching=true
diff --git a/helloworld/notebooks/OpHousingPrices.ipynb b/helloworld/notebooks/OpHousingPrices.ipynb
index b518ae06c4..69bdb25dbc 100644
--- a/helloworld/notebooks/OpHousingPrices.ipynb
+++ b/helloworld/notebooks/OpHousingPrices.ipynb
@@ -16,7 +16,7 @@
 "metadata": {},
 "outputs": [],
 "source": [
- "%classpath add mvn com.salesforce.transmogrifai transmogrifai-core_2.11 0.7.0"
+ "%classpath add mvn com.salesforce.transmogrifai transmogrifai-core_2.11 0.6.1"
 ]
 },
 {
diff --git a/helloworld/notebooks/OpIris.ipynb b/helloworld/notebooks/OpIris.ipynb
index c68ebe406f..0816c17925 100644
--- a/helloworld/notebooks/OpIris.ipynb
+++ b/helloworld/notebooks/OpIris.ipynb
@@ -17,7 +17,7 @@
 "metadata": {},
 "outputs": [],
 "source": [
- "%classpath add mvn com.salesforce.transmogrifai transmogrifai-core_2.11 0.7.0"
+ "%classpath add mvn com.salesforce.transmogrifai transmogrifai-core_2.11 0.6.1"
 ]
 },
 {
diff --git a/helloworld/notebooks/OpTitanicSimple.ipynb b/helloworld/notebooks/OpTitanicSimple.ipynb
index 392886e6fb..82a8f7ac79 100644
--- a/helloworld/notebooks/OpTitanicSimple.ipynb
+++ b/helloworld/notebooks/OpTitanicSimple.ipynb
@@ -22,7 +22,7 @@
 "metadata": {},
 "outputs": [],
 "source": [
- "%classpath add mvn com.salesforce.transmogrifai transmogrifai-core_2.11 0.7.0"
+ "%classpath add mvn com.salesforce.transmogrifai transmogrifai-core_2.11 0.6.1"
 ]
 },
 {