diff --git a/CHANGELOG.md b/CHANGELOG.md index 7968c163b2..0e7f17bea1 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,5 +1,65 @@ # Changelog +## 0.4.0 + +New features and bug fixes: + +- Allow specifying the formula used to compute the text features bin size for `RawFeatureFilter` (see `RawFeatureFilter.textBinsFormula` argument) [#99](https://github.com/salesforce/TransmogrifAI/pull/99) +- Fixed metadata on `Geolocation` and `GeolocationMap` so that it keeps the name of the column in descriptorValue. [#100](https://github.com/salesforce/TransmogrifAI/pull/100) +- Local scoring (aka Sparkless) using Aardpfark. This enables loading and scoring models locally, without a Spark context, using the Aardpfark (PFA for Spark) and Hadrian libraries instead. This allows orders of magnitude faster scoring times compared to Spark. [#41](https://github.com/salesforce/TransmogrifAI/pull/41) +- Add distributions calculated in `RawFeatureFilter` to `ModelInsights` [#103](https://github.com/salesforce/TransmogrifAI/pull/103) +- Added binary sequence transformer & estimator: `BinarySequenceTransformer` and `BinarySequenceEstimator` + plus the associated base traits [#84](https://github.com/salesforce/TransmogrifAI/pull/84) +- Added `StringIndexerHandleInvalid.Keep` option into `OpStringIndexer` (same as in underlying Spark estimator) [#93](https://github.com/salesforce/TransmogrifAI/pull/93) +- Allow numbers and underscores in feature names [#92](https://github.com/salesforce/TransmogrifAI/pull/92) +- Stable key order for map vectorizers [#88](https://github.com/salesforce/TransmogrifAI/pull/88) +- Keep raw feature distributions calculated in raw feature filter [#76](https://github.com/salesforce/TransmogrifAI/pull/76) +- Transmogrify to use smart text vectorizer for text types: `Text`, `TextArea`, `TextMap` and `TextAreaMap` [#63](https://github.com/salesforce/TransmogrifAI/pull/63) +- Transmogrify circular date representations for date feature types: `Date`, `DateTime`, `DateMap` and 
`DateTimeMap` [#100](https://github.com/salesforce/TransmogrifAI/pull/100) +- Improved test coverage for utils and other modules [#50](https://github.com/salesforce/TransmogrifAI/pull/50), [#53](https://github.com/salesforce/TransmogrifAI/pull/53), [#67](https://github.com/salesforce/TransmogrifAI/pull/67), [#69](https://github.com/salesforce/TransmogrifAI/pull/69), [#70](https://github.com/salesforce/TransmogrifAI/pull/70), [#71](https://github.com/salesforce/TransmogrifAI/pull/71), [#72](https://github.com/salesforce/TransmogrifAI/pull/72), [#73](https://github.com/salesforce/TransmogrifAI/pull/73) +- Match feature type map hierarchy with regular feature types [#49](https://github.com/salesforce/TransmogrifAI/pull/49) +- Redundant and deadlock-prone end listener removal [#52](https://github.com/salesforce/TransmogrifAI/pull/52) +- OS-neutral filesystem path creation [#51](https://github.com/salesforce/TransmogrifAI/pull/51) +- Make `Feature` class public and hide its constructor instead [#45](https://github.com/salesforce/TransmogrifAI/pull/45) +- Specify categorical variables in metadata [#120](https://github.com/salesforce/TransmogrifAI/pull/120) +- Fix fill geo location vectorizer values [#132](https://github.com/salesforce/TransmogrifAI/pull/132) +- Adding feature importance for new model types [#128](https://github.com/salesforce/TransmogrifAI/pull/128) +- Adding binary classification bin score evaluator [#119](https://github.com/salesforce/TransmogrifAI/pull/119) +- Apply DateToUnitCircleTransformer logic in raw feature filter transformations [#130](https://github.com/salesforce/TransmogrifAI/pull/130) + +Breaking changes: +- Made case class to deal with model selector metadata [#39](https://github.com/salesforce/TransmogrifAI/pull/39) +- Made `FileOutputCommitter` a default and got rid of `DirectMapreduceOutputCommitter` and `DirectOutputCommitter` [#86](https://github.com/salesforce/TransmogrifAI/pull/86) +- Refactored `OpVectorColumnMetadata` to allow numeric column 
descriptors [#89](https://github.com/salesforce/TransmogrifAI/pull/89) +- Renaming `JaccardDistance` to `JaccardSimilarity` [#80](https://github.com/salesforce/TransmogrifAI/pull/80) +- New model selector interface [#55](https://github.com/salesforce/TransmogrifAI/pull/55). The breaking changes are related to the return type and the way parameters are passed into model selectors. Starting with this version, model selectors return a single result feature of type `Prediction` (instead of a variable number of features, `(pred, raw, prob)`). Example: +```scala +val (pred, raw, prob) = MultiClassificationModelSelector() // won't compile anymore +val prediction = MultiClassificationModelSelector() // ok! +``` +Another change is the way parameters are passed into model selectors. Example: +```scala +BinaryClassificationModelSelector + .withCrossValidation() + .setLogisticRegressionRegParam(0.05, 0.1) // won't compile anymore +``` +Instead one should do: +```scala +val lr = new OpLogisticRegression() +val models = Seq(lr -> new ParamGridBuilder().addGrid(lr.regParam, Array(0.05, 0.1)).build()) +BinaryClassificationModelSelector + .withCrossValidation(modelsAndParameters = models) +``` +For more examples on how to use the new model selectors, please refer to our documentation and helloworld examples. + + +Dependency upgrades & misc: +- CI/CD runtime improvements for CircleCI and TravisCI +- Updated Gradle to 4.10 +- Updated `scala-graph` to `1.12.5` +- Updated `scalafmt` to `1.5.1` +- New `transmogrifai-local` subproject [#41](https://github.com/salesforce/TransmogrifAI/pull/41) introduces `aardpfark` and `hadrian` dependencies. 
+ + ## 0.3.4 Performance improvements: - Added featureLabelCorrOnly parameter in SanityChecker to only compute correlations between features and label (defaults to false) @@ -17,7 +77,7 @@ New features and bug fixes: - Pretty print model summaries - Ensure OP Models are portable across environments - Ignore _ in simple streaming avro file reader -- Updated evaluators so they can work with either Prediction type feature or three input featues +- Updated evaluators so they can work with either Prediction type feature or three input features - Added Algebird kryo registrar - Make Sure that SmartTextVectorizerModel can be serialized to/from json diff --git a/README.md b/README.md index 2e107b375e..3bc5596e57 100644 --- a/README.md +++ b/README.md @@ -1,6 +1,6 @@ # TransmogrifAI - [ ![Download](https://api.bintray.com/packages/salesforce/maven/TransmogrifAI/images/download.svg?version=0.3.4) ](https://bintray.com/salesforce/maven/TransmogrifAI/0.3.4/link) [![Javadocs](https://www.javadoc.io/badge/com.salesforce.transmogrifai/transmogrifai-core_2.11.svg?color=blue)](https://www.javadoc.io/doc/com.salesforce.transmogrifai/transmogrifai-core_2.11) [![Spark version](https://img.shields.io/badge/spark-2.2-brightgreen.svg)](https://spark.apache.org/downloads.html) [![Scala version](https://img.shields.io/badge/scala-2.11-brightgreen.svg)](https://www.scala-lang.org/download/2.11.12.html) [![License](http://img.shields.io/:license-BSD--3-blue.svg)](./LICENSE) [![Chat](https://badges.gitter.im/salesforce/TransmogrifAI.svg)](https://gitter.im/salesforce/TransmogrifAI?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge) + [![Download](https://api.bintray.com/packages/salesforce/maven/TransmogrifAI/images/download.svg)](https://bintray.com/salesforce/maven/TransmogrifAI/_latestVersion) 
[![Javadocs](https://www.javadoc.io/badge/com.salesforce.transmogrifai/transmogrifai-core_2.11.svg?color=blue)](https://www.javadoc.io/doc/com.salesforce.transmogrifai/transmogrifai-core_2.11) [![Spark version](https://img.shields.io/badge/spark-2.2-brightgreen.svg)](https://spark.apache.org/downloads.html) [![Scala version](https://img.shields.io/badge/scala-2.11-brightgreen.svg)](https://www.scala-lang.org/download/2.11.12.html) [![License](http://img.shields.io/:license-BSD--3-blue.svg)](./LICENSE) [![Chat](https://badges.gitter.im/salesforce/TransmogrifAI.svg)](https://gitter.im/salesforce/TransmogrifAI?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge) [![TravisCI Build Status](https://travis-ci.com/salesforce/TransmogrifAI.svg?token=Ex9czVEUD7AzPTmVh6iX&branch=master)](https://travis-ci.com/salesforce/TransmogrifAI) [![CircleCI Build Status](https://circleci.com/gh/salesforce/TransmogrifAI.svg?&style=shield&circle-token=e84c1037ae36652d38b49207728181ee85337e0b)](https://circleci.com/gh/salesforce/TransmogrifAI) [![Codecov](https://codecov.io/gh/salesforce/TransmogrifAI/branch/master/graph/badge.svg)](https://codecov.io/gh/salesforce/TransmogrifAI) [![CodeFactor](https://www.codefactor.io/repository/github/salesforce/transmogrifai/badge)](https://www.codefactor.io/repository/github/salesforce/transmogrifai) @@ -124,15 +124,15 @@ You can simply add TransmogrifAI as a regular dependency to an existing project. For Gradle in `build.gradle` add: ```gradle repositories { + jcenter() mavenCentral() - maven { url 'https://dl.bintray.com/salesforce/maven' } } dependencies { // TransmogrifAI core dependency - compile 'com.salesforce.transmogrifai:transmogrifai-core_2.11:0.3.4' + compile 'com.salesforce.transmogrifai:transmogrifai-core_2.11:0.4.0' // TransmogrifAI pretrained models, e.g. OpenNLP POS/NER models etc. 
(optional) - // compile 'com.salesforce.transmogrifai:transmogrifai-models_2.11:0.3.4' + // compile 'com.salesforce.transmogrifai:transmogrifai-models_2.11:0.4.0' } ``` @@ -140,13 +140,13 @@ For SBT in `build.sbt` add: ```sbt scalaVersion := "2.11.12" -resolvers += Resolver.bintrayRepo("salesforce", "maven") +resolvers += Resolver.jcenterRepo // TransmogrifAI core dependency -libraryDependencies ++= "com.salesforce.transmogrifai" %% "transmogrifai-core" % "0.3.4" +libraryDependencies ++= "com.salesforce.transmogrifai" %% "transmogrifai-core" % "0.4.0" // TransmogrifAI pretrained models, e.g. OpenNLP POS/NER models etc. (optional) -// libraryDependencies ++= "com.salesforce.transmogrifai" %% "transmogrifai-models" % "0.3.4" +// libraryDependencies ++= "com.salesforce.transmogrifai" %% "transmogrifai-models" % "0.4.0" ``` Then import TransmogrifAI into your code: diff --git a/build.gradle b/build.gradle index ab1098c087..b4c1913fc3 100644 --- a/build.gradle +++ b/build.gradle @@ -282,6 +282,9 @@ configure(subProjs) { jar.baseName = "$rootProject.name-$project.name" + // ignore link warning in scaladoc + scaladoc.scalaDocOptions.additionalParameters = ['-no-link-warnings'] + task scaladocJar(type: Jar, dependsOn: scaladoc) { classifier = 'javadoc' from scaladoc.destinationDir diff --git a/core/src/main/scala/com/salesforce/op/dsl/RichDateFeature.scala b/core/src/main/scala/com/salesforce/op/dsl/RichDateFeature.scala index c9903f836d..ffacb398be 100644 --- a/core/src/main/scala/com/salesforce/op/dsl/RichDateFeature.scala +++ b/core/src/main/scala/com/salesforce/op/dsl/RichDateFeature.scala @@ -91,8 +91,7 @@ trait RichDateFeature { * @param dateListPivot name of the pivot type from [[DateListPivot]] enum * @param referenceDate reference date to compare against when [[DateListPivot]] is [[SinceFirst]] or [[SinceLast]] * @param trackNulls option to keep track of values that were missing - * @param circularDateReps list of all the circular date representations that should 
be included - * feature vector + * @param circularDateReps list of all the circular date representations that should be included in feature vector * @return result feature of type Vector */ def vectorize @@ -164,8 +163,7 @@ trait RichDateFeature { * @param dateListPivot name of the pivot type from [[DateListPivot]] enum * @param referenceDate reference date to compare against when [[DateListPivot]] is [[SinceFirst]] or [[SinceLast]] * @param trackNulls option to keep track of values that were missing - * @param circularDateReps list of all the circular date representations that should be included - * feature vector + * @param circularDateReps list of all the circular date representations that should be included in feature vector * @return result feature of type Vector */ def vectorize diff --git a/core/src/main/scala/com/salesforce/op/dsl/RichMapFeature.scala b/core/src/main/scala/com/salesforce/op/dsl/RichMapFeature.scala index 13c2ac1f24..199baa7a10 100644 --- a/core/src/main/scala/com/salesforce/op/dsl/RichMapFeature.scala +++ b/core/src/main/scala/com/salesforce/op/dsl/RichMapFeature.scala @@ -705,8 +705,7 @@ trait RichMapFeature { * @param blackListKeys keys to blacklist * @param trackNulls option to keep track of values that were missing * @param referenceDate reference date to subtract off before converting to vector - * @param circularDateReps list of all the circular date representations that should be included - * feature vector + * @param circularDateReps list of all the circular date representations that should be included in feature vector * @return result feature of type Vector * @param others other features of the same type * @return an OPVector feature @@ -785,8 +784,7 @@ trait RichMapFeature { * @param blackListKeys keys to blacklist * @param trackNulls option to keep track of values that were missing * @param referenceDate reference date to subtract off before converting to vector - * @param circularDateReps list of all the circular date 
representations that should be included - * feature vector + * @param circularDateReps list of all the circular date representations that should be included in feature vector * @param others other features of the same type * @return an OPVector feature */ diff --git a/core/src/main/scala/com/salesforce/op/stages/impl/selector/ModelSelector.scala b/core/src/main/scala/com/salesforce/op/stages/impl/selector/ModelSelector.scala index 5fb8f4ca05..c98ad0cd0c 100644 --- a/core/src/main/scala/com/salesforce/op/stages/impl/selector/ModelSelector.scala +++ b/core/src/main/scala/com/salesforce/op/stages/impl/selector/ModelSelector.scala @@ -94,7 +94,7 @@ E <: Estimator[_] with OpPipelineStage2[RealNN, OPVector, Prediction]] } @transient private[op] var bestEstimator: Option[BestEstimator[E]] = None - @transient private val modelsUse = models.map{case (e, p) => + @transient private lazy val modelsUse = models.map{case (e, p) => val est = e.setOutputFeatureName(getOutputFeatureName) val par = if (p.isEmpty) Array(new ParamMap) else p est -> par diff --git a/docs/examples/Bootstrap-Your-First-Project.md b/docs/examples/Bootstrap-Your-First-Project.md index 9687473971..9ed867e572 100644 --- a/docs/examples/Bootstrap-Your-First-Project.md +++ b/docs/examples/Bootstrap-Your-First-Project.md @@ -7,10 +7,10 @@ Clone the TransmogrifAI repo: ```bash git clone https://github.com/salesforce/TransmogrifAI.git ``` -Checkout the latest release branch (in this example `0.3.4`): +Checkout the latest release branch (in this example `0.4.0`): ```bash cd ./TransmogrifAI -git checkout 0.3.4 +git checkout 0.4.0 ``` Build the TransmogrifAI CLI by running: ```bash diff --git a/gradle.properties b/gradle.properties index ed6d3d898c..0a433655f7 100644 --- a/gradle.properties +++ b/gradle.properties @@ -1,3 +1,3 @@ -version=0.3.5-SNAPSHOT +version=0.4.0 group=com.salesforce.transmogrifai org.gradle.caching=true diff --git a/gradle/wrapper/gradle-wrapper.jar b/gradle/wrapper/gradle-wrapper.jar index 
0d4a951687..28861d273a 100644 Binary files a/gradle/wrapper/gradle-wrapper.jar and b/gradle/wrapper/gradle-wrapper.jar differ diff --git a/helloworld/README.md b/helloworld/README.md index fbcbb29b98..2655f411e8 100644 --- a/helloworld/README.md +++ b/helloworld/README.md @@ -59,7 +59,7 @@ First, build project with `./gradlew shadowJar`. #### Train ```shell $SPARK_HOME/bin/spark-submit --class com.salesforce.hw.titanic.OpTitanic \ - build/libs/op-helloworld-0.0.1-all.jar \ + build/libs/transmogrifai-helloworld-0.0.1-all.jar \ --run-type train \ --model-location /tmp/titanic-model \ --read-location Passenger=`pwd`/src/main/resources/TitanicDataset/TitanicPassengersTrainData.csv @@ -67,7 +67,7 @@ $SPARK_HOME/bin/spark-submit --class com.salesforce.hw.titanic.OpTitanic \ #### Score ```shell $SPARK_HOME/bin/spark-submit --class com.salesforce.hw.titanic.OpTitanic \ - build/libs/op-helloworld-0.0.1-all.jar \ + build/libs/transmogrifai-helloworld-0.0.1-all.jar \ --run-type score \ --model-location /tmp/titanic-model \ --read-location Passenger=`pwd`/src/main/resources/TitanicDataset/TitanicPassengersTrainData.csv \ @@ -76,7 +76,7 @@ $SPARK_HOME/bin/spark-submit --class com.salesforce.hw.titanic.OpTitanic \ #### Evaluate ```shell $SPARK_HOME/bin/spark-submit --class com.salesforce.hw.titanic.OpTitanic \ - build/libs/op-helloworld-0.0.1-all.jar \ + build/libs/transmogrifai-helloworld-0.0.1-all.jar \ --run-type evaluate \ --model-location /tmp/titanic-model \ --read-location Passenger=`pwd`/src/main/resources/TitanicDataset/TitanicPassengersTrainData.csv \ diff --git a/helloworld/build.gradle b/helloworld/build.gradle index 525b7e4708..37ef2aab66 100644 --- a/helloworld/build.gradle +++ b/helloworld/build.gradle @@ -12,8 +12,8 @@ plugins { id 'com.commercehub.gradle.plugin.avro' version '0.8.0' } repositories { + jcenter() mavenCentral() - maven { url 'https://dl.bintray.com/salesforce/maven' } } apply plugin: 'application' @@ -38,7 +38,7 @@ ext { junitVersion = '4.11' 
sparkVersion = '2.2.1' scalatestVersion = '3.0.0' - transmogrifaiVersion ='0.3.4' + transmogrifaiVersion ='0.4.0' collectionsVersion = '3.2.2' mainClassName = "com.salesforce.dummy.DummyMain" } diff --git a/helloworld/gradle/scalastyle-config.xml b/helloworld/gradle/scalastyle-config.xml index 625f33d533..1fd88d2b01 100644 --- a/helloworld/gradle/scalastyle-config.xml +++ b/helloworld/gradle/scalastyle-config.xml @@ -55,28 +55,27 @@ This file is divided into 3 sections: * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions are met: * - * 1. Redistributions of source code must retain the above copyright notice, - * this list of conditions and the following disclaimer. + * * Redistributions of source code must retain the above copyright notice, this + * list of conditions and the following disclaimer. * - * 2. Redistributions in binary form must reproduce the above copyright notice, - * this list of conditions and the following disclaimer in the documentation - * and/or other materials provided with the distribution. + * * Redistributions in binary form must reproduce the above copyright notice, + * this list of conditions and the following disclaimer in the documentation + * and/or other materials provided with the distribution. * - * 3. Neither the name of Salesforce.com nor the names of its contributors may - * be used to endorse or promote products derived from this software without - * specific prior written permission. + * * Neither the name of the copyright holder nor the names of its + * contributors may be used to endorse or promote products derived from + * this software without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE - * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE - * ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE - * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR - * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF - * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS - * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN - * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) - * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE - * POSSIBILITY OF SUCH DAMAGE. + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE + * DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR + * SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER + * CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, + * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. */]]> diff --git a/helloworld/settings.gradle b/helloworld/settings.gradle index b942faaa67..4166ec96fc 100644 --- a/helloworld/settings.gradle +++ b/helloworld/settings.gradle @@ -1 +1 @@ -rootProject.name='op-helloworld' +rootProject.name='transmogrifai-helloworld' diff --git a/helloworld/src/main/scala/com/salesforce/hw/OpTitanicSimple.scala b/helloworld/src/main/scala/com/salesforce/hw/OpTitanicSimple.scala index 91be90008e..100c43b320 100644 --- a/helloworld/src/main/scala/com/salesforce/hw/OpTitanicSimple.scala +++ b/helloworld/src/main/scala/com/salesforce/hw/OpTitanicSimple.scala @@ -5,28 +5,27 @@ * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions are met: * - * 1. 
Redistributions of source code must retain the above copyright notice, - * this list of conditions and the following disclaimer. + * * Redistributions of source code must retain the above copyright notice, this + * list of conditions and the following disclaimer. * - * 2. Redistributions in binary form must reproduce the above copyright notice, - * this list of conditions and the following disclaimer in the documentation - * and/or other materials provided with the distribution. + * * Redistributions in binary form must reproduce the above copyright notice, + * this list of conditions and the following disclaimer in the documentation + * and/or other materials provided with the distribution. * - * 3. Neither the name of Salesforce.com nor the names of its contributors may - * be used to endorse or promote products derived from this software without - * specific prior written permission. + * * Neither the name of the copyright holder nor the names of its + * contributors may be used to endorse or promote products derived from + * this software without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE - * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE - * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE - * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR - * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF - * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS - * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN - * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) - * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE - * POSSIBILITY OF SUCH DAMAGE. + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE + * DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR + * SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER + * CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, + * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. */ package com.salesforce.hw @@ -37,7 +36,7 @@ import com.salesforce.op.features.FeatureBuilder import com.salesforce.op.features.types._ import com.salesforce.op.readers.DataReaders import com.salesforce.op.stages.impl.classification.BinaryClassificationModelSelector -import com.salesforce.op.stages.impl.classification.ClassificationModelsToTry._ +import com.salesforce.op.stages.impl.classification.BinaryClassificationModelsToTry._ import org.apache.spark.SparkConf import org.apache.spark.sql.SparkSession @@ -134,16 +133,12 @@ object OpTitanicSimple { val finalFeatures = if (sanityCheck) survived.sanityCheck(passengerFeatures) else passengerFeatures // Define the model we want to use (here a simple logistic regression) and get the resulting output - val (prediction, rawPrediction, prob) = - BinaryClassificationModelSelector.withTrainValidationSplit() - .setModelsToTry(LogisticRegression) - .setInput(survived, finalFeatures).getOutput() + val prediction = + BinaryClassificationModelSelector.withTrainValidationSplit( + modelTypesToUse = Seq(OpLogisticRegression) + ).setInput(survived, finalFeatures).getOutput() - val evaluator = Evaluators.BinaryClassification() - .setLabelCol(survived) - .setRawPredictionCol(rawPrediction) - .setPredictionCol(prediction) - .setProbabilityCol(prob) + val evaluator = Evaluators.BinaryClassification().setLabelCol(survived).setPredictionCol(prediction) 
//////////////////////////////////////////////////////////////////////////////// // WORKFLOW @@ -159,7 +154,7 @@ object OpTitanicSimple { // Define a new workflow and attach our data reader val workflow = new OpWorkflow() - .setResultFeatures(survived, rawPrediction, prob, prediction) + .setResultFeatures(survived, prediction) .setReader(trainDataReader) // Fit the workflow to the data diff --git a/helloworld/src/main/scala/com/salesforce/hw/boston/BostonFeatures.scala b/helloworld/src/main/scala/com/salesforce/hw/boston/BostonFeatures.scala index ad3b3ae7e7..06053f4fdd 100644 --- a/helloworld/src/main/scala/com/salesforce/hw/boston/BostonFeatures.scala +++ b/helloworld/src/main/scala/com/salesforce/hw/boston/BostonFeatures.scala @@ -5,28 +5,27 @@ * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions are met: * - * 1. Redistributions of source code must retain the above copyright notice, - * this list of conditions and the following disclaimer. + * * Redistributions of source code must retain the above copyright notice, this + * list of conditions and the following disclaimer. * - * 2. Redistributions in binary form must reproduce the above copyright notice, - * this list of conditions and the following disclaimer in the documentation - * and/or other materials provided with the distribution. + * * Redistributions in binary form must reproduce the above copyright notice, + * this list of conditions and the following disclaimer in the documentation + * and/or other materials provided with the distribution. * - * 3. Neither the name of Salesforce.com nor the names of its contributors may - * be used to endorse or promote products derived from this software without - * specific prior written permission. 
+ * * Neither the name of the copyright holder nor the names of its + * contributors may be used to endorse or promote products derived from + * this software without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE - * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE - * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE - * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR - * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF - * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS - * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN - * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) - * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE - * POSSIBILITY OF SUCH DAMAGE. + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE + * DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR + * SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER + * CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, + * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
*/ package com.salesforce.hw.boston diff --git a/helloworld/src/main/scala/com/salesforce/hw/boston/BostonHouse.scala b/helloworld/src/main/scala/com/salesforce/hw/boston/BostonHouse.scala index 6cd3d73890..9b6539ff76 100644 --- a/helloworld/src/main/scala/com/salesforce/hw/boston/BostonHouse.scala +++ b/helloworld/src/main/scala/com/salesforce/hw/boston/BostonHouse.scala @@ -5,28 +5,27 @@ * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions are met: * - * 1. Redistributions of source code must retain the above copyright notice, - * this list of conditions and the following disclaimer. + * * Redistributions of source code must retain the above copyright notice, this + * list of conditions and the following disclaimer. * - * 2. Redistributions in binary form must reproduce the above copyright notice, - * this list of conditions and the following disclaimer in the documentation - * and/or other materials provided with the distribution. + * * Redistributions in binary form must reproduce the above copyright notice, + * this list of conditions and the following disclaimer in the documentation + * and/or other materials provided with the distribution. * - * 3. Neither the name of Salesforce.com nor the names of its contributors may - * be used to endorse or promote products derived from this software without - * specific prior written permission. + * * Neither the name of the copyright holder nor the names of its + * contributors may be used to endorse or promote products derived from + * this software without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE - * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE - * ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE - * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR - * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF - * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS - * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN - * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) - * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE - * POSSIBILITY OF SUCH DAMAGE. + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE + * DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR + * SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER + * CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, + * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. */ package com.salesforce.hw.boston diff --git a/helloworld/src/main/scala/com/salesforce/hw/boston/BostonKryoRegistrator.scala b/helloworld/src/main/scala/com/salesforce/hw/boston/BostonKryoRegistrator.scala index fe5347e0f9..ef2360a14b 100644 --- a/helloworld/src/main/scala/com/salesforce/hw/boston/BostonKryoRegistrator.scala +++ b/helloworld/src/main/scala/com/salesforce/hw/boston/BostonKryoRegistrator.scala @@ -5,28 +5,27 @@ * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions are met: * - * 1. Redistributions of source code must retain the above copyright notice, - * this list of conditions and the following disclaimer. 
+ * * Redistributions of source code must retain the above copyright notice, this + * list of conditions and the following disclaimer. * - * 2. Redistributions in binary form must reproduce the above copyright notice, - * this list of conditions and the following disclaimer in the documentation - * and/or other materials provided with the distribution. + * * Redistributions in binary form must reproduce the above copyright notice, + * this list of conditions and the following disclaimer in the documentation + * and/or other materials provided with the distribution. * - * 3. Neither the name of Salesforce.com nor the names of its contributors may - * be used to endorse or promote products derived from this software without - * specific prior written permission. + * * Neither the name of the copyright holder nor the names of its + * contributors may be used to endorse or promote products derived from + * this software without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE - * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE - * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE - * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR - * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF - * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS - * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN - * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) - * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE - * POSSIBILITY OF SUCH DAMAGE. + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE + * DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR + * SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER + * CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, + * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. */ package com.salesforce.hw.boston diff --git a/helloworld/src/main/scala/com/salesforce/hw/boston/OpBoston.scala b/helloworld/src/main/scala/com/salesforce/hw/boston/OpBoston.scala index a36043124e..9bf61d555d 100644 --- a/helloworld/src/main/scala/com/salesforce/hw/boston/OpBoston.scala +++ b/helloworld/src/main/scala/com/salesforce/hw/boston/OpBoston.scala @@ -5,28 +5,27 @@ * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions are met: * - * 1. Redistributions of source code must retain the above copyright notice, - * this list of conditions and the following disclaimer. + * * Redistributions of source code must retain the above copyright notice, this + * list of conditions and the following disclaimer. * - * 2. Redistributions in binary form must reproduce the above copyright notice, - * this list of conditions and the following disclaimer in the documentation - * and/or other materials provided with the distribution. + * * Redistributions in binary form must reproduce the above copyright notice, + * this list of conditions and the following disclaimer in the documentation + * and/or other materials provided with the distribution. * - * 3. Neither the name of Salesforce.com nor the names of its contributors may - * be used to endorse or promote products derived from this software without - * specific prior written permission. 
+ * * Neither the name of the copyright holder nor the names of its + * contributors may be used to endorse or promote products derived from + * this software without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE - * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE - * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE - * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR - * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF - * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS - * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN - * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) - * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE - * POSSIBILITY OF SUCH DAMAGE. + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE + * DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR + * SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER + * CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, + * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
*/ package com.salesforce.hw.boston @@ -35,6 +34,7 @@ import com.salesforce.op._ import com.salesforce.op.evaluators.Evaluators import com.salesforce.op.readers.CustomReader import com.salesforce.op.stages.impl.regression.RegressionModelSelector +import com.salesforce.op.stages.impl.regression.RegressionModelsToTry._ import com.salesforce.op.stages.impl.tuning.DataSplitter import com.salesforce.op.utils.kryo.OpKryoRegistrator import org.apache.spark.rdd.RDD @@ -51,7 +51,7 @@ object OpBoston extends OpAppWithRunner with BostonFeatures { // READERS DEFINITION ///////////////////////////////////////////////////////////////////////////////// - val randomSeed = 112233 + val randomSeed = 112233L def customRead(path: Option[String], spark: SparkSession): RDD[BostonHouse] = { require(path.isDefined, "The path is not set") @@ -89,11 +89,10 @@ object OpBoston extends OpAppWithRunner with BostonFeatures { val houseFeatures = Seq(crim, zn, indus, chas, nox, rm, age, dis, rad, tax, ptratio, b, lstat).transmogrify() val prediction = RegressionModelSelector - .withCrossValidation(dataSplitter = Option(DataSplitter(seed = randomSeed)), seed = randomSeed) - .setRandomForestSeed(randomSeed) - .setGradientBoostedTreeSeed(randomSeed) - .setInput(medv, houseFeatures) - .getOutput() + .withCrossValidation( + dataSplitter = Some(DataSplitter(seed = randomSeed)), seed = randomSeed, + modelTypesToUse = Seq(OpGBTRegressor, OpRandomForestRegressor) + ).setInput(medv, houseFeatures).getOutput() val workflow = new OpWorkflow().setResultFeatures(prediction) diff --git a/helloworld/src/main/scala/com/salesforce/hw/dataprep/ConditionalAggregation.scala b/helloworld/src/main/scala/com/salesforce/hw/dataprep/ConditionalAggregation.scala index 13fb46faac..5d8ae8b01b 100644 --- a/helloworld/src/main/scala/com/salesforce/hw/dataprep/ConditionalAggregation.scala +++ b/helloworld/src/main/scala/com/salesforce/hw/dataprep/ConditionalAggregation.scala @@ -5,28 +5,27 @@ * Redistribution and use in source 
and binary forms, with or without * modification, are permitted provided that the following conditions are met: * - * 1. Redistributions of source code must retain the above copyright notice, - * this list of conditions and the following disclaimer. + * * Redistributions of source code must retain the above copyright notice, this + * list of conditions and the following disclaimer. * - * 2. Redistributions in binary form must reproduce the above copyright notice, - * this list of conditions and the following disclaimer in the documentation - * and/or other materials provided with the distribution. + * * Redistributions in binary form must reproduce the above copyright notice, + * this list of conditions and the following disclaimer in the documentation + * and/or other materials provided with the distribution. * - * 3. Neither the name of Salesforce.com nor the names of its contributors may - * be used to endorse or promote products derived from this software without - * specific prior written permission. + * * Neither the name of the copyright holder nor the names of its + * contributors may be used to endorse or promote products derived from + * this software without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE - * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE - * ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE - * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR - * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF - * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS - * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN - * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) - * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE - * POSSIBILITY OF SUCH DAMAGE. + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE + * DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR + * SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER + * CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, + * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. */ package com.salesforce.hw.dataprep diff --git a/helloworld/src/main/scala/com/salesforce/hw/dataprep/JoinsAndAggregates.scala b/helloworld/src/main/scala/com/salesforce/hw/dataprep/JoinsAndAggregates.scala index c3ab534624..4ba251c827 100644 --- a/helloworld/src/main/scala/com/salesforce/hw/dataprep/JoinsAndAggregates.scala +++ b/helloworld/src/main/scala/com/salesforce/hw/dataprep/JoinsAndAggregates.scala @@ -5,28 +5,27 @@ * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions are met: * - * 1. Redistributions of source code must retain the above copyright notice, - * this list of conditions and the following disclaimer. 
+ * * Redistributions of source code must retain the above copyright notice, this + * list of conditions and the following disclaimer. * - * 2. Redistributions in binary form must reproduce the above copyright notice, - * this list of conditions and the following disclaimer in the documentation - * and/or other materials provided with the distribution. + * * Redistributions in binary form must reproduce the above copyright notice, + * this list of conditions and the following disclaimer in the documentation + * and/or other materials provided with the distribution. * - * 3. Neither the name of Salesforce.com nor the names of its contributors may - * be used to endorse or promote products derived from this software without - * specific prior written permission. + * * Neither the name of the copyright holder nor the names of its + * contributors may be used to endorse or promote products derived from + * this software without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE - * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE - * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE - * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR - * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF - * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS - * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN - * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) - * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE - * POSSIBILITY OF SUCH DAMAGE. + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE + * DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR + * SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER + * CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, + * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. */ package com.salesforce.hw.dataprep diff --git a/helloworld/src/main/scala/com/salesforce/hw/iris/IrisFeatures.scala b/helloworld/src/main/scala/com/salesforce/hw/iris/IrisFeatures.scala index e62d16cb56..16c60676ce 100644 --- a/helloworld/src/main/scala/com/salesforce/hw/iris/IrisFeatures.scala +++ b/helloworld/src/main/scala/com/salesforce/hw/iris/IrisFeatures.scala @@ -5,28 +5,27 @@ * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions are met: * - * 1. Redistributions of source code must retain the above copyright notice, - * this list of conditions and the following disclaimer. + * * Redistributions of source code must retain the above copyright notice, this + * list of conditions and the following disclaimer. * - * 2. Redistributions in binary form must reproduce the above copyright notice, - * this list of conditions and the following disclaimer in the documentation - * and/or other materials provided with the distribution. + * * Redistributions in binary form must reproduce the above copyright notice, + * this list of conditions and the following disclaimer in the documentation + * and/or other materials provided with the distribution. * - * 3. Neither the name of Salesforce.com nor the names of its contributors may - * be used to endorse or promote products derived from this software without - * specific prior written permission. 
+ * * Neither the name of the copyright holder nor the names of its + * contributors may be used to endorse or promote products derived from + * this software without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE - * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE - * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE - * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR - * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF - * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS - * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN - * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) - * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE - * POSSIBILITY OF SUCH DAMAGE. + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE + * DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR + * SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER + * CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, + * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
*/ package com.salesforce.hw.iris diff --git a/helloworld/src/main/scala/com/salesforce/hw/iris/IrisKryoRegistrator.scala b/helloworld/src/main/scala/com/salesforce/hw/iris/IrisKryoRegistrator.scala index eaad041400..9d87d83b1c 100644 --- a/helloworld/src/main/scala/com/salesforce/hw/iris/IrisKryoRegistrator.scala +++ b/helloworld/src/main/scala/com/salesforce/hw/iris/IrisKryoRegistrator.scala @@ -5,28 +5,27 @@ * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions are met: * - * 1. Redistributions of source code must retain the above copyright notice, - * this list of conditions and the following disclaimer. + * * Redistributions of source code must retain the above copyright notice, this + * list of conditions and the following disclaimer. * - * 2. Redistributions in binary form must reproduce the above copyright notice, - * this list of conditions and the following disclaimer in the documentation - * and/or other materials provided with the distribution. + * * Redistributions in binary form must reproduce the above copyright notice, + * this list of conditions and the following disclaimer in the documentation + * and/or other materials provided with the distribution. * - * 3. Neither the name of Salesforce.com nor the names of its contributors may - * be used to endorse or promote products derived from this software without - * specific prior written permission. + * * Neither the name of the copyright holder nor the names of its + * contributors may be used to endorse or promote products derived from + * this software without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE - * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE - * ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE - * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR - * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF - * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS - * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN - * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) - * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE - * POSSIBILITY OF SUCH DAMAGE. + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE + * DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR + * SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER + * CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, + * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. */ package com.salesforce.hw.iris diff --git a/helloworld/src/main/scala/com/salesforce/hw/iris/OpIris.scala b/helloworld/src/main/scala/com/salesforce/hw/iris/OpIris.scala index 837dbed9cc..cd450246a6 100644 --- a/helloworld/src/main/scala/com/salesforce/hw/iris/OpIris.scala +++ b/helloworld/src/main/scala/com/salesforce/hw/iris/OpIris.scala @@ -5,28 +5,27 @@ * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions are met: * - * 1. Redistributions of source code must retain the above copyright notice, - * this list of conditions and the following disclaimer. + * * Redistributions of source code must retain the above copyright notice, this + * list of conditions and the following disclaimer. * - * 2. 
Redistributions in binary form must reproduce the above copyright notice, - * this list of conditions and the following disclaimer in the documentation - * and/or other materials provided with the distribution. + * * Redistributions in binary form must reproduce the above copyright notice, + * this list of conditions and the following disclaimer in the documentation + * and/or other materials provided with the distribution. * - * 3. Neither the name of Salesforce.com nor the names of its contributors may - * be used to endorse or promote products derived from this software without - * specific prior written permission. + * * Neither the name of the copyright holder nor the names of its + * contributors may be used to endorse or promote products derived from + * this software without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE - * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE - * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE - * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR - * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF - * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS - * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN - * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) - * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE - * POSSIBILITY OF SUCH DAMAGE. + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE + * DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR + * SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER + * CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, + * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. */ package com.salesforce.hw.iris @@ -73,18 +72,15 @@ object OpIris extends OpAppWithRunner with IrisFeatures { val features = Seq(sepalLength, sepalWidth, petalLength, petalWidth).transmogrify() - val (pred, raw, prob) = MultiClassificationModelSelector - .withCrossValidation(splitter = Some(DataCutter(reserveTestFraction = 0.2, seed = randomSeed)), seed = randomSeed) - .setDecisionTreeSeed(randomSeed) + val cutter = DataCutter(reserveTestFraction = 0.2, seed = randomSeed) + + val prediction = MultiClassificationModelSelector + .withCrossValidation(splitter = Option(cutter), seed = randomSeed) .setInput(labels, features).getOutput() - val evaluator = Evaluators.MultiClassification.f1() - .setLabelCol(labels) - .setPredictionCol(pred) - .setRawPredictionCol(raw) - .setProbabilityCol(prob) + val evaluator = Evaluators.MultiClassification.f1().setLabelCol(labels).setPredictionCol(prediction) - val workflow = new OpWorkflow().setResultFeatures(pred, raw, prob, labels) + val workflow = new OpWorkflow().setResultFeatures(prediction, labels) def runner(opParams: OpParams): OpWorkflowRunner = new OpWorkflowRunner( diff --git a/helloworld/src/main/scala/com/salesforce/hw/titanic/OpTitanic.scala b/helloworld/src/main/scala/com/salesforce/hw/titanic/OpTitanic.scala index 201499593a..abc2270e24 100644 --- a/helloworld/src/main/scala/com/salesforce/hw/titanic/OpTitanic.scala +++ b/helloworld/src/main/scala/com/salesforce/hw/titanic/OpTitanic.scala 
@@ -5,41 +5,38 @@ * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions are met: * - * 1. Redistributions of source code must retain the above copyright notice, - * this list of conditions and the following disclaimer. + * * Redistributions of source code must retain the above copyright notice, this + * list of conditions and the following disclaimer. * - * 2. Redistributions in binary form must reproduce the above copyright notice, - * this list of conditions and the following disclaimer in the documentation - * and/or other materials provided with the distribution. + * * Redistributions in binary form must reproduce the above copyright notice, + * this list of conditions and the following disclaimer in the documentation + * and/or other materials provided with the distribution. * - * 3. Neither the name of Salesforce.com nor the names of its contributors may - * be used to endorse or promote products derived from this software without - * specific prior written permission. + * * Neither the name of the copyright holder nor the names of its + * contributors may be used to endorse or promote products derived from + * this software without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE - * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE - * ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE - * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR - * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF - * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS - * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN - * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) - * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE - * POSSIBILITY OF SUCH DAMAGE. + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE + * DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR + * SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER + * CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, + * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
*/ package com.salesforce.hw.titanic import com.salesforce.op._ -import com.salesforce.op.features._ -import com.salesforce.op.features.types._ import com.salesforce.op.evaluators.Evaluators import com.salesforce.op.readers.DataReaders -import com.salesforce.op.stages.impl.classification.ClassificationModelsToTry._ import com.salesforce.op.stages.impl.classification._ import com.salesforce.op.stages.impl.tuning.DataSplitter import com.salesforce.op.utils.kryo.OpKryoRegistrator +import org.apache.spark.ml.tuning.ParamGridBuilder /** * TransmogrifAI example classification app using the Titanic dataset @@ -50,7 +47,7 @@ object OpTitanic extends OpAppWithRunner with TitanicFeatures { // READER DEFINITION ///////////////////////////////////////////////////////////////////////////////// - val randomSeed = 112233 + val randomSeed = 112233L val simpleReader = DataReaders.Simple.csv[Passenger]( schema = Passenger.getClassSchema.toString, key = _.getPassengerId.toString ) @@ -68,25 +65,28 @@ object OpTitanic extends OpAppWithRunner with TitanicFeatures { ) // Automated model selection + val lr = new OpLogisticRegression() + val rf = new OpRandomForestClassifier() + val models = Seq( + lr -> new ParamGridBuilder() + .addGrid(lr.regParam, Array(0.05, 0.1)) + .addGrid(lr.elasticNetParam, Array(0.01)) + .build(), + rf -> new ParamGridBuilder() + .addGrid(rf.maxDepth, Array(5, 10)) + .addGrid(rf.minInstancesPerNode, Array(10, 20, 30)) + .addGrid(rf.seed, Array(randomSeed)) + .build() + ) val splitter = DataSplitter(seed = randomSeed, reserveTestFraction = 0.1) - val (pred, raw, prob) = BinaryClassificationModelSelector - .withCrossValidation(splitter = Option(splitter), seed = randomSeed) - .setLogisticRegressionRegParam(0.05, 0.1) - .setLogisticRegressionElasticNetParam(0.01) - .setRandomForestMaxDepth(5, 10) - .setRandomForestMinInstancesPerNode(10, 20, 30) - .setRandomForestSeed(randomSeed) - .setModelsToTry(LogisticRegression, RandomForest) + val prediction = 
BinaryClassificationModelSelector + .withCrossValidation(splitter = Option(splitter), seed = randomSeed, modelsAndParameters = models) .setInput(survived, checkedFeatures) .getOutput() - val workflow = new OpWorkflow().setResultFeatures(pred, raw) + val workflow = new OpWorkflow().setResultFeatures(prediction) - val evaluator = Evaluators.BinaryClassification.auPR() - .setLabelCol(survived) - .setPredictionCol(pred) - .setRawPredictionCol(raw) - .setProbabilityCol(prob) + val evaluator = Evaluators.BinaryClassification.auPR().setLabelCol(survived).setPredictionCol(prediction) //////////////////////////////////////////////////////////////////////////////// // APPLICATION RUNNER DEFINITION diff --git a/helloworld/src/main/scala/com/salesforce/hw/titanic/OpTitanicMini.scala b/helloworld/src/main/scala/com/salesforce/hw/titanic/OpTitanicMini.scala index 73f850f347..8b21c9d5a7 100644 --- a/helloworld/src/main/scala/com/salesforce/hw/titanic/OpTitanicMini.scala +++ b/helloworld/src/main/scala/com/salesforce/hw/titanic/OpTitanicMini.scala @@ -5,28 +5,27 @@ * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions are met: * - * 1. Redistributions of source code must retain the above copyright notice, - * this list of conditions and the following disclaimer. + * * Redistributions of source code must retain the above copyright notice, this + * list of conditions and the following disclaimer. * - * 2. Redistributions in binary form must reproduce the above copyright notice, - * this list of conditions and the following disclaimer in the documentation - * and/or other materials provided with the distribution. + * * Redistributions in binary form must reproduce the above copyright notice, + * this list of conditions and the following disclaimer in the documentation + * and/or other materials provided with the distribution. * - * 3. 
Neither the name of Salesforce.com nor the names of its contributors may - * be used to endorse or promote products derived from this software without - * specific prior written permission. + * * Neither the name of the copyright holder nor the names of its + * contributors may be used to endorse or promote products derived from + * this software without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE - * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE - * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE - * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR - * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF - * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS - * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN - * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) - * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE - * POSSIBILITY OF SUCH DAMAGE. + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE + * DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR + * SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER + * CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, + * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
  */

 package com.salesforce.hw.titanic
@@ -35,6 +34,7 @@ import com.salesforce.op._
 import com.salesforce.op.features.FeatureBuilder
 import com.salesforce.op.features.types._
 import com.salesforce.op.readers.DataReaders
+import com.salesforce.op.stages.impl.classification.BinaryClassificationModelsToTry.{OpLogisticRegression, OpRandomForestClassifier}
 import com.salesforce.op.stages.impl.classification._
 import org.apache.spark.SparkConf
 import org.apache.spark.sql.SparkSession
@@ -78,8 +78,10 @@ object OpTitanicMini {
     val checkedFeatures = survived.sanityCheck(featureVector, checkSample = 1.0, removeBadFeatures = true)

     // Automated model selection
-    val (pred, raw, prob) = BinaryClassificationModelSelector().setInput(survived, checkedFeatures).getOutput()
-    val model = new OpWorkflow().setInputDataset(passengersData).setResultFeatures(pred).train()
+    val prediction = BinaryClassificationModelSelector
+      .withCrossValidation(modelTypesToUse = Seq(OpLogisticRegression, OpRandomForestClassifier))
+      .setInput(survived, checkedFeatures).getOutput()
+    val model = new OpWorkflow().setInputDataset(passengersData).setResultFeatures(prediction).train()

     println("Model summary:\n" + model.summaryPretty())
   }
diff --git a/helloworld/src/main/scala/com/salesforce/hw/titanic/TitanicFeatures.scala b/helloworld/src/main/scala/com/salesforce/hw/titanic/TitanicFeatures.scala
index b1b4025d23..199cedfa96 100644
--- a/helloworld/src/main/scala/com/salesforce/hw/titanic/TitanicFeatures.scala
+++ b/helloworld/src/main/scala/com/salesforce/hw/titanic/TitanicFeatures.scala
@@ -5,28 +5,27 @@
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions are met:
  *
- * 1. Redistributions of source code must retain the above copyright notice,
- *    this list of conditions and the following disclaimer.
+ * * Redistributions of source code must retain the above copyright notice, this
+ *   list of conditions and the following disclaimer.
  *
- * 2. Redistributions in binary form must reproduce the above copyright notice,
- *    this list of conditions and the following disclaimer in the documentation
- *    and/or other materials provided with the distribution.
+ * * Redistributions in binary form must reproduce the above copyright notice,
+ *   this list of conditions and the following disclaimer in the documentation
+ *   and/or other materials provided with the distribution.
  *
- * 3. Neither the name of Salesforce.com nor the names of its contributors may
- *    be used to endorse or promote products derived from this software without
- *    specific prior written permission.
+ * * Neither the name of the copyright holder nor the names of its
+ *   contributors may be used to endorse or promote products derived from
+ *   this software without specific prior written permission.
  *
  * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
  * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
- * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
- * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
- * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
- * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
- * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
- * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
- * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
- * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
- * POSSIBILITY OF SUCH DAMAGE.
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+ * DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
+ * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
+ * SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+ * CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
+ * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  */

 package com.salesforce.hw.titanic
diff --git a/helloworld/src/main/scala/com/salesforce/hw/titanic/TitanicKryoRegistrator.scala b/helloworld/src/main/scala/com/salesforce/hw/titanic/TitanicKryoRegistrator.scala
index fd295124d9..201f8c321a 100644
--- a/helloworld/src/main/scala/com/salesforce/hw/titanic/TitanicKryoRegistrator.scala
+++ b/helloworld/src/main/scala/com/salesforce/hw/titanic/TitanicKryoRegistrator.scala
@@ -5,28 +5,27 @@
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions are met:
  *
- * 1. Redistributions of source code must retain the above copyright notice,
- *    this list of conditions and the following disclaimer.
+ * * Redistributions of source code must retain the above copyright notice, this
+ *   list of conditions and the following disclaimer.
  *
- * 2. Redistributions in binary form must reproduce the above copyright notice,
- *    this list of conditions and the following disclaimer in the documentation
- *    and/or other materials provided with the distribution.
+ * * Redistributions in binary form must reproduce the above copyright notice,
+ *   this list of conditions and the following disclaimer in the documentation
+ *   and/or other materials provided with the distribution.
  *
- * 3. Neither the name of Salesforce.com nor the names of its contributors may
- *    be used to endorse or promote products derived from this software without
- *    specific prior written permission.
+ * * Neither the name of the copyright holder nor the names of its
+ *   contributors may be used to endorse or promote products derived from
+ *   this software without specific prior written permission.
  *
  * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
  * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
- * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
- * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
- * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
- * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
- * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
- * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
- * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
- * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
- * POSSIBILITY OF SUCH DAMAGE.
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+ * DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
+ * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
+ * SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+ * CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
+ * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  */

 package com.salesforce.hw.titanic
diff --git a/templates/simple/build.gradle.template b/templates/simple/build.gradle.template
index dac42d1504..30dab79e22 100644
--- a/templates/simple/build.gradle.template
+++ b/templates/simple/build.gradle.template
@@ -11,8 +11,8 @@ buildscript {
 }

 repositories {
+    jcenter()
     mavenCentral()
-    maven { url 'https://dl.bintray.com/salesforce/maven' }
 }

 apply plugin: 'application'