BigDL is a distributed deep learning library for Apache Spark; with BigDL, users can write their deep learning applications as standard Spark programs, which can directly run on top of existing Spark or Hadoop clusters.
-
Step 1: build the analytics-zoo jar
- Clone analytics-zoo locally: git clone https://github.com/intel-analytics/analytics-zoo
- Build job2career-with-dependencies.jar: mvn clean install -DskipTests
-
Step 2: log in to the Databricks web UI with your credentials
-
Step 3: set up the cluster
- Clusters -> create cluster
- Give the cluster a name, for example "intel"; set workers to 1 and uncheck autoscaling.
- Set up the Spark configuration here, for example (the same settings appear as a Scala sketch after this list):
- spark.executor.cores 4
- spark.cores.max 4
- spark.shuffle.reduceLocality.enabled false
- spark.shuffle.blockTransferService nio
- spark.scheduler.minRegisteredResourcesRatio 1.0
- spark.speculation false
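
If you want to sanity-check these settings outside the cluster UI, here is a minimal Scala sketch of the equivalent SparkConf (paste into spark-shell or a notebook); the app name and the local[4] master are placeholders for local testing, not part of this guide:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

// The same settings as listed above, expressed programmatically.
// App name and master are placeholders for a local smoke test;
// on Databricks, the cluster UI supplies them instead.
val conf = new SparkConf()
  .setAppName("job2career-smoke-test") // placeholder
  .setMaster("local[4]")               // placeholder
  .set("spark.executor.cores", "4")
  .set("spark.cores.max", "4")
  .set("spark.shuffle.reduceLocality.enabled", "false")
  .set("spark.shuffle.blockTransferService", "nio")
  .set("spark.scheduler.minRegisteredResourcesRatio", "1.0")
  .set("spark.speculation", "false")

val spark = SparkSession.builder.config(conf).getOrCreate()
```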
-
Step 4: upload the data and the dependency jar
- Data -> Create Table -> upload data and give it a name, for example "NEG50" (a notebook sketch to verify the uploads follows this list)
- /FileStore/tables/Jobs2Career/indexed/indexed/
- /FileStore/tables/Jobs2Career/indexed/NEG50/
- /FileStore/tables/Jobs2Career/lib/job2career_1_0_SNAPSHOT_job-0ca74.jar
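
To confirm the uploads landed where the job expects them, here is a quick check with dbutils in a Databricks Scala notebook (dbutils and display are predefined there; the paths are the ones listed above):

```scala
// Run inside a Databricks Scala notebook, where `dbutils` and `display` exist.
// Lists the uploaded data directory and the dependency jar directory.
display(dbutils.fs.ls("/FileStore/tables/Jobs2Career/indexed/NEG50/"))
display(dbutils.fs.ls("/FileStore/tables/Jobs2Career/lib/"))
```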
-
Step 5: run the job
- Jobs -> Create job -> give a name
- Set Jar: upload the jar, give the main class "com.intel.analytics.bigdl.apps.job2Career.TrainWithD2VGlove", and give the arguments "--inputDir /FileStore/tables/Jobs2Career/indexed/" (the same entry point appears in the smoke-test sketch after this list)
- Add dependency lib dbfs:/FileStore/tables/Jobs2Career/lib/job2career_1_0_SNAPSHOT_job-0ca74.jar
- Edit cluster -> existing cluster, choose the one you created
- Confirm -> Run Now -> see the results in the log
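
As an alternative to the Jobs UI, the same entry point can be exercised locally. This is only a sketch: it assumes Spark and the jar built in Step 1 are on the classpath, and /tmp/Jobs2Career/indexed/ is a hypothetical local copy of the input data:

```scala
import com.intel.analytics.bigdl.apps.job2Career.TrainWithD2VGlove

object RunJob2CareerLocally {
  def main(args: Array[String]): Unit = {
    // Same main class and --inputDir argument the Databricks job uses;
    // the local path is a hypothetical stand-in for the DBFS directory.
    TrainWithD2VGlove.main(Array("--inputDir", "/tmp/Jobs2Career/indexed/"))
  }
}
```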