Releases: cas-bigdatalab/piflow

πFlow V1.9.5 Release

06 Dec 02:40

Features

  1. Add a file-based pipeline scheduling mode;
  2. Comprehensively upgrade system security;
  3. Add a "My Center" module.

Requirements

  • JDK 1.8
  • Scala 2.12.18
  • Spark-3.4.0 (for other Spark versions, rebuild piflow.jar from source)
  • Hadoop-3.3.0 (for other Hadoop versions, rebuild piflow.jar from source)

config.properties

  spark.master=yarn
  spark.deploy.mode=cluster
  
  #hdfs default file system
  fs.defaultFS=hdfs://master:9000
  
  #yarn resourcemanager.hostname
  yarn.resourcemanager.hostname=master
  
  #if you want to use hive, set hive metastore uris
  #hive.metastore.uris=thrift://master:9083
  
  #show data in log, set 0 if you do not want to show data in logs
  data.show=5
  
  #server ip and port, ip can not be set to localhost or 127.0.0.1
  server.ip=your_ip
  server.port=8002
  
  #h2db port, path
  h2.port=50002
  #h2.path=test
  
  monitor.throughput=false
  #if you want to upload a Python Stop (component), set the hdfs configs below
  #example hdfs.cluster=hostname:hostIP
  hdfs.cluster=master:127.0.0.1
  hdfs.web.url=master:9870
  checkpoint.path=/piflow/tmp/checkpoint/
  
  #unstructured.parse
  unstructured.parse=false
  #host cannot be set to localhost or 127.0.0.1
  #if port is not set, it defaults to 8000
  #unstructured.port=8000
  #embed models path
  #embed_models_path=/data/testingStuff/models/
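The comments above impose two checks worth automating before starting the server: server.ip must not be a loopback address (or the `your_ip` placeholder), and data.show must be a number. A minimal pre-flight sketch; for demonstration it writes a small sample config, but in practice you would point CONF at your real config.properties:

```shell
#!/bin/sh
# Pre-flight sanity check for config.properties (sketch).
# The sample values below are illustrative; set CONF to your real file.
CONF="$(mktemp)"
cat > "$CONF" <<'EOF'
server.ip=192.168.1.10
server.port=8002
data.show=5
EOF

# Read the first value of a given property key.
get() { grep "^$1=" "$CONF" | head -n1 | cut -d= -f2-; }

ip="$(get server.ip)"
case "$ip" in
  localhost|127.0.0.1|your_ip|"") echo "BAD server.ip: '$ip'"; exit 1 ;;
  *) echo "server.ip OK: $ip" ;;
esac

show="$(get data.show)"
case "$show" in
  ''|*[!0-9]*) echo "BAD data.show: '$show'"; exit 1 ;;
  *) echo "data.show OK: $show" ;;
esac

rm -f "$CONF"
```

Running this before ./start.sh catches the two misconfigurations the comments warn about without waiting for the server to fail at boot.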

Command

  ./start.sh
  ./stop.sh
  ./restart.sh
  ./status.sh
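start.sh does not necessarily guard against launching a second server on the same ports, so a small wrapper check can help. A sketch, assuming the running server is visible in the process list; "piflow-server" is a hypothetical match pattern, adjust it to whatever your deployment's Java process actually looks like:

```shell
#!/bin/sh
# Sketch: check for an already-running server before calling ./start.sh.
# PATTERN is an assumed process-name fragment, not a PiFlow-defined value.
PATTERN="piflow-server"

if pgrep -f "$PATTERN" >/dev/null 2>&1; then
  status="running"
else
  status="stopped"
fi
echo "piflow server is $status"

# In a real deployment you would then gate the start on the result:
# [ "$status" = "stopped" ] && ./start.sh
```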

πFlow V1.9 Release

08 Oct 06:58

Features

  1. Add new visualization features;
  2. Add a new Python base image management module;
  3. Add vector database storage components such as Chroma, Faiss, Weaviate, Pinecone, and Qdrant.

Requirements

  • JDK 1.8
  • Scala 2.12.18
  • Spark-3.4.0 (for other Spark versions, rebuild piflow.jar from source)
  • Hadoop-3.3.0 (for other Hadoop versions, rebuild piflow.jar from source)

config.properties

  spark.master=yarn
  spark.deploy.mode=cluster
  
  #hdfs default file system
  fs.defaultFS=hdfs://master:9000
  
  #yarn resourcemanager.hostname
  yarn.resourcemanager.hostname=master
  
  #if you want to use hive, set hive metastore uris
  #hive.metastore.uris=thrift://master:9083
  
  #show data in log, set 0 if you do not want to show data in logs
  data.show=5
  
  #server ip and port, ip can not be set to localhost or 127.0.0.1
  server.ip=your_ip
  server.port=8002
  
  #h2db port, path
  h2.port=50002
  #h2.path=test
  
  monitor.throughput=false
  #if you want to upload a Python Stop (component), set the hdfs configs below
  #example hdfs.cluster=hostname:hostIP
  hdfs.cluster=master:127.0.0.1
  hdfs.web.url=master:9870
  checkpoint.path=/piflow/tmp/checkpoint/
  
  #unstructured.parse
  unstructured.parse=false
  #host cannot be set to localhost or 127.0.0.1
  #if port is not set, it defaults to 8000
  #unstructured.port=8000
  #embed models path
  #embed_models_path=/data/testingStuff/models/

Command

  ./start.sh
  ./stop.sh
  ./restart.sh
  ./status.sh

πFlow V1.8 Release

18 Apr 07:06

Features

  1. Add the ability to parse unstructured data;
  2. Support customizing the server H2 database name;
  3. Optimize customized Python components;
  4. Optimize the template function;
  5. Optimize pipelines.

Requirements

  • JDK 1.8
  • Scala 2.12.18
  • Spark-3.4.0 (for other Spark versions, rebuild piflow.jar from source)
  • Hadoop-3.3.0 (for other Hadoop versions, rebuild piflow.jar from source)

config.properties

  spark.master=yarn
  spark.deploy.mode=cluster
  
  #hdfs default file system
  fs.defaultFS=hdfs://master:9000
  
  #yarn resourcemanager.hostname
  yarn.resourcemanager.hostname=master
  
  #if you want to use hive, set hive metastore uris
  #hive.metastore.uris=thrift://master:9083
  
  #show data in log, set 0 if you do not want to show data in logs
  data.show=5
  
  #server ip and port, ip can not be set to localhost or 127.0.0.1
  server.ip=your_ip
  server.port=8002
  
  #h2db port
  h2.port=50002
  #h2.name=test
  
  monitor.throughput=false
  #if you want to upload a Python Stop (component), set the hdfs configs below
  #example hdfs.cluster=hostname:hostIP
  hdfs.cluster=master:127.0.0.1
  hdfs.web.url=master:50070
  checkpoint.path=/piflow/tmp/checkpoint/
  
  #unstructured.parse
  unstructured.parse=false

Command

  ./start.sh
  ./stop.sh
  ./restart.sh
  ./status.sh

πFlow V1.7 Release

28 Dec 06:44

Requirements

  • JDK 1.8
  • Scala 2.12.18
  • Spark-3.4.0 (for other Spark versions, rebuild piflow.jar from source)
  • Hadoop-3.3.0 (for other Hadoop versions, rebuild piflow.jar from source)

config.properties

  spark.master=yarn
  spark.deploy.mode=cluster
  
  #hdfs default file system
  fs.defaultFS=hdfs://10.0.85.83:9000
  
  #yarn resourcemanager hostname
  yarn.resourcemanager.hostname=10.0.85.83
  
  #if you want to use hive, set hive metastore uris
  #hive.metastore.uris=thrift://10.0.85.83:9083
  
  #show data in log, set 0 if you do not want to show data in logs
  data.show=10

  #monitor the throughput of flow
  monitor.throughput=true

  #server port
  server.port=8001

  #h2db port
  h2.port=50001

  #if you want to mount a Python component, set the hdfs params below
  #hdfs.cluster=master:127.0.0.1;slave1:127.0.0.2
  #hdfs.web.url=master:50070
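As the commented example shows, hdfs.cluster takes a semicolon-separated list of hostname:IP pairs when the cluster has several nodes. A small sketch splitting such a value into one pair per line (the value is copied from the comment above):

```shell
#!/bin/sh
# Sketch: split an hdfs.cluster value of the form
# hostname:IP;hostname:IP (as in the commented example above).
cluster="master:127.0.0.1;slave1:127.0.0.2"

# Turn ';' separators into newlines, then split each entry on the first ':'.
echo "$cluster" | tr ';' '\n' | while IFS=: read -r host ip; do
  echo "host=$host ip=$ip"
done
```

This is the format the server expects when resolving datanode hostnames that are not in DNS, one entry per node.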

Command

  ./start.sh
  ./stop.sh
  ./restart.sh
  ./status.sh

πFlow V1.6 Release

28 Sep 07:31

Requirements

  • JDK 1.8
  • Scala 2.12.18
  • Spark-3.4.0 (for other Spark versions, rebuild piflow.jar from source)
  • Hadoop-3.3.0 (for other Hadoop versions, rebuild piflow.jar from source)

config.properties

  spark.master=yarn
  spark.deploy.mode=cluster
  
  #hdfs default file system
  fs.defaultFS=hdfs://10.0.85.83:9000
  
  #yarn resourcemanager hostname
  yarn.resourcemanager.hostname=10.0.85.83
  
  #if you want to use hive, set hive metastore uris
  #hive.metastore.uris=thrift://10.0.85.83:9083
  
  #show data in log, set 0 if you do not want to show data in logs
  data.show=10

  #monitor the throughput of flow
  monitor.throughput=true

  #server port
  server.port=8001

  #h2db port
  h2.port=50001

  #if you want to mount a Python component, set the hdfs params below
  #hdfs.cluster=master:127.0.0.1;slave1:127.0.0.2
  #hdfs.web.url=master:50070

Command

  ./start.sh
  ./stop.sh
  ./restart.sh
  ./status.sh

πFlow V1.5 Release

10 May 02:45

Requirements

  • JDK 1.8
  • Scala 2.11.8
  • Spark-2.1.0, Spark-2.2.0, or Spark-2.3.0 (for other Spark versions, rebuild piflow.jar from source)
  • Hadoop-2.6.0 (for other Hadoop versions, rebuild piflow.jar from source)
  • Hive-1.2.1 (if you need Hive, install it and modify config.properties)

config.properties

  spark.master=yarn
  spark.deploy.mode=cluster
  
  #hdfs default file system
  fs.defaultFS=hdfs://10.0.85.83:9000
  
  #yarn resourcemanager hostname
  yarn.resourcemanager.hostname=10.0.85.83
  
  #if you want to use hive, set hive metastore uris
  #hive.metastore.uris=thrift://10.0.85.83:9083
  
  #show data in log, set 0 if you do not want to show data in logs
  data.show=10

  #monitor the throughput of flow
  monitor.throughput=true

  #server port
  server.port=8001

  #h2db port
  h2.port=50001

  #if you want to mount a Python component, set the hdfs params below
  #hdfs.cluster=master:127.0.0.1;slave1:127.0.0.2
  #hdfs.web.url=master:50070

Command

  ./start.sh
  ./stop.sh
  ./restart.sh
  ./status.sh

PiFlow V1.4 Release

24 Nov 07:33

Requirements

  • JDK 1.8
  • Scala 2.11.8
  • Spark-2.1.0, Spark-2.2.0, or Spark-2.3.0 (for other Spark versions, rebuild piflow.jar from source)
  • Hadoop-2.6.0 (for other Hadoop versions, rebuild piflow.jar from source)
  • Hive-1.2.1 (if you need Hive, install it and modify config.properties)

config.properties

  spark.master=yarn
  spark.deploy.mode=cluster
  
  #hdfs default file system
  fs.defaultFS=hdfs://10.0.85.83:9000
  
  #yarn resourcemanager hostname
  yarn.resourcemanager.hostname=10.0.85.83
  
  #if you want to use hive, set hive metastore uris
  #hive.metastore.uris=thrift://10.0.85.83:9083
  
  #show data in log, set 0 if you do not want to show data in logs
  data.show=10

  #monitor the throughput of flow
  monitor.throughput=true

  #server port
  server.port=8001

  #h2db port
  h2.port=50001

Command

  ./start.sh
  ./stop.sh
  ./restart.sh
  ./status.sh

PiFlow V1.3 Release

27 Jul 10:53
78e249e

Requirements

  • JDK 1.8
  • Scala 2.11.8
  • Spark-2.1.0, Spark-2.2.0, or Spark-2.3.0 (for other Spark versions, rebuild piflow.jar from source)
  • Hadoop-2.6.0 (for other Hadoop versions, rebuild piflow.jar from source)
  • Hive-1.2.1 (if you need Hive, install it and modify config.properties)

config.properties

  spark.master=yarn
  spark.deploy.mode=cluster
  
  #hdfs default file system
  fs.defaultFS=hdfs://10.0.85.83:9000
  
  #yarn resourcemanager hostname
  yarn.resourcemanager.hostname=10.0.85.83
  
  #if you want to use hive, set hive metastore uris
  #hive.metastore.uris=thrift://10.0.85.83:9083
  
  #show data in log, set 0 if you do not want to show data in logs
  data.show=10

  #monitor the throughput of flow
  monitor.throughput=true

  #server port
  server.port=8001

  #h2db port
  h2.port=50001

Command

  ./start.sh
  ./stop.sh
  ./restart.sh
  ./status.sh

PiFlow V1.2 Release

04 Mar 11:14
2d73f41

Requirements

  • JDK 1.8
  • Scala 2.11.8
  • Spark-2.1.0, Spark-2.2.0, or Spark-2.3.0 (for other Spark versions, rebuild piflow.jar from source)
  • Hadoop-2.6.0 (for other Hadoop versions, rebuild piflow.jar from source)
  • Hive-1.2.1 (if you need Hive, install it and modify config.properties)

config.properties

  spark.master=yarn
  spark.deploy.mode=cluster
  
  #hdfs default file system
  fs.defaultFS=hdfs://10.0.85.83:9000
  
  #yarn resourcemanager hostname
  yarn.resourcemanager.hostname=10.0.85.83
  
  #if you want to use hive, set hive metastore uris
  #hive.metastore.uris=thrift://10.0.85.83:9083
  
  #show data in log, set 0 if you do not want to show data in logs
  data.show=10

  #monitor the throughput of flow
  monitor.throughput=true

  #server port
  server.port=8001

  #h2db port
  h2.port=50001

Command

  ./start.sh
  ./stop.sh
  ./restart.sh
  ./status.sh

PiFlow V1.1 Release

06 Sep 11:13
bb7ba68

Requirements

  • JDK 1.8
  • Scala 2.11.8
  • Spark-2.1.0, Spark-2.2.0, or Spark-2.3.0 (for other Spark versions, rebuild piflow.jar from source)
  • Hadoop-2.6.0 (for other Hadoop versions, rebuild piflow.jar from source)
  • Hive-1.2.1 (if you need Hive, install it and modify config.properties)

config.properties

  spark.master=yarn
  spark.deploy.mode=cluster
  
  #hdfs default file system
  fs.defaultFS=hdfs://10.0.85.83:9000
  
  #yarn resourcemanager hostname
  yarn.resourcemanager.hostname=10.0.85.83
  
  #if you want to use hive, set hive metastore uris
  #hive.metastore.uris=thrift://10.0.85.83:9083
  
  #show data in log, set 0 if you do not want to show data in logs
  data.show=10

  #monitor the throughput of flow
  monitor.throughput=true

  #server port
  server.port=8001

  #h2db port
  h2.port=50001

Command

  ./start.sh
  ./stop.sh
  ./restart.sh
  ./status.sh