Releases · cas-bigdatalab/piflow
πFlow V1.9.5 Release
Features
- Add a file-based pipeline scheduling mode;
- Comprehensive system security upgrades;
- Add a "My Center" module.
Requirements
- JDK 1.8
- Scala 2.12.18
- Spark-3.4.0 (to use another Spark version, build piflow.jar from source)
- Hadoop-3.3.0 (to use another Hadoop version, build piflow.jar from source; see the build sketch below)
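The prebuilt piflow.jar targets Spark 3.4.0 and Hadoop 3.3.0, so other versions require building from source. A minimal sketch, assuming the project's standard Maven build; the Spark/Hadoop version properties live in pom.xml, so check there before building:
git clone https://github.com/cas-bigdatalab/piflow.git
cd piflow
# adjust the Spark/Hadoop versions in pom.xml to match your cluster, then build, skipping tests
mvn clean package -DskipTests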
config.properties
spark.master=yarn
spark.deploy.mode=cluster
#hdfs default file system
fs.defaultFS=hdfs://master:9000
#yarn resourcemanager.hostname
yarn.resourcemanager.hostname=master
#if you want to use hive, set hive metastore uris
#hive.metastore.uris=thrift://master:9083
#show data in logs; set to 0 if you do not want to show data in logs
data.show=5
#server ip and port; ip cannot be set to localhost or 127.0.0.1
server.ip=your_ip
server.port=8002
#h2db port, path
h2.port=50002
#h2.path=test
monitor.throughput=false
#if you want to upload a Python stop, please set the hdfs configs
#example hdfs.cluster=hostname:hostIP
hdfs.cluster=master:127.0.0.1
hdfs.web.url=master:9870
checkpoint.path=/piflow/tmp/checkpoint/
#unstructured.parse
unstructured.parse=false
#host cannot be set to localhost or 127.0.0.1
#if the port is not set, the default is 8000
#unstructured.port=8000
#embed models path
#embed_models_path=/data/testingStuff/models/
Command
./start.sh
./stop.sh
./restart.sh
./status.sh
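With the server started, flows are submitted over HTTP on server.ip:server.port. A minimal sketch, assuming a /flow/start endpoint that accepts a JSON flow definition; the stop name, bundle, and properties below are hypothetical placeholders, so check the project documentation for the exact schema of your version:
curl -X POST http://your_ip:8002/flow/start \
  -H "Content-Type: application/json" \
  -d '{
        "flow": {
          "name": "example",
          "uuid": "1234",
          "stops": [
            {
              "uuid": "0000",
              "name": "CsvParser",
              "bundle": "cn.piflow.bundle.csv.CsvParser",
              "properties": {
                "csvPath": "hdfs://master:9000/test/test.csv",
                "header": "true",
                "delimiter": ","
              }
            }
          ],
          "paths": []
        }
      }'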
πFlow V1.9 Release
Features
- Add new visualization features;
- Add a new Python base image management module;
- Add vector database storage components such as Chroma, Faiss, Weaviate, Pinecone, and Qdrant.
Requirements
- JDK 1.8
- Scala 2.12.18
- Spark-3.4.0 (to use another Spark version, build piflow.jar from source)
- Hadoop-3.3.0 (to use another Hadoop version, build piflow.jar from source)
config.properties
spark.master=yarn
spark.deploy.mode=cluster
#hdfs default file system
fs.defaultFS=hdfs://master:9000
#yarn resourcemanager.hostname
yarn.resourcemanager.hostname=master
#if you want to use hive, set hive metastore uris
#hive.metastore.uris=thrift://master:9083
#show data in logs; set to 0 if you do not want to show data in logs
data.show=5
#server ip and port; ip cannot be set to localhost or 127.0.0.1
server.ip=your_ip
server.port=8002
#h2db port, path
h2.port=50002
#h2.path=test
monitor.throughput=false
#if you want to upload a Python stop, please set the hdfs configs
#example hdfs.cluster=hostname:hostIP
hdfs.cluster=master:127.0.0.1
hdfs.web.url=master:9870
checkpoint.path=/piflow/tmp/checkpoint/
#unstructured.parse
unstructured.parse=false
#host cannot be set to localhost or 127.0.0.1
#if the port is not set, the default is 8000
#unstructured.port=8000
#embed models path
#embed_models_path=/data/testingStuff/models/
Command
./start.sh
./stop.sh
./restart.sh
./status.sh
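A running flow can also be monitored and stopped over the same API. A sketch under the same caveat: the /flow/info and /flow/stop endpoints and the appID parameter (the YARN application ID returned when the flow starts) are assumptions to verify against the documentation:
# query the progress of a running flow
curl -X GET "http://your_ip:8002/flow/info?appID=application_1234567890123_0001"
# stop a running flow
curl -X POST http://your_ip:8002/flow/stop \
  -H "Content-Type: application/json" \
  -d '{"appID":"application_1234567890123_0001"}'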
πFlow V1.8 Release
Features
- Add the ability to parse unstructured data;
- Support customized server H2 database names;
- Optimize customized Python components;
- Optimize the template function;
- Optimize pipelines.
Requirements
- JDK 1.8
- Scala 2.12.18
- Spark-3.4.0 (to use another Spark version, build piflow.jar from source)
- Hadoop-3.3.0 (to use another Hadoop version, build piflow.jar from source)
config.properties
spark.master=yarn
spark.deploy.mode=cluster
#hdfs default file system
fs.defaultFS=hdfs://master:9000
#yarn resourcemanager.hostname
yarn.resourcemanager.hostname=master
#if you want to use hive, set hive metastore uris
#hive.metastore.uris=thrift://master:9083
#show data in logs; set to 0 if you do not want to show data in logs
data.show=5
#server ip and port; ip cannot be set to localhost or 127.0.0.1
server.ip=your_ip
server.port=8002
#h2db port
h2.port=50002
#h2.name=test
monitor.throughput=false
#if you want to upload a Python stop, please set the hdfs configs
#example hdfs.cluster=hostname:hostIP
hdfs.cluster=master:127.0.0.1
hdfs.web.url=master:50070
checkpoint.path=/piflow/tmp/checkpoint/
#unstructured.parse
unstructured.parse=false
Command
./start.sh
./stop.sh
./restart.sh
./status.sh
πFlow V1.7 Release
Requirements
- JDK 1.8
- Scala 2.12.18
- Spark-3.4.0 (to use another Spark version, build piflow.jar from source)
- Hadoop-3.3.0 (to use another Hadoop version, build piflow.jar from source)
config.properties
spark.master=yarn
spark.deploy.mode=cluster
#hdfs default file system
fs.defaultFS=hdfs://10.0.85.83:9000
#yarn resourcemanager hostname
yarn.resourcemanager.hostname=10.0.85.83
#if you want to use hive, set hive metastore uris
#hive.metastore.uris=thrift://10.0.85.83:9083
#show data in logs; set to 0 if you do not want to show data in the logs
data.show=10
#monitor the throughput of flow
monitor.throughput=true
#server port
server.port=8001
#h2db port
h2.port=50001
#if you want to mount a python component, please set the hdfs params
#hdfs.cluster=master:127.0.0.1;slave1:127.0.0.2
#hdfs.web.url=master:50070
Command
./start.sh
./stop.sh
./restart.sh
./status.sh
πFlow V1.6 Release
Requirements
- JDK 1.8
- Scala 2.12.18
- Spark-3.4.0 (to use another Spark version, build piflow.jar from source)
- Hadoop-3.3.0 (to use another Hadoop version, build piflow.jar from source)
config.properties
spark.master=yarn
spark.deploy.mode=cluster
#hdfs default file system
fs.defaultFS=hdfs://10.0.85.83:9000
#yarn resourcemanager hostname
yarn.resourcemanager.hostname=10.0.85.83
#if you want to use hive, set hive metastore uris
#hive.metastore.uris=thrift://10.0.85.83:9083
#show data in logs; set to 0 if you do not want to show data in the logs
data.show=10
#monitor the throughput of flow
monitor.throughput=true
#server port
server.port=8001
#h2db port
h2.port=50001
#if you want to mount a python component, please set the hdfs params
#hdfs.cluster=master:127.0.0.1;slave1:127.0.0.2
#hdfs.web.url=master:50070
Command
./start.sh
./stop.sh
./restart.sh
./status.sh
πFlow V1.5 Release
Requirements
- JDK 1.8
- Scala 2.11.8
- Spark-2.1.0, Spark-2.2.0, or Spark-2.3.0 (to use another Spark version, build piflow.jar from source)
- Hadoop-2.6.0 (to use another Hadoop version, build piflow.jar from source)
- Hive-1.2.1 (if you need Hive, set it up and modify config.properties accordingly)
config.properties
spark.master=yarn
spark.deploy.mode=cluster
#hdfs default file system
fs.defaultFS=hdfs://10.0.85.83:9000
#yarn resourcemanager hostname
yarn.resourcemanager.hostname=10.0.85.83
#if you want to use hive, set hive metastore uris
#hive.metastore.uris=thrift://10.0.85.83:9083
#show data in logs; set to 0 if you do not want to show data in the logs
data.show=10
#monitor the throughput of flow
monitor.throughput=true
#server port
server.port=8001
#h2db port
h2.port=50001
#if you want to mount a python component, please set the hdfs params
#hdfs.cluster=master:127.0.0.1;slave1:127.0.0.2
#hdfs.web.url=master:50070
Command
./start.sh
./stop.sh
./restart.sh
./status.sh
PiFlow V1.4 Release
Requirements
- JDK 1.8
- Scala 2.11.8
- Spark-2.1.0, Spark-2.2.0, or Spark-2.3.0 (to use another Spark version, build piflow.jar from source)
- Hadoop-2.6.0 (to use another Hadoop version, build piflow.jar from source)
- Hive-1.2.1 (if you need Hive, set it up and modify config.properties accordingly)
config.properties
spark.master=yarn
spark.deploy.mode=cluster
#hdfs default file system
fs.defaultFS=hdfs://10.0.85.83:9000
#yarn resourcemanager hostname
yarn.resourcemanager.hostname=10.0.85.83
#if you want to use hive, set hive metastore uris
#hive.metastore.uris=thrift://10.0.85.83:9083
#show data in logs; set to 0 if you do not want to show data in the logs
data.show=10
#monitor the throughput of flow
monitor.throughput=true
#server port
server.port=8001
#h2db port
h2.port=50001
Command
./start.sh
./stop.sh
./restart.sh
./status.sh
PiFlow V1.3 Release
Requirements
- JDK 1.8
- Scala 2.11.8
- Spark-2.1.0, Spark-2.2.0, or Spark-2.3.0 (to use another Spark version, build piflow.jar from source)
- Hadoop-2.6.0 (to use another Hadoop version, build piflow.jar from source)
- Hive-1.2.1 (if you need Hive, set it up and modify config.properties accordingly)
config.properties
spark.master=yarn
spark.deploy.mode=cluster
#hdfs default file system
fs.defaultFS=hdfs://10.0.85.83:9000
#yarn resourcemanager hostname
yarn.resourcemanager.hostname=10.0.85.83
#if you want to use hive, set hive metastore uris
#hive.metastore.uris=thrift://10.0.85.83:9083
#show data in logs; set to 0 if you do not want to show data in the logs
data.show=10
#monitor the throughput of flow
monitor.throughput=true
#server port
server.port=8001
#h2db port
h2.port=50001
Command
./start.sh
./stop.sh
./restart.sh
./status.sh
PiFlow V1.2 Release
Requirements
- JDK 1.8
- Scala 2.11.8
- Spark-2.1.0, Spark-2.2.0, or Spark-2.3.0 (to use another Spark version, build piflow.jar from source)
- Hadoop-2.6.0 (to use another Hadoop version, build piflow.jar from source)
- Hive-1.2.1 (if you need Hive, set it up and modify config.properties accordingly)
config.properties
spark.master=yarn
spark.deploy.mode=cluster
#hdfs default file system
fs.defaultFS=hdfs://10.0.85.83:9000
#yarn resourcemanager hostname
yarn.resourcemanager.hostname=10.0.85.83
#if you want to use hive, set hive metastore uris
#hive.metastore.uris=thrift://10.0.85.83:9083
#show data in logs; set to 0 if you do not want to show data in the logs
data.show=10
#monitor the throughput of flow
monitor.throughput=true
#server port
server.port=8001
#h2db port
h2.port=50001
Command
./start.sh
./stop.sh
./restart.sh
./status.sh
PiFlow V1.1 Release
Requirements
- JDK 1.8
- Scala 2.11.8
- Spark-2.1.0, Spark-2.2.0, or Spark-2.3.0 (to use another Spark version, build piflow.jar from source)
- Hadoop-2.6.0 (to use another Hadoop version, build piflow.jar from source)
- Hive-1.2.1 (if you need Hive, set it up and modify config.properties accordingly)
config.properties
spark.master=yarn
spark.deploy.mode=cluster
#hdfs default file system
fs.defaultFS=hdfs://10.0.85.83:9000
#yarn resourcemanager hostname
yarn.resourcemanager.hostname=10.0.85.83
#if you want to use hive, set hive metastore uris
#hive.metastore.uris=thrift://10.0.85.83:9083
#show data in logs; set to 0 if you do not want to show data in the logs
data.show=10
#monitor the throughput of flow
monitor.throughput=true
#server port
server.port=8001
#h2db port
h2.port=50001
Command
./start.sh
./stop.sh
./restart.sh
./status.sh