Complete the following steps so that the scripts run correctly:

Note: This repository is archived and merged into listenbrainz-server. Please open all pull requests in the listenbrainz-server codebase.

Set the PYSPARK_PYTHON environment variable to the python3 interpreter:

export PYSPARK_PYTHON=$(which python3)

Install required modules:

pip3 install -r requirements.txt

Install Java and Scala:

apt-get install default-jdk scala

Install Spark (download the 2.3.0 tgz built for Hadoop and extract it to /usr/local/spark).
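Before submitting a job it can help to confirm that the setup steps above took effect. The following is a minimal sketch, not part of this repository: the function name `missing_prerequisites` is hypothetical, and it assumes that `java`, `scala` and `spark-submit` are expected to be on PATH (for Spark, that usually means adding the bin directory of the extracted archive to PATH).

```python
import os
import shutil


def missing_prerequisites(env=None, which=shutil.which):
    """Return a list of human-readable problems; an empty list means all checks passed.

    Hypothetical helper for illustration; `env` and `which` can be replaced in tests.
    """
    env = os.environ if env is None else env
    problems = []

    # PYSPARK_PYTHON should point at the interpreter that PySpark will use.
    if not env.get("PYSPARK_PYTHON"):
        problems.append("PYSPARK_PYTHON is not set")

    # Assumption: these executables are expected to be discoverable on PATH.
    for tool in ("java", "scala", "spark-submit"):
        if which(tool) is None:
            problems.append(f"{tool} not found on PATH")

    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites():
        print(problem)
```

Running the file directly prints one line per missing prerequisite and prints nothing when every check passes.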

To run the scripts:

spark-submit --master spark://195.201.112.36:7077 --executor-memory=29g $(pwd)/<script>

For example, to run train_models.py with the arguments df and models:

spark-submit --master spark://195.201.112.36:7077 --executor-memory=29g $(pwd)/train_models.py df models
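When several scripts are submitted with the same cluster settings, the command line above can be assembled programmatically. This is a hedged sketch rather than anything the repository provides: `build_submit_command` is a hypothetical helper, and the master URL and executor memory are simply the values copied from the commands above.

```python
import os
import shlex

# Values copied from the commands above; override them for another cluster.
SPARK_MASTER = "spark://195.201.112.36:7077"
EXECUTOR_MEMORY = "29g"


def build_submit_command(script, *script_args, master=SPARK_MASTER,
                         executor_memory=EXECUTOR_MEMORY, base_dir=None):
    """Return the spark-submit argument list for `script` located in `base_dir`.

    Hypothetical helper for illustration; `base_dir` defaults to the current
    working directory, mirroring the use of pwd in the shell commands.
    """
    base_dir = os.getcwd() if base_dir is None else base_dir
    script_path = os.path.join(base_dir, script)
    return [
        "spark-submit",
        "--master", master,
        f"--executor-memory={executor_memory}",
        script_path,
        *script_args,
    ]


if __name__ == "__main__":
    cmd = build_submit_command("train_models.py", "df", "models")
    # Print a copy-pasteable command line; pass cmd to subprocess.run to execute it.
    print(shlex.join(cmd))
```

Returning a list instead of a single string keeps each argument separate, so the result can be handed to `subprocess.run` without shell quoting issues.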
