
James/airflow #5 (Draft)

wants to merge 4 commits into main
Conversation

@jameshod5 (Collaborator)

These are the steps I've had to take to get Airflow running; there are probably cleaner ways to do this, but this worked for me. I can't get it going within the ingestion repo yet, but will look at that in the future:

Create a directory outside the fair-mast-ingestion repo; call it airflow-dir for now. Follow roughly the same instructions as for the ingestion repo to get the environment set up correctly:

```shell
module load python-3.9.6-gcc-5.4.0-sbr552h
python -m venv airflow-venv
source airflow-venv/bin/activate
python -m pip install -U pip
python -m pip install -e ../fair-mast-ingestion/
git clone git@git.ccfe.ac.uk:MAST-U/mastcodes.git
cd mastcodes
```

Edit uda/python/setup.py and change the "version" to 1.3.9.
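That edit can also be scripted. A hedged sketch: the setup.py contents below are illustrative stand-ins, not the real uda file, and the `sed` pattern assumes the version is written as `version="x.y.z"`:

```shell
# Illustrative only: create a fake minimal setup.py, then bump its
# "version" field to 1.3.9 the way the manual edit above does.
mkdir -p demo-uda
printf 'from setuptools import setup\nsetup(name="uda", version="1.2.0")\n' > demo-uda/setup.py

# GNU sed in-place edit (Linux; macOS sed would need `sed -i ''`).
sed -i 's/version="[0-9.]*"/version="1.3.9"/' demo-uda/setup.py

grep 'version=' demo-uda/setup.py
```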

```shell
python -m pip install uda/python
cd ..
source ~/rds/rds-ukaea-mast-sPGbyCAPsJI/uda-ssl.sh
```

Now we install Airflow, and get the config set up to point to the correct areas.

```shell
pip install "apache-airflow[celery]==2.10.0" --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.10.0/constraints-3.8.txt"
```

Run Airflow for the first time; this will create an airflow directory inside airflow-dir, along with the config file.

```shell
NO_PROXY="*" AIRFLOW_HOME="$(pwd)/airflow" airflow standalone
```

You can now shut down the process. Edit airflow/airflow.cfg and change the following:

```ini
dags_folder = "PATH"/fair-mast-ingestion/src/dags/
load_examples = False
```
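For context, both of those keys live in the `[core]` section of airflow.cfg, so the edited section should look roughly like this (`PATH` stands for your own checkout location, as above):

```ini
[core]
dags_folder = "PATH"/fair-mast-ingestion/src/dags/
load_examples = False
```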

Reset the database and re-run:

```shell
NO_PROXY="*" AIRFLOW_HOME="$(pwd)/airflow" airflow db reset
NO_PROXY="*" AIRFLOW_HOME="$(pwd)/airflow" airflow standalone
```

Make note of the username and password in the terminal, and head to http://localhost:8080/ to log in.

The metadata-processing DAG should appear in your DAGs. Before triggering the workflow, please change the paths within src/dags/ingestion_dag.py to your own. This is something I still need to fix.
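One way to avoid the hard-coded paths would be to resolve them from an environment variable. A sketch only — the variable and directory names here are hypothetical, not what ingestion_dag.py currently uses:

```python
# Hypothetical sketch for src/dags/ingestion_dag.py: derive paths from an
# environment variable instead of hard-coding one person's checkout.
import os
from pathlib import Path

# INGESTION_ROOT is an assumed variable name; the default is illustrative.
INGESTION_ROOT = Path(
    os.environ.get("INGESTION_ROOT", "~/fair-mast-ingestion")
).expanduser()
DAGS_DIR = INGESTION_ROOT / "src" / "dags"
print(DAGS_DIR)
```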

@NathanCummings (Member)

The first point I would make is that we won't want Airflow to become a dependency of the ingestion workflow; they should run from separate Python environments. Airflow has the ability to run tasks in their own virtual environments.
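For reference, a minimal sketch of what that could look like with Airflow's `@task.virtualenv` decorator (the `PythonVirtualenvOperator` under the hood). The task id and requirement spec here are hypothetical, not part of the current DAG:

```python
# Sketch of an isolated task: Airflow builds a throwaway virtualenv with
# just the listed requirements, so the ingestion code never needs to be
# installed into Airflow's own environment.
from airflow.decorators import task

@task.virtualenv(
    task_id="run_ingestion",               # hypothetical task name
    requirements=["fair-mast-ingestion"],  # hypothetical package spec
    system_site_packages=False,
)
def run_ingestion():
    # Imports inside the function resolve in the task's private virtualenv.
    import sys
    return sys.prefix
```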

@NathanCummings (Member)

> Now we install Airflow, and get the config set up to point to the correct areas.
>
> pip install "apache-airflow[celery]==2.10.0" --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.10.0/constraints-3.8.txt"

Could you explain this bit?

@NathanCummings (Member)

Also, what is NO_PROXY="*" about?

@jameshod5 (Collaborator, Author)

> The first point I would make is that we won't want Airflow to become a dependency of the ingestion workflow; they should run from separate Python environments. Airflow has the ability to run tasks in their own virtual environments.

This is how it is set up at the moment anyway, following the instructions above. I think you and Sam are giving me conflicting ideas on where Airflow should run?

> Now we install Airflow, and get the config set up to point to the correct areas.
>
> pip install "apache-airflow[celery]==2.10.0" --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.10.0/constraints-3.8.txt"
>
> Could you explain this bit?

This is just straight from the docs. They explain why there are constraints, and you can change them, e.g. use Python 3.7 instead of 3.8.
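The pattern in the Airflow install docs derives the constraints URL from your Airflow and Python versions rather than hard-coding constraints-3.8.txt (the module loaded earlier provides Python 3.9, so 3.8 is likely the wrong pin here). A sketch that only prints the resulting command rather than running it:

```shell
# Build the constraints URL from the running interpreter's version.
AIRFLOW_VERSION=2.10.0
PYTHON_VERSION="$(python3 -c 'import sys; print(f"{sys.version_info.major}.{sys.version_info.minor}")')"
CONSTRAINT_URL="https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-${PYTHON_VERSION}.txt"

# Print the install command instead of executing it.
echo "pip install \"apache-airflow[celery]==${AIRFLOW_VERSION}\" --constraint \"${CONSTRAINT_URL}\""
```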

> Also, what is NO_PROXY="*" about?

I was meant to remove that; it should work without it (I don't know what it does, but an online tutorial used it when I was struggling to get Airflow going). The AIRFLOW_HOME is useful to keep, though.

@NathanCummings (Member)

> The first point I would make is that we won't want Airflow to become a dependency of the ingestion workflow; they should run from separate Python environments. Airflow has the ability to run tasks in their own virtual environments.
>
> This is how it is set up at the moment anyway, following the instructions above. I think you and Sam are giving me conflicting ideas on where Airflow should run?

Ah ok, sorry about that. We can all discuss next week to make sure we're all on the same page.

> Now we install Airflow, and get the config set up to point to the correct areas.
>
> pip install "apache-airflow[celery]==2.10.0" --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.10.0/constraints-3.8.txt"
>
> Could you explain this bit?
>
> This is just straight from the docs. They explain why there are constraints, and you can change them, e.g. use Python 3.7 instead of 3.8.

Ok, we need to tidy it up then, as in the docs this is just an example and not the actual configuration we want. In fact, also in reference to the point above, installing Airflow with uv tools or pipx may be more appropriate.

> Also, what is NO_PROXY="*" about?
>
> I was meant to remove that; it should work without it (I don't know what it does, but an online tutorial used it when I was struggling to get Airflow going). The AIRFLOW_HOME is useful to keep, though.

👍
