EOX: Headless execution for UC3 #70
Comments
Headless access for UC3 has now been set up and is ready for use.
The instructions are the same as for UC2, see: https://github.com/FAIRiCUBE/flux-config/issues/1#issuecomment-1689488002 |
Thanks @eox-cs1! Thanks in advance. |
Yes, "conda-env-eurodatacube8-torch-py" is the only kernel for your headless usage |
@eox-cs1 it is "codecarbon". |
Before closing this issue, I came across a minor problem. |
Hey @BachirNILU
|
I have tested it and it works. |
Hi @eox-cs1, I am reopening this issue to request headless execution for the remaining UCs (UC1, UC4 and UC5), with priority given to UC4, where we need GPUs. Thanks in advance, Best regards, -Bachir. |
@BachirNILU should they all execute only on GPU, then? |
@eox-cs1 it is only UC4 for now. |
The conda kernel from eurodatacube8 (called torch) was replicated to eurodatacube17 and eurodatacube18. This was necessary to avoid conflicts between running jobs. These are now the new access URLs for the UCs: The credentials are available at: https://nilu365.sharepoint.com/sites/Horizon2021_CUBE/_layouts/15/Doc.aspx?sourcedoc={235313bb-424e-4a1e-b1d6-92296d28fbfc}&action=edit&wd=target%28technical%20library.one%7C18ca003a-ff29-4de7-925e-1f11804605c2%2FEOxHub%20headless%20execution%7C6d55dc09-f667-4dac-99d1-6d67687afc59%2F%29&wdorigin=703 |
Thanks @eox-cs1 Please let me know if I am doing something wrong. |
I know that you followed the example provided above - try it with the request below: Reformatted Request:
see also https://github.com/FAIRiCUBE/flux-config/issues/1#issuecomment-1689488002 |
Thanks @eox-cs1, yes, now I have a similar error, thanks. Best, -Bachir. |
pygeoapi-job-ee64cd3e-847b-11ef-9e15-6e556aa22337 in eurodatacube18 (UC4) indicates "succeeded", please check |
Thanks!
curl -X POST -v https://headless-fairicubeuc3.hub.eox.at/processes/execute-notebook/jobs \
-u user:psw \
--header 'Content-Type: application/json' \
--data-raw '{"inputs": {
"notebook": "s3/Slicing using Headless Execution/Slicing_Headless.ipynb",
"cpu_requests": "1",
"cpu_limit": "1",
"mem_requests": "4G",
"mem_limit": "4G",
"node_purpose": "userg1",
"kernel": "conda-env-eurodatacube8-torch-py"
}}'
2- The following under UC4 has an error:
curl -X POST -v https://headless-fairicubeuc4.hub.eox.at/processes/execute-notebook/jobs \
-u user:psw \
--header 'Content-Type: application/json' \
--data-raw '{"inputs": {
"notebook": "s3/scripts/Roof_height_ML.ipynb",
"cpu_requests": "2",
"cpu_limit": "2",
"mem_requests": "8G",
"mem_limit": "8G",
"node_purpose": "userg1",
"kernel": "conda-env-eurodatacube18-torch-py"
}}'
3- The following under UC4 has an error:
curl -X POST -v https://headless-fairicubeuc4.hub.eox.at/processes/execute-notebook/jobs \
-u user:psw \
--header 'Content-Type: application/json' \
--data-raw '{"inputs": [
{"id": "notebook", "value": "s3/scripts/Roof_height_ML.ipynb"},
{"id": "cpu_requests", "value": "1"},
{"id": "cpu_limit", "value": "1"},
{"id": "mem_requests", "value": "4G},
{"id": "mem_limit", "value": "4G"},
{"id": "node_purpose", "value": "userg1"},
{"id": "kernel", "value": "conda-env-eurodatacube18-torch-py"}
}}' |
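For reference, a minimal Python sketch of the object-form request body used in the first two calls above. It assumes the endpoint accepts the same JSON body as the curl calls; `build_inputs` and `build_request` are hypothetical helper names, and `user`/`psw` are placeholders as in the calls above. Generating the body with `json.dumps` rules out the quoting and bracket mistakes that are easy to make in a hand-written `--data-raw` string.

```python
import base64
import json
import urllib.request


def build_inputs(notebook, kernel, cpu="1", mem="4G", node_purpose="userg1"):
    """Return the request body in the object form ("inputs": {...})."""
    return {
        "inputs": {
            "notebook": notebook,
            "cpu_requests": cpu,
            "cpu_limit": cpu,
            "mem_requests": mem,
            "mem_limit": mem,
            "node_purpose": node_purpose,
            "kernel": kernel,
        }
    }


def build_request(endpoint, user, password, body):
    """Prepare (but do not send) a POST to the execute-notebook process."""
    # Basic auth header, equivalent to curl's "-u user:psw".
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        f"{endpoint}/processes/execute-notebook/jobs",
        data=json.dumps(body).encode("utf-8"),  # always well-formed JSON
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )


body = build_inputs(
    "s3/scripts/Roof_height_ML.ipynb",
    "conda-env-eurodatacube18-torch-py",
    cpu="2",
    mem="8G",
)
req = build_request("https://headless-fairicubeuc4.hub.eox.at", "user", "psw", body)
# urllib.request.urlopen(req) would submit the job; it is omitted here so the
# sketch runs without network access or real credentials.
```

Whether a given kernel name is valid for a given endpoint is not checked here; that still depends on the hub configuration.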
The following updates have been made to the FAIRiCUBE Hub:
These are the new HEADLESS KERNELS for the UCs (now providing torch, openjdk and a new cdsapi) and the NAMESPACES: eurodatacube8 -> headless-fairicubeuc2.hub.eox.at using bucket s3://hub-fairicubeuc2
The corresponding JUPYTERLAB KERNELS (now providing torch, openjdk and a new cdsapi) are: fairicubeuc1/torch_openjdk
All 5 endpoints have a unique basic-auth configured --> TEAMS (https://nilu365.sharepoint.com/sites/Horizon2021_CUBE/_layouts/15/Doc.aspx?sourcedoc={235313bb-424e-4a1e-b1d6-92296d28fbfc}&action=edit&wd=target%28technical%20library.one%7C18ca003a-ff29-4de7-925e-1f11804605c2%2FEOxHub%20headless%20execution%7C6d55dc09-f667-4dac-99d1-6d67687afc59%2F%29&wdorigin=703)
All the headless endpoints can be started either with:
In addition, the smallest multi-GPU VM available on eu-central-1, g4dn.12xlarge (4 x NVIDIA T4 16 GiB, $4.89 per hour), has been configured as "userg2". The following calls for headless execution are tested and work.
|
Thank you for the update @eox-cs1
|
@BachirNILU pod was restarted |
@eox-cs1 I am not sure why, but I am getting <title>500 Internal Server Error</title> now! |
Sorry, @BachirNILU, there is a problem with mounting the s3 bucket: it sometimes fails and the pod then needs a restart, which is currently done manually because we have not yet figured out how to detect the failure. |
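Until the mount failure can be detected on the server side, a client can at least tell a 500 apart from other errors and try again later. The following is only a sketch under that assumption: `submit_with_retry` is a hypothetical helper, the attempt count and wait time are arbitrary, and retrying only helps once the pod has actually been restarted.

```python
import time
import urllib.error
import urllib.request


def submit_with_retry(req, attempts=3, wait_seconds=60,
                      send=urllib.request.urlopen, sleep=time.sleep):
    """Send req; on HTTP 500 wait and retry, re-raising after the last attempt.

    Other HTTP errors (for example 401 for wrong credentials) are raised
    immediately, since retrying would not change the outcome.
    """
    for attempt in range(1, attempts + 1):
        try:
            return send(req)
        except urllib.error.HTTPError as err:
            if err.code != 500 or attempt == attempts:
                raise
            sleep(wait_seconds)
```

`req` is any prepared `urllib.request.Request` for the `/processes/execute-notebook/jobs` endpoint; `send` and `sleep` are parameters only so the function can be exercised without network access.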
Hi,
In UC3, we want to test the headless execution.
@Schpidi provided a comprehensive step-by-step guide on running a notebook headlessly in the following issue: How to?: Headless execution.
I understand there are two options for UC3:
Can you help us with this?
Thanks in advance.
Best regards,
-Bachir.