NetCDFMultiFieldList does not implement to_xarray #494

Open
aperaza-bsc opened this issue Oct 23, 2024 · 2 comments
Labels
bug Something isn't working

Comments

@aperaza-bsc

What happened?

I am trying to run ai-models on a set of models. I want to launch multiple forecasts at different timesteps and compare them with the outputs of ERA5. Due to the restrictions of my HPC, I do not have internet access and therefore needed to download the data in advance. For that purpose, I created a Zarr dataset from NetCDF files previously requested from CDS. From this dataset I extract the initial conditions required for a model and convert them to NetCDF to use as input for ai-models.

I'm encountering several difficulties, mainly in using earthkit-data to parse these NetCDF files. The main one is that the NetCDFMultiFieldList object does not implement to_xarray.
To explain how this object is created: the function reader from earthkit.data.readers.netcdf returns fs because fs.has_fields() returns True. This creates a NetCDFFieldListReader with all the fields read from the NetCDF file. Afterwards, this object mutates into a NetCDFMultiFieldList; ai-models tries to convert it to Xarray and fails because this conversion is not implemented.
If fs.has_fields() returned False, a NetCDFReader object would be created instead. Although this raises other errors, a conversion to Xarray is implemented for it. I have noticed that the mutate_source methods of NetCDFFieldListReader and NetCDFFieldListUrlReader carry comments describing these classes as already being NetCDFReader objects ("# A NetCDFReader is a source itself"). I wonder whether these classes were supposed to inherit from NetCDFReader, or whether there are plans for this at some point.
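A minimal sketch of the dispatch described above, assuming nothing about the real implementation beyond what this report states. Every class and function here (StubFieldSet, StubNetCDFReader, StubNetCDFFieldListReader, StubNetCDFMultiFieldList, stub_reader, mutate, can_convert) is a hypothetical stand-in and not earthkit-data code.

```python
# Hypothetical stand-ins that mirror the behaviour described in this report.
# None of these classes are the real earthkit-data implementations.


class StubFieldSet:
    def __init__(self, field_count):
        self.field_count = field_count

    def has_fields(self):
        return self.field_count > 0


class StubNetCDFReader:
    """Plain reader: per the report, this one can convert to Xarray."""

    def to_xarray(self):
        return "dataset-placeholder"


class StubNetCDFMultiFieldList:
    """What the field-list reader ends up as after mutation."""

    def to_xarray(self):
        raise NotImplementedError("to_xarray() is not available in this stub")


class StubNetCDFFieldListReader:
    """Created when fields are found; it does not inherit from StubNetCDFReader."""

    def mutate(self):
        return StubNetCDFMultiFieldList()


def stub_reader(fs):
    # Fields found -> field-list reader; otherwise the plain reader.
    if fs.has_fields():
        return StubNetCDFFieldListReader()
    return StubNetCDFReader()


def can_convert(source):
    """Return True if to_xarray() succeeds, False if it is not implemented."""
    try:
        source.to_xarray()
        return True
    except NotImplementedError:
        return False


field_path = stub_reader(StubFieldSet(field_count=3)).mutate()
plain_path = stub_reader(StubFieldSet(field_count=0))
print(type(field_path).__name__, can_convert(field_path))
print(type(plain_path).__name__, can_convert(plain_path))
```

The point of the sketch is that the two branches yield unrelated objects, so the Xarray conversion available on the plain reader is not inherited by the field-list path.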

What are the steps to reproduce the bug?

Requirements with versions to reproduce the bug:
xarray=2024.0.0
netCDF4=1.7.1.post2
ai-models=0.7.0
ai-models-panguweather=0.0.7

  • Download the 2023-01-01 initial conditions required to run Panguweather from CDS. To do that, I used the following climetlab requests:
import climetlab as cml
sfc_data = cml.load_source(
    "cds", 
    "reanalysis-era5-single-levels", 
    variable =  ["msl", "10u", "10v", "2t"],
    product_type = "reanalysis",
    area = [90, 0, -90, 360],
    grid = [0.25, 0.25],
    year = ["2023"],
    month = ["01"],
    day = ["01"],
    time = "12:00",
    format = "nc",
    data_format = "netcdf",
    nocache = 123
)


atm_data = cml.load_source(
    "cds", 
    "reanalysis-era5-pressure-levels", 
    variable = ["z", "q", "t", "u", "v"],
    level = [50, 100, 150, 200, 250, 300, 400, 500, 600, 700, 850, 925, 1000],
    product_type = "reanalysis",
    area = [90, 0, -90, 360],
    grid = [0.25, 0.25],
    year = ["2023"],
    month = ["01"],
    day = ["01"],
    time = "12:00",
    format = "nc",
    data_format = "netcdf",
    nocache = 124
)
  • Locate them and join them into a single NetCDF file. I joined them with xr.open_mfdataset using its default arguments, without any errors and preserving all the variables with the appropriate dimensions. The two downloaded files were named "sfc_pangu_test.nc" and "pl_pangu_test.nc". Note that pressure_level is renamed to pl, since that is the levtype used to select atmospheric variables in ai-models (in this line):
import xarray as xr
xr.open_mfdataset("*_pangu_test.nc").rename({"pressure_level": "pl"}).to_netcdf("pangu_test.nc")
  • Run ai-models with the created file pangu_test.nc as follows:
    ai-models --download-assets --input file --file pangu_test.nc --output file --path pangu_output.grib --date 20230101 --time 1200 --lead-time 24 panguweather

I will append the returned log. Ignore the ONNX error: it is not relevant to this issue and simply means that the model runs on CPU instead of GPU because cudnn was not found. This code was run from an environment with the aforementioned dependencies, among others. It was launched on a login node of the MareNostrum5 accelerated partition, which has internet access and is only accessible to staff of the Barcelona Supercomputing Center. I am confident that the error is not related to the platform and will occur on any other platform.

Version

0.10.4

Platform (OS and architecture)

Linux alogin4 5.14.0-284.30.1.el9_2.x86_64 #1 SMP PREEMPT_DYNAMIC Fri Aug 25 09:13:12 EDT 2023 x86_64 x86_64 x86_64 GNU/Linux

Relevant log output

2024-10-23 12:25:01,987 INFO Writing results to pangu_output.grib
2024-10-23 12:25:01,987 INFO Downloading /gpfs/home/bsc/bsc032010/git/climate-emulators/test_earthkit/pangu_weather_24.onnx
2024-10-23 12:25:01,987 INFO Downloading https://get.ecmwf.int/repository/test-data/ai-models/pangu-weather/pangu_weather_24.onnx
2024-10-23 12:25:48,428 INFO Downloading /gpfs/home/bsc/bsc032010/git/climate-emulators/test_earthkit/pangu_weather_6.onnx                                                    
2024-10-23 12:25:48,428 INFO Downloading https://get.ecmwf.int/repository/test-data/ai-models/pangu-weather/pangu_weather_6.onnx
2024-10-23 12:26:40,144 INFO Using device 'GPU'. The speed of inference depends greatly on the device.                                                                        
2024-10-23 12:26:40,145 INFO ONNXRuntime providers: ['CUDAExecutionProvider', 'CPUExecutionProvider']
2024-10-23 12:26:40.755689924 [E:onnxruntime:Default, provider_bridge_ort.cc:1992 TryGetProviderInfo_CUDA] /onnxruntime_src/onnxruntime/core/session/provider_bridge_ort.cc:1637 onnxruntime::Provider& onnxruntime::ProviderLibrary::Get() [ONNXRuntimeError] : 1 : FAIL : Failed to load library libonnxruntime_providers_cuda.so with error: libcudnn.so.9: cannot open shared object file: No such file or directory

2024-10-23 12:26:40.755744660 [W:onnxruntime:Default, onnxruntime_pybind_state.cc:965 CreateExecutionProviderInstance] Failed to create CUDAExecutionProvider. Require cuDNN 9.* and CUDA 12.*. Please install all dependencies as mentioned in the GPU requirements page (https://onnxruntime.ai/docs/execution-providers/CUDA-ExecutionProvider.html#requirements), make sure they're in the PATH, and that your GPU is supported.
2024-10-23 12:26:50,047 INFO Loading ./pangu_weather_24.onnx: 9 seconds.
2024-10-23 12:26:50.663496877 [E:onnxruntime:Default, provider_bridge_ort.cc:1992 TryGetProviderInfo_CUDA] /onnxruntime_src/onnxruntime/core/session/provider_bridge_ort.cc:1637 onnxruntime::Provider& onnxruntime::ProviderLibrary::Get() [ONNXRuntimeError] : 1 : FAIL : Failed to load library libonnxruntime_providers_cuda.so with error: libcudnn.so.9: cannot open shared object file: No such file or directory

2024-10-23 12:26:50.663522316 [W:onnxruntime:Default, onnxruntime_pybind_state.cc:965 CreateExecutionProviderInstance] Failed to create CUDAExecutionProvider. Require cuDNN 9.* and CUDA 12.*. Please install all dependencies as mentioned in the GPU requirements page (https://onnxruntime.ai/docs/execution-providers/CUDA-ExecutionProvider.html#requirements), make sure they're in the PATH, and that your GPU is supported.
2024-10-23 12:27:00,014 INFO Loading ./pangu_weather_6.onnx: 9 seconds.
2024-10-23 12:27:00,065 INFO Starting date is 2023-01-01 12:00:00
2024-10-23 12:27:00,066 INFO Writing input fields
2024-10-23 12:27:00,067 INFO Total time: 1 minute 58 seconds.
Traceback (most recent call last):
  File "/gpfs/projects/bsc32/ml_models/emulator_models/ecmwf_ai_models/wf_emulator_snake_2/bin/ai-models", line 8, in <module>
    sys.exit(main())
  File "/gpfs/projects/bsc32/ml_models/emulator_models/ecmwf_ai_models/wf_emulator_snake_2/lib/python3.10/site-packages/ai_models/__main__.py", line 362, in main
    _main(sys.argv[1:])
  File "/gpfs/projects/bsc32/ml_models/emulator_models/ecmwf_ai_models/wf_emulator_snake_2/lib/python3.10/site-packages/ai_models/__main__.py", line 310, in _main
    run(vars(args), unknownargs)
  File "/gpfs/projects/bsc32/ml_models/emulator_models/ecmwf_ai_models/wf_emulator_snake_2/lib/python3.10/site-packages/ai_models/__main__.py", line 335, in run
    model.run()
  File "/home/bsc/bsc032010/.local/lib/python3.10/site-packages/ai_models_panguweather/model.py", line 89, in run
    self.write_input_fields(fields_pl + fields_sfc)
  File "/gpfs/projects/bsc32/ml_models/emulator_models/ecmwf_ai_models/wf_emulator_snake_2/lib/python3.10/site-packages/ai_models/model.py", line 545, in write_input_fields
    fields.save("input.grib")
  File "/gpfs/projects/bsc32/ml_models/emulator_models/ecmwf_ai_models/wf_emulator_snake_2/lib/python3.10/site-packages/earthkit/data/decorators.py", line 65, in wrapped
    return func(self, *args, **kwargs)
  File "/gpfs/projects/bsc32/ml_models/emulator_models/ecmwf_ai_models/wf_emulator_snake_2/lib/python3.10/site-packages/earthkit/data/core/fieldlist.py", line 1418, in save
    self.write(f, **kwargs)
  File "/gpfs/projects/bsc32/ml_models/emulator_models/ecmwf_ai_models/wf_emulator_snake_2/lib/python3.10/site-packages/earthkit/data/readers/netcdf/fieldlist.py", line 286, in write
    return self.to_netcdf(*args, **kwargs)
  File "/gpfs/projects/bsc32/ml_models/emulator_models/ecmwf_ai_models/wf_emulator_snake_2/lib/python3.10/site-packages/earthkit/data/readers/netcdf/fieldlist.py", line 215, in to_netcdf
    return self.to_xarray().to_netcdf(*args, **kwargs)
  File "/gpfs/projects/bsc32/ml_models/emulator_models/ecmwf_ai_models/wf_emulator_snake_2/lib/python3.10/site-packages/earthkit/data/readers/netcdf/fieldlist.py", line 363, in to_xarray
    raise NotImplementedError(
NotImplementedError: NetCDFMultiFieldList.to_xarray() does not supports NetCDFMaskFieldList
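
The call order in the traceback can be sketched as follows. This is only an illustration with hypothetical stand-in classes (StubFieldList, StubNetCDFFieldList, StubNetCDFMultiFieldList); the method names are taken from the traceback, but the bodies are assumptions and not the earthkit-data source.

```python
import io

# Stand-in classes that reproduce only the call order seen in the traceback.
# They are illustrative and are not the earthkit-data classes.


class StubFieldList:
    def save(self, filename, **kwargs):
        # The base save() hands an open file to write(), as in core/fieldlist.py.
        f = io.BytesIO()  # placeholder for the opened output file
        self.write(f, **kwargs)


class StubNetCDFFieldList(StubFieldList):
    def write(self, *args, **kwargs):
        # The NetCDF field list routes every write through to_netcdf().
        return self.to_netcdf(*args, **kwargs)

    def to_netcdf(self, *args, **kwargs):
        # to_netcdf() needs an Xarray object first.
        return self.to_xarray().to_netcdf(*args, **kwargs)


class StubNetCDFMultiFieldList(StubNetCDFFieldList):
    def to_xarray(self, **kwargs):
        raise NotImplementedError("stub: to_xarray() is not supported here")


try:
    StubNetCDFMultiFieldList().save("input.grib")
    outcome = "saved"
except NotImplementedError as exc:
    outcome = f"failed: {exc}"

print(outcome)
```

Note that, according to the traceback, the save to input.grib is routed through to_netcdf, so the failure occurs before anything is written, regardless of the target file name.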

Accompanying data

No response

Organisation

Barcelona Supercomputing Center

@aperaza-bsc aperaza-bsc added the bug Something isn't working label Oct 23, 2024
@sandorkertesz
Collaborator

sandorkertesz commented Oct 23, 2024

@aperaza-bsc, thank you for the detailed report. Apart from the issue with NetCDFMultiFieldList, I am not sure ai-models can be run with NetCDF input. The documentation only mentions GRIB.

@aperaza-bsc
Author

Thanks for the quick response!
You are right. I wrongly assumed that, since it uses earthkit-data as its backend, it would also be format agnostic.
Unless you are still interested in looking into the issue, we can close this from my side, since even if this were solved I might encounter other issues when using NetCDF with ai-models.
