Using rootless podman on HPC cluster, where many concurrent jobs on different nodes access same imagestore #24999

bbimber · 2025-01-12T15:56:51Z

Hello,

We are trying to run rootless podman-based jobs on a high-performance computing cluster. As-is, the cluster is configured so that each user has a folder defined as the graphroot (set in storage.conf), where image and container information is stored. Under this pattern, independent podman jobs running on different nodes concurrently access this one central image store.

I am seeing lots of errors that I assume are caused by multiple jobs contending with each other when they interact with this one central image store. These include podman-run errors like 'Error: beginning transaction: database is locked', as well as cryptic errors about 'container not found'.

Are there ways to configure the podman image store that work well in a cluster-like environment?

If we can reasonably do so, I would like to share images across jobs (avoiding re-download); however, we do not want containers to persist after a job. I would be perfectly happy using a temporary job-specific folder for this data, but I would prefer not to duplicate the base image layers if we can avoid it.

I am looking at ways to configure runroot (https://docs.podman.io/en/stable/markdown/podman.1.html), or similar options, but I wasn't quite sure which combination of options fits this situation best.
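
To make the question concrete, here is a minimal sketch of one layout that might fit: each job gets its own graphroot and runroot under a job-local directory, and the shared images are referenced through the `additionalimagestores` option in storage.conf, which is meant for read-only stores. The helper names, the `/shared/podman/images` path, and the choice of the `overlay` driver are assumptions for illustration only; I have not confirmed that this combination behaves well on a shared filesystem.

```python
import json
import tempfile
from pathlib import Path


def render_storage_conf(job_dir, shared_store, driver="overlay"):
    """Return storage.conf text with job-local graphroot/runroot and
    one shared, read-only additional image store.

    job_dir and shared_store are hypothetical paths chosen by the caller.
    """
    job_dir = Path(job_dir)
    # json.dumps produces a double-quoted, escaped string, which is also
    # a valid TOML basic string.
    lines = [
        "[storage]",
        f"driver = {json.dumps(driver)}",
        f"graphroot = {json.dumps(str(job_dir / 'graphroot'))}",
        f"runroot = {json.dumps(str(job_dir / 'runroot'))}",
        "",
        "[storage.options]",
        f"additionalimagestores = [{json.dumps(str(shared_store))}]",
        "",
    ]
    return "\n".join(lines)


def write_job_storage_conf(job_dir, shared_store, driver="overlay"):
    """Create job_dir if needed and write storage.conf into it."""
    job_dir = Path(job_dir)
    job_dir.mkdir(parents=True, exist_ok=True)
    conf_path = job_dir / "storage.conf"
    conf_path.write_text(render_storage_conf(job_dir, shared_store, driver))
    return conf_path


if __name__ == "__main__":
    # Assumption: the job's temporary directory is on node-local storage.
    job_dir = tempfile.mkdtemp(prefix="podman-job-")
    conf = write_job_storage_conf(job_dir, "/shared/podman/images")
    # The job script would export this before calling podman.
    print(f"export CONTAINERS_STORAGE_CONF={conf}")
```

The idea would be that a job exports `CONTAINERS_STORAGE_CONF` pointing at the generated file, so containers and any newly pulled layers land in the job-local graphroot and disappear with it, while the shared store is only read. The shared store itself would presumably be populated separately, outside of any job, for example with `podman --root /shared/podman/images pull <image>`.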

Thank you for any ideas or suggestions.
