Failed Install on Jetson AGX Orin 64GB Developer Kit #25

Open
MacFGalempsy opened this issue Feb 21, 2024 · 1 comment


@MacFGalempsy

Greetings, I have been looking for a way to run kohya_ss on the Jetson AGX Orin inside the NVIDIA container so that it uses the GPU. After cloning this repository and running the docker compose command below, the build failed with the following output.

user@ubuntu:~/kohya_ss-docker$ docker compose --profile kohya up --build
[+] Building 1.1s (22/27) docker:default
=> [kohya internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 5.35kB 0.0s
=> [kohya] resolve image config for docker.io/docker/dockerfile:1 0.3s
=> CACHED [kohya] docker-image://docker.io/docker/dockerfile:1@sha256:ac 0.0s
=> [kohya internal] load metadata for docker.io/library/python:3.10-slim 0.2s
=> [kohya internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
=> [kohya internal] load build context 0.0s
=> => transferring context: 187B 0.0s
=> [kohya base 1/7] FROM docker.io/library/python:3.10-slim@sha256:4bd9a 0.0s
=> CACHED [kohya base 2/7] RUN <<EOF (# apt for general container depend 0.0s
=> CACHED [kohya base 3/7] RUN <<EOF (# apt for extensions/custom script 0.0s
=> CACHED [kohya base 4/7] RUN <<EOF (# apt configurations...) 0.0s
=> CACHED [kohya base 5/7] RUN <<EOF (# cuda configurations...) 0.0s
=> CACHED [kohya base 6/7] COPY ./scripts/install-container-dep.sh /dock 0.0s
=> CACHED [kohya base 7/7] RUN <<EOF (# cuda cudnn + cutlass + tensorrt. 0.0s
=> CACHED [kohya kohya_base 1/8] RUN <<EOF (git clone https://github.com 0.0s
=> CACHED [kohya kohya_base 2/8] WORKDIR /koyah_ss 0.0s
=> CACHED [kohya kohya_base 3/8] RUN <<EOF (# Build requirements...) 0.0s
=> CACHED [kohya kohya_base 4/8] RUN <<EOF (# tensorflow...) 0.0s
=> CACHED [kohya kohya_base 5/8] RUN <<EOF (# torch, torchvision, torcha 0.0s
=> CACHED [kohya kohya_base 6/8] RUN <<EOF (# xformers...) 0.0s
=> CACHED [kohya kohya_base 7/8] RUN <<EOF (# deepspeed...) 0.0s
=> CACHED [kohya kohya_base 8/8] RUN <<EOF (#jax/tpu...) 0.0s
=> ERROR [kohya kohya_cuda 1/2] RUN <<EOF (# Hotfix for libnvinfer7...) 0.3s

[kohya kohya_cuda 1/2] RUN <<EOF (# Hotfix for libnvinfer7...):
0.276 + ln -s /venv/lib/python3.10/site-packages/tensorrt/libnvinfer.so.8 /venv/lib/python3.10/site-packages/tensorrt/libnvinfer.so.7
0.278 ln: failed to create symbolic link '/venv/lib/python3.10/site-packages/tensorrt/libnvinfer.so.7': No such file or directory


failed to solve: process "/bin/bash -ceuxo pipefail # Hotfix for libnvinfer7\nln -s $TENSORRT_PATH/libnvinfer.so.8 $TENSORRT_PATH/libnvinfer.so.7\nln -s $TENSORRT_PATH/libnvinfer_plugin.so.8 $TENSORRT_PATH/libnvinfer_plugin.so.7\n" did not complete successfully: exit code: 1

Any thoughts on getting past this error so we can move on to training?

@tokenwizard

I am getting the same error on my Arch Linux desktop with my 4090. I have the NVIDIA Container Toolkit installed, and the card works fine in other Docker containers.
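
A note on the log above, offered as a reading of the error rather than a confirmed fix: `ln -s` does not check whether its source exists, so "No such file or directory" on the link name usually means the directory `/venv/lib/python3.10/site-packages/tensorrt` itself is missing from the image. Why it is missing is not shown in the log; the tensorrt package may not have been installed, or it may be laid out differently. The sketch below is a guarded variant of the hotfix step that skips the link when the directory or source library is absent. The helper name `link_if_present` is hypothetical and not part of this repository; the default path is copied from the log.

```shell
#!/usr/bin/env bash
# Guarded variant of the libnvinfer hotfix from the failing build step.
# link_if_present is an illustrative helper, not part of the repository.
set -euo pipefail

link_if_present() {
    local dir="$1" src="$2" dst="$3"
    # The original step fails here: the link's parent directory is absent.
    if [ ! -d "$dir" ]; then
        echo "skip: $dir does not exist"
        return 0
    fi
    # Avoid creating a dangling symlink when the .so.8 library is not there.
    if [ ! -e "$dir/$src" ]; then
        echo "skip: $dir/$src not found"
        return 0
    fi
    ln -sfn "$dir/$src" "$dir/$dst"
    echo "linked: $dir/$dst -> $dir/$src"
}

# Default path taken from the build log; override via the environment if needed.
TENSORRT_PATH="${TENSORRT_PATH:-/venv/lib/python3.10/site-packages/tensorrt}"
link_if_present "$TENSORRT_PATH" libnvinfer.so.8 libnvinfer.so.7
link_if_present "$TENSORRT_PATH" libnvinfer_plugin.so.8 libnvinfer_plugin.so.7
```

Skipping the link lets the build continue, but it does not restore TensorRT: if the libraries are needed at runtime, the underlying install problem still has to be found. Running `find /venv -name 'libnvinfer*'` in an intermediate image is one way to see where, or whether, the libraries were installed.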
