10-init-users.sh: sed: can't move '/etc/group' to '/etc/group.bak': Invalid argument #137

Open
mikeyo opened this issue Jan 28, 2023 · 23 comments

Comments

@mikeyo

mikeyo commented Jan 28, 2023

Trying to deploy in Portainer and the container refuses to work; see the log below. I also tried it as a stack, but it fails with the same error.

[init ] container is starting...
[cont-env ] loading container environment variables...
[cont-env ] APP_NAME: loading...
[cont-env ] DISPLAY: executing...
[cont-env ] DISPLAY: terminated successfully.
[cont-env ] DISPLAY: loading...
[cont-env ] DOCKER_IMAGE_PLATFORM: loading...
[cont-env ] DOCKER_IMAGE_VERSION: loading...
[cont-env ] GTK_THEME: executing...
[cont-env ] GTK_THEME: terminated successfully.
[cont-env ] GTK_THEME: loading...
[cont-env ] HOME: loading...
[cont-env ] INSTALL_PACKAGES_INTERNAL: executing...
[cont-env ] INSTALL_PACKAGES_INTERNAL: terminated successfully.
[cont-env ] INSTALL_PACKAGES_INTERNAL: loading...
[cont-env ] QT_STYLE_OVERRIDE: executing...
[cont-env ] QT_STYLE_OVERRIDE: terminated successfully.
[cont-env ] QT_STYLE_OVERRIDE: loading...
[cont-env ] TAKE_CONFIG_OWNERSHIP: loading...
[cont-env ] XDG_CACHE_HOME: loading...
[cont-env ] XDG_CONFIG_HOME: loading...
[cont-env ] XDG_DATA_HOME: loading...
[cont-env ] XDG_RUNTIME_DIR: loading...
[cont-env ] XDG_STATE_HOME: loading...
[cont-env ] container environment variables initialized.
[cont-secrets] loading container secrets...
[cont-secrets] container secrets loaded.
[cont-init ] executing container initialization scripts...
[cont-init ] 10-certs.sh: executing...
[cont-init ] 10-certs.sh: terminated successfully.
[cont-init ] 10-check-app-niceness.sh: executing...
[cont-init ] 10-check-app-niceness.sh: terminated successfully.
[cont-init ] 10-cjk-font.sh: executing...
[cont-init ] 10-cjk-font.sh: terminated successfully.
[cont-init ] 10-clean-logmonitor-states.sh: executing...
[cont-init ] 10-clean-logmonitor-states.sh: terminated successfully.
[cont-init ] 10-clean-tmp-dir.sh: executing...
[cont-init ] 10-clean-tmp-dir.sh: terminated successfully.
[cont-init ] 10-fontconfig-cache-dir.sh: executing...
[cont-init ] 10-fontconfig-cache-dir.sh: terminated successfully.
[cont-init ] 10-init-users.sh: executing...
[cont-init ] 10-init-users.sh: sed: can't move '/etc/group' to '/etc/group.bak': Invalid argument
[cont-init ] 10-init-users.sh: terminated with error 1.

@jlesage
Owner

jlesage commented Jan 28, 2023

Are you using the container on Proxmox, in an LXC container?

@mikeyo
Author

mikeyo commented Jan 28, 2023

> Are you using the container on Proxmox, in an LXC container?

Yes I am.

@jlesage
Owner

jlesage commented Jan 28, 2023

A recent Proxmox update seems to have broken something: changes inside the container are no longer allowed. This can potentially affect any container, not just this one.

Can you share the config of your LXC container? I tried to reproduce on my side but was not able to.

Maybe you could also try to see if you have the same issue with a new LXC container.

@mikeyo
Author

mikeyo commented Jan 29, 2023

> A recent Proxmox update seems to have broken something: changes inside the container are no longer allowed. This can potentially affect any container, not just this one.
>
> Can you share the config of your LXC container? I tried to reproduce on my side but was not able to.
>
> Maybe you could also try to see if you have the same issue with a new LXC container.

Here you go.

unprivileged = no

cores: 2
features: nesting=1
hostname: MOLXCDOCKER01
memory: 8192
net0: name=eth0,bridge=vmbr0,firewall=1,gw=192.168.1.100,hwaddr=56:CB:97:8D:63:03,ip=192.168.1.xxx/24,type=veth
ostype: ubuntu
rootfs: VM01:subvol-132-disk-0,size=64G
swap: 512

@jlesage
Owner

jlesage commented Jan 30, 2023

Did you remove AppArmor to be able to run a container with this config?

@jlesage
Owner

jlesage commented Jan 30, 2023

Also, were you able to test with a new LXC container to verify whether you have the same issue?

@bernhard-da

bernhard-da commented Jan 30, 2023

I have the exact same behaviour as @mikeyo; here is my LXC conf (a privileged container) on a fully updated Proxmox system:

arch: amd64
cores: 4
features: mknod=1,mount=nfs,nesting=1
hostname: xy
memory: 6144
net0: name=eth0,bridge=vmbr0,firewall=1,hwaddr=DA:D1:BD:18:A4:90,ip=dhcp,type=veth
ostype: archlinux
rootfs: nvmepool:subvol-102-disk-0,size=30G
swap: 1024
lxc.apparmor.profile: unconfined
lxc.cap.drop: 

@jlesage
Owner

jlesage commented Jan 30, 2023

Thanks for the info, but now we need to know if you can reproduce after creating a new LXC container. I cannot reproduce this myself with a fresh install of Proxmox 7.3-3 (no subscription), using the same LXC container config.

@jlesage
Owner

jlesage commented Jan 30, 2023

It would also be useful if you could contact Proxmox support about this problem. This issue affects any container, not only this one. It looks like the problem occurs when modifying the content of the container. For example, you can try the following:

docker run --rm -ti alpine:3.17 sh
mv /etc/group /etc/group.bak

Or

docker run --rm -ti ubuntu:22.04 bash
mv /etc/group /etc/group.bak

To be confirmed, but making other kinds of changes to other files should result in the same failures.
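As a one-shot variant of the above (a sketch; the exact error text may differ between images, but on an affected host the rename should fail):

docker run --rm alpine:3.17 mv /etc/group /etc/group.bak
# on an affected host this fails with something like:
# mv: can't rename '/etc/group': Invalid argument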

@mikeyo
Author

mikeyo commented Jan 30, 2023

> Did you remove AppArmor to be able to run a container with this config?

Yes, I had to remove it so I could make changes.

@mikeyo
Author

mikeyo commented Jan 30, 2023

> It would also be useful if you could contact Proxmox support about this problem. This issue affects any container, not only this one. It looks like the problem occurs when modifying the content of the container. For example, you can try the following:
>
> docker run --rm -ti alpine:3.17 sh
> mv /etc/group /etc/group.bak
>
> Or
>
> docker run --rm -ti ubuntu:22.04 bash
> mv /etc/group /etc/group.bak
>
> To be confirmed, but making other kinds of changes to other files should result in the same failures.

I can confirm that running the mv command manually produces this error on any file.

The LXC container is a new container: I created an Ubuntu 22.10 template, installed Docker and Portainer, and removed AppArmor.

@bernhard-da

I have now run some tests, and for me the issue seems to be a combination of overlayfs with ZFS as the backing filesystem for Docker in the LXC container. As soon as I installed fuse-overlayfs, added "storage-driver": "fuse-overlayfs" to /etc/docker/daemon.json, restarted Docker, and recreated the container, the error was gone, and I no longer see the issue @jlesage posted (the simple mv /etc/group example) either. So there is in fact no issue at all with your container. Thanks again for your work!
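For reference, the change described above amounts to something like this (a sketch; /etc/docker/daemon.json is Docker's standard daemon config path):

cat /etc/docker/daemon.json
{
  "storage-driver": "fuse-overlayfs"
}

systemctl restart docker
docker info --format '{{.Driver}}'   # should print: fuse-overlayfs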

@jlesage
Owner

jlesage commented Jan 30, 2023

Thanks @bernhard-da for the details, that makes a lot of sense.

@guba91

guba91 commented Feb 7, 2023

For me this procedure worked and I'm able to replicate it:

  1. Enable keyctl, nesting, and fuse on the LXC container, then reboot the LXC.
  2. mkdir /etc/docker
  3. nano /etc/docker/daemon.json and put this inside the new file:
       {
         "storage-driver": "fuse-overlayfs"
       }
  4. apt install fuse-overlayfs
  5. fuse-overlayfs --version
  6. Reboot the LXC or reload the Docker service.
  7. docker info: if you see Storage Driver: fuse-overlayfs, it's done.

@Ramalama2

Ramalama2 commented Jun 13, 2023

There is an easy solution for this.

Simply make an ext4 storage for your Docker LXC containers. I learned this the hard way long ago.
That issue happens because overlay2 on ZFS doesn't support RENAME_WHITEOUT.

If you simply look into your Proxmox dmesg, you'll see that.
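For example (a sketch; the timestamp and count will differ):

dmesg | grep -i rename_whiteout
# overlayfs: upper fs does not support RENAME_WHITEOUT.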

BUT!
Things are slowly changing: a year ago, overlay2 on ZFS didn't work at all.
For some months now, five or six, overlay2 has received a lot of improvements, so basically 90% of it works on ZFS.
As far as I can see, only RENAME_WHITEOUT still needs to be implemented, and xino has some issues. (xino isn't working correctly on ext4 either.)

About the Docker workaround:
I myself left some space during the Proxmox install, then added a new partition with gdisk, formatted it with ext4, and mounted it inside Proxmox as a directory.

But since you can't shrink a ZFS volume (or you can, but it's hard and you need a spare drive), there is an easier solution:

  1. Create a zvol device (set the size as you need):
     zfs create -V 50gb YOURTANK/dockervol

  2. Format it as ext4:
     mkfs.ext4 /dev/zvol/YOURTANK/dockervol

  3. Mount it somewhere; you can do it permanently with fstab:
     /dev/zvol/YOURTANK/dockervol /dockervol ext4 defaults 0 0

  4. Now you have two options (a quick sanity check follows this list):

     1. Either mount that directory inside the LXC directly:
        mp0: /dockervol,mp=/var/lib/docker,replicate=0
     2. Or add that mountpoint as a Directory in the Proxmox GUI:
        Datacenter -> Storage -> Add -> Directory -> Content: Disk Image, Container
        and then simply move your LXC disk onto that "storage".

If you run just one LXC Docker container, I would recommend option 1.
If you already have a working Docker container and you're too lazy to rsync /var/lib/docker onto the new mountpoint, or you have multiple Docker containers, then I would go with option 2.

There are no real benefits or downsides either way: if you use just the mountpoint, you simply save a bit of space. Moving the whole LXC disk to that ext4 storage takes up more space, but your backups will include the /var/lib/docker directory.
If you use just the mountpoint, your backups won't include that directory.
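Whichever option you pick, it is worth verifying inside the LXC that Docker's data root actually landed on the ext4 zvol (a sketch, assuming the paths from the steps above and Docker's default data root of /var/lib/docker):

  df -T /var/lib/docker                # the Type column should report ext4, not zfs
  docker info --format '{{.Driver}}'   # overlay2 should now work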

If you want to delete that zvol, because you don't need it or whatever:
zfs destroy YOURTANK/dockervol

Cheers

@t0ny-peng

Hi @Ramalama2, thanks for the great idea of reformatting a ZFS dataset as ext4, which I hadn't even thought of before. However, in my case, even after mounting that volume as ext4 on a folder, Docker still fails to start with the same error: sed: cannot rename /etc/group: Invalid argument.

$ cat docker-compose.yaml

version: '3'

services:
  baidunetdisk:
    image: johngong/baidunetdisk:latest
    container_name: baidunetdisk
    ports:
      - "5800:5800"
      - "5900:5900"
    volumes:
      - /dockervol:/config
    # restart: unless-stopped

And the info about that folder is:

root@pve:~/docker/baidu_net_disk# fdisk /dev/zd176

Welcome to fdisk (util-linux 2.36.1).
Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.

The device contains 'ext4' signature and it will be removed by a write command. See fdisk(8) man page and --wipe option for more details.

And dmesg still shows a lot of [100403.988551] overlayfs: upper fs does not support RENAME_WHITEOUT. Any idea what might be wrong? Thanks!

@Ramalama2

A bit of a late reply; I didn't check GitHub for a long time.
It looks to me like /var/lib/docker is still running on a ZFS volume.
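One quick way to check that (a sketch, assuming Docker's default data root):

df -T /var/lib/docker                        # Type should be ext4, not zfs
docker info | grep -i 'backing filesystem'   # should not report zfs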

Rarely, but sometimes, I even had to reinstall the container on the new ext4 volume, meaning recreating the container from scratch, installing Docker inside it, and so on.
I don't know why simply moving the container to the new ext4 dataset sometimes doesn't work, but yeah.

However, if it still won't work for you, there isn't much time left until Proxmox releases v8.1 with ZFS 2.2 and a newer kernel, which will definitively fix the Docker issues, since ZFS 2.2 natively supports RENAME_WHITEOUT.
That means we won't need any stupid workarounds for Docker (overlay2) anymore.

Cheers

@Sp33dFr34k

Sp33dFr34k commented Apr 21, 2024

@jlesage I can gladly confirm this issue has been fixed after upgrading to Proxmox 8.1 :)

@jlesage
Owner

jlesage commented Apr 21, 2024

Great, thanks for the update!

@Ramalama2

> Great, thanks for the update!

Oh yes, sorry, I forgot about this issue thread too. Feel free to close it, @jlesage.
Things will only get better as Proxmox progresses, or more specifically, with ZFS updates.
There are still one or two issues, because ZFS doesn't support ALL overlay2 features, only 99%, but jdownloader2 at least works without issues.

Cheers

@INeedHelp321

I encountered the same error with @jlesage's mkvtoolnix. Here I have a two-drive mirror ZFS pool for Docker under OMV (openmediavault). So the cause of all this is ZFS? Just as I was getting interested in it.

@Ramalama2

> I encountered the same error with @jlesage's mkvtoolnix. Here I have a two-drive mirror ZFS pool for Docker under OMV (openmediavault). So the cause of all this is ZFS? Just as I was getting interested in it.

Yes, ZFS prior to 2.2 didn't support RENAME_WHITEOUT; overlay2 relies on it, and that's the default Docker container filesystem.
It has nothing to do with Proxmox itself; the same will happen on any other OS that uses ZFS prior to 2.2.

@INeedHelp321

INeedHelp321 commented Apr 26, 2024

> > I encountered the same error with @jlesage's mkvtoolnix. Here I have a two-drive mirror ZFS pool for Docker under OMV (openmediavault). So the cause of all this is ZFS? Just as I was getting interested in it.
>
> Yes, ZFS prior to 2.2 didn't support RENAME_WHITEOUT; overlay2 relies on it, and that's the default Docker container filesystem. It has nothing to do with Proxmox itself; the same will happen on any other OS that uses ZFS prior to 2.2.

So I should check whether the ZFS plugin for OMV is prior to 2.2, and then all I can do is wait for them to update to 2.2 or newer?
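For reference, checking the ZFS version in use is quick (a sketch; works on any Linux host with ZFS installed, OMV included):

zfs version                     # e.g. zfs-2.1.11 / zfs-kmod-2.1.11
modinfo zfs | grep '^version'   # version of the loaded kernel module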
