
build-qemu-ubuntu-2204 stuck in "Waiting for SSH to become available..." #1076

Open
s-mansouri opened this issue Feb 15, 2023 · 28 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale.

Comments

s-mansouri commented Feb 15, 2023

Hi,
I installed image-builder based on this doc.
Then, to build an image for OpenStack, I used this doc.
But the command make build-qemu-ubuntu-2204 gets stuck at the SSH step.
My operating system is Ubuntu 22.04.
This is the log of the command:

hack/ensure-ansible.sh
fatal: not a git repository (or any of the parent directories): .git
Starting galaxy collection install process
Nothing to do. All requested collections are already installed. If you want to reinstall them, consider using `--force`.
hack/ensure-packer.sh
hack/ensure-goss.sh
Right version of binary present
packer build -var-file="/root/image-builder/images/capi/packer/config/kubernetes.json"  -var-file="/root/image-builder/images/capi/packer/config/cni.json"  -var-file="/root/image-builder/images/capi/packer/config/containerd.json"  -var-file="/root/image-builder/images/capi/packer/config/wasm-shims.json"  -var-file="/root/image-builder/images/capi/packer/config/ansible-args.json"  -var-file="/root/image-builder/images/capi/packer/config/goss-args.json"  -var-file="/root/image-builder/images/capi/packer/config/common.json"  -var-file="/root/image-builder/images/capi/packer/config/additional_components.json"  -color=true -var-file="/root/image-builder/images/capi/packer/qemu/qemu-ubuntu-2204.json"  packer/qemu/packer.json
fatal: not a git repository (or any of the parent directories): .git
qemu: output will be in this color.

==> qemu: Retrieving ISO
==> qemu: Trying https://releases.ubuntu.com/22.04/ubuntu-22.04.1-live-server-amd64.iso
==> qemu: Trying https://releases.ubuntu.com/22.04/ubuntu-22.04.1-live-server-amd64.iso?checksum=sha256%3A10f19c5b2b8d6db711582e0e27f5116296c34fe4b313ba45f9b201a5007056cb
    qemu: ubuntu-22.04.1-live-server-amd64.iso 1.37 GiB / 1.37 GiB [==================================================================================================================] 100.00% 1m15s
==> qemu: https://releases.ubuntu.com/22.04/ubuntu-22.04.1-live-server-amd64.iso?checksum=sha256%3A10f19c5b2b8d6db711582e0e27f5116296c34fe4b313ba45f9b201a5007056cb => /root/.cache/packer/281aa9855752339063385b35198e73db74cd61ba.iso
==> qemu: Starting HTTP server on port 8247
==> qemu: Found port for communicator (SSH, WinRM, etc): 2769.
==> qemu: Looking for available port between 5900 and 6000 on 127.0.0.1
==> qemu: Starting VM, booting from CD-ROM
    qemu: The VM will be run headless, without a GUI. If you want to
    qemu: view the screen of the VM, connect via VNC without a password to
    qemu: vnc://127.0.0.1:5952
==> qemu: Waiting 10s for boot...
==> qemu: Connecting to VM via VNC (127.0.0.1:5952)
==> qemu: Typing the boot commands over VNC...
    qemu: Not using a NetBridge -- skipping StepWaitGuestAddress
==> qemu: Using SSH communicator to connect: 127.0.0.1
==> qemu: Waiting for SSH to become available...

/kind bug

@k8s-ci-robot k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Feb 15, 2023
@wwentland
Contributor

Could you try running the build with FOREGROUND=1 to observe where the process gets stuck?

@s-mansouri
Author

@wwentland Thanks for your response.
With this command:
make build-qemu-ubuntu-2204 FOREGROUND=1 PACKER_LOG=1
I got this error:

==> qemu: Starting VM, booting from CD-ROM
2023/02/20 08:20:59 packer-builder-qemu plugin: Qemu Builder has no floppy files, not attaching a floppy.
2023/02/20 08:20:59 packer-builder-qemu plugin: Executing /usr/bin/qemu-system-x86_64: []string{"-device", "virtio-scsi-pci,id=scsi0", "-device", "scsi-hd,bus=scsi0.0,drive=drive0", "-device", "virtio-net,netdev=user.0", "-name", "ubuntu-2204-kube-v1.23.15", "-drive", "if=none,file=output/ubuntu-2204-kube-v1.23.15/ubuntu-2204-kube-v1.23.15,id=drive0,cache=writeback,discard=unmap,format=qcow2", "-drive", "file=/root/.cache/packer/281aa9855752339063385b35198e73db74cd61ba.iso,media=cdrom", "-netdev", "user,id=user.0,hostfwd=tcp::3163-:22", "-m", "2048M", "-smp", "1", "-boot", "once=d", "-machine", "type=pc,accel=kvm", "-display", "gtk", "-vnc", "127.0.0.1:91"}
2023/02/20 08:20:59 packer-builder-qemu plugin: Started Qemu. Pid: 16836
2023/02/20 08:20:59 packer-builder-qemu plugin: Qemu stderr: qemu-system-x86_64: warning: host doesn't support requested feature: CPUID.80000001H:ECX.svm [bit 2]
2023/02/20 08:20:59 packer-builder-qemu plugin: Qemu stderr: gtk initialization failed
2023/02/20 08:20:59 packer-builder-qemu plugin: failed to unlock port lockfile: close tcp 127.0.0.1:5991: use of closed network connection
2023/02/20 08:20:59 packer-builder-qemu plugin: failed to unlock port lockfile: close tcp 127.0.0.1:3163: use of closed network connection
==> qemu: Error launching VM: Qemu failed to start. Please run with PACKER_LOG=1 to get more info.
==> qemu: Deleting output directory...
2023/02/20 08:20:59 [INFO] (telemetry) ending qemu
==> Wait completed after 11 seconds 963 milliseconds
2023/02/20 08:20:59 machine readable: error-count []string{"1"}
==> Some builds didn't complete successfully and had errors:
2023/02/20 08:20:59 machine readable: qemu,error []string{"Build was halted."}
==> Builds finished but no artifacts were created.
Build 'qemu' errored after 11 seconds 963 milliseconds: Build was halted.

==> Wait completed after 11 seconds 963 milliseconds

==> Some builds didn't complete successfully and had errors:
--> qemu: Build was halted.

@wwentland
Contributor

Thank you! Right, this is to be expected if it is a headless box. You could try connecting via VNC or reproducing the issue locally. For VNC to work, I think you would have to adjust the address to which VNC binds to 0.0.0.0 or another appropriate IP (cf. https://developer.hashicorp.com/packer/plugins/builders/qemu#vnc_bind_address).

Does it hang every time, or are there some builds that work and others that hang? How often have you tried?
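For reference, a minimal sketch of the vnc_bind_address override mentioned above, as a Packer QEMU builder fragment; the headless flag and port range here are illustrative placeholders, not values taken from image-builder's own config:

```json
{
  "builders": [
    {
      "type": "qemu",
      "headless": true,
      "vnc_bind_address": "0.0.0.0",
      "vnc_port_min": 5900,
      "vnc_port_max": 6000
    }
  ]
}
```

Note that binding VNC to 0.0.0.0 exposes the unauthenticated console to the network, so restrict it to a trusted interface where possible.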

tibeer commented Mar 17, 2023

I can also confirm this. We are building our images on a headless machine within a CI/CD pipeline. It happens every time for us. I will try to investigate according to your suggestions.

tibeer commented Mar 17, 2023

Seems that it gets stuck on this step:
(screenshot)

tibeer commented Mar 19, 2023

The problem is resolved for us, at least; CI/CD now works again. As for the reason: I honestly cannot tell you. It seems it was just a hiccup.

@wwentland
Contributor

That's great to hear @tibeer. I only ever ran into a stuck build once, but it failed much earlier in the process (error while entering the boot command).

I'm not seeing anything obvious in the output you pasted, and it might just have been taking a long time installing the base system. This could very well be due to problems in the build environment (e.g. network issues) that only present themselves intermittently but aren't directly caused by a misconfiguration of the build process.

Contributor

BarthV commented Apr 10, 2023

Same issue here :(
I'm running a simple make build-qemu-ubuntu-2204 and it's getting stuck waiting for the SSH link.
(screenshot)

@tibeer If you can try to remember what solved your problem it would be <3

Contributor

BarthV commented Apr 10, 2023

OK... so I just read the documentation ;-S

https://developer.hashicorp.com/packer/plugins/builders/qemu

This is an example only, and will time out waiting for SSH because we have not provided a kickstart file. You must add a valid kickstart file to the "http_directory" and then provide the file in the "boot_command" in order for this build to run. We recommend you check out the Community Templates for a practical usage example.

It seems that some kind of template is missing, and it must be a common mistake.

xinity commented Apr 13, 2023

Having the same issue :(
Falling back my demo to 20.04, sadly, for now.

Hoping someone will find what's missing :(

mikejoh commented Apr 13, 2023

@xinity I'm also hitting this atm. Locally on my computer, the build eventually completed:

Build 'qemu' finished after 25 minutes 31 seconds.

==> Wait completed after 25 minutes 31 seconds

==> Builds finished. The artifacts of successful builds are:
--> qemu: VM files in directory: ./output/ubuntu-2204-kube-v1.24.11
--> qemu: VM files in directory: ./output/ubuntu-2204-kube-v1.24.11
--> qemu: VM files in directory: ./output/ubuntu-2204-kube-v1.24.11

Yikes! But I don't know what build times to expect either; did you ever wait to see if it completed?

I made the following changes to cut off ~10 min:

diff --git a/images/capi/packer/qemu/qemu-ubuntu-2204.json b/images/capi/packer/qemu/qemu-ubuntu-2204.json
index 65efe6be0..1f620abc6 100644
--- a/images/capi/packer/qemu/qemu-ubuntu-2204.json
+++ b/images/capi/packer/qemu/qemu-ubuntu-2204.json
@@ -3,9 +3,9 @@
   "build_name": "ubuntu-2204",
   "distro_name": "ubuntu",
   "guest_os_type": "ubuntu-64",
-  "iso_checksum": "10f19c5b2b8d6db711582e0e27f5116296c34fe4b313ba45f9b201a5007056cb",
+  "iso_checksum": "5e38b55d57d94ff029719342357325ed3bda38fa80054f9330dc789cd2d43931",
   "iso_checksum_type": "sha256",
-  "iso_url": "https://old-releases.ubuntu.com/releases/jammy/ubuntu-22.04.1-live-server-amd64.iso",
+  "iso_url": "https://releases.ubuntu.com/jammy/ubuntu-22.04.2-live-server-amd64.iso",
   "os_display_name": "Ubuntu 22.04",
   "shutdown_command": "shutdown -P now",
   "unmount_iso": "true"

This doesn't directly solve any issues; it just uses a newer Ubuntu 22.04 base ISO so that the package upgrade steps take less time. The original Ubuntu ISO is around a year old now, with quite a bit of delta in terms of missing package upgrades.

xinity commented Apr 14, 2023

will try that out today and let you know 🤞

@nikParasyr

Came across this issue as well. After 22 min I cancelled the first run, as it seemed to be stuck on ==> qemu: Waiting for SSH to become available.... That was an assumption at that point.

After that I made the changes recommended by @mikejoh, which seem to have "solved it".

I speculate that because the default image is rather old, the package upgrade step takes too long and, depending on the environment, might even exceed the SSH timeout set by Packer, or the patience of the user (like me, who killed the first run after 22 min assuming it was stuck). Using the newer image made the package upgrade faster; after ~10 min I got into the config phase.

I'm not sure what an appropriate fix would be for this. Bump the Packer SSH timeout, document it, and "periodically" update the base images to newer ones?
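As a sketch of the timeout bump floated above: Packer's SSH communicator accepts an ssh_timeout option in the builder block. The 60m value and the "builder" username here are illustrative placeholders, not tested recommendations:

```json
{
  "builders": [
    {
      "type": "qemu",
      "ssh_username": "builder",
      "ssh_timeout": "60m"
    }
  ]
}
```

A longer timeout only hides slow upgrades rather than fixing them, but it gives the build room to survive a large apt upgrade step.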

mikejoh commented Apr 18, 2023

@nikParasyr 👍🏻 As a side note: I'm not sure the Ubuntu 22.04 image actually works and boots correctly. I'm evaluating CAPI + CAPO at the moment; I've only managed to build the image, not test it!

@nikParasyr

@mikejoh I ran into some issues as well that I couldn't troubleshoot. People in the CAPO Slack channel pointed out that they are running Ubuntu 22.04 images, but built with https://image-builder.sigs.k8s.io/capi/providers/openstack-remote.html rather than the qemu builder. The openstack-remote provider worked for me as well. I've opened a ticket (#1137) with my findings for the qemu build. I hope this helps.

Contributor

BarthV commented Jun 6, 2023

Maybe it's time to stop using the legacy Ubuntu live ISO image for the newest releases? I observed that it seems to be the main cause of all these problems.
The legacy image is deprecated and tends to be replaced by the Ubuntu cloudimg: https://cloud-images.ubuntu.com/

So (on my side) I'm currently replacing the Ubuntu image & script used by image-builder with this server cloudimg, and everything works like a charm.

Contributor

mnaser commented Jul 2, 2023

I think this is indeed an issue stemming from the fact that a very big apt upgrade happens.

Contributor

fad3t commented Aug 4, 2023

Maybe it's time to stop using the legacy Ubuntu live ISO image for the newest releases? I observed that it seems to be the main cause of all these problems. The legacy image is deprecated and tends to be replaced by the Ubuntu cloudimg: https://cloud-images.ubuntu.com/

So (on my side) I'm currently replacing the Ubuntu image & script used by image-builder with this server cloudimg, and everything works like a charm.

Hi @BarthV, any chance you can share the config you're using to build from the cloudimg? Thx!

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 25, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Feb 24, 2024
@mboersma
Contributor

/remove-lifecycle rotten

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Mar 11, 2024
@ygao-armada

For 22.04, I had a similar issue with the "vsphere" provider, using the command:
image-builder build --os ubuntu --os-version 22.04 --hypervisor vsphere --release-channel 1-28 --vsphere-config vsphere.json --firmware efi

It turns out the VM IP changes due to a reboot (I worked around it by forcing the VM IP back with netplan apply).
Not sure if the root cause of this ticket is related to that of vsphere.

Contributor

mnaser commented May 29, 2024

Just a warning for those who are using the latest image: there have been some changes that break things.

vexxhost/magnum-cluster-api#378

So you would end up with non-functional images.

@abrahamhwj

I had a similar issue with "make build-proxmox-ubuntu-2204":
(screenshot)
It just hangs there until dead...

@justinas-b

I had a similar issue with "make build-proxmox-ubuntu-2204"; (screenshot) it just hangs there until dead...

Hey @abrahamhwj, have you found any workaround for this? For me, the proxmox build is stuck in the same place. If I check the terminal, I see that the new VM is stuck on the language selection screen.

@justinas-b

Maybe it's time to stop using the legacy Ubuntu live ISO image for the newest releases? I observed that it seems to be the main cause of all these problems. The legacy image is deprecated and tends to be replaced by the Ubuntu cloudimg: https://cloud-images.ubuntu.com/

So (on my side) I'm currently replacing the Ubuntu image & script used by image-builder with this server cloudimg, and everything works like a charm.

@BarthV could you please share more details on what exactly needs to be updated so that the cloudimg would work?

@justinas-b

OK, it took me a while, but it seems I have figured it out. In my case I was building a proxmox template and got Waiting for SSH to become available... due to multiple factors:

  • First, there was no DHCP server available to automatically assign an IP address to the temporary VM where the image is bootstrapped. Network connectivity is needed to download new packages, etc. This was addressed by simply adding a DHCP server to the network. I think the same could have been achieved by updating the ubuntu-2204.json file:
diff --git a/images/capi/packer/proxmox/ubuntu-2204.json b/images/capi/packer/proxmox/ubuntu-2204.json
index b83551b00..773dfbafa 100644
--- a/images/capi/packer/proxmox/ubuntu-2204.json
+++ b/images/capi/packer/proxmox/ubuntu-2204.json
@@ -1,5 +1,5 @@
 {
-  "boot_command_prefix": "c<wait>linux /casper/vmlinuz --- autoinstall ds='nocloud-net;s=http://{{ .HTTPIP }}:{{ .HTTPPort }}/22.04/'<enter><wait5s>initrd /casper/initrd <enter><wait5s>boot <enter><wait5s>",
+  "boot_command_prefix": "c<wait>linux /casper/vmlinuz ip=10.11.1.152::10.11.1.1:255.255.255.0::::10.1.1.10 --- autoinstall ds='nocloud-net;s=http://{{ .HTTPIP }}:{{ .HTTPPort }}/22.04/'<enter><wait5s>initrd /casper/initrd <enter><wait5s>boot <enter><wait5s>",
   "build_name": "ubuntu-2204",
   "distribution_version": "2204",
   "distro_name": "ubuntu",
  • The second issue was that the server where I launched the packer build was not accessible from the temporary VM where the image was bootstrapped. That temporary VM needs to connect to the packer server so that it can retrieve the cloud-init scripts; I ended up moving the packer build into proxmox as well.
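For context on the ip= argument in the diff above: the kernel's static IP boot parameter follows the layout ip=&lt;client-ip&gt;:&lt;server-ip&gt;:&lt;gateway&gt;:&lt;netmask&gt;:&lt;hostname&gt;:&lt;device&gt;:&lt;autoconf&gt;:&lt;dns0&gt;, so 10.11.1.152::10.11.1.1:255.255.255.0::::10.1.1.10 sets the client address, gateway, netmask, and DNS server while leaving the other fields empty (those addresses are specific to that environment). A generic form of the override, with angle-bracket placeholders rather than working values:

```json
{
  "boot_command_prefix": "c<wait>linux /casper/vmlinuz ip=<client-ip>::<gateway>:<netmask>::::<dns> --- autoinstall ds='nocloud-net;s=http://{{ .HTTPIP }}:{{ .HTTPPort }}/22.04/'<enter><wait5s>initrd /casper/initrd <enter><wait5s>boot <enter><wait5s>"
}
```

This only makes sense when no DHCP server is reachable from the temporary VM; with working DHCP, the stock boot_command_prefix should be left alone.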

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 6, 2024