Something Wrong with Sonic on Arista 7050QX-32. I can not run sonic on arista 7050QX-32. #592

Open
songzhi1978 opened this issue Apr 8, 2020 · 25 comments

Comments

@songzhi1978

Aboot# ls
MD5SUMS dev init mnt root tmp
bin etc lib proc sys
Aboot# cd /mnt/
Aboot# ls
flash flash.conf
Aboot# cd flash
Aboot# ls
EOS-4.17.7M.swi debug sonic-aboot-broadcom.swi
boot-config persist startup-config
config_match schedule zerotouch-config
Aboot# boot /mnt/flash/sonic-aboot-broadcom.swi
42.05: Cleaning flash content /mnt/flash
46.39: Generating boot-config, machine.conf and cmdline
46.53: Installing image under /mnt/flash/image-HEAD.247-7bc8f129
46.53: Moving swi to a tmpfs
75.79: Extracting swi content
143.30: Extracting dockerfs.tar.gz from swi
187.61: Unpacking dockerfs.tar.gz delayed to initrd because /mnt/flash is vfat or docker_inram is on
187.61: Remove installer
193.79: Kexecing[ 193.801871] Starting new kernel
...
[ 4.239067] sd 4:0:0:0: [sda] No Caching mode page found
[ 4.302475] sd 4:0:0:0: [sda] Assuming drive cache: write through
Checking that no-one is using this disk right now ... OK

Disk /dev/sda: 7.5 GiB, 8044675072 bytes, 15712256 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x31625cc4

Old situation:

Device Boot Start End Sectors Size Id Type
/dev/sda1 2048 15712255 15710208 7.5G 83 Linux

Created a new DOS disklabel with disk identifier 0xbd6b6c2e.
/dev/sda1: Created a new partition 1 of type 'Linux' and of size 7.5 GiB.
Partition #1 contains a vfat signature.
/dev/sda2: Done.

New situation:

Device Boot Start End Sectors Size Id Type
/dev/sda1 2048 15712255 15710208 7.5G 83 Linux

The partition table has been altered.
Calling ioctl() to re-read partition table.
Re-reading the partition table failed.: Device or resource busy
The kernel still uses the old table. The new table will be used at the next reboot or after you run partprobe(8) or kpartx(8).
Syncing disks.
mke2fs 1.43.4 (31-Jan-2017)
/dev/sda1 contains a vfat file system
Creating filesystem with 1963776 4k blocks and 491520 inodes
Filesystem UUID: 0c20ebd7-8d93-4345-aa20-0283ca91e8ea
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632

Allocating group tables: done
Writing inode tables: done
Creating journal (16384 blocks): done
Writing superblocks and filesystem accounting information: done

tar: write error: No space left on device
[ 175.081028] rc.local[471]: + sonic-cfggen -y /etc/sonic/sonic_version.yml -v build_version
[ 179.015145] kdump-tools[470]: Starting kdump-tools: no crashkernel= parameter in the kernel cmdline ... failed!
[ 184.140222] rc.local[471]: + SONIC_VERSION=HEAD.247-7bc8f129
[ 184.216815] rc.local[471]: + FIRST_BOOT_FILE=/host/image-HEAD.247-7bc8f129/platform/firsttime
[ 184.325718] rc.local[471]: + logger SONiC version HEAD.247-7bc8f129 starting up...
[ 184.428719] rc.local[471]: + [ ! -e /host/machine.conf ]
[ 184.500714] rc.local[471]: + . /host/machine.conf
[ 184.560685] rc.local[471]: + aboot_version=2.0.10-1458058
[ 184.632667] rc.local[471]: + aboot_vendor=arista
[ 184.692670] rc.local[471]: + aboot_platform=x86_64-arista_7050_qx32
[ 184.768673] rc.local[471]: + aboot_machine=arista_7050_qx32
[ 184.840693] rc.local[471]: + aboot_arch=x86_64
[ 184.900692] rc.local[471]: + aboot_build_date=2013-09-21T19:34:39.000000000
[ 184.992670] rc.local[471]: + program_console_speed
[ 185.053733] rc.local[471]: + cat /proc/cmdline
[ 185.113403] rc.local[471]: + cut -d , -f2
[ 185.168938] rc.local[471]: + grep -Eo console=ttyS[0-9]+,[0-9]+
[ 185.245220] rc.local[471]: + speed=
[ 185.288725] rc.local[471]: + [ -z ]
[ 185.332675] rc.local[471]: + CONSOLE_SPEED=9600
[ 185.392667] rc.local[471]: + sed -i s|--keep-baud .* %I| 9600 %I|g /lib/systemd/system/serial-getty@.service
[ 185.520697] rc.local[471]: + systemctl daemon-reload
[ 185.580777] rc.local[471]: + [ -f /host/image-HEAD.247-7bc8f129/platform/firsttime ]
[ 185.684691] rc.local[471]: + echo First boot detected. Performing first boot tasks...
[ 185.792706] rc.local[471]: First boot detected. Performing first boot tasks...
[ 185.888721] rc.local[471]: + [ -n x86_64-arista_7050_qx32 ]
[ 185.960704] rc.local[471]: + platform=x86_64-arista_7050_qx32
[ 186.036709] rc.local[471]: + [ -d /host/old_config ]
[ 186.096689] rc.local[471]: + [ -f /host/minigraph.xml ]
[ 186.168728] rc.local[471]: + [ -n ]
[ 186.308188] rc.local[471]: + touch /tmp/pending_config_initialization
[ 186.396714] rc.local[471]: + touch /tmp/notify_firstboot_to_platform
[ 186.484706] rc.local[471]: + [ ! -d /host/reboot-cause/platform ]
[ 186.560741] rc.local[471]: + mkdir -p /host/reboot-cause/platform
[ 186.640758] rc.local[471]: + [ -d /host/image-HEAD.247-7bc8f129/platform/x86_64-arista_7050_qx32 ]
[ 186.748688] rc.local[471]: + sync
[ 190.768230] arista: waiting for switch chip
[ 190.918819] arista: switch chip is ready
[ 192.967795] arista: yielding...
[ OK ] Started Arista early platform initialization.
Starting Arista late platform initialization...
Starting Opennsl kernel modules init...
[ 195.083340] rc.local[471]: + [ -n ]
[ OK ] Started /etc/rc.local Compatibility.
[ 195.136653] rc.local[471]: + mkdir -p /var/platform
[ OK ] Started Opennsl kernel modules init.
[ 195.282353] rc.local[471]: + firsttime_exit
[ OK ] Started Getty on tty1.
[ OK ] Started Serial Getty on ttyS0.
[ OK ] Reached target Login Prompts.
[ 195.404451] rc.local[471]: + rm -rf /host/image-HEAD.247-7bc8f129/platform/firsttime
[ 195.680364] rc.local[471]: + exit 0
[ OK ] Started Docker Application Container Engine.
Starting Database container...
[FAILED] Failed to start Database container.
See 'systemctl status database.service' for details.
[DEPEND] Dependency failed for BGP container.
[DEPEND] Dependency failed for switch state service.
[DEPEND] Dependency failed for ICCPD container.
[DEPEND] Dependency failed for Management Framework container.
[DEPEND] Dependency failed for NAT container.
[DEPEND] Dependency failed for sFlow container.
[DEPEND] Dependency failed for syncd service.
[DEPEND] Dependency failed for LLDP container.
[DEPEND] Dependency failed for Config initialization and migration service.
[DEPEND] Dependency failed for Update minigr… configuration based on minigraph.
[DEPEND] Dependency failed for Control Plane ACL configuration daemon.
[DEPEND] Dependency failed for Update rsyslog configuration.
[DEPEND] Dependency failed for DHCP relay container.
[DEPEND] Dependency failed for TEAMD container.
[DEPEND] Dependency failed for Update interfaces configuration.
[DEPEND] Dependency failed for Host config enforcer daemon.
[DEPEND] Dependency failed for Router advertiser container.
[DEPEND] Dependency failed for Update hostname based on configdb.
[DEPEND] Dependency failed for Process and d…ry utilization data export daemon.
[DEPEND] Dependency failed for Platform monitor container.
[DEPEND] Dependency failed for Update NTP configuration.
[DEPEND] Dependency failed for Monitor warm …ry and disable warmboot when done.
Starting Credo phy init...
[FAILED] Failed to start Credo phy init.
See 'systemctl status phy-credo.service' for details.

Debian GNU/Linux 9 sonic ttyS0

sonic login: admin
Password:
Linux sonic 4.9.0-11-2-amd64 #1 SMP Debian 4.9.189-3+deb9u2 (2019-11-11) x86_64
You are on
  ____   ___  _   _ _  ____
 / ___| / _ \| \ | (_)/ ___|
 \___ \| | | |  \| | | |
  ___) | |_| | |\  | | |___
 |____/ \___/|_| \_|_|\____|

-- Software for Open Networking in the Cloud --

Unauthorized access and/or use are prohibited.
All access and/or use are subject to monitoring.

Help: http://azure.github.io/SONiC/

admin@sonic:~$ ls

@aanon4

aanon4 commented Dec 8, 2021

Did this get worked out?

@YWatchman

It seems the created tmpfs is too small, so I increased its size by remounting. I'm still not sure how to start the containers, though.

root@sonic:~# mount -o remount,size=2048000k /var/lib/docker
root@sonic:~# mount^C
root@sonic:~# df -h
Filesystem      Size  Used Avail Use% Mounted on
udev            1.9G     0  1.9G   0% /dev
tmpfs           394M  1.1M  393M   1% /run
root-overlay    1.9G  1.2G  649M  64% /
/dev/sda1       1.9G  1.2G  649M  64% /host
tmpfs           2.0G  1.5G  500M  75% /var/lib/docker
tmpfs           2.0G     0  2.0G   0% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs           4.0M     0  4.0M   0% /sys/fs/cgroup
root@sonic:~# free -h
               total        used        free      shared  buff/cache   available
Mem:           3.8Gi       221Mi       1.1Gi       1.5Gi       2.5Gi       1.9Gi
Swap:             0B          0B          0B
(failed reverse-i-search)`restart dc': f^Ce -h
root@sonic:~# systemctl restart docker
root@sonic:~# docker ps
CONTAINER ID   IMAGE     COMMAND   CREATED   STATUS    PORTS     NAMES
root@sonic:~# sonic-cli
FATAL: root cannot launch CLI
root@sonic:~#
logout
admin@sonic:~$ sonic-cli
Error: No such container: mgmt-framework
admin@sonic:~$

@YWatchman

Adding more DDR3 unbuffered ECC memory will work; the default speed is 1333 MHz.

@Staphylo
Contributor

This device unfortunately has a really small storage device (2GB).
We introduced some workarounds, such as unpacking the containers in RAM, but the containers have grown in size to the point that you won't have much RAM remaining for operations.
Changing the tmpfs after boot will not have any effect: the container extraction has already failed and will not be retried.

Here are some pointers as to where you can poke to make it work for your case.

If it's a lab/hobby device, you should consider upgrading the storage capacity of your device.

@hugocollignon
Copy link

Hello @Staphylo

I have the same issue here.
I see what you explained, and have some questions/remarks:

So actually, it is not the flash size that limits our space but the tmpfs size.

  • Question: do you know roughly what size we need?
  • Few people seem to have this issue, yet it could happen every time the tmpfs is used. Am I missing something?

About resolution:

Thanks

@Staphylo
Contributor

Hi @hugocollignon

Both the flash size and ram size are troublesome at this point.
If your tmpfs is too big, then you'll be left with not enough RAM to run your operating system.
With a bigger DOCKER_RAMFS_SIZE it should still work but you'll be limited in terms of table size.
As far as I can tell 202205 needs ~1.7G and probably some room for runtime. You can extract dockerfs.tar.gz from your image to know the necessary size.
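
A minimal sketch of that measurement, assuming the .swi is a zip archive with dockerfs.tar.gz at its top level (the member path is an assumption, and the function names are made up for illustration). The size of the decompressed tar stream only approximates the on-disk footprint, since it includes tar headers and ignores filesystem overhead.

```shell
#!/bin/sh
# Sketch: estimate how much space an unpacked dockerfs.tar.gz needs.

# tarball_unpacked_bytes FILE
# Prints the size in bytes of the decompressed tar stream of FILE.
tarball_unpacked_bytes() {
    gzip -dc "$1" | wc -c | tr -d ' '
}

# swi_dockerfs_bytes SWI
# Same measurement, reading dockerfs.tar.gz straight out of the .swi
# without writing it to disk (assumes the member sits at the archive root).
swi_dockerfs_bytes() {
    unzip -p "$1" dockerfs.tar.gz | gzip -dc | wc -c | tr -d ' '
}

# bytes_to_mib BYTES
# Rounds a byte count up to whole MiB.
bytes_to_mib() {
    echo $(( ($1 + 1048575) / 1048576 ))
}
```

For example, `bytes_to_mib "$(swi_dockerfs_bytes sonic-aboot-broadcom.swi)"` gives a lower bound for the tmpfs size, to which some room for runtime still has to be added.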

You have to consider using the right release based on their size.

  • any release shipping with both py2/py3 will be heavier.
  • any release shipping with 2 versions of debian will be heavier (e.g buster + bullseye)

USB key and SSD should work if the necessary changes are made in the code.
I would not necessarily recommend using USB since it's only 2.0 and therefore rather slow.
flash_size is not necessarily a problem here since it's just there to compute varlog_size.

The main problem you're facing is that the install destination is implied to be /mnt/flash which is internal storage device.
I do not remember where, but there are still some assumptions that /mnt/flash is where the image should be booted from.
I refactored a bit some time ago to use $target_path to eventually allow booting from /mnt/drive or /mnt/usb down the line.
However IIRC there were still some assumptions to fix in a few places (e.g I still see a few flash: in the boot0 code)

Have you tried the following?

  • downloading the swi on /mnt/drive
  • /mnt/flash/boot-config with SWI=drive:sonic-aboot-broadcom.swi
  • reboot to install sonic in Aboot
  • patch /mnt/flash/boot-config to point on drive
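
The boot-config step above could be scripted roughly as follows. This is a sketch, not a tested Aboot procedure: the function name is hypothetical, and it assumes boot-config is a plain text file of KEY=VALUE lines in which only the `SWI=` line needs to change.

```shell
#!/bin/sh
# set_boot_swi DIR SWI_URL
# Rewrites the SWI= line of DIR/boot-config, keeping any other lines and
# leaving a boot-config.bak copy so the change can be reverted.
set_boot_swi() {
    sbs_dir=$1
    sbs_swi=$2
    sbs_cfg="$sbs_dir/boot-config"
    [ -d "$sbs_dir" ] || { echo "no such directory: $sbs_dir" >&2; return 1; }
    if [ -f "$sbs_cfg" ]; then
        cp "$sbs_cfg" "$sbs_cfg.bak"
        # Drop the existing SWI= line; grep exits non-zero when nothing is left.
        grep -v '^SWI=' "$sbs_cfg.bak" > "$sbs_cfg" || true
    fi
    printf 'SWI=%s\n' "$sbs_swi" >> "$sbs_cfg"
}
```

Usage from a shell with the flash mounted: `set_boot_swi /mnt/flash drive:sonic-aboot-broadcom.swi`.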

Yes, sonic-net/sonic-buildimage@48ba459 is possible for this product.

@hugocollignon

> Both the flash size and ram size are troublesome at this point.
> If your tmpfs is too big, then you'll be left with not enough RAM to run your operating system.
> With a bigger DOCKER_RAMFS_SIZE it should still work but you'll be limited in terms of table size.
> As far as I can tell 202205 needs ~1.7G and probably some room for runtime. You can extract dockerfs.tar.gz from your image to know the necessary size.
>
> You have to consider using the right release based on their size.
>
>   • any release shipping with both py2/py3 will be heavier.
>   • any release shipping with 2 versions of debian will be heavier (e.g buster + bullseye)

Maybe just this once (before the next versions grow further), a bigger tmpfs could fit.
dockerfs.tar.gz is 1.9G, so 2G for the tmpfs and 2G for RAM may fit.

> USB key and SSD should work if the necessary changes are made in the code.
> I would not necessarily recommend using USB since it's only 2.0 and therefore rather slow.

Regarding this extraction, does it happen only on the first boot / installation, or on every boot?
I ask because of the "USB hack": if the extra space is needed only the first time, the extraction could be done on an external device and the files then copied to the internal flash.

> flash_size is not necessarily a problem here since it's just there to compute varlog_size.

flash_size is what toggles docker_inram, so even with a large flash device, as soon as this device is detected with any "real flash size", we will try to extract into the tmpfs, which is defined as 1.5G while we need at least 1.9G. Am I wrong?

> The main problem you're facing is that the install destination is implied to be /mnt/flash which is internal storage device.
> I do not remember where, but there are still some assumptions that /mnt/flash is where the image should be booted from.
> I refactored a bit some time ago to use $target_path to eventually allow booting from /mnt/drive or /mnt/usb down the line.
> However IIRC there were still some assumptions to fix in a few places (e.g I still see a few flash: in the boot0 code)

If I saw correctly, when you wipe the internal flash (partition table), Aboot mounts the USB volume as /mnt/flash (this is my hack for now). I suppose it works for SONiC because the created partition is placed on sdb and its size is much larger than 2G.

> Have you tried the following?
>
>   • downloading the swi on /mnt/drive
>   • /mnt/flash/boot-config with SWI=drive:sonic-aboot-broadcom.swi
>   • reboot to install sonic in Aboot
>   • patch /mnt/flash/boot-config to point on drive

No, I will try.

Many thanks for your quick answer and your support. I am new to SONiC and my first challenge is to install it!

@Staphylo
Contributor

> Maybe just this once (before the next versions grow further), a bigger tmpfs could fit.
> dockerfs.tar.gz is 1.9G, so 2G for the tmpfs and 2G for RAM may fit.

Yes 1.9/2G should do just fine.
One problem I forgot about is the upgrade case.
When you try to upgrade, the image will be stored in tmpfs to be extracted onto the flash (since the flash can't hold both the image and the extracted content).
I had a change that I never merged which removes the extraction process and keeps the SWI as it is, essentially booting the image in secureboot mode, but things have changed and it would unfortunately take me a while to finish.

> Regarding this extraction, does it happen only on the first boot / installation, or on every boot?
> I ask because of the "USB hack": if the extra space is needed only the first time, the extraction could be done on an external device and the files then copied to the internal flash.

Yes the extraction process only happens on firstboot.

> flash_size is what toggles docker_inram, so even with a large flash device, as soon as this device is detected with any "real flash size", we will try to extract into the tmpfs, which is defined as 1.5G while we need at least 1.9G. Am I wrong?

Right, you can override kernel parameters by putting new ones in /mnt/flash/kernel-params

> If I saw correctly, when you wipe the internal flash (partition table), Aboot mounts the USB volume as /mnt/flash (this is my hack for now). I suppose it works for SONiC because the created partition is placed on sdb and its size is much larger than 2G.

You shouldn't wipe the partition table. The storage device should have at least 1 partition available; you might run into problems otherwise.
A big caveat is that SONiC needs an ext4 partition and not a vfat. Unfortunately the product that you use doesn't have the necessary tools in Aboot to do this. In SONiC we added a workaround in the initramfs to reformat the storage device in ext4 while booting if it's formatted in vfat (this logic might not work if booting from a different device than the flash)
It's unfortunate but this product was only designed to boot EOS since it was released before SONiC was a reality.
We ended up adding quite a few workarounds for the first 3 products as they were designed and released before SONiC

@hugocollignon

> Yes 1.9/2G should do just fine.

Will try 🤞

> Yes the extraction process only happens on firstboot.

OK, so if I still have issues I may try using an external device for the first boot and then move to the flash. I understand this will not be a good solution with many devices,
unless the extracted swi is identical across devices, which could allow copying the extracted content directly instead of the swi?

> Right, you can override kernel parameters by putting new ones in /mnt/flash/kernel-params

Thanks for this hint, if it lets me change settings without rebuilding SONiC each time!

> You shouldn't wipe the partition table. The storage device should have at least 1 partition available; you might run into problems otherwise.

Sure, I recreated it (identical to before). I only noticed that wiping it changes the mount point and "fakes" the USB device as the internal flash.

> In SONiC we added a workaround in the initramfs to reformat the storage device in ext4 while booting if it's formatted in vfat (this logic might not work if booting from a different device than the flash)

The flash should indeed be vfat when EOS is installed.

-> I will try the different approaches above and come back with feedback.

@hugocollignon

hugocollignon commented Oct 14, 2022

First observations:

  • inside sonic-aboot-broadcom.swi, we have boot0 that we can edit without needing to rebuild SONiC
  • with flash_size removed AND booted on an external device, installation just works (we don't enter the condition which sets docker_inram=on, so we don't use the too-small tmpfs; the tar extraction finishes properly and the system is then fully operational)
  • after that, we have:
root@sonic:/host# df -h
Filesystem      Size  Used Avail Use% Mounted on
udev            1.9G     0  1.9G   0% /dev
tmpfs           393M   15M  378M   4% /run
root-overlay    110G  2.5G  107G   3% /
/dev/sda1       110G  2.5G  107G   3% /host
/dev/loop1      3.9G  2.8M  3.7G   1% /var/log
tmpfs           2.0G   60K  2.0G   1% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs           4.0M     0  4.0M   0% /sys/fs/cgroup
# docker overlay hidden

root@sonic:/host# lsblk
NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
loop0    7:0    0   436M  0 loop 
loop1    7:1    0     4G  0 loop /var/log
sda      8:0    0 111.8G  0 disk 
`-sda1   8:1    0 111.8G  0 part /host
sdb      8:16   0   1.9G  0 disk 

I think this is enough to say that it will never fit into 2G, which makes replacing the internal flash the first mandatory step on this device.

Second part:

  • internally, there is a USB key-like module plugged directly into a USB header (same as on a desktop motherboard); I am looking for a bigger one or an adapter to USB-A or USB-C to plug a real USB key into it
  • meanwhile, I suppose that speeds on the internal port and the one on the front are similar
  • I installed SONiC that way without any issue

So, if we have to recommend something on this question, for a normal installation + later upgrades, is it more or less right to say:

  • 2x .swi ~= 2.4G
  • 1x dockerfs.tar.gz just extracted ~= 2G
  • 1x previous system running ~= 3G

?

So do we need an absolute minimum of 8G of flash, with 16 to 32G being more comfortable for admins once logs and everything else are counted?

> Have you tried the following?
>
>   • downloading the swi on /mnt/drive
>   • /mnt/flash/boot-config with SWI=drive:sonic-aboot-broadcom.swi
>   • reboot to install sonic in Aboot
>   • patch /mnt/flash/boot-config to point on drive

Not exactly this way, but in the end it should be equivalent to what I did. If the final filesystem takes ~2.5G right after the install, I think we can skip it and go straight to hardware replacement.


Another thing, not directly related:
after the first boot, sshd was not running. I got an "Illegal instruction (core dumped)" when ssh-keygen started. I uninstalled and reinstalled openssh-server & openssh-client, and the issue was cleared.

@Staphylo
Contributor

Glad you were able to tweak things your way!

> inside sonic-aboot-broadcom.swi, we have boot0 that we can edit without needing to rebuild SONiC

Indeed, it's pretty convenient, though you should know that the "primary" image is self-installing.
Once installed, it has extracted .sonic-boot.swi which is used afterwards (you can see that boot-config is rewritten during the installation process)
To properly update boot0 on a given image you pretty much need to do something like

unzip sonic-aboot-broadcom.swi .sonic-boot.swi boot0 
... hack boot0 ...
zip -u .sonic-boot.swi boot0
zip -u sonic-aboot-broadcom.swi .sonic-boot.swi boot0

> So do we need an absolute minimum of 8G of flash, with 16 to 32G being more comfortable for admins once logs and everything else are counted?

Yes 8GB is the bare minimum nowadays, I'd definitely recommend upgrading to 16G minimum if possible.

  • 2 installed master/202205 image 2 x 2.5G knowing that it will only grow bigger over time
  • 1 var-log.ext4 partition 400M or 4G depending if the flash is < 28G or not
  • room for making changes to SONiC such as installing packages, pulling containers, changing files on the fs (stored as an overlayfs upper layer on the flash)
    You can see that 8GB might put you in a precarious situation pretty quickly if you're not careful.
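
The arithmetic behind that recommendation can be sketched as below. The figures (about 2.5G per installed image, a 400M or 4G var-log partition switching at 28G) come from this comment; the function names and the use of MiB/GiB units are illustrative assumptions.

```shell
#!/bin/sh
# flash_budget_mib FLASH_GIB [IMAGE_MIB]
# Space taken by two installed images plus the var-log partition, in MiB.
flash_budget_mib() {
    fb_flash_gib=$1
    fb_image_mib=${2:-2560}     # ~2.5G per installed image
    if [ "$fb_flash_gib" -lt 28 ]; then
        fb_varlog_mib=400       # small flash: 400M var-log.ext4
    else
        fb_varlog_mib=4096      # 28G and above: 4G var-log.ext4
    fi
    echo $(( 2 * fb_image_mib + fb_varlog_mib ))
}

# flash_headroom_mib FLASH_GIB [IMAGE_MIB]
# What is left for packages, pulled containers and overlayfs changes.
flash_headroom_mib() {
    fh_budget=$(flash_budget_mib "$@")
    echo $(( $1 * 1024 - fh_budget ))
}
```

With these numbers, `flash_headroom_mib 8` leaves 2672 MiB and `flash_headroom_mib 16` leaves 10864 MiB, before counting filesystem overhead or future image growth.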

> after the first boot, sshd was not running. I got an "Illegal instruction (core dumped)" when ssh-keygen started

Not quite sure why that is. It has happened when the flash is full but seems like it's not the case here.
Did you check /var/log/syslog to see if there was anything interesting there?

@hugocollignon

> To properly update boot0 on a given image you pretty much need to do something like

OK, understood. I had no issue when editing only "the first" boot0, maybe because flash_size is not used for anything else. But I get your point and will do it that way next time.

> I'd definitely recommend upgrading to 16G minimum if possible.

I found 32G USB keys cheaper than 16G ones. In any case, the cost of flash storage is not really an issue: always less than 20-30€ now (for 32G).

> Not quite sure why that is. It has happened when the flash is full but seems like it's not the case here.

It happened with a successful installation. I started from scratch with new USB keys and boot0 edited as you described, and I am coming back with these logs (I think this is unrelated to this issue, by the way, so I can open a separate one for this problem if you prefer).

@hugocollignon

Aboot 2.1.0-3037058


Press Control-C now to enter Aboot shell
Booting flash:sonic-aboot-broadcom.swi
6.86: Cleaning flash content /mnt/flash
6.87: Generating boot-config, machine.conf and cmdline
7.01: Installing image under /mnt/flash/image-202205.161660-cfc9af71e
7.01: Moving swi to a tmpfs
39.56: Extracting swi content
82.15: Extracting platform.tar.gz
82.23: Extracting dockerfs.tar.gz from swi
133.68: Unpacking dockerfs.tar.gz delayed to initrd because /mnt/flash is vfat or docker_inram is on
133.68: Remove installer
df: /tmp/tmp.N0Qjkn/sonic-aboot-broadcom.swi: can't find mount point
135.15: Next reboot will use flash:image-202205.161660-cfc9af71e/.sonic-boot.swi
135.81: Kexecing[  135.819258] Starting new kernel
...
2048+0 records in
2048+0 records out
Checking that no-one is using this disk right now ... OK

Disk /dev/sda: 28.65 GiB, 30765219840 bytes, 60088320 sectors
Disk model:  SanDisk 3.2Gen1
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes

>>> Created a new DOS disklabel with disk identifier 0x96c0375c.
/dev/sda1: Created a new partition 1 of type 'Linux' and of size 28.7 GiB.
/dev/sda2: Done.

New situation:
Disklabel type: dos
Disk identifier: 0x96c0375c

Device     Boot Start      End  Sectors  Size Id Type
/dev/sda1        2048 60088319 60086272 28.7G 83 Linux

The partition table has been altered.
Calling ioctl() to re-read partition table.
Syncing disks.
mke2fs 1.46.2 (28-Feb-2021)
Creating filesystem with 7510784 4k blocks and 1880480 inodes
Filesystem UUID: 87d1d680-d18b-4c21-8478-6bb668290908
Superblock backups stored on blocks: 
        32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208, 
        4096000

Allocating group tables: done                            
Writing inode tables: done                            
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done   

tune2fs 1.46.2 (28-Feb-2021)
Setting reserved blocks percentage to 0% (0 blocks)
Setting reserved blocks count to 0
[  282.290640] rc.local[502]: + cat /etc/sonic/sonic_version.yml
[  282.384865] rc.local[503]: + grep build_version
[  282.451495] rc.local[506]: + sed -e s/build_version: //g;s/'//g
[  282.532336] rc.local[498]: + SONIC_VERSION=202205.161660-cfc9af71e
[  282.618416] rc.local[498]: + FIRST_BOOT_FILE=/host/image-202205.161660-cfc9af71e/platform/firsttime
[  282.729908] rc.local[498]: + SONIC_CONFIG_DIR=/host/image-202205.161660-cfc9af71e/sonic-config
[  282.839213] rc.local[498]: + SONIC_ENV_FILE=/host/image-202205.161660-cfc9af71e/sonic-config/sonic-environment
[  282.969356] rc.local[498]: + [ -d /host/image-202205.161660-cfc9af71e/sonic-config -a -f /host/image-202205.161660-cfc9af71e/sonic-config/sonic-environment ]
[  283.145919] rc.local[498]: + logger SONiC version 202205.161660-cfc9af71e starting up...
[  283.255577] rc.local[498]: + grub_installation_needed=
[  283.321855] rc.local[498]: + [ ! -e /host/machine.conf ]
[  283.417297] rc.local[498]: + . /host/machine.conf
[  283.480808] rc.local[498]: + aboot_version=2.1.0-3037058
[  283.555465] rc.local[498]: + aboot_vendor=arista
[  283.614496] rc.local[498]: + aboot_platform=x86_64-arista_7050_qx32
[  283.689836] rc.local[498]: + aboot_machine=arista_7050_qx32
[  283.761824] rc.local[498]: + aboot_arch=x86_64
[  283.821850] rc.local[498]: + aboot_build_date=2016-03-14T19:11:31.000000000
[FAILED] Failed to start OpenBSD Secure Shell server.
[  283.822055] rc.local[498]: + program_console_speed
[  284.063577] kdump-tools[489]: Starting kdump-tools:
[  284.132833] rc.local[549]: + grep -Eo console=ttyS[0-9]+,[0-9]+
[  284.211002] kdump-tools[525]: no crashkernel= parameter in the kernel cmdline ...
[  284.303330] kdump-tools[543]:  failed!
[FAILED] Failed to start OpenBSD Secure Shell server.
[  284.365378] rc.local[548]: + cat /proc/cmdline
[  284.518882] rc.local[550]: + cut -d , -f2
[  284.580631] rc.local[498]: + speed=
[  284.633924] rc.local[498]: + [ -z  ]
[  284.686692] rc.local[498]: + CONSOLE_SPEED=9600
[FAILED] Failed to start OpenBSD Secure Shell server.
[  284.753450] rc.local[553]: + grep keep-baud
[  284.907216] rc.local[552]: + grep agetty /lib/systemd/system/serial-getty@.service
[  285.013098] rc.local[553]: ExecStart=-/sbin/agetty -o '-p -- \\u' --keep-baud 115200,57600,38400,9600 %I $TERM
[  285.143036] rc.local[498]: + [ 0 = 0 ]
[  285.198484] rc.local[498]: + sed -i s|\-\-keep\-baud .* %I| 9600 %I|g /lib/systemd/system/serial-getty@.service
[  285.326040] rc.local[498]: + systemctl daemon-reload
[FAILED] Failed to start OpenBSD Secure Shell server.
[  285.386376] rc.local[498]: + [ -f /host/image-202205.161660-cfc9af71e/platform/firsttime ]
[  285.590049] rc.local[498]: + echo First boot detected. Performing first boot tasks...
[  285.685844] rc.local[498]: First boot detected. Performing first boot tasks...
[FAILED] Failed to start OpenBSD Secure Shell server.
[  285.791573] rc.local[498]: + [ -n x86_64-arista_7050_qx32 ]
[  285.954394] rc.local[498]: + platform=x86_64-arista_7050_qx32
[  286.029845] rc.local[498]: + [ -d /host/old_config ]
[  286.090484] rc.local[498]: + [ -f /host/minigraph.xml ]
[  286.165878] rc.local[498]: + [ -n  ]
[  286.214031] rc.local[498]: + touch /tmp/pending_config_initialization
[  286.296110] rc.local[498]: + touch /tmp/notify_firstboot_to_platform
[FAILED] Failed to start OpenBSD Secure Shell server.
[  286.381020] rc.local[498]: + [ ! -d /host/reboot-cause/platform ]
[  286.562032] rc.local[498]: + mkdir -p /host/reboot-cause/platform
[  286.638123] rc.local[498]: + [ -d /host/image-202205.161660-cfc9af71e/platform/x86_64-arista_7050_qx32 ]
[  286.765837] rc.local[498]: + sync
[FAILED] Failed to start OpenBSD Secure Shell server.
[  286.809883] rc.local[498]: + [ -n  ]
[  286.950736] rc.local[498]: + mkdir -p /var/platform
[  287.010187] rc.local[498]: + [ -f /etc/default/kdump-tools ]
[  287.085920] rc.local[498]: + sed -i -e s/__PLATFORM__/x86_64-arista_7050_qx32/g /etc/default/kdump-tools
[  287.209923] rc.local[498]: + firsttime_exit
[  287.266033] rc.local[498]: + rm -rf /host/image-202205.161660-cfc9af71e/platform/firsttime
[  287.369937] rc.local[498]: + exit 0

Debian GNU/Linux 11 sonic ttyS0

sonic login: [  297.366275] arista: waiting for switch chip
[  297.617146] arista: switch chip is ready
[  299.666256] arista: yielding...

And logs

admin@sonic:~$ sudo grep ssh /var/log/syslog
Oct 20 16:48:00.592251 sonic INFO kernel: [   18.951205] traps: ssh-keygen[451] trap invalid opcode ip:7f0e33210473 sp:7ffc9b196180 error:0 in libsymcrypt.so.102.0.0[7f0e331db000+4d000]
Oct 20 16:48:01.113772 sonic INFO host-ssh-keygen.sh[435]: /usr/local/bin/host-ssh-keygen.sh: line 8:   451 Illegal instruction     (core dumped) /usr/bin/ssh-keygen -t rsa -N '' -f /etc/ssh/ssh_host_rsy
Oct 20 16:48:01.480695 sonic INFO kernel: [   20.039709] traps: ssh-keygen[534] trap invalid opcode ip:7efeabcf5473 sp:7ffd2af9efb0 error:0 in libsymcrypt.so.102.0.0[7efeabcc0000+4d000]
Oct 20 16:48:01.896650 sonic INFO kernel: [   20.453899] traps: ssh-keygen[574] trap invalid opcode ip:7fc31e9f0473 sp:7ffdb5f62dd0 error:0 in libsymcrypt.so.102.0.0[7fc31e9bb000+4d000]
Oct 20 16:48:02.051369 sonic NOTICE systemd[1]: ssh.service: Control process exited, code=exited, status=132/n/a
Oct 20 16:48:02.201064 sonic WARNING systemd[1]: ssh.service: Failed with result 'exit-code'.
Oct 20 16:48:02.201446 sonic INFO host-ssh-keygen.sh[532]: /usr/local/bin/host-ssh-keygen.sh: line 8:   534 Illegal instruction     (core dumped) /usr/bin/ssh-keygen -t rsa -N '' -f /etc/ssh/ssh_host_rsy
Oct 20 16:48:02.309352 sonic INFO host-ssh-keygen.sh[570]: /usr/local/bin/host-ssh-keygen.sh: line 8:   574 Illegal instruction     (core dumped) /usr/bin/ssh-keygen -t rsa -N '' -f /etc/ssh/ssh_host_rsy
Oct 20 16:48:02.310760 sonic INFO systemd[1]: ssh.service: Scheduled restart job, restart counter is at 1.
Oct 20 16:48:02.311000 sonic NOTICE systemd[1]: ssh.service: Control process exited, code=exited, status=132/n/a
Oct 20 16:48:02.311079 sonic WARNING systemd[1]: ssh.service: Failed with result 'exit-code'.
Oct 20 16:48:02.311238 sonic INFO systemd[1]: ssh.service: Scheduled restart job, restart counter is at 2.
Oct 20 16:48:02.311546 sonic NOTICE systemd[1]: ssh.service: Control process exited, code=exited, status=132/n/a
Oct 20 16:48:02.311621 sonic WARNING systemd[1]: ssh.service: Failed with result 'exit-code'.
Oct 20 16:48:02.311764 sonic INFO systemd[1]: ssh.service: Scheduled restart job, restart counter is at 3.
Oct 20 16:48:02.364672 sonic INFO kernel: [   20.922825] traps: ssh-keygen[599] trap invalid opcode ip:7f8006758473 sp:7ffea09aef10 error:0 in libsymcrypt.so.102.0.0[7f8006723000+4d000]
Oct 20 16:48:02.495796 sonic INFO host-ssh-keygen.sh[597]: /usr/local/bin/host-ssh-keygen.sh: line 8:   599 Illegal instruction     (core dumped) /usr/bin/ssh-keygen -t rsa -N '' -f /etc/ssh/ssh_host_rsy
Oct 20 16:48:02.500326 sonic NOTICE systemd[1]: ssh.service: Control process exited, code=exited, status=132/n/a
Oct 20 16:48:02.500476 sonic WARNING systemd[1]: ssh.service: Failed with result 'exit-code'.
Oct 20 16:48:02.943923 sonic INFO systemd[1]: ssh.service: Scheduled restart job, restart counter is at 4.
Oct 20 16:48:03.119305 sonic INFO host-ssh-keygen.sh[621]: /usr/local/bin/host-ssh-keygen.sh: line 8:   625 Illegal instruction     (core dumped) /usr/bin/ssh-keygen -t rsa -N '' -f /etc/ssh/ssh_host_rsy
Oct 20 16:48:03.119462 sonic NOTICE systemd[1]: ssh.service: Control process exited, code=exited, status=132/n/a
Oct 20 16:48:03.119610 sonic WARNING systemd[1]: ssh.service: Failed with result 'exit-code'.
Oct 20 16:48:03.570567 sonic INFO systemd[1]: ssh.service: Scheduled restart job, restart counter is at 5.
Oct 20 16:48:03.602339 sonic INFO kernel: [   22.160651] traps: ssh-keygen[650] trap invalid opcode ip:7fef6fab5473 sp:7ffec53519a0 error:0 in libsymcrypt.so.102.0.0[7fef6fa80000+4d000]
Oct 20 16:48:03.697029 sonic INFO host-ssh-keygen.sh[648]: /usr/local/bin/host-ssh-keygen.sh: line 8:   650 Illegal instruction     (core dumped) /usr/bin/ssh-keygen -t rsa -N '' -f /etc/ssh/ssh_host_rsy
Oct 20 16:48:03.699281 sonic NOTICE systemd[1]: ssh.service: Control process exited, code=exited, status=132/n/a
Oct 20 16:48:03.699426 sonic WARNING systemd[1]: ssh.service: Failed with result 'exit-code'.
Oct 20 16:48:04.145505 sonic INFO systemd[1]: ssh.service: Scheduled restart job, restart counter is at 6.
Oct 20 16:48:04.145763 sonic WARNING systemd[1]: ssh.service: Start request repeated too quickly.
Oct 20 16:48:04.145841 sonic WARNING systemd[1]: ssh.service: Failed with result 'exit-code'.
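
To check whether every crash in a log like the one above hits the same spot, the kernel "traps" lines can be grouped by process, library and offset inside the library. This is only a rough sketch: the regular expression is written against the line format shown in this log and may need adjusting elsewhere, and `summarize_traps` is a name made up for this example.

```python
import re
from collections import Counter

# Written against kernel lines of the form seen in the log above, e.g.
#   traps: ssh-keygen[451] trap invalid opcode ip:... sp:... error:0 in libsymcrypt.so.102.0.0[base+size]
TRAP_RE = re.compile(
    r"traps: (?P<proc>[^\[\s]+)\[(?P<pid>\d+)\] trap invalid opcode "
    r"ip:(?P<ip>[0-9a-f]+) .* in (?P<lib>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+"
)


def summarize_traps(lines):
    """Count invalid-opcode traps per (process, library, offset in library)."""
    counts = Counter()
    for line in lines:
        match = TRAP_RE.search(line)
        if match:
            # Offset of the faulting instruction relative to the mapping base.
            offset = int(match["ip"], 16) - int(match["base"], 16)
            counts[(match["proc"], match["lib"], hex(offset))] += 1
    return counts
```

Feeding it the output of `sudo grep traps /var/log/syslog` gives one counter entry per distinct crash site. For the lines above the offset works out to 0x35473 in every case, so the crashes all occur at the same instruction in libsymcrypt.so.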

Resolution

admin@sonic:~$ sudo systemctl status sshd
● ssh.service - OpenBSD Secure Shell server
     Loaded: loaded (/lib/systemd/system/ssh.service; enabled; vendor preset: e>
    Drop-In: /etc/systemd/system/ssh.service.d
             └─override.conf
     Active: failed (Result: exit-code) since Thu 2022-10-20 16:48:04 UTC; 8min>
       Docs: man:sshd(8)
             man:sshd_config(5)

Oct 20 16:48:04 sonic systemd[1]: ssh.service: Scheduled restart job, restart c>
Oct 20 16:48:04 sonic systemd[1]: Stopped OpenBSD Secure Shell server.
Oct 20 16:48:04 sonic systemd[1]: ssh.service: Start request repeated too quick>
Oct 20 16:48:04 sonic systemd[1]: ssh.service: Failed with result 'exit-code'.
Oct 20 16:48:04 sonic systemd[1]: Failed to start OpenBSD Secure Shell server.

admin@sonic:~$ ssh-keygen
Illegal instruction (core dumped)

admin@sonic:~$ sudo apt remove openssh-client openssh-server                                                                                                                                               
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following packages were automatically installed and are no longer required:
  libcbor0 libfido2-1 libprocps8 procps runit-helper
Use 'sudo apt autoremove' to remove them.
The following packages will be REMOVED:
  openssh-client openssh-server openssh-sftp-server
0 upgraded, 0 newly installed, 3 to remove and 0 not upgraded.
After this operation, 21.7 MB disk space will be freed.
Do you want to continue? [Y/n] 
(Reading database ... 36896 files and directories currently installed.)
Removing openssh-server (1:8.4p1-5+deb11u1+fips) ...
Removing openssh-sftp-server (1:8.4p1-5+deb11u1+fips) ...
Removing openssh-client (1:8.4p1-5+deb11u1+fips) ...

admin@sonic:~$ sudo apt update
Get:1 http://debian-archive.trafficmanager.net/debian bullseye InRelease [116 kB]
Get:2 http://debian-archive.trafficmanager.net/debian-security bullseye-security InRelease [48.4 kB]
Get:3 http://debian-archive.trafficmanager.net/debian bullseye-backports InRelease [49.0 kB]
Get:4 https://download.docker.com/linux/debian bullseye InRelease [43.3 kB]    
Get:5 http://debian-archive.trafficmanager.net/debian bullseye/contrib Sources [51.4 kB]
Get:6 http://debian-archive.trafficmanager.net/debian bullseye/non-free Sources [98.1 kB]
Get:7 http://debian-archive.trafficmanager.net/debian bullseye/main Sources [11.4 MB]
Get:8 https://download.docker.com/linux/debian bullseye/stable amd64 Packages [15.2 kB]
Get:9 http://debian-archive.trafficmanager.net/debian bullseye/contrib amd64 Packages [60.9 kB]
Get:10 http://debian-archive.trafficmanager.net/debian bullseye/main amd64 Packages [11.1 MB]
Get:11 http://debian-archive.trafficmanager.net/debian bullseye/non-free amd64 Packages [122 kB]
Get:12 http://debian-archive.trafficmanager.net/debian-security bullseye-security/main Sources [262 kB]
Get:13 http://debian-archive.trafficmanager.net/debian-security bullseye-security/non-free Sources [558 B]
Get:14 http://debian-archive.trafficmanager.net/debian-security bullseye-security/non-free amd64 Packages [457 B]
Get:15 http://debian-archive.trafficmanager.net/debian-security bullseye-security/main amd64 Packages [240 kB]
Get:16 http://debian-archive.trafficmanager.net/debian bullseye-backports/contrib amd64 Packages [4,400 B]
Get:17 http://debian-archive.trafficmanager.net/debian bullseye-backports/main amd64 Packages [349 kB]
Get:18 http://debian-archive.trafficmanager.net/debian bullseye-backports/non-free amd64 Packages [11.5 kB]
Fetched 24.0 MB in 6s (4,077 kB/s)                    
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
4 packages can be upgraded. Run 'apt list --upgradable' to see them.

admin@sonic:~$ sudo apt install openssh-server
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following additional packages will be installed:
  openssh-client openssh-sftp-server
Suggested packages:
  keychain libpam-ssh monkeysphere ssh-askpass molly-guard ufw
Recommended packages:
  xauth default-logind | logind | libpam-systemd ncurses-term
The following NEW packages will be installed:
  openssh-client openssh-server openssh-sftp-server
0 upgraded, 3 newly installed, 0 to remove and 4 not upgraded.
Need to get 1,366 kB of archives.
After this operation, 6,146 kB of additional disk space will be used.
Do you want to continue? [Y/n] 
Get:1 http://debian-archive.trafficmanager.net/debian bullseye/main amd64 openssh-client amd64 1:8.4p1-5+deb11u1 [929 kB]
Get:2 http://debian-archive.trafficmanager.net/debian bullseye/main amd64 openssh-sftp-server amd64 1:8.4p1-5+deb11u1 [52.4 kB]
Get:3 http://debian-archive.trafficmanager.net/debian bullseye/main amd64 openssh-server amd64 1:8.4p1-5+deb11u1 [385 kB]
Fetched 1,366 kB in 0s (4,082 kB/s)      
debconf: delaying package configuration, since apt-utils is not installed
Selecting previously unselected package openssh-client.
(Reading database ... 36820 files and directories currently installed.)
Preparing to unpack .../openssh-client_1%3a8.4p1-5+deb11u1_amd64.deb ...
Unpacking openssh-client (1:8.4p1-5+deb11u1) ...
Selecting previously unselected package openssh-sftp-server.
Preparing to unpack .../openssh-sftp-server_1%3a8.4p1-5+deb11u1_amd64.deb ...
Unpacking openssh-sftp-server (1:8.4p1-5+deb11u1) ...
Selecting previously unselected package openssh-server.
Preparing to unpack .../openssh-server_1%3a8.4p1-5+deb11u1_amd64.deb ...
Unpacking openssh-server (1:8.4p1-5+deb11u1) ...
Setting up openssh-client (1:8.4p1-5+deb11u1) ...
Setting up openssh-sftp-server (1:8.4p1-5+deb11u1) ...
Setting up openssh-server (1:8.4p1-5+deb11u1) ...
debconf: unable to initialize frontend: Dialog
debconf: (No usable dialog-like program is installed, so the dialog based frontend cannot be used. at /usr/share/perl5/Debconf/FrontEnd/Dialog.pm line 78.)
debconf: falling back to frontend: Readline
Creating SSH2 RSA key; this may take some time ...
3072 SHA256:iq1lfloSc1yAjpCxMayAd7uyHO+qh4ByvESgb+Vkc/w root@sonic (RSA)
Creating SSH2 ECDSA key; this may take some time ...
256 SHA256:Pcq/dT7Vt8ckxKHPKoDIYKKoqgBz0tv64JVHHBVurqI root@sonic (ECDSA)
Creating SSH2 ED25519 key; this may take some time ...
256 SHA256:oDLdLM8O5wXUpEPttXAKdSVj71zsTVYaIKn7m0t6C8s root@sonic (ED25519)
rescue-ssh.target is a disabled or a static unit, not starting it.

Oct 20 16:57:45.676268 System is ready

admin@sonic:~$ sudo systemctl status sshd
● ssh.service - OpenBSD Secure Shell server
     Loaded: loaded (/lib/systemd/system/ssh.service; enabled; vendor preset: e>
    Drop-In: /etc/systemd/system/ssh.service.d
             └─override.conf
     Active: active (running) since Thu 2022-10-20 16:57:45 UTC; 35s ago
       Docs: man:sshd(8)
             man:sshd_config(5)
    Process: 9438 ExecStartPre=/usr/local/bin/host-ssh-keygen.sh (code=exited, >
    Process: 9439 ExecStartPre=/usr/sbin/sshd -t (code=exited, status=0/SUCCESS)
   Main PID: 9440 (sshd)
      Tasks: 1 (limit: 4595)
     Memory: 1.0M
     CGroup: /system.slice/ssh.service
             └─9440 sshd: /usr/sbin/sshd -D [listener] 0 of 10-100 startups

Oct 20 16:57:45 sonic systemd[1]: Starting OpenBSD Secure Shell server...
Oct 20 16:57:45 sonic sshd[9440]: Server listening on :: port 22.
Oct 20 16:57:45 sonic sshd[9440]: Server listening on 0.0.0.0 port 22.
Oct 20 16:57:45 sonic systemd[1]: Started OpenBSD Secure Shell server.

@Staphylo
Contributor

I was able to reproduce the issue. It seems to crash in libsymcrypt.so on pshufb %xmm2,%xmm0.
I'm not sure why it fails on this SSSE3 instruction, as it should be supported by the CPU.

SONiC recently added support for FIPS certification by switching from openssl to symcrypt-openssl.
By reinstalling openssh you're bypassing symcrypt, which seems buggy on this platform.
The openssh version changes from 1:8.4p1-5+deb11u1+fips to 1:8.4p1-5+deb11u1.
The fips version pulls in symcryptengine.so and libsymcrypt.so.
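
One quick sanity check is to confirm what the CPU actually advertises. pshufb on XMM registers belongs to SSSE3, which Linux lists as `ssse3` in /proc/cpuinfo (plain SSE3 is listed as `pni`). The sketch below only parses the flags line; the function names are made up for this example, and a present flag does not explain the crash by itself.

```python
def cpu_flags(cpuinfo_text):
    """Return the feature flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def missing_flags(cpuinfo_text, required=("ssse3",)):
    """List the required flags that the CPU does not advertise."""
    flags = cpu_flags(cpuinfo_text)
    return [flag for flag in required if flag not in flags]
```

On the switch this could be called as `missing_flags(open("/proc/cpuinfo").read())`. An empty list means the flag is advertised, and the cause has to be looked for elsewhere.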

@hugocollignon

Hello

If someone has the same issue, I tried to summarize everything we discussed in this issue here: https://github.com/hugocollignon/SONiC/blob/main/SONiC-Arista-7050QX-32.md

If the community is happy with these hints, maybe we could add a page to the wiki https://github.com/sonic-net/SONiC/wiki and link to it from https://github.com/sonic-net/SONiC/wiki/Supported-Devices-and-Platforms?

@etec-masterofsynapse

etec-masterofsynapse commented May 14, 2024

Hi all,

sorry to necro this issue, it was the most fitting one I could find in this repo.

I want to run SONiC on an Arista 7050QX-32S. Its DOM is too small, so I bought some SATA M.2s to put in my switches and use as new boot drive.

I followed these steps:

Have you tried the following?

  • downloading the swi on /mnt/drive
  • /mnt/flash/boot-config with SWI=drive:sonic-aboot-broadcom.swi
  • reboot to install sonic in Aboot
  • patch /mnt/flash/boot-config to point on drive

and even modified the boot0 included in the sonic-aboot-barefoot.swi for my switches to change the remaining flash: references to drive:, but I still can't get it to work.

This is what I did, and the error I get once the initial extraction has finished and it tries to boot SONiC:

Aboot# cat boot-config
SWI=drive:sonic-aboot-barefoot-epskcustomondrive.swi
Aboot# cp /mnt/usb1/sonic-aboot-barefoot
sonic-aboot-barefoot-epskcustomondrive.swi
sonic-aboot-barefoot.swi
Aboot# cp /mnt/usb1/sonic-aboot-barefoot-epskcustomondrive.swi /mnt/drive/
Aboot# ls /mnt/drive
sonic-aboot-barefoot-epskcustomondrive.swi
Aboot# reboot
Requesting system reboot
[  186.821995] Restarting system.
PM[0xc0]: 0x4058043d


Aboot 4.0.7-13599834


Press Control-C now to enter Aboot shell
Booting drive:sonic-aboot-barefoot-epskcustomondrive.swi
6.61: Cleaning flash content /mnt/drive
6.61: Generating boot-config, machine.conf and cmdline
6.73: Installing image under /mnt/drive/image-master.371261-52f6dd65a
6.73: Moving swi to a tmpfs
10.17: Extracting swi content
12.21: Extracting platform.tar.gz
12.25: Extracting dockerfs.tar.gz from swi
62.80: Remove installer
65.11: Next reboot will use drive:image-master.371261-52f6dd65a/.sonic-boot.swi
65.86: Kexecing...
[   65.867973] Starting new kernel
[    3.819921] sd 2:0:0:0: [sdb] No Caching mode page found
[    3.883491] sd 2:0:0:0: [sdb] Assuming drive cache: write through
ALERT! /host/image-master.371261-52f6dd65a/fs.squashfs does not exist.  Dropping to a shell!


BusyBox v1.30.1 (Debian 1:1.30.1-6+b3) built-in shell (ash)
Enter 'help' for a list of built-in commands.

(initramfs)

Am I doing something wrong, or are there still pointers to flash: somewhere that I didn't check? I ask because /host points to /mnt/flash, not /mnt/drive.
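
To make the flash:/drive: distinction concrete, here is a small sketch that reads a boot-config like the one above and works out which device the SWI entry refers to. The prefix-to-mount-point mapping is an assumption taken from the paths seen in this thread (/mnt/flash and /mnt/drive), and the helper names are invented for illustration. It does not reproduce what boot0 or the initramfs actually do.

```python
# Assumed mapping from Aboot device prefix to mount point (taken from this thread).
MOUNT_POINTS = {"flash": "/mnt/flash", "drive": "/mnt/drive"}


def parse_boot_config(text):
    """Parse KEY=VALUE lines of a boot-config file into a dict."""
    config = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition("=")
        if sep:
            config[key.strip()] = value.strip()
    return config


def swi_location(config):
    """Return (device, path_on_device, absolute_path) for the SWI entry, or None."""
    swi = config.get("SWI")
    if swi is None:
        return None
    device, sep, path = swi.partition(":")
    if not sep:
        # No device prefix: leave the interpretation to the caller.
        return (None, swi, None)
    mount = MOUNT_POINTS.get(device)
    absolute = f"{mount}/{path}" if mount else None
    return (device, path, absolute)
```

Comparing the device returned here with where /host ends up mounted in the initramfs is one way to spot a leftover flash: reference.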

@Staphylo
Contributor

Booting from /mnt/drive is currently not supported, so it's not really expected to work.
This is also not something in our roadmap for the short term.

Simple boot should be achievable in reasonable time by poking at the boot0 and the initramfs hooks.
However getting this feature in a mergeable state without breaking backward compatibility is far from trivial.

Also note that you are using a barefoot image on a broadcom based product.

@etec-masterofsynapse

etec-masterofsynapse commented May 14, 2024

@Staphylo Thanks for the feedback.

So my only remedy for this switch is to buy a larger DOM or get a DOM to USB adapter to use a USB drive within the switch?

Also, regarding the barefoot vs broadcom thing, I used the official download link for my switch model from here: https://github.com/sonic-net/SONiC/blob/sonic_image_md_update/supported_devices_platforms.md which points to https://artprodcus3.artifacts.visualstudio.com/Af91412a5-a906-4990-9d7c-f697b81fc04d/be1b070f-be15-4154-aade-b1d3bfb17054/_apis/artifact/cGlwZWxpbmVhcnRpZmFjdDovL21zc29uaWMvcHJvamVjdElkL2JlMWIwNzBmLWJlMTUtNDE1NC1hYWRlLWIxZDNiZmIxNzA1NC9idWlsZElkLzM3MTI2MS9hcnRpZmFjdE5hbWUvc29uaWMtYnVpbGRpbWFnZS5iYXJlZm9vdA2/content?format=file&subpath=/target/sonic-aboot-barefoot.swi even though it says SONiC-Aboot-Broadcom.

Maybe I should have used https://sonic-net.github.io/SONiC/Supported-Devices-and-Platforms.html instead, because there it's correct.
EDIT: Even though the link says the right thing, you get an

{"$id":"1","innerException":null,"message":"The requested build 51255 could not be found.","typeName":"System.ArgumentException, mscorlib","typeKey":"ArgumentException","errorCode":0,"eventId":0}

error when using it.

See also #1664

@Staphylo
Contributor

Yes, at this point I believe your options are either to replace the flash device or to spend cycles trying to add support for /mnt/drive in SONiC.

I am aware of the issue with the image links being outdated.
I actually fixed the code to properly deal with broadcom/barefoot and merged the fix.
However as reported here #1473 months ago the website is stale and hasn't been updated since.
You should probably download images directly from the build website which is https://sonic-build.azurewebsites.net/ui/sonic/pipelines

@etec-masterofsynapse

I am aware of the issue with the image links being outdated. I actually fixed the code to properly deal with broadcom/barefoot and merged the fix. However as reported here #1473 months ago the website is stale and hasn't been updated since. You should probably download images directly from the build website which is https://sonic-build.azurewebsites.net/ui/sonic/pipelines

I am not too familiar with Azure DevOps, so a follow-up question:
When I go to the website you linked, I get a huge list of pipelines for the different platforms and branch names.
Then, going to https://dev.azure.com/mssonic/build/_build?definitionId=138, which is the link for broadcom/master (is that the right branch? Which branch is one supposed to use to get a stable and up-to-date SONiC?), I once again get a long list, this time of the different runs of that pipeline.
I am not sure where I am supposed to find the completed .swi file from that build. I would assume it's under Artifacts on the left, but going there, I only get a page telling me to connect to a feed.

@quxyzzy

quxyzzy commented Jun 25, 2024

Debian GNU/Linux 11 sonic ttyS0

sonic login: [  297.366275] arista: waiting for switch chip
[  297.617146] arista: switch chip is ready
[  299.666256] arista: yielding...

@hugocollignon don't suppose you remember which branch you installed from to get this far?
mine never gives me the switch chip is ready line - it doesn't appear to pick up the platform correctly - and then panics and reboots after 5-15 mins.

@hugocollignon

hugocollignon commented Jun 25, 2024

Good question, let's try to find an answer.
When I did those tests, I wrote this KB: https://github.com/hugocollignon/SONiC/blob/main/SONiC-Arista-7050QX-32.md

Inside, I found "At this date (20221110)".
Following the path described there:

Follow Platform: broadcom > BranchName: your choice > Builds: Build History > Result: last succeeded build > Artifacts: Artifacts > Name: sonic-buildimage.broadcom > Name: target/sonic-aboot-broadcom.swi
broadcom > master > Build History > Result: build 11/2022 or before

you should be able to find something that works.

On the same page, there is a copy/paste of the first boot: 6.76: Installing image under /mnt/flash/image-202205.161660-cfc9af71e

I didn't find exactly the same artifact, but this should give you a good starting point for a version that works. From there, I'll let you find the latest working one.
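
The image directory name can help narrow the search. Judging only from the names that appear in this thread (image-HEAD.247-7bc8f129, image-master.371261-52f6dd65a, image-202205.161660-cfc9af71e), the pattern looks like image-<branch>.<build number>-<commit prefix>. That reading is an assumption, not a documented format, and `parse_image_name` is a helper invented for this sketch.

```python
import re

# Assumed layout: image-<branch>.<build number>-<abbreviated commit hash>
IMAGE_RE = re.compile(
    r"^image-(?P<branch>.+)\.(?P<build>\d+)-(?P<commit>[0-9a-f]+)$"
)


def parse_image_name(name):
    """Split an image directory name into branch, build number and commit prefix.

    Returns None when the name does not follow the assumed pattern.
    """
    match = IMAGE_RE.match(name)
    if match is None:
        return None
    return {
        "branch": match["branch"],
        "build": int(match["build"]),
        "commit": match["commit"],
    }
```

For the image above this yields branch 202205 and build 161660, which can then be looked for in the pipeline's build history.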

To answer

@hugocollignon don't suppose you remember which branch you installed from to get this far?

It was 202205 IIRC (which seems consistent with the image name)

Hope that helps!

@quxyzzy

quxyzzy commented Jul 5, 2024

@hugocollignon i'm getting the same issue with the 202205 branch. if i try and run arista platform it gives an error about prefdl:

root@sonic:~# arista platform
ERROR: Could not read prefdl from SMBus PIIX4 adapter port 1 at 0b20
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/arista/core/prefdl.py", line 216, in getPrefdlCls
    return cls.MAP[version]
KeyError: b'\x00\x00\x00\x00'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/arista/core/platform.py", line 49, in readI2cPrefdlEeprom
    pfdl = eeprom.readPrefdl()
  File "/usr/lib/python3/dist-packages/arista/components/eeprom.py", line 54, in readPrefdl
    return Prefdl.fromBinFile(self.driver.eepromPath())
  File "/usr/lib/python3/dist-packages/arista/core/prefdl.py", line 226, in fromBinFile
    return cls.getPrefdlCls(version)(f=f, version=version)
  File "/usr/lib/python3/dist-packages/arista/core/prefdl.py", line 218, in getPrefdlCls
    raise UnknownPrefdlVersion("unknown prefdl verison %s" % version)
arista.core.prefdl.UnknownPrefdlVersion: unknown prefdl verison b'\x00\x00\x00\x00'
Traceback (most recent call last):
  File "/usr/bin/arista", line 8, in <module>
    sys.exit(main(sys.argv[1:]))
  File "/usr/lib/python3/dist-packages/arista/cli/__init__.py", line 125, in main
    root.runAction(CliContext(), args)
  File "/usr/lib/python3/dist-packages/arista/cli/parser.py", line 77, in runAction
    child.runAction(ctx, args, *others, **kwargs)
  File "/usr/lib/python3/dist-packages/arista/cli/parser.py", line 64, in runAction
    self._runAction(self.action, ctx, args, *others, **kwargs)
  File "/usr/lib/python3/dist-packages/arista/cli/parser.py", line 51, in _runAction
    ret = action.func(ctx, args, *others, **kwargs)
  File "/usr/lib/python3/dist-packages/arista/cli/actions/default.py", line 26, in doDefaultPlatform
    platform = getPlatform(args.platform)
  File "/usr/lib/python3/dist-packages/arista/core/platform.py", line 160, in getPlatform
    platformCls = getPlatformCls(name)
  File "/usr/lib/python3/dist-packages/arista/core/platform.py", line 143, in getPlatformCls
    return detectPlatform()
  File "/usr/lib/python3/dist-packages/arista/core/platform.py", line 129, in detectPlatform
    sku = readSku()
  File "/usr/lib/python3/dist-packages/arista/core/platform.py", line 107, in readSku
    return getSysEepromData().get('SKU')
  File "/usr/lib/python3/dist-packages/arista/core/platform.py", line 102, in getSysEepromData
    syseepromData = getSysEeprom().prefdl()
  File "/usr/lib/python3/dist-packages/arista/core/utils.py", line 423, in funcWrapper
    return func(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/arista/core/platform.py", line 85, in prefdl
    return self.readPrefdl().data()
  File "/usr/lib/python3/dist-packages/arista/core/platform.py", line 88, in readPrefdl
    return readPrefdl()
  File "/usr/lib/python3/dist-packages/arista/core/platform.py", line 73, in readPrefdl
    return readI2cPrefdlEeprom()
  File "/usr/lib/python3/dist-packages/arista/core/platform.py", line 54, in readI2cPrefdlEeprom
    raise UnknownPlatformError('Could not identify current platform')
arista.core.exception.UnknownPlatformError: Could not identify current platform
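
The traceback shows the version field coming back as b'\x00\x00\x00\x00', which may mean the EEPROM read returned no real data rather than an unknown format. A rough way to classify such a header is sketched below. It assumes the version is the first four bytes of the dump (inferred from the traceback only), and `describe_header` is not part of the arista package.

```python
def describe_header(header: bytes) -> str:
    """Classify the first four bytes of an EEPROM dump (hypothetical helper)."""
    if len(header) < 4:
        return "short read"
    version = header[:4]
    if version == b"\x00" * 4:
        # Matches the value in the traceback above.
        return "all zeros"
    if version == b"\xff" * 4:
        # Often what a blank or absent EEPROM returns.
        return "all ones"
    return "data present"
```

If the result is "all zeros" or "all ones", the problem is more likely in how the EEPROM is being reached (bus, address, driver) than in the prefdl parser.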

@hugocollignon

No idea how this works. @Staphylo maybe?

@quxyzzy

quxyzzy commented Jul 28, 2024

@hugocollignon i ended up putting EOS back on to see if it was just all-round cooked. The first attempt at installing it didn't detect the fans or PSUs (but was still able to boot). I can't remember what I ran to get the warning, but it alerted me about needing to put the 2GB variant of EOS on as the version I was using was incompatible. After doing that, it properly detected the hardware, as the startup sequence didn't just go straight to fixed fan speed. I've got a 64GB USB in there but it did indeed come with a 2GB DOM.
I'm wondering if the issue of SONiC not picking up the EEPROM info properly is related to the variant that I have, and if there's perhaps some other subtype of the $platform = "raven" that's causing the issue. @Staphylo sound like anything you've encountered before?
