This repository has been archived by the owner on May 30, 2023. It is now read-only.

upgrade: Skip the device which only has the same label name #350

Merged
merged 3 commits into v0.0.14-harvester1 from Vicente-Cheng-fix-upgrade-fail on Nov 2, 2022

Conversation

@Itxaka (Contributor) commented Oct 31, 2022

  • When upgrading, we need to find the related label name and upgrade with it. If many devices have the same label name, the upgrade would fail. To avoid this, we need to add a check that the target disk is the active cOS disk.

Fixes #348

Signed-off-by: Vicente Cheng vicente.cheng@suse.com

@@ -76,6 +76,25 @@ func GetFullDeviceByLabel(runner v1.Runner, label string, attempts int) (*v1.Par
	return nil, errors.New("no device found")
}

// GetcOSActiveFullDeviceByLabel works like GetDeviceByLabel, but it will try to get as much info as possible from the existing
// partition and make sure it is running (active)
func GetcOSActiveFullDeviceByLabel(runner v1.Runner, label string, attempts int) (*v1.Partition, error) {
@Itxaka (Contributor, Author) commented on the diff:
This is the only real change; the others are lint fixes, test fixes, and a release fix for Docker.
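
For context, here is a minimal, self-contained Go sketch of the idea the new helper's name and doc comment suggest; it is not the actual elemental-cli implementation. The Partition type, its field names, and the example data are assumptions for illustration: among partitions carrying the requested label, prefer the one that is actually mounted, since only the active cOS disk has its partitions in use.

package main

import (
	"errors"
	"fmt"
)

// Partition is a simplified stand-in for the elemental-cli v1.Partition type
// (illustrative only; field names here are assumptions).
type Partition struct {
	Name       string // e.g. "sda4"
	Label      string // filesystem label, e.g. "COS_STATE"
	MountPoint string // empty if the partition is not mounted
}

// findActivePartitionByLabel sketches the idea behind the new helper: when
// several partitions share the same label (e.g. the same disk exposed multiple
// times by FC/multipath), pick the one that is actually mounted, i.e. the one
// belonging to the running (active) cOS installation.
func findActivePartitionByLabel(parts []Partition, label string) (*Partition, error) {
	var candidates []*Partition
	for i := range parts {
		if parts[i].Label == label {
			candidates = append(candidates, &parts[i])
		}
	}
	switch len(candidates) {
	case 0:
		return nil, errors.New("no device found")
	case 1:
		return candidates[0], nil
	}
	// Multiple devices carry the label: prefer the mounted (active) one.
	for _, p := range candidates {
		if p.MountPoint != "" {
			return p, nil
		}
	}
	return nil, fmt.Errorf("found %d devices labeled %q but none is mounted", len(candidates), label)
}

func main() {
	parts := []Partition{
		{Name: "sda4", Label: "COS_STATE", MountPoint: "/run/initramfs/cos-state"},
		{Name: "sdb4", Label: "COS_STATE", MountPoint: ""},
	}
	p, err := findActivePartitionByLabel(parts, "COS_STATE")
	if err != nil {
		panic(err)
	}
	fmt.Println("active state partition:", p.Name)
}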

@Itxaka added the kind/bug (Something isn't working), status/Do not merge (Not ready for merging), prio: high, area/upgrade, and affects/harvester (Issues that affect harvester directly) labels on Oct 31, 2022
@codecov (bot) commented Oct 31, 2022

Codecov Report

❗ No coverage uploaded for pull request base (v0.0.14-harvester1@edc412b).
Patch has no changes to coverable lines.

❗ Current head a04be02 differs from pull request most recent head d769394. Consider uploading reports for the commit d769394 to get more accurate results

Additional details and impacted files
@@                  Coverage Diff                  @@
##             v0.0.14-harvester1     rancher/elemental-cli#350   +/-   ##
=====================================================
  Coverage                      ?   73.45%           
=====================================================
  Files                         ?       42           
  Lines                         ?     3790           
  Branches                      ?        0           
=====================================================
  Hits                          ?     2784           
  Misses                        ?      807           
  Partials                      ?      199           


@Itxaka marked this pull request as ready for review on November 2, 2022 08:20
@davidcassany (Contributor) commented:
mmm... I am in favor of branching to find a solution for certain specific cases requiring a hot fix. However, this feels like a workaround for what is, or should be, an unsupported scenario (multiple devices on the same host including more than one filesystem with equivalent labels).

IMHO, if several installations are required on the same host and multiple state/oem/persistent/recovery partitions are required, they should simply be labeled differently. Fully customized labels should be possible at install time; we are not there yet, but ideas like the ones discussed in rancher/elemental-toolkit#1635 and rancher/elemental-toolkit#1773 point in this direction.

@Vicente-Cheng would it be possible to also solve this issue by making all labels customizable at install time? I think this would be the right solution for the long term. Also note that this is not only about filesystem labels; we also rely on partition labels (which are not configurable at all) when on GPT under certain circumstances.

I believe we need to fully understand the use case and then see how the layout should look. @Vicente-Cheng do we have anywhere some explanation about the host setup raising this issue? I am curious to know how to reproduce this issue from a user perspective.

@Itxaka (Contributor, Author) commented Nov 2, 2022

@davidcassany notice that the original report shows this is due to FC storage with multipath not enabled, which makes the same disk show up 4 times in the system. They all refer to the same disk, so it's not real disks with the same labels; it's the same disk shown by the system as 4 different ones.

Also, this PR is done against an older version which didn't have our install state yaml, which fixes this (or at least I think it does). This is to support upgrading from Harvester 1.0.3 to 1.1.0, which uses an old v0.0.14 elemental-cli version.

So this is a current bug on an existing installed system, not a theoretical problem that arises in CI/testing :) The cli issue has the link to the original Harvester user-reported issue: harvester/harvester#3070

@bk201 (Member) commented Nov 2, 2022

> @Vicente-Cheng do we have anywhere some explanation about the host setup raising this issue? I am curious to know how to reproduce this issue from a user perspective.

I checked with the user, and the disk is an FC disk with multi-path enabled. Thus, multiple "disks" are observed on the host.

@bk201 (Member) commented Nov 2, 2022

For Harvester's use case, I guess it's also possible to see this when users run elemental-based VMs. Volumes are just iSCSI disks on the host, and their partitions are visible.

@davidcassany (Contributor) commented:
> @davidcassany notice that the original report shows this is due to FC storage with multipath not enabled, which makes the same disk show up 4 times in the system. They all refer to the same disk, so it's not real disks with the same labels; it's the same disk shown by the system as 4 different ones.

Oh, damn, then this feels like a multipath issue... thanks for the insight. I had the feeling this was something like a host with multiple disks, each holding a different Elemental based OS installation.

> Also, this PR is done against an older version which didn't have our install state yaml, which fixes this (or at least I think it does). This is to support upgrading from Harvester 1.0.3 to 1.1.0, which uses an old v0.0.14 elemental-cli version.

I don't think the state yaml file solves any of this, so this is still a valid issue to consider.

> I checked with the user, and the disk is an FC disk with multi-path enabled. Thus, multiple "disks" are observed on the host.

Yes, multipath takes over device control and is likely to cause issues unless you access the device through the generated device links. IIRC we completely disabled multipath in initrd to prevent this sort of issue.

> For Harvester's use case, I guess it's also possible to see this when users run elemental-based VMs. Volumes are just iSCSI disks on the host, and their partitions are visible.

This sounds relevant to me. I am not sure I follow it: do you mean there could be a host (Elemental based) that is actually holding the Elemental based VMs, and the VM volumes are disks visible to the host? This is a very interesting case... 🤔

@davidcassany (Contributor) left a review:

The suggested patch looks good to me.

However, I fail to see how this solves a possible issue caused by multipath. I see this solving the issue for multiple devices having the same layout, but not the same device having multiple references or accessors; in such a case I would expect all of them to provide the same device information, including mountpoints.

distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
@davidcassany (Contributor) commented on the diff:

I think we should drop the changes from this file as in main.go.

main.go Outdated
@@ -5,7 +5,7 @@ Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0
http://www.apache.org/licenses/LICENSE-2.0
@davidcassany (Contributor) commented:

We should drop these changes

@Itxaka (Contributor, Author) commented Nov 2, 2022

FYI, still testing this. It seems QEMU can replicate this by creating 2 disks with the same serial pointing to the same image:

<disk type="file" device="disk">
  <driver name="qemu" type="raw" cache="none"/>
  <source file="/var/lib/libvirt/images/opensusetumbleweed-1.qcow2"/>
  <target dev="sda" bus="scsi"/>
  <shareable/>
  <serial>0001</serial>
  <address type="drive" controller="0" bus="0" target="0" unit="0"/>
</disk>
<disk type="file" device="disk">
  <driver name="qemu" type="raw" cache="none"/>
  <source file="/var/lib/libvirt/images/opensusetumbleweed-1.qcow2"/>
  <target dev="sdb" bus="scsi"/>
  <shareable/>
  <serial>0001</serial>
  <address type="drive" controller="0" bus="0" target="0" unit="1"/>
</disk>
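
(A hedged reproduction note, not from the PR: assuming a libvirt-managed guest, both <disk> elements above can be added to the domain XML with virsh edit and the guest rebooted, so that sda and sdb are backed by the same image and expose duplicate labels, much like the FC/multipath host in the original report.)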

@Itxaka (Contributor, Author) commented Nov 2, 2022

Indeed, only one is mounted:

rancher-12549:~ # lsblk -o NAME,LABEL,MOUNTPOINTS
NAME   LABEL          MOUNTPOINTS
loop0  COS_ACTIVE     /
sda                   
├─sda1 COS_GRUB       
├─sda2 COS_OEM        /oem
├─sda3 COS_RECOVERY   
├─sda4 COS_STATE      /run/initramfs/cos-state
└─sda5 COS_PERSISTENT /var/lib/calico
                      /var/lib/cni
                      /var/lib/longhorn
                      /var/lib/wicked
                      /var/lib/kubelet
                      /var/lib/rancher
                      /var/lib/elemental
                      /var/log
                      /usr/libexec
                      /root
                      /opt
                      /home
                      /etc/cni
                      /etc/iscsi
                      /etc/ssh
                      /etc/rancher
                      /etc/systemd
                      /usr/local
sdb                   
├─sdb1 COS_GRUB       
├─sdb2 COS_OEM        
├─sdb3 COS_RECOVERY   
├─sdb4 COS_STATE      
└─sdb5 COS_PERSISTENT 
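
To complement the lsblk output above, here is a small Go sketch, not part of this PR, showing how the same information could be gathered programmatically. The lsblk JSON column names are the standard util-linux ones; everything else (function names, the chosen label) is an assumption for illustration. It keeps only labeled partitions that are actually mounted, which on this host would select sda4 and skip the sdb duplicates:

package main

import (
	"encoding/json"
	"fmt"
	"os/exec"
)

// lsblkDevice mirrors the JSON emitted by `lsblk -J -o NAME,LABEL,MOUNTPOINT`.
type lsblkDevice struct {
	Name       string        `json:"name"`
	Label      string        `json:"label"`
	MountPoint string        `json:"mountpoint"`
	Children   []lsblkDevice `json:"children"`
}

type lsblkOutput struct {
	BlockDevices []lsblkDevice `json:"blockdevices"`
}

// mountedPartitionsWithLabel returns the device names of partitions that carry
// the given label AND report a mountpoint, i.e. belong to the active disk.
func mountedPartitionsWithLabel(label string) ([]string, error) {
	out, err := exec.Command("lsblk", "-J", "-o", "NAME,LABEL,MOUNTPOINT").Output()
	if err != nil {
		return nil, err
	}
	var parsed lsblkOutput
	if err := json.Unmarshal(out, &parsed); err != nil {
		return nil, err
	}
	var result []string
	var walk func(devs []lsblkDevice)
	walk = func(devs []lsblkDevice) {
		for _, d := range devs {
			if d.Label == label && d.MountPoint != "" {
				result = append(result, d.Name)
			}
			walk(d.Children)
		}
	}
	walk(parsed.BlockDevices)
	return result, nil
}

func main() {
	names, err := mountedPartitionsWithLabel("COS_STATE")
	if err != nil {
		panic(err)
	}
	fmt.Println("mounted COS_STATE partitions:", names) // expected: [sda4] on the host above
}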

@Itxaka (Contributor, Author) commented Nov 2, 2022

This patch is only valid for active or passive, not for latest. It should be OK for an older, otherwise unused version that only Harvester uses, and only for the upgrade, nothing else.

itxaka added 2 commits November 2, 2022 15:33
URL of the test download has changed

Signed-off-by: itxaka <igarcia@suse.com>
Repo has changed. Also disable the master/PR trigger on the docker job,
as it is not needed for this branch

Signed-off-by: itxaka <igarcia@suse.com>
@Itxaka Itxaka force-pushed the Vicente-Cheng-fix-upgrade-fail branch from a04be02 to d769394 Compare November 2, 2022 14:33
@Itxaka (Contributor, Author) commented Nov 2, 2022

@davidcassany lint commit dropped.

@Itxaka Itxaka merged commit 7aaac4a into v0.0.14-harvester1 Nov 2, 2022
@Itxaka Itxaka deleted the Vicente-Cheng-fix-upgrade-fail branch November 2, 2022 14:36