This repository has been archived by the owner on May 30, 2023. It is now read-only.

upgrade: Skip the device which only has the same label name #350

Merged
merged 3 commits into v0.0.14-harvester1 from Vicente-Cheng-fix-upgrade-fail on Nov 2, 2022

Conversation

@Itxaka (Contributor) commented Oct 31, 2022

  • When upgrading, we need to find the related label name and upgrade with it. If many devices have the same label name, the upgrade would fail. To avoid this, we need to add a check that the target disk is the active cOS disk.

Fixes #348

Signed-off-by: Vicente Cheng vicente.cheng@suse.com

@@ -76,6 +76,25 @@ func GetFullDeviceByLabel(runner v1.Runner, label string, attempts int) (*v1.Par
	return nil, errors.New("no device found")
}

// GetcOSActiveFullDeviceByLabel works like GetDeviceByLabel, but it will try to get as much info as possible from the existing
// partition and make sure it is running (active)
func GetcOSActiveFullDeviceByLabel(runner v1.Runner, label string, attempts int) (*v1.Partition, error) {
@Itxaka (Contributor, Author) commented on the diff:
This is the only real change; the others are lint fixes, test fixes, and a release fix for Docker.
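
For context, here is a minimal, self-contained Go sketch of the idea the new helper's name and doc comment suggest; it is not the actual elemental-cli implementation. The Partition type, its field names, and the example data are assumptions for illustration: among partitions carrying the requested label, prefer the one that is actually mounted, since only the active cOS disk has its partitions in use.

package main

import (
	"errors"
	"fmt"
)

// Partition is a simplified stand-in for the elemental-cli v1.Partition type
// (illustrative only; field names here are assumptions).
type Partition struct {
	Name       string // e.g. "sda4"
	Label      string // filesystem label, e.g. "COS_STATE"
	MountPoint string // empty if the partition is not mounted
}

// findActivePartitionByLabel sketches the idea behind the new helper: when
// several partitions share the same label (e.g. the same disk exposed multiple
// times by FC/multipath), pick the one that is actually mounted, i.e. the one
// belonging to the running (active) cOS installation.
func findActivePartitionByLabel(parts []Partition, label string) (*Partition, error) {
	var candidates []*Partition
	for i := range parts {
		if parts[i].Label == label {
			candidates = append(candidates, &parts[i])
		}
	}
	switch len(candidates) {
	case 0:
		return nil, errors.New("no device found")
	case 1:
		return candidates[0], nil
	}
	// Multiple devices carry the label: prefer the mounted (active) one.
	for _, p := range candidates {
		if p.MountPoint != "" {
			return p, nil
		}
	}
	return nil, fmt.Errorf("found %d devices labeled %q but none is mounted", len(candidates), label)
}

func main() {
	parts := []Partition{
		{Name: "sda4", Label: "COS_STATE", MountPoint: "/run/initramfs/cos-state"},
		{Name: "sdb4", Label: "COS_STATE", MountPoint: ""},
	}
	p, err := findActivePartitionByLabel(parts, "COS_STATE")
	if err != nil {
		panic(err)
	}
	fmt.Println("active state partition:", p.Name)
}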

@Itxaka added the kind/bug (Something isn't working), status/Do not merge (Not ready for merging), prio: high, area/upgrade, and affects/harvester (Issues that affect harvester directly) labels on Oct 31, 2022
@codecov (bot) commented Oct 31, 2022

Codecov Report

❗ No coverage uploaded for pull request base (v0.0.14-harvester1@edc412b).
Patch has no changes to coverable lines.

❗ Current head a04be02 differs from pull request most recent head d769394. Consider uploading reports for the commit d769394 to get more accurate results

Additional details and impacted files
@@                  Coverage Diff                  @@
##             v0.0.14-harvester1     rancher/elemental-cli#350   +/-   ##
=====================================================
  Coverage                      ?   73.45%           
=====================================================
  Files                         ?       42           
  Lines                         ?     3790           
  Branches                      ?        0           
=====================================================
  Hits                          ?     2784           
  Misses                        ?      807           
  Partials                      ?      199           


@Itxaka marked this pull request as ready for review on November 2, 2022 08:20
@davidcassany (Contributor) commented:
mmm... I am in favor of branching to find a solution for certain specific cases requiring a hot fix. However, this feels like a workaround for what is, or should be, an unsupported scenario (multiple devices on the same host including more than one filesystem with equivalent labels).

IMHO, if several installations are required on the same host and multiple state/oem/persistent/recovery partitions are required, they should simply be labeled differently. Fully customized labels should be possible at install time; we are not there yet, but ideas like the ones discussed in rancher/elemental-toolkit#1635 and rancher/elemental-toolkit#1773 point in this direction.

@Vicente-Cheng would it be possible to also solve this issue by making all labels customizable at install time? I think this would be the right solution for the long term. Also note that this is not only about filesystem labels; we also rely on partition labels (which are not configurable at all) when on GPT under certain circumstances.

I believe we need to fully understand the use case and then see how the layout should look. @Vicente-Cheng do we have anywhere some explanation about the host setup raising this issue? I am curious to know how to reproduce this issue from a user perspective.

@Itxaka (Contributor, Author) commented Nov 2, 2022

@davidcassany notice that the original report shows this is due to FC storage with multipath not enabled, which makes the same disk show up 4 times in the system. They all refer to the same disk, so it's not real disks with the same labels; it's the same disk shown by the system as 4 different ones.

Also, this PR is done against an older version which didn't have our install state yaml, which fixes this (or at least I think it does). This is to support upgrading from Harvester 1.0.3 to 1.1.0, which uses an old v0.0.14 elemental-cli version.

So this is a current bug on an existing installed system, not a theoretical problem that arises in CI/testing :) The cli issue has the link to the original Harvester user-reported issue: harvester/harvester#3070

@bk201 (Member) commented Nov 2, 2022

> @Vicente-Cheng do we have anywhere some explanation about the host setup raising this issue? I am curious to know how to reproduce this issue from a user perspective.

I checked with the user, and the disk is an FC disk with multi-path enabled. Thus, multiple "disks" are observed on the host.

@bk201 (Member) commented Nov 2, 2022

For Harvester's use case, I guess it's also possible to see this when users run elemental-based VMs. Volumes are just iSCSI disks on the host, and their partitions are visible.

@davidcassany (Contributor) commented:
> @davidcassany notice that the original report shows this is due to FC storage with multipath not enabled, which makes the same disk show up 4 times in the system. They all refer to the same disk, so it's not real disks with the same labels; it's the same disk shown by the system as 4 different ones.

Oh, damn, then this feels like a multipath issue... thanks for the insight. I had the feeling this was something like a host with multiple disks, each holding a different Elemental based OS installation.

> Also, this PR is done against an older version which didn't have our install state yaml, which fixes this (or at least I think it does). This is to support upgrading from Harvester 1.0.3 to 1.1.0, which uses an old v0.0.14 elemental-cli version.

I don't think the state yaml file solves any of this, so this is still a valid issue to consider.

> I checked with the user, and the disk is an FC disk with multi-path enabled. Thus, multiple "disks" are observed on the host.

Yes, multipath takes over device control and is likely to cause issues unless you access the device through the generated device links. IIRC we completely disabled multipath in initrd to prevent this sort of issue.

> For Harvester's use case, I guess it's also possible to see this when users run elemental-based VMs. Volumes are just iSCSI disks on the host, and their partitions are visible.

This sounds relevant to me. I am not sure I follow it: do you mean there could be a host (Elemental based) that is actually holding the Elemental based VMs, and the VM volumes are disks visible to the host? This is a very interesting case... 🤔

@davidcassany (Contributor) left a review:

The suggested patch looks good to me.

However, I fail to see how this solves a possible issue caused by multipath. I see this solving the issue for multiple devices having the same layout, but not the same device having multiple references or accessors; in such a case I would expect all of them to provide the same device information, including mountpoints.

distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
@davidcassany (Contributor) commented on the diff:

I think we should drop the changes from this file as in main.go.

main.go Outdated
@@ -5,7 +5,7 @@ Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0
http://www.apache.org/licenses/LICENSE-2.0
@davidcassany (Contributor) commented:

We should drop these changes

@Itxaka (Contributor, Author) commented Nov 2, 2022

FYI, still testing this. It seems QEMU can replicate this by creating 2 disks with the same serial pointing to the same image:

<disk type="file" device="disk">
  <driver name="qemu" type="raw" cache="none"/>
  <source file="/var/lib/libvirt/images/opensusetumbleweed-1.qcow2"/>
  <target dev="sda" bus="scsi"/>
  <shareable/>
  <serial>0001</serial>
  <address type="drive" controller="0" bus="0" target="0" unit="0"/>
</disk>
<disk type="file" device="disk">
  <driver name="qemu" type="raw" cache="none"/>
  <source file="/var/lib/libvirt/images/opensusetumbleweed-1.qcow2"/>
  <target dev="sdb" bus="scsi"/>
  <shareable/>
  <serial>0001</serial>
  <address type="drive" controller="0" bus="0" target="0" unit="1"/>
</disk>
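
(A hedged reproduction note, not from the PR: assuming a libvirt-managed guest, both <disk> elements above can be added to the domain XML with virsh edit and the guest rebooted, so that sda and sdb are backed by the same image and expose duplicate labels, much like the FC/multipath host in the original report.)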

@Itxaka (Contributor, Author) commented Nov 2, 2022

Indeed, only one is mounted:

rancher-12549:~ # lsblk -o NAME,LABEL,MOUNTPOINTS
NAME   LABEL          MOUNTPOINTS
loop0  COS_ACTIVE     /
sda                   
├─sda1 COS_GRUB       
├─sda2 COS_OEM        /oem
├─sda3 COS_RECOVERY   
├─sda4 COS_STATE      /run/initramfs/cos-state
└─sda5 COS_PERSISTENT /var/lib/calico
                      /var/lib/cni
                      /var/lib/longhorn
                      /var/lib/wicked
                      /var/lib/kubelet
                      /var/lib/rancher
                      /var/lib/elemental
                      /var/log
                      /usr/libexec
                      /root
                      /opt
                      /home
                      /etc/cni
                      /etc/iscsi
                      /etc/ssh
                      /etc/rancher
                      /etc/systemd
                      /usr/local
sdb                   
├─sdb1 COS_GRUB       
├─sdb2 COS_OEM        
├─sdb3 COS_RECOVERY   
├─sdb4 COS_STATE      
└─sdb5 COS_PERSISTENT 
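
To complement the lsblk output above, here is a small Go sketch, not part of this PR, showing how the same information could be gathered programmatically. The lsblk JSON column names are the standard util-linux ones; everything else (function names, the chosen label) is an assumption for illustration. It keeps only labeled partitions that are actually mounted, which on this host would select sda4 and skip the sdb duplicates:

package main

import (
	"encoding/json"
	"fmt"
	"os/exec"
)

// lsblkDevice mirrors the JSON emitted by `lsblk -J -o NAME,LABEL,MOUNTPOINT`.
type lsblkDevice struct {
	Name       string        `json:"name"`
	Label      string        `json:"label"`
	MountPoint string        `json:"mountpoint"`
	Children   []lsblkDevice `json:"children"`
}

type lsblkOutput struct {
	BlockDevices []lsblkDevice `json:"blockdevices"`
}

// mountedPartitionsWithLabel returns the device names of partitions that carry
// the given label AND report a mountpoint, i.e. belong to the active disk.
func mountedPartitionsWithLabel(label string) ([]string, error) {
	out, err := exec.Command("lsblk", "-J", "-o", "NAME,LABEL,MOUNTPOINT").Output()
	if err != nil {
		return nil, err
	}
	var parsed lsblkOutput
	if err := json.Unmarshal(out, &parsed); err != nil {
		return nil, err
	}
	var result []string
	var walk func(devs []lsblkDevice)
	walk = func(devs []lsblkDevice) {
		for _, d := range devs {
			if d.Label == label && d.MountPoint != "" {
				result = append(result, d.Name)
			}
			walk(d.Children)
		}
	}
	walk(parsed.BlockDevices)
	return result, nil
}

func main() {
	names, err := mountedPartitionsWithLabel("COS_STATE")
	if err != nil {
		panic(err)
	}
	fmt.Println("mounted COS_STATE partitions:", names) // expected: [sda4] on the host above
}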

@Itxaka (Contributor, Author) commented Nov 2, 2022

This patch is only valid for active or passive, not for latest. It should be OK for an older, otherwise unused version that only Harvester uses, and only for the upgrade, nothing else.

itxaka added 2 commits November 2, 2022 15:33
URL of the test download has changed

Signed-off-by: itxaka <igarcia@suse.com>
Repo has changed. Also disable the master/PR trigger on the docker job,
as it is not needed for this branch

Signed-off-by: itxaka <igarcia@suse.com>
@Itxaka Itxaka force-pushed the Vicente-Cheng-fix-upgrade-fail branch from a04be02 to d769394 Compare November 2, 2022 14:33
@Itxaka (Contributor, Author) commented Nov 2, 2022

@davidcassany lint commit dropped.

@Itxaka Itxaka merged commit 7aaac4a into v0.0.14-harvester1 Nov 2, 2022
@Itxaka Itxaka deleted the Vicente-Cheng-fix-upgrade-fail branch November 2, 2022 14:36