upgrade: Skip the device which only has the same label name #350
Conversation
- when upgrading, we need to find the related label name and upgrade with it. If many devices have the same label name, it would cause failure. To avoid this case, we need to add more checking that the target disk should be the active cOS disk. Signed-off-by: Vicente Cheng <vicente.cheng@suse.com>
@@ -76,6 +76,25 @@ func GetFullDeviceByLabel(runner v1.Runner, label string, attempts int) (*v1.Par
	return nil, errors.New("no device found")
}

// GetcOSActiveFullDeviceByLabel works like GetDeviceByLabel, but it will try to get as much info as possible from the existing
// partition and make sure they are running(active)
func GetcOSActiveFullDeviceByLabel(runner v1.Runner, label string, attempts int) (*v1.Partition, error) {
This is the only real change; the others are lint, test fixes, and a release fix for Docker.
Codecov Report

Additional details and impacted files:

@@            Coverage Diff                         @@
##  v0.0.14-harvester1  rancher/elemental-cli#350  +/- ##
=====================================================
  Coverage         ?   73.45%
=====================================================
  Files            ?   42
  Lines            ?   3790
  Branches         ?   0
=====================================================
  Hits             ?   2784
  Misses           ?   807
  Partials         ?   199

☔ View full report at Codecov.
mmm... I am in favor of branching to find a solution for certain specific cases requiring a hot fix. However, this feels like a workaround for what is, or should be, an unsupported scenario (multiple devices on the same host including more than one filesystem with equivalent labels). IMHO, if several installations are required on the same host and multiple state/oem/persistent/recovery partitions are needed, they should simply be labeled differently. Fully customized labels should be possible at install time; we are not there yet, but ideas like the ones discussed in rancher/elemental-toolkit#1635 and rancher/elemental-toolkit#1773 point in this direction.

@Vicente-Cheng would it be possible to also solve this issue by making all labels customizable at install time? I think this would be the good solution for the long term. Also note that this is not only about filesystem labels; we also rely on partition labels (which are not configurable at all) when on GPT under certain circumstances.

I believe we need to fully understand the use case and then see how the layout should look. @Vicente-Cheng do we have anywhere an explanation of the host setup raising this issue? I am curious to know how to reproduce this issue from a user perspective.
@davidcassany notice that the original report shows this is due to FC storage with multipath not enabled, which makes the same disk appear four times in the system; they all refer to the same disk, so it's not real disks with the same labels, it's the same disk shown by the system as four different ones.

Also, this PR is made against an older version which didn't have our install-state yaml, which fixes this (or at least I think it does). This is to support upgrading from Harvester 1.0.3 to 1.1.0, which uses the old v0.0.14 elemental-cli version. So this is a current bug on an existing installed system, not a theoretical problem that arose in CI/testing :)

The cli issue has the link to the original Harvester user-reported issue: harvester/harvester#3070
I checked with the user, and the disk is an FC disk with multipath enabled. Thus, multiple "disks" are observed on the host.

For Harvester's use case, I guess it's also possible to see this when users run Elemental-based VMs. The volumes are just iSCSI disks on the host, and their partitions are visible there.
Oh, damn, then this feels like a multipath issue... thanks for the insight. I had the feeling this was something like having a host with multiple disks, each holding a different Elemental-based OS installation.

I don't think the state yaml file solves any of this, hence this is a valid issue to consider.

Yes, multipath takes over device control and is likely to cause issues unless you access the device through the generated device links. IIRC we completely disabled multipath in initrd to prevent this sort of issue.

This sounds relevant to me. I am not sure I follow it: do you mean there could be a host (Elemental-based) that is actually holding the Elemental-based VMs, and the VMs' volumes are disks visible to the host? This is a very interesting case... 🤔
The suggested patch looks good to me.
However, I fail to see how this solves an eventual issue caused by multipath. I see this solving the issue of multiple devices having the same layout, but not of the same device having multiple references or accessors; in that case I would expect all of them to provide the same device information, including mountpoints.
pkg/action/action_suite_test.go
Outdated
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
I think we should drop the changes from this file as in main.go
main.go
Outdated
@@ -5,7 +5,7 @@ Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0
http://www.apache.org/licenses/LICENSE-2.0
We should drop these changes
FYI, still testing this. It seems qemu can replicate this by creating two disks with the same serial pointing to the same image:
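A reproduction along those lines might look like the following. This is a sketch of the idea, not the exact command used: the image path, serial string, and the use of `-snapshot` (to sidestep qemu's image write locking when the same file is attached twice) are all assumptions.

```shell
# Attach the same backing image twice with an identical serial so the
# guest sees two "disks" that carry the same filesystem labels.
# Paths and the serial value are illustrative assumptions.
qemu-system-x86_64 -m 2048 -snapshot \
  -drive file=elemental.img,if=none,id=disk0,format=raw \
  -device virtio-blk-pci,drive=disk0,serial=dupserial \
  -drive file=elemental.img,if=none,id=disk1,format=raw \
  -device virtio-blk-pci,drive=disk1,serial=dupserial
```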
indeed, only one is mounted:
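One way to observe this from inside the guest (an illustrative command, device names are assumptions; output not reproduced here) is to list the duplicate devices with their labels and mountpoints and see that only one carries a mountpoint:

```shell
# Both devices show the same LABEL, but only one has a MOUNTPOINT.
lsblk -o NAME,LABEL,MOUNTPOINT /dev/sda /dev/sdb
```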
This patch is only valid for active or passive, not for latest. It should be OK for an older, unused version which only Harvester uses, and only for the upgrade, nothing else.
URL of the test download has changed. Signed-off-by: itxaka <igarcia@suse.com>
Repo has changed. Also disable the master/PR trigger on the docker job, as it is not needed for this branch. Signed-off-by: itxaka <igarcia@suse.com>
a04be02 to d769394 (Compare)
@davidcassany lint commit dropped. |
upgrade with it. If many devices have the same
label name, it would cause failure. To avoid this case,
we need to add more checking that the target disk should
be the active cOS disk.
Fixes #348
Signed-off-by: Vicente Cheng vicente.cheng@suse.com