
After installing QEMU Windows 10 virtual machine on the OCF block device and restarting, the internal applications of Windows do not run properly #1601

Open
sunhuan0919 opened this issue Dec 17, 2024 · 0 comments
Labels
bug Something isn't working

Comments

@sunhuan0919

Description

The goal is to use OCF to accelerate virtual machine system disks.
I installed a QEMU Windows 10 virtual machine on the OCF block device, and it starts normally the first time after installation. I then shut down the virtual machine, flush the OCF cache, and boot again from the OCF block device. The Windows virtual machine boots normally, but applications inside it (such as explorer.exe and ctfmon.exe) report errors such as "xx memory referenced by xx instruction cannot be read/written", as shown in the screenshots below.

Expected Behavior

The applications in the Windows virtual machine run normally without any errors.

Actual Behavior

image
image
image
Other symptoms may also occur, such as the Windows desktop flashing constantly or a black screen.

Steps to Reproduce

1. Prepare a Ceph cluster, create an RBD pool and an RBD image, and map the RBD image to a block device using the "rbd map" command

# Refer to the Ceph documentation to prepare the Ceph cluster and create an RBD pool and an RBD image
# Map the RBD image to a block device using "rbd map"
rbd map testpool/testimage
# This yields the block device /dev/rbd0; create a by-id path for it
ln -s /dev/rbd0 /dev/disk/by-id/rbd0

2. Create an OCF block device with the SATA SSD as the cache and the rbd0 device above as the core

casadm -S -i 1 -d /dev/disk/by-id/ata-KINGSTON_SA400S37480G_50026B7783B45C46-part1 -c wb -x 4 -f
casadm -A -i 1 -d /dev/disk/by-id/rbd0

3. Install a Windows 10 virtual machine on the OCF block device (requires installing the virtio-win driver)

taskset -c 32,33 qemu-system-x86_64 -name windows \
 -enable-kvm \
 -cpu host -smp 4 \
 -m 8G -object memory-backend-file,id=mem0,size=8G,mem-path=/dev/hugepages,share=on -numa node,memdev=mem0 \
 -net nic -net tap,ifname=tap0,script=no,downscript=no \
 -device ide-cd,drive=d0,bootindex=0 \
 -drive file=/root/Win10_22H2_Chinese_Simplified_x64v1.iso,if=none,id=d0 \
 -device ide-cd,drive=d1 \
 -drive file=/root/virtio-win-0.1.217.iso,if=none,id=d1 \
 -drive file=/dev/cas1-1,format=raw,id=cas1,if=none,cache=none \
 -device virtio-blk-pci,drive=cas1,id=virtioblk0,bus=pci.0,addr=0x4 \
 -device nec-usb-xhci,id=usb,bus=pci.0,addr=0x5 \
 -device usb-tablet \
 -daemonize \
 -vnc 0.0.0.0:0

After completing the Windows 10 setup wizard, the virtual machine starts and can be used normally without any errors.

4. Shut down the virtual machine and flush the cache

kill -15 {qemupid}
casadm -F -i 1
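
Note that `kill` returns as soon as the signal is delivered, so the flush above could start while QEMU is still exiting. A minimal sketch of waiting for the process to disappear before flushing is shown here; the function name `stop_and_wait` and the 120-second default timeout are assumptions for illustration, not part of the original reproduction steps.

```shell
#!/bin/sh
# Hypothetical helper (not from the original report): send SIGTERM to a
# process and block until it has exited or a timeout (in seconds) expires.
stop_and_wait() {
    pid=$1
    timeout=${2:-120}   # assumed default; adjust as needed

    # If the signal cannot be delivered, the process is already gone.
    kill -15 "$pid" 2>/dev/null || return 0

    waited=0
    # kill -0 only probes whether the process still exists.
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "pid $pid still running after ${timeout}s" >&2
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

With this in place, step 4 could be run as `stop_and_wait {qemupid} && casadm -F -i 1`, so the flush only starts once QEMU has fully exited.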

5. Restart the virtual machine using the same OCF block device as the boot disk

taskset -c 32,33 qemu-system-x86_64 -name windows \
 -enable-kvm \
 -cpu host -smp 4 \
 -m 8G -object memory-backend-file,id=mem0,size=8G,mem-path=/dev/hugepages,share=on -numa node,memdev=mem0 \
 -net nic -net tap,ifname=tap0,script=no,downscript=no \
 -drive file=/dev/cas1-1,format=raw,id=cas1,if=none,cache=none \
 -device virtio-blk-pci,drive=cas1,id=virtioblk0,bus=pci.0,addr=0x4,bootindex=0 \
 -device nec-usb-xhci,id=usb,bus=pci.0,addr=0x5 \
 -device usb-tablet \
 -daemonize \
 -vnc 0.0.0.0:0

The virtual machine boots to the Windows desktop, but the application errors described above appear.

Possible Fix

It seems that the program binary data the Windows virtual machine reads from the OCF block device is incorrect, which would explain errors such as instructions referencing invalid addresses.
I am a beginner, so I am not entirely sure this is an OCF issue. However, I have run many comparative tests, and OCF currently seems the most likely cause.
The comparative tests so far:

  1. Changing various OCF parameters (cache mode, cache line size, turning off sequential cutoff, etc.): same issue
  2. Trying multiple versions of QEMU, the virtio-win driver, and OCF: same issue
  3. Booting from an SPDK OCF bdev: same issue
  4. Booting the virtual machine directly from /dev/rbd0 instead of /dev/cas1-1: works fine
  5. Switching to a Linux virtual machine: works fine
  6. Switching the caching scheme to bcache: works fine
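
One way to check the hypothesis that the data read back is incorrect would be to compare /dev/cas1-1 against /dev/rbd0 chunk by chunk while the virtual machine is stopped and the cache is clean. The following is only a sketch under those assumptions: the helper names `size_of` and `diff_chunks` and the 1 MiB chunk size are made up for illustration, and the reads go through the host page cache rather than using O_DIRECT as QEMU does with cache=none.

```shell
#!/bin/sh
# Hypothetical helpers (not from the original report): report which 1 MiB
# chunks differ between two block devices or regular files.

# Size in bytes of a block device or a regular file.
size_of() {
    if [ -b "$1" ]; then
        blockdev --getsize64 "$1"
    else
        wc -c < "$1"
    fi
}

# Compare $1 and $2 chunk by chunk; returns 0 only if every chunk matches.
diff_chunks() {
    a=$1
    b=$2
    chunk=1048576                                # 1 MiB, an arbitrary choice
    size=$(size_of "$a")
    total=$(( (size + chunk - 1) / chunk ))      # round up to whole chunks
    i=0
    mismatches=0
    while [ "$i" -lt "$total" ]; do
        sum_a=$(dd if="$a" bs="$chunk" skip="$i" count=1 2>/dev/null | sha256sum)
        sum_b=$(dd if="$b" bs="$chunk" skip="$i" count=1 2>/dev/null | sha256sum)
        if [ "$sum_a" != "$sum_b" ]; then
            echo "chunk $i (byte offset $((i * chunk))) differs"
            mismatches=$((mismatches + 1))
        fi
        i=$((i + 1))
    done
    echo "$mismatches of $total chunks differ"
    [ "$mismatches" -eq 0 ]
}
```

For example, `diff_chunks /dev/cas1-1 /dev/rbd0` could be run after `casadm -F -i 1` completes and `casadm -P -i 1` reports the cache as clean. Any differing chunk at that point would suggest the exported device and the core device disagree; identical contents would point toward the problem being in the I/O path while the guest is running instead.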

Logs

I can provide any relevant logs on request.

Configuration files

I can provide any relevant configuration files on request.

Your Environment

OpenCAS

  • OpenCAS version (commit hash or tag): v24.9
  • Operating System: openEuler 22.03 (LTS-SP4)
  • Kernel version: 5.10.0-234.0.0.133.oe2203sp4.x86_64
  • Cache device type (NAND/Optane/other): SATA SSD
  • Core device type (HDD/SSD/other): Ceph RBD image (HDD RBD pool)
  • Cache configuration: Same as the commands in the "Steps to Reproduce" section
  • Other:
[root@localhost ~]# lsblk
NAME                                                                                                  MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
sda                                                                                                     8:0    0   5.5T  0 disk
└─ceph--c23e9d4e--80ae--47e6--8c37--5d49d708e2bf-osd--block--f6ae4e0e--d630--4ad8--b731--70b0ce1e5db8 253:3    0   5.5T  0 lvm
sdb                                                                                                     8:16   0 447.1G  0 disk
└─sdb1                                                                                                  8:17   0     4G  0 part
sdc                                                                                                     8:32   0   5.5T  0 disk
└─ceph--e3a6d8a4--f83c--4ed1--b8e1--335d4d663f92-osd--block--75145142--c4c6--41ae--a1d0--062b2a878dd4 253:4    0   5.5T  0 lvm
sdd                                                                                                     8:48   0   5.5T  0 disk
└─ceph--14b0fdd7--9a9b--443c--bf36--f63f91eb699a-osd--block--70c69676--15ae--4662--8ea6--a85b4b29fb77 253:2    0   5.5T  0 lvm
sr0                                                                                                    11:0    1  1024M  0 rom
rbd0                                                                                                  251:0    0    50G  0 disk
└─cas1-1                                                                                              252:768  0    50G  0 disk
  ├─cas1-1p1                                                                                          252:769  0    50M  0 part
  ├─cas1-1p2                                                                                          252:770  0  49.4G  0 part
  └─cas1-1p3                                                                                          252:771  0   583M  0 part
nvme0n1                                                                                               259:0    0 119.2G  0 disk
├─nvme0n1p1                                                                                           259:2    0   600M  0 part /boot/efi
├─nvme0n1p2                                                                                           259:3    0     1G  0 part /boot
└─nvme0n1p3                                                                                           259:4    0 117.7G  0 part
  ├─openeuler-root                                                                                    253:0    0    70G  0 lvm  /
  ├─openeuler-swap                                                                                    253:1    0   7.7G  0 lvm  [SWAP]
  └─openeuler-home                                                                                    253:5    0    40G  0 lvm  /home
[root@localhost ~]# casadm -L
type    id   disk        status    write policy   device
cache   1    /dev/sdb1   Running   wb             -
└core   1    /dev/rbd0   Active    -              /dev/cas1-1
[root@localhost ~]# casadm -P -i 1
Cache Id                  1
Cache Size                1028736 [4KiB Blocks] / 3.92 [GiB]
Cache Device              /dev/sdb1
Exported Object           -
Core Devices              1
Inactive Core Devices     0
Write Policy              wb
Cleaning Policy           alru
Promotion Policy          always
Cache line size           4 [KiB]
Metadata Memory Footprint 64.1 [MiB]
Dirty for                 0 [s] / Cache clean
Status                    Running
...

Ceph

  • Ceph version: ceph version 16.2.7 (dd0603118f56ab514f133c8d2e3adfc983942503) pacific (stable)

QEMU

  • QEMU version: 6.2.0 or v9.2.0-rc3

virtio-win driver

  • virtio-win driver version: virtio-win-0.1.217.iso or virtio-win-0.1.266.iso
@sunhuan0919 sunhuan0919 added the bug Something isn't working label Dec 17, 2024