KVM
Latest revision as of 14:01, 16 November 2024
KVM (https://www.linux-kvm.org/page/Main_Page) is a free and open source virtualization solution implemented as a kernel module. Although the combination is often simply referred to as KVM, the actual hypervisor is QEMU (https://www.qemu.org). QEMU runs in user space, but can integrate with KVM, which improves performance by using the hardware virtualization support from kernel space. QEMU can virtualize x86, PowerPC, and S390 guests, among others. Libvirt (https://libvirt.org) is a management framework that integrates with QEMU/KVM, LXC, Xen and others.
Installation
The following commands install libvirt, QEMU with emulation for x86_64, and qemu-img, which is needed for disk formats such as qcow2. Without qemu-img, only raw disks are available. qemu-img can also convert images between formats such as vhdx and vmdk. The commands also install the metapackage qemu-modules, which pulls in the subpackages needed for special features. In versions of Alpine before 3.13.0 these features were part of the QEMU x86_64 emulation package itself.
# apk add libvirt-daemon qemu-img qemu-system-x86_64 qemu-modules openrc
# rc-update add libvirtd
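KVM acceleration relies on hardware virtualization support in the CPU. As a rough sketch for x86 hosts, which is not part of the original instructions, the function below reports whether the vmx (Intel VT-x) or svm (AMD-V) flag is listed in /proc/cpuinfo. The function name has_hw_virt and its optional file argument are made up for this illustration.

```shell
#!/bin/sh
# has_hw_virt [CPUINFO_FILE]
# Succeeds if the CPU flags contain vmx (Intel VT-x) or svm (AMD-V).
# CPUINFO_FILE defaults to /proc/cpuinfo; the argument is a hypothetical
# convenience so the check can be tried against a sample file.
has_hw_virt() {
    grep -Eqw 'vmx|svm' "${1:-/proc/cpuinfo}"
}

# Example use:
#   has_hw_virt && echo "hardware virtualization flag found" \
#               || echo "no vmx/svm flag found"
```

If no flag is found, QEMU can still run guests, but without KVM acceleration.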
Networking
By default, libvirt uses NAT for VM connectivity. If you want to use the default configuration, you need to load the tun module.
# modprobe tun
To load the tun module automatically at boot, add it to a file under /etc/modules-load.d:
# echo "tun" >> /etc/modules-load.d/tun.conf
Alternatively, append it to /etc/modules if it is not already listed there:
# grep -qx tun /etc/modules || echo tun >> /etc/modules
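The "append only if missing" idea above can be wrapped in a small helper. This is a minimal sketch; the function name ensure_module and its optional file argument are assumptions for illustration, not an Alpine tool.

```shell
#!/bin/sh
# ensure_module MODULE [FILE]
# Appends MODULE to FILE (default /etc/modules) unless a line consisting of
# exactly that name is already present, so repeated runs add no duplicates.
ensure_module() {
    module=$1
    file=${2:-/etc/modules}
    touch "$file"
    grep -qxF -- "$module" "$file" || printf '%s\n' "$module" >> "$file"
}

# Example use (as root):
#   ensure_module tun
```

Matching the whole line (-x) as a fixed string (-F) avoids being fooled by other module names that merely contain "tun", such as tunnel4.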
If you prefer bridging a guest over your Ethernet interface, you need to make a bridge.
Bridges are quite common in KVM environments. When IPv6 is in use, however, Alpine assigns the bridge a link-local address, plus a SLAAC address if a router on that network sends Router Advertisements. This is undesirable, because the KVM host should not have an IP address in every network it serves to guests. Unfortunately, IPv6 cannot simply be disabled for the bridge in a sysctl configuration file, because the bridge might not be up yet when the sysctl configuration is applied during boot. What works is a post-up hook in the /etc/network/interfaces file, like this:
auto brlan
iface brlan inet manual
    bridge-ports eth1.5
    bridge-stp 0
    post-up ip -6 a flush dev brlan; sysctl -w net.ipv6.conf.brlan.disable_ipv6=1
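Once the bridge is up, the effect of the post-up hook can be checked by reading the same setting back from /proc/sys. The following is a minimal sketch; the function name ipv6_disabled and its optional root-directory argument are hypothetical and exist only for illustration.

```shell
#!/bin/sh
# ipv6_disabled IFACE [SYSCTL_ROOT]
# Succeeds if net.ipv6.conf.IFACE.disable_ipv6 is 1.
# SYSCTL_ROOT defaults to /proc/sys; it is a parameter only so the function
# can be tried against a sample directory tree.
ipv6_disabled() {
    iface=$1
    root=${2:-/proc/sys}
    [ "$(cat "$root/net/ipv6/conf/$iface/disable_ipv6" 2>/dev/null)" = "1" ]
}

# Example use:
#   ipv6_disabled brlan && echo "IPv6 is disabled on brlan"
```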
Management
For non-root management, you will need to add your user to the libvirt group.
# addgroup user libvirt
You can use libvirt's virsh at the CLI. It can execute commands as well as run as an interactive shell. Read its manual page and/or use the "help" command for more info. Some basic commands are:
virsh help
virsh list --all
virsh start $domain
virsh shutdown $domain
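These commands combine well in scripts. As a sketch, the function below asks every running domain to shut down by feeding the output of virsh list --name --state-running into virsh shutdown. The function name shutdown_running and the VIRSH override variable are assumptions made for this example.

```shell
#!/bin/sh
# VIRSH may be overridden, e.g. VIRSH="virsh -c qemu:///system".
VIRSH=${VIRSH:-virsh}

# shutdown_running
# Requests a clean shutdown of every running domain.
shutdown_running() {
    $VIRSH list --name --state-running | while read -r domain; do
        # The list ends with an empty line; skip it.
        [ -n "$domain" ] || continue
        # Detach stdin so the command cannot consume the domain list.
        $VIRSH shutdown "$domain" </dev/null
    done
}

# Example use:
#   shutdown_running
```

Note that virsh shutdown only sends the request; guests that ignore it keep running.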
The libvirt project provides a GUI for managing hosts, called virt-manager. It handles local systems as well as remote ones via SSH.
# apk add dbus polkit virt-manager font-terminus
# rc-update add dbus
To use libvirtd to control KVM remotely over SSH, PolicyKit needs a .pkla file informing it that this is allowed. Write the following to /etc/polkit-1/localauthority/50-local.d/50-libvirt-ssh-remote-access-policy.pkla:
[Remote libvirt SSH access]
Identity=unix-group:libvirt
Action=org.libvirt.unix.manage
ResultAny=yes
ResultInactive=yes
ResultActive=yes
Provision an Alpine Linux vm with virt-install
You can use virt-install to install Alpine in a VM. First create the meta-data and user-data files.
meta-data:
hostname: alpine-vm
user-data:
#alpine-config
ssh_authorized_keys:
  - ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIOIiHcbg/7ytfLFHUNLRgEAubFz/13SwXBOM/05GNZe4 ncopa@ncopa-desktop
apk:
  repositories:
    - base_url: https://dl-cdn.alpinelinux.org/alpine
      repos:
        - main
        - community
packages:
  - tmux
  - curl
runcmd:
  - rm /etc/runlevels/*/tiny-cloud*
  - lbu include /root/.ssh /home/alpine/.ssh
  - ERASE_DISKS=/dev/vda setup-disk -m sys /dev/vda
  - poweroff
Then run:
virt-install --name alpine-vm \
  --disk size=4 \
  --location $HOME/Downloads/alpine-virt-3.20.1-x86_64.iso,kernel=boot/vmlinuz-virt,initrd=boot/initramfs-virt \
  --extra-args console=ttyS0 \
  --osinfo alpinelinux3.19 \
  --graphics none \
  --console pty,target_type=serial \
  --cloud-init meta-data=meta-data,user-data=user-data
Guest lifecycle management
The libvirt-guests service (available from Alpine 3.13.5) allows running guests to be automatically suspended or shut down when the host is shut down or rebooted.
The service is configured in /etc/conf.d/libvirt-guests. Enable the service with:
# rc-update add libvirt-guests
vfio
VFIO is a more flexible way to do PCI passthrough. Suppose you want to use the following Ethernet card as a PCI device in a VM.
# lspci | grep 02:00.0
02:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
# lspci -n -s 02:00.0
02:00.0 0200: 8086:10c9 (rev 01)
First, create /etc/mkinitfs/features.d/vfio.modules with the following content, so mkinitfs includes the VFIO modules in the initramfs.
kernel/drivers/vfio/vfio.ko.*
kernel/drivers/vfio/vfio_virqfd.ko.*
kernel/drivers/vfio/vfio_iommu_type1.ko.*
kernel/drivers/vfio/pci/vfio-pci.ko.*
Add vfio to the list of features in /etc/mkinitfs/mkinitfs.conf.
Create the following file so that the modules are loaded with the required options, then rebuild the initramfs with mkinitfs.
# cat > /etc/modprobe.d/vfio.conf <<EOF
options vfio-pci ids=8086:10c9
options vfio_iommu_type1 allow_unsafe_interrupts=1
softdep igb pre: vfio-pci
EOF
# mkinitfs
Next, edit the "default_kernel_opts" and "modules" settings in /etc/update-extlinux.conf. Add intel_iommu=on iommu=pt to "default_kernel_opts" on Intel platforms (AMD uses amd_iommu=on), and add the VFIO modules to "modules".
# grep '^default_kernel_opts\|^modules' /etc/update-extlinux.conf
default_kernel_opts="quiet rootfstype=ext4 intel_iommu=on iommu=pt"
modules=sd-mod,usb-storage,ext4,raid1,vfio,vfio-pci,vfio_iommu_type1,vfio_virqfd
For syslinux/extlinux, run:
# update-extlinux
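For GRUB (which now also uses update-extlinux.conf if present, see https://web.archive.org/web/20220821140615/https://git.alpinelinux.org/aports/tree/main/grub/alpine-mkconfig.patch#n9), run: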
# grub-mkconfig -o /boot/grub/grub.cfg
Reboot and check dmesg.
# grep -i -e DMAR -e IOMMU /var/log/dmesg
[ 0.343795] DMAR: Host address width 36
[ 0.343797] DMAR: DRHD base: 0x000000fed90000 flags: 0x1
[ 0.343804] DMAR: dmar0: reg_base_addr fed90000 ver 1:0 cap c90780106f0462 ecap f020e3
[ 0.343806] DMAR: RMRR base: 0x000000000ed000 end: 0x000000000effff
[ 0.343807] DMAR: RMRR base: 0x000000bf7ed000 end: 0x000000bf7fffff
[ 0.553830] iommu: Default domain type: Passthrough (set via kernel command line)
[ 0.902477] DMAR: No ATSR found
[ 0.902563] DMAR: dmar0: Using Queued invalidation
...
[ 0.903256] pci 0000:02:00.0: Adding to iommu group 12
...
[ 0.903768] DMAR: Intel(R) Virtualization Technology for Directed I/O
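Besides the dmesg output, it can help to confirm that the card is actually bound to vfio-pci rather than its normal driver. The sketch below reads the driver symlink that sysfs exposes for each PCI device; the function name pci_driver and its optional sysfs-root argument are assumptions for illustration.

```shell
#!/bin/sh
# pci_driver ADDRESS [SYSFS_ROOT]
# Prints the name of the kernel driver bound to a PCI device (e.g. vfio-pci).
# ADDRESS is the full address including the domain, such as 0000:02:00.0.
# Returns non-zero if no driver is bound. SYSFS_ROOT defaults to /sys.
pci_driver() {
    link="${2:-/sys}/bus/pci/devices/$1/driver"
    [ -L "$link" ] || return 1
    basename "$(readlink "$link")"
}

# Example use:
#   [ "$(pci_driver 0000:02:00.0)" = "vfio-pci" ] && echo "bound to vfio-pci"
```

The same information is shown by lspci -k -s 02:00.0 in its "Kernel driver in use" line.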
If you do not run libvirt VMs as root (check with grep -E '^#*user' /etc/libvirt/qemu.conf), the user that runs them needs the correct permissions on /dev/vfio/<iommu_group>, e.g. /dev/vfio/12. To set these, adjust /etc/mdev.conf or your udev rules. Also note that if there are multiple PCI devices in the same IOMMU group, you always have to add all of them to the VM. Otherwise you will get an error message like "Please ensure all devices within the iommu_group are bound to their vfio bus driver".
# virsh dumpxml vm01 | xmllint --xpath '//*/hostdev' -
<hostdev mode="subsystem" type="pci" managed="yes">
  <driver name="vfio"/>
  <source>
    <address domain="0x0000" bus="0x02" slot="0x00" function="0x0"/>
  </source>
  <alias name="hostdev0"/>
  <address type="pci" domain="0x0000" bus="0x00" slot="0x06" function="0x0"/>
</hostdev>
<hostdev mode="subsystem" type="pci" managed="yes">
  <driver name="vfio"/>
  <source>
    <address domain="0x0000" bus="0x02" slot="0x00" function="0x1"/>
  </source>
  <alias name="hostdev1"/>
  <address type="pci" domain="0x0000" bus="0x00" slot="0x08" function="0x0"/>
</hostdev>
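To find out which devices share an IOMMU group, and therefore have to be passed through together, the group directory in sysfs can be listed. This is a minimal sketch; the function name iommu_group_devices and its optional sysfs-root argument are hypothetical.

```shell
#!/bin/sh
# iommu_group_devices GROUP [SYSFS_ROOT]
# Prints the PCI address of every device in IOMMU group GROUP, one per line.
# Returns non-zero if the group does not exist. SYSFS_ROOT defaults to /sys.
iommu_group_devices() {
    dir="${2:-/sys}/kernel/iommu_groups/$1/devices"
    [ -d "$dir" ] || return 1
    for dev in "$dir"/*; do
        # Skip the unexpanded pattern if the group is empty.
        [ -e "$dev" ] || continue
        basename "$dev"
    done
}

# Example use, for the group number reported in dmesg:
#   iommu_group_devices 12
```

For the two-function card in the XML above, such a listing would be expected to include both 0000:02:00.0 and 0000:02:00.1 if they share a group.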
If you use QEMU directly, without libvirt, and try to pass a GPU to your VM, you may get a "VFIO_MAP_DMA failed: Out of memory" error when starting the VM as a non-root user. One way to fix it is to install the shadow package and increase the amount of memory the user can lock in the /etc/security/limits.conf file:
# apk add shadow
# echo "youruser soft memlock RAMamount" >> /etc/security/limits.conf
# echo "youruser hard memlock RAMamount" >> /etc/security/limits.conf
# reboot
Replace "youruser" with the user that will run the VM, and "RAMamount" with the amount of RAM the VM needs, in KB. A limit equal to exactly the VM's RAM may still produce the same error, so add some headroom, typically about 40 MB.
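The arithmetic for the limit is simple but easy to get wrong because of the unit change from MB to KB. As a sketch, the function below adds the headroom to the VM's RAM and converts the result to KB. The function name memlock_kb and the 8 GB VM in the usage comment are made up for illustration.

```shell
#!/bin/sh
# memlock_kb VM_RAM_MB [HEADROOM_MB]
# Prints a memlock limit in KB: VM RAM plus headroom (default 40 MB).
memlock_kb() {
    echo $(( ($1 + ${2:-40}) * 1024 ))
}

# Example use: print limits.conf lines for "youruser" and an 8192 MB VM.
#   limit=$(memlock_kb 8192)
#   printf 'youruser soft memlock %s\nyouruser hard memlock %s\n' "$limit" "$limit"
```

For an 8192 MB VM with the default headroom this gives (8192 + 40) * 1024 = 8429568 KB.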
More information is available in the ArchWiki article on PCI passthrough via OVMF (https://wiki.archlinux.org/index.php/PCI_passthrough_via_OVMF).