By default, libvirt runs QEMU with a CPU model that doesn't support nested
virtualization. It's possible to change this behavior by using the
VirtletCPUModel: host-model annotation in the pod definition.
You can also use the cpuModel value in the Virtlet config to override the
value globally for the cluster or for a particular subset of nodes.
If you are familiar with the cpu part of the libvirt domain definition, you can
use the VirtletLibvirtCPUSetting annotation. Its value is read as a YAML string
and passed directly to libvirt. It is more flexible than VirtletCPUModel
because it allows a more detailed configuration:
    annotations:
      VirtletLibvirtCPUSetting: |
        mode: custom
        model:
          value: Westmere
        features:
        - name: avx
          policy: disable
See cpuSetting for a full example.
Resource monitoring on the node
Kubelet uses cAdvisor to collect metrics about running containers, but Virtlet doesn't create a container for each VM and instead spawns VMs inside the Virtlet container. As a result, all the resource usage is lumped together and ascribed to the Virtlet pod.
Using fixed SMBIOS UUID
By default, VM pods use autogenerated SMBIOS UUID values. Some images may expect it to have a fixed value,
for example, due to software license requirements. In such cases, the value of the SMBIOS UUID can be passed
using the VirtletSystemUUID annotation:

    annotations:
      VirtletSystemUUID: 53008994-44c0-4017-ad44-9c49758083da
Note: Virtlet can't handle multiple VMs with the same SMBIOS UUID on the same node. There can be multiple VM pods with the same SMBIOS UUIDs residing on different nodes in the cluster, though.
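The constraint in the note can be checked before VM pods are placed. The following is a minimal sketch, not part of Virtlet: `find_uuid_conflicts` and the list of (node, UUID) pairs are hypothetical and stand in for whatever inventory the cluster operator keeps.

```python
import uuid
from collections import defaultdict


def find_uuid_conflicts(placements):
    """Return {(node, uuid): count} for UUIDs used more than once on a node.

    `placements` is a hypothetical iterable of (node_name, uuid_string) pairs.
    """
    seen = defaultdict(int)
    for node, raw in placements:
        # uuid.UUID normalizes case and raises ValueError on malformed input.
        seen[(node, str(uuid.UUID(raw)))] += 1
    return {key: count for key, count in seen.items() if count > 1}


placements = [
    ("node-1", "53008994-44c0-4017-ad44-9c49758083da"),
    ("node-2", "53008994-44c0-4017-ad44-9c49758083da"),  # other node: allowed
    ("node-1", "53008994-44C0-4017-AD44-9C49758083DA"),  # same node: conflict
]
print(find_uuid_conflicts(placements))
```

Only the pair on node-1 is reported; the same UUID on node-2 is not a conflict.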
CPU cgroups facilities:
- shares - relative value of CPU time assigned. Not recommended for production use, as it's hard to predict the actual performance, which depends heavily on the neighboring cgroups.
- CFS CPU bandwidth control - period and quota - hard limits.
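In a cgroup hierarchy, the bandwidth of each child (its quota relative to its period) cannot exceed that of the parent, while the sum of the children's bandwidths may exceed it:
Parent_Quota/Period <= Child_1_Quota/Period + .. + Child_N_Quota/Period is allowed, where
each Child_i_Quota/Period <= Parent_Quota/Period.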
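A minimal sketch of the hierarchy rule above, assuming that bandwidth is expressed as quota divided by period (both in microseconds). The function names are illustrative only.

```python
def bandwidth(quota_us, period_us):
    """CPU bandwidth of a cgroup in CPUs (1.0 == one full CPU)."""
    return quota_us / period_us


def children_fit_parent(parent, children):
    """True if every child's bandwidth is within the parent's.

    Each cgroup is a (quota_us, period_us) tuple. The sum of the children
    is deliberately not checked: the aggregate may be oversubscribed.
    """
    limit = bandwidth(*parent)
    return all(bandwidth(*child) <= limit for child in children)


parent = (200_000, 100_000)          # 2 CPUs
children = [(150_000, 100_000)] * 2  # 1.5 CPUs each, 3 CPUs in total
print(children_fit_parent(parent, children))              # True
print(children_fit_parent(parent, [(300_000, 100_000)]))  # False
```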
K8s CPU allocation:
- shares are set per container.
- CFS CPU bandwidth control - period and quota - are set per container.
Defaults: In the absence of explicitly set values, each container gets 2 shares.
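To make the relation between container CPU values and cgroup settings concrete, here is a minimal sketch of the conversion the kubelet is commonly described as performing. The constants (1024 shares per CPU, a 100 ms period, a 1 ms minimum quota) are assumptions for illustration and should be checked against the Kubernetes version in use.

```python
MIN_SHARES = 2               # default when no CPU request is set
SHARES_PER_CPU = 1024        # assumed shares for one full CPU
DEFAULT_PERIOD_US = 100_000  # assumed CFS period of 100 ms
MIN_QUOTA_US = 1_000         # assumed minimum quota of 1 ms


def milli_cpu_to_shares(milli_cpu):
    """Convert a CPU request in millicores to cgroup CPU shares."""
    if milli_cpu == 0:
        return MIN_SHARES
    return max(milli_cpu * SHARES_PER_CPU // 1000, MIN_SHARES)


def milli_cpu_to_quota(milli_cpu, period_us=DEFAULT_PERIOD_US):
    """Convert a CPU limit in millicores to a CFS quota; None means no limit."""
    if milli_cpu == 0:
        return None
    return max(milli_cpu * period_us // 1000, MIN_QUOTA_US)


print(milli_cpu_to_shares(0))    # 2
print(milli_cpu_to_shares(500))  # 512
print(milli_cpu_to_quota(500))   # 50000
```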
Libvirt CPU allocation:
1. shares is set per each vCPU.
2. period and quota are set per each vCPU. Because libvirt imposes limits on each vCPU thread, the actual quota of the domain is the quota value from the domain definition times the number of vCPUs. More details on the reasons for libvirt's per-vCPU cgroup approach can be found there.
3. emulator_period and emulator_quota denote the limits for emulator threads (those excluding vCPUs). Benchmarks show that, for domains without limits, these activities may amount to 40-80% of the overall physical CPU usage of the QEMU/KVM process running the guest VM.
4. vCPUs per VM - it's commonly recommended to set the vCPU count to 1 (see details in section "CPU overcommit" below).
Defaults: In the absence of explicitly set values, each domain gets 1024 shares.
CPU overcommit
The Linux scheduler is reported not to perform well under CPU overcommitment. Unless there is a real need for multiple vCPUs (for example, a multi-core VM used for builds and compilation, or an application that was designed for parallel processing and can effectively utilize multiple cores), it is widely recommended to use one vCPU per VM; otherwise you can expect performance degradation.
It is not recommended to have more than 10 virtual CPUs per physical processor core. Any number of overcommitted virtual CPUs above the number of physical processor cores may cause problems with certain virtualized guests, so it's always up to cluster administrators to decide how many vCPUs to assign to each VM.
See more considerations on KVM limitations.
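The guidance above can be turned into a quick check for a node. This is a minimal sketch; the function names and the three-way classification are illustrative assumptions, and only the limit of 10 vCPUs per core comes from the text.

```python
MAX_VCPUS_PER_CORE = 10  # upper bound quoted in the text above


def overcommit_ratio(vcpus_per_vm, physical_cores):
    """Total vCPUs across all VMs divided by physical cores on the node."""
    if physical_cores <= 0:
        raise ValueError("physical_cores must be positive")
    return sum(vcpus_per_vm) / physical_cores


def classify(vcpus_per_vm, physical_cores):
    """Label a node by how far its vCPUs exceed its physical cores."""
    ratio = overcommit_ratio(vcpus_per_vm, physical_cores)
    if ratio <= 1:
        return "not overcommitted"
    if ratio <= MAX_VCPUS_PER_CORE:
        return "overcommitted"
    return "beyond recommended limit"


print(classify([1] * 8, 8))   # not overcommitted
print(classify([1] * 20, 8))  # overcommitted
print(classify([4] * 24, 8))  # beyond recommended limit
```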
Virtlet CPU resources management
- By default, all VMs are created with 1 vCPU.
To change the number of vCPUs for a VM pod, add the
VirtletVCPUCount annotation with the desired number, see examples/cirros-vm.yaml.
- Due to item 2 in "Libvirt CPU allocation", Virtlet spreads the assigned CPU resource limit equally among the VM's vCPU threads.
- According to item 3 in "Libvirt CPU allocation", Virtlet must set limits for emulator threads (those excluding vCPUs). At this time Virtlet doesn't support setting these values, but there are plans to fix this in the future.
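The equal spreading of the CPU limit among vCPU threads can be illustrated with a little arithmetic. This is a minimal sketch and not Virtlet's actual code: the 100 ms period and the integer rounding are assumptions.

```python
DEFAULT_PERIOD_US = 100_000  # assumed CFS period; not taken from Virtlet


def per_vcpu_quota(cpu_limit_milli, vcpu_count, period_us=DEFAULT_PERIOD_US):
    """Split a pod CPU limit (millicores) equally among vCPU threads."""
    if vcpu_count < 1:
        raise ValueError("vcpu_count must be at least 1")
    total_quota_us = cpu_limit_milli * period_us // 1000
    return total_quota_us // vcpu_count


def domain_quota(quota_per_vcpu_us, vcpu_count):
    """Effective quota of the whole domain: per-vCPU quota times vCPUs."""
    return quota_per_vcpu_us * vcpu_count


quota = per_vcpu_quota(2000, 4)        # a limit of 2 CPUs across 4 vCPUs
print(quota, domain_quota(quota, 4))   # 50000 200000
```

With a limit of 2 CPUs and 4 vCPUs, each vCPU thread gets half a CPU, and the per-vCPU quotas add up to the original limit.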
K8s memory allocation
Setting memory limit to 0 or omitting it means there's no memory limit for the container. K8s doesn't support swap on the nodes (for example, k8s creates docker containers with --memory-swappiness=0, see more at https://github.com/kubernetes/kubernetes/issues/7294).
- memory - RAM allocated to the VM at boot.
- memtune=>hard_limit - cgroup memory limit for the whole domain, including QEMU's own usage. However, it's claimed that such a limit should be set accurately.
- Swap is unlimited by default.
The overcommitted memory value can reach ~150% of the physical RAM amount. This relies on the assumption that most processes do not access 100% of their allocated memory all the time, so you can grant guest VMs more RAM than is actually available on the host. However, this strongly depends on the swap size available on the node and on the memory consumption of the VM workloads.
For more details check Overcommitting with KVM.
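As a rough worked example of the ~150% figure, the following sketch computes how much memory could still be granted to VMs on a node. The factor is only the approximate guide quoted above, and the function name is hypothetical.

```python
OVERCOMMIT_FACTOR = 1.5  # the ~150% figure from the text; a rough guide only


def memory_headroom_mib(physical_ram_mib, vm_memory_mib, factor=OVERCOMMIT_FACTOR):
    """Memory (MiB) still grantable to VMs under the overcommit budget.

    A negative result means the VMs already exceed the budget.
    """
    budget = int(physical_ram_mib * factor)
    return budget - sum(vm_memory_mib)


# A 16 GiB node running twenty 1 GiB VMs: budget is 24576 MiB.
print(memory_headroom_mib(16384, [1024] * 20))  # 4096
```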
Virtlet Memory resources management
- By default, each VM is assigned 1GB of RAM. To set a different value, set the memory resource limit for the container, see examples/cirros-vm.yaml.
- Virtlet generates domain XML with memoryBacking=locked setting to prevent swapping out domain's pages.
- According to items 2 and 3 in "Libvirt CPU allocation", a rule is needed for spreading the CFS CPU bandwidth limit among QEMU and vCPU threads, so that the k8s scheduler makes the right assumptions about the resources allocated on the node.
- Research how to configure the hard limits on memory for a VM pod.