Lifecycle of a VM pod
This document describes the lifecycle of VM pod managed by Virtlet.
Communication between kubelet and Virtlet goes through criproxy which directs requests to Virtlet only if the requests concern a pod that has Virtlet-specific annotation or an image that has Virtlet-specific prefix.
VM Pod Startup
- A pod is created in Kubernetes cluster, either directly by the user or via
some other mechanism such as a higher-level Kubernetes object managed by
kube-controller-manager(ReplicaSet, DaemonSet etc.).
- Scheduler places the pod on a node based on the requested resources (CPU, memory, etc.) as well as pod's nodeSelector and pod/node affinity constraints, taints/tolerations and so on.
kubeletrunning on the target node accepts the pod.
kubeletinvokes a CRI call RunPodSandbox to create the pod sandbox which will enclose all the containers in the pod definition. Note that at this point no information about the containers within the pod is passed to the call.
kubeletcan later request the information about the pod by means of
- If there's a Virtlet-specific annotation
kubernetes.io/target-runtime: virtlet.cloud, CRI proxy passes the call to Virtlet.
- Virtlet saves sandbox metadata in its internal database, sets up the
network namespace and then uses internal
tapmanagermechanism to invoke
ADDoperation via the CNI plugin as specified by the CNI configuration on the node.
- The CNI plugin configures the network namespace by setting up network interfaces, IP addresses, routes, iptables rules and so on, and returns the network configuration information to the caller as described in the CNI spec.
tapmanagermechanism adjusts the configuration of the network namespace to make it work with the VM.
- After creating the sandbox, kubelet starts the containers defined in the pod sandbox. Currently, Virtlet supports just one container per VM pod. So, the VM pod startup steps after this one describe the startup of this single container.
- Depending on the image pull police of the container, kubelet checks if
the image needs to be pulled by means of
ImageStatuscall and then uses
PullImageCRI call to pull the image if it doesn't exist or if
imagePullPolicy: Alwaysis used.
PullImageis invoked, Virtlet resolves the image location based on the image name translation configuration, then downloads the file and stores it in the image store.
- After the image is ready (no pull was needed or the
PullImagecall completed successfully), kubelet uses
CreateContainerCRI call to create the container in the pod sandbox using the specified image.
- Virtlet uses the sandbox and container metadata to generate libvirt domain definition,
vmwrapperbinary as the emulator and without specifying any network configuration in the domain.
StartContainercall on the newly created container.
- Virtlet starts the libvirt domain. libvirt invokes
vmwrapperas the emulator, passing it the necessary command line arguments as well as environment variables set by Virtlet.
vmwrapperuses the environment variable values passed to Virtlet to communicate with
tapmanagerover an Unix domain socket, retrieving a file descriptor for a tap device and/or pci address of SR-IOV device set up by
tapmanageruses its own simple protocol to communicate with
vmwrapperbecause it needs to send file descriptors over the socket. This is not usually supported by RPC libraries, see e.g. grpc/grpc#11417.
vmwrapperthen updates the command line arguments to include the network interface information and execs the actual emulator (
At this point the VM is running and accessible via the network, and the pod is
Running state as well as it's only container.
Deleting a pod
This sequence is initiated when the pod is deleted, either by means of
or a controller manager action due to deletion or downscaling of a higher-level object.
kubeletnotices the pod being deleted.
StopContainerCRI calls which is getting forwared to Virtlet based on the containing pod sandbox annotations.
- Virtlet stops the libvirt domain. libvirt sends a signal to
qemu, which initiates the shutdown. If it doesn't quit in a reasonable time determined by pod's termination grace period, Virtlet will forcibly terminate the domain, thus killing the
- After all the containers in the pod (the single container in case of
Virtlet VM pod) are stopped, kubelet invokes
- Virtlet asks its
tapmanagerto remove pod from the network by means of
StopPodSandboxreturns, the pod sandbox will be eventually GC'd by
kubeletby means of
RemovePodSandbox, Virtlet removes the pod metadata from its internal database.