Virtio: An I/O virtualization framework for Linux

Contents

Full virtualization vs. paravirtualization

An abstraction for Linux guests

Virtio architecture

Concept hierarchy

Virtio buffers

Core API

Example virtio drivers

Going further


Paravirtualized I/O with KVM and lguest

By M. Jones
Published January 29, 2010

In a nutshell, virtio is an abstraction layer over devices in a paravirtualized hypervisor. virtio was developed by Rusty Russell in support of his own virtualization solution called lguest. This article begins with an introduction to paravirtualization and emulated devices, and then explores the details of virtio. The focus is on the virtio framework from the 2.6.30 kernel release.

Linux is the hypervisor playground. As my article on Linux as a hypervisor showed, Linux offers a variety of hypervisor solutions with different attributes and advantages. Examples include the Kernel-based Virtual Machine (KVM), lguest, and User-mode Linux. Having these different hypervisor solutions on Linux can tax the operating system based on their independent needs. One of the taxes is virtualization of devices. Rather than have a variety of device emulation mechanisms (for network, block, and other drivers), virtio provides a common front end for these device emulations to standardize the interface and increase the reuse of code across the platforms.

Full virtualization vs. paravirtualization

Let’s start with a quick discussion of two distinct types of virtualization schemes: full virtualization and paravirtualization. In full virtualization, the guest operating system runs on top of a hypervisor that sits on the bare metal. The guest is unaware that it is being virtualized and requires no changes to work in this configuration. Conversely, in paravirtualization, the guest operating system is not only aware that it is running on a hypervisor but includes code to make guest-to-hypervisor transitions more efficient (see Figure 1).

In the full virtualization scheme, the hypervisor must emulate device hardware, which is emulating at the lowest level of the conversation (for example, to a network driver). Although the emulation is clean at this abstraction, it’s also the most inefficient and highly complicated. In the paravirtualization scheme, the guest and the hypervisor can work cooperatively to make this emulation efficient. The downside to the paravirtualization approach is that the operating system is aware that it’s being virtualized and requires modifications to work.

Figure 1. Device emulation in full virtualization and paravirtualization environments

Hardware continues to change with virtualization. New processors incorporate advanced instructions to make guest operating systems and hypervisor transitions more efficient. And hardware continues to change for input/output (I/O) virtualization, as well (see the Resources section to learn about Peripheral Component Interconnect [PCI] passthrough and single- and multi-root I/O virtualization).

But in traditional full virtualization environments, the hypervisor must trap these requests, and then emulate the behaviors of real hardware. Although doing so provides the greatest flexibility (namely, running an unmodified operating system), it does introduce inefficiency (see the left side of Figure 1). The right side of Figure 1 shows the paravirtualization case. Here, the guest operating system is aware that it’s running on a hypervisor and includes drivers that act as the front end. The hypervisor implements the back-end drivers for the particular device emulation. These front-end and back-end drivers are where virtio comes in, providing a standardized interface for the development of emulated device access to propagate code reuse and increase efficiency.

Virtio alternatives

virtio is not entirely alone in this space. Xen provides paravirtualized device drivers, and VMware provides what are called Guest Tools.

An abstraction for Linux guests

From the previous section, you can see that virtio is an abstraction for a set of common emulated devices in a paravirtualized hypervisor. This design allows the hypervisor to export a common set of emulated devices and make them available through a common application programming interface (API). Figure 2 illustrates why this is important. With paravirtualized hypervisors, the guests implement a common set of interfaces, with the particular device emulation behind a set of back-end drivers. The back-end drivers need not be common as long as they implement the required behaviors of the front end.

Figure 2. Driver abstractions with virtio

Note that in reality (though not required), the device emulation occurs in user space using QEMU, so the back-end drivers communicate into the user space of the hypervisor to facilitate I/O through QEMU. QEMU is a system emulator that, in addition to providing a guest operating system virtualization platform, provides emulation of an entire system (PCI host controller, disk, network, video hardware, USB controller, and other hardware elements).

The virtio API relies on a simple buffer abstraction to encapsulate the command and data needs of the guest. Let’s look at the internals of the virtio API and its components.

Virtio architecture

In addition to the front-end drivers (implemented in the guest operating system) and the back-end drivers (implemented in the hypervisor), virtio defines two layers to support guest-to-hypervisor communication. At the top level (called virtio) is the virtual queue interface that conceptually attaches front-end drivers to back-end drivers. Drivers can use zero or more queues, depending on their need. For example, the virtio network driver uses two virtual queues (one for receive and one for transmit), whereas the virtio block driver uses only one. Virtual queues, being virtual, are actually implemented as rings to traverse the guest-to-hypervisor transition. The ring is just the current transport choice, though; any mechanism would work, as long as both the guest and hypervisor implement it in the same way.
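
The ring transport that backs a virtqueue in this kernel generation is declared in include/linux/virtio_ring.h. The following is a condensed sketch of those structures (with comments added here) just to make the descriptor/available/used split concrete; see the header itself for the authoritative layout.

/* Condensed from include/linux/virtio_ring.h (2.6.30 time frame). */
struct vring_desc {              /* one slot in the descriptor table */
        __u64 addr;              /* guest-physical address of a buffer */
        __u32 len;               /* length of that buffer */
        __u16 flags;             /* e.g. chained to "next", device-writable */
        __u16 next;              /* index of the next descriptor in a chain */
};

struct vring_avail {             /* guest -> hypervisor: buffers offered */
        __u16 flags;
        __u16 idx;
        __u16 ring[];
};

struct vring_used_elem {
        __u32 id;                /* head of the descriptor chain consumed */
        __u32 len;               /* bytes written back by the hypervisor */
};

struct vring_used {              /* hypervisor -> guest: buffers consumed */
        __u16 flags;
        __u16 idx;
        struct vring_used_elem ring[];
};

struct vring {
        unsigned int num;        /* number of descriptors (a power of two) */
        struct vring_desc *desc;
        struct vring_avail *avail;
        struct vring_used *used;
};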

Figure 3. High-level architecture of the virtio framework

As shown in Figure 3, five front-end drivers are listed for block devices (such as disks), network devices, PCI emulation, a balloon driver (for dynamically managing guest memory usage), and a console driver. Each front-end driver has a corresponding back-end driver in the hypervisor.

Concept hierarchy

From the perspective of the guest, an object hierarchy is defined as shown in Figure 4. At the top is the virtio_driver, which represents the front-end driver in the guest. Devices that match this driver are encapsulated by the virtio_device (a representation of the device in the guest). This refers to the virtio_config_ops structure (which defines the operations for configuring the virtio device). The virtio_device is referred to by the virtqueue (which includes a reference to the virtio_device it serves). Finally, each virtqueue object references the virtqueue_ops object, which defines the underlying queue operations for dealing with the hypervisor driver. Although the queue operations are the core of the virtio API, I provide a brief discussion of discovery, and then explore the virtqueue_ops operations in more detail.

Figure 4. Object hierarchy of the virtio front end

The process begins with the creation of a virtio_driver and subsequent registration via register_virtio_driver. The virtio_driver structure defines the upper-level device driver, list of device IDs that the driver supports, a features table (dependent upon the device type), and a list of callback functions. When the hypervisor identifies the presence of a new device that matches a device ID in the device list, the probe function is called (provided in the virtio_driver object) to pass up the virtio_device object. This object is cached with the management data for the device (in a driver-dependent way). Depending on the driver type, the virtio_config_ops functions may be invoked to get or set options specific to the device (for example, getting the Read/Write status of the disk for a virtio_blk device or setting the block size of the block device).
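
As a rough illustration of this registration flow, here is a minimal front-end driver skeleton written against the 2.6.30-era interfaces described in this article. The driver name and the my_* identifiers are hypothetical placeholders, not an in-tree driver.

#include <linux/module.h>
#include <linux/virtio.h>
#include <linux/virtio_config.h>

/* Match virtio device ID 1 (the network device) from any vendor; a real
 * driver would use the VIRTIO_ID_* constant for its device type. */
static struct virtio_device_id my_id_table[] = {
        { 1, VIRTIO_DEV_ANY_ID },
        { 0 },
};

static int my_virtio_probe(struct virtio_device *vdev)
{
        /* Discover virtqueues and cache per-device state in vdev->priv;
         * expanded in the next sketch. */
        return 0;
}

static void my_virtio_remove(struct virtio_device *vdev)
{
        vdev->config->reset(vdev);      /* quiesce the device */
        /* ...delete its virtqueues and free private state here... */
}

static struct virtio_driver my_virtio_driver = {
        .driver.name  = "my-virtio-driver",
        .driver.owner = THIS_MODULE,
        .id_table     = my_id_table,
        .probe        = my_virtio_probe,
        .remove       = my_virtio_remove,
};

static int __init my_init(void)
{
        return register_virtio_driver(&my_virtio_driver);
}

static void __exit my_exit(void)
{
        unregister_virtio_driver(&my_virtio_driver);
}

module_init(my_init);
module_exit(my_exit);
MODULE_LICENSE("GPL");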

Note that the virtio_device includes no reference to the virtqueue (but the virtqueue does reference the virtio_device). To identify the virtqueues associated with this virtio_device, you use the virtio_config_ops object's find_vq function. This function returns the virtual queues associated with this virtio_device instance. The find_vq function also permits the specification of a callback function for the virtqueue (see the virtqueue structure in Figure 4), which is used to notify the guest of response buffers from the hypervisor.
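
Expanding the probe stub from the previous sketch, a driver might discover its queue and register its callback roughly as follows. The find_vq call matches the 2.6.30-era virtio_config_ops; my_recv_done is again a placeholder and is filled in under the Core API section below.

#include <linux/err.h>

static void my_recv_done(struct virtqueue *vq);  /* defined under Core API */

static int my_virtio_probe(struct virtio_device *vdev)
{
        struct virtqueue *vq;

        /* Ask the transport for queue 0; my_recv_done runs when the
         * hypervisor hands buffers back to the guest. */
        vq = vdev->config->find_vq(vdev, 0, my_recv_done);
        if (IS_ERR(vq))
                return PTR_ERR(vq);

        vdev->priv = vq;        /* stash the queue for later requests */
        return 0;
}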

The virtqueue is a simple structure that identifies an optional callback function (which is called when the hypervisor consumes the buffers), a reference to the virtio_device, a reference to the virtqueue operations, and a special priv reference that refers to the underlying implementation to use. Although the callback is optional, it’s possible to enable or disable callbacks dynamically.

But the core of this hierarchy is the virtqueue_ops, which defines how commands and data are moved between the guest and the hypervisor. Let’s first explore the object that is added or removed from the virtqueue.
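
For reference, those five operations look roughly like this in the 2.6.30 time frame (condensed from include/linux/virtio.h; the comments are added here):

struct virtqueue_ops {
        /* Expose a scatter-gather request to the hypervisor; "data" is an
         * opaque token handed back later by get_buf. */
        int (*add_buf)(struct virtqueue *vq,
                       struct scatterlist sg[],
                       unsigned int out_num,    /* readable by the hypervisor */
                       unsigned int in_num,     /* writable by the hypervisor */
                       void *data);

        void (*kick)(struct virtqueue *vq);     /* notify the hypervisor */

        /* Retrieve a completed request token and the number of bytes written. */
        void *(*get_buf)(struct virtqueue *vq, unsigned int *len);

        void (*disable_cb)(struct virtqueue *vq);
        bool (*enable_cb)(struct virtqueue *vq);
};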

Virtio buffers

Guest (front-end) drivers communicate with hypervisor (back-end) drivers through buffers. For an I/O, the guest provides one or more buffers representing the request. For example, you could provide three buffers, with the first representing a Read request and the subsequent two buffers representing the response data. Internally, this configuration is represented as a scatter-gather list (with each entry in the list representing an address and a length).
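
A sketch of that three-buffer read request might look like the following, loosely in the style of the virtio block driver. The header and status layouts here are simplified placeholders, not the actual virtio block protocol.

#include <linux/types.h>
#include <linux/scatterlist.h>

struct my_req_hdr {             /* simplified, hypothetical request header */
        u32 type;               /* e.g. a "read" opcode */
        u64 sector;             /* where to read from */
};

static void build_read_request(struct scatterlist sg[3],
                               struct my_req_hdr *hdr,
                               void *data, unsigned int data_len,
                               u8 *status)
{
        sg_init_table(sg, 3);
        sg_set_buf(&sg[0], hdr, sizeof(*hdr));          /* out: the read command   */
        sg_set_buf(&sg[1], data, data_len);             /* in:  filled by the host */
        sg_set_buf(&sg[2], status, sizeof(*status));    /* in:  completion status  */
}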

Core API

Linking the guest driver and hypervisor driver occurs through the virtio_device and most commonly through virtqueues. The virtqueue supports its own API consisting of five functions. You use the first function, add_buf, to provide a request to the hypervisor. This request is in the form of the scatter-gather list discussed previously. To add_buf, the guest provides the virtqueue to which the request is to be enqueued, the scatter-gather list (an array of addresses and lengths), the number of buffers that serve as out entries (destined for the underlying hypervisor), and the number of in entries (for which the hypervisor will store data and return to the guest). When a request has been made to the hypervisor through add_buf, the guest can notify the hypervisor of the new request using the kick function. For best performance, the guest should load as many buffers as possible onto the virtqueue before notifying through kick.
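
Tying this together with the scatter-gather sketch above, a hypothetical submission path could look like the following; my_request is a made-up per-request structure whose address doubles as the completion token returned later by get_buf.

struct my_request {             /* hypothetical per-request state */
        struct my_req_hdr hdr;
        void *data;
        unsigned int data_len;
        u8 status;
};

static int submit_read(struct virtqueue *vq, struct my_request *req)
{
        struct scatterlist sg[3];
        int err;

        build_read_request(sg, &req->hdr, req->data, req->data_len, &req->status);

        /* One "out" entry (the header) and two "in" entries (data + status);
         * req is handed back to us later by get_buf. */
        err = vq->vq_ops->add_buf(vq, sg, 1, 2, req);
        if (err < 0)
                return err;     /* ring full: retry after get_buf frees slots */

        /* Batch several add_buf calls before kicking when possible; each
         * kick is a guest-to-hypervisor transition. */
        vq->vq_ops->kick(vq);
        return 0;
}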

Responses from the hypervisor occur through the get_buf function. The guest can poll simply by calling this function or wait for notification through the provided virtqueue callback function. When the guest learns that buffers are available, the call to get_buf returns the completed buffers.

The final two functions in the virtqueue API are enable_cb and disable_cb. You can use these functions to enable and disable the callback process (via the callback function initialized in the virtqueue through the find_vq function). Note that the callback function and the hypervisor are in separate address spaces, so the call occurs through an indirect hypervisor call (such as kvm_hypercall).
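
A typical completion path, loosely modeled on what the in-tree drivers do, drains the queue with get_buf inside the callback and uses disable_cb/enable_cb to bound callback traffic. my_request and my_complete are the hypothetical names from the previous sketches; a real driver would also hold its own lock here, because the callback can run in interrupt context.

static void my_complete(struct my_request *req, unsigned int len)
{
        /* Hand the data and status back to the upper layer (not shown). */
}

static void my_recv_done(struct virtqueue *vq)
{
        struct my_request *req;
        unsigned int len;

again:
        vq->vq_ops->disable_cb(vq);     /* quiet callbacks while draining */

        /* get_buf returns the token passed to add_buf, plus the number of
         * bytes the hypervisor wrote into the "in" buffers. */
        while ((req = vq->vq_ops->get_buf(vq, &len)) != NULL)
                my_complete(req, len);

        /* enable_cb returns false if buffers arrived in the window before
         * callbacks were re-enabled, so drain again rather than miss them. */
        if (!vq->vq_ops->enable_cb(vq))
                goto again;
}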

The format, order, and contents of the buffers are meaningful only to the front-end and back-end drivers. The internal transport (rings in the current implementation) moves only buffers and has no knowledge of their internal representation.

Example virtio drivers

You can find the source to the various front-end drivers within the ./drivers subdirectory of the Linux kernel. The virtio network driver can be found in ./drivers/net/virtio_net.c, and the virtio block driver can be found in ./drivers/block/virtio_blk.c. The subdirectory ./drivers/virtio provides the implementation of the virtio interfaces (virtio device, driver, virtqueue, and ring). virtio has also been used in High-Performance Computing (HPC) research to develop inter-virtual machine (VM) communications through shared memory passing. Specifically, this was implemented through a virtualized PCI interface using the virtio PCI driver. You can learn more about this work in the Resources section.

You can exercise this paravirtualization infrastructure today in the Linux kernel. All you need is a kernel to act as the hypervisor, a guest kernel, and QEMU for device emulation. You can use either KVM (a module that exists in the host kernel) or Rusty Russell's lguest (a modified Linux guest kernel). Both of these virtualization solutions support virtio (along with QEMU for system emulation and libvirt for virtualization management).

The result of Rusty’s work is a simpler code base for paravirtualized drivers and faster emulation of virtual devices. But even more important, virtio has been found to provide better performance (2-3 times for network I/O) than current commercial solutions. This performance boost comes at a cost, but it’s well worth it if Linux is your hypervisor and guest.

Going further

Although you may never develop front-end or back-end drivers for virtio, it implements an interesting architecture and is worth understanding in more detail. virtio opens up new opportunities for efficiency in paravirtualized I/O environments while building from previous work in Xen. Linux continues to prove itself as a production hypervisor and a research platform for new virtualization technologies. virtio is yet another example of the strengths and openness of Linux as a hypervisor.
