Cloud data centers, built around cloud computing and characterized by advanced
technology, powerful computing capability, energy efficiency, and security, are
becoming a new type of data center that provides users with comprehensive,
diversified, and convenient software and hardware services. With the emergence
of new technologies such as AI, 5G, and IoT, the services of cloud data centers
are gradually diversifying. The focus of cloud data center infrastructure
construction has shifted from computing power to data, which also places
higher demands on data storage. As
an application virtualization technology for cloud data centers, distributed
storage can provide block storage, file storage, object storage, and other
services. However, internal software processing and network transmission in
the system can add performance overhead that prevents it from matching local
disk performance in terms of I/O access latency, leading to a poor user
experience. Therefore, local disks remain the main storage option for AI
analysis, large-scale distributed databases, and high-performance application
caches. Local disks, however, still suffer from low flexibility and low
utilization rates. To address these gaps, the "SR-IOV+SSD" solution is
proposed.
As a virtual pass-through technology, SR-IOV has been widely used in
scenarios such as networking and GPU heterogeneous computing. With the
exponential increase in SSD capacity and disk performance, it has become
feasible to apply SR-IOV technology to SSDs. With SR-IOV virtualization, a
single SSD can be virtualized into multiple virtual SSDs that are passed
through directly to virtual machines for internal use. Combined with
high-precision QoS capabilities, virtual machines can obtain performance
similar to that of local SSDs, because the processing overhead that the
hypervisor virtualization layer imposes on the storage device is reduced. In
combination with cloud platforms, SR-IOV solutions can allocate
high-performance storage resources dynamically, meeting the demand for
storage flexibility in scenarios such as AI, distributed databases, and
high-performance enterprise applications in a cloud environment.
SR-IOV is an extension to the PCIe specification defined by PCI-SIG. It aims to eliminate the intervention of the VMM (virtual machine monitor) in virtualized I/O operations, improve data transfer performance, and provide VMs (virtual machines) with independent memory space, interrupts, and DMA data streams. Under the PCIe specification, I/O devices that support SR-IOV can create and manage multiple VFs (virtual functions). The PCIe PF (physical function) is the main entity on the PCIe bus. A PCIe device has one or more PFs, which set the number of VFs and can globally start or stop them. VFs can access and transfer data without VMM intervention.
With SR-IOV, interrupts that were originally handled by the VMM are processed directly by the virtual machine, improving device I/O performance. At the same time, the virtual machine interacts directly with the PCIe device, greatly reducing the load on the physical host's CPU and allowing the host to support more virtual machine devices. SR-IOV also reduces the number of PCIe devices required, conserving PCIe slots and maximizing the utilization of hardware resources.
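As a rough illustration of how a PF sets the number of VFs, the sketch below uses the generic Linux sysfs attributes `sriov_totalvfs` and `sriov_numvfs` that the kernel exposes in the PF's PCI device directory. The function name `set_vf_count` and the example PCI address are hypothetical. An NVMe SSD may need further device-level configuration before its VFs are usable, and the vendor tooling for that step is not shown here.

```python
from pathlib import Path


def set_vf_count(pf_dir: Path, num_vfs: int) -> int:
    """Request `num_vfs` virtual functions on the PF whose sysfs directory is `pf_dir`.

    `pf_dir` is expected to look like /sys/bus/pci/devices/<domain:bus:dev.fn>.
    Returns the VF count reported by the kernel afterwards.
    """
    # sriov_totalvfs is the maximum number of VFs the PF can expose.
    total = int((pf_dir / "sriov_totalvfs").read_text().strip())
    if not 0 <= num_vfs <= total:
        raise ValueError(f"requested {num_vfs} VFs, device supports 0..{total}")

    numvfs_file = pf_dir / "sriov_numvfs"
    current = int(numvfs_file.read_text().strip())
    if current == num_vfs:
        return current

    # The kernel refuses to change one non-zero VF count directly into
    # another, so disable the existing VFs first.
    if current != 0 and num_vfs != 0:
        numvfs_file.write_text("0")

    numvfs_file.write_text(str(num_vfs))
    return int(numvfs_file.read_text().strip())
```

On a real host this would be called with root privileges, for example `set_vf_count(Path("/sys/bus/pci/devices/0000:3b:00.0"), 4)`, where the PCI address is a placeholder for the SSD's PF.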
Given these advantages, many SSD manufacturers have begun to deploy SR-IOV. Union Memory has implemented SR-IOV in its UH8 and UH7 series products, making the company a current leader in this area.
- Server: Self-developed TP6520
- CPU: 2 × Intel(R) Xeon(R) Gold 6336Y CPU @ 2.40 GHz, 24 cores/48 threads
- Memory: 16 × 32 GB 3200 MT/s DDR4
- System disk: 1 × 960 GB, 12 Gbps
- SSD: 1 × UH8 series, 3.84 TB
- HBA card: 1 × MegaRAID 9440-8i
- NIC: 2 × SC332
- Operating system: CentOS Linux release 8.3.2011
- Kernel: Linux 4.18.0-240.el8.x86_64
- Test tool: Version 2.6 and above
- QEMU-KVM: 2.12
- NVMe open source driver: 1.11.1
- Umtool: 1.0.1.5
For the verification of virtualized partitioning, a PCIe 4.0 3.84 TB SSD (bound to 16 CPU cores) is configured in three ways: 2 VFs of 1.92 TB each (each bound to 8 cores), 4 VFs of 960 GB each (each bound to 4 cores), and 8 VFs of 480 GB each (each bound to 2 cores), as shown in the figure below.
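The partitioning arithmetic above can be reproduced with a few lines of Python. This is only a sketch of the even split described in the text. The names `VfPlan` and `plan_vfs` are hypothetical and do not correspond to any Union Memory tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VfPlan:
    num_vfs: int
    capacity_gb: int   # capacity of each VF, in GB
    cores_per_vf: int  # CPU cores bound to each VF


def plan_vfs(total_capacity_gb: int, total_cores: int, num_vfs: int) -> VfPlan:
    """Split an SSD's capacity and its bound CPU cores evenly across VFs."""
    if num_vfs <= 0:
        raise ValueError("num_vfs must be positive")
    if total_capacity_gb % num_vfs or total_cores % num_vfs:
        raise ValueError("capacity and cores must divide evenly across VFs")
    return VfPlan(num_vfs, total_capacity_gb // num_vfs, total_cores // num_vfs)


# The three configurations from the text: a 3.84 TB SSD bound to 16 cores.
for n in (2, 4, 8):
    print(plan_vfs(3840, 16, n))
```

For 2, 4, and 8 VFs this yields 1920 GB with 8 cores, 960 GB with 4 cores, and 480 GB with 2 cores per VF, matching the configurations listed above.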
The basic I/O model of the test is as follows:
- Read/Write bandwidth (bs=128k,job=1,iodepth=128,read/write)
- Read IOPS (bs=4k,job=16,iodepth=128,randread)
- Write IOPS (bs=4k,job=8,iodepth=64,randwrite)
With the SSD divided into 2, 4, or 8 VFs, the SR-IOV solution is verified using the standard I/O test model.
The figure below compares the total VF performance with the performance of the undivided SSD for each VF division under SR-IOV (the data presented are from the current verification and are for reference only).
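The environment list does not name the test tool, but the parameters of the I/O model (bs, iodepth, randread, randwrite) resemble fio options. Assuming fio, the model could be turned into command lines roughly as follows. The I/O engine, direct I/O, the runtime, and the device path are assumptions that do not come from the original test setup.

```python
# I/O model from the text, expressed with fio option names.
# "job" in the text is assumed to correspond to fio's numjobs.
IO_MODELS = {
    "seq_read_bw": {"rw": "read", "bs": "128k", "numjobs": 1, "iodepth": 128},
    "seq_write_bw": {"rw": "write", "bs": "128k", "numjobs": 1, "iodepth": 128},
    "rand_read_iops": {"rw": "randread", "bs": "4k", "numjobs": 16, "iodepth": 128},
    "rand_write_iops": {"rw": "randwrite", "bs": "4k", "numjobs": 8, "iodepth": 64},
}


def fio_args(model: str, device: str, runtime_s: int = 60) -> list[str]:
    """Build an fio argument list for one entry of IO_MODELS."""
    m = IO_MODELS[model]
    return [
        "fio",
        f"--name={model}",
        f"--filename={device}",
        "--ioengine=libaio",  # assumption: asynchronous Linux I/O engine
        "--direct=1",         # assumption: bypass the page cache
        f"--rw={m['rw']}",
        f"--bs={m['bs']}",
        f"--numjobs={m['numjobs']}",
        f"--iodepth={m['iodepth']}",
        "--time_based",
        f"--runtime={runtime_s}",  # assumption: runtime is not given in the text
        "--group_reporting",
    ]


# /dev/nvme0n1 is a placeholder; write tests destroy data on the target device.
print(" ".join(fio_args("rand_read_iops", "/dev/nvme0n1")))
```

Inside a virtual machine, the same arguments would be pointed at the block device that the passed-through VF exposes to the guest.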
Note: Total bandwidth/total IOPS = Average performance × Number of VFs
Figure 1: Comparison of SSD performance before and after using SR-IOV
The verification of the SR-IOV solution in Figure 1 shows relatively stable performance for both PF and VF compared with the original disk, with fluctuations remaining below 5%. The performance of Union Memory's UH series SSD can therefore be used in full while SSD utilization is improved.
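The aggregation in the note to Figure 1 can be sketched as below. The text does not define how the fluctuation is computed, so the relative-difference formula in `fluctuation_pct` is an assumption, and the numbers in the example are placeholders rather than measured results.

```python
def total_performance(per_vf_values: list[float]) -> float:
    """Total bandwidth or IOPS = average per-VF performance x number of VFs."""
    average = sum(per_vf_values) / len(per_vf_values)
    return average * len(per_vf_values)


def fluctuation_pct(total: float, baseline: float) -> float:
    """Relative difference, in percent, between the VF total and the
    undivided SSD (assumed definition of "fluctuation")."""
    return abs(total - baseline) / baseline * 100


# Placeholder values only, not data from the verification.
per_vf = [250.0, 250.0, 240.0, 260.0]
baseline = 1040.0
total = total_performance(per_vf)
print(total, fluctuation_pct(total, baseline) < 5)
```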
To verify the stability of VF performance, tests were conducted with the SSD divided into 2, 4, and 8 VFs, as shown in Figure 2.
Note: Deviation = (VF average - VF performance) / VF average * 100%
Figure 2: Deviation of VFs and mean values under SR-IOV
The data show that the deviation between each VF's test value and the VF mean is generally less than 1%, so performance is allocated evenly and stably among VFs and scales well. The number of VFs can be set and managed flexibly according to user needs.
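The deviation defined in the note to Figure 2 can be computed per VF as in the following sketch. The function name `vf_deviations` is hypothetical and the sample values are placeholders, not measurements.

```python
def vf_deviations(values: list[float]) -> list[float]:
    """Deviation = (VF average - VF performance) / VF average * 100%."""
    mean = sum(values) / len(values)
    return [(mean - v) / mean * 100 for v in values]


# Placeholder per-VF results for a 4-VF division.
sample = [995.0, 1005.0, 1002.0, 998.0]
deviations = vf_deviations(sample)
print(all(abs(d) < 1 for d in deviations))
```

A positive deviation means the VF performed below the mean and a negative one means it performed above. By construction, the deviations of all VFs sum to approximately zero.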
When SR-IOV divides an SSD into multiple virtual disks, multiple virtual machines run on it simultaneously, and their workloads can interfere with one another's performance on the SSD. To avoid this, Union Memory has designed SR-IOV with QoS to achieve performance isolation: the traffic of each VF is controlled in the SSD's chip so that performance stays balanced.
Figure 3: VF performance in specific processing scenarios
To verify the QoS function, tests were conducted to compare the IOPS, bandwidth, and latency of the VFs with the parameters bs=128, qd=128, Thread=8, Read%=70%. Figure 3 shows that, with SR-IOV in use, the VFs deliver similar IOPS and bandwidth at low latency, so they have essentially no influence on one another in everyday workloads. The high performance and low latency of the virtual SSDs are therefore preserved, meeting the demand for efficient data storage.
Currently, Union Memory's SR-IOV uses namespaces (NS) to logically isolate data between VFs. However, because all VFs share a common NAND flash space, some performance impact between VFs is inevitable. QoS scheduling keeps it within a certain range, but it cannot be avoided in some more complex workloads. To address this, Union Memory will continue to work on applying SR-IOV to SSDs and further optimize its performance isolation algorithms to achieve better isolation.
Virtual machine data security is of paramount importance. Although virtual machine data can be cleared using TRIM, this may not be enough for customers with high data security requirements, such as those in the finance and government sectors. For instance, data could be stolen if an SSD is removed from a data center cabinet. Union Memory has made initial progress in research on virtual machine data security: when a virtual machine is released, the physical data on the SSD that corresponds to it will be thoroughly destroyed, ensuring data security.
Union Memory's current generation of SR-IOV can collect performance statistics for each VF. The new generation of SSDs supports intelligent analysis of virtual machine performance, performance configuration feedback based on performance status, and intelligent diagnosis and remote repair of virtual machine status, among other features.
The verification demonstrates that implementing the SR-IOV solution in Union Memory's UH8 and UH7 SSDs improves disk utilization, reduces wasted storage space, and contributes to the eco-friendliness and sustainability of data centers. Union Memory's SR-IOV solution also offers stable performance and effective isolation between PFs and VFs, which enables flexible VF policies for different scenarios and use in cloud-based environments such as AI, distributed databases, and high-performance enterprise applications. As a result, users can make full use of resources and reduce TCO. Union Memory's SR-IOV solution is an effective and sustainable storage option for green cloud data centers, promoting efficiency and cost reduction with a smaller carbon footprint.