An Emulation Framework for Computational Storage
Stephen Bates and Abhishek Gupta, TESL, Huawei
Virtual Conference, September 28-29, 2021
2023 SNIA. All Rights Reserved.

Emulation = Inception = QEMUception
Is it real or is it emulated? And do you even care?

The Case for Emulation
Hardware is Hard!

The Case for Emulation - I
Hardware is hard!
- Chips take a long time to develop.
- Chips today need firmware, this is buggy.
- Chips often (always) are broken first time around.
- Look at CXL for example! Spec is at 3.0. Hardware 1.0 ;-).
- How do software developers develop without hardware?
While other options exist, QEMU is becoming the emulation environment of choice. There are several ways QEMU can provide emulation of hardware. We will review these in this talk!

The Case for Emulation - II
Hardware is expensive!
- Look at CXL for example! CXL-enabled servers cost a bunch of money, have buggy UEFI code, lack OS support etc.
- If the system breaks how do you know what is to blame? Bad hardware? Bad firmware? Bad software?
- A software developer in a coffee shop in Lima does not have room in their backpack for a Sapphire Rapids.

The Case for Emulation - III
Hardware is hard to debug!
- Look at CXL for example ;-).
- Reboot times are measured in (ten) minutes.
- When things go wrong early in the boot process there is often little (zero) visibility or debug capability.
- And then you need to tweak something and reboot again (and again and again).
- Emulation enables full visibility (gdb hardware anyone!?).

The Case for Emulation - IV
Sometimes you do need actual hardware!
- Sometimes performance is important ;-) (but perhaps not *that* often if you are a developer).
- Sometimes you need to actually sell something ;-).
- But much of the time emulation is just fine.

The Case for Emulation - VI
Is this the real life? Is this just fantasy? Caught in a landslide, no escape from reality
Perhaps all Freddie was really looking for was
an emulation?

Emulation in QEMU
Is it real and do you care?

Emulation in QEMU - I
- QEMU originated in 2003.
- Two types of emulation:
  - Can emulate a CPU with a different Instruction Set Architecture (ISA) to the host (e.g. emulate an arm64 CPU on an Intel server). This is not the emulation this talk cares about.
  - Can emulate hardware that is not actually present on the host server. This is the emulation this talk cares about.
QEMU supports a range of different emulation modes. This includes the emulation of hardware while using accelerated emulation modes for the CPU (e.g. Xen and KVM).

Emulation in QEMU - II
- This talk focuses on the System Emulation mode of QEMU.
- CPU is either emulated or KVM/Xen accelerated.
- Rest of the system seen by the "Virtual Machine" (VM) can be a mix of real hardware and emulated hardware.
- Real hardware can be assigned completely to the VM (e.g. via vfio) or para-virtualized.
- Fake hardware can be emulated by QEMU or another process running on the host.
While not covered in this talk the topic of assigning real hardware to one or more VMs running on
a system is fascinating. See topics like SR-IOV, SIOV, vfio and mediated devices for more information.

Emulation in QEMU - III
System Emulation
- A mix of real hardware and fake (emulated) hardware is presented to the VM.
- The VM has no way of knowing which hardware is "real" and which is "fake". Perhaps a totem is needed?
- Since "fake" hardware is actually software we can fully control its behavior by editing the source code!
Some examples of hardware that can be emulated inside QEMU include:
- Storage devices (like NVMe).
- Persistent Memory (NVDIMMs).
- Network Interface Cards (NICs).
- Peripherals.

Emulation Example in QEMU
KV-Capable NVMe SSDs in a Server Anyone?

    batesste@bunbeg:~$ lspci
    00:00.0 Host bridge: Red Hat, Inc. QEMU PCIe Host bridge
    00:01.0 Ethernet controller: Red Hat, Inc. Virtio network device
    00:02.0 Display controller: Red Hat, Inc. Virtio GPU (rev 01)
    00:03.0 Audio device: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family) High Definition Audio Controller (rev 01)
    00:04.0 USB controller: NEC Corporation uPD720200 USB 3.0 Host Controller (rev 03)
    00:05.0 USB controller: Red Hat, Inc. QEMU XHCI Host Controller (rev 01)
    00:06.0 SCSI storage controller: Red Hat, Inc. Virtio block device
    00:07.0 Non-Volatile memory controller: Red Hat, Inc. QEMU NVM Express Controller (rev 02)
    00:08.0 Communication controller: Red Hat, Inc. Virtio console
    00:09.0 Unclassified device [0002]: Red Hat, Inc. Virtio filesystem
    00:0a.0 Unclassified device [00ff]: Red Hat, Inc. Virtio RNG

"I never knew Red Hat made NVMe SSDs!"

    batesste@bunbeg:~$ sudo nvme list
    Node          SN                    Model           Namespace  Usage                Format       FW Rev
    ------------  --------------------  --------------  ---------  -------------------  -----------  ------
    /dev/nvme0n1  5061A8FB-EF70-47BC-B  QEMU NVMe Ctrl  1          68.72 GB / 68.72 GB  512 B + 0 B  8.0.0

Of course, this is an emulated NVMe SSD
1. The code running inside the Virtual Machine (VM) has no way of knowing if this NVMe SSD is real or emulated.
2. The code running inside the VM is identical to the code that would talk to a real NVMe SSD. Same kernel driver and same nvme-cli userspace code.
3. We can add or remove features to this NVMe SSD by changing the source code of QEMU. This means we can explore new NVMe features (either standard or vendor-specific) before hardware becomes available:
   1. Key-Value capable NVMe SSDs.
   2. FDP capable NVMe SSDs.
4. Given point 2, this means SW developers can write code for new hardware without needing new hardware. Cool!

How to build a KV-capable NVMe SSD (the hard way)
1. Build an LBA-capable NVMe SSD.
2. Allocate a bunch of over-committed firmware engineers to the project. Ignore the howls of protest from sales, marketing, management etc.
3. Write firmware that implements the KV command set.
4. Debug firmware that implements the KV command set.
5. Validate your KV-capable NVMe SSD.

How to build a KV-capable NVMe SSD (the easy way)
1. git clone :qemu/qemu.git
2. cd qemu
3. git checkout -b key-value origin/main
4. Make about 379 LOC changes to the nvme software.
5. git commit -a -m "key-value: Add key-value command set to NVMe emulation model"
6. mkdir build && cd build && ../configure && make -j 32 all && sudo make install
7. Spin up a VM using this new version of qemu-system-
8. Check out and test your KV-capable NVMe SSD. If you find a problem repeat steps 4-8.
9. Upstream your code!
10. Oh and Samsung have already done 1-8!

Emulation Example in QEMU
NVMe SSD below a CXL Switch in a Server Anyone?

This is Interesting. But I can't build it today.
[Diagram: CPU, DRAM, CXL Switch, NVMe SSD, CXL Type 3]
- NVMe/CXL SSD for memory expansion.
- P2P DMA between NVMe SSD and Type 3 CXL device for swap/page-cache.
I would like to have this hardware available so software developers can start working on code for the topics above and use-cases (like Computational Storage and Computational Memory).

This is Interesting. But I can't build it today. Or Can I?

    qemu-system-x86_64 \
      -machine type=q35,hmat=on,nvdimm=on,cxl=on \
      -enable-kvm \
      -cpu host,migratable=no \
      -nographic \
      -serial mon:stdio \
      -m 4G,maxmem=10G \
      -smp 4,sockets=1,maxcpus=4 \
      -numa node,nodeid=0,cpus=0-3,memdev=m0 \
      -drive if=none,file=./images/cxl.qcow2,format=qcow2,id=hd \
      -device virtio-blk-pci,drive=hd \
      -device e1000,netdev=user0 \
      -netdev user,id=user0,hostfwd=tcp::2222-:22 \
      -rtc clock=host \
      -kernel $KERNEL \
      -append "nokaslr norandmaps root=/dev/vda1 console=ttyS0 earlyprintk=serial,ttyS0 ignore_loglevel printk_delay=0" \
      -object memory-backend-ram,id=m0,size=4G \
      -object memory-backend-file,id=cxl-mem1,share=on,mem-path=/tmp/cxltest.raw,size=256M \
      -object memory-backend-file,id=cxl-lsa1,share=on,mem-path=/tmp/lsa.raw,size=256M \
      -device pxb-cxl,bus_nr=52,bus=pcie.0,id=cxl.1 \
      -device cxl-rp,port=0,bus=cxl.1,id=root_port13,chassis=0,slot=2 \
      -device cxl-type3,bus=root_port13,volatile-memdev=cxl-mem1,lsa=cxl-lsa1,id=cxl-pmem0 \
      -M cxl-fmw.0.targets.0=cxl.1,cxl-fmw.0.size=256M
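
Once a guest has booted from a command line like the one above, a quick sanity check is to confirm that the emulated CXL Type 3 device is visible on the guest's PCI bus. The following is a minimal sketch, not part of the original slides: the helper name `count_cxl_memdevs` is made up for illustration, and it assumes `lspci -nn` (from pciutils) is available in the guest. It matches on PCI class code 0502 (memory controller, CXL subclass) rather than on a device name, since the printed name depends on the pciutils version.

```shell
#!/bin/sh
# Hypothetical helper: count CXL memory devices in `lspci -nn` output read
# from stdin. `lspci -nn` prints the class code in brackets after the class
# name, and CXL memory devices report class 0502.
count_cxl_memdevs() {
    grep -c '\[0502\]'
}

# Assumed usage inside the guest (not executed here):
#   lspci -nn | count_cxl_memdevs
# If the cxl-cli tool from the ndctl project is installed, `cxl list -M`
# should also list the memory device once the kernel driver has bound to it.
```

A count of zero would suggest either that the `cxl-type3` device was not attached or that the guest kernel is looking at a different PCI topology than expected.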

Emulation Example in QEMU
A Rack-Scale Architecture that includes CXL Memory Area Networking (MAN)

This is Interesting. But I can't build it today.

This is Interesting. But I can't build it today. Or Can I?
https:/

Emulation Example outside QEMU
An NVMe Computational Storage Drive (CSD) using QEMU, SPDK and vfio-user

Emulation outside of QEMU - I
- Up until now we have looked at emulations that are based on software that resides in the QEMU git tree.
- This is great, but not all devices deserve to be upstream.
- This is great, but other software applications exist.
- Can we do emulation in a separate process from QEMU and connect it to QEMU via something like a socket?
QEMU could leverage emulation code running in a separate process to the main qemu system emulation process.

Emulation outside of QEMU - II
- A team at Nutanix has proposed vfio-user, a mechanism that allows PCIe devices to be emulated in a separate process from QEMU.
- vfio-user has a specification and consists of a server (the emulated device) and a client (QEMU or some other VMM).
- SPDK has been updated to support acting as an emulated NVMe device.
https:/

Emulation outside of QEMU - III
[Diagram: QEMU]
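
To make the server/client split concrete, here is a rough sketch of what wiring SPDK (the vfio-user server) to QEMU (the client) can look like. It only prints the commands and does not run them. The RPC names and the `vfio-user-pci` device are given as recalled from the SPDK and libvfio-user documentation and should be checked against the versions you use; the socket directory, NQN, memory size and memory path are assumptions chosen for illustration, and the QEMU side may need a build with vfio-user client support.

```shell
#!/bin/sh
# Sketch only: print (do not run) the commands for an SPDK vfio-user NVMe
# target and the matching QEMU options. Every path and name is an assumption.
SOCK_DIR=${SOCK_DIR:-/var/run/vfio-user}   # hypothetical socket directory
NQN=${NQN:-nqn.2019-07.io.spdk:cnode0}     # example subsystem NQN

# Server side: SPDK's NVMe-oF target exposing a RAM-backed namespace over the
# VFIOUSER transport (commands are relative to an SPDK source tree).
spdk_server_cmds() {
    cat <<EOF
build/bin/nvmf_tgt &
scripts/rpc.py nvmf_create_transport -t VFIOUSER
scripts/rpc.py bdev_malloc_create 64 512 -b Malloc0
scripts/rpc.py nvmf_create_subsystem $NQN -a -s SPDK0
scripts/rpc.py nvmf_subsystem_add_ns $NQN Malloc0
scripts/rpc.py nvmf_subsystem_add_listener $NQN -t VFIOUSER -a $SOCK_DIR -s 0
EOF
}

# Client side: extra QEMU options. Guest RAM is file-backed and shared so the
# server process can access it for DMA.
qemu_client_args() {
    printf '%s\n' \
        "-object memory-backend-file,id=mem0,size=4G,mem-path=/dev/shm/vm-mem,share=on" \
        "-numa node,memdev=mem0" \
        "-device vfio-user-pci,socket=$SOCK_DIR/cntrl"
}
```

Calling `spdk_server_cmds` and `qemu_client_args` prints the two halves for review. Inside the guest the resulting device should then show up like any other NVMe controller, which is the point of the approach: the guest cannot tell that the controller lives in another process.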

Emulation outside of QEMU - IV
- We can modify SPDK to support the upcoming TP 4091 and TP 4131 command sets that pertain to computational storage.
- We can then use QEMU and vfio-user to expose this version of SPDK as an NVMe CSD.
- We can then install software inside the VM to allow the VM and its applications to leverage the emulated CSD.
[Diagram: QEMU, VM, SPDK, vfio-user]

So what software goes here?
- Linux Kernel: Can provide a path to NVMe namespaces (of all types) via io_uring_passthru.
- xNVMe: Can provide access to NVMe commands from new and emerging command sets.
- SNIA's Computational Storage API: Can tie xNVMe to applications.

Conclusions
- Real hardware is HARD. And most of the time your software developers don't need it.
- QEMU is on a roll. A hugely successful hypervisor and system emulator. Get to know it!
- Emulation can be done inside the QEMU source tree. And this can be ISA emulation and/or hardware device emulation.
- Emulation of PCIe devices can be done outside the QEMU source tree via vfio-user. Keeps QEMU source tree clean.
- All the parts we need for CSD/CSP/CSA emulation are coming. Hopefully software devs can run with this to tie these devices to applications!