Live Migration for PCIe SSDs
Presented by Dan Helmick, PhD
Virtual Conference, September 28-29, 2021
© 2023 SNIA. All Rights Reserved.

Agenda
- Background and Assumed System Set-up
- Pre-Copy Phase: Start
- Pre-Copy Phase: Namespace (NS) Migration
- Stop-and-Copy Phase: Pause and Final Copies

Live Migration Background
- This presentation focuses on the SSD aspects of implementing Live Migration in a direct attached scenario.
- NVMe Resources
  - TP4165 Tracking LBA Allocation with Granularity
  - TP4159 PCIe Infrastructure for Live Migration
  - TP4176 Quality of Service for NVM subsystem Resources for a Controller
- Public Conference Resources
  - Flash Memory Summit Presentation: “Host Controlled Live Migration” by Mike Allison and Lee Prewitt
  - Storage Developers Conference: “NVM Express State of the Union” by Ross Stenfort and Mike Allison
  - Open Compute Global Summit: “Standardizing Live Migration with NVM Express” by Mike Allison, Amber Huffman, and Lee Prewitt

Motivation for Live Migration
- Why Migrate a workload?
  - Data Center down time, errors, or other access anomalies
  - Load Balancing
    - Example: AI training is long running without user interactions
    - Data Center load may vary as a function of the local time zone
    - Migrate the AI training to a Data Center (DC) experiencing reduced load due to night time
- Why Live Migrate?
  - Workload can continue to run without awareness of migration event
  - Minimizes downtime
- Why enable Live Migration at the SSD?
  - Allows the removal of SW shim layers on the IO queues
  - Reduces Host SW load
  - Improved storage access latencies
[Figure: DC candidates for AI Training; Workload; SSD; Dual Access Paths]

An Example System Set-up
- Virtual Machines (VMs) and VM Monitor (VMM)
  - 1 VMM to many VMs
  - All Live Migration (LM) commands come through VMM
  - May not share memory spaces
    - Ex: Migration Queue (MQ) in VMM memory space
    - Ex: VM's IO and Admin Queues in VM memory space
  - VM is unaware LM is happening
  - Logging in the MQ may be in the form of Migration Queue Entries (MQE)
- SSD example with SR-IOV
  - Primary Controller (Ctlr) per VMM on PF_0
  - Secondary Ctlr per VM on VF_Y and VF_H
- Target vs Source
  - Similar setups
  - Target VM may send writes/reads to Ctlr H prior to “start”
  - Target VM's commands may be generated by VMM prior to migration
[Figure: Source Host (VMM, VM) attached to Source SSD (NVM Subsystem) with NS, Ctlr Y, Ctlr X; Target Host (VMM, VM) attached to Target SSD with NS, Ctlr H, Ctlr G. Labels: MQ, Admin Q, IO Q, Rd/Wr Data, PF_0, VF_Y, VF_H, Child, Parent]

Pre-Copy Phase: Start Logging
- VM continues to interact with Secondary Ctlr on SSD (Rd/Wr)
- Race Conditions are a concern
- “Start Logging” Command Flow
  - Ongoing VM IOs
  - VMM sends “Start Logging” Command
  - Primary Ctlr begins tracking all requested MQ events occurring in VM's Ctlr (Secondary Ctlr)
    - Some commands in flight may be logged (excess logging is allowed)
    - Some commands in flight may not be logged
  - Primary Ctlr completes “Start Logging” Command
    - SSD Promise: All potentially log-able commands will now be logged
  - VMM has successfully started logging in MQ
- Relationship of Logging Start and some commands is unknown
  - Unknown timing of where Logging Start occurred with respect to Completion of Start Logging command
- “Logging Started” ensures
  - All prior commands in flight have finished
  - All future commands in flight will be logged
[Figure: timeline (Time) with lanes VM, VMM, and SSD (Primary Ctlr, Secondary Ctlr); events: Start Logging, Logging Starts?, Logging Started]
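
The uncertainty described on the Start Logging slide can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyController`, its `mq` list, and the arbitrary `switch_point` are hypothetical names invented for illustration and are not part of any NVMe specification or of the presentation.

```python
from dataclasses import dataclass, field


@dataclass
class ToyController:
    """Hypothetical model of a Primary Ctlr logging writes seen on a Secondary Ctlr."""
    logging_active: bool = False
    mq: list = field(default_factory=list)  # stands in for Migration Queue Entries

    def write(self, slba, nlb):
        # A VM write is logged only if logging is already active when it is processed.
        if self.logging_active:
            self.mq.append(("write", slba, nlb))

    def start_logging(self, in_flight):
        # Logging switches on at some point while in-flight commands drain,
        # so the VMM cannot know which of them were logged.
        switch_point = len(in_flight) // 2  # arbitrary choice for this sketch
        for i, (slba, nlb) in enumerate(in_flight):
            if i == switch_point:
                self.logging_active = True
            self.write(slba, nlb)
        # By the time the command completes, logging is guaranteed to be on.
        self.logging_active = True
        return "Logging Started"


ctlr = ToyController()
status = ctlr.start_logging([(0, 8), (8, 8), (16, 8), (24, 8)])
ctlr.write(100, 4)  # issued after completion, so it must be logged
print(status, ctlr.mq)
```

In this model two of the four in-flight writes are logged and two are not, while the write issued after completion is always logged. One consequence, under these assumptions, is that a VMM would want to start logging before it takes the initial copy, so that any write missed by the log is still picked up by that copy.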

Pre-Copy Phase: Target Preparation
- Target Precondition
  - Available Secondary Controller
  - Available Host side VM resources
- Standard NVMe commands for initializing Target SSD
  - Initialize any Queue and IO command structures needed
  - Create NS
- Above illustrates one potential flow, but other options exist
  - Ex: Shared NS created by VMM on Ctlr G
[Figure: Source Host (VMM, VM) attached to Source SSD (NVM Subsystem) with NS, Ctlr Y, Ctlr X; Target Host (VMM, “Soon to be a VM”) attached to Target SSD with “Soon NS”, Ctlr H, Ctlr G. Labels: MQ, Admin Q, IO Q, Rd/Wr Data]

Pre-Copy Phase: Initial NS Migration
- Option 1: VMM copies entire VM NS
  - Not optimal for sparsely written data
- Option 2:
  - VMM sends Primary Ctlr: Get LBA Status
    - Granularity: Set by SSD
    - Customer requirements discussion
  - Primary Ctlr returns results with granularity restrictions
    - Any data state other than deallocated is returned as mapped
    - Ex: Read Uncorrectable
  - VMM: for each mapped LBA status
    - Submitted as Read of Child's NS
[Figure: Example Namespace Mapping; LBA Status per Granularity; Granularity; Valid/Mapped Data; four Read operations]
For more info: TP4165 Tracking LBA Allocation with Granularity
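
Option 2 leaves the VMM with one mapped/deallocated flag per granule, which it then turns into reads. A minimal sketch of that conversion follows, assuming the status has already been decoded into a list of booleans; the function name `mapped_ranges` and the list representation are assumptions made for illustration, not the Get LBA Status wire format.

```python
def mapped_ranges(status, granularity):
    """Coalesce per-granule mapped flags into contiguous (start_lba, num_lbas) reads.

    status      -- list of booleans, one per granule, True meaning "mapped"
    granularity -- number of LBAs covered by each granule (chosen by the SSD)
    """
    ranges = []
    run_start = None  # index of the first granule in the current mapped run
    for i, mapped in enumerate(status):
        if mapped and run_start is None:
            run_start = i
        elif not mapped and run_start is not None:
            ranges.append((run_start * granularity, (i - run_start) * granularity))
            run_start = None
    if run_start is not None:  # close a run that extends to the end of the NS
        ranges.append((run_start * granularity, (len(status) - run_start) * granularity))
    return ranges


status = [True, True, False, False, True, False, True, True]
print(mapped_ranges(status, granularity=256))
# [(0, 512), (1024, 256), (1536, 512)]
```

Because anything other than deallocated is reported as mapped, a range may cover more than was actually written; that only costs extra reads. A real VMM would presumably also split long ranges to respect the drive's maximum transfer size, which this sketch leaves out.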

Pre-Copy Phase: Initial NS Migration to Target
- VMM submits Read to Child's NS for each contiguous mapped LBA range
- New NS is populated with no dependence/knowledge of Source SSD's granularities
[Figure: VM's NS Mapping; Returned LBA Status per Granularity; Source Host (VMM, VM) attached to Source SSD (NVM Subsystem) with NS, Ctlr Y, Ctlr X; Target Host (VMM) attached to Target SSD with NS, Ctlr H, Ctlr G. Labels: MQ, Admin Q, IO Q, Rd/Wr Data]

Pre-Copy Summary
- Source SSD View
  - Start Logging
  - Copy Initial NS
    - LBA Mapped Status Query
    - Read Mapped Data
  - Iterative Data Copy
    - Read data from parsed MQEs
- Target SSD View
  - Initialize Child
    - Initialize Child Ctlr
    - Create NS
  - Copy Initial NS
    - Write Mapped Data
  - Iterative Data Copy
    - Write data from parsed MQEs
[Figure labels: Time, Loop]

Pre-Copy Phase: Iterative Data Copy
- Ongoing
  - VM has continued to Rd/Wr to Source NS
  - Source Primary Ctlr X has continued to log all appropriate activities to VMM
  - Copying from Source SSD to Target SSD takes time
- Source Drive View
  - Has experienced Reads from initial copy of Source NS to Target NS
  - Continues to experience more Reads from VMM parsing MQ logs
- VMM is continuing to catch up to the VM's activity
  - Data is written to Target Child NS
[Figure: Source Host (VMM, VM) attached to Source SSD (NVM Subsystem) with NS, Ctlr Y, Ctlr X; Target Host (VMM, “Soon to be a VM”) attached to Target SSD with NS, Ctlr H, Ctlr G. Labels: MQ, Admin Q, IO Q, Rd/Wr Data]
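
The catch-up loop described for the iterative data copy can be sketched as follows. Everything here is a simplified assumption: namespaces are plain dictionaries, the MQ is a `deque` of `(slba, nlb)` write records, and `vm_write`, `iterative_copy`, `threshold`, and `max_rounds` are hypothetical names rather than anything defined by the presentation or by NVMe.

```python
from collections import deque


def vm_write(source, mq, slba, nlb, data):
    """Simulate a VM write: update the source NS and log the range in the MQ."""
    for lba in range(slba, slba + nlb):
        source[lba] = data
    mq.append((slba, nlb))


def iterative_copy(source, target, mq, vm_activity, threshold=1, max_rounds=10):
    """Drain logged ranges from mq, copying source -> target each round.

    Returns the number of rounds run before the backlog dropped to `threshold`
    entries or fewer; whatever remains is left for the stop-and-copy phase.
    """
    for round_no in range(1, max_rounds + 1):
        for _ in range(len(mq)):  # only the backlog present at round start
            slba, nlb = mq.popleft()
            for lba in range(slba, slba + nlb):
                target[lba] = source[lba]
        vm_activity(round_no)  # the VM keeps writing while the copy runs
        if len(mq) <= threshold:
            return round_no
    return max_rounds


source = {lba: "old" for lba in range(8)}
target = dict(source)  # state after the initial NS copy
mq = deque()
vm_write(source, mq, 0, 4, "a")
vm_write(source, mq, 4, 2, "b")


def activity(round_no):
    if round_no == 1:  # one more write arrives during the first round
        vm_write(source, mq, 2, 1, "c")


rounds = iterative_copy(source, target, mq, activity, threshold=0)
print(rounds, target == source)
```

The loop converges only if the VM dirties data more slowly than the VMM can copy it, which is why the sketch bounds the number of rounds and leaves a small remainder for the pause step.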

Stop and Copy Phase: Pause
- VMM decides to complete/execute the migration
  - VMM issues Pause Command to Primary Ctlr
- “Pause” Command Flow
  - Secondary controller stops fetching new commands
  - Secondary controller completes all commands in flight
    - Success vs Error are both acceptable
    - All CQEs are properly returned to VM
    - With any MQEs for logging
  - Primary Ctlr completes the Pause command to VMM
    - And may concurrently log this successful pause in the MQ
- Stopped status Summary
  - SQE/CQEs may be on the SQ/CQs of the VM
  - Source SSD must be prepared for potential Resume Command
    - Perhaps due to a system error
    - Conceptually Resume/Start should behave the same on both Source and Target
    - Except: Source SSD would continue logging
  - If not resumed, expect Secondary Ctlr to be reset.
- VMM will
  - Parse all remaining MQEs
  - Copy any remaining data to Target Child NS
[Figure: timeline (Time) with lanes Host (VM, VMM) and SSD (Primary Ctlr, Secondary Ctlr); events: Pause, Stop Fetching, No Cmds in flight, Paused, Optional: Resume]

Post-Copy Phase: Copy Final Data and Migrate Controller State
- Final Data Copy Iterations from MQ Parsing
- Get/Set Controller State
  - Reads Ctlr Y out to the VMM
  - VMM Writes Ctlr H into the Target SSD
- VMM will migrate the VM
- From SSD's view
  - Same behavior:
    - Resume Ctlr Y sent to Ctlr X
    - Resume Ctlr H sent to Ctlr G
  - One difference: unlikely Ctlr G has enabled logging on Ctlr H
- Nominal NVMe Flows
  - Source VMM will clean up and reset Ctlr Y and NS
[Figure: Source Host (VMM, VM) attached to Source SSD (NVM Subsystem) with NS, Ctlr Y, Ctlr X; Target Host (VMM, VM, “Soon to be a VM”) attached to Target SSD with NS, Ctlr H, Ctlr G. Labels: MQ, Admin Q, IO Q, Rd/Wr Data, Get Ctlr Y State, Set Ctlr H State]

Finalizing Migration Summary
- Source SSD View
  - Stop-and-Copy
    - Pause
    - Read data tracked in MQ
  - Post-Copy
    - Read Child Controller State
    - Resume/Reset
      - Optional: SSD ready to recover from system error
      - Otherwise: VMM will reset Child Ctlr
- Target SSD View
  - Stop-and-Copy
    - Write data tracked in MQ
  - Post-Copy
    - Write Child Controller State
    - Resume
      - Child Controller begins operating
[Figure labels: Time; VMM is copying VM; VMM is pausing VM]
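
The ordering in the Finalizing Migration Summary can be sketched as a single VMM-side routine. This is an illustrative model only: `ToyChildCtlr`, `finalize_migration`, and the `sq_head` state field are hypothetical, the controller state is treated as an opaque dictionary, and the error path (resuming the source after a failure) is omitted.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ToyChildCtlr:
    """Hypothetical stand-in for a Secondary (Child) Ctlr and its namespace."""
    ns: dict = field(default_factory=dict)     # LBA -> data
    state: dict = field(default_factory=dict)  # opaque controller state
    running: bool = True


def finalize_migration(src, dst, mq):
    """Pause the source child, drain the MQ, move controller state, resume the target."""
    steps = []
    src.running = False          # Pause: stop fetching, in-flight commands complete
    steps.append("pause")
    while mq:                    # parse all remaining MQEs and copy their data
        slba, nlb = mq.popleft()
        for lba in range(slba, slba + nlb):
            dst.ns[lba] = src.ns[lba]
    steps.append("final-copy")
    dst.state = dict(src.state)  # Get Ctlr Y state, then Set Ctlr H state
    steps.append("state")
    dst.running = True           # Resume on the target side
    steps.append("resume-target")
    src.ns.clear()               # source VMM cleans up and resets the child
    src.state.clear()
    steps.append("reset-source")
    return steps


src = ToyChildCtlr(ns={0: "a", 1: "b"}, state={"sq_head": 3})
dst = ToyChildCtlr(ns={0: "a", 1: "stale"}, running=False)
steps = finalize_migration(src, dst, deque([(1, 1)]))
print(steps, dst.ns, dst.state, dst.running)
```

The sketch resets the source only after the target has resumed, which keeps the option of resuming the source open for as long as possible; whether a real VMM orders it this way is an assumption.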