Host Workloads Achieving WAF=1 in an FDP SSD
Presented by Dan Helmick, PhD
Virtual Conference, September 28-29, 2021
2023 SNIA. All Rights Reserved.

Agenda
- Background
  - FDP Overview
  - Visualizing Writes in an SSD
  - QD>1 impacts with FDP
- Some example WAF=1 workloads
  - Circular FIFO
  - Modified Circular Buffer
  - Log Structured File Systems
  - Probabilistic
  - Log Structured File Systems with Mismatched Host Extent and SSD RU

Flexible Data Placement (FDP) Overview
- Apps can direct write data to be co-located in an SSD
- Possible for a VMM to set up defaults for legacy
  VMs
- Filling and deallocating appropriately can achieve WAF=1

2023 Flash Memory Summit. All Rights Reserved.

[Figure: Logical View of an SSD shared by App 1, App 2, and App 3]

Streams                        | Flexible Data Placement (FDP) | Zoned Namespaces (ZNS)
-------------------------------|-------------------------------|-------------------------------------
Open Loop WAF=1                | Polling for WAF=1             | WAF=1 or Error
Backwards Compatible           | Backwards Compatible          | Not Backwards Compatible
Streams Granularity Size (SGS) | Reclaim Unit (RU) Size        | Zone Capacity
QD>1 allowed                   | QD>1 allowed                  | QD>1 requires Zone Append
Full FTL mapping required      | Full FTL mapping required     | Potential for compacted FTL Mapping

Simplified SSD Composition
- Reclaim Units (RUs) are composed of 1 or more Erase Blocks (EBs)
  - Ex: RU is equal to a SuperBlock (SB)
  - SB = 1 EB per Plane for every Die
- RU is filled in order even if the LBAs are out-of-order
- After filling an RU, a new set of empty EBs is selected to create a new RU
- Rules may be applied in selecting EBs from the Free Pool
  - Ex: 1 EB per Plane for every Die
    to create a SB
- Diagramming a Conventional Drive = 1 RUH
  - Random traffic

[Figure: Most Simplified Drive View. EBs are grouped into an RU and writes fill the RU. Regions shown: Free Pool of EBs (OP), RU/SB being filled by incoming writes (append point), and filled RUs/SBs with invalids, ordered by decreasing valid count. Not diagrammed: GC moves valid data and adds to the Free Pool. RUs are not delineated.]

Visualized NAND and Performance
Transitioning Write Traffic: Sequential to Random
- Sequentially Written (Preconditioned)
- Random Writes start
- Random Writes Reach Worst Case Performance
  - All SB/RU having roughly the same invalid count
  - High valid counts for any SB/RU that is selected
- Random Write Steady State (SS)

[Figure: NAND state at each stage above, with labels Reported Logical Capacity, OP, Total Physical Capacity, Evenly Distributed Invalids, Decreasing Valid Count, Min Free Pool, and No GC needed, alongside a plot of Performance vs Time marking Worst Perf.]

Extrapolating to FDP with Multiple RUHs
- Each RUH is a new append point in the NAND
- Characterization of Write behavior per RUH is required to understand the SSD's WAF
  - WAF=1 on each RUH is required for perfect drive WAF=1
  - However, WAF improvements on each RUH benefit the entire SSD
- Persistently Isolated vs Initially Isolated RUHs only matters for WAF>1
  - Not an emphasized discussion in this presentation
- WAF=1 on any RUH means the GC path is not exercised

Over Provisioning vs GC Triggers
- Similarities
  - Valid Count in Previously Filled RUs
  - Incoming RUH Traffic
- Differences
  - System 2 is low on OP
  - Small amounts of non-optimal traffic will result in very high WAF
- SSD OP provides protection
  - Buffers against race conditions of traffic ordering (Writes vs Deallocates)
  - Protects against minor imperfections in Host optimized traffic

[Figure: System 1 and System 2, each with RUH X (RU A and previously filled RU A blocks) and RUH Y (RU B and a previously filled RU B). System 1 shows a large RU Free Pool; System 2 shows only a small RU Free Pool, labeled SSD OP.]
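As a reading aid, the WAF statements above can be checked against the usual definition of write amplification. This restatement is not taken from the slides; the symbols $W_{\mathrm{host}}$, $W_{\mathrm{GC}}$, and $W_{\mathrm{NAND}}$ are labels introduced here, and other internal writes (for example FTL metadata) are assumed to be negligible.

```latex
\mathrm{WAF}
  = \frac{W_{\mathrm{NAND}}}{W_{\mathrm{host}}}
  = \frac{W_{\mathrm{host}} + W_{\mathrm{GC}}}{W_{\mathrm{host}}}
  = 1 + \frac{W_{\mathrm{GC}}}{W_{\mathrm{host}}}
```

Under that assumption, WAF=1 holds exactly when $W_{\mathrm{GC}} = 0$, that is, when every RU being reclaimed holds no valid data and nothing has to be relocated. This is the sense in which WAF=1 on an RUH means the GC path is not exercised.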
Background: QD>1 System Race Conditions
- High QD has a chance of out-of-order processing
- This can create a disconnect between
  - the Expected RU as tracked by Host SW
  - the Actual RU as placed on SSD NAND
- Through the length of the RU, this doesn't matter, but at RU boundaries it can potentially create orphan LBAs
- Example:
  1. For each LBA in range 0 to 9999 (Write LBA)
  2. Deallocate Logical RU of range 0 to 1023
  Problem: LBA 1024 was placed in an older RU because it arrived earlier than LBA 1022 and 1023
- Mitigations
  - Run Host SW as QD=1
  - Wait for all completions of a Logical RU to return before starting a new RU
  - Accept Risk: increased Host OP and/or SSD OP to protect the system against errant GC

[Figure: Host SW Logical RU Tracking issues LBA 0 ... LBA 1022, LBA 1023, LBA 1024 in order. The System and Interconnect add potential per-CMD delays. SSD NAND Actual RU Filling receives LBA 1024 ahead of LBA 1023 and LBA 1022.]
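The boundary race in the example above can be reproduced with a small model. This is a minimal sketch, not the presenter's code: it assumes the drive fills RUs strictly in arrival order, that one RU holds 1024 LBAs, and that the only reordering is LBA 1024 overtaking LBAs 1022 and 1023. The helper names `place` and `orphans` are hypothetical.

```python
RU_SIZE = 1024  # assumed RU capacity in LBAs, chosen to match the slide's example


def place(arrival_order, ru_size=RU_SIZE):
    """Map each LBA to the physical RU it lands in, filling RUs in arrival order."""
    return {lba: i // ru_size for i, lba in enumerate(arrival_order)}


def orphans(placement, dealloc_range, ru):
    """LBAs still valid in `ru` after the host deallocates `dealloc_range`."""
    return sorted(lba for lba, r in placement.items()
                  if r == ru and lba not in dealloc_range)


issue_order = list(range(2048))  # host issues LBAs in order

# QD=1: commands arrive exactly as issued.
in_order = place(issue_order)

# QD>1: LBA 1024 overtakes LBAs 1022 and 1023 somewhere in flight.
reordered = issue_order[:1022] + [1024, 1022, 1023] + issue_order[1025:]
raced = place(reordered)

dealloc = range(0, 1024)  # host deallocates "Logical RU 0"

print(orphans(in_order, dealloc, ru=0))  # no orphans: physical RU 0 is fully invalid
print(orphans(raced, dealloc, ru=0))     # LBA 1024 is stranded in physical RU 0
```

In the raced case physical RU 0 cannot be erased without first relocating LBA 1024, which is the errant GC the mitigations are meant to avoid. Waiting for all completions of a Logical RU before starting the next one removes the reordering at the boundary.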
Some example WAF=1 workloads
- Circular FIFO
- Probabilistic
- Modified Circular Buffer
- Log Structured File Systems

Circular FIFO
- Looping over any LBA Range
  - LBA Range is constant
  - Deallocate or direct overwrite of LBA is acceptable
- Any length in relation to RU
  - New empty RUs are appended as needed
- Implementation concerns:
  - If QD>1, race conditions can alter RU association, particularly at RU boundaries
  - Some drive architectures are exposed to different delays:
    - Deallocate then Write of LBAs
    - Direct overwrite of LBAs
- Recommendations: Allow both SSD and Host OP
  - SSD OP: Some SSD OP reduces the probability that the emptying RU will need to be used for the newest RU
  - Host OP: Deallocations far ahead of the LBA overwrites enable the most consistent cross-vendor behaviors

[Figure: Example of circling over LBAs 1-4 (1 2 3 4 1 2 3 4 ...). Rewriting an LBA Range keeps valid data compact in the most recent RU and leaves older RUs empty.]

Visualizing Multiple Circular FIFOs
- Sequentially Written (Preconditioned): this is a single Circular FIFO
- 2 Circular FIFOs written to 2 RUHs
  - Each FIFO is written compactly on the NAND
  - Each FIFO consumes minor OP (shown visually on RUH_A)
- Majority of OP remains available for drive-wide benefits. Examples:
  - WAF reduction
  - Endurance extension
  - NAND handling
- Every compactly written RUH preserves the SSD OP

[Figure: Reported Logical Capacity and OP within Total Physical Capacity, showing RUH_A (1 FIFO with OP) and RUH_B.]

Probabilistic
- Low WAF can be achieved through probabilities
- High OP correlates to low
  WAF
- Several well behaved RUHs allow poorly behaved RUHs to consume more OP
  - Overall system improvements!
- RUH N illustrates a small logical capacity using a large physical capacity

[Figure: RUH1, RUH2, RUH3, and RUH N, with labels Logical Capacity of RUH1, Logical Capacity of RUH N, OP of RUH N, and OP = RU for compact RUHs, alongside a plot of WAF vs OP.]

Modified Circular Buffer
- WAF=1 through Deallocate assurances
- Common example is Cache management
  - Head: Appends incoming cache entries
  - Tail: Reads out still valid cache entries, transitioning them to invalid
- Options: Invalid cache entries can be deallocated to the drive or left in place

[Figure: Circular buffer. At the Head, new valid data is written; behind it lie recently written still valid data, invalid cache data, and still valid data. At the Tail, still valid data is read out to another cache for compaction and deallocated after compaction; invalid data is deallocated.]

Log Structured File Systems (FS)
- Objects are appended to fill an RU
- Emphasizes writing sequentially to storage
  - Helps both HDDs and SSDs
- Reads are Random
  - SSD strength
- Variations are found everywhere by different names
  - Blobs
  - Zones
  - Slices
  - Extents
  - Data Volumes
  - Data Nodes
  - SSTables
- Higher level protections may be applied at the system level
  - RAID or Erasure Codes
  - CRC

[Figure: An RU, labeled with the term "Host Extent" used in this presentation.]

Log Structured File Systems Interacting with an FDP SSD
- When Host Extent = RU
  - Host GC is aligned with Drive GC activity
- Deallocates are a critical part of achieving WAF=1
  - Full RU deallocates aligned with FS
  - Invalid objects may be communicated to SSD
- Implementation
  - Object-to-RU endings can be misaligned if QD>1
  - Object deallocates are not required to be communicated to SSD
- Recommendations: Allow both SSD and Host OP
  - SSD OP: enables robust operation without object deallocates communicated to SSD
  - Host OP and SSD OP: can both compensate for race conditions on Object-to-RU placement

[Figure: File System GC acting on a Host Extent that contains invalid data.]

Size Mismatch: Host Extent vs SSD RU
- Log Structured File System built with Host Extents rather than RU matching
  - Host Extent may not match SSD RU size
- Reasons Host Extent may not match SSD RU
  - Vendor-to-Vendor mismatch
  - Generation over Generation SSD RU changes
  - SW developed separately from SSDs
- Some Critical Findings
  - WAF=1 singularities: Host Extent = N * (SSD RU), where N = 1, 2, ...
    - Deallocating a Host Extent frees up several SSD RUs
  - Large Host Extents improve WAF
  - System OP is always a helpful tool to leverage

[Figure: A Host Extent tracked by Host SW, mapped onto Actual RUs in SSD NAND for two different RU sizes.]
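The "WAF=1 singularities" finding can be illustrated with simple arithmetic. The sketch below is an illustration under stated assumptions, not the presenter's analysis: host extents are assumed to be written back to back into a single RUH starting at an RU boundary, with no reordering, so an extent's physical offset equals its logical offset. The function name `rus_touched_by_extent` and the sizes used are hypothetical; units are arbitrary.

```python
def rus_touched_by_extent(extent_index, extent_size, ru_size):
    """Return (fully_covered, partially_covered) RU counts for one host extent."""
    start = extent_index * extent_size
    end = start + extent_size                  # exclusive
    first_touched = start // ru_size
    last_touched = (end - 1) // ru_size
    touched = last_touched - first_touched + 1
    first_full = -(-start // ru_size)          # ceil(start / ru_size)
    end_full = end // ru_size                  # exclusive index of fully covered RUs
    full = max(0, end_full - first_full)
    return full, touched - full


RU = 64  # assumed RU size

# Host Extent = 2 * RU: every extent covers exactly two whole RUs.
print([rus_touched_by_extent(i, 128, RU) for i in range(4)])

# Host Extent = 1.5 * RU: every extent shares an RU with a neighbor.
print([rus_touched_by_extent(i, 96, RU) for i in range(4)])
```

When the extent is an integer multiple of the RU, deallocating it leaves no partially valid RU behind. In the mismatched case each deallocate leaves a shared RU that still holds a neighbor's data, so the drive must either wait for the neighbor to be deallocated or relocate its data. Larger extents reduce the fraction of RUs that are shared, which is consistent with the finding that large Host Extents improve WAF.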
Conclusions
- Various WAF=1 workloads are possible
  - Circular FIFO
  - Probabilistic
  - Modified Circular Buffer
  - Log Structured File Systems with Host Extents a multiple of SSD RU
- Write, Overwrite, and/or Deallocate assurances are all reasonable methods of reaching WAF=1
- Enable System OP (Host OP and/or SSD OP)
  - Compensates for QD>1 out-of-order processing
  - Deallocate far before LBA re-use to cover delay differences in SSD implementations
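The Circular FIFO claim, and the role of OP in it, can be checked with a toy model. This is a minimal sketch under loud assumptions: a single append point (one RUH), direct overwrites with no deallocates, greedy GC that picks the closed RU with the fewest valid pages, and one spare RU always held in reserve. None of this is taken from a real FTL; the function name `simulate_waf` and all sizes are hypothetical.

```python
from collections import deque


def simulate_waf(lbas, ru_pages, num_rus):
    """Replay host writes through a toy single-append-point FTL and return WAF."""
    where = {}                                # LBA -> RU holding its current copy
    valid = [set() for _ in range(num_rus)]   # valid LBAs per RU
    free = deque(range(1, num_rus))           # empty RUs (RU 0 starts open)
    filled = []                               # closed RUs, oldest first
    open_ru, fill = 0, 0
    host_writes = nand_writes = 0

    def program(lba):
        nonlocal open_ru, fill, nand_writes
        if fill == ru_pages:                  # open RU is full: close it, open a new one
            filled.append(open_ru)
            open_ru, fill = free.popleft(), 0
        old = where.get(lba)
        if old is not None:
            valid[old].discard(lba)           # overwrite invalidates the old copy
        valid[open_ru].add(lba)
        where[lba] = open_ru
        fill += 1
        nand_writes += 1

    for lba in lbas:
        host_writes += 1
        program(lba)
        while len(free) < 2 and filled:       # GC trigger: keep one spare RU in reserve
            victim = min(filled, key=lambda r: len(valid[r]))
            filled.remove(victim)
            for moved in list(valid[victim]): # relocation costs extra NAND writes
                program(moved)
            free.append(victim)
    return nand_writes / host_writes


# Circular FIFO over 10 LBAs; the range is not a multiple of the 4-page RU.
fifo = [i % 10 for i in range(1000)]
print(simulate_waf(fifo, ru_pages=4, num_rus=6))  # enough OP: oldest RU is empty when reclaimed
print(simulate_waf(fifo, ru_pages=4, num_rus=5))  # less OP: GC has to relocate valid pages
```

With six RUs the oldest closed RU has been fully overwritten by the time it is needed, so no data is relocated and the model reports a WAF of exactly 1. With five RUs the same workload forces the drive to reclaim an RU that is still emptying, and WAF rises above 1, which matches the recommendation to leave some SSD OP even for a well behaved FIFO.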