XINNOR | 2023 SNIA. All Rights Reserved.
Virtual Conference, September 28-29, 2021

Deep Dive and Comparison of RAID Solutions for PCIe Gen5
Performance Analysis and Datapath Breakdown
Presented by Davide Villa and Sergei Platonov, XINNOR

Agenda
- Who we are
- PCIe Gen5: great performance if properly handled
- RAID benchmark with PCIe Gen5 SSD
- Conclusions
- Q&A

Who we are

About Xinnor
- Founded in Haifa, Israel, in May 2022
- Background: 10+ years of experience with software RAID design and mathematical research
- Mission: to be the fastest RAID engine
- Team: around 40 people; 30 are accomplished mathematicians and industry talents from global storage OEMs
- 20 selling partners worldwide
- 100 PB of end-customer data

Technology partners

xiRAID: Xinnor's unique architecture
- CPU-assisted RAID (AVX)
- Lockless data path

PCIe Gen5: great performance if properly handled

PCIe Gen5: a new wave of modern servers
- 4th Gen Intel Xeon Scalable and 4th Gen AMD EPYC processors
- 12-24 PCIe Gen5 drives
- Theoretically capable of 60 million IOPS and 300 GB/s throughput
- Warning: fault tolerance needed!

Test Environment
- Drives: 12x CM7 PCIe Gen5 NVMe SSD
- CPU: Beaverton / Intel Xeon Gold 6430 (32 cores x2)
- Memory: 2 TB (DDR5-4800, 64 GB x32)
- OS: Oracle Linux 8.8 (kernel 5.4.17-2136.322.6.2.el8uek.x86_64 and kernel-ml-6.5.1-1.el8)
- Benchmarking tools: fio, bdevperf

First challenge: performance scalability
- 4k random read, single drive performance: 2.7M IOPS
- Expected performance over 12 drives: 30M IOPS
[Chart "Reality": M IOPS vs. number of drives; series: Linux libaio performance, SPDK performance]

Trying different settings to find the optimal scenario
[Chart: M IOPS vs. number of drives; series: libaio; polling mode driver + io_uring; polling mode driver + io_uring + hipri=1 + disabled multipath; continuous polling mode driver + io_uring + hipri=1 + disabled multipath; SPDK; interrupt coalescing]

Interrupt coalescing: technical dive
[Charts vs. interrupt coalescing time-out (ICT): M IOPS; irq_handler_entry, millions of starts over 40 s; sys CPU load (%) and M IOPS]
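On NVMe drives, interrupt coalescing is a controller feature (Feature Identifier 08h in the NVMe base specification): Dword 11 packs an aggregation time in 100 µs units (bits 15:8) and a 0's-based aggregation threshold (bits 7:0). The slides do not say how the ICT values were applied or what unit they use, so the sketch below only shows how such a feature value could be packed and unpacked; the function names are hypothetical.

```python
# Sketch: pack/unpack the NVMe Interrupt Coalescing feature value (Feature ID 0x08).
# Assumption: Dword 11 layout from the NVMe base specification:
#   bits 15:8  TIME - aggregation time in 100-microsecond units (0 = no delay)
#   bits  7:0  THR  - aggregation threshold, 0's based (stored value = entries - 1)
# How the deck's "ICT" values map onto these fields is NOT stated in the slides.

INTERRUPT_COALESCING_FID = 0x08


def encode_interrupt_coalescing(time_100us: int, threshold_entries: int) -> int:
    """Return the Dword 11 value for an aggregation time and threshold."""
    if not 0 <= time_100us <= 0xFF:
        raise ValueError("aggregation time must fit in 8 bits (0-255 x 100 us)")
    if not 1 <= threshold_entries <= 0x100:
        raise ValueError("threshold must be 1-256 completion entries")
    return (time_100us << 8) | (threshold_entries - 1)


def decode_interrupt_coalescing(value: int) -> tuple[int, int]:
    """Inverse of encode_interrupt_coalescing: (time_100us, threshold_entries)."""
    return (value >> 8) & 0xFF, (value & 0xFF) + 1


if __name__ == "__main__":
    # Hypothetical setting: delay up to 200 us (2 x 100 us) or until 8 completions.
    value = encode_interrupt_coalescing(time_100us=2, threshold_entries=8)
    print(hex(value))                          # 0x207
    print(decode_interrupt_coalescing(value))  # (2, 8)
```

A value of 0 (no delay, threshold of one entry) removes the coalescing delay, which matches the advice for QD=1 or single-job workloads. Applying the value would typically go through nvme-cli's `set-feature` command with feature ID 8; check your nvme-cli version's help for the exact flags.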
Bad news
- Interrupt coalescing is OK-ish only for high workloads (QD=16+, number of jobs=16+).
- With QD=1 or number of jobs=1, interrupt coalescing should be switched off!
- Polling-mode drivers and io_uring with hipri=1 can "eat" your CPU as well.
- SPDK is not the "remedy" for all cases: it is a great solution for VirtIO, vfio-user and NVMe-oF networks, but it does not support Linux block devices, and performance degrades significantly with the ublk target.

RAID Benchmark with PCIe Gen5 SSD
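The results that follow are reported per numjobs/iodepth pair (1/1, 16/16, 32/32, 64/64) with 4k blocks. The deck does not publish its fio job files, so the sketch below is only an assumption of how such a sweep could be scripted: it builds fio command lines (it does not run them) for a hypothetical RAID device `/dev/md0`, using the io_uring engine and the optional `hipri` polling flag from the settings slide.

```python
# Sketch: build fio command lines for a numjobs/iodepth sweep like the one in the charts.
# Assumptions (not from the deck): device path, runtime, and engine options.
import shlex

SWEEP = [(1, 1), (16, 16), (32, 32), (64, 64)]  # (numjobs, iodepth) pairs used in the charts


def fio_command(device: str, rw: str, numjobs: int, iodepth: int,
                hipri: bool = False, runtime_s: int = 60) -> list[str]:
    """Return an argv list for one fio run; nothing is executed here."""
    cmd = [
        "fio",
        f"--name={rw}-{numjobs}x{iodepth}",
        f"--filename={device}",
        f"--rw={rw}",          # e.g. randread, randwrite, write, read
        "--bs=4k",
        "--direct=1",          # bypass the page cache
        "--ioengine=io_uring",
        f"--numjobs={numjobs}",
        f"--iodepth={iodepth}",
        "--time_based",
        f"--runtime={runtime_s}",
        "--group_reporting",
    ]
    if hipri:
        # Polled completions; typically needs NVMe poll queues and, per the
        # slides, can consume significant CPU.
        cmd.append("--hipri")
    return cmd


if __name__ == "__main__":
    for numjobs, iodepth in SWEEP:
        print(shlex.join(fio_command("/dev/md0", "randread", numjobs, iodepth)))
```

Generating argv lists instead of shell strings keeps the sweep easy to log and to run with `subprocess.run` without quoting issues.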
RAID Engines under review
1. xiRAID (Linux kernel mode)
   - Kernel-space driver: exposes Linux block devices
   - User-space functionality for management
2. xiRAID (Linux user space)
   - SPDK: supports export via VirtIO, vfio-user and NVMe-oF
   - Evaluated with the SPDK fio plugin
   - User-space functionality for management
3. mdRAID (Linux kernel mode only)
   - Kernel 5.4
   - Kernel 6.5 (new)
4. RAID5F (Linux user space)
   - Intel SPDK RAID
   - Not applicable due to lack of enterprise readiness

How to compare different RAIDs: workloads
1. Random READ: in normal and degraded mode
2. Random WRITE: in normal and degraded mode
3. Sequential WRITE: in normal mode, both full-stripe AND not-aligned sequential writes
4. Sequential READ: in normal and degraded mode
CPU consumption matters.

How to compare: metrics
1. RAID efficiency = RAID performance / raw drive performance
2. RAID CPU efficiency = RAID performance / CPU consumption
RAID engines comparison:
3. RAID relative CPU efficiency = (RAID Engine 1 performance / CPU consumption) / (RAID Engine 2 performance / CPU consumption). If > 1, RAID Engine 1 is better than RAID Engine 2.
4. RAID relative latency efficiency = (RAID Engine 2 99.9% latency) / (RAID Engine 1 99.9% latency). If > 1, RAID Engine 1 is better than RAID Engine 2.
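The four metrics are plain ratios. The sketch below writes them out; as a sanity check it uses the maximum random-read IOPS from the single-RAID scalability slide (BASELINE 34.98M, xiRAID SPDK 28.44M, MDRAID kernel 5.4.17 2.96M), with the BASELINE as the denominator as the efficiency charts do. The CPU and latency inputs in the usage example are hypothetical placeholders, not measurements from the deck.

```python
# Sketch of the four comparison metrics from the "How to compare: metrics" slide.
# CPU and latency values passed in the example below are hypothetical.


def raid_efficiency(raid_perf: float, raw_perf: float) -> float:
    """1. RAID performance relative to raw drive (BASELINE) performance."""
    return raid_perf / raw_perf


def raid_cpu_efficiency(raid_perf: float, cpu_consumption: float) -> float:
    """2. Performance delivered per unit of CPU consumed."""
    return raid_perf / cpu_consumption


def relative_cpu_efficiency(perf1: float, cpu1: float, perf2: float, cpu2: float) -> float:
    """3. > 1 means engine 1 delivers more performance per unit of CPU than engine 2."""
    return raid_cpu_efficiency(perf1, cpu1) / raid_cpu_efficiency(perf2, cpu2)


def relative_latency_efficiency(p999_engine1: float, p999_engine2: float) -> float:
    """4. Engine 2 / engine 1 99.9th-percentile latency; > 1 favours engine 1."""
    return p999_engine2 / p999_engine1


if __name__ == "__main__":
    # Max random-read M IOPS from the single-RAID scalability slide.
    print(round(raid_efficiency(28.44, 34.98), 2))  # xiRAID SPDK: 0.81
    print(round(raid_efficiency(2.96, 34.98), 2))   # MDRAID, kernel 5.4.17: 0.08
    # Hypothetical CPU (busy cores) and p99.9 latency (us) values.
    print(relative_cpu_efficiency(8.0, 2.0, 4.0, 4.0))  # 4.0
    print(relative_latency_efficiency(500.0, 1500.0))   # 3.0
```

Keeping the relative metrics as ratios of the per-engine metrics makes the "> 1 favours engine 1" reading identical for CPU and latency.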
BASELINE definition
BASELINE is NOT a single number. It is the theoretical RAID performance based on:
- measured raw drive performance in SPDK
- the specific workload
- the RAID penalty

Example: random reads BASELINE
Numjobs/IOdepth | BASELINE, IOPS
1/1             | 40,966
16/16           | 9,514,053
32/32           | 23,557,220
64/64           | 34,982,233

Random Read RAID5x2. RAID Efficiency
[Chart: normal operation and degraded mode panels; series: BASELINE, MDRAID kernel 5.4.17, MDRAID kernel 6.5, MDRAID kernel 6.5 with ICT=600, xiRAID, xiRAID with ICT=600, xiRAID SPDK; labelled values: 17M IOPS, 35M IOPS]
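Degraded mode is measured separately because, in a RAID 5 stripe, a chunk that lived on the failed drive has to be rebuilt on every read by XOR-ing the surviving data chunks with the parity chunk; this is the kind of parity arithmetic that CPU-assisted (AVX) RAID engines vectorize. The sketch below shows the XOR math in plain Python on toy 8-byte chunks. It is a conceptual illustration, not xiRAID's or mdRAID's implementation, and the 5+1 stripe is an assumption (the deck does not state the geometry behind "RAID5x2").

```python
# Conceptual sketch of RAID 5 parity and degraded-mode reconstruction (XOR).
# Assumptions: a toy 5+1 stripe with 8-byte chunks; production engines operate on
# large chunks with SIMD instructions rather than a Python byte loop.
from functools import reduce
from typing import List, Optional


def xor_chunks(chunks: List[bytes]) -> bytes:
    """Byte-wise XOR of equally sized chunks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))


def make_stripe(data_chunks: List[bytes]) -> List[bytes]:
    """Return the data chunks followed by the parity chunk P = D0 ^ D1 ^ ... ^ Dn-1."""
    return data_chunks + [xor_chunks(data_chunks)]


def degraded_read(stripe: List[Optional[bytes]], missing: int) -> bytes:
    """Rebuild the chunk at index `missing` by XOR-ing every surviving chunk."""
    survivors = [chunk for i, chunk in enumerate(stripe) if i != missing]
    return xor_chunks(survivors)


if __name__ == "__main__":
    data = [bytes([i] * 8) for i in range(1, 6)]  # five data chunks
    stripe = make_stripe(data)
    damaged: List[Optional[bytes]] = list(stripe)
    damaged[2] = None                              # pretend the drive holding chunk 2 failed
    rebuilt = degraded_read(damaged, 2)
    print(rebuilt == data[2], rebuilt.hex())       # True 0303030303030303
```

With one drive lost in a 5+1 stripe, a read that targets the missing chunk turns into five reads plus an XOR, while reads of surviving chunks are served directly.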
Random Read RAID5x2. RAID CPU relative efficiency (in relation to MDRAID 5.4)
[Chart: normal operation and degraded mode panels; series: MDRAID kernel 6.5, xiRAID, xiRAID SPDK; x-axis: numjobs/iodepth (1/1, 16/16, 32/32, 64/64); log-scale y-axis]

Random Read RAID5x2. RAID relative latency efficiency (in relation to MDRAID 5.4)
[Chart: normal operation and degraded mode panels; series: MDRAID kernel 6.5, xiRAID, xiRAID SPDK; x-axis: numjobs/iodepth (1/1, 16/16, 32/32, 64/64); log-scale y-axis]

Random Write RAID5x2
[Charts: RAID efficiency (BASELINE 4.4M IOPS) and RAID relative CPU efficiency vs MDRAID 5.4; series: BASELINE, MDRAID kernel 5.4.17, MDRAID kernel 6.5, xiRAID, xiRAID SPDK; x-axis: numjobs/iodepth (1/1, 16/16, 32/32, 64/64)]

A Single RAID Scalability in RAID 5
The maximum performance numbers achieved under growing workload (no NUMA node affinity, bs=4k), in M IOPS:

Engine                      | Random Read | Random Write
BASELINE                    | 34.98       | 4.39
MDRAID, kernel 5.4.17       | 2.96        | 0.30
MDRAID, kernel 6.5          | 7.92        | 0.31
MDRAID, kernel 6.5, ICT=600 | 9.11        | 0.38
xiRAID                      | 8.63        | 1.28
xiRAID, ICT=600             | 18.61       | 3.05
xiRAID SPDK                 | 28.44       | 4.20

Sequential write RAID6 (10+2). RAID Efficiency
[Charts: full stripe writes and unaligned writes panels; series: MDRAID kernel 5.4.17, MDRAID kernel 6.5, MDRAID kernel 6.5 with no bitmaps, xiRAID, xiRAID SPDK, xiRAID with merges; BASELINE 70 GBps in both panels]

Sequential write RAID6. RAID CPU relative efficiency (in relation to MDRAID 5.4), full stripe writes
[Chart: series: MDRAID kernel 6.5, MDRAID kernel 6.5 with no bitmaps, xiRAID, xiRAID SPDK; x-axis: numjobs; log-scale y-axis]
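In a RAID 6 (10+2) array, a write that covers whole stripes lets the engine compute both parities from the new data alone, while an unaligned write touches partial stripes whose P and Q must be updated with read-modify-write. The sketch below classifies a byte range against a stripe layout and counts full versus partial stripes. The 128 KiB chunk size is a hypothetical value (the deck does not state it), and the 6-I/O small-write cost (read old data, P, Q; write new data, P, Q) is the textbook RAID 6 rule of thumb, not a figure from the slides.

```python
# Sketch: classify a write against a RAID 6 (10+2) stripe layout.
# Assumptions: 128 KiB chunk size (hypothetical) and 10 data chunks per stripe.
from dataclasses import dataclass

CHUNK_SIZE = 128 * 1024            # bytes per chunk on one drive (assumed)
DATA_DRIVES = 10                   # RAID 6 (10+2): 10 data + 2 parity chunks per stripe
STRIPE_SIZE = CHUNK_SIZE * DATA_DRIVES
RMW_IOS_PER_SMALL_WRITE = 6        # textbook RAID 6 small write: 3 reads + 3 writes


@dataclass
class WriteSplit:
    full_stripes: int              # parity computed from new data only
    partial_stripes: int           # parity needs read-modify-write


def classify_write(offset: int, length: int) -> WriteSplit:
    """Count full and partial stripes touched by the byte range [offset, offset + length)."""
    if length <= 0:
        return WriteSplit(0, 0)
    end = offset + length
    full = partial = 0
    for stripe in range(offset // STRIPE_SIZE, (end - 1) // STRIPE_SIZE + 1):
        stripe_start = stripe * STRIPE_SIZE
        stripe_end = stripe_start + STRIPE_SIZE
        if offset <= stripe_start and end >= stripe_end:
            full += 1
        else:
            partial += 1
    return WriteSplit(full, partial)


if __name__ == "__main__":
    print(classify_write(0, 2 * STRIPE_SIZE))           # aligned: 2 full, 0 partial
    print(classify_write(CHUNK_SIZE, 2 * STRIPE_SIZE))  # shifted by one chunk: 1 full, 2 partial
    # A 4k write inside one chunk: one partial stripe, about 6 device I/Os.
    print(classify_write(0, 4096).partial_stripes * RMW_IOS_PER_SMALL_WRITE)
```

Shifting the same amount of data by a single chunk turns two cheap full-stripe writes into one full and two read-modify-write stripes, which is why the deck benchmarks full-stripe and unaligned writes separately.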
Sequential write RAID6. RAID CPU relative efficiency (in relation to MDRAID 5.4), unaligned writes
[Chart: series: MDRAID kernel 6.5, MDRAID kernel 6.5 with no bitmaps, xiRAID, xiRAID SPDK, xiRAID with merges; x-axis: numjobs; log-scale y-axis]

Sequential read, degraded RAID6 (10+2). RAID efficiency and RAID CPU relative efficiency
[Charts: RAID efficiency and CPU efficiency; series: BASELINE, MDRAID kernel 5.4.17, MDRAID kernel 6.5, xiRAID, xiRAID SPDK; x-axis: numjobs; labelled value: 111 GBps]

Final considerations

Conclusions
1. Proper system tuning is critical to enable performance scalability in a PCIe Gen5 environment.
2. RAID benchmarks should look at multiple variables:
   - normal and degraded mode
   - different workloads
   - performance vs. CPU and latency efficiency
3. MDRAID on kernel 6.5 provides performance improvements in normal operation but not in degraded mode, and sometimes at the expense of CPU and latency efficiency.
4. For block devices, xiRAID (kernel) outperforms MDRAID 6.5 by multiple times, particularly in degraded mode, in random and sequential writes, and in CPU and latency efficiency.
5. In virtualized environments and NVMe-oF, with xiRAID SPDK we can exploit almost the full theoretical PCIe Gen5 performance.

Q&A