SNIA-SDC23-Zheng-KV-CSD-An-Ordered-Hardware-Accelerated.pdf (27 pages)

2023 Triad National Security, LLC. All Rights Reserved.
Virtual Conference, September 28-29, 2021

KV-CSD: An Ordered, HW-Accelerated KV Store For Rapid Data Insertion and Queries
Qing Zheng, Scientist, Los Alamos National Laboratory (LANL)
LA-UR-23-30273

A Collaboration with SK hynix

Overview
- Problem: Scientific data analytics often slowed down by unordered, unindexed data access
- KV-CSD: An ordered, hardware-accelerated KV store for rapid data insertion and queries
- Goal: Leverage computational storage to sort and index data at rest

A Quick Look
- Two components: (1) an Arm SoC board, (2) a ZNS SSD
- The Arm board implements KV atop SSD zones
- Apps use custom NVMe KV commands for bulk data insertion, index creation, and queries
[Figure labels: App, KV, Arm SoC board, ZNS SSD]

KV-CSD in Real World
[Figure labels: Current Prototype; ZNS SSD; ARM SoC (FPGA in future); PCIe (NVMe-oF in future)]

Today's Talk
- Why ordered computational KV storage?
- How does it work?

How Scientific Simulations Run
- Time-based, bulk-synchronous parallel programs
- Iterate between compute and I/O phases
- Analytics occur after simulation
[Figure: Simulation Pipeline over time: Compute, IO, Compute, IO, Compute, IO, then Analytics; timesteps 0-15, 16-31, 32-47; annotation "Persist timestep 15 to storage"]

How Data is Stored Today
- Through filesystems
- Data stored as one big file or many small files per timestep
- Data typically accompanied by metadata that describes the data (type, dimension, ...)
[Figure: Simulation Pipeline over time: each IO phase writes Files; Analytics follows]

Problem: Queries Often Read More Data Than Necessary
- Data may not be persisted in the same order as queries, leading to full data scans
- Pre-sorting data prior to queries using many compute nodes can be equally inefficient
- Computational storage offers new ways of acceleration
- For example: a simulation may store its particles in particle ID order, but queries may target their energy levels
[Image from LANL VPIC simulation done by L. Yin, et al. at SC10]

Toward Ordered, Computational KV Storage
- App converts data to KV pairs and bulk inserts them into storage
- One KV namespace per app process per timestep
- Storage sorts data by key asynchronously and builds secondary indexes per app query needs
- Queries sped up by storage-built primary and secondary indexes
[Figure: Simulation Pipeline over time: each IO phase sends Bulk KV to a Keyspace; Analytics issues point or range queries]

Why KV?
- Scientific data often resembles records with keys and values
- KV interface already very popular thanks to open software like RocksDB
- KV provides sufficient knowledge of data without having to resort to external metadata
- Switching from files to KV not awfully difficult
- No need to map filenames to LBAs to enable offload

Why Hardware Acceleration?
- Software KV stores (such as RocksDB) rely on background processing to hide data sorting latency
  - Insertion is suspended when background jobs cannot keep up
- Hardware acceleration allows for more aggressive latency hiding
  - By deferring background work until after insertion concludes and by performing it within a computational storage device

Why Hardware Acceleration?
- A reduction of software layers also enables higher performance
[Figure: software KV store path: Host (Application, KV Store (e.g., RocksDB), Filesystem, Block Layer, Device Driver) and SSD; KV-CSD path: Host (Application, KV Client) and KV-CSD (SoC Board, ZNS SSD)]
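
The latency-hiding idea from the "Why Hardware Acceleration?" slides (insertion only appends; sorting is deferred until insertion concludes) can be illustrated with a minimal in-memory sketch. This is not the KV-CSD firmware or its NVMe command set: the class `DeferredSortStore` and its methods are hypothetical names made up for illustration, and a Python list stands in for SSD zones.

```python
import bisect


class DeferredSortStore:
    """Toy store (hypothetical): append during insertion, sort once afterwards."""

    def __init__(self):
        self._pairs = []     # unsorted (key, value) pairs in arrival order
        self._keys = None    # sorted keys, built by finalize()
        self._values = None  # values aligned with self._keys

    def bulk_insert(self, pairs):
        # Insertion only appends, so it never waits on sorting.
        if self._keys is not None:
            raise RuntimeError("store already finalized")
        self._pairs.extend(pairs)

    def finalize(self):
        # Deferred background work: a single sort after insertion concludes.
        self._pairs.sort(key=lambda kv: kv[0])
        self._keys = [k for k, _ in self._pairs]
        self._values = [v for _, v in self._pairs]
        self._pairs = []

    def _require_sorted(self):
        if self._keys is None:
            raise RuntimeError("call finalize() before querying")

    def get(self, key):
        # Point query: binary search over the sorted keys.
        self._require_sorted()
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._values[i]
        return None

    def range(self, lo, hi):
        # Range query: all pairs with lo <= key < hi, in key order.
        self._require_sorted()
        start = bisect.bisect_left(self._keys, lo)
        end = bisect.bisect_left(self._keys, hi)
        return list(zip(self._keys[start:end], self._values[start:end]))


store = DeferredSortStore()
store.bulk_insert([(5, "e"), (1, "a"), (3, "c")])  # arrives out of order
store.bulk_insert([(4, "d"), (2, "b")])
store.finalize()                                   # sorting happens here only
print(store.get(3))
print(store.range(2, 5))
```

In this sketch the caller pays only the append cost during `bulk_insert`; the sort in `finalize` corresponds to the work that the slides describe as being moved into the computational storage device.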

Today's Talk
- Why ordered computational KV storage?
- How does it work?

A Closer Look at the Device
[Figure labels: Application; KV-CSD SoC (Keyspace Manager, Zone Manager, Keyspace, Keyspace); ZNS SSD (Zones, grouped into five Zone Clusters)]

Keyspace API
[Figure labels: Host (Application, KV Client); KV-CSD (Arm SoC Board, ZNS SSD); Keyspace State: New, Writable, Indexing, Indexed; operations: KV insertion, Query, Keyspace Info, Keyspace Deletion]

Primary and Secondary Indexes
Key (Primary Index): Particle ID
Value (User-Definable Secondary Indexes): Energy, Location X, Location Y, Location Z

  Particle ID   Energy
  0             0.3
  1             0.6
  2             0.7
  3             0.1
  4             0.2
  5             0.4
  6             0.5
  7             0.2

- Secondary indexes are defined by users specifying the byte range and the type of a portion of value to serve as the secondary index keys

Evaluation Against RocksDB
- Two scenarios
  - Data insertion
  - Range query against a secondary index
- A 256-million particle dataset stored as KV pairs
  - Key: particle ID (16B)
  - Value: particle payload (32B)
- Analytics: range query over particle energy with varying selectivity
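
The keyspace lifecycle and the byte-range definition of secondary indexes described above can be sketched as follows. This is a minimal in-memory illustration under stated assumptions, not the device's actual API: the `Keyspace` class, its method names, and the exact state transitions are hypothetical, and the 32B value layout (four little-endian 8-byte floats: energy, x, y, z) is assumed only so that the byte range has something concrete to point at.

```python
import bisect
import struct
from enum import Enum


class State(Enum):
    # State names follow the Keyspace API slide; transitions are assumed.
    NEW = "New"
    WRITABLE = "Writable"
    INDEXING = "Indexing"
    INDEXED = "Indexed"


class Keyspace:
    """Toy keyspace (hypothetical): insert while Writable, query once Indexed."""

    def __init__(self):
        self.state = State.NEW
        self._pairs = []      # (primary key bytes, value bytes)
        self._secondary = []  # sorted (secondary key, primary key) tuples

    def open_for_write(self):
        if self.state is not State.NEW:
            raise RuntimeError("keyspace already opened")
        self.state = State.WRITABLE

    def insert(self, key, value):
        if self.state is not State.WRITABLE:
            raise RuntimeError("keyspace is not writable")
        self._pairs.append((key, value))

    def build_indexes(self, offset, fmt):
        # The secondary key is the portion of the value starting at `offset`
        # and decoded with struct format `fmt` (byte range plus type).
        self.state = State.INDEXING
        self._pairs.sort()  # primary index: order by key
        self._secondary = sorted(
            (struct.unpack_from(fmt, value, offset)[0], key)
            for key, value in self._pairs
        )
        self.state = State.INDEXED

    def range_query(self, lo, hi):
        # Primary keys whose secondary key satisfies lo <= k < hi.
        if self.state is not State.INDEXED:
            raise RuntimeError("keyspace is not indexed yet")
        start = bisect.bisect_left(self._secondary, (lo,))
        end = bisect.bisect_left(self._secondary, (hi,))
        return [pk for _, pk in self._secondary[start:end]]


# Eight particles with illustrative energies; locations are set to zero.
energies = [0.3, 0.6, 0.7, 0.1, 0.2, 0.4, 0.5, 0.2]
ks = Keyspace()
ks.open_for_write()
for pid, energy in enumerate(energies):
    key = pid.to_bytes(16, "big")                       # 16B key
    value = struct.pack("<4d", energy, 0.0, 0.0, 0.0)   # 32B value (assumed layout)
    ks.insert(key, value)
ks.build_indexes(offset=0, fmt="<d")  # energy: 8-byte float at byte offset 0
hits = [int.from_bytes(k, "big") for k in ks.range_query(0.2, 0.5)]
print(hits)
```

Big-endian key encoding is chosen here so that byte-wise key order matches numeric particle ID order; the slides do not specify how keys are encoded.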

RocksDB vs KV-CSD Runs
[Figure: RocksDB run: Host (App Process, RocksDB (KV Files), Operating System) and SSD; KV-CSD run: Host (App Process, Lightweight KV Client) and KV-CSD (Arm SoC Board (KV Zones), ZNS SSD); annotations: "Full KV management (foreground & background jobs)", "NVMe KV command generation only"]

Results: Data Insertion
[Figure: bar chart of Runtime (s) for RocksDB and KV-CSD, split into Insertion and Additional Background Work; lower is better]
- KV-CSD more effectively hides background work latency
- RocksDB: user experiences both latencies
- KV-CSD: user experiences only insertion latency

Results: Range Query Against a Secondary Index
[Figure: Query Latency (s) versus Query Selectivity (0.1%, 0.2%, 0.5%, 1%, 2%, 5%, 10%, 20%) for RocksDB and KV-CSD; lower is better; annotation "KV-CSD 7.4x faster"]
- KV-CSD allows for more rapidly answering user queries thanks to hardware specialization

More on KV-CSD
1. KV-CSD paper at IEEE Cluster Computing Conference 2023
2. KV-CSD demo at Flash Memory Summit 2023
3. R&D 100 Award

A More Complete Picture
Tue Sep 19 | 4:05pm-4:55pm, Salon V

Conclusion
- Efficient data retrieval performance is key to scientific analytics
- Computational storage opens new ways of acceleration infeasible with traditional methods
- Preliminary results are very encouraging
- More work/collaboration/innovation is needed for production deployment

Acknowledgement
- Jason Lee (jasonlee@lanl.gov)
- David Bonnie (dbonnie@lanl.gov)
- Dominic Manno (dmanno@lanl.gov)
- Gary Grider (ggrider@lanl.gov)
- Bradley Settlemyer
- Youngjae Kim (youkim@sogang.ac.kr)
- Inhyuk Park
- Soonyeal Yang
- Jungki Noh
- Woosuk Chung
- Hoshik Kim
- Pui York Wong
- Jongryool Kim
- Jin Lim