1 | 2023 AirMettle, Inc. All Rights Reserved.
Virtual Conference, September 28-29, 2021
Computational Storage Service: A Real-Time Smart Data Lake
Presented by: Donpaul Stephens

2 | Agenda
What is Big Data?
Computational Storage: challenges
Computational Storage Service
Reference Design

3 | What is Big Data?
Digital Packratism?

4 | What is Big Data?
Unstructured?

5 | Most data is Semi-Structured
Encrypted data is closest to uncompressible white noise
Stored in a formatted file. Object!
Because historical records can be appended, but you can't rewrite the past, corrections must be trackable

6 | How BIG is the data?
They don't call it "Big Data" for nuthin'!
0.4 to 1 GB+ per file:
Video: 1.5 GB to 4 GB per hour
7 | Extracting insights from Tabular Data (via SQL)
Security Information & Event Management
Collects sample measurements with certain flags and arguments and groups them by minute. Returns the number of samples, average duration and standard deviation of duration for each group.

    select to_string(event_ts, 'yyyy-mm-dd hh24:mi') as interval,
           count(*),
           avg(cast(event_dur as int)),
           stddev_samp(cast(event_dur as int))
    from events
    where flgs like 'C_'
      and regexp_contains(args, 'JY.')
      and event_ts between to_timestamp('2000-01-01 00') and to_timestamp('2000-01-01 01')
    group by interval

8 | Extracting insights from Tabular Data (via SQL)
Star Schema Benchmark
Comparison of revenue for some product classes, for suppliers in a certain region, grouped by product brand and year.

    select sum(lo_revenue), d_year, p_brand1
    from lineorder, date, part, supplier
    where lo_orderdate = d_datekey
      and lo_partkey = p_partkey
      and lo_suppkey = s_suppkey
      and p_category = 'MFGR#12'
      and s_region = 'AMERICA'
    group by d_year, p_brand1
    order by d_year, p_brand1

Security Information & Event Management (query as on slide 7)
Award from:

9 | Re-scaling weather data?
Scientific (ex: Climate)
Awards from:

10 | Could we search video?
Private Sector
Fantasy Adventure show: make sure current-day items (e.g. coffee cup) do not appear on screen
Child not found at amusement park: parent/guardian has pictures
Public Sector
Find missing people / validate alibi

11 | Traditional Data Lake
Data Lake
Object Storage: objects are internally partitioned for storage in parallel
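The Security Information & Event Management query on slide 7 can be tried locally with a small sketch. This is only an approximation under stated assumptions: it uses Python's built-in `sqlite3` rather than the engine used in the talk, so `to_string` is replaced by SQLite's `strftime`, while `stddev_samp` and `regexp_contains` are registered here as user-defined functions. The column types, the helper name `run_siem_query`, and the alias `minute_bucket` are assumptions made for illustration.

```python
import re
import sqlite3
import statistics


class StddevSamp:
    """Sample standard deviation aggregate (SQLite has none built in)."""

    def __init__(self):
        self.values = []

    def step(self, value):
        if value is not None:
            self.values.append(value)

    def finalize(self):
        # A sample standard deviation needs at least two values.
        return statistics.stdev(self.values) if len(self.values) > 1 else None


def regexp_contains(text, pattern):
    """Return 1 if the pattern matches anywhere in text, else 0."""
    if text is None:
        return 0
    return int(re.search(pattern, text) is not None)


def run_siem_query(rows):
    """Load (event_ts, event_dur, flgs, args) rows and group matches by minute."""
    conn = sqlite3.connect(":memory:")
    conn.create_aggregate("stddev_samp", 1, StddevSamp)
    conn.create_function("regexp_contains", 2, regexp_contains)
    conn.execute(
        "CREATE TABLE events (event_ts TEXT, event_dur TEXT, flgs TEXT, args TEXT)"
    )
    conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", rows)
    query = """
        SELECT strftime('%Y-%m-%d %H:%M', event_ts) AS minute_bucket,
               count(*),
               avg(CAST(event_dur AS INTEGER)),
               stddev_samp(CAST(event_dur AS INTEGER))
        FROM events
        WHERE flgs LIKE 'C_'
          AND regexp_contains(args, 'JY.')
          AND event_ts BETWEEN '2000-01-01 00:00:00' AND '2000-01-01 01:00:00'
        GROUP BY minute_bucket
        ORDER BY minute_bucket
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    sample = [
        ("2000-01-01 00:00:10", "10", "CA", "xJY1"),
        ("2000-01-01 00:00:50", "20", "CB", "JY2"),
        ("2000-01-01 00:01:05", "30", "CA", "JYz"),
    ]
    for row in run_siem_query(sample):
        print(row)
```

Each output row holds the minute, the number of matching samples, the average duration and the sample standard deviation of duration, which are the four values the slide describes.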
12 | Traditional Data Lake
Data Lake: primarily semi-structured data; comes from everywhere
Object Storage: objects are internally partitioned for storage in parallel

13 | Traditional Data Lake
Comes from everywhere, analyzed in islands
Applications retrieve full objects* to their own (small) clusters for processing

14 | This is AWESOME for selling networking gear!
https:/
29sec
Time to: Move the data!

15 | This is AWESOME for selling networking gear!
Time to: Move the data!
I: Moved the data!
I: Made the donuts!

16 | Why not process where the data is?
Computational Storage
Active Disks circa '98: wha' happened?!?!?

17 | Resiliency 101: How do storage solutions protect data?
RAID:
Erasure Coding:
Data protection algorithms designed for HDD

18 | What that means for data reliably placed in storage:
Simple Table: #1 #3 #2 #4 (first 4 devices shown)
Bytes of data divided evenly across SSDs!
Data protection and streaming performance!
Supports data protection algorithms designed for HDD!

19 | What that means for data reliably placed in storage:
Simple Table: #1 #3 #2 #4 (first 4 devices shown)
Bytes of data divided evenly across SSDs!
Data protection and streaming performance!
HDD-centric RAID/Erasure Coding prevent in-storage analytics

20 | AirMettle: Data partitioning for processing AND protecting data
AirMettle internal metadata enables parallel in-storage analytics
Meta-data typically 0.1% of data (not to scale!)
Data is unchanged for client
Each internal component can be processed in parallel

21 | AirMettle: Data partitioning for processing AND protecting data
Data is unchanged for client
Each internal component can be processed in parallel
Erasure coding for our non-uniform data segments (not to scale!)
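The contrast drawn on slides 18 to 21, bytes divided evenly across devices versus data partitioned so that each piece can be processed on its own, can be illustrated with a toy sketch. It is not AirMettle's actual layout or metadata format, which the slides do not describe: the newline-delimited records, the function names, and the fixed stripe size are all assumptions chosen for illustration.

```python
def stripe_bytes(data: bytes, devices: int, stripe_size: int) -> list[bytes]:
    """Round-robin fixed-size stripes across devices (HDD-style byte striping)."""
    shards = [bytearray() for _ in range(devices)]
    for offset in range(0, len(data), stripe_size):
        device = (offset // stripe_size) % devices
        shards[device] += data[offset:offset + stripe_size]
    return [bytes(shard) for shard in shards]


def partition_records(data: bytes, devices: int) -> list[bytes]:
    """Split on record (newline) boundaries so every partition holds whole records."""
    records = data.splitlines(keepends=True)
    per_part = max(1, -(-len(records) // devices))  # ceiling division
    return [
        b"".join(records[i:i + per_part])
        for i in range(0, len(records), per_part)
    ]


def count_matches(partition: bytes, needle: bytes) -> int:
    """Count lines in a single partition containing needle, using no other partition."""
    return sum(needle in line for line in partition.splitlines())


if __name__ == "__main__":
    table = (
        b"id=1,flag=error\n"
        b"id=2,flag=ok\n"
        b"id=3,flag=error\n"
        b"id=4,flag=ok\n"
    )
    striped = stripe_bytes(table, devices=4, stripe_size=8)
    aligned = partition_records(table, devices=4)
    print("striped :", sum(count_matches(p, b"error") for p in striped))
    print("aligned :", sum(count_matches(p, b"error") for p in aligned))
```

With fixed-size striping a record can straddle two devices, so a device scanning only its own bytes can miss it. Record-aligned partitions keep each record whole, which is the property that lets every internal component be processed in parallel.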
22 | In practice: Initial Results

23 | Accelerated analytics of classic tabular data
Search for key-words
Gather statistics of usage
Extract text if required for further analysis
Scan historical data to diagnose current events
Determine how many records might be relevant before retrieving any
Natural Language Processing
Security Information & Event Management

24 | Accelerated analytics of classic tabular data (S3 Select API)
Search for key-words
Gather statistics of usage
Extract text if required for further analysis
Scan historical data to diagnose current events
Determine how many records might be relevant before retrieving any
Natural Language Processing
Security Information & Event Management
Under a minute vs. 1 hour 45 min
Star Schema Benchmark
Utilized 223 Select queries to Object Storage:
Validated with &
Unprecedented speed of analysis:
Directly from storage
X faster
No data warehouse required

25 | AirMettle Accelerates
S3 Select API enables comparison vs. major clouds' object storage
Diagram labels: i3en.6xlarge (x8), c5n.18xlarge, c5n.18xlarge, Gateway

26 | AirMettle Accelerates
S3 Select API enables comparison vs. major clouds' object storage
Diagram labels: i3en.6xlarge (x8), c5n.18xlarge, c5n.18xlarge, Gateway
Star Schema Benchmark, Scale Factor 1 with 1 object per table
For more complex queries, acceleration depends on how much time was spent in portions we offload:
Q1.1: 50% of time 100x faster: 2x overall
Q1.2: 80% of time 100x faster: 5x overall

27 | Computational Storage Devices
For *GASP* computation

28 | NVMe-Object SSD: yes, we can do it!
https:/

29 | Please take a moment to rate this session.
Your feedback is important to us.
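The overall speedups quoted for Q1.1 and Q1.2 are consistent with Amdahl's law: if a fraction of the query time is offloaded and that portion runs 100x faster, the rest of the query still takes as long as before. A minimal sketch of the arithmetic follows; the function name `overall_speedup` is an assumption made for illustration, and the fractions are those stated on the slide.

```python
def overall_speedup(offloaded_fraction: float, offload_speedup: float) -> float:
    """Amdahl's law: speedup of the whole query when only part of it is accelerated."""
    if not 0.0 <= offloaded_fraction <= 1.0:
        raise ValueError("offloaded_fraction must be between 0 and 1")
    if offload_speedup <= 0:
        raise ValueError("offload_speedup must be positive")
    remaining = 1.0 - offloaded_fraction           # time that is not accelerated
    accelerated = offloaded_fraction / offload_speedup
    return 1.0 / (remaining + accelerated)


if __name__ == "__main__":
    # Fractions and the 100x factor are taken from the slide.
    print(f"Q1.1: {overall_speedup(0.50, 100):.2f}x overall")
    print(f"Q1.2: {overall_speedup(0.80, 100):.2f}x overall")
```

The results come out at about 1.98x and 4.81x, which the slide rounds to 2x and 5x. The portion that is not offloaded quickly dominates: even an unlimited speedup on 50% of the time cannot exceed 2x overall.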