HTAP Made Simple: Fallacies and Pitfalls, by Sun Ruoxi (孙若曦) and Huang Dongxu (黄东旭). Slide deck, 39 pages.
HTAP Made Simple: Fallacies and Pitfalls
Rossi Sun
2023/02/07

About Me
  Rossi Sun (孙若曦)
  Tech lead, Compute Arch. & Engine team, PingCAP
  Was:
    Tech lead, SQL on Hadoop team, Transwarp
    Tech lead, GPU Arch Infra team, NVIDIA
  Over 10 years of experience in infra:
    System-level software
    Database kernel & big data
    GPU techniques

HTAP
  HTAP: Hybrid Transactional/Analytical Processing
  Coined by Gartner in 2014
    Real-time analytics
    Analytics see the exact operational data
    Simplify data infrastructure (no data movement)
  Going mainstream
    20% of database buyers cited HTAP capabilities
    40% marked HTAP as a top-3 factor

TiDB Architecture 1.0

Pitfall: Non-Scalable AP
  Auto-sharding
  Distributed transaction
  TP scales well
  AP: joins and aggregations work, but only on a single node
    SELECT COUNT(*) FROM t0 JOIN t1 ON t0.c1 = t1.c1 WHERE t0.c2 = xxx AND t1.c2 = yyy

Introducing TiSpark (Since 2.0)

Introducing TiSpark (Cont.)

Fallacy: One Size Fits All
  The "Hybrid" what?
  Row store performs poorly for AP

  Row layout:
    id    name   age
    0962  Jane   30
    7658  John   45
    3589  Jim    20
    5523  Susan  52

  Columnar layout:
    id    0962  7658  3589  5523
    name  Jane  John  Jim   Susan
    age   30    45    20    52

Fallacy: One Size Fits All (Cont.)
  TP suffers severe interference
  (figure: OLTP vs. OLAP)

Introducing Columnar Store (POC)

Pitfall: Unbounded Staleness

Pitfall: Complexity

Embracing Raft (Since 4.0)

Introducing MPP (Since 5.0)

Introducing MPP (Cont.)

Fallacy: Consistency & Freshness
  Use MVCC + learner read (Raft) to guarantee data consistency and freshness
  Just like the serializable isolation level
  Ideal, but at the cost of performance

Fallacy: Consistency & Freshness (Cont.)
  (figure: Raft Learner holds MVCC version TS:10 = $100; Raft Leader: 42; read at Timestamp: 11; balance = ?)

Fallacy: Consistency & Freshness (Cont.)
  (figure: Raft Learner holds MVCC versions TS:10 = $100, TS:11 = $200, TS:12 = $150; Raft Leader: 44; read at Timestamp: 11; balance = $200)

Stale Read (Coming in 6.6)
  Bounded staleness
  Data integrity
  Gets 30% more QPS in our internal POC
  (figure: Raft Learner holds MVCC version TS:10 = $100, TS:11 = ?; Raft Leader: 42; read at Timestamp: 11 with Staleness: -5; "balance = ?" is crossed out and replaced by balance = $100)

Fallacy: Shared Nothing

Fallacy: Shared Nothing (Cont.)

Disaggregated Compute/Storage
  Available on TiDB Cloud Serverless Tier

Disaggregated Compute/Storage (Cont.)
  Coming in 7.x on TiDB Cloud

Fallacy: HTAP = TP + AP
  The subtle boundary between TP and AP
  Window function on 16m rows: 80k QPS, 5 ms latency
  Heavy aggregation on 6b rows: 20 QPS, 10 s latency

One Optimizer To Rule Them All (Since 6.5)
  Case study: top repos on GitHub over a specified period
  (figure labels: 320m rows, 10 rows, 5b rows)

One Optimizer To Rule Them All (Cont.)
  (figure labels: 320m rows, 10 rows, 5b rows)

One Optimizer To Rule Them All (Cont.)
  (figure labels: 320m rows, 10 rows, 1m rows)

One Optimizer To Rule Them All (Cont.)
  (figure labels: 320m rows, 10 rows, 14m rows)

Compute Engine Evolution
  Case study: CPU underutilization

Compute Engine Evolution (Cont.)
  The devil in std::thread:
    lll_lock
    stack_cache
    deallocate_stack
    munmap
    TLB shootdown
    IPI
  Mitigation:
    Dynamic thread pool (since 5.4)

Compute Engine Evolution (Cont.)
  Moderate concurrency brings new challenges to the compute engine:
    Out-Of-Memory
    Out-Of-Thread (exceeds the max mmap ulimit)
    Lack of fine-grained scheduling and task prioritization
  Mitigation:
    Memory tracker (since 6.1) and spill to disk (coming in 7.0)
    Min-TSO scheduling (since 6.0)
    Push model and pipeline execution (coming in 7.0)

Fallacy: The Cost of HTAP

HTAP Serverless & Elasticity
  Coming in TiDB Cloud Serverless Tier
  Pay-As-You-Go, cost model redesigned
    Data stored
    Read/write/computation
  Extreme elasticity
    Pooling instances to reduce cold start time to the order of seconds
    Scale in to zero
    Scale out to infinity

HTAP Serverless & Elasticity (Cont.)

Summary
  5 fallacies and 3 pitfalls are the spiral path to making HTAP:
    Powerful
    Cost-efficient
    Simple

Thanks & Questions?
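
Appendix sketch: consistency, freshness, and stale read

The "Consistency & Freshness" and "Stale Read" slides contrast a fresh snapshot read on a Raft learner with a bounded-staleness read. The Python below is a minimal toy model of that contrast, not TiDB's implementation. `ToyLearner`, `fresh_read`, `stale_read`, and `NotCaughtUp` are hypothetical names, and `applied_ts` stands in for replication progress as a simplifying assumption (real Raft replication tracks log indexes rather than commit timestamps). The values mirror the balance example on the slides.

```python
from bisect import bisect_right


class NotCaughtUp(Exception):
    """Raised when the replica cannot serve the requested snapshot yet."""


class ToyLearner:
    """Hypothetical single-key replica holding MVCC versions of a balance."""

    def __init__(self):
        self._ts = []        # commit timestamps, ascending
        self._values = []    # value written at the matching timestamp
        self.applied_ts = 0  # newest commit timestamp replicated so far

    def apply(self, commit_ts, value):
        """Replicate one committed write; timestamps must arrive in order."""
        if commit_ts <= self.applied_ts:
            raise ValueError("commit timestamps must increase")
        self._ts.append(commit_ts)
        self._values.append(value)
        self.applied_ts = commit_ts

    def _snapshot(self, ts):
        """MVCC rule: newest version whose commit timestamp is <= ts."""
        i = bisect_right(self._ts, ts)
        return self._values[i - 1] if i else None

    def fresh_read(self, read_ts):
        """Consistent and fresh: refuse until the replica has reached read_ts."""
        if self.applied_ts < read_ts:
            raise NotCaughtUp(f"applied {self.applied_ts} < read {read_ts}")
        return self._snapshot(read_ts)

    def stale_read(self, read_ts, max_staleness):
        """Bounded staleness: serve the newest local snapshot if it is
        no older than read_ts - max_staleness."""
        snapshot_ts = min(read_ts, self.applied_ts)
        if snapshot_ts < read_ts - max_staleness:
            raise NotCaughtUp(f"snapshot {snapshot_ts} is too stale")
        return snapshot_ts, self._snapshot(snapshot_ts)


learner = ToyLearner()
learner.apply(10, 100)                 # learner has only TS:10 = $100

try:
    learner.fresh_read(11)             # balance = ? (must wait for the leader)
except NotCaughtUp as exc:
    print("fresh read blocked:", exc)

print(learner.stale_read(11, 5))       # served locally from TS:10

learner.apply(11, 200)
learner.apply(12, 150)
print(learner.fresh_read(11))          # version at TS:11 is now visible
```

In this toy model the fresh read pays for freshness by waiting on replication, while the stale read answers immediately from a snapshot that is still internally consistent but may lag by at most the stated bound. That trade-off is one plausible reading of the QPS gain the Stale Read slide reports.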