How Uber Builds A Cross Data Center Replication Platform on Apache Kafka (2018)

Agenda
01 Apache Kafka at Uber
02 Apache Kafka pipeline & replication
03 uReplicator
04 Data loss detection
05 Q&A

01 Apache Kafka at Uber

Real-time Dynamic Pricing
- App views and vehicle information are published into Apache Kafka and consumed by stream processing, which feeds the dynamic pricing service.

Uber Eats - Real-Time ETAs

And a bunch more: fraud detection, driver & rider sign-ups, etc.

Apache Kafka - Use Cases
- General pub-sub, messaging queue
- Stream processing: AthenaX, a self-service streaming analytics platform (Apache Samza & Apache Flink)
- Database changelog transport: Cassandra, MySQL, etc.
- Ingestion: HDFS, S3
- Logging

Data Infrastructure @ Uber
- Producers: rider app, driver app, API/services, (internal) services, mobile app, Surge, payment, and databases (Cassandra, MySQL), etc.
- Kafka fans out to consumers: real-time analytics, alerts, and dashboards (Samza/Flink applications); data science, analytics, and reporting (Vertica/Hive); ad-hoc exploration (ELK); debugging; Hadoop; AWS S3.

Scale (excluding replication)
- Messages/day: trillions
- Data: PBs
- Topics: tens of thousands

02 Apache Kafka pipeline & replication

Apache Kafka Pipeline @ Uber
- In each data center (DC1, DC2), applications publish through a proxy client to a Kafka REST proxy, which produces into the regional Kafka cluster (plus a secondary Kafka cluster).
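The proxy client and Kafka REST proxy named above are Uber-internal components without public APIs. As a rough sketch of the pipeline's entry point only, here is a plain Java producer publishing into a regional cluster; the endpoint and topic name are assumptions.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class RegionalProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Hypothetical regional-cluster endpoint; Uber applications would go through the REST proxy instead.
        props.put("bootstrap.servers", "regional-kafka-dc1:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // e.g. an app-view event feeding the dynamic pricing pipeline (topic name is made up)
            producer.send(new ProducerRecord<>("app-views", "rider-123", "{\"screen\":\"home\"}"));
        }
    }
}
```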

Aggregation
- uReplicator replicates each regional Kafka cluster into aggregate Kafka clusters in both DC1 and DC2, so each aggregate cluster holds a global view.
- An Offset Sync Service runs alongside the uReplicator instances.

Cross-Data Center Failover
- During runtime: uReplicator reports its offset mapping to the offset sync service; the offset sync service is all-active, and the offset info is replicated across data centers.
- During failover: consumers ask the offset sync service for the offsets at which to resume consumption, based on their last committed offsets; the offset sync service translates offsets between aggregate clusters.
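The deck does not show the offset sync service's interface. As a minimal sketch of the translation idea, assume it stores source-to-destination offset checkpoints per topic partition and answers floor lookups (all class and method names below are made up):

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;

/** Sketch of offset translation between two aggregate clusters. */
public class OffsetSyncSketch {
    // topic-partition -> (offset in aggregate cluster A -> corresponding offset in aggregate cluster B)
    private final Map<String, TreeMap<Long, Long>> checkpoints = new ConcurrentHashMap<>();

    /** Called as uReplicator reports offset mappings while replicating. */
    public void report(String topicPartition, long offsetA, long offsetB) {
        checkpoints.computeIfAbsent(topicPartition, tp -> new TreeMap<>()).put(offsetA, offsetB);
    }

    /** On failover: translate a consumer's last committed offset in A into a resume offset in B.
     *  Uses the closest checkpoint at or below the committed offset, so a consumer may re-read
     *  a few messages (at-least-once) but will not skip any. */
    public long translate(String topicPartition, long committedOffsetA) {
        TreeMap<Long, Long> map = checkpoints.getOrDefault(topicPartition, new TreeMap<>());
        Map.Entry<Long, Long> floor = map.floorEntry(committedOffsetA);
        return floor == null ? 0L : floor.getValue();
    }
}
```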

Ingestion
- The aggregate Kafka cluster in each DC feeds ingestion into HDFS and S3.

Topic Migration (moving a topic between regional clusters/DCs)
1. Set up uReplicator from the new cluster to the old cluster.
2. Move the producer to the new cluster.
3. Move the consumer to the new cluster.
4. Remove uReplicator.
A config sketch of steps 2-3 follows below.
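The deck presents these steps as a sequence of build-up diagrams. As a rough sketch (cluster endpoints and the config-driven switch are assumptions), moving the producer and then the consumer amounts to repointing their bootstrap servers while uReplicator bridges new to old:

```java
import java.util.Properties;

/** Hypothetical config switch for topic migration steps 2-3 (endpoints are made up). */
public class MigrationConfigSketch {
    static Properties producerProps(boolean migrated) {
        Properties p = new Properties();
        // Step 2: once uReplicator copies new->old, producers can move first;
        // consumers still on the old cluster keep seeing all data via replication.
        p.put("bootstrap.servers", migrated ? "regional-kafka-new:9092" : "regional-kafka-old:9092");
        return p;
    }

    static Properties consumerProps(boolean migrated) {
        Properties p = new Properties();
        // Step 3: after consumers drain the old cluster, point them at the new one.
        // Step 4: with nothing left on the old path, the uReplicator route can be removed.
        p.put("bootstrap.servers", migrated ? "regional-kafka-new:9092" : "regional-kafka-old:9092");
        return p;
    }
}
```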


Replication - Use Cases
- Aggregation: global view, all-active
- Ingestion
- Migration: move topics between clusters/DCs

03 uReplicator

Motivation - MirrorMaker Pain Points
- Expensive rebalancing
- Difficulty adding topics
- Possible data loss
- Metadata sync issues

Requirements
- Stable replication
- Simple operations
- High throughput
- No data loss
- Auditing

Design - uReplicator: Controller
- An Apache Helix-based uReplicator controller provides stable replication: it assigns topic partitions to each worker process and handles topic/worker changes.
- Simple operations: it handles adding/deleting topics. A Helix participant sketch follows below.
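The deck names Apache Helix but shows no code. Here is a minimal sketch of the Helix participant pattern such a controller drives, assuming an OnlineOffline state model in which each Helix partition corresponds to one topic partition to replicate (everything beyond the Helix API itself is made up):

```java
import org.apache.helix.NotificationContext;
import org.apache.helix.model.Message;
import org.apache.helix.participant.statemachine.StateModel;
import org.apache.helix.participant.statemachine.StateModelInfo;
import org.apache.helix.participant.statemachine.Transition;

/** Worker-side state model: the controller moves replication tasks ONLINE/OFFLINE. */
@StateModelInfo(initialState = "OFFLINE", states = {"ONLINE", "OFFLINE"})
public class ReplicationStateModel extends StateModel {
    @Transition(from = "OFFLINE", to = "ONLINE")
    public void onBecomeOnlineFromOffline(Message msg, NotificationContext ctx) {
        // msg.getPartitionName() is e.g. "testTopic_0": start a fetcher for that partition.
        System.out.println("start replicating " + msg.getPartitionName());
    }

    @Transition(from = "ONLINE", to = "OFFLINE")
    public void onBecomeOfflineFromOnline(Message msg, NotificationContext ctx) {
        // Stop the fetcher after committing its last flushed offset.
        System.out.println("stop replicating " + msg.getPartitionName());
    }
}
```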

Design - uReplicator: Worker
- Apache Helix agent
- Dynamic simple consumer
- Apache Kafka producer
- Commit after flush

Worker internals: the Helix agent (coordinated through Apache ZooKeeper by the Helix controller) drives a FetcherManager; for each topic partition (e.g. topic: testTopic, partition: 0) a FetcherThread wraps a SimpleConsumer plus a leader-find thread and hands messages through a LinkedBlockingQueue to the Apache Kafka producer.
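As a minimal sketch of the "commit after flush" rule, using the modern consumer API for brevity where the original worker used SimpleConsumer (all names below are assumptions):

```java
import java.time.Duration;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class CommitAfterFlushSketch {
    /** One replication pass: forward a batch, flush, and only then commit source offsets. */
    static void replicateOnce(KafkaConsumer<byte[], byte[]> consumer,
                              KafkaProducer<byte[], byte[]> producer) {
        ConsumerRecords<byte[], byte[]> batch = consumer.poll(Duration.ofMillis(500));
        for (ConsumerRecord<byte[], byte[]> r : batch) {
            producer.send(new ProducerRecord<>(r.topic(), r.key(), r.value()));
        }
        producer.flush();       // block until every send above is acknowledged...
        consumer.commitSync();  // ...and only then commit: a crash may duplicate, but never lose, data
    }
}
```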

Requirements recap: stable replication, simple operations, high throughput, no data loss, auditing.

Performance Issues
- Catch-up time is too long (4-hour failover): full-speed phase takes 2-3 hours, and the long-tail phase takes 5-6 hours.

Problem 1: Full Speed Phase
- Destination brokers are bound by CPU.

Solution 1: Increase Batch Size
- Destination brokers are bound by CPU, so increase per-request throughput:
  - producer batch.size: 64KB -> 128KB
  - producer linger.ms: 100 -> 1000
- Effect:
  - Observed batch size: 10-22KB -> 50-90KB
  - Compression rate: 40-62% -> 27-35% (compressed size)

Solution 2: 1-1 Partition Mapping
- Round-robin: a topic with N partitions produces N^2 connections and DoS-like traffic on the destination.
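In stock producer terms those two knobs are batch.size and linger.ms; a minimal sketch of the change described above (the endpoint and compression codec are assumptions):

```java
import java.util.Properties;

public class BatchTuningSketch {
    static Properties tunedProducerProps() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "aggregate-kafka:9092"); // endpoint assumed
        props.put("batch.size", 131072); // 64KB -> 128KB: bigger batches, fewer requests per broker CPU
        props.put("linger.ms", 1000);    // 100 -> 1000: wait long enough for batches to actually fill
        props.put("compression.type", "gzip"); // larger batches also compress better (codec assumed)
        return props;
    }
}
```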

- Deterministic 1-1 mapping: a topic with N partitions needs only N connections, reducing contention (each source partition p0..p3 goes to the same destination partition, instead of every worker MM1-MM4 fanning out to every partition).

Full Speed Phase Throughput
- First hour: 27.2 MB/s (before) -> 53.6 MB/s (after) per aggregate broker.
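A minimal sketch of the deterministic mapping: when forwarding, pin each record to the same partition number it came from (ProducerRecord lets the caller fix the partition explicitly; the helper name is made up):

```java
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class OneToOneMappingSketch {
    /** Forward a record to the identical partition in the destination cluster. */
    static void forward(KafkaProducer<byte[], byte[]> producer,
                        ConsumerRecord<byte[], byte[]> rec) {
        // Fixing the partition keeps each worker talking to a bounded set of
        // destination leaders: N connections total instead of N^2 with round-robin.
        producer.send(new ProducerRecord<>(rec.topic(), rec.partition(), rec.key(), rec.value()));
    }
}
```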

Problem 2: Long Tail Phase
- During the full-speed phase all workers are busy; then some workers catch up and become idle while the others are still busy.

Solution 1: Dynamic Workload Balance
- The original partition assignment went by number of partitions, which can land heavy partitions on the same worker.
- Workload-based assignment: place each partition by its total workload when added, using the source cluster's bytes-in rate retrieved from Chaperone.
- Rebalance dynamically and periodically whenever a worker exceeds 1.5x the average workload (sketch below).
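As a sketch of workload-based assignment: greedily place partitions on the least-loaded worker, and flag workers above 1.5x the average load for the periodic rebalance. Only the 1.5x figure is from the deck; the data structures and names are assumptions.

```java
import java.util.*;

public class WorkloadBalanceSketch {
    /** Assign partitions (keyed by bytes-in rate, e.g. from an auditor like Chaperone) to workers. */
    static Map<String, List<String>> assign(Map<String, Double> partitionBytesInRate, int workers) {
        // Min-heap of (currentLoad, workerId): add the next-heaviest partition to the lightest worker.
        PriorityQueue<double[]> heap = new PriorityQueue<>(Comparator.comparingDouble((double[] w) -> w[0]));
        for (int i = 0; i < workers; i++) heap.add(new double[]{0, i});

        Map<String, List<String>> assignment = new HashMap<>();
        partitionBytesInRate.entrySet().stream()
            .sorted(Map.Entry.<String, Double>comparingByValue().reversed())
            .forEach(e -> {
                double[] lightest = heap.poll();
                assignment.computeIfAbsent("worker-" + (int) lightest[1], k -> new ArrayList<>())
                          .add(e.getKey());
                lightest[0] += e.getValue();
                heap.add(lightest);
            });
        return assignment;
    }

    /** Periodic check: a worker this far above average triggers a rebalance. */
    static boolean needsRebalance(double workerLoad, double averageLoad) {
        return workerLoad > 1.5 * averageLoad; // threshold from the deck
    }
}
```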

Solution 2: Lag-Feedback Rebalance
- Monitor topic lags and balance the workers based on lag: a larger lag means a heavier workload.
- Dedicate workers to lagging topics.

Dynamic Workload Rebalance
- Periodically adjust workload, every 10 minutes.
- Multiple lagging topics on the same worker spread out to multiple workers (sketch below).

Requirements recap: stable replication, simple operations, high throughput, no data loss, auditing.
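As a sketch of the lag-feedback rule: each cycle (10 minutes in the deck), treat lag as the workload signal and give each of the worst-lagging topics its own worker so they never pile up together. The lag threshold and selection rule here are assumptions.

```java
import java.util.*;
import java.util.stream.Collectors;

public class LagFeedbackSketch {
    /** Pin the worst-lagging topics, one each, onto available workers. */
    static Map<String, String> dedicateWorkers(Map<String, Long> topicLag,
                                               List<String> idleWorkers,
                                               long lagThreshold) {
        Map<String, String> pinned = new LinkedHashMap<>();
        Iterator<String> workers = idleWorkers.iterator();
        // Worst lag first, so no two lagging topics share a worker.
        for (String topic : topicLag.entrySet().stream()
                .filter(e -> e.getValue() > lagThreshold)
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                .map(Map.Entry::getKey).collect(Collectors.toList())) {
            if (!workers.hasNext()) break;
            pinned.put(topic, workers.next());
        }
        return pinned;
    }
}
```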

More Issues
- Scalability: as the number of partitions grows, a full rebalance during a rolling restart can take up to 1 hour.
- Operations: the number of deployments grows; bringing up a new cluster needs sanity checks to avoid replication loops and double routes.

Design - Federated uReplicator
- Scalability: multiple routes.
- Operations: a deployment group with one manager.

Design - Federated uReplicator: Manager
- Decides when to create a new route and how many workers go in each route.
- Auto-scaling is driven by the total workload of a route, the total lag in the route, and the total vs. expected workload on each worker; workers are added to a route automatically (sketch below).
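As a sketch of the manager's worker-count decision for one route: the deck lists only the signals (total route workload, total route lag, workload per worker), so the scaling rule and all names below are assumptions.

```java
public class RouteAutoScalerSketch {
    /** How many workers a route should have, given its workload and lag. */
    static int desiredWorkers(double totalWorkloadMBps,
                              double expectedWorkloadPerWorkerMBps,
                              long totalLag, long lagPerExtraWorker) {
        // Base the route size on steady-state workload...
        int base = (int) Math.ceil(totalWorkloadMBps / expectedWorkloadPerWorkerMBps);
        // ...and add catch-up workers while the route is lagging.
        int catchUp = (int) (totalLag / Math.max(1, lagPerExtraWorker));
        return Math.max(1, base + catchUp);
    }
}
```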

Design - Federated uReplicator: Front
- Whitelists/blacklists topics as (topic, src, dst) tuples, persisted in a DB.
- Runs sanity checks and assigns each route to a uReplicator Manager.

Requirements recap: stable replication, simple operations, high throughput, no data loss, auditing.

04 Data loss detection

Detect Data Loss
- Checks data as it flows through each tier of the pipeline.
- Keeps a timestamp-to-offset index to support timestamp-based queries.
- Also checks latency from the consumer's point of view.
- Also provides the workload for each topic (the bytes-in rates used above for workload-based assignment).

05 Q&A
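The deck does not name the auditor on this slide (the workload numbers earlier come from Chaperone). As a minimal sketch of tier-by-tier loss detection, count messages per topic per time bucket at each tier and compare adjacent tiers; the bucketing and all names below are assumptions.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

/** Sketch: per-tier message counts bucketed by event time; equal counts across tiers => no loss. */
public class AuditSketch {
    // key: tier|topic|minuteBucket -> message count
    private final Map<String, LongAdder> counts = new ConcurrentHashMap<>();

    public void record(String tier, String topic, long eventTimeMs) {
        long bucket = eventTimeMs / 60_000; // 1-minute buckets (bucket size assumed)
        counts.computeIfAbsent(tier + "|" + topic + "|" + bucket, k -> new LongAdder()).increment();
    }

    /** Messages that entered one tier but never reached the next within a bucket. */
    public long missing(String fromTier, String toTier, String topic, long bucket) {
        long in = counts.getOrDefault(fromTier + "|" + topic + "|" + bucket, new LongAdder()).sum();
        long out = counts.getOrDefault(toTier + "|" + topic + "|" + bucket, new LongAdder()).sum();
        return Math.max(0, in - out);
    }
}
```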
