How Uber Builds a Cross-Data-Center Replication Platform on Apache Kafka

Agenda
- 01 Apache Kafka at Uber
- 02 Apache Kafka pipeline & replication
- 03 uReplicator
- 04 Data loss detection
- 05 Q&A

01 Apache Kafka at Uber

Real-Time Dynamic Pricing
- Stream processing over Apache Kafka feeds the dynamic pricing app with app views and vehicle information
- Uber Eats computes real-time ETAs on the same infrastructure

A bunch more
- Fraud detection
- Driver & rider sign-ups, etc.

Apache Kafka - Use Cases
- General pub-sub, messaging queue
- Stream processing: AthenaX, the self-service streaming analytics platform (Apache Samza & Apache Flink)
- Database changelog transport: Cassandra, MySQL, etc.
- Ingestion: HDFS, S3
- Logging

Data Infrastructure at Uber
- Producers: rider app, driver app, API/services, (internal) services, databases (Cassandra, MySQL), payment, etc.
- Kafka sits between producers and consumers
- Consumers: real-time analytics, alerts, and dashboards (Samza/Flink applications, e.g. surge); data science, analytics, and reporting (Vertica/Hive, Hadoop); ad-hoc exploration and debugging (ELK); mobile app; AWS S3

Scale (excluding replication)
- Trillions of messages/day
- PBs of data
- Tens of thousands of topics

02 Apache Kafka pipeline & replication

Apache Kafka Pipeline at Uber
- In each data center (DC1, DC2), applications produce through a proxy client to a Kafka REST proxy, which writes to the regional Kafka cluster (plus a secondary Apache Kafka)
- uReplicator copies the regional clusters into aggregate Kafka clusters, with an offset sync service spanning the data centers
Aggregation
- In each data center, uReplicator replicates the regional Kafka clusters of all data centers into the local aggregate Kafka cluster
- Each aggregate cluster therefore holds the global view

Cross-Data Center Failover
- During runtime
  - uReplicator reports offset mappings to the offset sync service
  - The offset sync service is all-active, and the offset info is replicated across data centers
- During failover
  - Consumers ask the offset sync service for the offsets at which to resume consumption, based on their last committed offsets
  - The offset sync service translates offsets between the aggregate clusters, as sketched below
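A minimal sketch of this offset translation idea; all names here are hypothetical (the real offset sync service is a separate, all-active system). uReplicator reports source-to-destination offset checkpoints per route and topic partition; a failover lookup rounds a committed offset down to the nearest checkpoint, so a consumer may re-read a few messages (at-least-once) but never skips any.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;

public class OffsetSyncSketch {
    // route/topic-partition -> sorted (source offset -> destination offset) checkpoints
    private final Map<String, TreeMap<Long, Long>> checkpoints = new ConcurrentHashMap<>();

    /** During runtime: uReplicator reports that source offset src maps to destination offset dst. */
    public void report(String route, String topic, int partition, long src, long dst) {
        checkpoints.computeIfAbsent(route + "/" + topic + "-" + partition,
                k -> new TreeMap<>()).put(src, dst);
    }

    /**
     * During failover: translate a consumer's last committed offset in one
     * aggregate cluster into a safe resume offset in the other. Rounding down
     * to the nearest checkpoint may re-deliver messages, never drop them.
     */
    public long translate(String route, String topic, int partition, long committed) {
        TreeMap<Long, Long> map = checkpoints.get(route + "/" + topic + "-" + partition);
        if (map == null || map.floorEntry(committed) == null) {
            return 0L; // no mapping known: resume from the beginning of the destination
        }
        return map.floorEntry(committed).getValue();
    }
}
```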
Ingestion
- In each data center, the aggregate Kafka cluster is ingested into HDFS and S3

Topic Migration (the deck shows this slide step by step; consolidated here)
- Goal: move a topic between regional Kafka clusters without disrupting its producer and consumer
- Step 1: set up uReplicator from the new cluster to the old cluster
- Step 2: move the producer to the new cluster; the consumer keeps reading the old cluster, which uReplicator keeps complete
- Step 3: move the consumer to the new cluster
- Step 4: remove uReplicator

Replication - Use Cases
- Aggregation: global view, all-active
- Ingestion
- Migration: move topics between clusters/DCs

03 uReplicator

Motivation - MirrorMaker
- Pain points
  - Expensive rebalancing
  - Difficulty adding topics
  - Possible data loss
  - Metadata sync issues
- Requirements
  - Stable replication
  - Simple operations
  - High throughput
  - No data loss
  - Auditing

Design - uReplicator: controller
- The uReplicator controller is built on Apache Helix
- Stable replication: assigns topic partitions to each worker process and handles topic/worker changes
- Simple operations: handles adding/deleting topics (a sketch of the assignment idea follows)
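An illustrative sketch of the assignment idea, assuming the controller simply places each topic partition on the least-loaded worker; in the real design the resulting map is handed to Apache Helix, which notifies workers, and it is recomputed when topics are added or deleted and when workers join or leave. Names here are not the uReplicator API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

public class ControllerAssignmentSketch {
    /** Assign each topic partition to the worker that currently owns the fewest. */
    public static Map<String, List<String>> assign(List<String> topicPartitions,
                                                   List<String> workers) {
        Map<String, List<String>> assignment = new HashMap<>();
        workers.forEach(w -> assignment.put(w, new ArrayList<>()));
        PriorityQueue<String> byLoad = new PriorityQueue<>(
                Comparator.comparingInt((String w) -> assignment.get(w).size()));
        byLoad.addAll(workers);
        for (String tp : topicPartitions) {
            String leastLoaded = byLoad.poll(); // worker with the fewest partitions
            assignment.get(leastLoaded).add(tp);
            byLoad.offer(leastLoaded);          // reinsert with its new load
        }
        return assignment;
    }
}
```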
Design - uReplicator: worker
- Apache Helix agent
- Dynamic SimpleConsumer
- Apache Kafka producer
- Commits offsets only after flush (sketched below)
- Internals: the Apache Helix agent (coordinated by the Apache Helix controller through Apache ZooKeeper) tells the FetcherManager which topic partitions to own (e.g. topic: testTopic, partition: 0); a FetcherThread with a SimpleConsumer and a leader-find thread fetches messages into a LinkedBlockingQueue, which the worker thread drains into the Apache Kafka producer
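The commit-after-flush rule is what rules out data loss in the worker. A minimal sketch with the modern Kafka clients (the actual worker drives a Helix-managed dynamic SimpleConsumer, not a KafkaConsumer):

```java
import java.time.Duration;
import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

/**
 * Commit-after-flush: a crash between flush and commit re-replicates some
 * messages (at-least-once) but never drops any.
 */
public class CommitAfterFlushSketch {
    public static void replicateOnce(Consumer<byte[], byte[]> source,
                                     Producer<byte[], byte[]> destination) {
        ConsumerRecords<byte[], byte[]> records = source.poll(Duration.ofMillis(500));
        for (ConsumerRecord<byte[], byte[]> r : records) {
            // Keep the source partition (see the 1-1 partition mapping later on).
            destination.send(new ProducerRecord<>(r.topic(), r.partition(), r.key(), r.value()));
        }
        destination.flush();  // block until every send above is acknowledged
        source.commitSync();  // only now advance the source offsets
    }
}
```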
Performance Issues
- Catch-up time is too long: after a 4-hour failover, the backlog takes hours to drain
  - Full-speed phase: 2-3 hours
  - Long-tail phase: 5-6 hours

Problem 1: Full-Speed Phase
- Destination brokers are bound by CPU
Solution 1: Increase Batch Size
- Destination brokers are CPU-bound, so raise the throughput per request
  - producer batch.size: 64 KB -> 128 KB
  - producer linger.ms: 100 -> 1000
- Effect
  - Actual batch size: 10-22 KB -> 50-90 KB
  - Compression rate: compressed size drops from 40-62% to 27-35% of the original
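The same change expressed as standard Kafka producer settings (the slide's producer.batch.size and producer.linger.ms presumably map to these configs):

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.ProducerConfig;

/** The batching change from this slide as Kafka producer settings. */
public class BatchTuningSketch {
    public static Properties producerProps() {
        Properties props = new Properties();
        // Bigger batches amortize the per-request CPU cost on destination brokers.
        props.put(ProducerConfig.BATCH_SIZE_CONFIG, 128 * 1024); // 64 KB -> 128 KB
        // Wait up to a second for batches to fill before sending.
        props.put(ProducerConfig.LINGER_MS_CONFIG, 1000);        // 100 ms -> 1000 ms
        return props;
    }
}
```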
Solution 2: 1-1 Partition Mapping
- Round robin: a topic with N partitions fans out over N^2 connections and produces DoS-like traffic
- Deterministic partitioning: a topic with N partitions needs only N connections, reducing contention (sketched below)
- (Diagram: with round robin, each of the workers MM 1-4 writes to all destination partitions p0-p3; with 1-1 mapping, each source partition goes only to its matching destination partition)
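A minimal sketch of the 1-1 mapping: instead of letting the producer's partitioner spread messages, the worker pins each record to its source partition number (this assumes source and destination topics have the same partition count):

```java
import org.apache.kafka.clients.producer.ProducerRecord;

/** 1-1 partition mapping: source partition i always lands in destination partition i. */
public class PartitionMappingSketch {
    public static ProducerRecord<byte[], byte[]> map(String topic, int sourcePartition,
                                                     byte[] key, byte[] value) {
        // Pinning the partition overrides the producer's round-robin/sticky
        // partitioner, so each worker keeps N connections instead of ~N^2.
        return new ProducerRecord<>(topic, sourcePartition, key, value);
    }
}
```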
Full-Speed Phase Throughput
- First hour: 27.2 MB/s -> 53.6 MB/s per aggregate broker (before -> after)

Problem 2: Long-Tail Phase
- During the full-speed phase, all workers are busy
- Some workers catch up and become idle, but the others are still busy

Solution 1: Dynamic Workload Balance
- The original partition assignment counted only the number of partitions, so heavy partitions could land on the same worker
- Workload-based assignment: place each partition by its total workload when added, using the source cluster bytes-in rate retrieved from Chaperone3
- Rebalance dynamically and periodically whenever a worker exceeds 1.5 times the average workload
Solution 2: Lag-Feedback Rebalance
- Monitor topic lags and balance the workers based on them: larger lag = heavier workload
- Dedicate workers to lagging topics
Dynamic Workload Rebalance
- Adjust the workload periodically, every 10 minutes
- Multiple lagging topics on the same worker spread out to multiple workers (both rebalancing ideas are sketched below)
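A combined sketch of both ideas, under illustrative assumptions: the per-partition workload is the source bytes-in rate inflated by replication lag (the exact blend is an assumption), and a periodic pass moves partitions off any worker above 1.5x the average load. None of this is the actual uReplicator code.

```java
import java.util.HashMap;
import java.util.Map;

public class RebalanceSketch {

    /** Source bytes-in rate (e.g. from Chaperone3), inflated when the partition lags. */
    static double effectiveWorkload(double bytesInRate, long lag) {
        return bytesInRate * (1.0 + Math.log1p(lag)); // larger lag = heavier workload
    }

    /**
     * Periodic pass (every 10 minutes in the deck): while a worker sits above
     * 1.5x the average load, move its heaviest partition to the least-loaded
     * worker, so lagging topics spread out instead of piling up on one worker.
     * assignment: worker -> (topic-partition -> effective workload)
     */
    static void rebalance(Map<String, Map<String, Double>> assignment) {
        Map<String, Double> load = new HashMap<>();
        assignment.forEach((w, parts) ->
                load.put(w, parts.values().stream().mapToDouble(Double::doubleValue).sum()));
        double avg = load.values().stream().mapToDouble(Double::doubleValue).average().orElse(0);
        for (String w : assignment.keySet()) {
            while (load.get(w) > 1.5 * avg && assignment.get(w).size() > 1) {
                Map.Entry<String, Double> heaviest = assignment.get(w).entrySet().stream()
                        .max(Map.Entry.comparingByValue()).orElseThrow();
                String target = load.entrySet().stream()
                        .min(Map.Entry.comparingByValue()).orElseThrow().getKey();
                assignment.get(w).remove(heaviest.getKey());
                assignment.get(target).put(heaviest.getKey(), heaviest.getValue());
                load.merge(w, -heaviest.getValue(), Double::sum);
                load.merge(target, heaviest.getValue(), Double::sum);
            }
        }
    }
}
```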
More Issues
- Scalability: as the number of partitions grows, a full rebalance during a rolling restart takes up to 1 hour
- Operation: the number of deployments keeps growing; each new cluster means another sanity-check loop and a double route

Design - Federated uReplicator
- Scalability: split replication across multiple routes
- Operation: manage a whole deployment group with one manager
Design - Federated uReplicator: uReplicator Manager
- Decides when to create a new route and how many workers each route gets
- Auto-scaling signals: total workload of the route, total lag in the route, and total workload versus the expected workload per worker
- Automatically adds workers to a route (a small sketch follows)
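A small sketch of such an auto-scaling rule; the sizing formula, the grow-while-lagging policy, and the cap are assumptions, not uReplicator's actual algorithm:

```java
public class RouteScalingSketch {

    /** Workers needed so that no worker exceeds its expected workload. */
    static int workersForRoute(double totalRouteWorkload, double expectedWorkloadPerWorker) {
        return (int) Math.ceil(totalRouteWorkload / expectedWorkloadPerWorker);
    }

    /** While the route as a whole is lagging, keep adding workers, up to a cap. */
    static int adjustForLag(int currentWorkers, long totalRouteLag, int maxWorkers) {
        return (totalRouteLag > 0 && currentWorkers < maxWorkers)
                ? currentWorkers + 1   // automatically add a worker to the route
                : currentWorkers;
    }
}
```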
Design - Federated uReplicator: uReplicator Front
- Whitelists/blacklists topics as (topic, src, dst) tuples and persists them in a DB
- Runs sanity checks
- Assigns each request to a uReplicator Manager

04 Data loss detection

Detect data loss
- Checks data as it flows through each tier of the pipeline
- Keeps a ts-offset index to support ts-based queries
- Also checks latency from the consumer's point of view
- Also provides the workload for each topic (cf. the bytes-in rate from Chaperone3 above); a counting sketch follows
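A minimal sketch of the tier-by-tier counting idea behind this kind of auditing; the bucket size, key shape, and class names are assumptions. Each tier counts messages per topic and event-time bucket, and a downstream shortfall against the upstream tier for a closed bucket indicates loss in between; bucketing by event timestamp is also what makes a ts -> offset index and ts-based queries possible.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

public class AuditSketch {
    private final Map<String, LongAdder> counts = new ConcurrentHashMap<>();

    /** Record one message seen at a tier, bucketed into 10-minute windows by event time. */
    public void record(String tier, String topic, long eventTsMillis) {
        long bucket = eventTsMillis / 600_000; // bucket size is an illustrative choice
        counts.computeIfAbsent(tier + "/" + topic + "/" + bucket, k -> new LongAdder())
              .increment();
    }

    /** For a closed bucket, a downstream shortfall versus upstream means data was lost. */
    public long missing(String upstreamTier, String downstreamTier, String topic, long bucket) {
        return Math.max(0, count(upstreamTier, topic, bucket) - count(downstreamTier, topic, bucket));
    }

    private long count(String tier, String topic, long bucket) {
        LongAdder a = counts.get(tier + "/" + topic + "/" + bucket);
        return a == null ? 0 : a.sum();
    }
}
```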
05 Q&A