The Evolution of the Uber Eats Architecture

Agenda
1. Business Overview & Challenges
2. Architecture Overview & Evolution
3. Leveraging Ridesharing Platforms
4. Tackling i18n Challenges
5. Q&A

Our Scale
- $6B gross bookings
- 350 cities

Then / Now

How does Uber Eats work today?

On-demand Uber Eats

Challenges
- Marketplace complexity vs resource constraint
- Internationalization (i18n)
- Operation (reliability)
- Performance (app, network)
- Extensibility (dev)

Background
- Uber: monolith (from 2009) -> lots of microservices; Py/JS -> Golang/Java; MySQL -> Cassandra
- Uber Eats (2015): microservices* + Golang + Cassandra
  *at the onset

Pain points
- 0 -> 1 -> N cities
- Microservices (70+)
- Long e2e chain
- Messy dep graph*
- Hairy migrations*
- Any service can bring down the biz*

Identify core flows
- Revisit product phases -> core flows -> Tier 1 services -> extra rigor for T1
- Tech convergence* -> fewer services

Simplified architecture (flows)

Batching: before
- Greedy matching
- 1 order, 1 delivery
- “Nearest” wins

Batching: after
- Clustering
- Multiple orders per delivery
- Efficiency: win-win
- Constraints: eater ETA, route overlap
- System: scan local/global

Case study: disaster recovery
- Active-active (2 DC)
- 3 levels of mitigation: DNS (L1), data center (L2), service (L3)
- Tiered operation power: DNS: SRE; DC: Ring0; service: owners
- Recent example

Case study: storage
- MySQL -> C*
- Gocql can be too much
- 2 different kinds of entities: state machine vs SOT (source of truth)
- Write-optimal: K-V + dual-write, for state machine entities, e.g. order/cart
- Read-optimal: K-V + Redis, for SOT entities, e.g. menu/store

Many more examples
- Machine Learning Platform (eng blog)
- Experimentation Platform (eng blog)
- Forecasting Platform (eng blog)
- Dynamic Configuration Platform
- Translation Platform
- Deployment Platform
- ...

Challenge #1: Operation at global scale
- Things go wrong all the time: nature (weather), ops (promo eyeball fanout), eng (dev)
- Can lead to cascading failure
- Reliability is key to customer trust

Solution: Graceful degradation
- Circuit breaking: client rejects outgoing requests that are highly likely to fail
- Load shedder: server rejects incoming requests when exceeding X delay
- City & user rate limiting: city counter via centralized city routing in RTAPI

Solution: External probing
- Simulate core flow globally 24x7
- Alert when M concurrent failures in N minutes
- Highly effective (time, SNR)
-> Auto rollbacks (deploy/config), or manual intervention

Solution: Instant root causing
- Integrated w/ monitoring
- UI w/ problematic stack & error message
- Via tracing injection throughout the stack
-> Fast mitigation

Challenge #2: Performance around the globe
- Slow & unreliable networks (512 Kbps = broadband in India)
- App assumes developed markets: polling for updates, parallel net calls, large payload
-> Subpar experience

Solution: Push Framework

Solution: Many more
- Pagination (fewer stores)
- Lazy loading
- Web Eats (UberLite)
- Cash
- ...
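The circuit-breaking bullet under "Solution: Graceful degradation" (the client rejects outgoing requests that are highly likely to fail) can be illustrated with a minimal sketch. This is not Uber's implementation: the class name, the consecutive-failure threshold, the reset timeout, and the injectable clock are all assumptions made for the example.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without sending it."""


class CircuitBreaker:
    """Hypothetical client-side breaker; thresholds are illustrative only."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # consecutive failures before opening
        self.reset_timeout = reset_timeout          # seconds before a trial call is allowed
        self.clock = clock                          # injectable so tests can control time
        self.failures = 0
        self.opened_at = None                       # None means the circuit is closed

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                # Fail fast: the request is never sent downstream.
                raise CircuitOpenError("downstream marked unhealthy")
            # Timeout elapsed: fall through and let one trial call through.
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open (or re-open) the circuit
            raise
        # Success closes the circuit and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

A caller would wrap each outgoing request in `breaker.call(...)` and treat `CircuitOpenError` as a signal to serve a degraded response instead of waiting on a dependency that is already failing.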
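The load-shedder bullet (the server rejects incoming requests when exceeding X delay) can be sketched in the same spirit. The slides do not say how delay is measured; this example assumes it is the time the oldest queued request has been waiting, and `LoadShedder`, `max_delay_s`, and the injectable clock are hypothetical names.

```python
import time
from collections import deque


class LoadShedder:
    """Hypothetical server-side shedder keyed on queueing delay."""

    def __init__(self, max_delay_s=0.5, clock=time.monotonic):
        self.max_delay_s = max_delay_s  # the "X delay" from the slide (assumed unit: seconds)
        self.clock = clock
        self.queue = deque()            # entries are (enqueue_time, request)

    def submit(self, request):
        """Return True if accepted, False if shed."""
        if self.queue and self.clock() - self.queue[0][0] > self.max_delay_s:
            # The oldest request has already waited too long, so the server
            # is behind; reject new work instead of making the backlog worse.
            return False
        self.queue.append((self.clock(), request))
        return True

    def next_request(self):
        """Worker side: pop the oldest request, or None when idle."""
        if not self.queue:
            return None
        _, request = self.queue.popleft()
        return request
```

Rejecting at the door keeps latency bounded for the requests that are accepted, which is the property that stops one overloaded service from dragging down its callers.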
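For the city & user rate limiting bullet, the slides mention only a city counter kept via centralized city routing in RTAPI. The sketch below is an assumption-heavy illustration of the counting idea, using a fixed time window and in-process counters; the class name, the limits, and the window size are hypothetical, and a real deployment would keep the counters wherever the routing layer lives.

```python
import time
from collections import defaultdict


class CityUserRateLimiter:
    """Hypothetical fixed-window limiter with one counter per city and per user."""

    def __init__(self, city_limit, user_limit, window_s=1.0, clock=time.monotonic):
        self.city_limit = city_limit
        self.user_limit = user_limit
        self.window_s = window_s
        self.clock = clock
        self.window = None                    # index of the current time window
        self.city_counts = defaultdict(int)
        self.user_counts = defaultdict(int)

    def allow(self, city, user):
        window = int(self.clock() // self.window_s)
        if window != self.window:
            # A new window started: reset all counters.
            self.window = window
            self.city_counts.clear()
            self.user_counts.clear()
        if self.city_counts[city] >= self.city_limit:
            return False                      # the whole city is over budget
        if self.user_counts[user] >= self.user_limit:
            return False                      # this user is over budget
        self.city_counts[city] += 1
        self.user_counts[user] += 1
        return True
```

Keeping a per-city budget alongside the per-user one means a traffic spike in one city (for example, a promotion) is throttled locally rather than consuming capacity shared with every other city.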
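The external-probing rule (alert when M concurrent failures in N minutes) can be sketched as a sliding-window counter. The slides do not define "concurrent"; this example assumes it means at least M failed probes inside the most recent N-minute window, and `ProbeAlerter` and its parameters are hypothetical names.

```python
from collections import deque


class ProbeAlerter:
    """Hypothetical alert rule: fire when >= M probe failures fall within N minutes."""

    def __init__(self, m_failures, n_minutes):
        self.m_failures = m_failures
        self.window_s = n_minutes * 60
        self.failure_times = deque()  # timestamps (seconds) of recent failed probes

    def record(self, timestamp_s, ok):
        """Record one probe result; return True when the alert should fire."""
        if not ok:
            self.failure_times.append(timestamp_s)
        # Drop failures that have aged out of the N-minute window.
        while self.failure_times and timestamp_s - self.failure_times[0] >= self.window_s:
            self.failure_times.popleft()
        return len(self.failure_times) >= self.m_failures
```

Requiring several failures inside a window, rather than alerting on each one, is one way to get the high signal-to-noise ratio the slide mentions; the returned flag is what would trigger an auto rollback or page a human.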