Fangming Liu, 2021 China Network Open Source Technology Ecosystem Summit (slide deck, 40 pages)
Open Source Practice from Intra-Cloud to Inter-Cloud: Open Source Project Practice in the Mulan Community
Fangming Liu, Huazhong University of Science and Technology
https://fangmingliu.github.io/
Mulan-hosted project homepage: https:/

Open source practice from Intra-Cloud to Inter-Cloud
Carried out for the Ministry of Science and Technology's "Cloud Computing and Big Data" key special program, under the National Key Research & Development Plan project "Key Technologies and Equipment for High-Efficiency Cloud Computing Data Centers".
Intra-cloud (resource scheduling within a data center)
- PostMan: rapidly mitigate load imbalance for services processing small requests
- DHL: an FPGA-CPU co-design framework for accelerating software network functions with Intel DPDK
Inter-cloud (resource scheduling across multiple data centers in different domains)
- Tricircle: provide networking automation across Neutron in multi-region OpenStack clouds
[Figure: Tricircle (cascaded multi-region datacenters). Regions 1 to N each have a Tricircle (Local) instance and a datacenter (Datacenter 1 to Datacenter N) running OpenStack, with PostMan, NFV, and DHL placed across the cloud stack, cloud resource (CPU+FPGA), and infrastructure layers.]

Outline
Intra-cloud
- PostMan: an alternative approach to rapidly mitigate load imbalance for services processing small requests
- DHL: a high performance FPGA-CPU co-design framework for accelerating software NFs (Network Functions) with Intel DPDK

Bursty traffic is arriving! Peak shopping season is going global
- Sales on Cyber Monday hit a new record of $6.6 billion in 2017 [1]
- Black Friday racked up $5.03 billion in online sales in 2017 [2]
- The 24-hour sale on Nov. 11 reached $25 billion in sales in 2017 [3]
- Conversion rate in 24 h [4]: 42% (Black Friday), 35% (Cyber Monday)
- Statistics on Double 11 [5]: 325,000 orders/s at peak, 256,000 transactions/s at peak

Bursty traffic is a headache!
- Large volume
- Short duration
- Small packets
- Severe overhead
[Figure: packet processing throughput (Gbps) of 10 Gb Linux and 10 Gb IX servers at 64-byte and 64 KB packet sizes.]
[Figure: payload size breakdown. The bucket labels are garbled in the source; the surviving values are 31 bytes and 41 bytes.]

Traditional remedy: migrating hot data for load balancing
- Time-consuming
- Exacerbates overload
- High packet processing overhead

PostMan: batching and offloading on demand
- Helper nodes batch small packets into large ones.
- PostMan offloads packet overhead from the overloaded server to helpers.
- No data migration
- Rapid mitigation
[Figure: clients, PostMan helper nodes, and Memcached servers; one server has normal load and one is experiencing bursty traffic.]

How to assemble small packets?
Each small packet carries a PostMan header (the figure draws it against offsets 0, 1, 3, and 6) with these fields:
- Type: Request (a packet sent by a client), Reply (a packet sent by a server), or Connect (a command to create a connection)
- Length: length of the payload
- Src IP & Src port: the source IP (bits 0 to 31) is hashed into a 16-bit source IP field (bits 0 to 15) in the PostMan header
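The header fields above can be illustrated with a small packing routine. This is a minimal sketch, not PostMan's actual wire format: the field widths (1-byte type, 2-byte length, 2-byte hashed source IP, 2-byte source port), the type codes, and the XOR-fold hash are all assumptions made for illustration, as are the function names.

```python
import struct
from ipaddress import IPv4Address

# Assumed layout: type (1 B), payload length (2 B), hashed source IP (2 B),
# source port (2 B). The slides name the fields; the widths are hypothetical.
HEADER = struct.Struct("!BHHH")
REQUEST, REPLY, CONNECT = 0, 1, 2  # illustrative type codes


def hash_ip16(ip: str) -> int:
    """Fold a 32-bit IPv4 address into 16 bits (illustrative XOR fold)."""
    value = int(IPv4Address(ip))
    return (value >> 16) ^ (value & 0xFFFF)


def assemble(packets):
    """Concatenate (type, src_ip, src_port, payload) tuples into one buffer."""
    out = bytearray()
    for ptype, ip, port, payload in packets:
        out += HEADER.pack(ptype, len(payload), hash_ip16(ip), port)
        out += payload
    return bytes(out)


def disassemble(buf: bytes):
    """Split an assembled buffer into (type, ip_hash, port, payload) tuples."""
    items, offset = [], 0
    while offset < len(buf):
        ptype, length, ip_hash, port = HEADER.unpack_from(buf, offset)
        offset += HEADER.size
        items.append((ptype, ip_hash, port, buf[offset:offset + length]))
        offset += length
    return items


big = assemble([
    (REQUEST, "10.0.0.1", 5000, b"get k1"),
    (REQUEST, "10.0.0.2", 5001, b"get k22"),
])
small = disassemble(big)
```

The Length field is what lets the receiver walk the large packet and recover each small payload without any delimiter.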
Is batching in helpers efficient?
- DPDK and mTCP based stack: efficient packet processing
- Remove duplicated headers: alleviates packet processing overhead
[Figure: payload of an assembled packet, with PostMan headers.]

Everything works fine, except...
- When helper nodes fail, there is not enough information for re-transmission.

Stateless failover mechanism
- After a timeout, a reconnect message reports the last received packet (in the example, "Last received: 3").
- Connections migrate freely.
- Stateless
- No scalability bottleneck
[Figure: sent, pending, and received packet sequences before and after failover.]

How to enable helpers?
- Helpers are enabled when latency is higher than the SLA.
- Enabling brings minimal overhead to the server side.

Load balancing across helpers
- Choose the helper with the lowest utilization (example: CPU 50% vs. CPU 70%).
- A helper can itself become overloaded (example: CPU 90% vs. CPU 60%).
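The enable and load-balancing rules above can be sketched as follows. This is an illustrative sketch under assumptions: the `Helper` type, the function names, and the 85% overload threshold are hypothetical, and the slides do not say what PostMan does once a helper is overloaded, so moving to the least-utilized helper is an assumed policy.

```python
from dataclasses import dataclass

OVERLOAD_CPU = 0.85  # hypothetical overload threshold, not from the slides


@dataclass
class Helper:
    name: str
    cpu: float  # CPU utilization in [0, 1]


def should_enable_helpers(latency_ms: float, sla_ms: float) -> bool:
    """Enable helpers once observed latency exceeds the SLA."""
    return latency_ms > sla_ms


def pick_helper(helpers):
    """Choose the helper with the lowest utilization."""
    return min(helpers, key=lambda h: h.cpu)


def rebalance(current, helpers):
    """Assumed policy: leave an overloaded helper for the least-utilized one."""
    if current.cpu < OVERLOAD_CPU:
        return current
    return pick_helper(helpers)


# The two examples from the slides: 50% vs. 70%, then 90% vs. 60%.
first = pick_helper([Helper("h1", 0.50), Helper("h2", 0.70)])
busy, idle = Helper("h1", 0.90), Helper("h2", 0.60)
moved = rebalance(busy, [busy, idle])
```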
How to disable helpers?
- Helpers are disabled when throughput is low.
- Makes the best decisions, and the overhead is acceptable when load is low.

Positioning of PostMan
- PostMan is an alternative solution to data migration for bursty traffic.
- Data migration is the ultimate solution to mitigate bursty traffic.
- PostMan is complementary to data migration.

How to program with PostMan library?
Components: servers library, helper servers, clients library.
- pm_connect: chooses a helper and connects to it; sends a special "connect" packet to the helper node.
- decompose: identifies the "connect" packet and notifies the application that a new client is trying to connect; disassembles the packet into small packets.
- compose: buffers multiple replies and assembles them.
- get_info: allows the application to retrieve connection information, such as the number of sent and received packets.

[Figure: mitigating bursty traffic in Memcached. Mitigation time: 550 ms vs. 13 s.]
[Figure: latency under different loads for Paxos and Paxos+PostMan. Throughput: 2.8.]
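The roles of the four library calls named above (pm_connect, decompose, compose, get_info) can be illustrated with an in-memory toy. Only the four names come from the slides; the signatures, the tuple packet format, the batch size, and the class below are assumptions for illustration and do not reflect the real library's API.

```python
from collections import defaultdict


def pm_connect(client, helper_load):
    """Toy client side: pick a helper and build a "connect" packet for it."""
    helper = min(helper_load, key=helper_load.get)
    return helper, ("connect", client, b"")


class ToyServerLibrary:
    """Toy server side: hypothetical stand-in for decompose/compose/get_info."""

    def __init__(self, batch_size=2):
        self.batch_size = batch_size
        self.pending = []
        self.stats = defaultdict(lambda: {"sent": 0, "received": 0})

    def decompose(self, big_packet):
        """Split a batch into small requests and report new connections."""
        new_clients, requests = [], []
        for kind, client, payload in big_packet:
            if kind == "connect":
                new_clients.append(client)
            else:
                self.stats[client]["received"] += 1
                requests.append((client, payload))
        return new_clients, requests

    def compose(self, client, reply):
        """Buffer a reply; return a batch once batch_size replies are queued."""
        self.pending.append(("reply", client, reply))
        self.stats[client]["sent"] += 1
        if len(self.pending) < self.batch_size:
            return None
        batch, self.pending = self.pending, []
        return batch

    def get_info(self, client):
        """Return the per-connection sent and received counters."""
        return dict(self.stats[client])


helper, connect_pkt = pm_connect("c1", {"h1": 0.5, "h2": 0.7})
lib = ToyServerLibrary(batch_size=2)
new_clients, requests = lib.decompose(
    [connect_pkt, ("request", "c1", b"get a"), ("request", "c1", b"get b")]
)
first_batch = lib.compose("c1", b"A")
second_batch = lib.compose("c1", b"B")
info = lib.get_info("c1")
```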
PostMan: Rapidly Mitigating Bursty Traffic by Offloading Packet Processing, USENIX ATC 2019
PostMan vs. state-of-the-art: rapid and efficient

Summary
- Rapid: much faster than data migration
- Efficient: fast I/O, user-level stack for packet processing
- Fault-tolerant: stateless failover design
- Scalable: no scalability bottleneck
PostMan is an alternative approach to rapidly mitigate load imbalance for services processing small requests.

DHL: a high performance FPGA-CPU co-design framework for accelerating software NFs (Network Functions) with Intel DPDK

Network Function Virtualization (NFV)
- Before: functions such as firewall and NAT run on dedicated hardware.
- With NFV: firewall, NAT, IDS, and IPS run as software on commodity x86 servers that do the packet processing.
- Reduces capital expenses
- Increases flexibility
- Speeds up deployment

Over 30 VNFs
Firewalls, NetFlow, Router, NAT, IP Transport, Concentrator, VPN, Gateway, IPTV, Broadband Router, Proxy, Compression, Load balancer, Encryption, BRAS, Caching, CDNs, IDS, DPI, HLR, HSS, Monitor

Network Functions (NFs) in Software
- Advantage: highly flexible to develop and deploy
- Disadvantage: poor performance. With a single CPU core: OVS 520 Mbps, Snort 400 Mbps, Click (router) 320 Mbps. The latest network links run at 40 Gbps.
[Figure: a software NF in user space (read packets, protocol classification, processing 1 to n, send packets) above the kernel-space network stack and the NICs.]

Why do software NFs have poor performance?
1. Packet IO: interrupts, and memory copies between kernel and user space
2. Deep packet processing: too many CPU cycles to process the whole packet, e.g., encryption/decryption in IPsec
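To make the gap between the single-core figures quoted above and a 40 Gbps link concrete, the arithmetic can be worked out directly. This is a back-of-the-envelope sketch that assumes perfect linear scaling across cores, which real software NFs do not achieve; the function name is illustrative.

```python
import math

LINK_GBPS = 40  # latest network links, from the slides
SINGLE_CORE_MBPS = {"OVS": 520, "Snort": 400, "Click (router)": 320}


def cores_for_line_rate(link_gbps, per_core_mbps):
    """Cores needed to fill the link if throughput scaled linearly (idealized)."""
    return math.ceil(link_gbps * 1000 / per_core_mbps)


cores = {
    name: cores_for_line_rate(LINK_GBPS, mbps)
    for name, mbps in SINGLE_CORE_MBPS.items()
}
for name, count in cores.items():
    print(f"{name}: at least {count} cores to reach {LINK_GBPS} Gbps")
```

Even under this optimistic assumption, each function would need on the order of a hundred cores to keep up with one link.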
Is Intel DPDK enough?
1. Packet IO: Intel DPDK takes the place of the kernel network stack.
2. Deep packet processing: still slow, and DPDK is inapplicable. It takes too many CPU cycles to process the whole packet, e.g., encryption/decryption in IPsec. How to accelerate it?

Network function    Latency (CPU cycles)    Throughput
IPsec gateway       796                     1.47 Gbps

Setup: Intel 10G NIC, Intel Xeon E5-2650 v3 at 2.30 GHz, single CPU core.

FPGA-based NF Solution
- Programmable: customized functions
- High performance: applicable for deep packet processing
[Figure: FPGA card with a PCIe interface, 40G PHY and 40G MAC ports, and a pipeline of protocol classification and processing 1 to n.]
[Figure: internal architecture of an FPGA: I/O blocks, configurable logic blocks (CLB), block RAM (BRAM), logic cells.]

Drawbacks of FPGA-based NF Solution
- Long compilation time: coding, synthesis, and implementation take hours or days, and every bug or new demand means revising the code and going through the cycle again (a nightmare).
- Limited, non-shared programmable resources
- High price: a Xilinx UltraScale+ XCVU9P-2FSGD2104E costs $46,297, about 20 times more than an Intel Xeon E5-2696 v4 CPU at $2,280.

Comparison
- Pure CPU solution: software NF (read packets, protocol classification, processing 1 to n, send packets) on Intel DPDK and NICs
- Pure FPGA solution: FPGA card with PCIe interface, 40G PHY/MAC, protocol classification, and processing stages

DHL: CPU-FPGA co-design solution
- Extract deep packet processing from the software NF and offload it to the FPGA card over the PCIe interface.
- The rest of the NF (read packets, protocol classification, remaining processing, send packets) stays in software on Intel DPDK.

How to transfer packets between host and FPGA?
Requirements for network functions:
- Throughput: 40 Gbps
- Latency: µs level
Techniques:
1. UIO-based driver
2. Polling
3. Lock-free buffer queue
4. Batching
Result reported at 6 KB: 42 Gbps, 5 µs.
[Figure: host-side NFs (NF1 to NFn) feed input and output buffer queues through a packer and a distributor; the user-space DHL driver (UIO) handles TX/RX over PCIe DMA to the FPGA.]

DHL (dynamic hardware library) framework
[Figure: NFs call DHL APIs and DHL libraries; packets pass through the packer, distributor, buffer queues, and DHL driver over PCIe to the FPGA, which contains DMA, a dispatcher, a config module, and a PR region hosting accelerators Acc 1 to Acc n.]

For developers: Hardware Function Abstraction
- Accelerators are exposed as hardware functions.
- Call a function: developers can enjoy FPGA acceleration just like calling a function.
- Software development and hardware development are decoupled.

Example of how to code with DHL
[Figure: original software NF code for encryption next to the same code written with DHL APIs.]

Performance of DHL
Compared systems: CPU-only (pure CPU implementation), DHL (CPU+FPGA), ClickNP (pure FPGA implementation); 4 CPU cores.
- Higher throughput and lower latency than the pure CPU implementation
- Similar performance to the pure FPGA implementation
DHL: Enabling flexible software network functions with FPGA acceleration, IEEE ICDCS 2018

Summary
National Key Research & Development Plan project: Key Technologies and Equipment for High-Efficiency Cloud Computing Data Centers
- Collaborative development: OpenStack Tricircle, adopted by large enterprises such as Huawei and Inspur; contributed core functionality, 20,000 lines of code.
- Independent development: DHL and PostMan, released under the Mulan family of licenses and published on OSChina's Gitee.
- Listed among the first batch of independently developed open source projects in the open source community building progress of the Ministry of Science and Technology's cloud computing and big data special program.

THANK YOU

Reference
[1] Cyber Monday Hits New Record At $6.6 Billion, The Largest Online Shopping Day In U.S. History, https:/
[2] Black Friday racks up $5.03B in online sales, $2B on mobile alone, https:/
[3] Alibaba's Singles' Day By The Numbers: A Record $25 Billion Haul, https:/
[4] STATE OF ONLINE RETAIL PERFORMANCE, https:/
[5] $25 billion in 24 hours: Alibaba creates history. Highlights from Double 11 at Shanghai, https:/
[6] Workload Analysis of a Large-scale Key-value Store. In Proc. of SIGMETRICS, 2012.