SmartNIC Architecture for Distributed Services at the Network Edge
Mario Baldi, Fellow, Pensando Systems, Inc.
San Jose, CA, April 26-28, 2022

Data Center Services
- Network: NAT, load balancing, overlay (VxLAN, GENEVE)
- Security: firewall, IDS, IPS
- Storage: shared disks, disaggregated storage
- Observability: telemetry, packet capture

The Traditional Approach
- (Virtual) appliances
  - Possibly embedded in switches
  - Possibly executed in hosts
  - Network Function Virtualization (NFV)
- Topology design implications
  - Traffic routing and stitching

N-S Traffic
- Single point of entrance/exit
- Fits the appliance solution well
[Figure label: Internet]

E-W Traffic
- 90% of data center traffic according to some estimates
- Does not fit the appliance solution well
- Traffic tromboning

Distributed Services Approach
- Optimal, unchanged routing
- No additional traffic load

Where can Services be Implemented?
- Network nodes
  - Switches
  - Routers
- End systems

Challenges: Network Nodes
- Deal with very large volumes of traffic
  - Short time to execute processing
- Designed for forwarding packets
  - Simple, fixed processing (ASIC)
- Do P4-based switches offer an opportunity?
  - Programmable
  - Hardware performance

Challenges: Hosts
- Agents consume host CPU
- Problematic to support many operating systems
- Problematic to handle updates and agent versions

Where should distributed services run?
- The network edge (hosts) is a good candidate
  - Consistent scale-out model
  - Software execution takes resources from paying workloads
  - Programmable hardware on a card
- Network interface card and ToR are good candidates
  - Needed anyway
  - By nature on the path of traffic

Pensando Distributed Services Platform
- P4 programmable processor
- Services: telemetry, networking, micro-segmentation, stateful firewall, load balancer, encryption and TLS offload, storage services
- Centrally managed: REST API, automation, observability, troubleshooting and security, orchestration and provisioning, policy
- Ecosystem: compute, analytics, IT ops
- Policy and Services Manager (PSM) controller
- Pensando Distributed Services Cards (DSC)

Pensando SoC Architecture
- Host adaptor (or ToR)
  - Can perform NIC functions
  - Can add significant value running services
[Figure: SoC block diagram. Host interface (PCIe), ARM cores, P4 packet processing dataplane, service processing offloads, memory, coherent interconnect, packet buffer, traffic manager, two Ethernet ports (network interfaces)]

Did it hit the sweet spot?
- Hosts see less traffic
  - More time to execute sophisticated processing
- A hardware-assisted approach
  - No load on the host CPU
- Specialized, programmable
  - Performance and flexibility
- Scale-out model

Sample Use Cases

TLS Offload
[Figure: proxy running on the DSC. Labels: HTTPS, TCP port 80, TCP port 443]

TLS Offloading Support
[Figure: SoC block diagram (Ethernet ports, host interface, PCIe, memory, coherent interconnect, ARM cores, packet buffer, traffic manager, P4 packet processing dataplane, service processing offloads) annotated with the steps below]
(1) Packets are generally processed by the pipeline
(2) TCP connection and TLS session initiation packets are forwarded to the ARM cores for software processing
    - Handle connection establishment
    - Install state in pipeline tables
(3) Subsequent packets are processed in the pipeline
(3.1) Encryption/decryption performed by service processing offload
- ARM cores involved only in connection/session setup and teardown
- Pipeline ensures
  - Wire-speed throughput
  - Minimal delay
  - Minimal jitter

NVMe-oF
- Access a remote disk as if local
  - Through a regular NVMe driver
- Multiple transports, including
  - RDMA
  - TCP
[Figure: two host stacks reaching an NVMe-oF target. In one, the OS runs the NVMe-oF initiator over RDMA or TCP/IP and a NIC; in the other, a VM's NVMe initiator uses NVMe emulation in the hypervisor, which runs the NVMe-oF initiator, remote storage management, and emulated local storage]
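Looking back at steps (1) to (3.1) under TLS Offloading Support, the split between a software slow path and a pipeline fast path can be illustrated with a small Python model. This is only a sketch under stated assumptions: `TlsOffloadModel`, `Packet`, and every field name are hypothetical and are not Pensando APIs; the XOR routine is a placeholder that marks where the service processing offload would encrypt or decrypt, not real cryptography; and dropping data packets that have no session state is a simplification chosen for the model.

```python
from dataclasses import dataclass, field

# All names are hypothetical; they are not Pensando APIs.
ConnKey = tuple  # (src_ip, src_port, dst_ip, dst_port)


@dataclass
class Packet:
    conn: ConnKey
    payload: bytes
    is_setup: bool = False      # TCP connection / TLS session initiation
    is_teardown: bool = False   # connection / session close


@dataclass
class TlsOffloadModel:
    session_table: dict = field(default_factory=dict)  # pipeline table: conn -> key
    arm_packets: int = 0        # packets handed to software on the ARM cores
    pipeline_packets: int = 0   # data packets handled entirely in the pipeline

    def _crypto_offload(self, key: int, data: bytes) -> bytes:
        # Placeholder for step (3.1). XOR is NOT real cryptography.
        return bytes(b ^ key for b in data)

    def _arm_slow_path(self, pkt: Packet) -> None:
        # Step (2): software handles setup/teardown and updates pipeline state.
        self.arm_packets += 1
        if pkt.is_teardown:
            self.session_table.pop(pkt.conn, None)
        else:
            self.session_table[pkt.conn] = 0x5A  # stand-in session key

    def receive(self, pkt: Packet):
        # Step (1): every packet enters the pipeline first.
        if pkt.is_setup or pkt.is_teardown:
            self._arm_slow_path(pkt)
            return None
        key = self.session_table.get(pkt.conn)
        if key is None:
            return None  # assumption: no session state means drop
        # Step (3): data packets never leave the pipeline.
        self.pipeline_packets += 1
        return self._crypto_offload(key, pkt.payload)


nic = TlsOffloadModel()
conn = ("10.0.0.1", 40000, "10.0.0.2", 443)
nic.receive(Packet(conn, b"", is_setup=True))      # slow path, once
ciphertext = nic.receive(Packet(conn, b"hello"))   # fast path
```

In this model the ARM counter grows only with setup and teardown packets, while every data packet increments the pipeline counter, which mirrors the slide's point that the ARM cores are involved only in connection/session setup and teardown.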
NVMe-oF Offload
(NVMe-oF: Non-Volatile Memory Express over Fabric)
[Figure: an NVMe-oF target reached over RDMA or TCP from two DSCs. Labels on each DSC: NVMe-oF initiator, remote storage management, emulated local storage, NVMe emulation. Host-side labels: hypervisor, NVMe initiator (VM), NVMe initiator (OS/container)]

NVMe-oF Offloading Support
[Figure: SoC block diagram (Ethernet ports, host interface, PCIe, memory, coherent interconnect, ARM cores, packet buffer, traffic manager, P4 packet processing dataplane, service processing offloads) annotated with the steps below]
(1) NVMe commands
(2) NVMe commands translated into NVMe-oF capsules
(2.1) Encryption/decryption; data digest generation/verification
(3) Encryption/decryption of data at rest
- Additional processing shown on the diagram
  - Load balance across remote controllers
  - Encapsulation
  - TCP segmentation

Distributed Stateful E-W Firewall
- Firewalling E-W traffic is particularly challenging
  - Large volume compared to N-S
  - Applications expect small latency
- Appliances are not suitable as they would create "traffic tromboning"
  - Increased load
  - Increased latency
  - Increased jitter
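As a rough illustration of the NVMe-oF Offloading Support steps (wrapping a command in a capsule, adding and verifying a data digest, balancing load across remote controllers, and TCP segmentation), the Python sketch below models them in software. Everything in it is an assumption made for illustration: the 7-byte header is a made-up layout rather than the real NVMe-oF capsule or NVMe/TCP PDU format, `zlib.crc32` is a stand-in digest, round-robin is just one possible load-balancing policy, the names `Initiator`, `make_capsule`, `verify_capsule`, and `segment` are hypothetical, and encryption is left out entirely.

```python
import itertools
import struct
import zlib

# Hypothetical, simplified header: opcode, command id, data length.
# This is NOT the real NVMe-oF capsule layout.
HEADER = struct.Struct("!BHI")


def make_capsule(opcode: int, cid: int, data: bytes) -> bytes:
    """Wrap a command and its data, then append a digest over the data."""
    digest = zlib.crc32(data)  # stand-in digest (plain CRC-32 from zlib)
    return HEADER.pack(opcode, cid, len(data)) + data + struct.pack("!I", digest)


def verify_capsule(capsule: bytes) -> bytes:
    """Check the data digest and return the data, or raise ValueError."""
    _opcode, _cid, length = HEADER.unpack_from(capsule)
    data = capsule[HEADER.size:HEADER.size + length]
    (digest,) = struct.unpack_from("!I", capsule, HEADER.size + length)
    if zlib.crc32(data) != digest:
        raise ValueError("data digest mismatch")
    return data


def segment(stream: bytes, mss: int) -> list:
    """Split a byte stream into segments of at most mss bytes."""
    return [stream[i:i + mss] for i in range(0, len(stream), mss)]


class Initiator:
    """Toy initiator that spreads commands over remote controllers."""

    def __init__(self, controllers):
        self._next = itertools.cycle(controllers)  # assumed round-robin policy
        self._cid = 0

    def submit(self, opcode: int, data: bytes, mss: int = 1460):
        controller = next(self._next)
        self._cid = (self._cid + 1) & 0xFFFF
        capsule = make_capsule(opcode, self._cid, data)
        return controller, segment(capsule, mss)


initiator = Initiator(["ctrl-a", "ctrl-b"])
target, segments = initiator.submit(opcode=1, data=b"x" * 3000)
payload = verify_capsule(b"".join(segments))  # receiver side: reassemble, verify
```

On the DSC these steps are shown in the hardware block diagram rather than in host software; the sketch only makes the order of operations concrete.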
The DSC is the perfect spot to implement this
- It is on the path of each packet
- Flow caching to reduce latency
  - Evaluate rules on first packet
  - Install entry in flow cache table for handling following packets

Distributed Stateful Firewall Support
[Figure: SoC block diagram (Ethernet ports, host interface, PCIe, memory, coherent interconnect, ARM cores, packet buffer, traffic manager, P4 packet processing dataplane, service processing offloads) annotated with the steps below]
(1) Packets belonging to a known flow are forwarded directly (flow cache table)
(2) Packets of new flows are further processed in the pipeline to evaluate rules
(3) Packet and corresponding action are passed to ARM cores
    - Software creates forward and reverse flow entries in the flow cache table
(4) Packet is passed to pipeline for processing based on newly installed flow cache entry
    - Memory shared by ARM cores and pipeline ensures that entry is up-to-date

Conclusions
- Distributing services in data centers is beneficial
- Distributing them to the edge has advantages
  - Using a host adaptor or ToR is ideal
- The right hardware architecture is key
  - Additional off-load is possible

Thank you!
pensando.io
baldi.info
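The flow-cache handling described in steps (1) to (4) of Distributed Stateful Firewall Support can be sketched in a few lines of Python. This is a software model under stated assumptions, not the DSC implementation: `FlowKey`, `Rule`, and `Firewall` are hypothetical names, rules match only on destination address, protocol, and destination port, the reverse entry is assumed to carry the same action as the forward entry, and deny decisions are cached as well.

```python
from dataclasses import dataclass, field
from typing import NamedTuple


class FlowKey(NamedTuple):
    src: str
    dst: str
    proto: str
    sport: int
    dport: int

    def reverse(self) -> "FlowKey":
        """Key of the return direction of the same flow."""
        return FlowKey(self.dst, self.src, self.proto, self.dport, self.sport)


@dataclass
class Rule:
    dst: str
    proto: str
    dport: int
    action: str  # "allow" or "deny"

    def matches(self, key: FlowKey) -> bool:
        return (key.dst, key.proto, key.dport) == (self.dst, self.proto, self.dport)


@dataclass
class Firewall:
    rules: list
    default_action: str = "deny"
    flow_cache: dict = field(default_factory=dict)
    rule_evaluations: int = 0  # how often the rule set had to be walked

    def _evaluate_rules(self, key: FlowKey) -> str:
        self.rule_evaluations += 1
        for rule in self.rules:
            if rule.matches(key):
                return rule.action
        return self.default_action

    def process(self, key: FlowKey) -> str:
        # (1) Known flow: act directly on the flow cache entry.
        action = self.flow_cache.get(key)
        if action is not None:
            return action
        # (2) New flow: evaluate the rules on the first packet.
        action = self._evaluate_rules(key)
        # (3) Software installs forward and reverse entries
        #     (assumption: both directions get the same action).
        self.flow_cache[key] = action
        self.flow_cache[key.reverse()] = action
        # (4) The packet is then handled from the newly installed entry.
        return self.flow_cache[key]


fw = Firewall([Rule("10.0.0.2", "tcp", 443, "allow")])
request = FlowKey("10.0.0.1", "10.0.0.2", "tcp", 40000, 443)
first = fw.process(request)            # rules evaluated once
reply = fw.process(request.reverse())  # served from the flow cache
```

Only the first packet of a flow pays for rule evaluation; later packets in either direction cost a single table lookup, which is the latency argument made on the flow-caching slide.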