Inside the Distributed Storage Engine of the POLARDB Cloud Database
PolarStore: An Ultra-low Latency and Fault Resilient Distributed Storage System for Cloud Database
Wei Cao (曹伟), Senior Staff Engineer, Alibaba Cloud

    The benefit of decoupling storage from compute
    State-of-the-art distributed storage system: high performance and ultra-low latency
    The smart storage designed for database workloads
    Distributed system architecture: separation of control plane and data plane

The Benefit of Decoupling Storage from Compute

    Compute and storage hardware can be customized and optimized separately
    Storage pooling gives a higher storage utilization rate and a lower total cost of ownership
    Faster migration of database nodes
    One copy of storage shared by multiple nodes, for a share-everything architecture

Powerful Features Provided by Software Defined Storage

    Data safety and coherency guaranteed by three replicas
    Capacity of a single disk can be scaled out to 100 TB
    Consistent snapshotting among distributed nodes
    Thin provisioning: resources are allocated as they are used

Architecture: Separation of Control Plane and Data Plane

State-of-the-Art Distributed Storage System: High Performance and Ultra-low Latency

    RDMA and SPDK: zero-copy I/O
    User space network and I/O stack that bypasses the OS kernel
    POLARFS: a POSIX-compatible, embedded, user space shared-disk file system
    ParallelRaft: a highly parallelized replication algorithm

POLARFS: User Space File System

    [Figure: POLARFS architecture diagram. Recoverable labels: PolarDB and libpfs on each of Node1, Node2, and Node3 of the PolarFS Cluster; Polar Block Device; Chunks; Journal file (head, tail, pending tail); Paxos file; superblock; file block; File System Metadata Cache containing the Directory Tree (root), the File Mapping Table, and the Block Mapping Table (columns FileID, LBA, PBA); numbered steps (1) to (6).]

ParallelRaft: Parallelized Data Replication While Keeping Data Coherency

    Out-of-order acknowledge
    Out-of-order commit
    Out-of-order apply of log entries
    Fast catch-up and streaming catch-up

The Smart Storage Designed for Database Workloads

    High I/O priority for redo log writes and other I/O requests on the critical data path
    Protection against data corruption caused by split-brain database nodes
    Atomic writes at large data granularity (a whole page) to save the double-write cost
    Optimization for group commit (batched I/O) workloads
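
The out-of-order apply idea listed on the ParallelRaft slide can be illustrated with a small sketch. The slides do not say how conflicts between log entries are detected, so the code below assumes that two entries conflict when their logical block address (LBA) ranges overlap, and that a follower knows the range of every entry in advance. The names LogEntry, overlaps, and apply_order are hypothetical and are not taken from PolarFS.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEntry:
    index: int    # position of the entry in the replicated log
    lba: int      # first logical block address the entry writes
    nblocks: int  # number of consecutive blocks written


def overlaps(a: LogEntry, b: LogEntry) -> bool:
    """Assumed conflict rule: two entries conflict if their block ranges intersect."""
    return a.lba < b.lba + b.nblocks and b.lba < a.lba + a.nblocks


def apply_order(entries, arrival_order):
    """Return the order in which a follower could apply entries.

    An arrived entry is applied as soon as no earlier-index entry that is
    still unapplied overlaps its block range. Non-conflicting entries can
    therefore be applied out of log order, while conflicting entries keep
    their log order.
    """
    by_index = {e.index: e for e in entries}
    arrived, applied, order = set(), set(), []
    for idx in arrival_order:
        arrived.add(idx)
        progress = True
        while progress:  # an apply may unblock entries that arrived earlier
            progress = False
            for i in sorted(arrived - applied):
                entry = by_index[i]
                blocked = any(
                    j < i and j not in applied and overlaps(entry, by_index[j])
                    for j in by_index
                )
                if not blocked:
                    applied.add(i)
                    order.append(i)
                    progress = True
    return order


entries = [
    LogEntry(index=1, lba=0, nblocks=4),
    LogEntry(index=2, lba=100, nblocks=4),  # touches no block of entry 1
    LogEntry(index=3, lba=2, nblocks=4),    # overlaps entry 1
]
print(apply_order(entries, arrival_order=[3, 2, 1]))  # prints [2, 1, 3]
```

In this run entry 2 is applied before entry 1 because they write disjoint blocks, while entry 3 waits for entry 1 because their ranges overlap. That is the sense in which replication can be parallelized while data coherency is kept.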