io_uring Status Update within Samba
Stefan Metzmacher, Samba Team / SerNet
2023-09-20
https://samba.org/metze/presentations/2023/SDC/

Topics
- What is io-uring?
- io-uring for Samba
- Performance research, prototyping and ideas
- The road to upstream
- Future Improvements
- Questions? Feedback!

Last Status Updates (SDC 2020/2021 - SambaXP 2023)
- I gave a similar talk at the Storage Developer Conference 2020:
  - See https://samba.org/metze/presentations/2020/SDC/
  - It explains the milestones and design up to Samba 4.13 (in detail)
- I gave a similar talk at the Storage Developer Conference 2021:
  - See https://samba.org/metze/presentations/2021/SDC/
  - It explains the milestones and updates up to Samba 4.15 (in detail)
- I gave a similar talk at the SambaXP conference 2023:
  - See https://samba.org/metze/presentations/2023/SambaXP/
  - It explains the milestones and updates up to Samba 4.19 (in detail)

What is io-uring? (Part 1)
- Linux 5.1 introduced a new scalable AIO infrastructure
  - It's designed to avoid syscalls as much as possible
  - Kernel and userspace share mmapped rings:
    - submission queue (SQ) ring buffer
    - completion queue (CQ) ring buffer
  - See "Ringing in a new asynchronous I/O API" on LWN.NET
- This can be nicely integrated with our async tevent model
  - It may delegate work to kernel threads
  - It seems to perform better compared to our userspace threadpool
  - It can also inline non-blocking operations

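As a rough illustration of the shared-ring idea on this slide, here is a minimal single-threaded sketch in C of a power-of-two ring indexed by free-running head and tail counters. All names (`sketch_ring`, `sketch_ring_push`, `sketch_ring_pop`) are hypothetical, and this is not the kernel ABI: the real SQ and CQ rings live in mmapped memory shared with the kernel and need memory barriers around head/tail updates, which this model leaves out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_RING_ENTRIES 8u /* hypothetical size, must be a power of two */

struct sketch_ring {
    uint32_t head;                        /* advanced only by the consumer */
    uint32_t tail;                        /* advanced only by the producer */
    uint64_t slots[SKETCH_RING_ENTRIES];  /* payload, e.g. a request token */
};

/* Producer side: returns false when the ring is full. */
bool sketch_ring_push(struct sketch_ring *r, uint64_t value)
{
    if (r->tail - r->head == SKETCH_RING_ENTRIES) {
        return false;
    }
    /* The counters run freely; the mask maps them onto a slot. */
    r->slots[r->tail & (SKETCH_RING_ENTRIES - 1u)] = value;
    r->tail++;
    return true;
}

/* Consumer side: returns false when the ring is empty. */
bool sketch_ring_pop(struct sketch_ring *r, uint64_t *value)
{
    if (r->head == r->tail) {
        return false;
    }
    *value = r->slots[r->head & (SKETCH_RING_ENTRIES - 1u)];
    r->head++;
    return true;
}
```

Because each side only writes its own counter, producer and consumer never have to take a lock or enter the kernel just to exchange entries, which is the property the slide refers to as avoiding syscalls.
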
io-uring for Samba (Part 1)
- Between userspace and filesystem (available from 5.1):
  - IORING_OP_READV, IORING_OP_WRITEV and IORING_OP_FSYNC
  - Supports buffered and direct io
  - IORING_OP_FSETXATTR, IORING_OP_FGETXATTR (from 5.19)
  - IORING_OP_GETDENTS, under discussion, but seems to be tricky
  - IORING_OP_FADVISE (from 5.6)
- Path based syscalls with async impersonation (from 5.6)
  - IORING_OP_OPENAT2, IORING_OP_STATX
  - Using IORING_REGISTER_PERSONALITY for impersonation
  - IORING_OP_UNLINKAT, IORING_OP_RENAMEAT (from 5.10)
  - IORING_OP_MKDIRAT, IORING_OP_SYMLINKAT, IORING_OP_LINKAT (from 5.15)
  - IORING_OP_SETXATTR, IORING_OP_GETXATTR (from 5.19)

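The version notes on this slide can be read as a small lookup table. The sketch below encodes a subset of them in C; the struct, table and function names are hypothetical, and the version numbers are copied from the slide rather than checked against any particular kernel tree. Since distributions may backport features, a version comparison like this is only a first approximation, and probing the running kernel is likely the more reliable check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical table entry: opcode name and first kernel listed for it. */
struct sketch_op_since {
    const char *op;
    int major;
    int minor;
};

/* Subset of the slide's list; versions as stated there. */
static const struct sketch_op_since sketch_ops[] = {
    { "IORING_OP_READV",     5, 1 },
    { "IORING_OP_WRITEV",    5, 1 },
    { "IORING_OP_FSYNC",     5, 1 },
    { "IORING_OP_FADVISE",   5, 6 },
    { "IORING_OP_OPENAT2",   5, 6 },
    { "IORING_OP_STATX",     5, 6 },
    { "IORING_OP_MKDIRAT",   5, 15 },
    { "IORING_OP_SYMLINKAT", 5, 15 },
    { "IORING_OP_LINKAT",    5, 15 },
    { "IORING_OP_FSETXATTR", 5, 19 },
    { "IORING_OP_FGETXATTR", 5, 19 },
};

/* True if the table lists op for a kernel of the given version or older. */
bool sketch_op_listed_for(const char *op, int major, int minor)
{
    size_t n = sizeof(sketch_ops) / sizeof(sketch_ops[0]);

    for (size_t i = 0; i < n; i++) {
        const struct sketch_op_since *e = &sketch_ops[i];

        if (strcmp(e->op, op) != 0) {
            continue;
        }
        return major > e->major ||
               (major == e->major && minor >= e->minor);
    }
    return false; /* unknown opcodes are treated as unavailable */
}
```
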
io-uring for Samba (Part 2)
- Between userspace and socket (and also filesystem) (from 5.8)
  - IORING_OP_SENDMSG, IORING_OP_RECVMSG
  - Improved MSG_WAITALL support (5.12, backported to 5.11, 5.10)
  - Maybe using IOSQE_ASYNC in order to avoid inline memcpy
  - IORING_OP_SPLICE, IORING_OP_TEE
- IORING_OP_SENDMSG_ZC, zero copy with an extra completion (from 6.1)
- IORING_OP_GET_BUF, under discussion to replace IORING_OP_SPLICE

vfs_io_uring in Samba 4.12 (2020)
- With Samba 4.12 we added the "io_uring" vfs module
  - For now it only implements SMB_VFS_{PREAD,PWRITE,FSYNC}_SEND/RECV
  - It has less overhead than our pthreadpool default implementations
- I was able to speed up an smbclient "get largefile /dev/null"
  - Running against smbd on loopback
  - The speed changes from 2.2 GBytes/s to 2.7 GBytes/s
- The improvement only happens by avoiding context switches
  - But the data copying still happens:
    - From/to a userspace buffer to/from the filesystem/page cache
  - The data path between userspace and socket is completely unchanged
  - For both cases the cpu is mostly busy with memcpy

Performance research (SMB2 Read)
- In October 2020 I was able to do some performance research
  - With 100GBit/s interfaces and two NUMA nodes per server.
- At that time I focussed on the SMB2 Read performance only
  - We had limited time on the given hardware
  - We mainly tested with fio.exe on a Windows client
  - Linux kernel 5.8.12 on the server
- More verbose details can be found here:
  https://lists.samba.org/archive/samba-technical/2020-October/135856.html

Performance with MultiChannel, sendmsg()
- 4 connections, 3.8 GBytes/s, bound by 500% cpu in total, sendmsg() takes up to 0.5 msecs

IORING_OP_SENDMSG (Part 1)
- 4 connections, 6.8 GBytes/s, smbd only uses 11% cpu (io_wqe_work 50% cpu) per connection, we still use 300% cpu in total

IORING_OP_SENDMSG (Part 2)
- The major problem still exists, memory copy done by copy_user_enhanced_fast_string()

IORING_OP_SENDMSG + IORING_OP_SPLICE (Part 1)
- 16 connections, 8.9 GBytes/s, smbd 5% cpu (io_wqe_work 3%-12% cpu, filesystem->pipe->socket), only 100% cpu in total.
- The Windows client was still the bottleneck with "Set-SmbClientConfiguration -ConnectionCountPerRssNetworkInterface 16"

smbclient IORING_OP_SENDMSG/SPLICE (network)
- 4 connections, 11 GBytes/s, smbd 8.6% cpu, with 4 io_wqe_work threads (pipe to socket) at 20% cpu each.
- smbclient is the bottleneck here too

smbclient IORING_OP_SENDMSG/SPLICE (loopback)
- 8 connections, 22 GBytes/s, smbd 22% cpu, with 4 io_wqe_work threads (pipe to socket) at 22% cpu each.
- smbclient is the bottleneck here too, it triggers the memory copy done by copy_user_enhanced_fast_string()

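One way to compare the three Windows-client measurements (sendmsg(), IORING_OP_SENDMSG, and IORING_OP_SENDMSG + IORING_OP_SPLICE) is to normalise throughput by the total cpu they report. The helper below is a small sketch of that arithmetic; the function name is hypothetical, and the comparison is only indicative because the runs differ in connection count.

```c
#include <assert.h>

/* Throughput normalised to one fully busy core (100% cpu). */
double sketch_gbytes_per_core(double gbytes_per_s, double total_cpu_percent)
{
    assert(total_cpu_percent > 0.0);
    return gbytes_per_s / (total_cpu_percent / 100.0);
}
```

With the totals quoted on those slides, `sketch_gbytes_per_core(3.8, 500.0)` is about 0.76, `sketch_gbytes_per_core(6.8, 300.0)` is about 2.27, and `sketch_gbytes_per_core(8.9, 100.0)` is 8.9 GBytes/s per core.
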
More loopback testing on brand new hardware
- Recently I re-did the loopback read tests with IORING_OP_SENDMSG/SPLICE (from /dev/shm/)
  - 1 connection, 10-13 GBytes/s, smbd 7% cpu, with 4 iou-wrk threads at 7%-50% cpu.
  - 4 connections, 24-30 GBytes/s, smbd 18% cpu, with 16 iou-wrk threads at 3%-35% cpu.
- I also implemented SMB2 writes with IORING_OP_RECVMSG/SPLICE (tested to /dev/null)
  - 1 connection, 7-8 GBytes/s, smbd 5% cpu, with 3 iou-wrk threads at 1%-20% cpu.
  - 4 connections, 10 GBytes/s, smbd 15% cpu, with 12 iou-wrk threads at 1%-20% cpu.
- I tested with a Linux Kernel 5.13
  - In both cases the bottleneck is clearly on the smbclient side
  - We could apply similar changes to smbclient and add true multichannel support
  - It seems that the filesystem->pipe->socket path is much better optimized

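The filesystem->pipe->socket path mentioned on this slide can be tried out with the plain splice(2) syscall, without io_uring. The sketch below is not Samba code: the function names, the temporary file path and the AF_UNIX socketpair used as a stand-in for a network connection are all assumptions made for illustration. It moves data from a file into a pipe and from the pipe into a socket without passing through a userspace buffer.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Move up to len bytes from file_fd (current offset) to out_fd via a pipe.
 * Returns the number of bytes moved, or -1 on error. */
ssize_t sketch_splice_file_to_fd(int file_fd, int out_fd, size_t len)
{
    int p[2];
    size_t total = 0;
    int failed = 0;

    if (pipe(p) != 0) {
        return -1;
    }
    while (total < len && !failed) {
        /* filesystem -> pipe */
        ssize_t in = splice(file_fd, NULL, p[1], NULL, len - total,
                            SPLICE_F_MOVE);
        if (in < 0 && errno == EINTR) {
            continue;
        }
        if (in < 0) {
            failed = 1;
            break;
        }
        if (in == 0) {
            break; /* end of file */
        }
        /* pipe -> socket: drain everything that was just queued */
        while (in > 0) {
            ssize_t out = splice(p[0], NULL, out_fd, NULL, (size_t)in,
                                 SPLICE_F_MOVE);
            if (out < 0 && errno == EINTR) {
                continue;
            }
            if (out <= 0) {
                failed = 1;
                break;
            }
            in -= out;
            total += (size_t)out;
        }
    }
    close(p[0]);
    close(p[1]);
    return failed ? -1 : (ssize_t)total;
}

/* Demo: temp file -> pipe -> AF_UNIX socket; returns 0 on success. */
int sketch_splice_demo(void)
{
    static const char msg[] = "hello from the page cache";
    char path[] = "/tmp/sketch_splice_XXXXXX"; /* hypothetical location */
    char buf[sizeof(msg)] = {0};
    int sv[2];
    int ret = -1;
    int fd = mkstemp(path);

    if (fd < 0) {
        return -1;
    }
    unlink(path); /* the open descriptor keeps the file alive */
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0) {
        close(fd);
        return -1;
    }
    if (write(fd, msg, sizeof(msg)) == (ssize_t)sizeof(msg) &&
        lseek(fd, 0, SEEK_SET) == 0 &&
        sketch_splice_file_to_fd(fd, sv[0], sizeof(msg)) == (ssize_t)sizeof(msg) &&
        recv(sv[1], buf, sizeof(buf), MSG_WAITALL) == (ssize_t)sizeof(msg) &&
        memcmp(buf, msg, sizeof(msg)) == 0) {
        ret = 0;
    }
    close(sv[0]);
    close(sv[1]);
    close(fd);
    return ret;
}
```

IORING_OP_SPLICE submits the same kind of operation asynchronously through the ring; the synchronous version is used here only because it keeps the example short.
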
The road to upstream (TEVENT_FD_ERROR)
- We need support for TEVENT_FD_ERROR in order to monitor errors
  - When using IORING_OP_SENDMSG/RECVMSG we still want to notice errors
- This is the main merge request:
  https:/ requests/2793
- This merge request converts Samba to use TEVENT_FD_ERROR:
  https:/ requests/2885
  (It also simplifies other places in the code without io_uring)

The road to upstream (samba_io_uring abstraction 1)
- API glue to tevent:
    void samba_io_uring_ev_register(void);
    const struct samba_io_uring_features *samba_io_uring_system_features(void);
    struct samba_io_uring *samba_io_uring_ev_context_get_ring(struct tevent_context *ev);
    const struct samba_io_uring_features *samba_io_uring_get_features(const struct samba_io_uring *ring);
    ev = tevent_context_init_byname(mem_ctx, "samba_io_uring_ev");
- samba_io_uring abstraction factored out of vfs_io_uring:
  - samba_io_uring_ev hybrid tevent backend (glued on epoll backend)
  - It means every layer getting the tevent context can use io_uring
  - No #ifdefs, just checking if the required features are available

The road to upstream (samba_io_uring abstraction 2)
- generic submission/completion api:
    void samba_io_uring_completion_prepare(struct samba_io_uring_completion *completion,
            void (*completion_fn)(struct samba_io_uring_completion *completion,
                                  void *completion_private,
                                  const struct io_uring_cqe *cqe),
            void *completion_private);
    void samba_io_uring_submission_prepare(struct samba_io_uring_submission *submission,
            void (*submission_fn)(struct samba_io_uring *ring,
                                  struct samba_io_uring_submission *submission,
                                  void *submission_private),
            void *submission_private,
            struct samba_io_uring_completion *completion);
    struct io_uring_sqe *samba_io_uring_submission_sqe(struct samba_io_uring_submission *submission);
    size_t samba_io_uring_queue_submissions(struct samba_io_uring *ring,
            struct samba_io_uring_submission *submission);
- Using it:
  - convert vfs_io_uring
  - use it in smb2_server.c
  - In future use it in other performance critical places too.

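The slide only shows prototypes, so how Samba maps a completion queue entry back to its callback is not visible here. The following is a self-contained model of one common io_uring idiom, under the assumption that the submitter stores a pointer to the completion object in the request's `user_data` field, which the kernel copies unchanged into the matching CQE. Every `sketch_` name is hypothetical and stands in for, rather than reproduces, the samba_io_uring API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the two CQE fields this model needs. */
struct sketch_cqe {
    uint64_t user_data;
    int32_t res;
};

struct sketch_completion;

typedef void (*sketch_completion_fn)(struct sketch_completion *completion,
                                     void *completion_private,
                                     const struct sketch_cqe *cqe);

struct sketch_completion {
    sketch_completion_fn fn;
    void *private_data;
};

/* Bind a callback and its private pointer to a completion object. */
void sketch_completion_prepare(struct sketch_completion *completion,
                               sketch_completion_fn fn,
                               void *completion_private)
{
    completion->fn = fn;
    completion->private_data = completion_private;
}

/* Value a submitter would place in the request's user_data field. */
uint64_t sketch_completion_user_data(struct sketch_completion *completion)
{
    return (uint64_t)(uintptr_t)completion;
}

/* Called for each reaped CQE: recover the completion and run its callback. */
void sketch_dispatch(const struct sketch_cqe *cqe)
{
    struct sketch_completion *completion =
        (struct sketch_completion *)(uintptr_t)cqe->user_data;

    if (completion != NULL && completion->fn != NULL) {
        completion->fn(completion, completion->private_data, cqe);
    }
}

/* Example callback: copy the result code into the caller's variable. */
void sketch_store_result(struct sketch_completion *completion,
                         void *completion_private,
                         const struct sketch_cqe *cqe)
{
    (void)completion;
    *(int32_t *)completion_private = cqe->res;
}
```

Keeping the callback and its private pointer in a caller-owned structure means the dispatch loop needs no lookup table, which fits the slide's goal of one generic API shared by vfs_io_uring and smb2_server.c.
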
The road to upstream (smb2_server.c)
- Refactoring of smb2_server.c
  - add optional IORING_OP_SENDMSG, IORING_OP_RECVMSG support
- There are structural problems with splice from a file
  - I had a discussion with the Linux developers about it:
    - The page content from the page cache may change unexpectedly
    - https://lists.samba.org/archive/samba-technical/2023-February/thread.html#137945
  - We may not be able to use IORING_OP_SENDMSG/SPLICE by default
  - Maybe IORING_OP_RECVMSG/SPLICE is possible
- With IORING_OP_SENDMSG_ZC only one copy is used:
  - It is able to avoid copying to the socket
  - We get an extra completion once the buffers are not needed anymore
  - Only with real hardware, not on loopback in an upstream kernel
  - A custom kernel loopback gives 7.5 GBytes/s instead of 3.5 GBytes/s
  - With a noop vfs module, we get 18 GBytes/s instead of 6 GBytes/s

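The "extra completion" for IORING_OP_SENDMSG_ZC changes when a send buffer may be reused: the first CQE reports the send result, and a later notification CQE signals that the kernel no longer references the buffer. The sketch below models that lifetime as a tiny state machine. The `SKETCH_` flag values are chosen to mirror IORING_CQE_F_MORE and IORING_CQE_F_NOTIF from linux/io_uring.h, but they are defined locally and should be treated as assumptions; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins for IORING_CQE_F_MORE and IORING_CQE_F_NOTIF (assumed). */
#define SKETCH_CQE_F_MORE  (1u << 1)
#define SKETCH_CQE_F_NOTIF (1u << 3)

enum sketch_zc_state {
    SKETCH_ZC_INFLIGHT,     /* submitted, nothing completed yet */
    SKETCH_ZC_WAIT_NOTIF,   /* result known, kernel still holds the buffer */
    SKETCH_ZC_DONE          /* buffer may be reused or freed */
};

struct sketch_zc_send {
    enum sketch_zc_state state;
    int32_t res;            /* result of the send itself */
};

/* Feed one CQE belonging to this send into the state machine. */
void sketch_zc_on_cqe(struct sketch_zc_send *s, int32_t res, uint32_t flags)
{
    if (flags & SKETCH_CQE_F_NOTIF) {
        /* Second completion: the buffer is no longer referenced. */
        s->state = SKETCH_ZC_DONE;
        return;
    }
    /* First completion: carries the send result. */
    s->res = res;
    s->state = (flags & SKETCH_CQE_F_MORE) ? SKETCH_ZC_WAIT_NOTIF
                                           : SKETCH_ZC_DONE;
}

/* The caller must keep the buffer untouched until this returns true. */
bool sketch_zc_buffer_reusable(const struct sketch_zc_send *s)
{
    return s->state == SKETCH_ZC_DONE;
}
```

A server using this pattern has to keep the response buffer alive until the notification arrives, even though the client-visible send has already completed.
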
Future Improvements
- Patches are slowly getting prepared for master
  - Some preparations are already in or pending merge requests
  - We even have basic automated ci testing in place now
  - But changes need to be checked for performance regressions
- We can use io_uring deep inside of the smbclient code
  - The low layers can just use samba_io_uring_ev_context_get_ring()
  - And use it if available, without changing the whole stack

Questions? Feedback!
- Stefan Metzmacher, metze@samba.org
- https:/ https://samba.plus
- SerNet/SAMBA+ sponsor booth
- Slides: https://samba.org/metze/presentations/2023/SDC/