Speech Signal Improvement in Real-time Communication
Yannan Wang
Tencent Ethereal Audio Lab, Tencent, Shenzhen, China

Outline
1. Introduction
2. Speech Signal Improvement
3. Future work

Background
Real-time communication (RTC) systems are widely used:
- Teleconferencing systems
- Video calls
Factors limiting the speech quality of current RTC systems:
- Device robustness
- Acoustic capturing
- Noise/reverberation corruption
- Interfering speakers
- Network congestion

Introduction: device robustness

Outline
1. Introduction
2. Speech Signal Improvement
   I. Enhancement
   II. Restoration
3. Future work

Speech denoising
Typical interferences: keyboard typing, rain, WeChat message notifications, placing a cup on the desk, coughing.

Dereverberation
Reflections from the room's walls, ceiling, floor, and objects superimpose on the direct sound, degrading speech quality and intelligibility.
Drawbacks of traditional methods:
- It is hard to accurately estimate the nonlinear mapping between reverberant and clean speech
- The algorithms require considerable prior information and converge slowly
- Only part of the reverberation is removed, so the improvement is limited

Speaker extraction
Enrollment with user awareness vs. enrollment without user awareness.

Related works: our previous winning model, TEA-PSE
(Yukai Ju et al., ASLP@NPU & Tencent Ethereal Audio Lab, China)
- The 1st-stage network estimates the target speaker's magnitude and reuses the noisy phase
- The 2nd-stage network estimates the residual real and imaginary parts
- The speaker embedding is combined by simple concatenation

TEA-PSE 3.0: contributions
- Incorporates a residual LSTM after the squeezed temporal convolution network (S-TCN) to enhance sequence-modeling capability
- Introduces a local-global representation (LGR) structure to boost speaker information extraction
- Uses a multi-STFT resolution loss to effectively capture the time-frequency characteristics of the speech signals
- Employs retraining methods based on a freeze-training strategy to fine-tune the system
TEA-PSE 3.0 ranks 1st in both track 1 and track 2 of the ICASSP 2023 DNS Challenge.

TEA-PSE 3.0: network structure
- Same dual-stage framework as TEA-PSE
- A residual LSTM is added after every S-TCN module to further enhance sequence modeling
- Local-global representation (LGR) is adopted for better speaker information extraction

TEA-PSE 3.0: loss function
- Stage-wise losses for the 1st, 2nd, and 3rd training stages
- MAG-Net and COM-Net are trained sequentially as described above; the pre-trained models are then loaded and the entire system is retrained with L2

Experimental setup
- Dataset: ICASSP DNS 2022 (750 h speech, 181 h noise)
- Data augmentation: reverberation and different interference scenarios; training data are generated on the fly
- SNR range [-5, 20] dB, SIR range [-5, 20] dB, mixture scale range [-35, -15] dBFS
- STFT configuration: 20 ms frame length, 10 ms frame shift, 1024 FFT points
- Multi-STFT resolution loss: three settings with FFT lengths {512, 1024, 2048}, window lengths {480, 960, 1920}, and frame shifts {240, 480, 960}

Experimental results and analysis
- TEA-PSE 3.0 obtains the highest BAK and OVRL scores
- Compared with unprocessed speech, the SIG and WAcc of the submitted model decrease

Demo: noisy input vs. TEA-PSE 3.0 output (two examples)
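The multi-STFT resolution loss above averages a distance over several STFT analysis settings. Below is a minimal NumPy sketch assuming a common formulation (spectral convergence plus log-magnitude distance, which may differ in detail from the exact loss used in TEA-PSE 3.0), with the three resolutions listed in the setup:

```python
import numpy as np

# Three (FFT length, window length, frame shift) settings from the setup.
RESOLUTIONS = [(512, 480, 240), (1024, 960, 480), (2048, 1920, 960)]

def stft_mag(x, n_fft, win_len, hop):
    """Magnitude STFT with a Hann window; frames are zero-padded to n_fft."""
    win = np.hanning(win_len)
    n_frames = 1 + (len(x) - win_len) // hop
    frames = np.stack([x[i * hop : i * hop + win_len] * win
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, n=n_fft, axis=-1))

def multi_stft_loss(est, ref, eps=1e-8):
    """Spectral convergence + log-magnitude distance, averaged over resolutions."""
    total = 0.0
    for n_fft, win_len, hop in RESOLUTIONS:
        e = stft_mag(est, n_fft, win_len, hop)
        r = stft_mag(ref, n_fft, win_len, hop)
        sc = np.linalg.norm(r - e) / (np.linalg.norm(r) + eps)   # spectral convergence
        lm = np.mean(np.abs(np.log(r + eps) - np.log(e + eps)))  # log-magnitude term
        total += sc + lm
    return total / len(RESOLUTIONS)
```

Using several resolutions keeps the loss sensitive both to fine temporal detail (short windows) and to narrowband spectral structure (long windows).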
Speech Signal Improvement, Part II: Restoration

Motivation
Speech restoration modules:
- Residual noise components and artifacts still exist
- They affect the perceptual quality of the speech signal
Noise reduction modules:
- They distort the degraded speech signal
- This distortion makes restoring the desired speech signal harder

Methodology: Speech Signal Improvement Network (SSI-Net)
SSI-Net cascades a restoration module and an enhancement module: the restoration module (TRGAN) comes first, followed by STFT analysis, the enhancement module (MTFAA-Lite), and ISTFT resynthesis.
Restoration module (TRGAN):
- Speech distortion restoration
- Bandwidth expansion
- Preliminary denoising and dereverberation

TRGAN design:
- Time-domain mapping-based generator with a pseudo quadrature mirror filter bank (PQMF) for sub-band decomposition; phase information is utilized
- Discriminators: multi-resolution frequency discriminators (Liu & Qian, 2021) plus our proposed multi-band discriminators

Liu Z., Qian Y. Basis-MelGAN: Efficient Neural Vocoder Based on Audio Decomposition. arXiv preprint arXiv:2106.13419, 2021.

Enhancement module (MTFAA-Lite):
- Components: amplitude/phase encoder, ERB merging and ERB splitting, stacked frequency-attention and Conv-2D blocks, mask estimation, and signal resynthesis
- Eliminates residual noise components and artifacts
- A lite version of MTFAA-Net (Zhang et al., 2022): it retains the frequency down-sampling, frequency up-sampling, and T-F convolution modules, and drops the T-attention branch of the axial self-attention, which has high time complexity

Zhang G., Wang C., Yu L., Wei J. Multi-Scale Temporal Frequency Convolutional Network with Axial Attention for Multi-Channel Speech Enhancement. In ICASSP 2022 (pp. 9206-9210). IEEE, 2022.
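The ERB merging/splitting steps can be illustrated with a triangular filterbank on the ERB-rate scale: merging pools linear STFT bins into ERB-spaced bands, and splitting maps band values back to bins. This is a sketch under assumed choices (64 bands, triangular weights, transpose-based splitting), not the exact matrices of MTFAA-Net:

```python
import numpy as np

def hz_to_erb_scale(f):
    """Hz to ERB-number (Cam) scale."""
    return 21.4 * np.log10(1 + 0.00437 * f)

def erb_to_hz(e):
    """Inverse of hz_to_erb_scale."""
    return (10 ** (e / 21.4) - 1) / 0.00437

def erb_filterbank(n_bins, sr, n_bands):
    """Triangular filters with centers equally spaced on the ERB scale."""
    freqs = np.linspace(0, sr / 2, n_bins)
    edges = erb_to_hz(np.linspace(0, hz_to_erb_scale(sr / 2), n_bands + 2))
    fb = np.zeros((n_bands, n_bins))
    for b in range(n_bands):
        lo, mid, hi = edges[b], edges[b + 1], edges[b + 2]
        rise = (freqs - lo) / (mid - lo + 1e-9)
        fall = (hi - freqs) / (hi - mid + 1e-9)
        fb[b] = np.clip(np.minimum(rise, fall), 0, None)
    fb /= fb.sum(axis=1, keepdims=True) + 1e-9   # each band averages its bins
    return fb

n_bins, n_bands = 513, 64                         # e.g. 1024-point FFT, 16 kHz
merge = erb_filterbank(n_bins, 16000, n_bands)    # (64, 513): ERB merging
split = merge.T / (merge.T.sum(axis=1, keepdims=True) + 1e-9)  # ERB splitting
```

Working on 64 ERB bands instead of 513 linear bins shrinks the frequency axis that the attention and convolution blocks must process, which is one motivation for this kind of merging.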
Experiments: setup
- 100,000 room impulse responses (RIRs) simulated with the image method
- Clean speech and noise: subsets from the DNS Challenge (Dubey et al., 2022) plus some private datasets
- A simulated 1500-hour dataset covering coloration, discontinuity, loudness, noise, reverberation, and codec artifacts

Dubey H., Gopal V., Cutler R., et al. ICASSP 2022 Deep Noise Suppression Challenge. In ICASSP 2022 (pp. 9271-9275). IEEE, 2022.

Experiments: evaluation on the SSI Challenge blind test set
Table 1: partial results of the multi-dimensional subjective test
- SSI-Net yields a significant improvement in all metrics
- It effectively alleviates the main difficulties: noise, coloration, discontinuity, loudness, and reverberation

More detailed results are available on the website: https:/
Results for Track 1
Results for Track 2
Official competition results: https:/
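The on-the-fly mixture simulation used in both experimental setups reduces to scaling an interference source so the speech-to-interference power ratio hits a target drawn from the stated range. A minimal sketch (the reverberation, loudness, and codec stages are omitted; function names are illustrative):

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db, eps=1e-12):
    """Scale `noise` so that speech power / noise power equals snr_db, then mix."""
    ps = np.mean(speech ** 2)
    pn = np.mean(noise ** 2)
    gain = np.sqrt(ps / (pn * 10 ** (snr_db / 10) + eps))
    return speech + gain * noise

rng = np.random.default_rng(7)
speech = rng.standard_normal(16000)          # 1 s of "speech" at 16 kHz
noise = rng.standard_normal(16000)           # interference source
snr = rng.uniform(-5, 20)                    # per-mixture SNR, as in the setup
mix = mix_at_snr(speech, noise, snr)
```

Drawing a fresh SNR (and, analogously, SIR and mixture level) per example is what makes the on-the-fly generation an effectively unlimited augmentation stream.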