Radar Perception and Fusion with Deep Learning
2023.11.14
Yu Su (苏煜), Senior Perception Algorithm Expert, ZongMu Technology

About Me
- Work in academia
  - Ph.D. at Harbin Institute of Technology / ICT, Chinese Academy of Sciences, 1999-2009
  - Research Engineer at University of Caen / CNRS, France, 2009-2012
  - Research on computer vision, pattern recognition and machine learning, focused on face recognition and large-scale image classification/retrieval
  - 30+ publications in IJCV, TIP, ICCV, CVPR, ECCV, etc., with 2500+ citations
- Work in industry
  - Senior Algorithm Engineer at APTIV (previously DELPHI), 2013-2023
  - 11 years of experience in autonomous driving R&D: vision-based ADAS (object/lane detection) and radar/LiDAR perception with deep learning
  - 20+ patents in related fields

Contents
- Traditional radar signal processing
- Radar perception with deep learning
- Deep fusion of radar and camera

Traditional Radar Signal Processing
- Pipeline: ADC data -> FFTs -> RAD data cube (dense; Range, Angle, Doppler) -> CFAR -> point cloud (sparse; each point carries range, angle, Doppler, RCS). A minimal sketch of this chain follows below.
- [Figure: red = radar points, grey = LiDAR points; sample from the NuScenes dataset]
- Limitations
  - The radar point cloud is very sparse
  - Angular resolution is low, and there is no height information
  - The Doppler spectrum information is not fully exploited
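To make the chain above concrete, here is a minimal NumPy sketch of the FFT processing and a 1D cell-averaging CFAR. Shapes, the missing windowing/calibration steps, and the threshold scale are illustrative assumptions, not a production design.

```python
import numpy as np

def radar_cube(adc, n_range=256, n_doppler=64):
    """ADC samples (chirps x samples x antennas) -> Doppler-Range-Angle cube."""
    r = np.fft.fft(adc, n=n_range, axis=1)                            # range FFT (fast time)
    d = np.fft.fftshift(np.fft.fft(r, n=n_doppler, axis=0), axes=0)   # Doppler FFT (slow time)
    return np.fft.fftshift(np.fft.fft(d, axis=2), axes=2)             # angle FFT (antennas)

def ca_cfar_1d(power, guard=2, train=8, scale=4.0):
    """Cell-averaging CFAR along one axis: a cell is a detection when it
    exceeds the mean of the surrounding training cells by a factor `scale`."""
    hits = np.zeros(len(power), dtype=bool)
    for i in range(guard + train, len(power) - guard - train):
        left = power[i - guard - train : i - guard]
        right = power[i + guard + 1 : i + guard + 1 + train]
        hits[i] = power[i] > scale * np.mean(np.concatenate([left, right]))
    return hits

adc = np.random.randn(64, 256, 8) + 1j * np.random.randn(64, 256, 8)
cube = radar_cube(adc)                       # (doppler, range, angle)
power = np.abs(cube[32]) ** 2                # one Doppler slice: (range, angle)
detections = ca_cfar_1d(power[:, 4])         # CFAR along range for one beam
```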
Radar Perception with Sparse Point Cloud
- Pipeline
  1. Detection: find target proposals by clustering, e.g. DBSCAN (sketched after this list)
  2. Tracking: track targets over time by filtering, e.g. with a Kalman filter
  3. Classification: predict semantic labels with a classifier, e.g. an SVM
- Limitations
  - Hard to classify stationary targets
  - Low performance on VRUs (Vulnerable Road Users)
  - Requires handcrafted design and tuning
- [Figure: detection by clustering, tracking by Kalman filter, classification by SVM]
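A minimal scikit-learn sketch of the detection step, clustering points jointly in position and Doppler; the coordinates, units, and DBSCAN parameters are illustrative assumptions, and feature scaling is omitted.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Each radar point: (x, y, doppler); values are illustrative.
points = np.array([
    [10.2, 1.1, 5.0], [10.5, 1.3, 5.1], [10.3, 0.9, 4.9],   # moving target
    [40.0, -3.0, 0.0], [40.4, -3.2, 0.1],                   # stationary target
])

# Clustering in (x, y, doppler) space: points must agree in both
# position and radial velocity to form one target proposal.
labels = DBSCAN(eps=1.0, min_samples=2).fit_predict(points)

for k in sorted(set(labels) - {-1}):   # label -1 marks noise points
    cluster = points[labels == k]
    print(f"proposal {k}: centroid={cluster[:, :2].mean(axis=0)}, "
          f"mean doppler={cluster[:, 2].mean():.2f}")
```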
Impact on Autonomous Driving Applications
- Stationary targets are filtered out
- VRU (pedestrian/bicyclist) detection is unreliable
- No grid-level information to support high-level AD applications, e.g. autopilot, valet parking, etc.
Next Generation: Raw Data + Deep Learning
- Pipeline: ADC data -> FFTs -> RAD data cube (dense) -> deep neural network -> moving and stationary vehicles/pedestrians/bikes/etc., plus an occupancy grid with semantic and dynamic information (a toy example follows below)
- Solves the problems of traditional radar perception:
  - Hard to classify stationary targets
  - Low performance on VRUs
  - Requires handcrafted design and tuning
- *Pictures of outputs are from NVIDIA RadarNet, https:/
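As an illustration of the idea only (the slide does not detail the actual architecture), a toy network that takes the RAD cube with Doppler bins as input channels and outputs a per-cell semantic grid could look like this; every layer and class count here is an assumption.

```python
import torch
import torch.nn as nn

class RadSemanticGrid(nn.Module):
    """Toy network: RAD cube with Doppler bins as channels ->
    per-cell class scores (e.g. background/vehicle/pedestrian/bike)."""
    def __init__(self, n_doppler=64, n_classes=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(n_doppler, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, n_classes, 1),          # per-cell logits
        )

    def forward(self, rad):                       # rad: (B, doppler, range, azimuth)
        return self.net(rad)                      # (B, n_classes, range, azimuth)

grid = RadSemanticGrid()(torch.randn(1, 64, 256, 128))
```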
Increasing Interest
- Picture is from https:/
- Main topics
  - Point cloud + deep learning for lightweight systems and L2 applications, e.g. ACC on highways
  - Raw data + deep learning for advanced AD applications, e.g. autopilot and valet parking
  - End-to-end radar signal processing + perception
  - Fusion with camera and LiDAR in bird's eye view
  - 4D radar perception with point cloud or raw data

Research in Academia and Industry
- OEMs: General Motors, Cruise, Daimler
- Academia: Uni. of Washington, Uni. of Ulm, TU Delft
- Tier 1: Bosch, Valeo, APTIV
- Chip suppliers: Qualcomm, NVIDIA, NXP
Work from NVIDIA: Point Cloud
Popov et al., "NVRadarNet: Real-Time Radar Obstacle and Free Space Detection for Autonomous Driving", 2023.
- Input
  - Accumulate peak detections temporally over 0.5 seconds
  - Perform ego-motion compensation to bring the accumulated detections into the latest known vehicle position (sketched after this list)
- Multi-task output: bounding-box class, bounding-box size, freespace
- Ground truth: transferred from LiDAR to radar
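The ego-motion compensation step can be written as an SE(2) change of frame; the pose representation and all numbers below are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def se2(x, y, yaw):
    """Homogeneous 2D pose of the vehicle: position (x, y) and heading yaw."""
    c, s = np.cos(yaw), np.sin(yaw)
    return np.array([[c, -s, x], [s, c, y], [0.0, 0.0, 1.0]])

def compensate(points_old, pose_old, pose_new):
    """Re-express detections from an older vehicle frame in the latest frame."""
    T = np.linalg.inv(pose_new) @ pose_old           # old frame -> new frame
    pts = np.c_[points_old, np.ones(len(points_old))]
    return (pts @ T.T)[:, :2]

# Accumulating 0.5 s of detections: each older frame is warped into the newest one.
old = np.array([[20.0, 1.0], [25.0, -2.0]])          # (x, y) in an older vehicle frame
latest = compensate(old, se2(0.0, 0.0, 0.0), se2(5.0, 0.0, 0.05))
```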
Work from Qualcomm: RDA Tensor
Major et al., "Vehicle Detection With Automotive Radar Using Deep Learning on Range-Azimuth-Doppler Tensors", 2019.
- RDA tensor processing
  - RDA -> RA: the Doppler dimension is used as feature channels
  - RDA -> RA, RD, AD -> RA: the three 2D views are fused back into an RA map
  - RA map: encodes Doppler and RCS
- Coordinate transform: RA (polar) -> XY (Cartesian), sketched below
- Temporal fusion + object detection: LSTM, SSD
- CFAR is replaced by a neural network
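The polar-to-Cartesian step can be realized as differentiable resampling. Below is a minimal PyTorch sketch using grid_sample; the field of view, range limit, and output size are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def polar_to_cartesian(ra, r_max=50.0, fov=torch.pi / 2, out=128):
    """Resample an RA feature map (B, C, range, azimuth) onto an XY grid with
    bilinear interpolation; cells outside the field of view sample to zero."""
    x = torch.linspace(0.0, r_max, out)              # forward
    y = torch.linspace(-r_max, r_max, out)           # lateral
    yy, xx = torch.meshgrid(y, x, indexing="ij")
    r = torch.sqrt(xx**2 + yy**2)
    az = torch.atan2(yy, xx)
    # Normalize to [-1, 1] for grid_sample: azimuth -> width axis, range -> height axis.
    grid = torch.stack([az / (fov / 2), 2 * r / r_max - 1], dim=-1)
    grid = grid.unsqueeze(0).expand(ra.shape[0], -1, -1, -1)
    return F.grid_sample(ra, grid, align_corners=True)   # (B, C, out, out)

xy = polar_to_cartesian(torch.randn(2, 64, 256, 128))
```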
Work from Uni. of Washington: RA-Chirp Tensor
Wang et al., "RODNet: A Real-Time Radar Object Detection Network Cross-Supervised by Camera-Radar Fused Object 3D Localization", 2020.
- Range-Azimuth-Chirp: the Doppler FFT is replaced by a neural network (M-Net) that extracts motion information
- Fusion of multiple frames
  - Feature maps from M-Net for each frame
  - Deformable convolution for temporal fusion (sketched after this list)
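A minimal sketch of deformable-convolution temporal fusion using torchvision.ops.DeformConv2d. Stacking frames on the channel axis and the offset head below are assumptions for illustration, not RODNet's exact design.

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class TemporalDeformFusion(nn.Module):
    """Fuse per-frame RA feature maps: a deformable conv samples the stacked
    frames at learned, data-dependent offsets, so the kernel can follow
    targets that move between frames."""
    def __init__(self, n_frames=4, c=32, k=3):
        super().__init__()
        in_ch = n_frames * c
        self.offset = nn.Conv2d(in_ch, 2 * k * k, k, padding=k // 2)  # (dy, dx) per kernel tap
        self.deform = DeformConv2d(in_ch, c, k, padding=k // 2)

    def forward(self, feats):                    # feats: (B, n_frames, C, H, W)
        x = feats.flatten(1, 2)                  # stack frames on the channel axis
        return self.deform(x, self.offset(x))   # fused map: (B, C, H, W)

fused = TemporalDeformFusion()(torch.randn(2, 4, 32, 64, 64))
```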
Summary of Technical Benefits
- Compared with traditional radar perception
  - Avoids the large information loss of point cloud generation
  - Keeps the Doppler spectrum to better capture target motion: a full spectrum per cell instead of a single Doppler value per point
  - Higher perception ability: detection of stationary and slow-moving vehicles, VRUs, and debris; grid output with dynamic and semantic information
- Compared with vision perception
  - Much better perception in darkness, bad weather, and glare
  - Much higher positional accuracy
- Compared with LiDAR perception
  - Much lower price
  - Robust to adverse weather
  - More sensitive to moving targets
  - Similar performance to a 32-beam LiDAR
- From the sensor-fusion perspective
  - Closes the information-density gap to camera/LiDAR (camera: color image; LiDAR: point cloud; radar: low-level data)
  - Facilitates early (feature-level) fusion
  - Pedestrian recall: 30% -> 90%; vehicle recall: 70% -> 90%
Summary of Technical Challenges
- Data recording and ground-truth labeling
  - Raw ADC data is much larger than point cloud data
  - Synchronized LiDAR/camera data is needed for automatic or semi-automatic labeling
  - All data must be collected from scratch: the radar domain has no general pre-training dataset like ImageNet in vision
- Algorithm development
  - Replace traditional radar processing (FFT, angle finding, CFAR, etc.) with deep learning
  - Adapt neural network architectures using radar know-how
  - Temporal fusion, fusion of multiple radar sensors, multi-task learning, etc.
- System design
  - Spatial and temporal calibration of camera, LiDAR, and radar sensors
  - Visualization of input/intermediate/output data for error diagnosis
  - Embedded deployment: implementation of non-standard ops, e.g. ego-motion compensation and the polar-to-Cartesian transform
Fusion of Radar and Camera
- When and where should camera and radar be fused?
- Camera + radar
  - Low cost and mature series production
  - Enough perception ability for most scenarios
  - Suitable for L2/L2+ systems
- In theory, fusion of camera and radar could cover all the required perception abilities at low cost.
When to Fuse: Early Fusion or Late Fusion?
- Each branch processes camera or radar data through feature extraction, object proposals, and a final output; the fusion point defines the scheme:
  - Early fusion: fuse features (sketched after this list)
  - Mixed fusion: fuse proposals
  - Late fusion: fuse outputs
- Two key problems to solve
  - Avoid information loss and make better use of complementary information
  - Deal with spatial and temporal misalignment, i.e. match information across different sensors
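A minimal sketch of the early (feature-level) variant, assuming both branches have already been projected onto a common BEV grid; all channel counts and layers are illustrative.

```python
import torch
import torch.nn as nn

class EarlyFusionHead(nn.Module):
    """Feature-level fusion: camera and radar BEV features are concatenated
    on the channel axis and decoded jointly, so the network sees both
    modalities before any detection decision is made."""
    def __init__(self, c_cam=64, c_rad=32, c_out=64, n_classes=4):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv2d(c_cam + c_rad, c_out, 3, padding=1), nn.ReLU(),
            nn.Conv2d(c_out, n_classes, 1),       # per-cell class logits
        )

    def forward(self, bev_cam, bev_rad):          # both (B, C, H, W) on the same grid
        return self.fuse(torch.cat([bev_cam, bev_rad], dim=1))

out = EarlyFusionHead()(torch.randn(1, 64, 128, 128), torch.randn(1, 32, 128, 128))
```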
When to Fuse: Early Fusion or Late Fusion? (continued)
- Late fusion
  - Pros: high technical maturity; high flexibility and modularity; robust to misalignment
  - Cons: relies heavily on the accuracy of each individual sensor; the radar detector alone is too unstable to fuse with the camera; rich intermediate features are discarded
- Early fusion
  - Pros: minimizes information loss; robust to sensor degradation; supports downstream tasks; end-to-end learning framework
  - Cons: needs tightly coupled sensors; sensitive to misalignment
- [Figure: late fusion combines detected objects from image and radar into fused objects; early fusion feeds fused features into object detection and semantic segmentation]
Where to Fuse: Image View or BEV?
- Image (perspective view) vs. radar (bird's eye view)
- Project radar points into the image view (the radar input here can equally be the point cloud): Chadwick et al., "Distant vehicle detection using radar and vision", 2019 (University of Oxford)
- Project images into bird's eye view: Liu et al., "BEVFusion: Multi-Task Multi-Sensor Fusion with Unified Bird's-Eye View Representation", 2022 (MIT)

Fusion in Image View
- Pros
  - Benefits from off-the-shelf visual perception algorithms, e.g. YOLO, Mask R-CNN, CenterNet
- Cons
  - Radar points lack height information, so the projection is unreliable (see the sketch after this list)
  - For AD tasks, the perception results still have to be transformed back into 3D space
  - Radar points in the image view are close to one-dimensional: their y-axis values are not reliable
  - Projecting perception results from the image to BEV causes large distortion
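The projection itself is a standard pinhole transform. Below is a minimal sketch with illustrative intrinsics and extrinsics; the fixed assumed height makes the limitation above explicit.

```python
import numpy as np

def project_radar_to_image(points_xy, K, T_cam_radar, assumed_z=0.5):
    """Project radar (x, y) detections into pixel coordinates. Radar measures
    no height, so a fixed z is assumed, which is exactly why this projection
    is unreliable in practice."""
    n = len(points_xy)
    pts = np.c_[points_xy, np.full(n, assumed_z), np.ones(n)]   # homogeneous, radar frame
    cam = (T_cam_radar @ pts.T)[:3]                             # into the camera frame
    uv = K @ cam
    return (uv[:2] / uv[2]).T                                   # pixel coordinates (u, v)

K = np.array([[1000.0, 0, 640], [0, 1000.0, 360], [0, 0, 1]])   # illustrative intrinsics
# Illustrative extrinsics: radar x (forward) -> camera z, radar y (left) -> camera -x,
# radar z (up) -> camera -y, with the camera mounted 0.5 m above the radar plane.
T = np.array([[0, -1, 0, 0], [0, 0, -1, 0.5], [1, 0, 0, 0], [0, 0, 0, 1.0]])
pix = project_radar_to_image(np.array([[20.0, 1.0], [35.0, -2.0]]), K, T)
```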
Fusion in Bird's Eye View
- Pros
  - BEV is a natural choice for autonomous driving
  - Direct perception in BEV space with high spatial accuracy
  - Benefits from state-of-the-art transformer/attention models
- Cons
  - The image-to-BEV transformation depends heavily on the accuracy of depth estimation
  - Transformer models have a high computation cost
  - Ground-truth labeling in BEV is harder than in 2D images
- [Figure: accurate perception in BEV space; cross-attention for the image-to-BEV transformation]

Thanks for Your Attention! Q&A