Abstract

Recent LSS-based multi-view 3D object detection has made tremendous progress by processing the features in Bird's-Eye-View (BEV) with a convolutional detector. However, standard convolution ignores the radial symmetry of the BEV features and makes the detector harder to optimize. To preserve this inherent property of the BEV features and ease the optimization, we propose an azimuth-equivariant convolution (AeConv) and an azimuth-equivariant anchor. The sampling grid of AeConv is always aligned with the radial direction, so it can learn azimuth-invariant BEV features. The proposed anchor enables the detection head to learn to predict azimuth-irrelevant targets. In addition, we introduce a camera-decoupled virtual depth to unify depth prediction across images captured with different camera intrinsic parameters. The resulting detector is dubbed Azimuth-equivariant Detector (AeDet). Extensive experiments on nuScenes show that AeDet achieves 62.0% NDS, surpassing recent multi-view 3D object detectors such as PETRv2 and BEVDepth by a large margin.
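To make the "radial sampling grid" idea concrete, here is a minimal Python sketch; it is not the official AeDet code. It assumes the ego vehicle sits at the BEV origin, that a cell's azimuth is theta = atan2(y, x), and that AeConv can be emulated by rotating a regular 3x3 grid by theta and passing the difference to a deformable convolution as sampling offsets. The same assumption is applied to the anchor: `to_radial_frame` expresses a box's center offset and yaw in a frame whose x-axis points along the anchor's radial direction, so rotating the whole scene about the ego leaves these targets unchanged. All function names are hypothetical.

```python
import math
from typing import List, Tuple

Offset = Tuple[float, float]


def regular_offsets(k: int = 3) -> List[Offset]:
    """Sampling offsets (dx, dy) of a regular k x k kernel centred at 0."""
    r = k // 2
    return [(float(i), float(j)) for j in range(-r, r + 1) for i in range(-r, r + 1)]


def radial_offsets(x: float, y: float, k: int = 3) -> List[Offset]:
    """Rotate the regular grid by the azimuth of BEV cell (x, y) so that its
    first axis points radially away from the ego vehicle at the origin."""
    theta = math.atan2(y, x)
    c, s = math.cos(theta), math.sin(theta)
    return [(c * dx - s * dy, s * dx + c * dy) for dx, dy in regular_offsets(k)]


def deformable_offsets(x: float, y: float, k: int = 3) -> List[Offset]:
    """Extra offsets a deformable convolution would add to the regular grid
    in order to sample the radially aligned grid instead (assumed emulation)."""
    return [
        (rx - dx, ry - dy)
        for (rx, ry), (dx, dy) in zip(radial_offsets(x, y, k), regular_offsets(k))
    ]


def to_radial_frame(cx: float, cy: float, yaw: float,
                    anchor_x: float, anchor_y: float) -> Tuple[float, float, float]:
    """Express a box centre offset and yaw relative to an anchor whose x-axis
    points along the anchor's radial direction (azimuth theta)."""
    theta = math.atan2(anchor_y, anchor_x)
    dx, dy = cx - anchor_x, cy - anchor_y
    c, s = math.cos(theta), math.sin(theta)
    # Rotate the offset by -theta: R(-theta) = [[c, s], [-s, c]]
    local_dx = c * dx + s * dy
    local_dy = -s * dx + c * dy
    local_yaw = math.remainder(yaw - theta, 2 * math.pi)  # wrap to [-pi, pi]
    return local_dx, local_dy, local_yaw
```

For a cell straight ahead of the ego, such as (10, 0), theta is 0 and every deformable offset is zero; for any other cell the grid turns with the azimuth, which is what keeps the learned features independent of where around the vehicle an object appears.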
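The motivation for a camera-decoupled depth can be illustrated with the pinhole model: an object of height H at depth d appears h = f * H / d pixels tall, so the same image appearance corresponds to a depth proportional to the focal length f. The Python sketch below assumes a virtual depth of the form d_v = d * f_ref / f with a hypothetical reference focal length `focal_ref`; the exact normalization used in AeDet may differ, and the camera values are made up for illustration.

```python
def pixel_height(object_height: float, depth: float, focal: float) -> float:
    """Pinhole projection: image height (px) of an object at the given depth."""
    return focal * object_height / depth


def to_virtual_depth(depth: float, focal: float, focal_ref: float = 500.0) -> float:
    """Map metric depth to the depth a camera with focal length focal_ref
    would need to see the object with the same apparent size (assumed form)."""
    return depth * focal_ref / focal


def from_virtual_depth(virtual_depth: float, focal: float, focal_ref: float = 500.0) -> float:
    """Recover metric depth for a specific camera from a predicted virtual depth."""
    return virtual_depth * focal / focal_ref


# Two hypothetical cameras see a 1.5 m object with the same 75 px height.
h_a = pixel_height(1.5, depth=20.0, focal=1000.0)
h_b = pixel_height(1.5, depth=30.0, focal=1500.0)
print(h_a, h_b)  # 75.0 75.0
print(to_virtual_depth(20.0, 1000.0), to_virtual_depth(30.0, 1500.0))  # 10.0 10.0
```

Because both cameras map the same appearance to the same virtual depth, a depth network can learn a single appearance-to-depth mapping and convert back to metric depth per camera with `from_virtual_depth`.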

Results

Comparison on the nuScenes val set. With ResNet-50 and ResNet-101 backbones, AeDet achieves 50.1% and 56.1% NDS, respectively, outperforming current multi-view 3D object detectors such as BEVFormer (by 4.4%) and BEVDepth (by 2.6%).

Comparison on the nuScenes test set. AeDet improves mAOE and mAVE by 3.4% and 2.8%, respectively, and sets a new state-of-the-art result in multi-view 3D object detection with 53.1% mAP and 62.0% NDS.
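NDS is the standard nuScenes score, NDS = (1/10)[5 * mAP + sum over the five TP errors of (1 - min(1, mTP))], where the TP errors are mATE, mASE, mAOE, mAVE and mAAE. A small Python sketch of that formula shows how an error reduction such as the mAOE gain feeds into NDS: while an error stays below 1, lowering it by delta raises NDS by delta / 10. The metric values in the example are hypothetical, not AeDet's results.

```python
from typing import Dict

TP_METRICS = ("mATE", "mASE", "mAOE", "mAVE", "mAAE")


def nds(mean_ap: float, tp_errors: Dict[str, float]) -> float:
    """nuScenes Detection Score from mAP and the five mean TP errors."""
    missing = [m for m in TP_METRICS if m not in tp_errors]
    if missing:
        raise ValueError(f"missing TP metrics: {missing}")
    # Each error is clipped at 1 and converted to a score in [0, 1].
    tp_score = sum(1.0 - min(1.0, tp_errors[m]) for m in TP_METRICS)
    return (5.0 * mean_ap + tp_score) / 10.0


# Hypothetical metric values, only to illustrate the arithmetic.
errors = {"mATE": 0.5, "mASE": 0.25, "mAOE": 0.4, "mAVE": 0.4, "mAAE": 0.15}
better = dict(errors, mAOE=errors["mAOE"] - 0.034)
print(round(nds(0.45, errors), 4))
print(round(nds(0.45, better) - nds(0.45, errors), 4))  # 0.0034
```

The 5x weight on mAP means detection accuracy still dominates NDS, but the TP terms are where orientation and velocity improvements show up.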

Revolving Test

Robustness to different azimuths is important for autonomous driving systems, since the vehicle may sometimes turn through a large angle. For example, at small roundabouts or road corners, the vehicle turns sharply, which greatly changes the camera orientation. An autonomous vehicle should still detect the surrounding objects accurately in such situations. To verify the robustness of the detector, we propose a revolving test that simulates this scenario: we turn the vehicle 60 degrees clockwise to obtain a revolved view and evaluate the detectors in the revolved view. As the Figure shows, AeDet yields almost the same predictions in the original and revolved views.
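The geometry of the revolving test can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the evaluation code used for AeDet: boxes are reduced to (x, y, yaw) in an ego BEV frame with x forward, y left and counterclockwise-positive angles, and predictions from the two views are assumed to be already matched one-to-one by index. Under this convention, turning the ego vehicle 60 degrees clockwise makes every object appear rotated 60 degrees counterclockwise about the ego origin, so an azimuth-equivariant detector's revolved-view predictions should match the rotated original predictions. The function names are hypothetical.

```python
import math
from typing import List, Tuple

Box = Tuple[float, float, float]  # (x, y, yaw) in the ego BEV frame


def revolve(boxes: List[Box], turn_deg: float) -> List[Box]:
    """Express boxes in the ego frame after the ego turns clockwise by turn_deg.

    With x forward, y left and counterclockwise-positive angles, the scene
    appears rotated counterclockwise by turn_deg about the ego origin.
    """
    phi = math.radians(turn_deg)
    c, s = math.cos(phi), math.sin(phi)
    return [
        (c * x - s * y, s * x + c * y, math.remainder(yaw + phi, 2 * math.pi))
        for x, y, yaw in boxes
    ]


def max_center_gap(pred_revolved: List[Box], pred_original: List[Box],
                   turn_deg: float = 60.0) -> float:
    """Largest centre distance between revolved-view predictions and the
    original-view predictions rotated into the revolved frame."""
    expected = revolve(pred_original, turn_deg)
    return max(
        (math.hypot(a[0] - b[0], a[1] - b[1]) for a, b in zip(pred_revolved, expected)),
        default=0.0,
    )


original = [(10.0, 0.0, 0.0), (5.0, -3.0, 1.2)]
revolved = revolve(original, 60.0)
print(revolved[0])  # approximately (5.0, 8.66, 1.047)
print(max_center_gap(revolved, original))  # 0.0 for a perfectly equivariant detector
```

A real detector will not be perfectly equivariant, so in practice one would report how far this gap (or a detection metric) moves between the two views rather than expect exactly zero.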

Publication

C. Feng, Z. Jie, Y. Zhong, X. Chu, L. Ma
AeDet: Azimuth-invariant Multi-view 3D Object Detection
CVPR 2023
