Abstract
Vision-based environmental perception has demonstrated significant promise for autonomous driving applications. However, the traditional unidirectional feature flow in many perception networks often leads to inadequate information propagation, which hinders the system’s ability to comprehensively perceive complex driving environments. Issues such as visually similar objects, illumination variations, and scale differences aggravate this limitation, introducing noise and reducing the reliability of the perception system. To address these challenges, we propose a novel Attention-Aware Upsampling-Downsampling Network (AUDNet). AUDNet adopts a bidirectional feature fusion structure, incorporating a multi-scale attention upsampling module (MAU) that enhances the fine details in high-level features by guiding the selection of feature information. Additionally, a multi-scale attention downsampling module (MAD) is designed to reinforce the semantic understanding of low-level features by emphasizing relevant spatial details. Extensive experiments on a large-scale, real-world driving dataset demonstrate the superior performance of AUDNet, particularly for multi-task environment perception in complex and dynamic driving scenarios.
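To make the bidirectional fusion idea concrete, the following is a minimal, hypothetical sketch (not the authors' implementation) of attention-gated upsampling and downsampling paths in PyTorch. The module names AttnUpsample and AttnDownsample, the channel sizes, and the gating design are assumptions chosen only to illustrate how a high-level feature can be refined with low-level detail and vice versa.

```python
# Hypothetical sketch, assuming PyTorch; MAU/MAD internals are not specified in the abstract.
import torch
import torch.nn as nn
import torch.nn.functional as F


class AttnUpsample(nn.Module):
    """Upsample a high-level feature and gate it with low-level spatial detail (MAU-like)."""
    def __init__(self, high_ch, low_ch, out_ch):
        super().__init__()
        self.proj = nn.Conv2d(high_ch, out_ch, kernel_size=1)
        self.gate = nn.Sequential(
            nn.Conv2d(low_ch + out_ch, out_ch, kernel_size=3, padding=1),
            nn.Sigmoid(),
        )

    def forward(self, high, low):
        # Project and upsample the coarse feature to the fine resolution.
        up = F.interpolate(self.proj(high), size=low.shape[-2:],
                           mode="bilinear", align_corners=False)
        # Attention map conditioned on both streams selects useful detail.
        attn = self.gate(torch.cat([low, up], dim=1))
        return up * attn + up


class AttnDownsample(nn.Module):
    """Downsample a low-level feature and gate it with high-level semantics (MAD-like)."""
    def __init__(self, low_ch, high_ch, out_ch):
        super().__init__()
        self.down = nn.Conv2d(low_ch, out_ch, kernel_size=3, stride=2, padding=1)
        self.gate = nn.Sequential(
            nn.Conv2d(high_ch + out_ch, out_ch, kernel_size=3, padding=1),
            nn.Sigmoid(),
        )

    def forward(self, low, high):
        # Strided convolution brings the fine feature to the coarse resolution.
        dn = self.down(low)
        dn = F.interpolate(dn, size=high.shape[-2:], mode="bilinear", align_corners=False)
        # Attention map conditioned on semantics emphasizes relevant regions.
        attn = self.gate(torch.cat([high, dn], dim=1))
        return dn * attn + dn


if __name__ == "__main__":
    low = torch.randn(1, 64, 80, 80)    # fine-resolution, detail-rich feature
    high = torch.randn(1, 128, 40, 40)  # coarse, semantics-rich feature
    up = AttnUpsample(128, 64, 64)(high, low)
    dn = AttnDownsample(64, 128, 128)(low, high)
    print(up.shape, dn.shape)           # (1, 64, 80, 80) and (1, 128, 40, 40)
```

The two gated paths together form one plausible reading of the bidirectional fusion described above: each direction uses the other stream as an attention cue rather than a simple skip connection.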