The reliability of autonomous driving systems fundamentally depends on precise environmental perception: algorithmic processing of sensor data must generate comprehensive and accurate environmental information to support downstream path planning, decision-making, and control modules, thereby guaranteeing operational safety. Current perception systems predominantly employ vision- and LiDAR-based approaches. Camera-based methods are inherently constrained to 2-D interpretation and are highly sensitive to light intensity, whereas LiDAR's 3-D point clouds circumvent these limitations through an inherently illumination-invariant spatial representation. Point clouds encapsulate rich geometric, spatial, and radiometric information, enabling robust large-scale environmental perception; however, existing LiDAR-based segmentation methods fail to simultaneously achieve precise local boundary delineation and globally consistent semantic segmentation in autonomous driving scenarios.

To address the challenges of domain adaptation in autonomous driving scenarios and resolve the dual issues of ambiguous object boundaries and global semantic confusion in point cloud segmentation, this work proposes a multi-scale fusion framework for point cloud semantic segmentation built on a hierarchical superpoint mechanism with self-attention and multi-level feature fusion. The hierarchical superpoint partitioning process is implemented within a U-shaped encoder-decoder architecture that interactively combines: (i) a dual partial attention (DPA) module that models both local-global coordinate variations and long-range dependencies of superpoint features; (ii) a boundary profile enhancement (BPE) module that uses multi-scale convolutions to refine edge features; and (iii) a hierarchical feature fusion (HFF) module that integrates fine-grained superpoint characteristics and topological relationships across adjacent layers. The hierarchical partitioning process progressively merges adjacent superpoints with similar features into larger units, systematically reducing the number of superpoints while increasing intra-superpoint semantic purity at each level; this bottom-up aggregation propagates coherent feature representations through the network hierarchy (a simplified sketch of this attention-plus-merge step is given below). In the final stage, a classifier transforms the refined superpoint features into semantic labels, producing the final segmentation output.
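To make the attention-plus-merge step concrete, the following is a minimal PyTorch sketch, not the authors' implementation: superpoints at one partition level are treated as attention tokens (an illustrative stand-in for the long-range branch of DPA), and a given child-to-parent assignment merges fine superpoints into coarser ones by feature averaging. The module and function names, the averaging rule, and the assignment tensor are all assumptions made for illustration.

```python
import torch
import torch.nn as nn

class SuperpointSelfAttention(nn.Module):
    """Self-attention over superpoint tokens at one partition level.

    Illustrative stand-in for the long-range branch of the DPA module:
    each superpoint is one token, so attention models dependencies
    between distant superpoints rather than between raw points.
    """
    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (1, num_superpoints, dim) -- one scene's superpoint features
        h, _ = self.attn(x, x, x)
        return self.norm(x + h)  # residual connection + normalization

def merge_superpoints(feats: torch.Tensor, parent: torch.Tensor) -> torch.Tensor:
    """Bottom-up aggregation: average child superpoint features into parents.

    feats:  (num_children, dim) features at the finer level
    parent: (num_children,) index of each child's parent superpoint,
            assumed precomputed by the similarity-based partitioning step
    """
    num_parents = int(parent.max()) + 1
    merged = torch.zeros(num_parents, feats.size(1))
    counts = torch.zeros(num_parents, 1)
    merged.index_add_(0, parent, feats)
    counts.index_add_(0, parent, torch.ones(feats.size(0), 1))
    return merged / counts.clamp(min=1)  # mean feature per parent superpoint

# Toy usage: 128 fine superpoints with 64-dim features merged into <=32 parents.
fine = SuperpointSelfAttention(dim=64)(torch.randn(1, 128, 64))
parent = torch.randint(0, 32, (128,))                # hypothetical assignment
coarse = merge_superpoints(fine.squeeze(0), parent)  # (num_parents, 64)
```

In the paper the merge is driven by feature similarity between adjacent superpoints; here the assignment is taken as given so the sketch isolates only the aggregation and attention mechanics.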
Visual comparisons in Figures 7, 8, and 9 between the proposed Partition Demarcation Lift-Superpoint Transformer (PDL-SPT) and the baseline Superpoint Transformer (SPT) on the S3DIS, KITTI-360, and DALES datasets, respectively, demonstrated PDL-SPT's superior performance in both boundary delineation and large-scale semantic segmentation. Tables 1-3 presented the detailed per-category improvements of PDL-SPT, which achieved mean intersection over union (mIoU) scores of 67.6% on S3DIS, 61.7% on KITTI-360, and 79.2% on DALES, while Table 4 showed a 0.2 s reduction in inference time versus SPT. On the indoor S3DIS dataset, PDL-SPT improved accuracy in 11 of the 13 categories, with notable gains of 8.9% for columns, 2.2% for sofas, and 3.4% for walls, particularly enhancing recognition of critical structural elements such as load-bearing pillars, crash barriers, and wall markers. Throughout, mIoU follows the standard definition: intersection over union is computed per class and averaged across classes, as sketched below.
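The reported scores follow this standard mIoU definition rather than any custom metric; the sketch below (not the authors' evaluation code) shows the computation over integer per-point label arrays.

```python
import numpy as np

def mean_iou(pred: np.ndarray, gt: np.ndarray, num_classes: int) -> float:
    """Standard mIoU: per-class intersection/union, averaged over classes."""
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union > 0:             # skip classes absent from both arrays
            ious.append(inter / union)
    return float(np.mean(ious))

# e.g. per-point predicted vs. ground-truth labels for a 13-class dataset
pred = np.random.randint(0, 13, size=100_000)
gt = np.random.randint(0, 13, size=100_000)
print(f"mIoU = {mean_iou(pred, gt, num_classes=13):.1%}")
```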
The KITTI-360 dataset targets autonomous driving point cloud segmentation in complex urban road scenarios. On it, PDL-SPT improved accuracy in 8 of the 15 categories, with gains of 1.2% for fences, 8.8% for traffic lights, 1.4% for traffic signs, 1.5% for pedestrians, and 8.2% for motorcycles. The KITTI-360 visualization samples cover common complex urban scenes, including straight streets, street corners, and intersections, whose dense environments particularly challenge local boundary delineation; the experimental results confirmed PDL-SPT's effectiveness under such complex traffic conditions.

The DALES dataset focuses on rural and suburban scenarios, whose sparse scenes particularly challenge large-scale semantic segmentation: rural roads and suburbs typically contain numerous utility poles and power lines, and precise segmentation of these objects enables reliable detection by autonomous vehicles, effectively preventing potential collisions. PDL-SPT improved accuracy in all 8 DALES categories, with gains of 4.2% for trucks, 4.7% for utility poles, and 0.7% for power lines.

In summary, PDL-SPT demonstrated superior performance in both boundary delineation and large-scale semantic segmentation for autonomous driving scenarios. The hierarchical superpoint mechanism-based multi-scale fusion architecture with self-attention effectively addresses boundary ambiguity and global semantic confusion across indoor, urban, rural, and suburban scenes. Experimental results show that PDL-SPT significantly enhances segmentation accuracy for key object categories, including columns and walls indoors, fences and traffic signs in urban scenarios, and dynamic objects, utility facilities, and large vehicles in rural and suburban scenarios. Meanwhile, the reduced inference time gives vehicles more reaction time in complex, dynamic traffic environments.