[Objective] With the rapid development of autonomous driving technology, accurate perception of the surrounding environment has become increasingly critical, and 3D environment perception has emerged as a major research focus in this field. Traditional 3D perception systems rely heavily on expensive sensors such as LiDAR, which offer high accuracy but incur substantial costs and computational demands, limiting their scalability in large autonomous vehicle fleets. Although more recent 3D occupancy prediction methods rely solely on multi-camera inputs, they typically require supervised learning with annotated 3D occupancy data, which is costly to obtain and memory-intensive. To address these challenges, this article proposes Image2Occupancy, an improved 3D-Gaussian-splatting-based occupancy prediction method that uses only 2D surround-view camera images. The method enables effective semantic occupancy prediction of 3D scenes while reducing the need for annotated data and large memory capacity. [Methods] The Image2Occupancy framework consists of two components: (1) 2D-to-3D feature extraction and spatial mapping, and (2) self-supervised 3D occupancy representation learning. In the first component, BEVStereo and Swin Transformer modules extract 2D features from the surround-view input images. These features are then interpolated and mapped to 3D space using the intrinsic and extrinsic parameters of each camera, yielding voxel-level feature representations. This process converts 2D image information into 3D semantic occupancy cues, providing accurate input for subsequent self-supervised learning. In the second component, an improved Gaussian splatting technique projects the 3D voxel features back onto the 2D image plane while preserving semantic information. Gaussian points placed at each voxel center approximate scene occupancy, enabling rendering of semantic and depth maps by computing pixel-level depth and semantic information.
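The 2D-to-3D spatial mapping described above can be sketched as follows. This is a minimal illustration only, not the paper's implementation: the function name is hypothetical, nearest-neighbor sampling stands in for the interpolation the method uses, and a single camera is shown rather than the full surround-view rig.

```python
import numpy as np

def lift_features_to_voxels(feat_2d, K, T_cam_from_world, voxel_centers):
    """Sample 2D backbone features at projected voxel centers.

    feat_2d: (H, W, C) feature map from the image backbone.
    K: (3, 3) camera intrinsic matrix.
    T_cam_from_world: (4, 4) extrinsic transform (world -> camera).
    voxel_centers: (N, 3) voxel-center coordinates in the world frame.
    Returns (N, C) voxel features; voxels behind the camera or
    projecting outside the image receive zeros.
    """
    H, W, C = feat_2d.shape
    N = voxel_centers.shape[0]
    # Homogeneous world coordinates -> camera frame via the extrinsics.
    pts_h = np.concatenate([voxel_centers, np.ones((N, 1))], axis=1)
    pts_cam = (T_cam_from_world @ pts_h.T).T[:, :3]
    valid = pts_cam[:, 2] > 1e-6  # keep only points in front of the camera
    # Perspective projection with the intrinsics.
    uvw = (K @ pts_cam.T).T
    uv = uvw[:, :2] / np.maximum(uvw[:, 2:3], 1e-6)
    u = np.round(uv[:, 0]).astype(int)
    v = np.round(uv[:, 1]).astype(int)
    valid &= (u >= 0) & (u < W) & (v >= 0) & (v < H)
    out = np.zeros((N, C), dtype=feat_2d.dtype)
    out[valid] = feat_2d[v[valid], u[valid]]
    return out
```

In the multi-camera setting, this sampling would be repeated per camera with its own intrinsics and extrinsics, and the per-camera contributions fused into a single voxel feature volume.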
A novel self-supervised learning framework generates pseudo-labels from the model's predicted depth and semantic maps, eliminating the need for real 3D occupancy labels. A specialized loss function, combining cross-entropy and depth losses, minimizes discrepancies between the rendered and ground-truth semantic and depth maps, optimizing prediction accuracy. [Results] Experiments on the NuScenes dataset show that Image2Occupancy achieves an mIoU of 27.87, improving performance by 3.94 percentage points (a 16.5% increase) over existing 2D-input methods and performing comparably to, or better than, several 3D-input methods. Compared with NeRF-based approaches, GPU memory usage is reduced by 54.7% while maintaining the same number of Gaussian points. Ablation studies further validate the effectiveness of the method's core components. [Conclusions] Image2Occupancy reduces hardware dependence and, through self-supervised learning, substantially decreases the need for large annotated datasets, offering a cost-effective and scalable 3D environment perception solution for autonomous driving systems with strong potential for practical deployment.
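The combined cross-entropy and depth loss can be sketched as below. This is an illustrative formulation under stated assumptions, not the paper's exact loss: the L1 depth penalty and the weight `w_depth` are placeholders, since the abstract specifies only that cross-entropy and depth terms are combined.

```python
import numpy as np

def occupancy_loss(sem_logits, sem_labels, depth_pred, depth_gt, w_depth=0.05):
    """Combined rendering loss: per-pixel cross-entropy on the rendered
    semantic map plus an L1 penalty on the rendered depth.

    sem_logits: (P, K) class logits for P rendered pixels.
    sem_labels: (P,) integer pseudo-labels (or ground-truth classes).
    depth_pred, depth_gt: (P,) rendered and reference depths.
    w_depth: weight balancing the two terms (illustrative value).
    """
    # Numerically stable softmax cross-entropy.
    z = sem_logits - sem_logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    ce = -log_probs[np.arange(len(sem_labels)), sem_labels].mean()
    # L1 discrepancy between rendered and reference depth maps.
    l1 = np.abs(depth_pred - depth_gt).mean()
    return ce + w_depth * l1
```

In the self-supervised setting described above, `sem_labels` and `depth_gt` would come from the generated pseudo-labels rather than annotated 3D occupancy.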
[1] ZHOU C Q, GAO C F, LI L F, et al. Design of teaching experiment for dynamic measurement of wheelset tread profile based on 3D point cloud[J]. Experimental Technology and Management, 2024, 41(1): 208–213. (in Chinese)
[2] WANG L P, WANG X C, LIU M J, et al. Task planning of robot arm based on 3D voxel-semantic map of virtual space[J]. Research and Exploration in Laboratory, 2024, 43(3): 57–61, 89. (in Chinese)
[3] LI Q L, GUO H R, CAI X, et al. A framework on dynamic obstacle removal algorithm of laser 3D point cloud[J]. Experimental Technology and Management, 2023, 40(7): 56–62. (in Chinese)
[4] YAN Z Y, DONG W Z, SHAO Y H, et al. RenderWorld: World model with self-supervised 3D label[DB/OL]. (2024-09-17)[2025-07-08]. https://doi.org/10.48550/arXiv.2409.11356.
[5] LIU Z, SUN Z, SUN Z, et al. WildOcc: Traversability analysis based on occupancy prediction in unstructured environment[J]. Journal of Nanjing University of Information Science & Technology, 2025, 17(4): 557–565. (in Chinese)
[6] WANG Z T, YANG B, LIN Z W, et al. Experiment design of robot 3D vision experimental platform coordinate system calibration[J]. Research and Exploration in Laboratory, 2024, 43(8): 52–56. (in Chinese)
[7] ZHANG C B, YAN J C, WEI Y, et al. OccNeRF: Self-supervised multi-camera occupancy prediction with neural radiance fields[DB/OL]. (2023-12-14)[2025-07-08]. https://doi.org/10.48550/arXiv.2312.09243.
[8] LIU Y L, MOU L Z, YU X, et al. Let Occ Flow: Self-supervised 3D occupancy flow prediction[DB/OL]. (2024-07-10)[2025-07-08]. https://doi.org/10.48550/arXiv.2407.07587.
[9] MILDENHALL B, SRINIVASAN P P, TANCIK M, et al. NeRF: Representing scenes as neural radiance fields for view synthesis[J]. Communications of the ACM, 2021, 65(1): 99–106.
[10] HUANG Y H, ZHENG W Z, ZHANG Y P, et al. GaussianFormer: Scene as Gaussians for vision-based 3D semantic occupancy prediction[C]//Computer Vision – ECCV 2024, Part XXVII: 376–393.
[11] GAN W S, LIU F, XU H B, et al. GaussianOcc: Fully self-supervised and efficient 3D occupancy estimation with Gaussian splatting[DB/OL]. (2024-08-21)[2025-07-08]. https://doi.org/10.48550/arXiv.2408.11447.
[12] DUAN Y X, WEI F Y, DAI Q Y, et al. 4D-Rotor Gaussian splatting: Towards efficient novel view synthesis for dynamic scenes[C]//ACM SIGGRAPH 2024 Conference Papers. New York: ACM, 2024: 1–11.
[13] HUANG N, WEI X B, ZHENG W Z, et al. S3Gaussian: Self-supervised street Gaussians for autonomous driving[DB/OL]. (2024-05-30)[2025-07-08]. https://doi.org/10.48550/arXiv.2405.20323.
[14] KERBL B, KOPANAS G, LEIMKÜHLER T, et al. 3D Gaussian splatting for real-time radiance field rendering[J]. ACM Transactions on Graphics, 2023, 42(4): 139.
[15] LI Y, BAO H, GE Z, et al. BEVStereo: Enhancing depth estimation in multi-view 3D object detection with temporal stereo[C]//Proceedings of the 37th AAAI Conference on Artificial Intelligence, Washington, D.C., USA, 2023: 1486–1494.
[16] LIU Z, LIN Y, CAO Y, et al. Swin Transformer: Hierarchical vision transformer using shifted windows[C]//Proceedings of the 18th IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, Canada, 2021: 9992–10002.
[17] CAESAR H, BANKITI V, LANG A H, et al. NuScenes: A multimodal dataset for autonomous driving[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2020), Seattle, USA, 2020: 11618–11628.
[18] LI Z Q, WANG W H, LI H Y, et al. BEVFormer: Learning bird's-eye-view representation from LiDAR-camera via spatiotemporal transformers[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025, 47(3): 2020–2036.
[19] WEI Y, ZHAO L Q, ZHENG W Z, et al. SurroundOcc: Multi-camera 3D occupancy prediction for autonomous driving[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV 2023), Paris, France: 21672–21683.
[20] TIAN X Y, JIANG T, YUN L F, et al. Occ3D: A large-scale 3D occupancy prediction benchmark for autonomous driving[DB/OL]. (2023-04-27)[2025-07-08]. https://doi.org/10.48550/arXiv.2304.14365.
Basic Information:
DOI:10.16791/j.cnki.sjg.2026.01.014
China Classification Code:TP391.41;U463.6
Citation Information:
[1] FENG Tao, LI Qing, SONG Ruizhuo, et al. Image2Occupancy: An environment perception method based on improved 3D Gaussian splatting for occupancy prediction[J]. Experimental Technology and Management, 2026, 43(01): 112–121. DOI: 10.16791/j.cnki.sjg.2026.01.014.
Fund Information:
National Natural Science Foundation of China (62301030); Beijing Higher Education Undergraduate Teaching Reform and Innovation Project (2023-81)