2026, Vol. 43(01): 219–226
Design of an experimental platform for robotic-arm grasping based on large models
Email: 15062129100@163.com;
DOI: 10.16791/j.cnki.sjg.2026.01.027
Abstract:

[Objective] In recent years, with the rapid development of large-model technology, research on robotic-arm manipulation based on large models has advanced rapidly and become a mainstream research direction. However, most existing educational platforms for robotic grasping focus on traditional algorithms and do not integrate large-model-based technologies in depth, which limits students' understanding of cutting-edge intelligent grasping methods. To address this issue, an intelligent experimental teaching platform for robotic-arm grasping was designed and developed based on domestic large models. The platform integrates advanced technologies such as multimodal human–computer interaction, decision-making by large-language-model agents, open-vocabulary visual detection, and large-model fine-tuning. [Methods] The platform has a modular design comprising three core functional modules. First, a human–robot voice interaction channel was constructed using speech recognition and synthesis APIs to support instructions and feedback in Mandarin and English. Second, a locally deployed DeepSeek model, fine-tuned with Prompt-Tuning or QLoRA, was used to develop an agent system that parses user instructions, automatically decomposes them into executable task steps, and generates the corresponding robotic-arm motion-control sequences. Third, the locally deployed Grounding DINO model or the cloud-based multimodal Qwen model was used to recognize and locate target objects of arbitrary categories, returning their position coordinates in the workspace image. In addition, a simulation environment was built for the platform using CoppeliaSim, significantly reducing experimental costs and facilitating offline learning for students.
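The agent pipeline described above (instruction parsing → task decomposition → motion-control sequence) can be sketched as follows. This is a minimal illustration only: the rule-based parser is a hypothetical stand-in for the fine-tuned DeepSeek model, and the primitive names (`move_to`, `close_gripper`, `open_gripper`) are invented for illustration, not the platform's actual control API.

```python
# Toy sketch of the agent step: turn a natural-language grasp instruction
# into an ordered list of executable robot-arm primitives. In the platform
# this decomposition is produced by a fine-tuned LLM; a rule-based stand-in
# is used here to show the expected input/output contract.
from dataclasses import dataclass

@dataclass
class MotionStep:
    primitive: str   # e.g. "move_to", "close_gripper", "open_gripper"
    target: str      # object or location name, resolved later by vision

def decompose_instruction(instruction: str) -> list:
    """Decompose 'pick up the X and put it in the Y' into motion primitives."""
    text = instruction.lower().rstrip(".")
    obj, dest = None, None
    if "pick up the " in text:
        obj = text.split("pick up the ", 1)[1].split(" and ")[0]
    if "put it in the " in text:
        dest = text.split("put it in the ", 1)[1]
    steps = []
    if obj:
        steps += [MotionStep("move_to", obj), MotionStep("close_gripper", obj)]
    if dest:
        steps += [MotionStep("move_to", dest), MotionStep("open_gripper", dest)]
    return steps

# Example: one instruction expands to a four-step control sequence.
plan = decompose_instruction("Pick up the red block and put it in the box.")
```

In the real system the LLM replaces `decompose_instruction`, but the surrounding code can stay the same: the agent's value lies in emitting a structured step list that downstream motion control can execute verbatim.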
[Results] Physical and simulated grasping experiments on the platform showed that the system can perform complex grasping tasks through natural-language interaction. The fine-tuned large language model effectively parsed user instructions, automatically decomposed complex tasks into executable steps, and generated reasonable robotic-arm motion-control sequences. The deployed vision-language models detected objects of various categories and provided accurate position coordinates for grasping. [Conclusions] The developed platform helps students gain a deep understanding of the core knowledge underlying tasks such as robotic visual control and motion planning. It also stimulates their research interest in artificial intelligence and robotics and cultivates their interdisciplinary innovation capabilities.
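The step from detector output to a graspable position can be illustrated with a toy pixel-to-workspace conversion. Everything below is an assumption for illustration: the bounding box stands in for real Grounding DINO or Qwen output, and the affine calibration values (per-axis scale and offset, valid only for a fixed top-down camera) are invented, not measured on the platform.

```python
# Toy post-processing of a vision-language detector result: the detector
# returns a bounding box in image pixels; the arm needs workspace XY.
# Assuming a calibrated, fixed top-down camera, a per-axis affine map
# (scale + offset) converts pixel coordinates to robot-base coordinates.

def bbox_center(bbox):
    """bbox = (x_min, y_min, x_max, y_max) in pixels -> center point."""
    x_min, y_min, x_max, y_max = bbox
    return ((x_min + x_max) / 2, (y_min + y_max) / 2)

def pixel_to_workspace(px, py, scale=(0.001, 0.001), offset=(0.15, -0.20)):
    """Map pixel coordinates to robot-base XY in metres (toy calibration)."""
    return (offset[0] + px * scale[0], offset[1] + py * scale[1])

# Hard-coded box standing in for a detector response to "red block".
cx, cy = bbox_center((300, 180, 340, 220))
grasp_xy = pixel_to_workspace(cx, cy)
```

A real deployment would replace the affine map with a proper camera calibration (intrinsics plus a hand-eye transform), but the interface is the same: detection in, metric grasp target out.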


Basic Information:


China Classification Code:TP241

Citation Information:

ZHU Meiqiang, QIU Bangyan, CAO Yin, et al. Design of an experimental platform for robotic-arm grasping based on large models[J]. Experimental Technology and Management, 2026, 43(01): 219–226. DOI: 10.16791/j.cnki.sjg.2026.01.027.

Fund Information:

Jiangsu Province Qinglan Project Excellent Teaching Team (Intelligent Robotics Course Group Teaching Team) (Su Jiao Shi Han [2022] No. 51); Jiangsu Province Degree and Graduate Education Teaching Reform Project (JGKT25_B038); China University of Mining and Technology–AU Optronics Industry-Education Integration Collaborative Education Base Construction Project (202204); China University of Mining and Technology Curriculum Ideology and Politics Construction Project (2022KCSZ18, 2023KCSZ65); China University of Mining and Technology Teaching Reform Research Project (2025KC21, 2025JSJG054)
