Homepage - Yuqing Xie 谢昱清

Yuqing Xie

4th year Ph.D. Student
Tsinghua University

xieyq22(at)tsinghua.edu.cn

About Me

I am Yuqing Xie (谢昱清), a fourth-year Ph.D. student in the Department of Electronic Engineering, Tsinghua University. I currently work in the NICS-EFC Lab, supervised by Prof. Yu Wang. From 2018 to 2022, I studied at Xinya College, Tsinghua University, where I received a B.E. in Computer Science and Technology and a B.A. in Foreign Languages and Literatures in 2022.

My research interests lie in the field of reinforcement learning (RL) and its applications in robotics. More specifically, I am exploring the following topics:

Robotic control: UAV flight control, robotic arm control;
RL algorithms: reward shaping, curriculum learning, Sim2Real;
Large model post-training: RL for LLMs and VLA models.

Additionally, I contributed to the development of wechaty, OmniDrones, and RLinf.

Education

Tsinghua University

Department of Electronic Engineering,
PhD Student

Sep. 2022 - present

Tsinghua University

Xinya College,
BEng in Computer Science and Technology,
BA in Foreign Languages and Literatures

Sep. 2018 - Jul. 2022

Experience

Infinigence AI

Research Intern

Jun. 2025 - present

Microsoft Research Asia

Research Intern

Jun. 2021 - Dec. 2021

Honors & Awards

Tsinghua Graduate Scholarship (First Class)

2025

Tsinghua Undergraduate Scholarship (top 5%)

2021

Google Women TechMakers Scholar

2020

Selected Publications

Automatic Reward Shaping from Multi-Objective Human Heuristics

Yuqing Xie, Jiayu Chen, Chao Yu, Yu Wang

NeurIPS 2025 Workshop: Second Workshop on Aligning Reinforcement Learning Experimentalists and Theorists

Submitted to International Conference on Learning Representations (ICLR) 2026. Under review.

We propose Multi-Objective Reward Shaping with Exploration (MORSE), a general framework that automatically combines multiple human-designed heuristic rewards into a unified reward function. MORSE formulates the shaping process as a bi-level optimization problem: the inner loop trains a policy to maximize the current shaped reward, while the outer loop updates the reward function to optimize task performance. To encourage exploration in the reward space, MORSE introduces stochasticity into the outer-loop optimization.

[Paper]

Multi-UAV Formation Control with Static and Dynamic Obstacle Avoidance via Reinforcement Learning

Yuqing Xie*, Chao Yu*, Hongzhi Zang*, Feng Gao, Wenhao Tang, Jingyi Huang, Jiayu Chen, Botian Xu, Yi Wu, Yu Wang (* equal contribution)

IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2025

The task requires multiple UAVs to (1) maintain formation, (2) avoid both static and dynamic obstacles, (3) follow velocity commands, and (4) act smoothly. To tackle this four-objective task, we propose a two-stage RL training pipeline.

[Paper] [Website] [Code] [Video]
