Paper Reading: Robot learning 3 • Xiaohei's Blog

Rating Criteria

Mediocre paper

Regular paper

Widely recognized or eye-catching paper

Foundational paper or one I really love

Undoubtedly the best paper

RL-BASED DRONE PAPER READING

Large-scale RL Exploration

arXiv ID 2403.10833

Privileged RL + attention-map networks with graph sparsification for large-scale scenes

A deep RL exploration planner for large-scale environments. It uses privileged learning, giving the critic network access to ground-truth environment information so that it can evaluate policies more accurately. It adds an attention-map network to capture multi-scale spatial dependencies, and applies a graph-sparsification algorithm so that a model trained in small scenes can be applied directly to large-scale environments. Simulation and real-world tests show shorter paths, lower time cost, and faster planning than state-of-the-art planners such as TARE.

Recommendation rating:

Agile Flight from Pixels

Arxiv ID p082.pdf

Asymmetric actor-critic + gate-inner-edge visual abstraction + pixel-to-control without explicit state estimation

This paper presents the first vision-based agile quadrotor flight system that works without explicit state estimation. It trains with an asymmetric actor-critic framework in which the critic receives privileged state information, uses the inner edges of racing gates as a visual abstraction to simplify pixel-level RL training, and relies on a Swin Transformer-based gate detector. Using only the onboard camera video stream, it reaches racing speeds of up to 40 km/h and accelerations of up to 2 g, with zero-shot sim-to-real transfer.

Recommendation rating:
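The asymmetric actor-critic idea in the Agile Flight from Pixels summary can be sketched as follows. This is a minimal illustration, not the paper's implementation: the linear functions stand in for neural networks, and all names, sizes, and hyperparameters are hypothetical. The point is only that the actor and the critic consume different inputs.

```python
def linear(weights, features):
    """Dot product standing in for a neural network (illustrative only)."""
    return sum(w * f for w, f in zip(weights, features))

def actor(actor_w, observation):
    # The policy only sees what the onboard vision pipeline provides.
    return linear(actor_w, observation)

def critic(critic_w, privileged_state):
    # The value function sees the full simulator state, available only in training.
    return linear(critic_w, privileged_state)

def td_update(critic_w, state, next_state, reward, gamma=0.99, lr=0.01):
    """One TD(0) step on the critic; returns the TD error before the update."""
    td_error = reward + gamma * critic(critic_w, next_state) - critic(critic_w, state)
    for i, s in enumerate(state):
        critic_w[i] += lr * td_error * s
    return td_error

# Hypothetical sizes: 2 visual features for the actor, 3 state values for the critic.
actor_w = [0.1, -0.2]
critic_w = [0.0, 0.0, 0.0]
observation = [0.4, 0.7]
state, next_state = [1.0, 0.5, -0.2], [0.0, 0.0, 0.0]

command = actor(actor_w, observation)
first_error = td_update(critic_w, state, next_state, reward=1.0)
second_error = td_update(critic_w, state, next_state, reward=1.0)
```

In a full training loop the TD error would also drive the actor update as an advantage estimate. At deployment only `actor` is needed, which is why the privileged state never has to be estimated on the real drone.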

HOLA-Drone

arXiv ID 2409.08767

Hypergraph open-ended learning + zero-shot coordination for pursuit with unknown teammate drones

This paper models multi-UAV cooperative pursuit as a zero-shot coordination problem and proposes HOLA-Drone, a hypergraph open-ended learning algorithm. It formalizes multi-agent interaction relationships as hypergraphs and adaptively adjusts the learning objectives to strengthen collaboration with unknown teammates. Simulation and real-world experiments show significantly higher capture success rate and efficiency than baselines such as self-play and population-based training, with both homogeneous and heterogeneous unknown teammates.

Recommendation rating:

OmniDrones

arXiv ID 2309.12825

A UAV RL platform with GPU-parallel simulation for multiple models and tasks

OmniDrones is a UAV reinforcement learning platform built on NVIDIA Omniverse Isaac Sim. It uses PyTorch to parallelize dynamics computation on the GPU for better sampling efficiency, supports 4 UAV types, 5 sensors, 4 control modes, and 10+ benchmark tasks, and is compatible with mainstream single- and multi-agent RL algorithms. It provides an efficient, scalable, and customizable simulation and evaluation environment for learning UAV control.

Recommendation rating:

Multi-UAV Pursuit-Evasion

arXiv ID 2409.15866

Evader prediction network + adaptive environment generation + two-stage rewards for pursuit-evasion in unknown environments

For multi-UAV pursuit-evasion in unknown environments, this paper proposes an evader prediction-enhanced network to handle partial observability, adds an adaptive environment generator to improve policy generalization and sample efficiency, and uses two-stage rewards to refine commands into smooth, deployable control. In simulation it achieves a 100% capture rate in unknown scenes and outperforms all baselines. It is also among the first to deploy an RL policy that outputs total thrust and body angular rates zero-shot on real quadrotors to complete pursuit-evasion tasks.

Recommendation rating:
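The Multi-UAV Pursuit-Evasion summary mentions a policy that outputs total thrust and body angular rates. A minimal sketch of how a normalized network output could be mapped to such a command is shown below. The function name, the limits `max_thrust` and `max_rate`, and the assumption that the network emits four values in [-1, 1] are all hypothetical and not taken from the paper.

```python
def action_to_ctbr(action, max_thrust=20.0, max_rate=6.0):
    """Map a normalized 4-D action to (collective thrust, body rates).

    action[0] scales thrust from 0 to max_thrust (assumed newtons);
    action[1:4] scale roll, pitch, yaw rates to +/- max_rate (assumed rad/s).
    """
    # Clip first so an out-of-range network output cannot exceed the limits.
    clipped = [max(-1.0, min(1.0, a)) for a in action]
    thrust = 0.5 * (clipped[0] + 1.0) * max_thrust
    rates = tuple(a * max_rate for a in clipped[1:4])
    return thrust, rates

hover_like = action_to_ctbr([0.0, 0.0, 0.0, 0.0])
saturated = action_to_ctbr([2.0, 1.0, -1.0, 0.5])
```

Commanding thrust and body rates leaves the innermost rate loop to the flight controller, which is one common reason this interface is chosen for sim-to-real transfer.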

PKCC

arXiv ID 2503.02408

Learned kinematics correction + weight-allocation MPC for high-precision aerial manipulator coordination

To address low accuracy and difficult motion allocation in the kinematic control of aerial manipulators, this paper proposes a predictive kinematic cooperative control method. It builds a corrected kinematics model that incorporates closed-loop dynamics and online residual learning to improve modeling accuracy, and designs a weight-allocation MPC to coordinate the motion of the UAV and the robotic arm. Experiments show a 59.6% improvement in trajectory tracking accuracy, enabling tracking of complex trajectories and moving targets.

Recommendation rating:
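To give some intuition for weighted motion allocation between a UAV and its arm, as mentioned in the PKCC summary, here is a one-axis, one-step toy version. It is not the paper's MPC: it simply minimizes `w_uav * v_uav**2 + w_arm * v_arm**2` subject to `v_uav + v_arm == v_desired`, which has the closed-form solution used below. All names and weights are hypothetical.

```python
def allocate_velocity(v_desired, w_uav, w_arm):
    """Split an end-effector velocity between UAV base and arm.

    Solves min w_uav*v_uav^2 + w_arm*v_arm^2  s.t.  v_uav + v_arm = v_desired.
    Setting the Lagrangian gradient to zero gives w_uav*v_uav = w_arm*v_arm,
    so the part with the larger weight (higher motion cost) moves less.
    """
    if w_uav <= 0 or w_arm <= 0:
        raise ValueError("weights must be positive")
    total = w_uav + w_arm
    v_uav = v_desired * w_arm / total
    v_arm = v_desired * w_uav / total
    return v_uav, v_arm

# Penalizing UAV motion three times more than arm motion.
v_uav, v_arm = allocate_velocity(1.0, w_uav=3.0, w_arm=1.0)
```

With these weights the arm carries three quarters of the motion. A real MPC would optimize such a trade-off over a horizon and under joint and actuator constraints.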

Pixel Motion

arXiv ID 2411.04413

Monocular optical flow + central-flow attention + differentiable simulation for high-speed quadrotor obstacle avoidance

For monocular-vision quadrotor obstacle avoidance, this paper proposes an optical-flow-based end-to-end learning framework. The policy is trained via differentiable simulation, with central-flow attention and an action-guided active-perception mechanism introduced to strengthen the extraction of key visual information. Using only a monocular FPV camera, it achieves agile obstacle avoidance at up to 6 m/s in unknown cluttered environments, with zero-shot sim-to-real transfer.

Recommendation rating:

Simple Flight

arXiv ID 2412.11764

PPO + five key design dimensions for zero-shot sim2real transfer control on quadrotors

This paper distills five core elements for zero-shot deployment of RL policies on quadrotors. Building on PPO, its SimpleFlight framework optimizes observation inputs, reward smoothing, system identification combined with selective domain randomization, and large-batch training. On nano drones, it reduces trajectory tracking error by over 50%, enabling stable tracking of both smooth and infeasible trajectories as well as cross-platform generalization.

Recommendation rating:
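The reward smoothing mentioned in the Simple Flight summary can be illustrated with a tracking reward that penalizes the change between consecutive actions. This is a sketch under assumptions: the exponential tracking term, the L2 penalty, and the weight `smooth_weight` are hypothetical choices, not the paper's exact reward.

```python
import math

def tracking_reward(pos_error, action, prev_action, smooth_weight=0.1):
    """Reward = tracking term minus a penalty on action change.

    pos_error: distance to the reference trajectory (assumed meters).
    action, prev_action: consecutive policy outputs of equal length.
    """
    tracking = math.exp(-pos_error)  # 1.0 when exactly on the reference
    action_change = math.sqrt(
        sum((a - b) ** 2 for a, b in zip(action, prev_action))
    )
    return tracking - smooth_weight * action_change

steady = tracking_reward(0.0, [0.5, 0.5], [0.5, 0.5])
jumpy = tracking_reward(0.0, [1.0, 0.0], [0.0, 0.0])
```

Both calls have zero tracking error, so the only difference in reward comes from how abruptly the command changed, which is the behavior a smoothness term is meant to discourage on real hardware.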

Whole-Body Control Gap

arXiv ID 2409.00895

RL + observation distillation + informed resets for pixel-to-action whole-body narrow-gap traversal

An end-to-end method for whole-body narrow-gap traversal by a quadrotor, mapping pixels directly to actions. It first uses model-free RL to learn a policy on low-dimensional point-cloud input, then transfers that policy to high-dimensional pixel input via online observation distillation. It also proposes informed resets to mitigate the difficulty of sparse exploration, enabling body-level traversal of narrow gaps across many geometries and at large attitudes without hand-crafted curricula.

Recommendation rating:

YOPO

Arxiv ID 10528860

Single-stage planning + guided learning + motion primitives for real-time mapless quadrotor trajectories

YOPO is a single-stage learning-based planner for quadrotors that fuses perception, path search, and trajectory optimization into a single network. It covers the planning solution space with motion primitives and uses a guided-learning method that trains the network with numerical gradients. Inference latency is only 1.6 ms. Simulation and real-world experiments demonstrate safe high-speed flight in complex forest environments, outperforming traditional gradient-based optimization methods.

Recommendation rating:
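The YOPO summary mentions training with numerical gradients. As a rough illustration of what a numerical gradient of a trajectory cost looks like, the sketch below applies central finite differences to a toy cost defined on a 2-D primitive end-point. The cost function, obstacle and goal positions, and step size are all hypothetical. In the paper the gradient supervises a network, whereas here it is only evaluated at a single point.

```python
def numerical_gradient(cost_fn, params, eps=1e-4):
    """Central finite-difference gradient of cost_fn at params."""
    grad = []
    for i in range(len(params)):
        up, down = list(params), list(params)
        up[i] += eps
        down[i] -= eps
        grad.append((cost_fn(up) - cost_fn(down)) / (2.0 * eps))
    return grad

def toy_cost(endpoint, goal=(3.0, 0.0), obstacle=(1.0, 0.0)):
    """Squared distance to the goal plus a penalty that grows near an obstacle."""
    goal_term = sum((e - g) ** 2 for e, g in zip(endpoint, goal))
    dist_sq = sum((e - o) ** 2 for e, o in zip(endpoint, obstacle))
    return goal_term + 1.0 / (dist_sq + 0.1)

# End-point slightly above the obstacle: the gradient should push it further up.
grad = numerical_gradient(toy_cost, [1.0, 0.5])
```

Because only cost evaluations are needed, this kind of gradient works even when the cost (for example a lookup in a distance map) has no analytic derivative.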