Mobile robot navigation in dynamic human environments requires policies that balance adaptability to diverse behaviors with compliance with safety constraints. We hypothesize that integrating data-driven rewards with rule-based objectives enables navigation policies to strike a more effective balance between adaptability and safety. To this end, we develop a framework that learns a density-based reward from positive and negative demonstrations and augments it with rule-based objectives for obstacle avoidance and goal reaching. A sampling-based lookahead controller produces supervisory actions that are both safe and adaptive, and these are subsequently distilled into a compact student policy that runs in real time and provides uncertainty estimates. Experiments in synthetic and elevator co-boarding simulations show consistent gains in success rate and time efficiency over baselines, and real-world demonstrations with human participants confirm that the framework is practical to deploy.
PioneeR is a framework that combines data-driven learning from demonstrations with rule-based safety to achieve reliable social navigation.
Stage 1 – Reward Learning: Construct density-based rewards from positive and negative demonstrations, augmented with rule-based safety and goal terms.
Stage 2 – Teacher Policy: Generate safe and adaptive supervisory actions through sampling-based lookahead control using the combined reward.
Stage 3 – Student Policy: Distill the teacher’s guidance into a compact, uncertainty-aware policy for real-time robot navigation.
Panels: (a) Learning-Based (Pos.) · (b) Learning-Based (Pos. & Neg.) · (c) Learning & Rule-Based (Pos. & Neg.)
The synthetic example illustrates how each component contributes to navigation. (a) Positive demonstrations only: the learned reward highlights both feasible corridors but lacks explicit safety awareness. (b) Positive and negative demonstrations: unsafe regions near humans are suppressed, guiding the robot toward safer trajectories. (c) With rule-based specifications added: the final reward yields smooth, goal-directed paths that preserve clearance and achieve reliable navigation.
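As a concrete illustration of how these terms combine, the sketch below composes a data-driven density term with rule-based goal and clearance terms. All function names, weights, and the clearance margin are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def combined_reward(state, log_p_pos, log_p_neg, goal, obstacles,
                    w_density=1.0, w_goal=0.5, w_obs=2.0, d_safe=0.6):
    """Illustrative composition of learned and rule-based reward terms.

    log_p_pos / log_p_neg are callables returning log-densities of the state
    under positive / negative demonstrations (Stage 1).  The weights and the
    clearance margin d_safe are placeholder values.
    """
    # (a)/(b) Data-driven term: favor states resembling positive demos and
    # penalize states resembling negative (unsafe) demos.
    r_density = log_p_pos(state) - log_p_neg(state)

    # (c) Rule-based goal term: reward progress toward the goal position.
    r_goal = -np.linalg.norm(np.asarray(state[:2]) - np.asarray(goal))

    # (c) Rule-based safety term: penalize intruding into a clearance margin
    # around nearby humans or obstacles.
    dists = [np.linalg.norm(np.asarray(state[:2]) - np.asarray(o)) for o in obstacles]
    r_obs = -sum(max(0.0, d_safe - d) for d in dists)

    return w_density * r_density + w_goal * r_goal + w_obs * r_obs
```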
Positive Demonstrations / Negative Demonstrations
We trained the density-based reward using a dataset gathered through keyboard teleoperation that included both positive and negative demonstrations.
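One plausible way to realize such a density-based reward is to fit separate kernel density estimators to the states visited in positive and negative demonstrations, for example with scikit-learn; the kernel choice and bandwidth below are assumed hyperparameters, not the paper's settings.

```python
import numpy as np
from sklearn.neighbors import KernelDensity

def fit_density_reward(pos_states, neg_states, bandwidth=0.3):
    """Fit separate Gaussian KDEs to positive and negative demonstration
    states and return log-density callables for use in the reward."""
    kde_pos = KernelDensity(kernel="gaussian", bandwidth=bandwidth).fit(np.asarray(pos_states))
    kde_neg = KernelDensity(kernel="gaussian", bandwidth=bandwidth).fit(np.asarray(neg_states))

    def log_p_pos(state):
        # score_samples returns log-density per row; evaluate a single state.
        return kde_pos.score_samples(np.asarray(state).reshape(1, -1))[0]

    def log_p_neg(state):
        return kde_neg.score_samples(np.asarray(state).reshape(1, -1))[0]

    return log_p_pos, log_p_neg
```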
The teacher policy was constructed using lookahead control on a reward that combines the density-based term learned from positive and negative demonstrations with rule-based terms for obstacle avoidance and goal seeking.
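A minimal sketch of such a sampling-based lookahead (random-shooting) controller is given below; `step_fn`, the command bounds, horizon, and sample count are illustrative assumptions rather than the exact controller used in the paper.

```python
import numpy as np

def lookahead_teacher_action(state, reward_fn, step_fn, horizon=10,
                             n_samples=256, v_max=0.5, w_max=1.0, rng=None):
    """Random-shooting lookahead: sample action sequences, roll them out with
    a one-step prediction model step_fn(state, action), score each rollout
    with reward_fn, and return the first action of the best sequence."""
    if rng is None:
        rng = np.random.default_rng()
    # Sample candidate sequences of (linear, angular) velocity commands.
    candidates = rng.uniform(low=[-v_max, -w_max], high=[v_max, w_max],
                             size=(n_samples, horizon, 2))

    best_return, best_first_action = -np.inf, np.zeros(2)
    for seq in candidates:
        s, ret = state, 0.0
        for a in seq:                  # roll the sequence out with the model
            s = step_fn(s, a)
            ret += reward_fn(s)
        if ret > best_return:          # keep the best-scoring sequence
            best_return, best_first_action = ret, seq[0]

    # Execute only the first action; replan at the next control step.
    return best_first_action
```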
The teacher policy, which relies on privileged information, was distilled into a student policy, enabling deployment in real-world environments.
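One way to perform this distillation is plain behavior cloning of the teacher's supervisory actions onto the robot's onboard observations; the sketch below trains an ensemble of small students (a common route to uncertainty estimates), with the architecture and hyperparameters assumed rather than taken from the paper.

```python
import torch
import torch.nn as nn

class StudentPolicy(nn.Module):
    """Compact student network; layer sizes are illustrative."""
    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim),
        )

    def forward(self, obs):
        return self.net(obs)

def distill(students, observations, teacher_actions, epochs=100, lr=1e-3):
    """Behavior-clone teacher actions into each student in the ensemble.

    observations are onboard (non-privileged) inputs paired with the
    teacher's supervisory actions, e.g.
        students = [StudentPolicy(obs_dim, act_dim) for _ in range(5)]
    """
    obs = torch.as_tensor(observations, dtype=torch.float32)
    acts = torch.as_tensor(teacher_actions, dtype=torch.float32)
    for student in students:
        opt = torch.optim.Adam(student.parameters(), lr=lr)
        for _ in range(epochs):
            opt.zero_grad()
            loss = nn.functional.mse_loss(student(obs), acts)
            loss.backward()
            opt.step()
    return students
```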
Uncertainty analysis reveals that higher epistemic uncertainty consistently corresponds to risky interactions, enabling the policy to distinguish safe from risky situations.
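Under the ensemble assumption above, epistemic uncertainty can be estimated as disagreement among the student policies; the threshold used to flag a risky interaction below is purely illustrative.

```python
import torch

def epistemic_uncertainty(students, obs, threshold=0.05):
    """Estimate epistemic uncertainty as variance across the student ensemble
    and flag the situation as risky when it exceeds an assumed threshold."""
    obs_t = torch.as_tensor(obs, dtype=torch.float32).unsqueeze(0)
    with torch.no_grad():
        preds = torch.stack([s(obs_t) for s in students])  # (n_students, 1, act_dim)
    mean_action = preds.mean(dim=0).squeeze(0)     # ensemble-averaged command
    uncertainty = preds.var(dim=0).mean().item()   # scalar disagreement score
    risky = uncertainty > threshold                # flag risky interactions
    return mean_action, uncertainty, risky
```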