RL-AV Smoothing: Boosting Autonomous Vehicle Planning Through Reinforcement Learning for Smoother Trajectories

Claude Directory December 29, 2025

Discover RL-AV Smoothing, a novel RL-based post-processing technique that refines jerky AV trajectories into smooth, safe paths while preserving planning intent. Achieve up to 66% jerk reduction on nuPlan benchmarks.

## The Challenge of Jerky Trajectories in Autonomous Driving

Autonomous vehicles (AVs) rely on sophisticated trajectory planners to navigate complex urban environments safely and efficiently. However, a persistent issue plagues these systems: the output trajectories often exhibit abrupt changes in acceleration and velocity, resulting in high jerk. Jerk, the third derivative of position, measures how abruptly a vehicle's acceleration changes, and excessive jerk can lead to passenger discomfort, mechanical wear, and even safety risks during real-world deployment.

### Why Do Current Planners Struggle?

Most AV planners, such as those based on optimization or sampling methods, prioritize collision avoidance, goal-reaching, and dynamic constraints. Yet they frequently produce piecewise-linear or low-order polynomial paths that introduce discontinuities at waypoints. Perception uncertainties, such as noisy LiDAR detections or delayed sensor fusion, exacerbate this by forcing planners to make sharp corrections.

For instance, consider a scenario where an AV must yield to a pedestrian crossing unexpectedly. A typical planner might generate a trajectory with sudden braking followed by an evasive swerve, spiking jerk beyond what passengers tolerate comfortably (typically under 2 m/s³). Real-world datasets like nuScenes and nuPlan reveal that even top-performing planners score poorly on smoothness metrics: average jerk often exceeds 10 m/s³, far from the smooth 1-2 m/s³ seen in human driving.

Traditional post-processing fixes, such as spline fitting or low-pass filters, fall short. Splines can overshoot and cause collisions, while filters introduce latency that misaligns the trajectory with dynamic obstacles. This gap highlights the need for a method that respects the original plan's intent (safety and progress) while enforcing smoothness.
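To make the jerk metric concrete, here is a minimal sketch that estimates jerk from uniformly sampled positions by differencing three times. The 10 Hz sampling interval, the two 1D position profiles, and all function names are illustrative assumptions, not part of RL-AV Smoothing or any dataset toolkit.

```python
def finite_difference(samples, dt):
    """Forward difference of uniformly sampled values (approximate derivative)."""
    return [(b - a) / dt for a, b in zip(samples, samples[1:])]


def jerk_from_positions(positions, dt):
    """Differentiate position three times: velocity, acceleration, then jerk."""
    velocity = finite_difference(positions, dt)
    acceleration = finite_difference(velocity, dt)
    return finite_difference(acceleration, dt)


def peak_abs(values):
    """Largest magnitude in a sequence."""
    return max(abs(v) for v in values)


dt = 0.1  # assumed sampling interval in seconds (10 Hz)
steps = range(21)

# Hypothetical smooth profile: constant 1 m/s^2 acceleration from rest.
smooth_positions = [0.5 * (i * dt) ** 2 for i in steps]

# Hypothetical abrupt profile: cruise at 2 m/s for 1 s, then stop within one sample.
abrupt_positions = [2.0 * dt * min(i, 10) for i in steps]

peak_smooth = peak_abs(jerk_from_positions(smooth_positions, dt))
peak_abrupt = peak_abs(jerk_from_positions(abrupt_positions, dt))

print(f"peak |jerk|, smooth profile: {peak_smooth:.3f} m/s^3")
print(f"peak |jerk|, abrupt profile: {peak_abrupt:.3f} m/s^3")
```

The constant-acceleration profile has essentially zero jerk, while the one-sample stop produces spikes of about 200 m/s³ at this sampling rate. That is the kind of waypoint discontinuity the section describes, exaggerated here for clarity.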
## RL-AV Smoothing: A Reinforcement Learning Post-Processor

To address these limitations, researchers at UC Berkeley introduced **RL-AV Smoothing**, a reinforcement learning (RL) agent designed as a lightweight post-processing step. Deployed after the primary planner, it ingests the raw planned trajectory and refines it into a smoother version without retraining the entire planning stack.

### Core Problem Formulation

Given a planned trajectory $\tau_p = (p_t, v_t, a_t)_{t=1}^T$ (positions, velocities, accelerations over horizon $T$), the RL policy $\pi_\theta$ outputs a smoothed trajectory $\tau_s = (p'_t, v'_t, a'_t)_{t=1}^T$. The policy must satisfy:

- **Smoothness**: Minimize jerk $j_t = \frac{da_t}{dt}$ via $L_2$ penalties on jerk and snap (fourth derivative).
- **Feasibility**: Enforce kinematic limits (max speed, accel) and dynamic constraints (e.g., steering angle).
- **Safety**: Maintain collision-free paths w.r.t. map and agents.
- **Intent Preservation**: Stay close to $\tau_p$ in Fréchet distance and ensure similar goal progress.

The RL agent is trained in a model predictive control (MPC) style, bootstrapped with imitation learning and then fine-tuned with RL.

### Training Pipeline

1. **Dataset Preparation**: Leverage large-scale driving datasets like [nuScenes](https://www.nuscenes.org/) and [nuPlan](https://www.nuplan.org/nuplan). Sample diverse scenarios: unprotected turns, merges, roundabouts. For each expert trajectory (from logged human driving), perturb it to simulate planner outputs, creating pairs of a noisy plan $\tau_p$ and its smooth expert counterpart.

2. **Imitation Learning Phase**: Train a behavior cloning (BC) policy to map noisy plans to smooth experts. The loss combines trajectory MSE and smoothness terms:

   $$
   \mathcal{L}_{BC} = \sum_t \lVert \tau_s - \tau_e \rVert^2 + \lambda_j \sum_t \lVert j_t \rVert^2
   $$

   where $\tau_e$ is the expert trajectory.

3. **RL Fine-Tuning**: Use soft actor-critic (SAC) with a reward function balancing multiple objectives:

   $$
   r_t = w_s \cdot \left(-\sum \lVert j_t \rVert^2\right) + w_c \cdot (-\text{collision cost}) + w_g \cdot \text{goal progress} + w_f \cdot (-\text{feasibility violations})
   $$

   Weights $w_*$ are tuned via hyperparameter search. The policy network is a lightweight MLP (3 layers, 256 units) conditioned on ego-state, map, agents, and $\tau_p$.

Training converges in ~10M steps on a single A100 GPU, thanks to the fixed-horizon formulation and dataset scale.

The implementation is open-sourced at the [RL-AV Smoothing GitHub repository](https://github.com/rlav-smoothing/rlav-smoothing), including training scripts, pretrained models, and evaluation tools. Users can integrate it via:

```python
import rlav_smoothing

# Load the pretrained smoother from a checkpoint file.
smoother = rlav_smoothing.Smoother(checkpoint='nuplan_pretrained.pth')

# tau_planned, ego_state, and scene_context are supplied by the upstream
# planner and perception stack.
tau_smooth = smoother.smooth(tau_planned, ego_state, scene_context)
```

## Experimental Validation and Outcomes

### Benchmark Results on nuPlan

Evaluated on nuPlan's full challenge set (10k scenarios), RL-AV Smoothing consistently outperforms baselines:

| Method             | Jerk ↓ (m/s³) | Accel ↓ (m/s²) | Collision Rate ↓ | Success Rate ↑ |
|--------------------|---------------|----------------|------------------|----------------|
| Raw Planner (Ours) | 12.5          | 3.2            | 2.1%             | 78%            |
| Spline Fit         | 8.1           | 2.5            | 5.3%             | 72%            |
| Low-Pass Filter    | 7.4           | 2.1            | 3.8%             | 75%            |
| RL-AV (Ours)       | **4.2**       | **1.6**        | **1.2%**         | **82%**        |

This is a 66% jerk reduction relative to the raw planner (12.5 to 4.2 m/s³), and the success rate rises from 78% to 82% rather than dropping. Ablations confirm the edge of RL fine-tuning over pure BC, which reaches a jerk of 5.8 m/s³.

### Real-World Transfer

Sim-to-real experiments on Berkeley's vehicle fleet (using a CARLA simulator bridge) show robust transfer. In closed-course tests mimicking nuPlan scenarios, smoothed trajectories reduced measured jerk by 45% according to IMU data, with no interventions needed.
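As a quick arithmetic check on the nuPlan benchmark table in this section, the sketch below recomputes each method's jerk reduction relative to the raw planner. The jerk values are copied from the table; the dictionary layout and the helper name are illustrative assumptions.

```python
# Jerk values (m/s^3) copied from the nuPlan benchmark table.
jerk = {
    "Raw Planner": 12.5,
    "Spline Fit": 8.1,
    "Low-Pass Filter": 7.4,
    "RL-AV": 4.2,
}


def relative_reduction(baseline: float, value: float) -> float:
    """Fractional reduction of `value` relative to `baseline`."""
    return (baseline - value) / baseline


# Reduction of every method measured against the raw planner's jerk.
reductions = {
    name: relative_reduction(jerk["Raw Planner"], value)
    for name, value in jerk.items()
}

for name, reduction in reductions.items():
    print(f"{name:16s} {100 * reduction:5.1f}% lower jerk than the raw planner")
```

For the RL-AV row this gives (12.5 - 4.2) / 12.5, or about 66%, which matches the headline simulation figure. The 45% closed-course figure is a separate IMU measurement and cannot be derived from the table.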
Videos demonstrating before/after trajectories are available in the [GitHub repo](https://github.com/rlav-smoothing/rlav-smoothing), showcasing smoother lane changes and yielding behaviors.

### Scalability and Deployment Insights

- **Latency**: Inference <10 ms on Jetson Orin, suitable for 10 Hz planning cycles.
- **Modularity**: Planner-agnostic; plug-and-play with MPPI, Lattice, or neural planners.
- **Extensions**: Future work explores online adaptation via meta-RL and multi-modal predictions.

## Practical Applications and Takeaways

For AV developers, RL-AV Smoothing offers an actionable upgrade path:

- **Ride-Hailing Fleets**: Prioritize comfort for 50%+ jerk cuts.
- **Robotaxis**: Enhance safety margins in dense traffic.
- **Simulation**: Accelerate data generation with realistic smooth experts.

To get started:

1. Clone [the repo](https://github.com/rlav-smoothing/rlav-smoothing).
2. Install deps: `pip install -r requirements.txt`.
3. Run evals: `python eval_nuplan.py --model pretrained`.
4. Fine-tune on custom data for domain adaptation.

This approach demonstrates RL's potential beyond end-to-end control, shining in targeted interventions. By decoupling smoothness from core planning, it paves the way for more human-like AV motion.

---

<div style="text-align: center; margin-top: 2rem;"> <a href="https://bair.berkeley.edu/blog/2025/03/25/rl-av-smoothing/" target="_blank" rel="noopener noreferrer" class="view-full-resource-btn" style="display: inline-block; background-color: #f97316; color: white; padding: 12px 24px; border-radius: 8px; text-decoration: none; font-weight: 600; transition: background-color 0.2s;">View Full Resource</a> </div>
