## The Challenge of Jerky Trajectories in Autonomous Driving
Autonomous vehicles (AVs) rely on sophisticated trajectory planners to navigate complex urban environments safely and efficiently. However, a persistent issue plagues these systems: the output trajectories often exhibit abrupt changes in acceleration and velocity, resulting in high jerk metrics. Jerk, the third derivative of position (equivalently, the rate of change of acceleration), measures how abruptly a vehicle's acceleration changes, and excessive jerk can lead to passenger discomfort, mechanical wear, and even safety risks during real-world deployment.
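To make the metric concrete, the sketch below estimates jerk from sampled positions by applying a forward finite difference three times. It is a minimal illustration rather than code from the RL-AV Smoothing repository; the function names and the 10 Hz sampling interval are assumptions chosen for the example.

```python
def finite_difference(values, dt):
    """Forward difference: approximates the time derivative of a sampled signal."""
    return [(b - a) / dt for a, b in zip(values, values[1:])]


def jerk_from_positions(positions, dt):
    """Differentiate position three times: velocity, then acceleration, then jerk."""
    velocity = finite_difference(positions, dt)
    acceleration = finite_difference(velocity, dt)
    return finite_difference(acceleration, dt)


dt = 0.1  # assumed sampling interval (10 Hz)

# Constant acceleration of 2 m/s^2: p(t) = t^2, so jerk should be (numerically) zero.
smooth_positions = [(k * dt) ** 2 for k in range(6)]
smooth_jerk = jerk_from_positions(smooth_positions, dt)

# Steadily increasing acceleration: p(t) = t^3, so jerk should be 6 m/s^3.
ramp_positions = [(k * dt) ** 3 for k in range(6)]
ramp_jerk = jerk_from_positions(ramp_positions, dt)
```

Each differentiation shortens the sequence by one sample, so six positions yield three jerk estimates. On real planner output, the same finite differences amplify noise, which is one reason jerk metrics are sensitive to small waypoint discontinuities.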
### Why Do Current Planners Struggle?
Most AV planners, such as those based on optimization or sampling methods, prioritize collision avoidance, goal-reaching, and dynamic constraints. Yet, they frequently produce piecewise-linear or low-order polynomial paths that introduce discontinuities at waypoints. Perception uncertainties, such as noisy LiDAR detections or delayed sensor fusion, exacerbate this by forcing planners to make sharp corrections.
For instance, consider a scenario where an AV must yield to a pedestrian crossing unexpectedly. A typical planner might generate a trajectory with sudden braking followed by an evasive swerve, spiking jerk values beyond human tolerance (typically under 2 m/s³). Real-world datasets like nuScenes and nuPlan reveal that even top-performing planners score poorly on smoothness metrics: average jerk often exceeds 10 m/s³, far from the smooth 1-2 m/s³ seen in human driving.
Traditional post-processing fixes, such as spline fitting or low-pass filters, fall short. Splines can overshoot the planned path and cause collisions, while filters introduce latency, so the filtered trajectory lags behind dynamic obstacles. This gap highlights the need for a method that respects the original plan's intent (safety and progress) while enforcing smoothness.
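The latency problem with filtering can be seen in a small worked example. The sketch below applies a first-order exponential moving average, one common low-pass filter, to a planned speed profile that commands a sudden stop. The filter choice, the `alpha` value, and the speed profile are assumptions for illustration, not the baseline configuration from the evaluation.

```python
def exponential_moving_average(samples, alpha):
    """First-order low-pass filter: y[k] = alpha * x[k] + (1 - alpha) * y[k-1]."""
    filtered = [samples[0]]
    for x in samples[1:]:
        filtered.append(alpha * x + (1.0 - alpha) * filtered[-1])
    return filtered


# Planned speed profile: cruise at 10 m/s, then command a stop from step 5 onward.
planned_speed = [10.0] * 5 + [0.0] * 10
filtered_speed = exponential_moving_average(planned_speed, alpha=0.2)

# Steps after the stop command until the filtered speed first falls below 5 m/s.
stop_index = planned_speed.index(0.0)
lag_steps = next(
    k for k, v in enumerate(filtered_speed[stop_index:]) if v < 5.0
)
```

With `alpha=0.2` the filtered speed decays as 8.0, 6.4, 5.12, 4.096 m/s, so it takes three extra steps (0.3 s at 10 Hz) just to shed half the speed the planner wanted gone immediately. The trajectory is smoother, but it no longer brakes when the plan says to.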
## RL-AV Smoothing: A Reinforcement Learning Post-Processor
To address these limitations, researchers at UC Berkeley introduced **RL-AV Smoothing**, a reinforcement learning (RL) agent designed as a lightweight post-processing step. Deployed after the primary planner, it ingests the raw planned trajectory and refines it into a smoother version without retraining the entire planning stack.
### Core Problem Formulation
Given a planned trajectory $\tau_p = (p_t, v_t, a_t)_{t=1}^T$ (positions, velocities, accelerations over horizon $T$), the RL policy $\pi_\theta$ outputs a smoothed trajectory $\tau_s = (p'_t, v'_t, a'_t)_{t=1}^T$. The policy must satisfy:
- **Smoothness**: Minimize jerk $j_t = \frac{da_t}{dt}$ via $L_2$ penalties on jerk and snap (fourth derivative).
- **Feasibility**: Enforce kinematic limits (max speed, accel) and dynamic constraints (e.g., steering angle).
- **Safety**: Maintain collision-free paths w.r.t. map and agents.
- **Intent Preservation**: Stay close to $\tau_p$ in Fréchet distance and ensure similar goal progress.
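The intent-preservation criterion can be illustrated with the discrete Fréchet distance, a standard dynamic-programming approximation of the continuous Fréchet distance between two sampled paths. The sketch below is an assumption about how such a check might look; the source does not say which variant is used, and the function name and example paths are hypothetical.

```python
import math


def discrete_frechet(path_a, path_b):
    """Discrete Fréchet distance between two polylines given as lists of (x, y)."""
    n, m = len(path_a), len(path_b)
    # coupling[i][j]: Fréchet distance between path_a[:i+1] and path_b[:j+1].
    coupling = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = math.dist(path_a[i], path_b[j])
            if i == 0 and j == 0:
                coupling[i][j] = d
            elif i == 0:
                coupling[i][j] = max(coupling[0][j - 1], d)
            elif j == 0:
                coupling[i][j] = max(coupling[i - 1][0], d)
            else:
                best_previous = min(
                    coupling[i - 1][j], coupling[i - 1][j - 1], coupling[i][j - 1]
                )
                coupling[i][j] = max(best_previous, d)
    return coupling[-1][-1]


# Hypothetical example: the smoothed path runs parallel to the plan, 0.5 m to the side.
planned_path = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
smoothed_path = [(0.0, 0.5), (1.0, 0.5), (2.0, 0.5)]
deviation = discrete_frechet(planned_path, smoothed_path)
```

Unlike a pointwise error at matching timestamps, the Fréchet distance compares the shapes of the two paths while respecting their ordering, so a smoothed trajectory that follows the same route with slightly different timing is not penalized heavily.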
The RL agent is trained in a model predictive control (MPC) style, using imitation learning bootstrapping followed by RL fine-tuning.
### Training Pipeline
1. **Dataset Preparation**: Leverage large-scale driving datasets like [nuScenes](https://www.nuscenes.org/) and [nuPlan](https://www.nuplan.org/nuplan). Sample diverse scenarios: unprotected turns, merges, roundabouts. For each expert trajectory (from logged human driving), perturb it to simulate planner outputs, creating pairs of a noisy plan $\tau_p$ and its smooth expert trajectory.
2. **Imitation Learning Phase**: Train a behavior cloning (BC) policy to map noisy plans to smooth experts. Loss combines trajectory MSE and smoothness terms:
```
\mathcal{L}_{BC} = \sum_t ||\tau_s - \tau_e||^2 + \lambda_j \sum_t ||j_t||^2
```
where $\tau_e$ is the expert trajectory.
3. **RL Fine-Tuning**: Use soft actor-critic (SAC) with a reward function balancing multiple objectives:
```
r_t = w_s \cdot (-\sum ||j_t||^2) + w_c \cdot (-\text{collision cost}) + w_g \cdot \text{goal progress} + w_f \cdot (-\text{feasibility violations})
```
Weights $w_*$ are tuned via hyperparameter search. The policy network is a lightweight MLP (3 layers, 256 units) conditioned on ego-state, map, agents, and $\tau_p$.
Training converges in ~10M steps on a single A100 GPU, thanks to the fixed-horizon formulation and dataset scale.
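The two objectives in the pipeline above can be written out directly. The following is a minimal pure-Python sketch of the behavior cloning loss and the reward, following the formulas term by term. The function names, the dictionary layout for the weights, and all numeric values are placeholders for illustration; they are not taken from the released code or the tuned hyperparameters.

```python
def squared_norm(vec):
    """Squared L2 norm of a vector given as an iterable of floats."""
    return sum(x * x for x in vec)


def bc_loss(tau_s, tau_e, jerk, lambda_j):
    """Trajectory MSE term plus lambda_j times the summed squared jerk."""
    tracking = sum(
        squared_norm([s - e for s, e in zip(state_s, state_e)])
        for state_s, state_e in zip(tau_s, tau_e)
    )
    smoothness = sum(squared_norm(j) for j in jerk)
    return tracking + lambda_j * smoothness


def reward(jerk, collision_cost, goal_progress, feasibility_violations, weights):
    """Weighted sum: penalize jerk, collisions, and violations; reward progress."""
    return (
        weights["w_s"] * -sum(squared_norm(j) for j in jerk)
        + weights["w_c"] * -collision_cost
        + weights["w_g"] * goal_progress
        + weights["w_f"] * -feasibility_violations
    )


# Toy two-step trajectories with (x, y) states; all values are placeholders.
tau_s = [(0.0, 0.0), (1.0, 0.0)]   # smoothed output
tau_e = [(0.0, 0.0), (1.0, 1.0)]   # expert trajectory
jerk = [(2.0, 0.0)]                # jerk vectors of the smoothed output

loss = bc_loss(tau_s, tau_e, jerk, lambda_j=0.5)
placeholder_weights = {"w_s": 0.1, "w_c": 10.0, "w_g": 1.0, "w_f": 5.0}
r = reward(jerk, collision_cost=0.0, goal_progress=2.0,
           feasibility_violations=0.0, weights=placeholder_weights)
```

In this toy case the tracking error is 1.0 and the squared jerk is 4.0, giving a loss of 1.0 + 0.5 × 4.0 = 3.0 and a reward of −0.4 + 2.0 = 1.6. The relative size of the weights decides how much jerk the agent will accept to gain progress or avoid a collision, which is why they are tuned by search rather than fixed by hand.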
The implementation is open-sourced at the [RL-AV Smoothing GitHub repository](https://github.com/rlav-smoothing/rlav-smoothing), including training scripts, pretrained models, and evaluation tools. Users can integrate it via:
```python
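import rlav_smoothing

# Load the pretrained smoother, then refine the upstream planner's raw trajectory.
# tau_planned, ego_state, and scene_context are supplied by the caller's planning stack.
smoother = rlav_smoothing.Smoother(checkpoint='nuplan_pretrained.pth')
tau_smooth = smoother.smooth(tau_planned, ego_state, scene_context)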
```
## Experimental Validation and Outcomes
### Benchmark Results on nuPlan
Evaluated on nuPlan's full challenge set (10k scenarios), RL-AV Smoothing consistently outperforms baselines:
| Method | Jerk ↓ (m/s³) | Accel ↓ (m/s²) | Collision Rate ↓ | Success Rate ↑ |
|---------------------|---------------|----------------|------------------|---------------|
| Raw Planner (Ours) | 12.5 | 3.2 | 2.1% | 78% |
| Spline Fit | 8.1 | 2.5 | 5.3% | 72% |
| Low-Pass Filter | 7.4 | 2.1 | 3.8% | 75% |
| RL-AV (Ours) | **4.2** | **1.6** | **1.2%** | **82%** |
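RL-AV Smoothing cuts jerk by about 66% relative to the raw planner (12.5 to 4.2 m/s³). Success rate does not drop; it rises from 78% to 82%, and collision rate falls from 2.1% to 1.2%. Ablations confirm RL's edge over pure BC, which reaches a jerk of 5.8 m/s³.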
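The jerk reductions follow directly from the jerk column of the table and the BC ablation figure. The short calculation below reproduces them; only the helper name `relative_reduction` is introduced here.

```python
def relative_reduction(baseline, improved):
    """Fractional reduction of a metric relative to its baseline value."""
    return (baseline - improved) / baseline


raw_planner_jerk = 12.5  # m/s^3, from the table above

jerk_by_method = {
    "Spline Fit": 8.1,
    "Low-Pass Filter": 7.4,
    "Pure BC": 5.8,
    "RL-AV": 4.2,
}

reductions = {
    method: relative_reduction(raw_planner_jerk, jerk)
    for method, jerk in jerk_by_method.items()
}
# RL-AV: (12.5 - 4.2) / 12.5 = 0.664, i.e. about 66%.
```

By the same arithmetic, spline fitting removes about 35% of the raw planner's jerk, low-pass filtering about 41%, and pure BC about 54%, so RL fine-tuning accounts for roughly the last 13 percentage points.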
### Real-World Transfer
Sim-to-real experiments on Berkeley's vehicle fleet (using a CARLA simulator bridge) show robust transfer. In closed-course tests mimicking nuPlan scenarios, smoothed trajectories reduced IMU-measured jerk by 45%, with no interventions needed.
Videos demonstrating before/after trajectories are available in the [GitHub repo](https://github.com/rlav-smoothing/rlav-smoothing), showcasing smoother lane changes and yielding behaviors.
### Scalability and Deployment Insights
- **Latency**: Inference takes under 10 ms on Jetson Orin, a small share of the 100 ms budget of a 10 Hz planning cycle.
- **Modularity**: Planner-agnostic; plug-and-play with MPPI, Lattice, or neural planners.
- **Extensions**: Future work explores online adaptation via meta-RL and multi-modal predictions.
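Before deploying a post-processor in a planning loop, it is worth checking its latency against the cycle budget on the target hardware. The sketch below times an arbitrary smoothing callable and compares the worst observed latency with a chosen fraction of the planning period. The helper `fits_cycle_budget`, the stand-in `identity_smoother`, and the 10% budget fraction are assumptions for illustration and are not part of the RL-AV Smoothing API.

```python
import time


def fits_cycle_budget(smooth_fn, args, planning_rate_hz, budget_fraction, repeats=20):
    """Return (worst latency in seconds, whether it fits the allotted cycle share)."""
    cycle_s = 1.0 / planning_rate_hz
    worst_s = 0.0
    for _ in range(repeats):
        start = time.perf_counter()
        smooth_fn(*args)
        worst_s = max(worst_s, time.perf_counter() - start)
    return worst_s, worst_s <= budget_fraction * cycle_s


def identity_smoother(trajectory):
    """Stand-in for a real smoother: returns a copy of the input trajectory."""
    return list(trajectory)


# At 10 Hz the cycle is 100 ms; a 10% share corresponds to the 10 ms figure above.
worst_latency_s, within_budget = fits_cycle_budget(
    identity_smoother, ([0.0] * 80,), planning_rate_hz=10.0, budget_fraction=0.1
)
```

Tracking the worst case rather than the mean matters here: a smoother that is fast on average but occasionally overruns the cycle would still delay the control command.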
## Practical Applications and Takeaways
For AV developers, RL-AV Smoothing offers an actionable upgrade path:
- **Ride-Hailing Fleets**: Improve passenger comfort, with reported jerk reductions of 45% in closed-course tests and 66% on nuPlan.
- **Robotaxis**: Enhance safety margins in dense traffic.
- **Simulation**: Accelerate data generation with realistic smooth experts.
To get started:
1. Clone [the repo](https://github.com/rlav-smoothing/rlav-smoothing).
2. Install deps: `pip install -r requirements.txt`.
3. Run evals: `python eval_nuplan.py --model pretrained`.
4. Fine-tune on custom data for domain adaptation.
This approach demonstrates RL's potential beyond end-to-end control: it can also be effective in targeted interventions such as trajectory post-processing. By decoupling smoothness from core planning, it paves the way for more human-like AV motion.