HandelBot

Abstract

Mastering dexterous manipulation with multi-fingered hands has been a grand challenge in robotics for decades. Despite its potential, the difficulty of collecting high-quality data remains a primary bottleneck for high-precision tasks. While reinforcement learning and simulation-to-real-world transfer offer a promising alternative, the transferred policies often fail for tasks demanding millimeter-scale precision, such as bimanual piano playing. In this work, we introduce HandelBot (inspired by the composer Handel), a framework that combines a simulation policy with rapid real-world adaptation through a two-stage pipeline. Starting from a simulation-trained policy, we first apply a structured refinement stage that corrects spatial alignment by adjusting lateral finger joints based on physical rollouts. Next, we use residual reinforcement learning to autonomously learn fine-grained corrective actions. Through extensive hardware experiments across five recognized songs, we demonstrate that HandelBot can successfully perform precise bimanual piano playing. Our system outperforms direct simulation deployment by a factor of 1.8 and requires only 30 minutes of physical interaction data.

Method

Piano-playing policies trained in simulation achieve strong performance in the simulated environment but only limited success on real hardware.

RL in Sim: To bridge the sim-to-real gap, HandelBot starts from a coarse base policy, π_sim, trained in simulation, from which we extract an open-loop rollout, τ_sim.
Policy Refinement: Next, we refine τ_sim, yielding τ*_sim. Using real-world rollouts, we iteratively update the lateral joints of the fingers, moving each finger horizontally toward the key it is intended to press.
Residual RL: Finally, we perform residual RL atop τ*_sim, using the keyboard's MIDI output as the reward signal. This lets the policy learn fine-grained corrective actions that further improve its piano playing.
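
The two adaptation stages above can be illustrated with a minimal sketch. It is not the HandelBot implementation: the function names, the fixed step size, the residual scale, and the reward formula (correct keys minus wrong keys) are all assumptions chosen for illustration. The only details taken from the description above are that lateral corrections move a finger toward its intended key, that a learned residual is added on top of the refined trajectory, and that the reward is derived from the keyboard's MIDI output.

```python
from typing import Dict, List, Optional, Set


def refine_lateral_offsets(
    offsets: Dict[str, float],
    intended: Dict[str, int],
    pressed: Dict[str, Optional[int]],
    step: float = 0.5,
) -> Dict[str, float]:
    """One refinement iteration (hypothetical update rule).

    offsets:  finger name -> current lateral correction (arbitrary units)
    intended: finger name -> key index the finger should press
    pressed:  finger name -> key index observed on hardware, or None
    """
    updated = dict(offsets)
    for finger, target_key in intended.items():
        actual_key = pressed.get(finger)
        if actual_key is None or actual_key == target_key:
            continue  # nothing observed, or already on the right key
        # Nudge the finger horizontally toward the key it should press.
        direction = 1.0 if target_key > actual_key else -1.0
        updated[finger] = offsets.get(finger, 0.0) + step * direction
    return updated


def residual_action(
    base: List[float], residual: List[float], scale: float = 0.1
) -> List[float]:
    """Add a bounded residual correction to the refined base action."""
    clipped = [max(-1.0, min(1.0, r)) for r in residual]
    return [b + scale * r for b, r in zip(base, clipped)]


def midi_reward(target_keys: Set[int], midi_keys: Set[int]) -> float:
    """Assumed reward: +1 per correctly pressed key, -1 per wrong key."""
    correct = len(target_keys & midi_keys)
    wrong = len(midi_keys - target_keys)
    return float(correct - wrong)
```

In this sketch, clipping and scaling the residual keeps the learned correction small relative to the refined trajectory, which is a common design choice in residual RL; the scale actually used by HandelBot is not stated here.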

Real-World Piano Videos

We evaluate HandelBot on a suite of 5 songs. For each song below, we include an mp3 of the song and videos of HandelBot and the baselines.

Twinkle Twinkle

Ode to Joy

Hot Cross Buns

Prelude in C

Fur Elise

Results

HandelBot demonstrates significant improvements over baseline methods, achieving high success rates in precise bimanual piano playing. Methods that do not use real data (π closed-loop, π open-loop) are unable to adapt effectively to real-world piano playing.

We visualize HandelBot trajectories for our 4 evaluation songs. For each song, we visualize the notes pressed correctly, pressed incorrectly, and missed. The x-axis is the timestep of the song, and the y-axis shows the different notes, with the top half representing keys for the right hand and the bottom half keys for the left hand. On easier songs such as Twinkle Twinkle and Ode to Joy, HandelBot makes few mistakes, with only occasional timing errors or wrong presses. On harder songs such as Fur Elise, large jumps between left-hand notes (bottom section of each song plot) remain challenging.
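
The three note categories used in these plots can be computed per timestep with simple set operations. The sketch below is an illustrative assumption about the bookkeeping, not the authors' evaluation code: it assumes that both the target song and the recorded MIDI output are available as one set of key indices per timestep, and the function name `classify_presses` is hypothetical.

```python
from typing import Dict, List, Set


def classify_presses(
    target: List[Set[int]], played: List[Set[int]]
) -> Dict[str, int]:
    """Count correct, incorrect, and missed key presses over a song.

    target: per-timestep set of keys that should be held
    played: per-timestep set of keys reported by the keyboard
    """
    counts = {"correct": 0, "incorrect": 0, "missed": 0}
    for want, got in zip(target, played):
        counts["correct"] += len(want & got)    # pressed and intended
        counts["incorrect"] += len(got - want)  # pressed but not intended
        counts["missed"] += len(want - got)     # intended but not pressed
    return counts


# Example with three timesteps (key indices are arbitrary).
song = [{60}, {62, 64}, set()]
recording = [{60}, {62}, {65}]
summary = classify_presses(song, recording)
```

Here `summary` holds 2 correct presses, 1 incorrect press, and 1 missed press.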

We include 5 evaluation trajectories recorded during HandelBot training. Across these trajectories, we see that HandelBot initially struggles with many keys in the left hand. However, with real-world interaction, the residual policy adapts to the real world and presses the correct keys.

Citation


@misc{xie2026handelbotrealworldpianoplaying,
      title={HandelBot: Real-World Piano Playing via Fast Adaptation of Dexterous Robot Policies}, 
      author={Amber Xie and Haozhi Qi and Dorsa Sadigh},
      year={2026},
      eprint={2603.12243},
      archivePrefix={arXiv},
      primaryClass={cs.RO},
      url={https://arxiv.org/abs/2603.12243}, 
}

HandelBot: Real-World Piano Playing via
Fast Adaptation of Dexterous Robot Policies

Amber Xie

Stanford

Haozhi Qi

Amazon FAR (Frontier AI & Robotics)

Dorsa Sadigh

Stanford
