Outplaying elite table tennis players with an autonomous robot | Nature

Created 4/23/2026 at 12:08:39 AM. Edited 4/23/2026 at 12:12:57 AM.

A good video is at https://www.theguardian.com/science/2026/apr/22/ai-powered-robot-beats-elite-table-tennis-players-milestone-robotics

Here we present Ace, to our knowledge the first real-world autonomous system competitive with elite human table tennis players. Ace addresses the challenges of physical real-time interaction through a new, high-speed perception system using event-based vision sensors [4], and a new control system based on model-free reinforcement learning, as well as state-of-the-art high-speed robot hardware. Evaluated in matches against elite and professional players under official competition rules, Ace achieved several victories and demonstrated consistent returns of high-speed, high-spin shots. These results highlight the potential of physical AI agents to perform complex, real-time interactive tasks, suggesting broader applications in domains requiring fast, precise human–robot interaction.

Ace is equipped with a new perception system using event-based vision sensors as well as a control system based on policies learnt using deep RL. In contrast to earlier approaches using RL for robot table tennis, the control policies used by Ace are learnt using an asymmetric actor–critic architecture [27–29], and the actions produced by the policies exist in an abstract space that is then mapped to a hard constraint for a convex optimization problem. This setup allows learning of collision-free, agile motions, addressing the full challenge of human-competitive robot table tennis.
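To make the idea of "an abstract action mapped to a hard constraint for a convex optimization problem" concrete, here is a toy sketch. It is not Ace's formulation, which the text above does not specify. It assumes a single joint modelled as a double integrator, a hypothetical `decode_action` that scales an action in [-1, 1]^2 to a target position and velocity, and a minimum-effort objective whose only constraint is that the motion ends exactly at that target. With only equality constraints the problem has a closed-form least-norm solution, so no solver library is needed; joint limits and collision constraints are left out.

```python
import numpy as np

def decode_action(action, pos_range=(-1.0, 1.0), vel_range=(-5.0, 5.0)):
    """Scale an abstract action in [-1, 1]^2 to a target (position, velocity).
    The ranges are hypothetical placeholders, not values from the source."""
    a = np.clip(np.asarray(action, dtype=float), -1.0, 1.0)
    lo = np.array([pos_range[0], vel_range[0]])
    hi = np.array([pos_range[1], vel_range[1]])
    return lo + 0.5 * (a + 1.0) * (hi - lo)

def min_effort_accelerations(p0, v0, target, n_steps, dt):
    """Minimise sum(u_k^2) subject to the hard constraint that the final
    position and velocity equal `target` (equality-constrained convex QP)."""
    k = np.arange(n_steps)
    # Row 0: effect of each u_k on final position; row 1: on final velocity.
    A = np.vstack([(n_steps - 1 - k + 0.5) * dt**2, np.full(n_steps, dt)])
    b = np.array([target[0] - p0 - n_steps * dt * v0, target[1] - v0])
    # Least-norm solution of A u = b.
    return A.T @ np.linalg.solve(A @ A.T, b)

def rollout(p0, v0, u, dt):
    """Integrate the double integrator to check the constraint is met."""
    p, v = p0, v0
    for uk in u:
        p, v = p + v * dt + 0.5 * uk * dt**2, v + uk * dt
    return p, v

# Assumed 1 ms control step over 32 steps; both numbers are illustrative.
target = decode_action([0.2, -0.5])
u = min_effort_accelerations(0.0, 0.0, target, n_steps=32, dt=0.001)
final = rollout(0.0, 0.0, u, dt=0.001)
```

The point of the design is that the policy only has to choose where the motion should end up, while the optimizer is responsible for producing a trajectory that satisfies the constraint exactly.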

The perception system uses a combination of conventional active pixel sensor (APS) cameras for ball triangulation and event-based vision sensor (EVS) cameras for ball angular velocity estimation to infer the current ball state at high frequencies. The ball state is then provided to one of two control components, depending on whether Ace is serving or in a rally. When serving, the robot performs a single-arm serve from a library of serve motions that were found using a genetic algorithm.
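The text does not say how the ball is triangulated. As an illustration only, the sketch below uses standard linear (DLT) triangulation from two or more calibrated views. The projection matrices `P1` and `P2` and the test point are made-up values, not Ace's calibration.

```python
import numpy as np

def triangulate(projections, pixels):
    """Linear (DLT) triangulation of one 3D point from two or more views.

    projections: list of 3x4 camera projection matrices.
    pixels: list of (u, v) image observations, one per camera.
    """
    rows = []
    for P, (u, v) in zip(projections, pixels):
        # Each view contributes two linear equations in the homogeneous point.
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    A = np.stack(rows)
    # The solution is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]

def project(P, point):
    """Project a 3D point to pixel coordinates with projection matrix P."""
    x = P @ np.append(point, 1.0)
    return x[:2] / x[2]

# Two illustrative cameras with identity intrinsics, 1 m apart along x.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
ball = np.array([0.3, -0.2, 2.5])
estimate = triangulate([P1, P2], [project(P1, ball), project(P2, ball)])
```

With noise-free observations the estimate matches the true point. With real detections, the same least-squares structure averages the error across cameras.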

During the rally, a fixed deep RL policy (π′) is queried at 31.25 Hz using the robot joint states and the ball position and spin histories. The policy is sampled during the match from a bank of policies trained to perform different skills.
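A query rate of 31.25 Hz corresponds to one policy call every 1000 / 31.25 = 32 ms. The sketch below shows that arithmetic together with a minimal rally loop: a policy is drawn from a bank, and each query receives the joint state plus a bounded history of ball measurements. The class name, the history length, the six-joint placeholder, and uniform sampling from the bank are all assumptions made for illustration. The source does not describe how the policy is chosen.

```python
import random
from collections import deque

QUERY_RATE_HZ = 31.25                      # from the text
QUERY_PERIOD_MS = 1000.0 / QUERY_RATE_HZ   # 32.0 ms between policy queries
HISTORY_LEN = 4                            # hypothetical; not given in the text

class RallyController:
    def __init__(self, policy_bank, seed=0):
        self.policy_bank = policy_bank     # skill name -> callable(observation)
        self.rng = random.Random(seed)
        self.ball_history = deque(maxlen=HISTORY_LEN)
        self.active_skill = None

    def sample_policy(self):
        # Uniform sampling is an assumption, used only to keep the sketch simple.
        self.active_skill = self.rng.choice(sorted(self.policy_bank))
        return self.active_skill

    def step(self, joint_state, ball_measurement):
        # Called once per query period with the latest measurements.
        self.ball_history.append(ball_measurement)
        observation = (tuple(joint_state), tuple(self.ball_history))
        return self.policy_bank[self.active_skill](observation)

# Stand-in policies that just report their name and the history length seen.
bank = {
    "topspin": lambda obs: ("topspin", len(obs[1])),
    "block": lambda obs: ("block", len(obs[1])),
}
controller = RallyController(bank)
controller.sample_policy()
actions = [controller.step([0.0] * 6, (0.1 * k, 0.0, 0.3)) for k in range(6)]
```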

The actions (a) produced by the policy are mapped to a 32-ms segment trajectory, and a corresponding reset trajectory is calculated. If the robot has yet to hit the ball and no collisions are predicted, then the segment trajectory is executed by the robot interface; otherwise, the reset trajectory is executed.
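The selection rule in this paragraph can be written as a small function. The `Trajectory` type and the function name are hypothetical. Only the condition itself (ball not yet hit and no collision predicted) comes from the text.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Trajectory:
    label: str
    duration_ms: float = 32.0  # segment length stated in the text

def select_trajectory(segment, reset, ball_already_hit, collision_predicted):
    """Run the policy's segment only while the ball is still to be hit and
    the segment is predicted to be collision-free; otherwise fall back to
    the reset trajectory."""
    if not ball_already_hit and not collision_predicted:
        return segment
    return reset

segment = Trajectory("segment")
reset = Trajectory("reset")
chosen = select_trajectory(segment, reset, ball_already_hit=False,
                           collision_predicted=False)
```

Computing the reset trajectory alongside every segment means the fallback is always ready, so a predicted collision never leaves the robot without a safe command.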

The training of all policies is performed entirely in simulation with custom physics models, noise models and data-driven distributions of the initial ball state. Training is performed asynchronously with multiple instances of the training environment. To aid in the learning process, the critic is provided with the true ball state, whereas the policy (π_i) is given a history of noisy sensor measurements.
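The asymmetry between actor and critic inputs can be illustrated by how observations are built from one simulated trajectory. This is a sketch under stated assumptions: Gaussian noise stands in for the custom noise models, and the state layout, history length, and function name are hypothetical.

```python
import numpy as np

def make_observations(true_states, noise_std, history_len, rng):
    """Build actor and critic inputs from simulated ball states.

    true_states: (T, D) array of ground-truth states, most recent last.
    Returns (actor_obs, critic_obs).
    """
    # Stand-in sensor model: additive Gaussian noise (an assumption).
    noisy = true_states + rng.normal(0.0, noise_std, size=true_states.shape)
    actor_obs = noisy[-history_len:].ravel()  # history of noisy measurements
    critic_obs = true_states[-1]              # privileged true state
    return actor_obs, critic_obs

rng = np.random.default_rng(0)
t = np.arange(10)[:, None] * 0.032            # one sample per 32 ms step
# Toy ball path: moving along x at constant height.
states = np.hstack([1.0 - 5.0 * t, np.zeros_like(t), 0.3 + 0.0 * t])
actor_obs, critic_obs = make_observations(states, 0.01, 4, rng)
```

Because the critic is needed only during training, it can use information that exists only in simulation, while the policy sees what the real sensors would provide at match time.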
