GDC Festival of Gaming is part of the Informa Festivals Division of Informa PLC


March 9-13, 2026
Moscone Center, San Francisco, CA

Speakers

Christopher Nuland, Technical Marketing Manager, Red Hat

This talk shows how we turn a suboptimal reinforcement learning exploration agent into a route-seeking speedrunner by pairing PPO with a panel of model judges orchestrated by llm-d and served with vLLM. The agent plays normally while a narrow planner handles dialog and puzzle moments. llm-d then routes short gameplay clips to a vision judge, a state checker, and a rule judge, whose verdicts become preferences that train a small reward model. The next PPO burst learns from the resulting shaped rewards, improving coverage, dialog completion reliability, and key speedrun metrics. We keep it practical for game developers with a YAML-first deployment on OpenShift AI using KServe and vLLM, plus a reusable kit of prompts, rubrics, and metrics. We will showcase Double Dragon and Zelda: Oracle of Seasons on the original Nintendo Game Boy, but the same pattern applies to any game, whether for automated testing or for discovering optimal speedrunning routes.
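The preference-to-reward step described in the abstract can be illustrated with a minimal sketch. This is not the speaker's implementation: it assumes a majority vote across the three judges, a linear reward model fitted with a Bradley-Terry (logistic) preference loss, and a simple additive shaping term. The function names, the two-element feature vector (new rooms visited, dialogs completed), and the `beta` weight are all hypothetical, and the judge verdicts are hard-coded stand-ins for model outputs.

```python
import math

def panel_preference(votes):
    """Majority vote over judge verdicts: 0 means clip A is better, 1 means clip B."""
    return 1 if sum(votes) * 2 > len(votes) else 0

def reward(weights, features):
    """Linear reward model: dot product of weights and clip features."""
    return sum(w * x for w, x in zip(weights, features))

def train_reward_model(pairs, dim, epochs=200, lr=0.1):
    """Fit weights by gradient ascent on the Bradley-Terry log-likelihood.

    pairs: list of (features_a, features_b, preferred), preferred in {0, 1}.
    """
    weights = [0.0] * dim
    for _ in range(epochs):
        for fa, fb, pref in pairs:
            # Bradley-Terry: P(B preferred) = sigmoid(r(B) - r(A))
            diff = reward(weights, fb) - reward(weights, fa)
            p_b = 1.0 / (1.0 + math.exp(-diff))
            grad = pref - p_b  # derivative of the log-likelihood w.r.t. diff
            for i in range(dim):
                weights[i] += lr * grad * (fb[i] - fa[i])
    return weights

def shaped_reward(env_reward, weights, features, beta=0.5):
    """Environment reward plus a scaled learned reward (hypothetical shaping rule)."""
    return env_reward + beta * reward(weights, features)

# Hypothetical clip features: [new rooms visited, dialogs completed].
# Each vote list stands in for (vision judge, state checker, rule judge).
pairs = [
    ([1.0, 0.0], [3.0, 1.0], panel_preference([1, 1, 1])),
    ([4.0, 1.0], [2.0, 0.0], panel_preference([0, 0, 1])),
    ([0.0, 0.0], [2.0, 1.0], panel_preference([1, 0, 1])),
]
weights = train_reward_model(pairs, dim=2)
print(shaped_reward(0.0, weights, [3.0, 1.0]) > shaped_reward(0.0, weights, [1.0, 0.0]))
```

In a PPO loop, a function like `shaped_reward` would replace the raw environment reward during the next training burst; a real reward model would more likely be a small neural network over clip embeddings than a linear model over hand-picked features.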

Presenting: