Algorithms and Computing Systems Series

From Natural Language to Extensive-Form Game Representations: Defining and Automatically Verifying Consistency

3rd June 2026, 13:00, GH223
Rahul Savani
University of Liverpool

Abstract

Game description translation (GDT) converts natural language descriptions of strategic interactions into formal game-theoretic models. Historically, evaluating the correctness (“consistency”) of such a translation has relied on manual inspection by game theory experts. This manual process is time-consuming and error-prone. Moreover, it is subjective, because there is no formal definition of consistency, and this hinders reproducibility. To address these limitations, we formalize consistency and develop a two-component framework for the automated evaluation of GDT correctness for extensive-form game models. The first component is a set of translation instances. Each instance comprises a natural language description, an example of a consistent translation (“reference game”), and further constraints. The second component is an automated checker. The checker first uses the reference game and any relevant constraints to check that the sets of reduced strategies in the reference game and the candidate translation match. Then, if the candidate passes that check, the checker goes on to check any remaining constraints, such as constraints on the relationships between payoffs in the game.
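
As a rough illustration of the two-stage check described above, the following Python sketch uses a deliberately simplified game representation. The names `ToyGame`, `check_consistency`, and `entry_game`, along with the entry-game labels and payoffs, are hypothetical and are not taken from the framework's implementation. The sketch assumes that each player's reduced strategies have already been enumerated; for a real extensive-form game this requires traversing the game tree.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Tuple

# Payoffs are keyed by a label for each terminal outcome (an assumption
# made here for illustration only).
Payoffs = Dict[str, Tuple[float, ...]]
Constraint = Callable[[Payoffs], bool]


@dataclass
class ToyGame:
    # Player name -> set of that player's (pre-enumerated) reduced strategies.
    strategies: Dict[str, FrozenSet[Tuple[str, ...]]]
    payoffs: Payoffs


def check_consistency(reference: ToyGame, candidate: ToyGame,
                      constraints: List[Constraint]) -> bool:
    # Stage 1: the per-player sets of reduced strategies must match.
    if candidate.strategies != reference.strategies:
        return False
    # Stage 2: reached only if stage 1 passes; check remaining constraints,
    # here predicates over the candidate's payoffs.
    return all(constraint(candidate.payoffs) for constraint in constraints)


def entry_game(out, fight, accommodate) -> ToyGame:
    # Hypothetical two-player entry game used as the running example.
    return ToyGame(
        strategies={
            "Entrant": frozenset({("In",), ("Out",)}),
            "Incumbent": frozenset({("Fight",), ("Accommodate",)}),
        },
        payoffs={"Out": out, "In,Fight": fight, "In,Accommodate": accommodate},
    )


def entrant_ordering(p: Payoffs) -> bool:
    # Constraint on relationships between payoffs, not on exact values:
    # the entrant prefers accommodation to staying out to a fight.
    return p["In,Accommodate"][0] > p["Out"][0] > p["In,Fight"][0]


reference = entry_game((0, 2), (-1, -1), (1, 1))
good = entry_game((0, 5), (-3, 0), (2, 2))          # same structure and ordering
bad_payoffs = entry_game((3, 5), (-3, 0), (2, 2))   # ordering violated
bad_structure = ToyGame(                            # entrant is missing "Out"
    strategies={
        "Entrant": frozenset({("In",)}),
        "Incumbent": reference.strategies["Incumbent"],
    },
    payoffs=good.payoffs,
)

results = [check_consistency(reference, g, [entrant_ordering])
           for g in (good, bad_payoffs, bad_structure)]
```

In this toy setting a candidate may use different payoff numbers from the reference game and still pass, provided its strategy sets match and the stated payoff relationships hold.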

Relative to prior practice, the framework shifts the role of experts from repetitive manual evaluation to the one-time creation of translation instances. To support reproducible research on GDT, we provide an extensible instantiation of our framework: a curated dataset of extensive-form game translation instances that were created and verified by a group of game theory experts, and an implementation of automated consistency checkers for this dataset and future extensions of it. Using the dataset and checkers, we evaluate a range of translators, including a new translator, designed for this study, that demonstrates state-of-the-art performance. The evaluation demonstrates the ability of the framework to effectively assess the performance of translators.

Joint work with:
Yongzhao Wang, Shilong Deng, Daniel A. Kadnikov, Ed Chalstrey, Martin Gairing, Enbo Sun, and Theodore L. Turocy