The Bounds of Algorithmic Collusion: Q-learning, Gradient Learning, and the Folk Theorem
Publisher
London School of Economics, University of Waterloo
Abstract
We explore the behaviour that emerges when learning agents interact strategically and repeatedly under a wide range of learning dynamics, including Q-learning, projected gradient, replicator, and log-barrier dynamics. Going beyond the better-understood classes of potential games and zero-sum games, we consider the setting of a general repeated game with finite recall under different forms of monitoring. We obtain a Folk Theorem-style result and characterise the set of payoff vectors that these dynamics can attain, revealing a wide range of possibilities for the emergence of algorithmic collusion. Achieving this requires a novel technical approach, which, to the best of our knowledge, yields the first convergence result for multi-agent Q-learning algorithms in repeated games.