Humans and other animals have an impressive ability to quickly adapt to unfamiliar environments, with only minimal feedback. Computational models have provided intriguing insight into these processes by making connections between abstract computational theories of reinforcement learning (RL) and neurophysiological data. However, the performance of these models falls well below that of real neural systems, so it is clear that important aspects of the neural computation are not being captured by our models.
In this work we explore how new developments from the computational study of RL can be extended to the realm of neural modelling. Specifically, we examine the field of hierarchical reinforcement learning (HRL), which extends RL by dividing the learning process into a hierarchy of actions, in which higher-level decisions guide the choices made at lower levels. The advantages of HRL have been demonstrated from a computational perspective, but HRL has never been implemented in a neural model. Thus it is unclear whether HRL is a purely abstract theory, or whether it could help explain the RL abilities of real brains.
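To make the hierarchical control scheme concrete, the following is a minimal, non-neural sketch of a two-level HRL loop, in the spirit of the options framework. The function names, termination scheme, and structure are illustrative assumptions, not the model's actual implementation: the high level selects a subtask, and the corresponding low-level policy selects primitive actions until that subtask terminates.

```python
def run_hierarchy(state, high_policy, low_policies, env_step, max_steps=10):
    """Illustrative two-level HRL control loop (an assumption for
    exposition, not the neural model's implementation).

    high_policy: maps a state to a subtask label (higher-level decision).
    low_policies: maps each subtask to (policy, done) -- a primitive-action
        policy and a termination predicate for that subtask.
    env_step: maps (state, action) to the next state.
    """
    subtask = high_policy(state)            # higher-level decision
    policy, done = low_policies[subtask]
    for _ in range(max_steps):
        action = policy(state)              # lower-level choice, guided by the subtask
        state = env_step(state, action)
        if done(state):                     # subtask termination condition
            break
    return state
```

In a full HRL system the high level would itself be an RL learner choosing among subtasks, and would receive a pseudo-reward when a subtask completes; this sketch only shows how the levels interleave.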
Here we show that all the major components of HRL can be implemented in an integrated, biologically plausible neural model. The core of this system is a model of ``flat'' RL that implements the processing of a single layer. This includes computing action values given the current state, selecting an output action based on those values, computing a temporal difference error based on the result of that action, and using that error to update the action values. We then show how the design of this system allows multiple layers to be combined hierarchically, where inputs are received from higher layers and outputs are delivered to lower layers. We also provide a detailed neuroanatomical mapping, showing how the components of the model fit within known neuroanatomical structures.
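The four processing steps of a single flat layer can be sketched as follows. This is a tabular, non-neural stand-in (the actual model computes these quantities in neural populations), and all parameter names and values are illustrative assumptions; an epsilon-greedy rule stands in for the model's action-selection circuit.

```python
import random

class FlatRLLayer:
    """Tabular stand-in for one ``flat'' RL layer: value computation,
    action selection, TD error, and value update made explicit."""

    def __init__(self, actions, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.q = {}              # stored action values Q(s, a)
        self.actions = actions
        self.alpha = alpha       # learning rate (assumed value)
        self.gamma = gamma       # temporal discount (assumed value)
        self.epsilon = epsilon   # exploration rate (assumed value)

    def values(self, state):
        # Step 1: compute action values for the current state.
        return {a: self.q.get((state, a), 0.0) for a in self.actions}

    def select(self, state):
        # Step 2: select an output action based on those values.
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        vals = self.values(state)
        return max(vals, key=vals.get)

    def update(self, state, action, reward, next_state):
        # Step 3: compute the temporal difference error from the outcome.
        target = reward + self.gamma * max(self.values(next_state).values())
        delta = target - self.q.get((state, action), 0.0)
        # Step 4: use that error to update the stored action value.
        self.q[(state, action)] = self.q.get((state, action), 0.0) + self.alpha * delta
        return delta
```

Because each layer exposes only a state input, an action output, and a reward/error signal, layers of this kind can in principle be stacked, with a higher layer's selected action serving as context for the layer below.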
We demonstrate the performance of the model in a range of different environments, reflecting our aim of understanding the brain's general, flexible reinforcement learning ability. These results show that the model compares well to previous modelling work, and that its performance improves as a result of its hierarchical structure. We also show that the model's output is consistent with available data on human hierarchical RL. Thus we believe that this work, as the first biologically plausible neural model of HRL, brings us closer to understanding the full range of RL processing in real neural systems.
We conclude with a discussion of the design decisions made throughout the course of this work, as well as some of the most important avenues for the model's future development. Two of the most critical of these are the incorporation of model-based reasoning and the autonomous development of hierarchical structure, both of which are important aspects of the full HRL process that are absent in the current model. We also discuss some of the predictions that arise from this model, and how they might be tested experimentally.