Some Continuous-Time Reinforcement Learning Problems in Finance
| dc.contributor.author | Chen, Yuling | |
| dc.date.accessioned | 2026-04-27T14:37:29Z | |
| dc.date.available | 2026-04-27T14:37:29Z | |
| dc.date.issued | 2026-04-27 | |
| dc.date.submitted | 2026-04-17 | |
| dc.description.abstract | Reinforcement Learning (RL) is an emerging technique for finding the optimal policies of sequential decision-making problems, particularly when the parameters of the dynamic environment are unknown. Classical RL is framed in a discrete-time setting, typically modeled as a Markov Decision Process. While the recent RL literature has extended these problems to continuous time, the lack of a mathematical foundation limits its success to the empirical level. Classical stochastic control, on the other hand, provides theoretically optimal solutions to continuous-time sequential decision-making problems, but it relies on model assumptions and on knowledge of the true parameters of the dynamic environment. The recent stochastic control literature has therefore shown a growing interest in applying RL techniques to classical control problems and in formulating traditional RL methodologies within rigorous stochastic control frameworks. By randomizing the control into a probability distribution, RL facilitates an informed exploration of the control space, allowing the decision-maker to seek the optimal strategy without full knowledge of the model of the dynamic environment. We refer to this RL-facilitated control randomization as an exploratory extension of the stochastic control problem, leading to an exploratory control framework. In light of the wide applications of continuous-time RL and stochastic control, this thesis studies four classical stochastic control problems under the exploratory control framework, motivated by three mainstream finance problems: the mean-variance portfolio optimization problem (Chapters 2 and 3), the optimal investment-consumption problem (Chapter 4), and the optimal stopping problem with an application to exercising American-type options (Chapter 5). Chapter 2 studies the mean-variance portfolio optimization problem in a regime-switching market. By adopting the Lagrangian dual of the mean-variance objective, we transform the mean-variance problem into a time-consistent control problem and solve for a precommitted Gaussian investment policy that is globally optimal at the initial time. In contrast, Chapter 3 revisits the mean-variance portfolio optimization problem in a market with jumps and directly optimizes the original mean-variance objective, where time inconsistency arises. As a result, we solve for an equilibrium Gaussian investment policy that is locally optimal in a game-theoretic equilibrium sense. Chapter 4 addresses Merton's optimal investment-consumption problem with non-exponential discounting, a time-inconsistent dual-control problem. We solve for an equilibrium investment-consumption policy, as a joint distribution over the investment-consumption space, and further show that it can be expressed as the product of a Gaussian investment policy and a Gamma consumption policy. Finally, Chapter 5 tackles the optimal stopping problem under a general (not necessarily exponential) discounting function. Given the binary nature of the control space (stop or continue), we solve for an equilibrium Bernoulli stopping policy, which represents the probability of stopping. We also examine the effect of exponential and non-exponential discounting on the optimal stopping policies and the stopping regions. For each problem, we solve for the optimal policy and the corresponding value functions, analytically or semi-analytically. We provide verification theorems and policy improvement (or policy iteration) theorems, together with supplementary lemmas and propositions, which serve as the theoretical foundation for the subsequent analysis. Beyond these, we design an RL algorithm for each problem, deriving effective loss functions that exploit the martingale properties of the theoretical solutions. With proper model configurations, our algorithms empirically demonstrate the ability to learn the optimal policies in numerical studies with both simulated and real market data. | |
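The record itself does not spell out the exploratory control framework. As an illustrative sketch only, the entropy-regularized objective popularized in the continuous-time RL literature (e.g., Wang, Zariphopoulou and Zhou, 2020) replaces a classical control with a density-valued policy π over the action space A and rewards exploration through a differential-entropy term with temperature λ > 0. All notation below is assumed for illustration and is not quoted from the thesis.

```latex
% Entropy-regularized ("exploratory") control objective -- illustrative;
% notation assumed, not taken from the thesis.
V(t,x) \;=\; \sup_{\pi}\,
\mathbb{E}\!\left[\int_t^T\!\!\int_{A}
  \Big( r\big(s, X^{\pi}_s, a\big) \;-\; \lambda \ln \pi_s(a) \Big)\,
  \pi_s(a)\,\mathrm{d}a\,\mathrm{d}s
  \;+\; h\big(X^{\pi}_T\big) \,\middle|\, X^{\pi}_t = x \right].
```

In linear-quadratic settings such as mean-variance portfolio selection, the maximizer of this type of objective is known to be Gaussian, consistent with the Gaussian investment policies of Chapters 2 and 3; over a binary stop/continue action space as in Chapter 5, the analogous maximizer is Bernoulli.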
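Likewise, the "loss functions that exploit the martingale properties" are not given in this record. The sketch below, with hypothetical names and a toy time discretization, shows one common way such a martingale loss is set up for a parametrized value function: along a sampled path, V_theta(t_i, X_{t_i}) is regressed onto the realized reward-to-go, since the value process plus accumulated running reward should be a martingale under the optimal policy.

```python
# Hedged sketch of a martingale-style loss for a parametrized value
# function; names, shapes, and the discretization are assumptions for
# illustration, not code from the thesis.
import numpy as np

def martingale_loss(theta, times, states, rewards, terminal_value, value_fn):
    """Mean squared deviation of V_theta from the realized reward-to-go.

    times          : (n+1,) time grid t_0 < ... < t_n = T
    states         : (n+1,) sampled path X_{t_0}, ..., X_{t_n}
    rewards        : (n,)   running reward collected on each [t_i, t_{i+1})
    terminal_value : payoff h(X_T) received at maturity
    value_fn       : callable (theta, t, x) -> V_theta(t, x)
    """
    # Reward-to-go from each grid point; the terminal point has none left.
    to_go = np.append(np.cumsum(rewards[::-1])[::-1], 0.0)
    targets = to_go + terminal_value
    preds = np.array([value_fn(theta, t, x) for t, x in zip(times, states)])
    # If V_theta(t, X_t) plus accumulated reward is a martingale, then
    # V_theta(t_i, X_{t_i}) is the conditional expectation of the target,
    # so a least-squares fit over sampled paths is the natural criterion.
    return np.mean((preds - targets) ** 2)
```

Averaging this loss over many simulated paths and minimizing over theta (for instance by stochastic gradient descent) fits the value function without knowledge of the market parameters, which is the sense in which such algorithms are model-free.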
| dc.identifier.uri | https://hdl.handle.net/10012/23055 | |
| dc.language.iso | en | |
| dc.pending | false | |
| dc.publisher | University of Waterloo | en |
| dc.subject | reinforcement learning | |
| dc.subject | stochastic control | |
| dc.subject | quantitative finance | |
| dc.title | Some Continuous-Time Reinforcement Learning Problems in Finance | |
| dc.type | Doctoral Thesis | |
| uws-etd.degree | Doctor of Philosophy | |
| uws-etd.degree.department | Statistics and Actuarial Science | |
| uws-etd.degree.discipline | Statistics | |
| uws-etd.degree.grantor | University of Waterloo | en |
| uws-etd.embargo.terms | 0 | |
| uws.contributor.advisor | Li, Bin | |
| uws.contributor.advisor | Saunders, David | |
| uws.contributor.affiliation1 | Faculty of Mathematics | |
| uws.peerReviewStatus | Unreviewed | en |
| uws.published.city | Waterloo | en |
| uws.published.country | Canada | en |
| uws.published.province | Ontario | en |
| uws.scholarLevel | Graduate | en |
| uws.typeOfResource | Text | en |