Some Continuous-Time Reinforcement Learning Problems in Finance
| dc.contributor.author | Chen, Yuling | |
| dc.date.accessioned | 2026-04-27T14:37:29Z | |
| dc.date.available | 2026-04-27T14:37:29Z | |
| dc.date.issued | 2026-04-27 | |
| dc.date.submitted | 2026-04-17 | |
| dc.description.abstract | Reinforcement Learning (RL) is an emerging technique for finding the optimal policies of sequential decision-making problems, particularly when the parameters of the dynamic environment are unknown. Classical RL is framed in a discrete-time setting, typically modeled as a Markov Decision Process. While the recent RL literature has extended these problems to continuous time, the lack of a mathematical foundation limits its success to the empirical level. Classical stochastic control, on the other hand, provides theoretically optimal solutions to continuous-time sequential decision-making problems, but it relies on model assumptions and on knowledge of the true parameters of the dynamic environment. The recent stochastic control literature has therefore shown a growing interest in applying RL techniques to classical control problems and in formulating traditional RL methodologies within rigorous stochastic control frameworks. By randomizing the control into a probability distribution, RL facilitates an informed exploration of the control space, allowing the decision-maker to seek the optimal strategy without full knowledge of the model of the dynamic environment. We refer to this RL-facilitated control randomization as an exploratory extension of the stochastic control problem, leading to an exploratory control framework. In light of the wide applications of continuous-time RL and stochastic control, this thesis studies four classical stochastic control problems under the exploratory control framework, motivated by three mainstream finance problems: the mean-variance portfolio optimization problem (Chapters 2 and 3), the optimal investment-consumption problem (Chapter 4), and the optimal stopping problem with an application to exercising American-type options (Chapter 5). Chapter 2 studies the mean-variance portfolio optimization problem in a regime-switching market. By adopting the Lagrangian dual of the mean-variance objective, we transform the mean-variance problem into a time-consistent control problem and solve for a precommitted Gaussian investment policy that is globally optimal at the initial time. In contrast, Chapter 3 revisits the mean-variance portfolio optimization problem in a market with jumps and directly optimizes the original mean-variance objective, where time inconsistency arises. As a result, we solve for an equilibrium Gaussian investment policy that is locally optimal in a game-theoretic equilibrium sense. Chapter 4 addresses Merton's optimal investment-consumption problem with non-exponential discounting, a time-inconsistent dual-control problem. We solve for an equilibrium investment-consumption policy, as a joint distribution over the investment-consumption space, and further show that it can be expressed as the product of a Gaussian investment policy and a Gamma consumption policy. Finally, Chapter 5 tackles the optimal stopping problem under a general (not necessarily exponential) discounting function. Given the binary nature of the control space (stop or continue), we solve for an equilibrium Bernoulli stopping policy, which represents the probability of stopping. We also examine the effect of exponential and non-exponential discounting on the optimal stopping policies and the stopping regions. For each problem, we solve for the optimal policy and the corresponding value functions, analytically or semi-analytically. We provide verification theorems and policy improvement (or policy iteration) theorems, together with supplementary lemmas and propositions, which serve as the theoretical foundation for the subsequent analysis. Beyond these, we design an RL algorithm for each problem, deriving effective loss functions that exploit the martingale properties of the theoretical solutions. With proper model configurations, our algorithms empirically demonstrate the ability to learn the optimal policies in numerical studies with both simulated and real market data. | |
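The record itself does not spell out the exploratory control framework. As an illustrative sketch only, the entropy-regularized objective popularized in the continuous-time RL literature (e.g., Wang, Zariphopoulou and Zhou, 2020) replaces a classical control with a density-valued policy π over the action space A and rewards exploration through a differential-entropy term with temperature λ > 0. All notation below is assumed for illustration and is not quoted from the thesis.

```latex
% Entropy-regularized ("exploratory") control objective -- illustrative;
% notation assumed, not taken from the thesis.
V(t,x) \;=\; \sup_{\pi}\,
\mathbb{E}\!\left[\int_t^T\!\!\int_{A}
  \Big( r\big(s, X^{\pi}_s, a\big) \;-\; \lambda \ln \pi_s(a) \Big)\,
  \pi_s(a)\,\mathrm{d}a\,\mathrm{d}s
  \;+\; h\big(X^{\pi}_T\big) \,\middle|\, X^{\pi}_t = x \right].
```

In linear-quadratic settings such as mean-variance portfolio selection, the maximizer of this type of objective is known to be Gaussian, consistent with the Gaussian investment policies of Chapters 2 and 3; over a binary stop/continue action space as in Chapter 5, the analogous maximizer is Bernoulli.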
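Likewise, the "loss functions that exploit the martingale properties" are not given in this record. The sketch below, with hypothetical names and a toy time discretization, shows one common way such a martingale loss is set up for a parametrized value function: along a sampled path, V_theta(t_i, X_{t_i}) is regressed onto the realized reward-to-go, since the value process plus accumulated running reward should be a martingale under the optimal policy.

```python
# Hedged sketch of a martingale-style loss for a parametrized value
# function; names, shapes, and the discretization are assumptions for
# illustration, not code from the thesis.
import numpy as np

def martingale_loss(theta, times, states, rewards, terminal_value, value_fn):
    """Mean squared deviation of V_theta from the realized reward-to-go.

    times          : (n+1,) time grid t_0 < ... < t_n = T
    states         : (n+1,) sampled path X_{t_0}, ..., X_{t_n}
    rewards        : (n,)   running reward collected on each [t_i, t_{i+1})
    terminal_value : payoff h(X_T) received at maturity
    value_fn       : callable (theta, t, x) -> V_theta(t, x)
    """
    # Reward-to-go from each grid point; the terminal point has none left.
    to_go = np.append(np.cumsum(rewards[::-1])[::-1], 0.0)
    targets = to_go + terminal_value
    preds = np.array([value_fn(theta, t, x) for t, x in zip(times, states)])
    # If V_theta(t, X_t) plus accumulated reward is a martingale, then
    # V_theta(t_i, X_{t_i}) is the conditional expectation of the target,
    # so a least-squares fit over sampled paths is the natural criterion.
    return np.mean((preds - targets) ** 2)
```

Averaging this loss over many simulated paths and minimizing over theta (for instance by stochastic gradient descent) fits the value function without knowledge of the market parameters, which is the sense in which such algorithms are model-free.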
| dc.identifier.uri | https://hdl.handle.net/10012/23055 | |
| dc.language.iso | en | |
| dc.pending | false | |
| dc.publisher | University of Waterloo | en |
| dc.subject | reinforcement learning | |
| dc.subject | stochastic control | |
| dc.subject | quantitative finance | |
| dc.title | Some Continuous-Time Reinforcement Learning Problems in Finance | |
| dc.type | Doctoral Thesis | |
| uws-etd.degree | Doctor of Philosophy | |
| uws-etd.degree.department | Statistics and Actuarial Science | |
| uws-etd.degree.discipline | Statistics | |
| uws-etd.degree.grantor | University of Waterloo | en |
| uws-etd.embargo.terms | 0 | |
| uws.contributor.advisor | Li, Bin | |
| uws.contributor.advisor | Saunders, David | |
| uws.contributor.affiliation1 | Faculty of Mathematics | |
| uws.peerReviewStatus | Unreviewed | en |
| uws.published.city | Waterloo | en |
| uws.published.country | Canada | en |
| uws.published.province | Ontario | en |
| uws.scholarLevel | Graduate | en |
| uws.typeOfResource | Text | en |