Title: Optimization of Policy Evaluation and Policy Improvement Methods in Portfolio Optimization using Quasi-Monte Carlo Methods
Author: Orok, Gavin
Type: Master Thesis
Language: en
Dates: 2024-05-24, 2024-05-22
URI: http://hdl.handle.net/10012/20596
Keywords: reinforcement learning; numerical methods; quasi-Monte Carlo; portfolio optimization; continuous control

Abstract: Machine learning involves many challenging integrals that can be estimated using numerical methods. One application of these methods explored in recent work is the estimation of policy gradients for reinforcement learning. That work found that, for many standard continuous control problems, randomized quasi-Monte Carlo (RQMC) and Array-RQMC, which use low-discrepancy point sets, improved the efficiency of both policy evaluation and policy gradient-based policy iteration compared to standard Monte Carlo (MC). We extend this work by applying these numerical methods to model-free reinforcement learning algorithms in portfolio optimization, which are of interest because they do not rely on complex model assumptions that pose difficulties for other analytical methods. We find that RQMC significantly outperforms MC under all conditions for policy evaluation, and that Array-RQMC outperforms both MC and RQMC in policy iteration when the reordering function is chosen strategically.
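
As an illustrative aside, the contrast between standard MC and RQMC described in the abstract can be sketched in a few lines of Python. The sketch below is not taken from the thesis: it assumes SciPy's scipy.stats.qmc module is available and uses a simple hypothetical test integrand (the exponential of the coordinate sum over the unit hypercube, chosen only because its exact value is known) to compare i.i.d. uniform sampling with scrambled Sobol' low-discrepancy points.

import numpy as np
from scipy.stats import qmc

def estimate_integral(points):
    # Estimate E[f(U)] for U uniform on [0,1]^d with f(u) = exp(sum(u)).
    return np.mean(np.exp(points.sum(axis=1)))

d, n = 4, 2 ** 10
rng = np.random.default_rng(0)

# Standard Monte Carlo: i.i.d. uniform points in [0,1]^d.
mc_points = rng.random((n, d))

# Randomized quasi-Monte Carlo: scrambled Sobol' low-discrepancy points.
sobol = qmc.Sobol(d, scramble=True, seed=0)
rqmc_points = sobol.random(n)

exact = (np.e - 1.0) ** d  # closed-form value of the integral over [0,1]^d
print("MC error:  ", abs(estimate_integral(mc_points) - exact))
print("RQMC error:", abs(estimate_integral(rqmc_points) - exact))

Repeating such an experiment over many independent randomizations and comparing the variance of the two estimators is the usual way the efficiency gain of RQMC over MC is quantified; the thesis applies the same kind of comparison to policy evaluation and policy iteration rather than to a fixed test integrand.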