Policy Extraction via Online Q-Value Distillation
Date
2019-08-27
Authors
Jhunjhunwala, Aman
Advisor
Czarnecki, Krzysztof
Publisher
University of Waterloo
Abstract
Recently, deep neural networks have proven capable of solving complex control tasks in challenging environments. However, these deep learning policies remain hard to interpret, explain, and verify, which limits their practical applicability. Decision trees lend themselves well to explanation and verification tools, but they are not easy to train, especially in an online fashion. The aim of this thesis is to explore online tree construction algorithms and to demonstrate the technique and effectiveness of distilling reinforcement learning policies into a Bayesian tree structure. We introduce Q-BSP Trees and an Ordered Sequential Monte Carlo training algorithm that condenses the Q-function of fully trained Deep Q-Networks into the tree structure. Q-BSP Forests generate partitioning rules that transparently reconstruct the value function for all possible states. The approach convincingly beats performance benchmarks set by earlier policy distillation methods, achieving performance closest to the original deep learning policy.
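To make the distillation idea concrete: the abstract describes condensing a trained Deep Q-Network's Q-function into an interpretable tree. The sketch below illustrates that general workflow only; it does not implement the thesis's Q-BSP Trees or the Ordered Sequential Monte Carlo procedure. Instead it substitutes a standard CART regression tree (scikit-learn's `DecisionTreeRegressor`), and the `teacher_q` function is a hypothetical stand-in for a trained DQN.

```python
# Hedged sketch of Q-value distillation into a tree. Assumptions:
# - teacher_q is a stand-in for a trained DQN's Q-function (2 actions);
# - a plain CART regression tree replaces the thesis's Q-BSP Tree /
#   Ordered Sequential Monte Carlo training procedure.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

def teacher_q(states):
    # Hypothetical "teacher": returns Q-values for 2 actions per 2-D state.
    q0 = np.sin(states[:, 0]) + states[:, 1]
    q1 = np.cos(states[:, 0]) - states[:, 1]
    return np.stack([q0, q1], axis=1)

# Sample states (e.g. gathered from rollouts) and query the teacher.
states = rng.uniform(-2.0, 2.0, size=(5000, 2))
q_targets = teacher_q(states)

# Distill: fit a tree whose axis-aligned partitioning rules
# reconstruct the Q-function over the sampled state space.
tree = DecisionTreeRegressor(max_depth=8, random_state=0)
tree.fit(states, q_targets)

# The distilled policy is the greedy action under the tree's Q-estimates.
def tree_policy(state):
    return int(np.argmax(tree.predict(state.reshape(1, -1))[0]))

# Measure how often the tree's greedy action matches the teacher's
# on held-out states (a crude proxy for distillation fidelity).
test_states = rng.uniform(-2.0, 2.0, size=(1000, 2))
agreement = float(np.mean(
    np.argmax(teacher_q(test_states), axis=1)
    == np.argmax(tree.predict(test_states), axis=1)
))
```

Each root-to-leaf path in the fitted tree is a human-readable partitioning rule over the state space, which is what makes the distilled policy amenable to the explanation and verification tools mentioned above.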
LC Subject Headings
Neural networks (Computer science), Machine learning