AlphaSMT: A Reinforcement Learning Guided SMT Solver
Satisfiability Modulo Theories (SMT) solvers are programs that decide whether a first-order logic formula is satisfiable. Over the last two decades, these solvers have become central to many methods and tools in fields as diverse as software engineering, verification, security, and artificial intelligence. Most modern SMT solvers provide user-controllable strategies: users can customize a solving algorithm for their own problems by combining individual tactics as building blocks. A tactic is a well-defined reasoning step implemented by the SMT solver that simplifies, transforms, or solves the given input SMT formula. The ability to customize a strategy for a specialized type of formula is important, since no existing strategy is optimal for all instances. However, finding a good customized strategy is challenging even for experts. In this thesis we present a novel class of reinforcement-learning (RL) guided methods, implemented in the Z3 SMT solver and referred to as AlphaSMT, which adaptively construct the expected best strategy for any given input SMT formula. Briefly, the AlphaSMT RL framework combines deep Monte-Carlo Tree Search (MCTS) and logical reasoning in a unique way, enabling the RL agent to learn the best combination of tactics for a given class of formulas. In more detail, a deep neural network serves as both the value function and the policy: it evaluates state-action pairs and decides which tactic to apply at each step. The network is trained toward the optimal policy by learning from self-exploring sample solving processes, with MCTS providing a lookahead planning step for each decision made during sampling.
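The core loop described above, MCTS-style selection over a menu of tactics, can be sketched in a few lines. This is a minimal illustration, not the thesis implementation: the tactic names echo real Z3 tactics, but the reward function is a simulated stand-in for actually running a tactic on a formula, and the deep-network value/policy estimates are replaced by running averages.

```python
import math
import random

# Illustrative tactic menu; Z3 provides tactics with these names, among many others.
TACTICS = ["simplify", "solve-eqs", "smt"]

def uct_score(total_value, visits, parent_visits, c=1.4):
    """Upper-confidence bound that MCTS uses to trade off exploration vs. exploitation."""
    if visits == 0:
        return float("inf")  # always try an unvisited tactic first
    return total_value / visits + c * math.sqrt(math.log(parent_visits) / visits)

def simulate(tactic):
    """Stand-in for applying the tactic to a formula within its timeout:
    returns a reward in [0, 1] (e.g. closer to 1 the more progress was made)."""
    base = {"simplify": 0.3, "solve-eqs": 0.5, "smt": 0.7}[tactic]
    return min(1.0, max(0.0, base + random.uniform(-0.1, 0.1)))

def mcts(iterations=200, seed=0):
    """Run MCTS sampling and return the most-visited tactic (the standard final choice)."""
    random.seed(seed)
    stats = {t: {"value": 0.0, "visits": 0} for t in TACTICS}
    for i in range(1, iterations + 1):
        # Selection: pick the tactic with the highest UCT score.
        t = max(TACTICS, key=lambda x: uct_score(stats[x]["value"], stats[x]["visits"], i))
        # Simulation + backpropagation: record the sampled reward.
        stats[t]["visits"] += 1
        stats[t]["value"] += simulate(t)
    return max(TACTICS, key=lambda t: stats[t]["visits"])

print(mcts())
```

In AlphaSMT the single-step bandit above becomes a tree search over tactic sequences, and the running value averages are replaced by a trained neural network that supplies both the value estimates and the selection policy.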
We evaluated the performance of AlphaSMT on benchmark sets from three SMT logics, namely, quantifier-free non-linear real arithmetic (QF_NRA), quantifier-free non-linear integer arithmetic (QF_NIA), and quantifier-free bit-vectors (QF_BV). In all these logics, AlphaSMT outperforms its base solver, Z3, solving up to 80.5% more instances on a testing set. Evaluation results also show that a moderately longer tactic timeout helps solve more instances, and that a pre-solver contributes significantly to the speedup.
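The pre-solver pattern mentioned above, running a cheap default configuration briefly before committing to the learned strategy, can be sketched as follows. The function names and the toy solvers are illustrative assumptions, not the thesis implementation.

```python
def presolve_then_strategy(formula, presolver, strategy, presolver_timeout=1.0):
    """Try a fast default solver first; fall back to the learned strategy.
    Easy instances finish in the pre-solve phase, which lifts average speed
    without hurting the hard instances that need the full strategy."""
    result = presolver(formula, timeout=presolver_timeout)
    if result in ("sat", "unsat"):
        return result          # solved cheaply; the learned strategy never runs
    return strategy(formula)   # hard instance: run the customized strategy

# Toy stand-ins: an "easy" formula the pre-solver handles,
# and a "hard" one that times out and falls through.
def toy_presolver(formula, timeout):
    return "sat" if formula == "easy" else "unknown"

def toy_strategy(formula):
    return "unsat"

print(presolve_then_strategy("easy", toy_presolver, toy_strategy))
print(presolve_then_strategy("hard", toy_presolver, toy_strategy))
```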
Cite this version of the work
Zhengyang Lu (2023). AlphaSMT: A Reinforcement Learning Guided SMT Solver. UWSpace. http://hdl.handle.net/10012/19562