OppropBERT: An Extensible Graph Neural Network and BERT-style Reinforcement Learning-based Type Inference System

Jha, Piyush

OppropBERT: An Extensible Graph Neural Network and BERT-style Reinforcement Learning-based Type Inference System

Files

Jha_Piyush.pdf (776.66 KB)

Date

2022-12-20

Authors

Jha, Piyush

Advisor

Dietl, Werner

Publisher

University of Waterloo

Abstract

Built-in type systems for statically-typed programming languages (e.g., Java) can only prevent rudimentary and domain-specific errors at compile time. They do not check for type errors in other domains, e.g., to prevent null pointer exceptions or enforce owner-as-modifier encapsulation discipline. Optional Properties (Opprop) or Pluggable type systems, e.g., Checker Framework, provide a framework where users can specify type rules and guarantee the particular property holds with the help of a type checker. However, manually inserting these user-defined type annotations for new and existing large projects requires a lot of human effort. Inference systems like Checker Framework Inference provide a constraint-based whole-program inference framework. However, to develop such a system, the developer must thoroughly understand the underlying framework (Checker Framework) and the accompanying helper methods, which is time-consuming. Furthermore, these frameworks make expensive calls to SAT and SMT solvers, which increases the runtime overhead during inference. The developers write test cases to ensure their framework covers all the possible type rules and works as expected. Our core idea is to leverage only these manually written test cases to create a Deep Learning model to learn the type rules implicitly using a data-driven approach. We present a novel model, OppropBERT, which takes as an input the raw code along with its Control Flow Graphs to predict the error heatmap or the type annotation. The pre-trained BERT-style Transformer model helps encode the code tokens without specifying the programming language's grammar including the type rules. Moreover, using a custom masked loss function, the Graph Convolutional Network better captures the Control Flow Graphs. Suppose a sound type checker is already provided, and the developer wants to create an inference framework. In that case, the model, as mentioned above, can be refined further using a Proximal Policy Optimization (PPO)-based reinforcement learning (RL) technique. The RL agent enables the model to use a more extensive set of publicly available code (not written by the developer) to create training data artificially. The RL feedback loop reduces the effort of manually creating additional test cases, leveraging the feedback from the type checker to predict the annotation better. Extensive and comprehensive experiments are performed to establish the efficacy of OppropBERT for nullness error prediction and annotation prediction tasks by comparing against state-of-the-art tools like Spotbugs, Eclipse, IntelliJ, and Checker Framework on publicly available Java projects. We also demonstrate the capability of zero and few-shot transfer learning to a new type system. Furthermore, to illustrate the model's extensibility, we evaluate the model for predicting type annotations in TypeScript and errors in Python by comparing it against the state-of-the-art models (e.g., BERT, CodeBERT, GraphCodeBERT, etc.) on standard benchmarks datasets (e.g., ManyTypes4TS).