Direct and Expressive Type Inference for the Rank 2 Fragment of System F

Lushman, Bradley

Direct and Expressive Type Inference for the Rank 2 Fragment of System F

Files

thesis.pdf (840.08 KB)

Date

2007-09-20T21:44:07Z

Authors

Lushman, Bradley

Publisher

University of Waterloo

Abstract

This thesis develops a semiunification-based type inference procedure for the rank 2 fragment of System F, with an emphasis on practical considerations for the adoption of such a procedure into existing programming languages. Current semiunification-based rank 2 inference procedures (notably that of Kfoury and Wells) are limited in several ways, which hinder their use in real-world settings. First of all, the translation from an instance of the type inference problem to an instance of the semiunification problem (SUP) is indirect; in particular, because of a series of source-level transformations that take place before translation, the translation is not syntax-directed. As a result, type errors discovered during the semiunification process cannot be cleanly translated back to specific subexpressions of the source program that caused the error. Also, because the rank 2 fragment of System F lacks a principal types property, an inference procedure cannot output a single type that encompasses all of a given term's derivable types. The procedure must therefore either rely on user assistance to produce the right type, or simply choose arbitrarily one of the given term's possible types. The algorithm of Kfoury and Wells in particular makes degenerate type assumptions in the absence of user assistance, and consequently produces types that are of no practical use. We build up our system in stages; we begin by improving the SUP translation. Whereas termination for the Kfoury-Wells rank 2 inference procedure is assured by translating terms into instances of the acyclic semiunification problem (a decidable subset of SUP, which is undecidable in general), we formulate and target the R-acyclic semiunification problem---a larger decidable subset of SUP that facilitates a more concise translation from source terms. We next eliminate the source-level transformation of terms, in order to formulate a truly syntax-directed translation from a source term to a set of SUP-like constraints. In doing so, we find that even the full SUP itself is not sufficiently equipped to support such a translation. We formulate USUP, a superset of SUP that incorporates a new class of identifier we call an unknown. We formulate decidable subsets of USUP, and then formulate a truly syntax-directed translation from source terms into USUP, with guaranteed termination. Finally, to address the principal types problem, we introduce a notation for types in which we allow a particular class of variable to stand for type constructors, rather than ordinary types (an idea based on the so-called third-order lambda-calculus). We show that, with third-order types we can not only output large sets of useful types for a given term, without programmer assistance, but the types we output also generalize over more System F types than any type within System F itself can do, and still be a valid type for the source term. Thus, our system increases opportunities for separate compilation and code reuse beyond any existing system of which we are aware. Our system is sound, though incomplete in certain well-characterized ways, despite which our system performs exactly as one would hope on a variety of examples, which we illustrate in this thesis.