Automated Knowledge Discovery using Neural Networks

Panju, Maysum

Automated Knowledge Discovery using Neural Networks

Files

Panju_Maysum.pdf (3.6 MB)

Date

2021-05-21

Authors

Panju, Maysum

Advisor

Ghodsi, Ali

Publisher

University of Waterloo

Abstract

The natural world is known to consistently abide by scientific laws that can be expressed concisely in mathematical terms, including differential equations. To understand the patterns that define these scientific laws, it is necessary to discover and solve these mathematical problems after making observations and collecting data on natural phenomena. While artificial neural networks are powerful black-box tools for automating tasks related to intelligence, the solutions we seek are related to the concise and interpretable form of symbolic mathematics. In this work, we focus on the idea of a symbolic function learner, or SFL. A symbolic function learner can be any algorithm that is able to produce a symbolic mathematical expression that aims to optimize a given objective function. By choosing different objective functions, the SFL can be tuned to handle different learning tasks. We present a model for an SFL that is based on neural networks and can be trained using deep learning. We then use this SFL to approach the computational task of automating discovery of scientific knowledge in three ways. We first apply our symbolic function learner as a tool for symbolic regression, a curve-fitting problem that has traditionally been approached using genetic evolution algorithms. We show that our SFL performs competitively in comparison to genetic algorithms and neural network regressors on a sample collection of regression instances. We also reframe the problem of learning differential equations as a task in symbolic regression, and use our SFL to rediscover some equations from classical physics from data. We next present a machine-learning based method for solving differential equations symbolically. When neural networks are used to solve differential equations, they usually produce solutions in the form of black-box functions that are not directly mathematically interpretable. We introduce a method for generating symbolic expressions to solve differential equations while leveraging deep learning training methods. Unlike existing methods, our system does not require learning a language model over symbolic mathematics, making it scalable, compact, and easily adaptable for a variety of tasks and configurations. The system is designed to always return a valid symbolic formula, generating a useful approximation when an exact analytic solution to a differential equation is not or cannot be found. We demonstrate through examples the way our method can be applied on a number of differential equations that are rooted in the natural sciences, often obtaining symbolic approximations that are useful or insightful. Furthermore, we show how the system can be effortlessly generalized to find symbolic solutions to other mathematical tasks, including integration and functional equations. We then introduce a novel method for discovering implicit relationships between variables in structured datasets in an unsupervised way. Rather than explicitly designating a causal relationship between input and output variables, our method finds mathematical relationships between variables without treating any variable as distinguished from any other. As a result, properties about the data itself can be discovered, rather than rules for predicting one variable from the others. We showcase examples of our method in the domain of geometry, demonstrating how we can re-discover famous geometric identities automatically from artificially generated data. In total, this thesis aims to strengthen the connection between neural networks and problems in symbolic mathematics. Our proposed SFL is the main tool that we show can be applied to a variety of tasks, including but not limited to symbolic regression. We show how using this approach to symbolic function learning paves the way for future developments in automated scientific knowledge discovery.