A Novel Mathematical Framework for the Analysis of Neural Networks

Caterini, Anthony

Files

caterini_anthony.pdf (948.04 KB)

Date

2017-08-22

Authors

Caterini, Anthony

Advisor

Chang, Dong Eui

Publisher

University of Waterloo

Abstract

Over the past decade, Deep Neural Networks (DNNs) have become very popular models for processing large amounts of data because of their successful application in a wide variety of fields. These models are layered, often containing parametrized linear and non-linear transformations at each layer in the network. At this point, however, we do not rigorously understand why DNNs are so effective. In this thesis, we explore one way to approach this problem: we develop a generic mathematical framework for representing neural networks, and demonstrate how this framework can be used to represent specific neural network architectures.

In chapter 1, we start by exploring mathematical contributions to neural networks. We can rigorously explain some properties of DNNs, but these results fail to fully describe the mechanics of a generic neural network. We also note that most approaches to describing neural networks rely upon breaking down the parameters and inputs into scalars, as opposed to referencing their underlying vector spaces, which makes their analysis somewhat awkward. Our framework operates strictly over these spaces, affording a more natural description of DNNs once the mathematical objects that we use are well-defined and understood.

We then develop the generic framework in chapter 3. We are able to describe an algorithm for calculating one step of gradient descent directly over the inner product space in which the parameters are defined. We can also represent the error backpropagation step in a concise and compact form. Besides a standard squared loss or cross-entropy loss, we demonstrate that our framework, including gradient calculation, extends to a more complex loss function involving the first derivative of the network.

After developing the generic framework, we apply it to three specific network examples in chapter 4. We start with the Multilayer Perceptron, the simplest type of DNN, and show how to generate a gradient descent step for it. We then represent the Convolutional Neural Network (CNN), which contains more complicated input spaces, parameter spaces, and transformations at each layer. The CNN, however, still fits into the generic framework. The last structure that we consider is the Deep Auto-Encoder, which has parameters that are not completely independent at each layer. We are able to extend the generic framework to handle this case as well.

In chapter 5, we use some of the results from the previous chapters to develop a framework for Recurrent Neural Networks (RNNs), the sequence-parsing DNN architecture. The parameters are shared across all layers of the network, and thus we require some additional machinery to describe RNNs. We describe a generic RNN first, and then the specific case of the vanilla RNN. We again compute gradients directly over inner product spaces.
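
As a rough illustration of what taking a gradient descent step "directly over the inner product space" can look like, the sketch below treats the weight matrix of a single layer as one element of a space equipped with the Frobenius inner product, rather than as a collection of scalars. Everything in it is an assumption made for illustration and is not taken from the thesis: the single tanh layer, the squared loss, and the names `forward`, `loss`, `gradients`, `frobenius`, and `gradient_step` are all hypothetical. The check at the end compares the inner product of the gradient with a direction `U` against a finite-difference directional derivative of the loss.

```python
import math

def forward(W, b, x):
    """Single illustrative layer: tanh(W x + b), with W stored as a list of rows."""
    return [math.tanh(sum(w * xj for w, xj in zip(row, x)) + bi)
            for row, bi in zip(W, b)]

def loss(W, b, x, y):
    """Squared loss 0.5 * ||f(x) - y||^2."""
    out = forward(W, b, x)
    return 0.5 * sum((o - t) ** 2 for o, t in zip(out, y))

def gradients(W, b, x, y):
    """Gradient of the loss with respect to W (a matrix) and b (a vector)."""
    out = forward(W, b, x)
    # Propagate the error back through tanh, whose derivative is 1 - tanh^2.
    delta = [(o - t) * (1.0 - o * o) for o, t in zip(out, y)]
    grad_W = [[d * xj for xj in x] for d in delta]  # outer product of delta and x
    grad_b = list(delta)
    return grad_W, grad_b

def frobenius(A, B):
    """Frobenius inner product <A, B> on the space of matrices."""
    return sum(a * c for ra, rb in zip(A, B) for a, c in zip(ra, rb))

def gradient_step(W, b, x, y, lr):
    """One gradient descent step, moving W and b as whole objects."""
    grad_W, grad_b = gradients(W, b, x, y)
    new_W = [[w - lr * g for w, g in zip(rw, rg)] for rw, rg in zip(W, grad_W)]
    new_b = [bi - lr * g for bi, g in zip(b, grad_b)]
    return new_W, new_b

# Arbitrary example values (assumed, for illustration only).
W = [[0.1, -0.2, 0.3], [0.4, 0.0, -0.1]]
b = [0.0, 0.1]
x = [1.0, 2.0, -1.0]
y = [0.5, -0.5]

# Directional derivative of the loss along a matrix direction U,
# estimated by a central finite difference.
U = [[1.0, 0.5, -0.5], [0.0, 2.0, 1.0]]
eps = 1e-6
W_plus = [[w + eps * u for w, u in zip(rw, ru)] for rw, ru in zip(W, U)]
W_minus = [[w - eps * u for w, u in zip(rw, ru)] for rw, ru in zip(W, U)]
numeric = (loss(W_plus, b, x, y) - loss(W_minus, b, x, y)) / (2 * eps)

# The same quantity expressed as an inner product with the gradient.
grad_W, _ = gradients(W, b, x, y)
analytic = frobenius(grad_W, U)

loss_before = loss(W, b, x, y)
W_new, b_new = gradient_step(W, b, x, y, lr=0.1)
loss_after = loss(W_new, b_new, x, y)

print("directional derivative (finite difference):", numeric)
print("inner product <grad_W, U>:", analytic)
print("loss before and after one step:", loss_before, loss_after)
```

The two printed derivative values should agree closely, which is the sense in which the gradient here is defined by the inner product on the parameter space rather than coordinate by coordinate.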