A Novel Mathematical Framework for the Analysis of Neural Networks
Abstract
Over the past decade, Deep Neural Networks (DNNs) have become very popular models for processing large amounts of data because of their successful application in a wide variety of fields. These models are layered, often containing parametrized linear and nonlinear transformations at each layer in the network. At this point, however, we do not rigorously understand why DNNs are so effective. In this thesis, we explore one way to approach this problem: we develop a generic mathematical framework for representing neural networks, and demonstrate how this framework can be used to represent specific neural network architectures.
In chapter 1, we start by exploring mathematical contributions to neural networks. We can rigorously explain some properties of DNNs, but these results fail to fully describe the mechanics of a generic neural network. We also note that most approaches to describing neural networks rely upon breaking down the parameters and inputs into scalars, as opposed to referencing their underlying vector spaces, which adds some awkwardness to their analysis. Our framework strictly operates over these spaces, affording a more natural description of DNNs once the mathematical objects that we use are well-defined and understood.
We then develop the generic framework in chapter 3. We are able to describe an algorithm for calculating one step of gradient descent directly over the inner product space in which the parameters are defined. Also, we can represent the error backpropagation step in a concise and compact form. Besides a standard squared loss or cross-entropy loss, we also demonstrate that our framework, including gradient calculation, extends to a more complex loss function involving the first derivative of the network.
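As a rough sketch of what such a step can look like, in notation assumed here for illustration rather than taken from the thesis: write the network as a composition $F = f_L \circ \cdots \circ f_1$ with layer maps $f_i : E_i \times H_i \to E_{i+1}$, where $E_i$ holds the layer inputs and $H_i$ holds the parameters $\theta_i$, each an inner product space. With $\mathrm{D}_x f_i$ and $\mathrm{D}_\theta f_i$ denoting the derivatives of $f_i$ in its two arguments, ${}^{*}$ the adjoint, and $\eta$ an assumed learning rate, the chain rule for the squared loss gives:

```latex
\begin{align*}
J(x, y; \theta) &= \tfrac{1}{2}\,\langle F(x;\theta) - y,\; F(x;\theta) - y \rangle, \\
x_1 = x, \qquad x_{i+1} &= f_i(x_i; \theta_i), \quad i = 1, \dots, L, \\
e_L = x_{L+1} - y, \qquad e_{i-1} &= \big(\mathrm{D}_x f_i(x_i;\theta_i)\big)^{*} e_i, \\
\nabla_{\theta_i} J &= \big(\mathrm{D}_\theta f_i(x_i;\theta_i)\big)^{*} e_i, \\
\theta_i &\leftarrow \theta_i - \eta\, \nabla_{\theta_i} J .
\end{align*}
```

Here the error $e_i$ lives in $E_{i+1}$, the output space of layer $i$, so every quantity stays in the space where it is defined and no scalar indexing is needed.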
After developing the generic framework, we apply it to three specific network examples in chapter 4. We start with the Multilayer Perceptron, the simplest type of DNN, and show how to generate a gradient descent step for it. We then represent the Convolutional Neural Network (CNN), which contains more complicated input spaces, parameter spaces, and transformations at each layer. The CNN, however, still fits into the generic framework. The last structure that we consider is the Deep Auto-Encoder, which has parameters that are not completely independent at each layer. We are able to extend the generic framework to handle this case as well.
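For the Multilayer Perceptron case, a gradient descent step can be sketched with each layer's weight matrix and bias vector treated as whole objects rather than as collections of scalars. The code below is a minimal illustration under assumptions that are not from the thesis: a tanh nonlinearity, a squared loss, and the hypothetical function names `forward`, `loss`, `gradients`, and `gradient_step`.

```python
import numpy as np

def forward(weights, biases, x):
    """Return the layer states [x_1, ..., x_{L+1}], with x_1 = x."""
    states = [x]
    for W, b in zip(weights, biases):
        # Assumed layer map: x -> tanh(W x + b)
        states.append(np.tanh(W @ states[-1] + b))
    return states

def loss(weights, biases, x, y):
    """Squared loss 0.5 * ||F(x) - y||^2."""
    return 0.5 * np.sum((forward(weights, biases, x)[-1] - y) ** 2)

def gradients(weights, biases, x, y):
    """Backpropagate the output error through the layers in reverse order."""
    states = forward(weights, biases, x)
    e = states[-1] - y  # error at the network output
    grad_W, grad_b = [], []
    for i in reversed(range(len(weights))):
        # tanh'(z) written in terms of the layer output: 1 - tanh(z)^2
        delta = (1.0 - states[i + 1] ** 2) * e
        grad_W.append(np.outer(delta, states[i]))  # gradient w.r.t. the matrix W_i
        grad_b.append(delta)                       # gradient w.r.t. the vector b_i
        e = weights[i].T @ delta                   # adjoint of x -> W_i x
    return grad_W[::-1], grad_b[::-1]

def gradient_step(weights, biases, x, y, lr=0.1):
    """Return updated parameters after one gradient descent step."""
    grad_W, grad_b = gradients(weights, biases, x, y)
    new_weights = [W - lr * g for W, g in zip(weights, grad_W)]
    new_biases = [b - lr * g for b, g in zip(biases, grad_b)]
    return new_weights, new_biases
```

The transpose `weights[i].T` plays the role of the adjoint of the linear map at each layer, which is what carries the error backward from one layer's output space to its input space.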
In chapter 5, we use some of the results from the previous chapters to develop a framework for Recurrent Neural Networks (RNNs), the sequence-parsing DNN architecture. The parameters are shared across all layers of the network, and thus we require some additional machinery to describe RNNs. We describe a generic RNN first, and then the specific case of the vanilla RNN. We again compute gradients directly over inner product spaces.
Cite this version of the work
Anthony Caterini (2017). A Novel Mathematical Framework for the Analysis of Neural Networks. UWSpace. http://hdl.handle.net/10012/12173
Related items

A Mixed Signal 65nm CMOS Implementation of a Spiking Neural Network
Yan, Yangtian (University of Waterloo, 2022-08-26) Spiking neural networks (SNNs) are an emerging class of biologically inspired Artificial Neural Networks implemented in machine learning and artificial intelligence. Current state-of-the-art small and large-scale SNNs ...
Some Mathematical Perspectives of Graph Neural Networks
Nguyen, Duy (University of Waterloo, 2022-05-12) Many real-world entities can be modelled as graphs, such as molecular structures, social networks, or images. Despite coming with such a great expressive power, the complex structure of graphs poses significant challenges ...
Text Detection and Recognition in the Wild
Raisi, Zobeir (University of Waterloo, 2022-07-19) Text detection and recognition (TDR) in highly structured environments with a clean background and consistent fonts (e.g., office documents, postal addresses and bank cheques) is a well-understood problem (i.e., OCR), however ...