Analysis of Neural Networks with Physics Applications
Advisor
Yevick, David
Publisher
University of Waterloo
Abstract
This thesis investigates core aspects of machine learning, spanning foundational studies
on generalization phenomena in neural networks, novel architectural strategies for enhancing
representation learning and classification performance, and high-accuracy predictive
and inverse modeling of emerging nanoelectronic devices. Together, these studies highlight
the significance of data and model structure, the impact of nonlinearity, and the potential
of interpretable, generalizable machine learning methods for scientific and engineering
applications.
For generalization in neural networks, the thesis focuses on the phenomenon of grokking,
a delayed generalization effect where models initially overfit but eventually learn to generalize
well after extended training. Through a series of interconnected studies, this work
proposes insights and practical tools to diagnose, forecast, and enhance generalization
in modern machine learning systems. The first part of the thesis examines grokking in
modular arithmetic tasks, revealing how dropout-induced variance, embedding similarity,
activation sparsity, and weight entropy evolve across training, and hence introduces
diagnostic metrics to capture phase transitions between memorization and generalization.
Further analysis shows that nonlinearity, network depth, and symmetry in data collectively
modulate grokking behavior, linking model architecture to its capacity for structured generalization.
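Two of the diagnostics named above, activation sparsity and weight entropy, can be illustrated with a minimal sketch. The definitions below are illustrative assumptions (a histogram-based Shannon entropy and a near-zero activation fraction), not the thesis's exact formulations:

```python
import numpy as np

def weight_entropy(weights, bins=50):
    """Shannon entropy of the weight-value histogram (one plausible
    definition; the thesis's exact metric may differ)."""
    hist, _ = np.histogram(weights.ravel(), bins=bins)
    p = hist / hist.sum()
    p = p[p > 0]  # drop empty bins so log is defined
    return -np.sum(p * np.log(p))

def activation_sparsity(acts, eps=1e-6):
    """Fraction of near-zero activations, e.g. after a ReLU."""
    return np.mean(np.abs(acts) < eps)

# Synthetic weights and ReLU-like activations stand in for a real network.
rng = np.random.default_rng(0)
w = rng.normal(size=(256, 256))
a = np.maximum(rng.normal(size=(1024, 256)), 0.0)
print(f"weight entropy:      {weight_entropy(w):.3f}")
print(f"activation sparsity: {activation_sparsity(a):.3f}")
```

Tracking such scalars across epochs is one way a phase transition from memorization to generalization could be made visible as an abrupt change in the curves.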
Next, the thesis introduces a Branched Variational Autoencoder (BVAE), a hybrid
architecture that integrates generative and discriminative objectives. By shaping latent
representations through a supervised branch, the BVAE achieves improved class separability
and interpretability on benchmark datasets, illustrating the potential of structured
latent shaping for semi-supervised learning.
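The branched design can be sketched as a single forward pass in NumPy: a shared encoder produces a reparameterized latent, which feeds both a generative decoder branch and a supervised classification branch. All dimensions, weights, and loss weightings here are toy assumptions for illustration, not the BVAE's actual configuration:

```python
import numpy as np

rng = np.random.default_rng(1)

def linear(x, w, b):
    return x @ w + b

# Toy dimensions: 8-d input, 3-d latent, 4 classes, 16 samples.
d_in, d_z, n_cls, n = 8, 3, 4, 16
x = rng.normal(size=(n, d_in))
y = rng.integers(0, n_cls, size=n)

# Shared encoder -> (mu, logvar); reparameterized latent z.
W_mu, b_mu = rng.normal(size=(d_in, d_z)), np.zeros(d_z)
W_lv, b_lv = rng.normal(size=(d_in, d_z)), np.zeros(d_z)
mu, logvar = linear(x, W_mu, b_mu), linear(x, W_lv, b_lv)
z = mu + np.exp(0.5 * logvar) * rng.normal(size=mu.shape)

# Branch 1 (generative): decoder reconstructs the input.
W_dec, b_dec = rng.normal(size=(d_z, d_in)), np.zeros(d_in)
recon = linear(z, W_dec, b_dec)
recon_loss = np.mean((recon - x) ** 2)

# Branch 2 (discriminative): classify from the same latent,
# which is what shapes z toward class separability.
W_cls, b_cls = rng.normal(size=(d_z, n_cls)), np.zeros(n_cls)
logits = linear(z, W_cls, b_cls)
logits -= logits.max(axis=1, keepdims=True)  # softmax stability
logp = logits - np.log(np.sum(np.exp(logits), axis=1, keepdims=True))
cls_loss = -np.mean(logp[np.arange(n), y])

# KL term for a unit-Gaussian prior on z.
kl = -0.5 * np.mean(1 + logvar - mu**2 - np.exp(logvar))

# Joint objective; the unit branch weights are placeholders.
total = recon_loss + 1.0 * kl + 1.0 * cls_loss
print(f"total loss: {total:.3f}")
```

Training would backpropagate this joint objective, so the supervised branch regularizes the latent space that the generative branch decodes from.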
Finally, the research extends to scientific machine learning, demonstrating how neural
networks and ensemble models such as Random Forests can accelerate the modeling and inverse design
of Carbon Nanotube Tunnel Field-Effect Transistors (CNT TFETs). By coupling physical
insights with machine learning interpretability techniques, this work bridges the gap
between theoretical ML and real-world scientific applications.
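The forward-modeling step might look like the sketch below: a Random Forest surrogate fit on simulated device data, with feature importances as a first interpretability signal. The parameter names and the target function are hypothetical stand-ins, not actual CNT TFET physics or the thesis's dataset:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(2)

# Hypothetical device parameters: nanotube diameter (nm) and gate
# voltage (V); the target is a toy proxy for a simulated on-current.
n = 500
diameter = rng.uniform(0.8, 2.0, n)
v_gate = rng.uniform(0.0, 1.0, n)
X = np.column_stack([diameter, v_gate])
y = np.exp(-1.0 / diameter) * v_gate**2 + 0.01 * rng.normal(size=n)

# Fit the surrogate; a trained forest can then replace costly
# device simulation inside an inverse-design search loop.
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
print(f"training R^2: {model.score(X, y):.3f}")

# Feature importances give a first interpretability signal.
for name, imp in zip(["diameter", "v_gate"], model.feature_importances_):
    print(f"{name}: {imp:.2f}")
```

Inverse design would then invert the surrogate, for example by searching parameter space for inputs whose predicted current matches a target specification.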