Mathematics (Faculty of)
http://hdl.handle.net/10012/9924
Mon, 12 Apr 2021 04:08:43 GMT
2021-04-12T04:08:43Z

Simple Termination Criteria for Stochastic Gradient Descent Algorithm
http://hdl.handle.net/10012/16872
Simple Termination Criteria for Stochastic Gradient Descent Algorithm
Baghal, Sina
The stochastic gradient descent (SGD) algorithm is widely used in modern mathematical optimization. Because of its scalability and ease of implementation, SGD is usually preferred to other methods, including the gradient descent algorithm, in large-scale optimization. Like other iterative methods, SGD must be paired with a strategy for terminating the algorithm in order to prevent a phenomenon called overfitting. As overfitting is prevalent in supervised machine learning and noisy optimization problems, developing simple and practical termination criteria is therefore important. This thesis focuses on developing simple termination criteria for SGD for two fundamental problems: binary linear classification and least squares deconvolution.
In the binary linear classification problem, we introduce a new and simple termination criterion for SGD applied to binary classification using logistic regression and hinge loss with constant stepsize $\alpha>0$. Precisely, we terminate the algorithm once the margin is at least 1. Namely,
$$
\text{Terminate when }(2y_{k+1}-1)\zeta_{k+1}^T\theta_k\geq 1
$$
where $\theta_k$ is the current iterate of SGD and $(\zeta_{k+1},y_{k+1})$ is the data point sampled at the next iteration of SGD. Notably, our proposed criterion adds no additional computational cost to the SGD algorithm. We analyze the behavior of the classifier at termination in a setting where the data are sampled from normal distributions with unknown means $\mu_0,\mu_1\in \mathbb{R}^d$ and common covariance $\sigma^2I_d$. Here $\sigma>0$ and $I_d$ is the $d \times d$ identity matrix. As such, we make no assumptions on the separability of the data set. When the variance is not too large, we have the following results:
\begin{enumerate}
\item The test will be activated for any fixed positive stepsize. In particular, we establish an upper bound on the expected number of iterations before activation occurs. This upper bound tends to a numerical constant as $\sigma$ converges to zero. In fact, we show that the expected time until termination decreases linearly as the data becomes more separable (\textit{i.e.}, as the noise $\sigma \to 0$).
\item We prove that the accuracy of the classifier at termination nearly matches that of an optimal classifier. Accuracy is the fraction of predictions a classification model gets right, while an optimal classifier minimizes the probability of misclassification when the sample is drawn from the same distribution as the training data.
\end{enumerate}
When the variance is large, we show that the test will be activated for a sufficiently small stepsize. Finally, we empirically evaluate the performance of our termination criterion against a baseline competitor. We compare performance on both synthetic (Gaussian and heavy-tailed $t$-distribution) and real data sets (MNIST and CIFAR-10). In our experiments, we observe that our test yields relatively accurate classifiers with small variation across multiple runs.
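As a concrete illustration of the stopping rule above, here is a minimal sketch of constant-stepsize SGD on the logistic loss that terminates as soon as the freshly sampled point already has margin at least 1. The data model (balanced Gaussian classes with means $\mu_0,\mu_1$ and covariance $\sigma^2 I_d$) follows the setup described above, but the function name, stepsize, and parameter values are illustrative choices of ours, not the thesis's experimental code.

```python
import numpy as np

def sgd_with_margin_stop(mu0, mu1, sigma, alpha=0.5, max_iter=100_000, seed=0):
    """SGD on the logistic loss with the margin-based stopping rule:
    terminate once the freshly sampled point (zeta, y) satisfies
    (2y - 1) * zeta^T theta >= 1. Illustrative sketch only."""
    rng = np.random.default_rng(seed)
    d = len(mu0)
    theta = np.zeros(d)
    for k in range(max_iter):
        y = rng.integers(0, 2)                       # balanced binary label
        mu = mu1 if y == 1 else mu0
        zeta = mu + sigma * rng.standard_normal(d)   # feature ~ N(mu_y, sigma^2 I)
        s = 2 * y - 1                                # label mapped to {-1, +1}
        if s * (zeta @ theta) >= 1:                  # termination test: margin >= 1
            return theta, k                          # reuses the sample: no extra cost
        # constant-stepsize SGD step on the loss log(1 + exp(-s * zeta^T theta))
        grad = -s * zeta / (1.0 + np.exp(s * (zeta @ theta)))
        theta = theta - alpha * grad
    return theta, max_iter

# With well-separated classes (small sigma) the test fires after only a
# handful of iterations, and theta points from mu0 toward mu1.
theta, k = sgd_with_margin_stop(np.array([-2.0, 0.0]), np.array([2.0, 0.0]), sigma=0.1)
```

The returned $\theta$ classifies a point $\zeta$ as class 1 when $\zeta^T\theta > 0$; note that the test itself costs nothing extra, since it evaluates the same inner product the gradient step needs anyway.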
Termination criteria for SGD applied to the least squares deconvolution problem have not been studied in the previous literature. In this thesis, we study the SGD algorithm with a fixed stepsize $\alpha$ applied to the least squares deconvolution problem. We adopt the setting wherein the blurred image is contaminated with Gaussian white noise. Under this model, we first establish a novel concentration inequality showing that, for a small enough stepsize $\alpha$, the SGD path follows the gradient flow trajectory with overwhelming probability. Inspired by numerical observations, we propose a new termination criterion for SGD for least squares deconvolution. As a first step towards theoretical guarantees for our termination criterion, we provide an upper bound on the $\ell_2$-error of the iterate at termination when the gradient descent algorithm is used. We postpone a full analysis of our termination criterion to future work.
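As a simplified illustration of the least squares setting, the sketch below runs fixed-stepsize SGD on $\tfrac{1}{2}\|Kx-b\|^2$, sampling one measurement row per step. The circulant blur matrix, stepsize, and row-sampling scheme are our own toy assumptions, not the thesis's exact deconvolution model.

```python
import numpy as np

def sgd_least_squares(K, b, alpha=0.01, n_steps=5000, seed=0):
    """Fixed-stepsize SGD on f(x) = (1/2)||K x - b||^2, sampling one row of K
    per iteration; m * r * K[i] is an unbiased estimate of the full gradient
    K^T (K x - b). Illustrative sketch only."""
    rng = np.random.default_rng(seed)
    m, n = K.shape
    x = np.zeros(n)
    for _ in range(n_steps):
        i = rng.integers(m)            # sample one measurement (row)
        r = K[i] @ x - b[i]            # residual on that row
        x -= alpha * m * r * K[i]      # unbiased stochastic gradient step
    return x

# A toy 1-D deblurring instance: circulant blur of a sparse signal.
n = 16
kernel = np.array([0.5, 0.25, 0.25])
K = np.zeros((n, n))
for i in range(n):
    for j, h in enumerate(kernel):
        K[i, (i + j) % n] += h
x_true = np.zeros(n); x_true[3] = 1.0; x_true[10] = -2.0
b = K @ x_true                         # noiseless blurred observation
x_hat = sgd_least_squares(K, b)
```

Because this toy instance is noiseless and consistent, SGD converges to the exact solution; with the Gaussian white noise considered in the thesis, the iterates would instead hover near the least squares solution, which is what motivates a termination criterion in the first place.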
Fri, 09 Apr 2021 00:00:00 GMT
http://hdl.handle.net/10012/16872
2021-04-09T00:00:00Z

Computational and Theoretical Insights into Multi-Body Quantum Systems
http://hdl.handle.net/10012/16866
Computational and Theoretical Insights into Multi-Body Quantum Systems
Stasiuk, Andrew
In generality, perfect predictions of the structure and dynamics of multi-body quantum systems are few and far between. As experimental design advances and becomes more refined, experimentally probing the interactions of multiple quantum systems has become commonplace. Predicting this behavior is not a "one size fits all" problem, and this has led to the inception of a multitude of successful theoretical techniques that have made precise and verifiable predictions through, in many cases, clever approximations and assumptions. As the state of the art pushes the quantum frontier into new experimental regimes, many of the old techniques become invalid, and there is often no tractable methodology to fall back on.
This work focuses on expanding the theoretical techniques for making predictions in newly accessible experimental regimes. The transport of quantum information in a room-temperature dipolar spin network is demonstrably diffusive in nature, but much less is known about the transport properties of such a sample at low temperatures. This work presupposes that diffusion remains a good model for incoherent transport at low temperatures, and proposes a new method to calculate the diffusion coefficient, which is reported as a function of the temperature of the ensemble. Further, the interaction of an i.i.d. spin ensemble with a quantized electromagnetic field has long been analyzed via the restriction to the Dicke subspace implicit in the Holstein-Primakoff approximation, as well as within other approximations. This work reanalyzes the conditions under which such a restriction is valid. In regimes where it is shown that restricting to the Dicke subspace would be invalid, the Hamiltonian structure is thoroughly analyzed. Various predictions can be made by appealing to a reduction in effective dimensionality via a direct sum decomposition.
The main theme of the techniques utilized throughout this work is to appeal to a reduction in difficulty via various theoretical tools in order to prepare for an otherwise intractable computational analysis. Computational insights due to this technique have then gone on to motivate directly provable theoretical results, which might otherwise have remained hidden behind the complexity of the structure and dynamics of a multi-body quantum system.
Wed, 31 Mar 2021 00:00:00 GMT
http://hdl.handle.net/10012/16866
2021-03-31T00:00:00Z

Energy-Efficient Transaction Scheduling in Data Systems
http://hdl.handle.net/10012/16864
Energy-Efficient Transaction Scheduling in Data Systems
Korkmaz, Mustafa
Natural short-term fluctuations in the load of transactional data systems present an opportunity for power savings. For example, a system handling 1000 requests per second on average can expect more than 1000 requests in some seconds, fewer in others. By quickly adjusting processing capacity to match such fluctuations, power consumption can be reduced. Many systems do this already, using dynamic voltage and frequency scaling (DVFS) to reduce processor performance and power consumption when the load is low.
DVFS is typically controlled by frequency governors in the operating system or by the processor itself. The work presented in this dissertation shows that transactional data systems can manage DVFS more effectively than the underlying operating system. This is because data systems have more information about the workload, and more control over that workload, than is available to the operating system.
Our goal is to minimize power consumption while ensuring that transaction requests meet specified latency targets. We present energy-efficient scheduling algorithms and systems that manage CPU power consumption and performance within data systems. These algorithms are workload-aware and can accommodate concurrent workloads with different characteristics and latency budgets.
The first technique we present is called POLARIS. It directly manages processor DVFS and controls database transaction scheduling. We show that POLARIS can simultaneously reduce power consumption and reduce missed latency targets, relative to operating-system-based DVFS governors.
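The core intuition, slowing the processor down exactly as far as latency budgets allow, can be sketched in a few lines. This is an illustrative stand-in, not the actual POLARIS algorithm; the function name, the cycle-based cost model, and the frequency list are assumptions of ours.

```python
def pick_frequency(work_cycles, deadline_s, frequencies_hz):
    """Return the lowest CPU frequency (Hz) at which `work_cycles` cycles of
    pending work still finish within `deadline_s` seconds. If no frequency
    meets the deadline, fall back to the fastest one. Illustrative sketch of
    latency-aware DVFS, not the real POLARIS policy."""
    for f in sorted(frequencies_hz):
        if work_cycles / f <= deadline_s:
            return f                   # slowest frequency that meets the target
    return max(frequencies_hz)         # deadline unmeetable: minimize lateness

# A transaction needing 1e9 cycles with a 1 s budget can run at 1.6 GHz
# rather than the maximum 2.4 GHz, saving power while meeting its target.
freq = pick_frequency(1e9, 1.0, [0.8e9, 1.6e9, 2.4e9])
```

A production governor would also have to account for frequency-switch latency, contention between concurrent transactions, and estimation error in the per-transaction work, which is where workload-aware scheduling inside the data system pays off.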
Second, we present PLASM, an energy-efficient scheduler that generalizes POLARIS to support multi-core, multi-processor systems. PLASM controls the distribution of requests to the processors, and it employs POLARIS to manage power consumption locally at each core. We show that PLASM can save power and reduce missed latency targets compared to generic routing techniques such as round-robin.
Tue, 30 Mar 2021 00:00:00 GMT
http://hdl.handle.net/10012/16864
2021-03-30T00:00:00Z

“Wait and see” vaccinating behavior during a pandemic: a game theoretic analysis
http://hdl.handle.net/10012/16861
“Wait and see” vaccinating behavior during a pandemic: a game theoretic analysis
Bhattacharyya, Samit; Bauch, Chris T.
During the 2009 H1N1 pandemic, many individuals adopted a “wait and see” approach to vaccinating until further information was available on the course of the pandemic and emerging vaccine risks. This behaviour implies two sources of strategic interactions between individuals: both perceived vaccine risk and the probability of becoming infected decline as more individuals become vaccinated. Here we analyze the outcome of these two strategic interactions by combining game theory with a mathematical model of disease transmission during an outbreak of a novel influenza strain. We include a case where perceived vaccine risk declines according to the cumulative number of individuals vaccinated. A common Nash equilibrium strategy exhibited by this model is a “wait and see” strategy where some individuals delay the decision to vaccinate, relying on the herd immunity provided by early vaccinators who also act as “guinea pigs” that validate the safety of the vaccine. The occurrence of “wait and see” strategies leads to a higher disease burden than occurs under socially optimal vaccine coverage. The model also exhibits both feedback and feedforward processes. Feedback takes the form of individuals adjusting their vaccinating behaviour to accommodate changing transmissibility or risk parameters. Among other effects, this causes the epidemic peak to occur at approximately the same time across a broad range of R0 values. Feedforward takes the form of high initial perceived vaccine risk perpetuating high perceived vaccine risks (and lower vaccine coverage) throughout the remainder of the outbreak, when perceived risk declines with the cumulative number vaccinated.
This suggests that any effect of risk communication efforts at the start of a pandemic outbreak will be amplified compared to the same level of risk communication effort distributed throughout the outbreak, since any reductions in initial perceived risk will also result in reduced perceived risk throughout the outbreak.
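The feedback between coverage and perceived vaccine risk described above can be illustrated with a deliberately crude simulation: susceptibles vaccinate only while the perceived infection risk exceeds a perceived vaccine risk that decays with cumulative coverage. All parameter values and functional forms below are toy choices of ours, not the paper's game-theoretic model.

```python
import math

def wait_and_see_sir(beta=0.4, gamma=0.2, vax_risk0=0.002, decay=5.0,
                     nu=0.01, days=300, dt=0.1):
    """Toy SIR model with 'wait and see' vaccination: susceptibles vaccinate
    at rate nu only while the force of infection beta*I exceeds a perceived
    vaccine risk that declines as cumulative coverage V grows. Crude Euler
    integration; illustrative only, not the paper's model."""
    S, I, R, V = 0.999, 0.001, 0.0, 0.0
    for _ in range(int(days / dt)):
        vax_risk = vax_risk0 * math.exp(-decay * V)  # early vaccinators lower perceived risk
        rate = nu if beta * I > vax_risk else 0.0    # 'wait and see': nobody moves first
        dS = -beta * S * I - rate * S
        dI = beta * S * I - gamma * I
        dR = gamma * I
        dV = rate * S
        S, I, R, V = S + dt * dS, I + dt * dI, R + dt * dR, V + dt * dV
    return S, I, R, V

S, I, R, V = wait_and_see_sir()
```

Even this caricature reproduces the qualitative delay: at the outset the force of infection is below the initial perceived vaccine risk, so no one vaccinates until prevalence rises, after which growing coverage drives perceived risk down and sustains vaccination.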
The final publication is available at Elsevier via http://dx.doi.org/10.1016/j.vaccine.2011.05.028. © 2011. This manuscript version is made available under the CC BY-NC-ND 4.0 license http://creativecommons.org/licenses/by-nc-nd/4.0/
Tue, 26 Jul 2011 00:00:00 GMT
http://hdl.handle.net/10012/16861
2011-07-26T00:00:00Z