Theory and Results on Restarting Schemes for Accelerated First Order Methods

Date

2024-09-10

Advisor

Vavasis, Stephen
Moursi, Walaa

Publisher

University of Waterloo

Abstract

Composite convex optimization problems are abundant in industry, and first order methods for solving them are growing in popularity as the number of variables reaches into the billions. Because the objective function may be non-smooth, proximal gradient methods are one of the main tools for these problems. These methods benefit from acceleration, which uses the memory of past iterates to add momentum to the iteration and yields an O(1/k^2) convergence rate in function value, where k is the iteration number. Restarting has been observed to speed up accelerated methods further. O'Donoghue and Candes introduced adaptive restart strategies for accelerated first order methods that rely on easily computed conditions and exhibit a large performance boost in practice. A restart works by resetting the momentum accumulated through acceleration. In general, their strategies are heuristics and come without a convergence proof. In this thesis we show that restarting under the O'Donoghue and Candes condition improves the standard convergence rate in special cases. For one-dimensional functions, we prove that their gradient-based restart strategy improves the O(1/k^2) bound. We also study the restarting scheme applied to the method of alternating projections (MAP) for two closed, convex, and nonempty sets. It is shown in Chapter 6 that MAP fits the composite convex paradigm, so acceleration can be applied, and we analyze MAP applied to two hyperplanes in arbitrary dimension. Furthermore, we make observations about why restarts help, what makes a good restart condition, and what is needed to make progress in the general case.
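To illustrate the mechanism described above, the following is a minimal sketch (not taken from the thesis) of accelerated gradient descent with the gradient-based adaptive restart of O'Donoghue and Candes: momentum is reset whenever the gradient at the extrapolated point makes an acute angle with the most recent step. The quadratic test objective, step size, and function names are illustrative assumptions.

```python
# Minimal sketch: accelerated gradient descent with gradient-based adaptive restart.
# Assumptions (not from the thesis): smooth quadratic objective, fixed step size <= 1/L.
import numpy as np

def accel_grad_restart(grad, x0, step, iters=500):
    """Accelerated gradient method; restarts when momentum opposes descent."""
    x_prev = x0.copy()
    y = x0.copy()
    t = 1.0
    for _ in range(iters):
        g = grad(y)
        x = y - step * g                      # gradient step from the extrapolated point
        # Gradient-based restart test: the gradient at y points in the same
        # direction as the step just taken, so momentum is hurting progress.
        if g @ (x - x_prev) > 0:
            t = 1.0                           # reset the momentum
            y = x
        else:
            t_next = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
            y = x + ((t - 1.0) / t_next) * (x - x_prev)
            t = t_next
        x_prev = x
    return x_prev

# Illustrative use on a poorly conditioned quadratic f(x) = 0.5 * x^T A x.
A = np.diag([1.0, 100.0])
sol = accel_grad_restart(lambda x: A @ x, x0=np.array([1.0, 1.0]), step=1.0 / 100.0)
```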

Keywords

FISTA, convex optimization, first order methods, accelerated gradient descent
