Show simple item record

dc.contributor.author: El Khatib, Alaa
dc.date.accessioned: 2020-12-14 17:31:51 (GMT)
dc.date.available: 2020-12-14 17:31:51 (GMT)
dc.date.issued: 2020-12-14
dc.date.submitted: 2020-12-04
dc.identifier.uri: http://hdl.handle.net/10012/16544
dc.description.abstract: Continual learning is a framework in which we aim to move beyond the limitations of the standard, isolated optimization of deep learning models toward a more intelligent setting, where models or agents accumulate skills and knowledge across diverse tasks and over extended periods of time, much as humans do. Like much of neural network research, interest in continual learning has ebbed and flowed over the decades, and has seen a sharp increase over the past few years, buoyed by the successes of deep learning. One obstacle that has dominated continual learning research is the so-called catastrophic forgetting phenomenon: the tendency of neural networks to "forget" older skills and knowledge when they are subsequently optimized for additional tasks. Researchers have proposed various approaches to counter forgetting in neural networks. In this dissertation, we review some of those approaches and build upon them, and we address other aspects of the continual learning problem.

We make the following four contributions. First, we address the critical role of importance estimation in fixed-capacity models, where the aim is to balance countering forgetting against preserving a model's capacity to learn additional tasks. We propose a novel unit importance estimation approach with a small memory and computational footprint. The proposed approach builds on recent work showing that the average of a unit's activation values is a good indicator of its importance, and extends it by taking into account the separation between the class-conditional distributions of those activation values. Second, we observe that most methods that aim to prevent forgetting by explicitly penalizing changes to parameters can be seen as post hoc remedies that ultimately lead to inefficient use of model capacity. We argue that taking the continual learning objective into account requires modifying the optimization approach from the start, rather than only after learning. In particular, we argue that the key to using a model's capacity effectively in the continual learning setting is to drive the optimization process toward learning more general, reusable, and thus durable representations that are less susceptible to forgetting. To that end, we explore the use of supervised and unsupervised auxiliary tasks as regularization, not against forgetting, but against learning representations that narrowly target any single classification task. We show that the approach successfully mitigates forgetting even though it does not penalize forgetting explicitly. Third, we explore the effect of inter-task similarity in sequences of image classification tasks on the overall performance of continual learning models. We show that certain models are adversely affected when the learned tasks are dissimilar, and that, in those cases, a small replay memory, even one holding only 1% of the training data, is enough to significantly improve performance. Fourth, we explore the performance of continual learning models in the so-called multi-head and single-head settings, and approaches to narrowing the gap between the two. We show that unlabelled auxiliary data, not sampled from any task in the learning sequence, can be used to improve performance in the single-head setting.

We provide an extensive empirical evaluation of the proposed approaches and compare their performance against recent continual learning methods in the literature.
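As an illustration of the first contribution, the sketch below scores a single unit by combining its average activation magnitude with a Fisher-style measure of how well its class-conditional activation distributions are separated. The function name, the separation measure, and the way the two terms are combined are assumptions made for illustration only; the abstract does not give the exact formula used in the thesis.

import numpy as np

def unit_importance(activations, labels, eps=1e-8):
    """activations: shape (n_samples,), one unit's activations; labels: shape (n_samples,), class ids."""
    base = np.mean(np.abs(activations))        # baseline: average activation magnitude
    classes = np.unique(labels)
    class_means = np.array([activations[labels == c].mean() for c in classes])
    class_vars = np.array([activations[labels == c].var() for c in classes])
    between = class_means.var()                # spread of the class-conditional means
    within = class_vars.mean() + eps           # average within-class spread
    return base * (1.0 + between / within)     # units whose activations separate classes score higher

A unit whose activations differ strongly across classes would receive a higher importance score and, in a fixed-capacity model, would be protected more strongly from later updates.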
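The third contribution reports that a replay memory as small as 1% of the training data can noticeably improve performance when tasks are dissimilar. The sketch below is a minimal, hypothetical rehearsal buffer of that size; the ReplayMemory class, the per-task uniform sampling policy, and the fixed 1% budget are illustrative assumptions rather than the mechanism described in the thesis.

import random

class ReplayMemory:
    """Keeps roughly `fraction` of each seen task's examples for rehearsal."""

    def __init__(self, fraction=0.01, seed=0):
        self.fraction = fraction
        self.rng = random.Random(seed)
        self.buffer = []                       # (x, y) pairs retained from earlier tasks

    def add_task(self, dataset):
        """dataset: a sequence of (x, y) examples from the task just finished."""
        k = max(1, int(self.fraction * len(dataset)))
        self.buffer.extend(self.rng.sample(list(dataset), k))

    def sample(self, batch_size):
        """Draw a small rehearsal batch to mix into the current task's minibatch."""
        if not self.buffer:
            return []
        return self.rng.sample(self.buffer, min(batch_size, len(self.buffer)))

In use, each minibatch from the current task would be concatenated with a small batch drawn from sample() so that earlier tasks continue to contribute to the loss.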
dc.language.iso: en
dc.publisher: University of Waterloo
dc.subject: deep learning
dc.subject: continual learning
dc.subject: catastrophic forgetting
dc.title: Continual Learning and Forgetting in Deep Learning Models
dc.type: Doctoral Thesis
dc.pending: false
uws-etd.degree.department: Electrical and Computer Engineering
uws-etd.degree.discipline: Electrical and Computer Engineering
uws-etd.degree.grantor: University of Waterloo
uws-etd.degree: Doctor of Philosophy
uws.contributor.advisor: Karray, Fakhri
uws.contributor.affiliation1: Faculty of Engineering
uws.published.city: Waterloo
uws.published.country: Canada
uws.published.province: Ontario
uws.typeOfResource: Text
uws.peerReviewStatus: Unreviewed
uws.scholarLevel: Graduate



