Parallelizing Legendre Memory Unit Training

Chilkuri, Narsimha Reddy

Files

Chilkuri_NarsimhaReddy.pdf (728.61 KB)

Date

2021-07-14

Authors

Chilkuri, Narsimha Reddy

Publisher

University of Waterloo

Abstract

Recently, a new recurrent neural network (RNN) named the Legendre Memory Unit (LMU) was proposed and shown to achieve state-of-the-art performance on several benchmark datasets. Here we leverage the linear time-invariant (LTI) memory component of the LMU to construct a simplified variant that can be parallelized during training (and yet executed as an RNN during inference), resulting in up to 200 times faster training. We note that our efficient parallelizing scheme is general and is applicable to any deep network whose recurrent components are LTI systems. We demonstrate the improved accuracy and decreased parameter count of our new architecture compared to the original LMU and a variety of published LSTM and transformer networks across seven benchmarks. For instance, our LMU sets a new state-of-the-art result on psMNIST, and uses half the parameters while outperforming DistilBERT and LSTM models on IMDB sentiment analysis.
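The claim that an LTI recurrence can be trained in parallel yet run as an RNN at inference rests on a standard identity: a state update of the form m_t = A m_{t-1} + B u_t, started from zero, equals the convolution of the input with the impulse response H_k = A^k B. The following is a minimal sketch of that identity in plain Python, not the thesis's implementation. The matrices `A` and `B`, the input `u`, and all function names are hypothetical values chosen for illustration, and they are not the LMU's actual matrices.

```python
# Illustrative only: a generic discrete LTI recurrence with made-up A and B.

def matvec(A, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def run_recurrent(A, B, u):
    """Sequential (RNN-style) form: m_t = A m_{t-1} + B u_t, with zero initial state."""
    m = [0.0] * len(B)
    states = []
    for u_t in u:
        Am = matvec(A, m)
        m = [am + b * u_t for am, b in zip(Am, B)]
        states.append(m)
    return states

def impulse_response(A, B, n):
    """Return H_k = A^k B for k = 0..n-1."""
    H = [list(B)]
    for _ in range(n - 1):
        H.append(matvec(A, H[-1]))
    return H

def run_parallel(A, B, u):
    """Convolution form: m_t = sum_{k=0..t} H_k u_{t-k}.

    Each time step depends only on the inputs, not on earlier states,
    so all steps could be computed independently of one another.
    """
    n = len(u)
    d = len(B)
    H = impulse_response(A, B, n)
    return [
        [sum(H[k][i] * u[t - k] for k in range(t + 1)) for i in range(d)]
        for t in range(n)
    ]

# Hypothetical 2-dimensional system and a short scalar input sequence.
A = [[0.5, -0.2], [0.1, 0.9]]
B = [1.0, 0.5]
u = [1.0, 0.0, -2.0, 3.0, 0.5]

sequential = run_recurrent(A, B, u)
parallel = run_parallel(A, B, u)

# Largest elementwise difference between the two forms.
max_diff = max(
    abs(s - p)
    for m_seq, m_par in zip(sequential, parallel)
    for s, p in zip(m_seq, m_par)
)
print(max_diff)
```

Both functions produce the same states up to floating-point rounding. The sequential form suits step-by-step inference, while the convolution form removes the dependence on earlier states; a practical implementation would likely evaluate it with batched matrix operations or an FFT rather than the nested loops used here.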

Keywords

Deep Learning, Recurrent Neural Network, Legendre Memory Unit

URI

http://hdl.handle.net/10012/17142

Collections

Theses
Systems Design Engineering
