Model Compression via Generalized Kronecker Product Decomposition

Abdel Hameed, Marawan

Files

AbdelHameed_Marawan.pdf (2.43 MB)

Date

2022-09-26

Authors

Abdel Hameed, Marawan

Advisor

Clausi, David
Zelek, John

Publisher

University of Waterloo

Abstract

Modern convolutional neural network (CNN) architectures, despite their strong performance on a wide range of problems, are generally too large to be deployed on resource-constrained edge devices. In practice, this limits many real-world applications by requiring them to offload computations to cloud-based systems, which raises concerns about privacy and bandwidth. Efficient model design and automated compression methods such as quantization, pruning, knowledge distillation and tensor decomposition have been proposed to allow models to operate in such resource-constrained environments. In particular, tensor decomposition approaches have gained interest in recent years because they can achieve a wide range of compression rates while maintaining efficient memory access patterns. However, they typically cause a significant reduction in model performance on classification tasks after compression. To address this challenge, a new method that improves the performance of decomposition-based model compression has been designed and tested on a variety of classification tasks. Specifically, we compress convolutional layers by generalizing the Kronecker product decomposition to apply to multidimensional tensors, leading to the Generalized Kronecker Product Decomposition (GKPD). Our approach yields a plug-and-play module that can be used as a drop-in replacement for any convolutional layer to simultaneously reduce its memory usage and number of floating-point operations. Experimental results for image classification on the CIFAR-10 and ImageNet datasets using ResNet, MobileNetV2 and SENet architectures, as well as action recognition on HMDB-51 using I3D-ResNet50, substantiate the effectiveness of the proposed approach. We find that GKPD outperforms state-of-the-art decomposition methods including Tensor-Train and Tensor-Ring as well as other relevant compression methods such as pruning and knowledge distillation.
The proposed GKPD method serves as a means of deploying state-of-the-art CNN models on resource-constrained devices without significant accuracy degradation. Furthermore, because GKPD can be used as a drop-in replacement for convolutional layers, it enables CNN model compression with minimal development time, in contrast to approaches such as efficient architecture design.
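
To give a rough sense of why a Kronecker-structured layer can be much smaller than a dense one, the following is a minimal sketch and not the thesis implementation. It assumes, for illustration only, that a convolution weight of shape (filters, channels, height, width) is represented as a sum of `rank` Kronecker products of two smaller 4-D factors, and that the multidimensional product multiplies the factor shapes dimension by dimension. The function names, factor shapes, and rank below are hypothetical and are not taken from the abstract.

```python
from math import prod


def kron2d(a, b):
    """Ordinary Kronecker product of two matrices given as lists of rows."""
    return [
        [x * y for x in row_a for y in row_b]
        for row_a in a
        for row_b in b
    ]


def full_shape(shape_a, shape_b):
    """Shape of the tensor obtained from factors of the given shapes.

    Assumption: each dimension of the result is the product of the
    corresponding factor dimensions, as in the 2-D Kronecker product.
    """
    return tuple(m * n for m, n in zip(shape_a, shape_b))


def factored_param_count(shape_a, shape_b, rank):
    """Parameters stored when keeping `rank` pairs of factors."""
    return rank * (prod(shape_a) + prod(shape_b))


# 2-D case: a 2x2 matrix times a 2x2 matrix gives a 4x4 matrix.
small = kron2d([[1, 2], [3, 4]], [[0, 1], [1, 0]])

# Hypothetical 4-D case: shapes are (filters, channels, height, width).
shape_a = (16, 16, 3, 1)
shape_b = (16, 16, 1, 3)
rank = 4  # illustrative number of summed Kronecker products

dense_shape = full_shape(shape_a, shape_b)
dense_params = prod(dense_shape)
factored_params = factored_param_count(shape_a, shape_b, rank)

print("dense weight shape:", dense_shape)
print("dense parameters:", dense_params)
print("factored parameters:", factored_params)
print("compression ratio:", dense_params / factored_params)
```

Under these illustrative shapes, the factors together hold far fewer values than the dense weight they represent. Raising `rank` trades compression for representational capacity. The accuracy that results from any such choice is an empirical question that this sketch does not address.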