Compression and Analysis of Pre-trained Language Model using Neural Slimming
Abstract
In recent years, neural networks have become powerful tools for decision making and for solving complex problems. In the domain of natural language processing, BERT and its variants significantly outperform other network architectures. BERT learns general linguistic knowledge from a large corpus during pre-training and exploits that knowledge to solve downstream tasks. However, models of this kind are enormous and suffer from overparameterization, which makes deployment on small edge devices less scalable and flexible.
In this thesis, we study how to compress the BERT model via structured pruning. We propose the neural slimming technique to assess the importance of each neuron, and we design a cost function and a pruning strategy to remove neurons that contribute little or nothing to the prediction. After fine-tuning on a downstream task, the model learns a more compact structure; we name the result SlimBERT.
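The abstract does not spell out the mechanism, but a minimal sketch of the idea, assuming neural slimming attaches a learnable scaling coefficient to each neuron and penalizes it with an L1 cost so that near-zero neurons can later be pruned (the class, the hyperparameter lambda_slim, and the threshold below are all hypothetical), might look like this in PyTorch:

```python
import torch
import torch.nn as nn

class SlimmingGate(nn.Module):
    """Hypothetical sketch: learnable per-neuron scaling coefficients.

    Each output neuron of the gated layer is multiplied by a coefficient;
    an L1 cost pushes unimportant coefficients toward zero so that the
    corresponding neurons can be removed structurally after fine-tuning.
    """
    def __init__(self, num_neurons: int):
        super().__init__()
        self.coeff = nn.Parameter(torch.ones(num_neurons))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Scale each neuron's activation by its importance coefficient.
        return x * self.coeff

    def l1_cost(self) -> torch.Tensor:
        # Sparsity-inducing penalty added to the task loss.
        return self.coeff.abs().sum()

    def prune_mask(self, threshold: float = 1e-3) -> torch.Tensor:
        # Neurons whose coefficient stays near zero contribute (almost)
        # nothing to the prediction and are candidates for removal.
        return self.coeff.abs() > threshold

# Usage sketch: gate one feed-forward layer and add the weighted L1
# cost to a (dummy) task loss during fine-tuning.
ffn = nn.Linear(768, 3072)
gate = SlimmingGate(3072)
x = torch.randn(8, 768)
h = gate(ffn(x))
lambda_slim = 1e-4  # assumed sparsity weight
loss = h.pow(2).mean() + lambda_slim * gate.l1_cost()
loss.backward()
```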
We evaluate our method on seven GLUE tasks and recover 94% of the original performance using only 10% of the original parameters. The compressed model also reduces run-time memory and increases inference speed. At the same compression ratio, the proposed approach outperforms knowledge distillation and other structured pruning methods under different metrics. Moreover, our method improves the interpretability of BERT: by analyzing neurons with significant contributions, we observe that BERT utilizes different components and subnetworks for different tasks.
Ruizhou Xu (2022). Compression and Analysis of Pre-trained Language Model using Neural Slimming. UWSpace. http://hdl.handle.net/10012/18586