@inproceedings{shen-etal-2021-whats,
title = "What{'}s Hidden in a One-layer Randomly Weighted Transformer?",
author = "Shen, Sheng and
Yao, Zhewei and
Kiela, Douwe and
Keutzer, Kurt and
Mahoney, Michael",
editor = "Moens, Marie-Francine and
Huang, Xuanjing and
Specia, Lucia and
Yih, Scott Wen-tau",
booktitle = "Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing",
month = nov,
year = "2021",
address = "Online and Punta Cana, Dominican Republic",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2021.emnlp-main.231",
doi = "10.18653/v1/2021.emnlp-main.231",
pages = "2914--2921",
abstract = "We demonstrate that, hidden within one-layer randomly weighted neural networks, there exist subnetworks that can achieve impressive performance, without ever modifying the weight initializations, on machine translation tasks. To find subnetworks for one-layer randomly weighted neural networks, we apply different binary masks to the same weight matrix to generate different layers. Hidden within a one-layer randomly weighted Transformer, we find that subnetworks that can achieve 29.45/17.29 BLEU on IWSLT14/WMT14. Using a fixed pre-trained embedding layer, the previously found subnetworks are smaller than, but can match 98{\%}/92{\%} (34.14/25.24 BLEU) of the performance of, a trained Transformer$_\text{small/base}$ on IWSLT14/WMT14. Furthermore, we demonstrate the effectiveness of larger and deeper transformers in this setting, as well as the impact of different initialization methods.",
}
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="shen-etal-2021-whats">
<titleInfo>
<title>What’s Hidden in a One-layer Randomly Weighted Transformer?</title>
</titleInfo>
<name type="personal">
<namePart type="given">Sheng</namePart>
<namePart type="family">Shen</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Zhewei</namePart>
<namePart type="family">Yao</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Douwe</namePart>
<namePart type="family">Kiela</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Kurt</namePart>
<namePart type="family">Keutzer</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Michael</namePart>
<namePart type="family">Mahoney</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<originInfo>
<dateIssued>2021-11</dateIssued>
</originInfo>
<typeOfResource>text</typeOfResource>
<relatedItem type="host">
<titleInfo>
<title>Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing</title>
</titleInfo>
<name type="personal">
<namePart type="given">Marie-Francine</namePart>
<namePart type="family">Moens</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Xuanjing</namePart>
<namePart type="family">Huang</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Lucia</namePart>
<namePart type="family">Specia</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Scott</namePart>
<namePart type="given">Wen-tau</namePart>
<namePart type="family">Yih</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<originInfo>
<publisher>Association for Computational Linguistics</publisher>
<place>
<placeTerm type="text">Online and Punta Cana, Dominican Republic</placeTerm>
</place>
</originInfo>
<genre authority="marcgt">conference publication</genre>
</relatedItem>
<abstract>We demonstrate that, hidden within one-layer randomly weighted neural networks, there exist subnetworks that can achieve impressive performance, without ever modifying the weight initializations, on machine translation tasks. To find subnetworks for one-layer randomly weighted neural networks, we apply different binary masks to the same weight matrix to generate different layers. Hidden within a one-layer randomly weighted Transformer, we find subnetworks that can achieve 29.45/17.29 BLEU on IWSLT14/WMT14. Using a fixed pre-trained embedding layer, the previously found subnetworks are smaller than, but can match 98%/92% (34.14/25.24 BLEU) of the performance of, a trained Transformer_small/base on IWSLT14/WMT14. Furthermore, we demonstrate the effectiveness of larger and deeper transformers in this setting, as well as the impact of different initialization methods.</abstract>
<identifier type="citekey">shen-etal-2021-whats</identifier>
<identifier type="doi">10.18653/v1/2021.emnlp-main.231</identifier>
<location>
<url>https://aclanthology.org/2021.emnlp-main.231</url>
</location>
<part>
<date>2021-11</date>
<extent unit="page">
<start>2914</start>
<end>2921</end>
</extent>
</part>
</mods>
</modsCollection>
%0 Conference Proceedings
%T What’s Hidden in a One-layer Randomly Weighted Transformer?
%A Shen, Sheng
%A Yao, Zhewei
%A Kiela, Douwe
%A Keutzer, Kurt
%A Mahoney, Michael
%Y Moens, Marie-Francine
%Y Huang, Xuanjing
%Y Specia, Lucia
%Y Yih, Scott Wen-tau
%S Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
%D 2021
%8 November
%I Association for Computational Linguistics
%C Online and Punta Cana, Dominican Republic
%F shen-etal-2021-whats
%X We demonstrate that, hidden within one-layer randomly weighted neural networks, there exist subnetworks that can achieve impressive performance, without ever modifying the weight initializations, on machine translation tasks. To find subnetworks for one-layer randomly weighted neural networks, we apply different binary masks to the same weight matrix to generate different layers. Hidden within a one-layer randomly weighted Transformer, we find subnetworks that can achieve 29.45/17.29 BLEU on IWSLT14/WMT14. Using a fixed pre-trained embedding layer, the previously found subnetworks are smaller than, but can match 98%/92% (34.14/25.24 BLEU) of the performance of, a trained Transformer_small/base on IWSLT14/WMT14. Furthermore, we demonstrate the effectiveness of larger and deeper transformers in this setting, as well as the impact of different initialization methods.
%R 10.18653/v1/2021.emnlp-main.231
%U https://aclanthology.org/2021.emnlp-main.231
%U https://doi.org/10.18653/v1/2021.emnlp-main.231
%P 2914-2921
Markdown (Informal)
[What’s Hidden in a One-layer Randomly Weighted Transformer?](https://aclanthology.org/2021.emnlp-main.231) (Shen et al., EMNLP 2021)
ACL
- Sheng Shen, Zhewei Yao, Douwe Kiela, Kurt Keutzer, and Michael Mahoney. 2021. What’s Hidden in a One-layer Randomly Weighted Transformer?. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 2914–2921, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
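
For readers skimming the abstract above, the following is a minimal, hypothetical NumPy sketch of the core idea it describes: several "layers" are generated by applying different binary masks to one shared, frozen, randomly initialized weight matrix. The names, dimensions, and the random masking scheme here are illustrative assumptions only; the paper learns the masks (the weights themselves are never updated), whereas this sketch simply samples fixed random masks to show the structure.

```python
# Hypothetical illustration of "different binary masks over one shared random
# weight matrix" as described in the abstract; NOT the authors' implementation.
import numpy as np

rng = np.random.default_rng(0)

d = 8            # toy hidden size (assumption)
num_layers = 3   # number of virtual layers sharing one matrix (assumption)
sparsity = 0.5   # fraction of weights kept by each mask (assumption)

W = rng.standard_normal((d, d))  # single shared random weight matrix, never updated

# One independent binary mask per virtual layer (1 = keep weight, 0 = prune).
masks = [(rng.random((d, d)) < sparsity).astype(W.dtype) for _ in range(num_layers)]

def forward(x):
    """Pass x through num_layers masked copies of the same frozen matrix W."""
    h = x
    for m in masks:
        h = np.maximum(0.0, h @ (W * m).T)  # masked linear + ReLU, weights frozen
    return h

x = rng.standard_normal((2, d))  # a toy batch of 2 vectors
print(forward(x).shape)          # (2, 8)
```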