Oct 30, 2023 · We report three main conclusions: (1) after fine-tuning, deeper models generalize more compositionally than shallower models do, but the benefit of additional ...
Nov 27, 2023 · When controlling for total parameter count, deeper transformer language models generalize better than wider ones, up to a point.
Oct 31, 2023 · after fine-tuning, deeper models generalize better out-of-distribution than shallower models do, but the relative benefit of additional layers ...
Jun 16, 2024 · As with language-modeling performance, deeper models tend to attain higher generalization accuracies than shallower models in the same size ...
We report three main conclusions: (1) within each family, deeper models show better language modeling performance, but the relative benefit of additional layers ...
Apr 11, 2024 · This paper demonstrates that deeper transformer models exhibit greater compositional generalization abilities than shallower models.
Apr 13, 2024 · "The Impact of Depth on Compositional Generalization in Transformer Language Models", Petty et al. 2023 (arxiv.org).