Oct 30, 2023 · We report three main conclusions: (1) after fine-tuning, deeper models generalize more compositionally than shallower models do, but the benefit of additional ...
Oct 31, 2023 · after fine-tuning, deeper models generalize better out-of-distribution than shallower models do, but the relative benefit of additional layers ...
Jun 16, 2024 · As with language-modeling performance, deeper models tend to attain higher generalization accuracies than shallower models in the same size ...
We report three main conclusions: (1) within each family, deeper models show better language modeling performance, but the relative benefit of additional layers ...
Apr 10, 2024 · As with language-modeling performance, deeper models tend to attain higher generalization accuracies than shallower models in the same size ...
Apr 11, 2024 · This paper demonstrates that deeper transformer models exhibit greater compositional generalization abilities than shallower models.
Apr 13, 2024 · "The Impact of Depth on Compositional Generalization in Transformer Language Models", Petty et al 2023. arxiv.org.