LightConv and DynamicConv are identical to Transformer Big, except that the self-attention modules are swapped for either fixed or dynamic convolutions. These ...
• Uses the same setting as "Attention Is All You Need". • Replaces the self-attention modules with lightweight or dynamic convolutions. • The encoder and decoder's ...
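For concreteness, here is a minimal sketch of the lightweight convolution this describes: a depthwise convolution whose kernel is softmax-normalized over its width and shared across all channels within a head. The class name, hyperparameters, and tensor layout are illustrative assumptions, not the fairseq implementation (which also adds GLU input projections, weight dropout, and causal padding on the decoder side).

```python
# Minimal sketch of a lightweight convolution, assuming PyTorch.
# dim, kernel_size and num_heads are illustrative hyperparameter names.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LightweightConv(nn.Module):
    """Depthwise conv with softmax-normalized kernels shared within each head."""
    def __init__(self, dim, kernel_size=7, num_heads=8):
        super().__init__()
        assert dim % num_heads == 0
        self.kernel_size = kernel_size
        self.num_heads = num_heads
        # One kernel of width K per head (far fewer parameters than one per channel).
        self.weight = nn.Parameter(torch.randn(num_heads, 1, kernel_size))

    def forward(self, x):
        # x: (batch, time, dim)
        B, T, C = x.shape
        # Normalize each head's kernel over the temporal width, like attention weights.
        weight = F.softmax(self.weight, dim=-1)                    # (H, 1, K)
        # Tie the kernel across the C // H channels belonging to each head.
        weight = weight.repeat_interleave(C // self.num_heads, 0)  # (C, 1, K)
        x = x.transpose(1, 2)                                      # (B, C, T)
        out = F.conv1d(x, weight, padding=self.kernel_size // 2, groups=C)
        return out.transpose(1, 2)                                 # (B, T, C)
```

With kernel width K and H heads, such a layer has only H·K convolution parameters and costs time linear in sequence length, in contrast to the quadratic cost of self-attention.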
In this paper, we show that a very lightweight convolution can perform competitively to the best reported self-attention results. Next, we introduce dynamic ...
Pay Less Attention with Lightweight and Dynamic Convolutions (Wu et al., 2019). This page contains pointers to pre-trained models as well as instructions on ...
Dec 23, 2018 · The dynamic convolution layers seem interesting. They look rather like a dumbed-down version of self-attention once you think about it, since ...
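That comparison can be made concrete: a dynamic convolution predicts its kernel from the current time step's input, so the weights vary per position (as in attention) but only cover a fixed local window rather than the whole sequence. Below is a rough sketch under the same illustrative assumptions as above; kernel_proj and the tensor layout are mine, not the paper's exact code.

```python
# Rough sketch of a dynamic convolution, assuming PyTorch.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicConv(nn.Module):
    """Convolution whose kernel is predicted from the current time step alone."""
    def __init__(self, dim, kernel_size=7, num_heads=8):
        super().__init__()
        assert dim % num_heads == 0
        self.kernel_size = kernel_size
        self.num_heads = num_heads
        # Predict one kernel of width K per head from each position's input vector.
        self.kernel_proj = nn.Linear(dim, num_heads * kernel_size)

    def forward(self, x):
        # x: (batch, time, dim)
        B, T, C = x.shape
        H, K = self.num_heads, self.kernel_size
        # Position-specific kernels, softmax-normalized over the window width.
        kernels = F.softmax(self.kernel_proj(x).view(B, T, H, K), dim=-1)
        # Build a sliding window of K neighbours around every position.
        pad = K // 2
        windows = F.pad(x, (0, 0, pad, pad)).unfold(1, K, 1)  # (B, T, C, K)
        windows = windows.reshape(B, T, H, C // H, K)
        # Weighted sum over the window, per head: attention-like, except the
        # weights depend only on the query position, not on the context elements.
        out = torch.einsum('bthck,bthk->bthc', windows, kernels)
        return out.reshape(B, T, C)
```

Because the kernel for position t depends only on x_t, the cost stays linear in sequence length, whereas self-attention compares every position against every other.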
Jul 25, 2019 · Felix Wu, Angela Fan, Alexei Baevski, Yann N. Dauphin, Michael Auli: Pay Less Attention with Lightweight and Dynamic Convolutions. ICLR 2019.