Data Science by ODS.ai 🦜
First Telegram Data Science channel. Covering all technical and popular stuff about anything related to Data Science: AI, Big Data, Machine Learning, Statistics, general Math and their applications. To reach editors contact: @malev
Improving Transformer Models by Reordering their Sublayers

tl;dr: improve transformers by reordering their sublayers, as in the sandwich transformer

The authors trained #transformer models whose sublayers were reordered at random, and found that some of them outperform the baseline interleaved transformer in #language #modeling.
They observed that, on average, better models contain more self-attention #sublayers at the bottom and more feedforward sublayers at the top.
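
A minimal sketch of the random-reordering setup, assuming (our assumption, not stated in this post) that every random model keeps the same number of self-attention ('s') and feedforward ('f') sublayers as the interleaved baseline, so only the order changes. `random_ordering` and the summary statistic `bottom_attention_share` are hypothetical helpers for illustration; 16 layers is only an illustrative size.

```python
import random
from typing import Optional


def random_ordering(n_layers: int, seed: Optional[int] = None) -> str:
    """Random sublayer order, bottom to top, with n_layers 's' and n_layers 'f'.

    Assumption: the sublayer counts match the interleaved baseline,
    so the parameter count is unchanged and only the order differs.
    """
    rng = random.Random(seed)
    sublayers = ["s"] * n_layers + ["f"] * n_layers
    rng.shuffle(sublayers)
    return "".join(sublayers)


def bottom_attention_share(ordering: str) -> float:
    """Hypothetical statistic: share of 's' sublayers in the bottom half
    of the stack (index 0 is the bottom sublayer)."""
    half = len(ordering) // 2
    total_s = ordering.count("s")
    if total_s == 0:
        return 0.0
    return ordering[:half].count("s") / total_s


baseline = "sf" * 16  # interleaved baseline: 16 layers, 32 sublayers
print(bottom_attention_share(baseline))              # 0.5
print(bottom_attention_share("s" * 16 + "f" * 16))   # 1.0
print(random_ordering(16, seed=0))
```

Under this reading, the observation above corresponds to better models tending to score above 0.5 on such a statistic, i.e. attention concentrated toward the bottom.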

This led them to design a new transformer stack, the sandwich transformer, which consistently improves language modeling perplexity over the baseline at no extra cost in parameters, memory, or training time.
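
A sketch of how a sandwich ordering could be written down, assuming the pattern s^k (sf)^(n-k) f^k for n layers with a "sandwich coefficient" k. That pattern is our reading of the linked paper, not something spelled out in this post. With k = 0 it reduces to the interleaved baseline, and every k keeps n sublayers of each type, which is why the parameter count does not change.

```python
def sandwich_ordering(n_layers: int, k: int) -> str:
    """Sublayer order, bottom to top: s^k (sf)^(n_layers - k) f^k.

    's' = self-attention sublayer, 'f' = feedforward sublayer.
    k = 0 gives the interleaved baseline; larger k pushes more attention
    to the bottom and more feedforward to the top. Assumed pattern.
    """
    if not 0 <= k <= n_layers:
        raise ValueError("k must be between 0 and n_layers")
    return "s" * k + "sf" * (n_layers - k) + "f" * k


print(sandwich_ordering(4, 0))  # sfsfsfsf
print(sandwich_ordering(4, 2))  # sssfsfff
```

Because only the order of the same sublayers changes, a model built from such a string would have exactly the same set of weights as the baseline; only how they are stacked differs.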

paper: https://ofir.io/sandwich_transformer.pdf