Improving Transformer Models by Reordering their Sublayers
tl;dr – improve transformers by reordering their sublayers like the sandwich transformer
The authors trained random #transformer models with reordered sublayers, and found that some perform better than the baseline interleaved transformer in #language #modeling.
They observed that, on average, better-performing models contain more self-attention #sublayers at the bottom and more feedforward sublayers at the top.
This led them to design a new transformer stack, the sandwich transformer, which consistently improves perplexity over the baseline at no extra parameter or training cost.
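The reordering itself is easy to picture as a string over sublayer types: writing `s` for a self-attention sublayer and `f` for a feedforward sublayer, the baseline interleaved model is `sfsf…sf`. A minimal sketch of the sandwich pattern, assuming the paper's scheme of pulling `k` self-attention sublayers to the bottom and `k` feedforward sublayers to the top (the function name and signature here are illustrative, not from the paper's code):

```python
def sandwich_order(n: int, k: int) -> str:
    """Return the sublayer ordering for a sandwich transformer.

    n: number of (self-attention, feedforward) sublayer pairs in the baseline.
    k: sandwich coefficient, 0 <= k <= n. k = 0 recovers the interleaved baseline.
    """
    # k self-attention sublayers at the bottom, n - k interleaved pairs
    # in the middle, k feedforward sublayers at the top.
    return "s" * k + "sf" * (n - k) + "f" * k

# A 16-pair model with sandwich coefficient 6:
print(sandwich_order(16, 6))
# k = 0 gives the plain interleaved stack:
print(sandwich_order(16, 0))
```

Note that both orderings contain exactly `n` sublayers of each type, so the parameter count is unchanged, which is why the improvement comes "at no cost".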
paper: https://ofir.io/sandwich_transformer.pdf