Scheduled DropHead: A Regularization Method for Transformer Models
This paper introduces DropHead, a structured dropout method designed specifically for regularizing the multi-head attention mechanism, a key component of the transformer, a SOTA model for various NLP tasks.
In contrast to conventional dropout mechanisms, which randomly drop individual units or connections, DropHead drops entire attention heads during training. This prevents the multi-head attention model from being dominated by a small subset of heads and reduces the risk of overfitting the training data, making more efficient use of the multi-head attention mechanism.
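A minimal sketch of the idea in PyTorch: a binary mask is sampled per head (rather than per unit), zeroing out whole heads at once. The function name, tensor layout, and the inverted-dropout rescaling by 1/(1-p) are assumptions for illustration; the paper's exact normalization of retained heads may differ.

```python
import torch

def drop_head(attn_output: torch.Tensor, p: float = 0.2, training: bool = True) -> torch.Tensor:
    """Structured dropout over attention heads (illustrative sketch, not the paper's exact code).

    attn_output: per-head attention outputs, shape (batch, n_heads, seq_len, d_head).
    p: probability of dropping each head during training (assumed p < 1).
    """
    if not training or p == 0.0:
        return attn_output
    batch, n_heads = attn_output.shape[:2]
    # Sample one keep/drop decision per head; broadcasting zeros out the whole head.
    mask = (torch.rand(batch, n_heads, 1, 1, device=attn_output.device) > p).float()
    # Rescale surviving heads so the expected magnitude matches evaluation time,
    # analogous to standard inverted dropout.
    return attn_output * mask / (1.0 - p)
```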
paper: https://arxiv.org/abs/2004.13342
#nlp #regularization #transformer