Scheduled DropHead: A Regularization Method for Transformer Models
This paper introduces DropHead, a structured dropout method designed specifically for regularizing the multi-head attention mechanism, a key component of the transformer, a SOTA model for various NLP tasks.
In contrast to conventional dropout mechanisms, which randomly drop individual units or connections, DropHead drops entire attention heads during training. This prevents the multi-head attention model from being dominated by a small subset of heads and reduces the risk of overfitting the training data, making more efficient use of the multi-head attention mechanism.
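A minimal sketch of the idea in PyTorch: a binary mask is sampled per head (rather than per unit), zeroing out whole heads at once. The function name, tensor layout, and the inverted-dropout rescaling by 1/(1-p) are assumptions for illustration; the paper's exact normalization of retained heads may differ.

```python
import torch

def drop_head(attn_output: torch.Tensor, p: float = 0.2, training: bool = True) -> torch.Tensor:
    """Structured dropout over attention heads (illustrative sketch, not the paper's exact code).

    attn_output: per-head attention outputs, shape (batch, n_heads, seq_len, d_head).
    p: probability of dropping each head during training (assumed p < 1).
    """
    if not training or p == 0.0:
        return attn_output
    batch, n_heads = attn_output.shape[:2]
    # Sample one keep/drop decision per head; broadcasting zeros out the whole head.
    mask = (torch.rand(batch, n_heads, 1, 1, device=attn_output.device) > p).float()
    # Rescale surviving heads so the expected magnitude matches evaluation time,
    # analogous to standard inverted dropout.
    return attn_output * mask / (1.0 - p)
```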
paper: https://arxiv.org/abs/2004.13342
#nlp #regularization #transformer