Emerging Cross-lingual Structure in Pretrained Language Models
tl;dr – dissect mBERT & XLM and show monolingual BERTs are similar
They offer an ablation study of bilingual #MLM covering all relevant factors. Sharing only the top 2 layers of the #transformer finally breaks cross-lingual transfer.
Factor importance: parameter sharing >> domain similarity, anchor points, language-universal softmax, joint BPE
We can align monolingual BERT representations at the word & sentence level with an orthogonal mapping. CKA visualizes the similarity of monolingual & bilingual BERT.
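A minimal sketch of those two analysis tools: orthogonal (Procrustes) alignment of two sets of representations, and linear CKA as a similarity score. The toy data and variable names are illustrative assumptions, not the paper's code.

```python
import numpy as np

def orthogonal_map(X, Y):
    """Solve min_W ||XW - Y||_F s.t. W is orthogonal (Procrustes, via SVD).

    X, Y: (n, d) matrices of paired representations, e.g. embeddings of
    aligned words (word level) or mean-pooled sentences (sentence level).
    """
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt  # (d, d) orthogonal mapping from X-space to Y-space

def linear_cka(X, Y):
    """Linear Centered Kernel Alignment between two representation matrices."""
    X = X - X.mean(axis=0, keepdims=True)
    Y = Y - Y.mean(axis=0, keepdims=True)
    hsic = np.linalg.norm(X.T @ Y, "fro") ** 2
    return hsic / (np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro"))

# Toy usage: random features standing in for two monolingual BERT layers.
rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 768))                       # e.g. English BERT, one layer
Q = np.linalg.qr(rng.standard_normal((768, 768)))[0]       # random orthogonal "rotation"
Y = X @ Q + 0.1 * rng.standard_normal((1000, 768))         # e.g. French BERT, same layer

W = orthogonal_map(X, Y)
print("alignment error:", np.linalg.norm(X @ W - Y) / np.linalg.norm(Y))
print("CKA(X, Y):", linear_cka(X, Y))  # linear CKA is invariant to orthogonal transforms
```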
Paper: https://arxiv.org/abs/1911.01464
#nlp #multilingual