Emerging Cross-lingual Structure in Pretrained Language Models
tl;dr – dissect mBERT & XLM and show monolingual BERTs are similar
They offer an ablation study of bilingual #MLM covering all relevant factors. Sharing only the top 2 layers of the #transformer finally breaks cross-lingual transfer.
Factor importance: parameter sharing >> domain similarity, anchor points, language-universal softmax, joint BPE
We can align monolingual BERT representations at the word & sentence level with an orthogonal mapping. CKA visualizes the similarity of monolingual & bilingual BERT.
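A minimal sketch of those two analysis tools: orthogonal (Procrustes) alignment of two sets of representations, and linear CKA as a similarity score. The toy data and variable names are illustrative assumptions, not the paper's code.

```python
import numpy as np

def orthogonal_map(X, Y):
    """Solve min_W ||XW - Y||_F s.t. W is orthogonal (Procrustes, via SVD).

    X, Y: (n, d) matrices of paired representations, e.g. embeddings of
    aligned words (word level) or mean-pooled sentences (sentence level).
    """
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt  # (d, d) orthogonal mapping from X-space to Y-space

def linear_cka(X, Y):
    """Linear Centered Kernel Alignment between two representation matrices."""
    X = X - X.mean(axis=0, keepdims=True)
    Y = Y - Y.mean(axis=0, keepdims=True)
    hsic = np.linalg.norm(X.T @ Y, "fro") ** 2
    return hsic / (np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro"))

# Toy usage: random features standing in for two monolingual BERT layers.
rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 768))                       # e.g. English BERT, one layer
Q = np.linalg.qr(rng.standard_normal((768, 768)))[0]       # random orthogonal "rotation"
Y = X @ Q + 0.1 * rng.standard_normal((1000, 768))         # e.g. French BERT, same layer

W = orthogonal_map(X, Y)
print("alignment error:", np.linalg.norm(X @ W - Y) / np.linalg.norm(Y))
print("CKA(X, Y):", linear_cka(X, Y))  # linear CKA is invariant to orthogonal transforms
```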
Paper: https://arxiv.org/abs/1911.01464
#nlp #multilingual