Bitext Mining For Low-resource Languages Via Contrastive Learning

Weiting Tan, Philipp Koehn. arXiv 2022 – 4 citations

Self-Supervised

Mining high-quality bitexts for low-resource languages is challenging. This paper shows that sentence representations from language models fine-tuned with multiple negatives ranking loss, a contrastive objective, help retrieve clean bitexts. Experiments show that parallel data mined with our approach substantially outperforms the previous state-of-the-art method on the low-resource languages Khmer and Pashto.
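
The contrastive objective named in the abstract can be illustrated with a minimal sketch. It assumes the common formulation of multiple negatives ranking loss: within a batch of aligned sentence pairs, each source embedding is scored against every target embedding by scaled cosine similarity, and cross-entropy treats the aligned target as the correct class while the other targets act as negatives. The function names, the toy two-dimensional embeddings, and the `scale=20.0` value are illustrative assumptions, not details taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors given as lists."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def multiple_negatives_ranking_loss(src_embs, tgt_embs, scale=20.0):
    """Mean in-batch cross-entropy where tgt_embs[i] is the positive for src_embs[i].

    `scale` is an assumed temperature-like multiplier on the cosine scores.
    """
    assert len(src_embs) == len(tgt_embs)
    total = 0.0
    for i, src in enumerate(src_embs):
        logits = [scale * cosine(src, tgt) for tgt in tgt_embs]
        # Numerically stable log-sum-exp over all candidates in the batch.
        m = max(logits)
        log_z = m + math.log(sum(math.exp(x - m) for x in logits))
        # Negative log-probability assigned to the aligned target.
        total += log_z - logits[i]
    return total / len(src_embs)


# Toy embeddings (hypothetical): each source is closest to its own target.
aligned_src = [[1.0, 0.0], [0.0, 1.0]]
aligned_tgt = [[0.9, 0.1], [0.1, 0.9]]
shuffled_tgt = [aligned_tgt[1], aligned_tgt[0]]

loss_aligned = multiple_negatives_ranking_loss(aligned_src, aligned_tgt)
loss_shuffled = multiple_negatives_ranking_loss(aligned_src, shuffled_tgt)
```

In this toy setup the loss is lower when the pairs are correctly aligned than when the targets are shuffled, which is the property that pushes translations together in embedding space and makes nearest-neighbor bitext retrieval cleaner.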
