Exploring Alignment in Shared Cross-lingual Spaces

Mousi, Basel; Durrani, Nadir; Dalvi, Fahim; Hawasly, Majd; Abdelali, Ahmed

arXiv:2405.14535 (cs)

[Submitted on 23 May 2024]

Abstract: Despite the remarkable ability of multilingual embeddings to capture linguistic nuances across diverse languages, questions persist regarding the degree of alignment between languages in these embeddings. Drawing inspiration from research on high-dimensional representations in neural language models, we employ clustering to uncover latent concepts within multilingual models. Our analysis focuses on quantifying the alignment and overlap of these concepts across various languages within the latent space. To this end, we introduce two metrics, CALIGN and COLAP, aimed at quantifying these aspects, enabling a deeper exploration of multilingual embeddings. Our study encompasses three multilingual models (mT5, mBERT, and XLM-R) and three downstream tasks (Machine Translation, Named Entity Recognition, and Sentiment Analysis). Key findings from our analysis include: i) deeper layers in the network demonstrate increased cross-lingual alignment due to the presence of language-agnostic concepts, ii) fine-tuning of the models enhances alignment within the latent space, and iii) such task-specific calibration helps in explaining the emergence of zero-shot capabilities in the models. The code is available at this https URL.

Comments:	ACL 2024
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2405.14535 [cs.CL]
	(or arXiv:2405.14535v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2405.14535

Submission history

From: Nadir Durrani Dr
[v1] Thu, 23 May 2024 13:20:24 UTC (24,549 KB)
