LLM-based speaker diarization correction: A generalizable approach

Efstathiadis, Georgios; Yadav, Vijay; Abbas, Anzar

arXiv:2406.04927 (eess)

[Submitted on 7 Jun 2024 (v1), last revised 13 Sep 2024 (this version, v2)]

Abstract: Speaker diarization is necessary for interpreting conversations transcribed using automated speech recognition (ASR) tools. Despite significant developments in diarization methods, diarization accuracy remains an issue. Here, we investigate the use of large language models (LLMs) for diarization correction as a post-processing step. LLMs were fine-tuned using the Fisher corpus, a large dataset of transcribed conversations. The ability of the models to improve diarization accuracy in a holdout dataset from the Fisher corpus as well as an independent dataset was measured. We report that fine-tuned LLMs can markedly improve diarization accuracy. However, model performance is constrained to transcripts produced using the same ASR tool as the transcripts used for fine-tuning, limiting generalizability. To address this constraint, an ensemble model was developed by combining weights from three separate models, each fine-tuned using transcripts from a different ASR tool. The ensemble model demonstrated better overall performance than each of the ASR-specific models, suggesting that a generalizable and ASR-agnostic approach may be achievable. We have made the weights of these models publicly available on HuggingFace at this https URL.
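
The abstract does not say how the weights of the three ASR-specific models were combined. As a rough illustration only, the sketch below assumes the simplest possible scheme, a uniform element-wise average of parameters that share a name and shape; the authors' actual merging method may differ. The function name `average_weights` and the toy parameter dictionaries are hypothetical and stand in for real model checkpoints.

```python
from typing import Dict, List

# Hypothetical stand-in for a model checkpoint: parameter name -> flat list of values.
Weights = Dict[str, List[float]]


def average_weights(models: List[Weights]) -> Weights:
    """Uniformly average same-named parameters across models.

    Assumption: every model has identical parameter names and sizes,
    as would be the case for fine-tunes of one shared base model.
    """
    if not models:
        raise ValueError("need at least one model")
    names = set(models[0])
    for m in models[1:]:
        if set(m) != names:
            raise ValueError("models must share the same parameter names")

    merged: Weights = {}
    for name in models[0]:
        size = len(models[0][name])
        if any(len(m[name]) != size for m in models):
            raise ValueError(f"parameter {name!r} differs in size across models")
        # zip(*...) groups the i-th value of this parameter from every model.
        merged[name] = [sum(vals) / len(models) for vals in zip(*(m[name] for m in models))]
    return merged


# Toy example: three "models", each imagined as fine-tuned on a different ASR tool.
asr_a = {"w": [1.0, 2.0], "b": [0.0]}
asr_b = {"w": [3.0, 4.0], "b": [3.0]}
asr_c = {"w": [5.0, 6.0], "b": [6.0]}
merged = average_weights([asr_a, asr_b, asr_c])
```

With real checkpoints the same idea would be applied tensor by tensor; more elaborate merging schemes weight or prune the per-model differences from the base model rather than averaging uniformly.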

Subjects:	Audio and Speech Processing (eess.AS); Computation and Language (cs.CL)
Cite as:	arXiv:2406.04927 [eess.AS]
	(or arXiv:2406.04927v2 [eess.AS] for this version)
	https://doi.org/10.48550/arXiv.2406.04927

Submission history

From: Georgios Efstathiadis
[v1] Fri, 7 Jun 2024 13:33:22 UTC (128 KB)
[v2] Fri, 13 Sep 2024 20:42:20 UTC (1,092 KB)
