Clustering Document Parts: Detecting and Characterizing Influence Campaigns from Documents

Wang, Zhengxiang; Rambow, Owen

Computer Science > Computation and Language

arXiv:2402.17151 (cs)

[Submitted on 27 Feb 2024 (v1), last revised 26 Apr 2024 (this version, v2)]


Abstract: We propose a novel clustering pipeline to detect and characterize influence campaigns from documents. This approach clusters parts of documents, detects clusters that likely reflect an influence campaign, and then identifies documents linked to an influence campaign via their association with the high-influence clusters. Our approach outperforms both direct document-level classification and direct document-level clustering in predicting whether a document is part of an influence campaign. We propose various novel techniques to enhance our pipeline, including using an existing event factuality prediction system to obtain document parts, and aggregating multiple clustering experiments to improve the performance of both cluster and document classification. Classifying documents after clustering not only accurately extracts the parts of the documents that are relevant to influence campaigns, but also captures influence campaigns as a coordinated and holistic phenomenon. Our approach enables more fine-grained and interpretable characterizations of influence campaigns from documents.
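The cluster-then-classify idea in the abstract can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation. The clustering step itself is taken as given: each "run" is assumed to map hypothetical part ids to cluster ids, for example from k-means with different random seeds. A cluster is flagged as campaign-related when more than `threshold` of its parts from labeled training documents come from known campaign documents (an assumed criterion). A document's score is then the share of its parts that fall in flagged clusters, averaged over runs to mimic aggregating multiple clustering experiments. The names `flag_clusters`, `score_documents`, and all toy data are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def flag_clusters(assignment, part_doc, train_labels, threshold=0.5):
    """Flag clusters dominated by parts of known campaign documents.

    Assumed criterion (illustrative only): a cluster is flagged when the share
    of its labeled training parts that come from campaign documents exceeds
    `threshold`. Parts of unlabeled (test) documents are ignored here.
    """
    pos = defaultdict(int)  # cluster -> parts from campaign training docs
    tot = defaultdict(int)  # cluster -> parts from any labeled training doc
    for part, cluster in assignment.items():
        doc = part_doc[part]
        if doc in train_labels:
            tot[cluster] += 1
            pos[cluster] += int(train_labels[doc])
    return {c for c in tot if pos[c] / tot[c] > threshold}


def score_documents(runs, part_doc, train_labels, threshold=0.5):
    """Average, over clustering runs, each document's share of parts in flagged clusters."""
    per_doc = defaultdict(list)
    for assignment in runs:
        flagged = flag_clusters(assignment, part_doc, train_labels, threshold)
        hits = defaultdict(int)
        n = defaultdict(int)
        for part, cluster in assignment.items():
            doc = part_doc[part]
            n[doc] += 1
            hits[doc] += cluster in flagged  # bool counts as 0 or 1
        for doc in n:
            per_doc[doc].append(hits[doc] / n[doc])
    return {doc: mean(scores) for doc, scores in per_doc.items()}


# Hypothetical toy data: d1 is a known campaign document, d2 is not, d3 is unlabeled.
part_doc = {"p1": "d1", "p2": "d1", "p3": "d2", "p4": "d2", "p5": "d3", "p6": "d3"}
train_labels = {"d1": True, "d2": False}
runs = [
    {"p1": 0, "p2": 0, "p3": 1, "p4": 1, "p5": 0, "p6": 1},  # clustering run A
    {"p1": 0, "p2": 1, "p3": 1, "p4": 1, "p5": 0, "p6": 0},  # clustering run B
]
scores = score_documents(runs, part_doc, train_labels)
print(scores)  # {'d1': 0.75, 'd2': 0.0, 'd3': 0.75}
```

In this toy example, the unlabeled document d3 receives a score of 0.75 because, averaged over the two runs, three quarters of its parts fall in flagged clusters. A downstream threshold on this score would then decide whether d3 is predicted to belong to an influence campaign. Averaging over runs is one simple way to reduce sensitivity to any single clustering outcome; the paper's actual aggregation may differ.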

Comments:	12 pages, 2 figures, 5 tables
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2402.17151 [cs.CL]
	(or arXiv:2402.17151v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2402.17151

Submission history

From: Zhengxiang Wang
[v1] Tue, 27 Feb 2024 02:36:43 UTC (739 KB)
[v2] Fri, 26 Apr 2024 20:01:28 UTC (739 KB)