Data-Copilot: Bridging Billions of Data and Humans with Autonomous Workflow

Zhang, Wenqi; Shen, Yongliang; Lu, Weiming; Zhuang, Yueting

arXiv:2306.07209v7 (cs)

[Submitted on 12 Jun 2023 (v1), last revised 5 Oct 2024 (this version, v7)]


Abstract: Industries such as finance, meteorology, and energy generate vast amounts of data daily. Managing, processing, and displaying this data efficiently requires specialized expertise and is often tedious and repetitive. Using large language models (LLMs) to build an automated workflow is a promising solution. However, LLMs are not adept at complex numerical computation or table manipulation, and they are constrained by a limited context budget. To address these limitations, we propose Data-Copilot, a data analysis agent that autonomously queries, processes, and visualizes massive data in response to diverse human requests. Its contributions are twofold. First, it is a code-centric agent that receives human requests and generates code as an intermediary for handling massive data, which makes it flexible for large-scale data processing tasks. Second, Data-Copilot runs a data exploration phase in advance to design more universal and error-free interfaces for real-time response. Specifically, it actively explores data sources, discovers numerous common requests, and abstracts them into many universal interfaces for daily invocation. When serving real-time requests, Data-Copilot only needs to invoke these pre-designed interfaces, transforming raw data into visualized outputs (e.g., charts, tables) that best match the user's intent. Compared with generating code from scratch, invoking these pre-designed and compiler-validated interfaces can significantly reduce errors during real-time requests. Additionally, interface workflows are more efficient and more interpretable than code. We open-sourced Data-Copilot with massive Chinese financial data, such as stocks, funds, and news, demonstrating promising application prospects.
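
The idea of answering a request by invoking pre-designed interfaces, rather than generating code from scratch, can be illustrated with a minimal Python sketch. Everything in it is a hypothetical assumption made for illustration: the registry `INTERFACES`, the interface names `get_prices`, `pct_change`, and `to_table`, the `$name` convention for passing earlier outputs, and the toy `PRICES` data. It is not Data-Copilot's actual code, interface library, or workflow format.

```python
from typing import Any, Callable, Dict, List

# Hypothetical registry of pre-designed interfaces (illustrative only).
INTERFACES: Dict[str, Callable[..., Any]] = {}


def interface(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Register a function so that workflows can invoke it by name."""
    INTERFACES[fn.__name__] = fn
    return fn


# Toy in-memory data standing in for a real data source (made-up values).
PRICES: Dict[str, List[float]] = {
    "AAA": [10.0, 11.0, 12.1],
    "BBB": [20.0, 19.0, 19.95],
}


@interface
def get_prices(symbol: str) -> List[float]:
    """Query step: fetch the price series for one symbol."""
    return list(PRICES[symbol])


@interface
def pct_change(series: List[float]) -> List[float]:
    """Processing step: period-over-period change in percent."""
    return [round((b - a) / a * 100, 2) for a, b in zip(series, series[1:])]


@interface
def to_table(series: List[float], label: str) -> str:
    """Display step: render a series as a plain-text table."""
    return "\n".join(f"{label}[{i}] = {v}" for i, v in enumerate(series))


def run_workflow(steps: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Run steps in order; string arguments starting with '$' name earlier outputs."""
    results: Dict[str, Any] = {}
    for step in steps:
        name = step["interface"]
        if name not in INTERFACES:
            # Only registered interfaces may run, never free-form code.
            raise KeyError(f"unknown interface: {name}")
        kwargs = {
            key: results[val[1:]] if isinstance(val, str) and val.startswith("$") else val
            for key, val in step["args"].items()
        }
        results[step["output"]] = INTERFACES[name](**kwargs)
    return results


# A workflow of the kind an LLM planner might emit for a request such as
# "show the daily returns of AAA" (the structure is an assumption).
WORKFLOW = [
    {"interface": "get_prices", "args": {"symbol": "AAA"}, "output": "raw"},
    {"interface": "pct_change", "args": {"series": "$raw"}, "output": "returns"},
    {"interface": "to_table",
     "args": {"series": "$returns", "label": "AAA return %"},
     "output": "table"},
]

if __name__ == "__main__":
    print(run_workflow(WORKFLOW)["table"])
```

In this sketch the workflow is plain data that names only registered functions, so each step can be inspected before it runs. That is one way to read the abstract's claim that interface workflows are easier to interpret and less error-prone than freshly generated code.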

Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computational Engineering, Finance, and Science (cs.CE)
Cite as:	arXiv:2306.07209 [cs.CL]
	(or arXiv:2306.07209v7 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2306.07209

Submission history

From: Wenqi Zhang
[v1] Mon, 12 Jun 2023 16:12:56 UTC (6,220 KB)
[v2] Sun, 21 Apr 2024 12:25:25 UTC (6,829 KB)
[v3] Mon, 6 May 2024 15:36:53 UTC (7,265 KB)
[v4] Tue, 7 May 2024 02:53:28 UTC (7,265 KB)
[v5] Fri, 24 May 2024 16:35:15 UTC (6,250 KB)
[v6] Sun, 4 Aug 2024 17:54:15 UTC (6,250 KB)
[v7] Sat, 5 Oct 2024 22:55:15 UTC (6,529 KB)
