Traffic Refinery: Cost-Aware Data Representation for Machine Learning on Network Traffic

Bronzino, Francesco; Schmitt, Paul; Ayoubi, Sara; Kim, Hyojoon; Teixeira, Renata; Feamster, Nick

doi:10.1145/3491052

arXiv:2010.14605 (cs)

[Submitted on 27 Oct 2020 (v1), last revised 7 Jun 2021 (this version, v3)]

Abstract: Network management often relies on machine learning to make predictions about performance and security from network traffic. Often, the representation of the traffic is as important as the choice of the model. The features that the model relies on, and the representation of those features, ultimately determine model accuracy, as well as where and whether the model can be deployed in practice. Thus, the design and evaluation of these models ultimately requires understanding not only model accuracy but also the systems costs associated with deploying the model in an operational network. Towards this goal, this paper develops a new framework and system that enables a joint evaluation of both the conventional notions of machine learning performance (e.g., model accuracy) and the systems-level costs of different representations of network traffic. We highlight these two dimensions for two practical network management tasks, video streaming quality inference and malware detection, to demonstrate the importance of exploring different representations to find the appropriate operating point. We demonstrate the benefit of exploring a range of representations of network traffic and present Traffic Refinery, a proof-of-concept implementation that both monitors network traffic at 10 Gbps and transforms traffic in real time to produce a variety of feature representations for machine learning. Traffic Refinery both highlights this design space and makes it possible to explore different representations for learning, balancing systems costs related to feature extraction and model training against model accuracy.
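The idea of jointly weighing model accuracy against systems cost to find an operating point can be illustrated with a minimal sketch. This is not Traffic Refinery's API or the paper's method: the `Candidate` type, the helper functions, the representation names, and every number below are hypothetical placeholders chosen only to show the shape of the trade-off.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str        # hypothetical label for a feature representation
    accuracy: float  # model accuracy on held-out data, in [0, 1]
    cost: float      # systems cost in arbitrary units (e.g. CPU time per flow)


def pareto_front(candidates):
    """Keep candidates that no other candidate beats on both accuracy and cost."""
    front = []
    for c in candidates:
        dominated = any(
            o.accuracy >= c.accuracy and o.cost <= c.cost
            and (o.accuracy > c.accuracy or o.cost < c.cost)
            for o in candidates
        )
        if not dominated:
            front.append(c)
    return sorted(front, key=lambda c: c.cost)


def pick_operating_point(candidates, cost_budget):
    """Most accurate non-dominated candidate within the budget, or None."""
    affordable = [c for c in pareto_front(candidates) if c.cost <= cost_budget]
    return max(affordable, key=lambda c: c.accuracy, default=None)


# Placeholder numbers, invented for illustration only (not results from the paper).
candidates = [
    Candidate("packet_counters", accuracy=0.80, cost=1.0),
    Candidate("flow_statistics", accuracy=0.88, cost=3.0),
    Candidate("application_layer", accuracy=0.90, cost=10.0),
    Candidate("redundant_mix", accuracy=0.85, cost=6.0),
]

choice = pick_operating_point(candidates, cost_budget=5.0)
print(choice.name if choice else "no representation fits the budget")
```

With these placeholder values, `redundant_mix` is dropped because `flow_statistics` is both cheaper and more accurate, and a budget of 5.0 cost units selects `flow_statistics`. In practice the accuracy and cost figures would come from measurement rather than from constants.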

Subjects:	Networking and Internet Architecture (cs.NI); Machine Learning (cs.LG)
Cite as:	arXiv:2010.14605 [cs.NI]
	(or arXiv:2010.14605v3 [cs.NI] for this version)
	https://doi.org/10.48550/arXiv.2010.14605
Related DOI:	https://doi.org/10.1145/3491052

Submission history

From: Francesco Bronzino
[v1] Tue, 27 Oct 2020 20:56:49 UTC (363 KB)
[v2] Thu, 28 Jan 2021 14:37:09 UTC (277 KB)
[v3] Mon, 7 Jun 2021 10:06:48 UTC (333 KB)
