DROP: Dimensionality Reduction Optimization for Time Series

Suri, Sahaana; Bailis, Peter

doi:10.1145/3329486.3329490


arXiv:1708.00183 (cs)

[Submitted on 1 Aug 2017 (v1), last revised 23 Aug 2020 (this version, v4)]

Abstract: Dimensionality reduction (DR) is a critical step in scaling machine learning pipelines. Principal component analysis (PCA) is a standard tool for DR, but performing PCA over a full dataset can be prohibitively expensive. As a result, theoretical work has studied the effectiveness of iterative, stochastic PCA methods that operate over data samples. However, stochastic PCA methods typically terminate either after a predetermined number of iterations or once the solution converges, and so frequently sample too many or too few data points to improve end-to-end runtime. We show how accounting for downstream analytics operations during DR via PCA allows stochastic methods to terminate efficiently after operating over small (e.g., 1%) subsamples of the input data, reducing whole-workload runtime. Leveraging this, we propose DROP, a DR optimizer that enables speedups of up to 5x over Singular-Value-Decomposition-based PCA techniques, and outperforms conventional approaches like FFT and PAA by up to 16x in end-to-end workloads.
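The idea of stopping sample-based PCA as soon as the result is good enough for the downstream task can be illustrated with a small sketch. This is not the DROP implementation: the function names, the doubling sample schedule, and the quality measure (mean ratio of projected to original pairwise distance, with a hypothetical target of 0.95) are all assumptions chosen for illustration, and the sketch omits any cost model.

```python
import numpy as np


def fit_pca_basis(sample, k):
    """Return the mean and top-k principal directions of `sample` (rows are points)."""
    mean = sample.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(sample - mean, full_matrices=False)
    return mean, vt[:k].T  # basis has shape (d, k)


def distance_preservation(data, basis, rng, n_pairs=200):
    """Mean ratio of projected to original distance over random pairs (assumed metric)."""
    i = rng.integers(0, len(data), size=n_pairs)
    j = rng.integers(0, len(data), size=n_pairs)
    diffs = data[i] - data[j]
    orig = np.linalg.norm(diffs, axis=1)
    proj = np.linalg.norm(diffs @ basis, axis=1)
    valid = orig > 0  # skip identical points and self-pairs
    return float(np.mean(proj[valid] / orig[valid]))


def progressive_pca(data, target=0.95, start_frac=0.01, growth=2.0, seed=0):
    """Fit PCA on growing subsamples; stop once distances are preserved well enough.

    Returns (mean, basis, sample_size). All thresholds are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    n, d = data.shape
    frac = start_frac
    while True:
        m = min(n, max(2, int(np.ceil(frac * n))))
        sample = data[rng.choice(n, size=m, replace=False)]
        max_k = min(m, d)
        mean, full = fit_pca_basis(sample, max_k)
        # Pick the smallest k whose basis meets the downstream quality target.
        for k in range(1, max_k + 1):
            if distance_preservation(data, full[:, :k], rng) >= target:
                return mean, full[:, :k], m
        if m == n:
            # The whole dataset was used; return the best available basis.
            return mean, full, m
        frac *= growth
```

Calling `progressive_pca(data)` on an array of shape `(n, d)` returns the sample mean, a `(d, k)` orthonormal basis, and the number of rows that were actually sampled, so the caller can project with `(data - mean) @ basis`.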

Subjects:	Databases (cs.DB)
Cite as:	arXiv:1708.00183 [cs.DB]
	(or arXiv:1708.00183v4 [cs.DB] for this version)
	https://doi.org/10.48550/arXiv.1708.00183
Journal reference:	DEEM'19: Proceedings of the 3rd International Workshop on Data Management for End-to-End Machine Learning (2019)
Related DOI:	https://doi.org/10.1145/3329486.3329490

Submission history

From: Sahaana Suri
[v1] Tue, 1 Aug 2017 06:58:15 UTC (3,411 KB)
[v2] Thu, 8 Mar 2018 20:08:41 UTC (4,812 KB)
[v3] Mon, 1 Jul 2019 22:46:55 UTC (3,836 KB)
[v4] Sun, 23 Aug 2020 07:28:23 UTC (3,836 KB)
