Efficiently Answering Durability Prediction Queries

Gao, Junyang; Xu, Yifan; Agarwal, Pankaj K.; Yang, Jun

Abstract: We consider a class of queries called durability prediction queries that arise commonly in predictive analytics, where we use a given predictive model to answer questions about possible futures to inform our decisions. Examples of durability prediction queries include "what is the probability that this financial product will keep losing money over the next 12 quarters before turning in any profit?" and "what is the chance for our proposed server cluster to fail the required service-level agreement before its term ends?" We devise a general method called Multi-Level Splitting Sampling (MLSS) that can efficiently handle complex queries and complex models (including those involving black-box functions) as long as the models allow us to simulate possible futures step by step. Our method addresses the inefficiency of standard Monte Carlo (MC) methods by applying the idea of importance splitting to let one "promising" sample path prefix generate multiple "offspring" paths, thereby directing simulation effort toward more promising paths. We propose practical techniques for designing splitting strategies, freeing users from manual tuning. Experiments show that our approach achieves unbiased estimates and the same error guarantees as standard MC while offering an order-of-magnitude cost reduction.
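
To make the importance-splitting idea concrete, here is a minimal sketch in Python. It is not the authors' MLSS implementation: the Gaussian random-walk model (`gaussian_step`), the hand-picked level thresholds, and the fixed splitting ratio are all illustrative assumptions, whereas the paper proposes techniques for choosing splitting strategies automatically. The sketch estimates the probability that a step-by-step simulated process reaches a target value within a time horizon, first with standard MC and then with a weighted splitting scheme in which a path that crosses an intermediate level is replaced by several copies of reduced weight.

```python
import random


def gaussian_step(x, rng):
    # Hypothetical one-step simulator (an assumption for illustration):
    # a Gaussian random walk with a slight downward drift.
    return x + rng.gauss(-0.1, 1.0)


def standard_mc(step, x0, target, horizon, n_paths, rng):
    """Fraction of independent paths that reach `target` within `horizon` steps."""
    hits = 0
    for _ in range(n_paths):
        x = x0
        for _ in range(horizon):
            x = step(x, rng)
            if x >= target:
                hits += 1
                break
    return hits / n_paths


def splitting_estimate(step, x0, levels, horizon, n_roots, ratio, rng):
    """Weighted fixed-ratio splitting estimate of the same probability.

    `levels` is an increasing list of thresholds whose last entry is the
    target. When a path first crosses an intermediate level it is replaced
    by `ratio` copies, each carrying 1/ratio of its weight, so the total
    weight is conserved and the estimate stays unbiased.
    """
    target = levels[-1]
    last = len(levels) - 1          # index of the target level
    total_weight = 0.0
    for _ in range(n_roots):
        # Each entry: (value, time step, next level index, weight).
        stack = [(x0, 0, 0, 1.0)]
        while stack:
            x, t, k, w = stack.pop()
            while t < horizon:
                x = step(x, rng)
                t += 1
                if x >= target:
                    total_weight += w   # a hit counts with its weight
                    break
                if k < last and x >= levels[k]:
                    # Skip past every intermediate level crossed in this step.
                    while k < last and x >= levels[k]:
                        k += 1
                    w /= ratio
                    # Queue ratio - 1 offspring; this path continues as a copy.
                    stack.extend([(x, t, k, w)] * (ratio - 1))
    return total_weight / n_roots


if __name__ == "__main__":
    rng = random.Random(0)
    print("standard MC:", standard_mc(gaussian_step, 0.0, 3.0, 20, 20000, rng))
    print("splitting:  ", splitting_estimate(
        gaussian_step, 0.0, [1.0, 2.0, 3.0], 20, 5000, 3, rng))
```

Both functions take the simulator as an argument, so any model that can advance one step at a time, including a black-box one, could be plugged in. The two estimates should agree up to sampling noise; the splitting version spends more of its simulation steps on paths that have already moved toward the target.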

Comments:	in SIGMOD 2021
Subjects:	Databases (cs.DB)
Cite as:	arXiv:2103.12887 [cs.DB]
	(or arXiv:2103.12887v2 [cs.DB] for this version)
	https://doi.org/10.48550/arXiv.2103.12887
