Optimizing regression models for data streams with missing values

I Žliobaitė, J Hollmen - Machine learning, 2015 - Springer
Abstract
Automated data acquisition systems, such as wireless sensor networks, surveillance systems, or any system that records data in operating logs, are becoming increasingly common, and provide opportunities for making decisions based on data in real or near-real time. In these systems, data is generated continuously, resulting in a stream of data, and predictive models need to be built and updated online with the incoming data. In addition, the predictive models need to output predictions continuously and without delays. Automated data acquisition systems are prone to occasional failures, so missing values may often occur. Nevertheless, predictions need to be made continuously. Hence, predictive models need mechanisms for dealing with missing data such that the loss in accuracy due to occasionally missing values is minimal. In this paper, we theoretically analyze the effects of missing values on the accuracy of linear predictive models. We derive the optimal least squares solution that minimizes the expected mean squared error given an expected rate of missing values. Based on this theoretically optimal solution, we propose a recursive algorithm for producing and updating linear regression online, without accessing historical data. Our experimental evaluation on eight benchmark datasets and a case study in environmental monitoring with streaming data validate the theoretical results and confirm the effectiveness of the proposed strategy.
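
The idea of fitting a linear model for an expected missing rate, and updating it without stored history, can be illustrated with a minimal sketch. This is not the paper's algorithm. It assumes zero-mean inputs and target, complete training examples, and that at prediction time each input is independently missing with probability p and replaced by zero (its mean). Under those assumptions the expected squared error is minimized by solving ((1 - p) S + p diag(S)) b = c, where S is the running sum of x x^T and c the running sum of x y. The class name `OnlineMissingAwareRegression` and its methods are hypothetical.

```python
import numpy as np


class OnlineMissingAwareRegression:
    """Illustrative sketch only; names and structure are assumptions."""

    def __init__(self, n_features, missing_rate):
        self.p = missing_rate                           # expected missing rate p
        self.sxx = np.zeros((n_features, n_features))   # running sum of x x^T
        self.sxy = np.zeros(n_features)                 # running sum of x * y
        self.coef = np.zeros(n_features)

    def update(self, x, y):
        # Only running sums are kept, so no historical data is stored.
        x = np.asarray(x, dtype=float)
        self.sxx += np.outer(x, x)
        self.sxy += x * y
        # Expected second moments of the inputs after zero imputation:
        # off-diagonal terms shrink by (1 - p), diagonal terms do not.
        a = (1.0 - self.p) * self.sxx + self.p * np.diag(np.diag(self.sxx))
        # pinv keeps the solve defined while the sums are still rank-deficient.
        self.coef = np.linalg.pinv(a) @ self.sxy

    def predict(self, x):
        # Missing inputs (NaN) are replaced by zero, the assumed mean.
        x = np.nan_to_num(np.asarray(x, dtype=float), nan=0.0)
        return float(x @ self.coef)


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w

plain = OnlineMissingAwareRegression(3, missing_rate=0.0)
robust = OnlineMissingAwareRegression(3, missing_rate=0.3)
for xi, yi in zip(X, y):
    plain.update(xi, yi)
    robust.update(xi, yi)

prediction = robust.predict([np.nan, 1.0, np.nan])
```

With `missing_rate=0.0` the solve reduces to ordinary least squares; a larger rate downweights the cross-feature terms, so the model relies less on inputs compensating for one another. Recomputing the solve at every step is the simplest choice here; a rank-one recursive update would avoid the repeated inversion.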