Authors:
Lucas C. Scabora (1); Jaqueline J. Brito (1); Ricardo Rodrigues Ciferri (2) and Cristina Dutra de Aguiar Ciferri (1)
Affiliations:
(1) University of Sao Paulo at Sao Carlos, Brazil; (2) Federal University of Sao Carlos, Brazil
Keyword(s):
Data Warehousing, Physical Design, NoSQL, OLAP Query Processing, HBase, Star Schema Benchmark.
Related Ontology Subjects/Areas/Topics:
Data Warehouses and OLAP; Databases and Information Systems Integration; Enterprise Information Systems; Non-Relational Databases; Performance Evaluation and Benchmarking
Abstract:
Data warehousing and online analytical processing (OLAP) are core technologies in business intelligence and have therefore drawn much interest from researchers over the last decade. However, these technologies have been developed mainly for relational database systems in centralized environments; they were not designed for scalable systems such as NoSQL databases. Adapting a data warehousing environment to NoSQL databases brings several advantages, such as scalability and flexibility. This paper investigates three physical data warehouse designs that adapt the Star Schema Benchmark for use in NoSQL databases. In particular, our main investigation concerns OLAP query processing over column-oriented databases using the MapReduce framework. We analyze the impact of distributing attributes among column families in HBase on OLAP query performance. Our experiments show how the processing time of OLAP queries is affected by the physical data warehouse design with respect to the number of dimensions accessed and the data volume. We conclude that using distinct distributions of attributes among column families can improve OLAP query performance in HBase and consequently make the benchmark more suitable for OLAP over NoSQL databases.