Key based Deep Data Locality on Hadoop

S Lee, JY Jo, Y Kim - … International Conference on Big Data (Big …, 2018 - ieeexplore.ieee.org

2018 IEEE International Conference on Big Data (Big Data), 2018 • ieeexplore.ieee.org

Apache Hadoop is a widely used framework for big data science. Most research on speeding up big data analysis builds on Hadoop's modules: Hadoop Common, the Hadoop Distributed File System (HDFS), Hadoop Yet Another Resource Negotiator (YARN), and Hadoop MapReduce. This paper focuses on data locality in HDFS and MapReduce to improve performance. The input data is divided into several blocks and stored in HDFS, and each block holds several key-value pairs in the map stage. The paper uses the keys in each block to build key-based Deep Data Locality (DDL). MapReduce with key-based DDL cuts some steps from the map, shuffle, and reduce stages, which improves MapReduce performance. We compared MapReduce with block-based DDL and with key-based DDL against default MapReduce. In our tests, MapReduce with key-based DDL was 28% faster than default MapReduce and 15.4% faster than MapReduce with block-based DDL. Key-based DDL can also be combined with other data locality methods: combining key-based DDL with block-based DDL improved Hadoop performance by up to 52.5%. The paper also introduces a simulator for testing the performance of MapReduce with data locality methods applied on Hadoop. The simulator displays the performance of each MapReduce stage as a graph. In the simulator, key-based DDL can be combined with other data locality research to optimize performance across various data types and node statuses.
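The abstract does not spell out how key-based DDL places data, so what follows is only a hypothetical toy sketch of the general idea behind key-based locality, not the authors' algorithm. It assumes records are (key, value) pairs, uses a stable byte-sum hash as a stand-in for Hadoop's default hash partitioning, and counts how many records must cross the network during the shuffle in two layouts: blocks placed by position (block order, round-robin across nodes) versus every record sharing a key placed on the node that will reduce that key. The function names `place_by_block`, `place_by_key`, and `run_job` are hypothetical, and the model says nothing about the timing figures reported in the abstract.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Record = Tuple[str, int]  # (key, value) pair emitted by the map stage


def hash_node(key: str, num_nodes: int) -> int:
    # Stand-in for hash partitioning. Python's built-in hash() of str is
    # salted per process, so a stable byte sum is used instead.
    return sum(key.encode()) % num_nodes


def place_by_block(records: List[Record], block_size: int,
                   num_nodes: int) -> Dict[int, List[Record]]:
    """Position-based layout: consecutive records form a block,
    and blocks are assigned to nodes round-robin."""
    nodes: Dict[int, List[Record]] = defaultdict(list)
    for i in range(0, len(records), block_size):
        nodes[(i // block_size) % num_nodes].extend(records[i:i + block_size])
    return nodes


def place_by_key(records: List[Record], num_nodes: int) -> Dict[int, List[Record]]:
    """Hypothetical key-based layout: every record with the same key is
    stored on the node that will later reduce that key."""
    nodes: Dict[int, List[Record]] = defaultdict(list)
    for key, value in records:
        nodes[hash_node(key, num_nodes)].append((key, value))
    return nodes


def run_job(nodes: Dict[int, List[Record]],
            num_nodes: int) -> Tuple[Dict[str, int], int]:
    """Map locally on each node, route each pair to its reducer node,
    sum values per key, and count pairs that had to leave their node."""
    shuffled = 0
    reducer_input: Dict[str, List[int]] = defaultdict(list)
    for node_id, records in nodes.items():
        for key, value in records:  # identity map: emit each record unchanged
            if hash_node(key, num_nodes) != node_id:
                shuffled += 1  # this pair crosses the network
            reducer_input[key].append(value)
    totals = {key: sum(values) for key, values in reducer_input.items()}
    return totals, shuffled


if __name__ == "__main__":
    data = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5), ("c", 6)]
    # Both layouts give the same per-key sums; block-based shuffles 4 pairs,
    # key-based shuffles 0.
    for name, layout in [("block-based", place_by_block(data, 2, 3)),
                         ("key-based", place_by_key(data, 3))]:
        totals, moved = run_job(layout, 3)
        print(name, totals, "shuffled records:", moved)
```

Counting shuffled records is only a rough proxy for the savings the abstract describes; real MapReduce jobs also pay for sorting, spilling, and task scheduling, which this sketch ignores.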

Cited by 3 · All 2 versions
