Key based Deep Data Locality on Hadoop
2018 IEEE International Conference on Big Data (Big Data), 2018•ieeexplore.ieee.org
Apache Hadoop is a well-known framework for big data science. Most research on speeding up big data analysis builds on Hadoop modules such as Hadoop Common, the Hadoop Distributed File System (HDFS), Hadoop Yet Another Resource Negotiator (YARN), and Hadoop MapReduce. This paper focuses on data locality in HDFS and MapReduce to improve performance. The input data is divided into several blocks and stored in HDFS. Each block contains several key-value pairs in the map stage. The paper uses the keys in each block to build key-based Deep Data Locality (DDL). MapReduce with key-based DDL reduces some steps in the map, shuffle, and reduce stages, which improves MapReduce performance. We tested MapReduce with block-based DDL and with key-based DDL and compared both against default MapReduce. According to the tests, MapReduce with key-based DDL is 28% faster than default MapReduce and 15.4% faster than MapReduce with block-based DDL. Additionally, key-based DDL can be combined with other data locality methods to improve Hadoop. Combining key-based DDL and block-based DDL improves Hadoop performance by up to 52.5%. The paper also introduces a simulator for testing the performance of MapReduce with data locality methods applied on Hadoop. The simulator displays the performance of each MapReduce stage as a graph. In the simulator, key-based DDL can be combined with other data locality research to obtain optimized performance across various data types and node statuses.
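
The abstract does not spell out how keys drive data placement, so the following is only a toy sketch of the general intuition, not the paper's algorithm. It assumes that a key-value pair costs a network transfer in the shuffle stage whenever it is stored on a node other than the one that reduces its key. The partitioner `reducer_for`, the two placement functions, and the sample data are all hypothetical names introduced for illustration.

```python
from collections import defaultdict

NUM_NODES = 3  # assumed cluster size for this toy example


def reducer_for(key, num_nodes):
    # Hypothetical deterministic partitioner: byte sum of the key modulo node count.
    return sum(key.encode("utf-8")) % num_nodes


def place_round_robin(pairs, num_nodes):
    # Stand-in for key-unaware placement: pairs are spread without looking at keys.
    placed = defaultdict(list)
    for i, pair in enumerate(pairs):
        placed[i % num_nodes].append(pair)
    return placed


def place_by_key(pairs, num_nodes):
    # Stand-in for key-aware placement: each pair is stored where its key is reduced.
    placed = defaultdict(list)
    for pair in pairs:
        placed[reducer_for(pair[0], num_nodes)].append(pair)
    return placed


def shuffle_transfers(placed, num_nodes):
    # Count pairs that would have to move to another node during the shuffle.
    moved = 0
    for node, pairs in placed.items():
        for key, _value in pairs:
            if reducer_for(key, num_nodes) != node:
                moved += 1
    return moved


pairs = [("a", 1), ("b", 1), ("c", 1), ("a", 1), ("b", 1), ("c", 1), ("a", 1)]
default_moved = shuffle_transfers(place_round_robin(pairs, NUM_NODES), NUM_NODES)
key_moved = shuffle_transfers(place_by_key(pairs, NUM_NODES), NUM_NODES)
print("pairs moved, key-unaware placement:", default_moved)
print("pairs moved, key-aware placement:", key_moved)
```

Under these assumptions, key-aware placement leaves nothing to transfer in the shuffle, which is one way to picture how locality at the key level could remove work from later stages. Real HDFS block placement and MapReduce scheduling are considerably more involved.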