AN ENHANCEMENT LAZY REPLICATION TECHNIQUE FOR HADOOP DISTRIBUTED FILE SYSTEM
DOI:
https://doi.org/10.59743/jbs.v34i1.12
Keywords:
Hadoop Distributed File System (HDFS), Pipelined, Replication factor, NameNode, DataNode, Client
Abstract
The Hadoop Distributed File System (HDFS) is designed to store, analyze, and transfer large datasets reliably, and to stream them at high bandwidth to user applications. HDFS is a variant of the Google File System (GFS). It handles fault tolerance through data replication: each data block is replicated and stored on multiple DataNodes, which is how HDFS supports availability and reliability. The existing implementation of HDFS in Hadoop performs replication in a pipelined manner, which makes replication slow. In this paper, an alternative technique for efficient replica placement, called the Enhancement Lazy replication technique, is proposed. Its main principle is that the client is allowed to write a block to two DataNodes in parallel; each of them stores the packet and sends an acknowledgement directly to the client, without waiting to receive acknowledgements from the other DataNodes. An experiment was performed to compare the performance of the proposed HDFS replication technique with the default pipelined replication technique and with existing replication techniques, using the TestDFSIO benchmark. According to the experimental results (i.e., execution time and throughput), HDFS availability is improved under the proposed replication technique.
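The difference between the two acknowledgement paths described in the abstract can be illustrated with a toy latency model. This is a minimal sketch and not the paper's implementation: the names `Costs`, `pipelined_ack_ms`, and `parallel_ack_ms`, the per-hop and per-store costs, and the simplification that pipeline nodes forward a packet before storing it are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Costs:
    """Hypothetical per-packet costs; the values are illustrative, not measured."""
    hop_ms: float = 1.0    # one-way network delay between any two nodes
    store_ms: float = 2.0  # time for a DataNode to store one packet


def pipelined_ack_ms(replicas: int, costs: Costs) -> float:
    """Time until the client sees the ack for one packet in a write pipeline.

    Assumed model: the packet hops client -> DN1 -> ... -> DNn, every node
    forwards before storing (so stores overlap), and the ack returns along
    the same chain once the last node has stored the packet.
    """
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    forward = replicas * costs.hop_ms
    backward = replicas * costs.hop_ms
    return forward + costs.store_ms + backward


def parallel_ack_ms(costs: Costs, direct_targets: int = 2) -> float:
    """Time until the client has acks from all DataNodes it writes to directly.

    Assumed model: the client sends the packet to `direct_targets` DataNodes
    at the same time; each stores it and acks the client directly. Any
    further replica is created later, off the client's critical path.
    """
    if direct_targets < 1:
        raise ValueError("direct_targets must be at least 1")
    per_target = [costs.hop_ms + costs.store_ms + costs.hop_ms
                  for _ in range(direct_targets)]
    return max(per_target)  # the client waits for the slowest direct target


costs = Costs()
print("pipelined, 3 replicas:", pipelined_ack_ms(3, costs), "ms")
print("parallel, 2 direct targets:", parallel_ack_ms(costs), "ms")
```

With the default assumed costs, the model gives 8.0 ms for a three-replica pipeline and 4.0 ms for two parallel direct writes. These figures follow only from the assumed costs and say nothing about the execution time or throughput measured in the paper.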
License
Copyright (c) 2021 Journal of Basic Sciences

This work is licensed under a Creative Commons Attribution 4.0 International License.