A HADOOP DISTRIBUTED FILE SYSTEM REPLICATION APPROACH

Eman S. Abead; Mohamed H. Khafagy; Fatma A. Omara

Authors

Eman S. Abead, Computer Department, Faculty of Science, Alasmarya Islamic University, Zliten, Libya
Mohamed H. Khafagy, Faculty of Computers and Information, Fayoum University, Egypt
Fatma A. Omara, Faculty of Computers and Information, Cairo University, Egypt

Keywords:

Hadoop Distributed File System (HDFS), Replication factor, NameNode, DataNode, Pipelined, Client

Abstract

Hadoop Distributed File System (HDFS) is a file system designed to store, analyze, and reliably transfer very large datasets to client applications. It handles fault tolerance through data replication: every data block is copied and stored on several DataNodes, which is how HDFS provides availability and reliability. The current Hadoop implementation of HDFS performs replication in a pipelined fashion, which takes considerable time. This study proposes a replication approach as an alternative technique for efficient replica placement. The basic idea of the approach is that the client allows two DataNodes to write a block to the other DataNodes in parallel, by storing the packet.
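For readers unfamiliar with the pipelined baseline the abstract refers to, the following is a minimal toy sketch of how a block's packets travel along a chain of DataNodes in stock HDFS: the client sends each packet to the first DataNode, which stores it and forwards it to the next. It models only the baseline, not the proposed approach, and the names `DataNode` and `pipeline_write` are hypothetical, introduced purely for illustration (they are not Hadoop APIs). Acknowledgements and network timing are left out.

```python
from dataclasses import dataclass, field


@dataclass
class DataNode:
    """Hypothetical in-memory stand-in for an HDFS DataNode."""
    name: str
    stored: list = field(default_factory=list)


def pipeline_write(packets, pipeline):
    """Push each packet through the DataNode chain, one hop at a time.

    Returns a log of (packet, sender, receiver) transfers in the order
    they occur in this simplified, strictly sequential model.
    """
    log = []
    for packet in packets:
        sender = "client"
        for node in pipeline:
            node.stored.append(packet)            # node stores the packet
            log.append((packet, sender, node.name))
            sender = node.name                    # then forwards it onward
    return log


if __name__ == "__main__":
    # Replication factor 3: the block ends up on three DataNodes.
    nodes = [DataNode("dn1"), DataNode("dn2"), DataNode("dn3")]
    for transfer in pipeline_write(["packet-0", "packet-1"], nodes):
        print(transfer)
```

In this model every packet needs as many sequential hops as there are replicas before the last copy is written, which is the cost that a parallel write from more than one DataNode aims to reduce.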

