COSTECH Integrated Repository

HDFS+: Erasure coding based Hadoop distributed file system


dc.creator Ishengoma, Fredrick Romanus
dc.date 2020-03-25T08:11:56Z
dc.date 2013
dc.date.accessioned 2022-10-20T13:47:41Z
dc.date.available 2022-10-20T13:47:41Z
dc.identifier Ishengoma, F. R. (2013). HDFS+: Erasure coding based Hadoop distributed file system. International Journal of Scientific & Technology Research, 2(8), 190-197.
dc.identifier 2277-8616
dc.identifier http://hdl.handle.net/20.500.12661/2366
dc.identifier.uri http://hdl.handle.net/20.500.12661/2366
dc.description Full Text Article. Also available at: http://www.ijstr.org/final-print/sep2013/Hdfs+-Erasure-Coding-Based-Hadoop-Distributed-File-System.pdf
dc.description A simple replication-based mechanism has been used to achieve high data reliability in the Hadoop Distributed File System (HDFS). However, replication-based mechanisms require a large amount of disk storage, since they copy full blocks regardless of storage size. Studies have shown that erasure coding can save storage space when used as an alternative to replication, and that it can also increase write throughput. To improve both the space efficiency and the I/O performance of HDFS while preserving the same level of data reliability, we propose HDFS+, an erasure coding based Hadoop Distributed File System. The proposed scheme writes a full block to the primary DataNode and then performs erasure coding with a Vandermonde-based Reed-Solomon algorithm, which divides the data into m data fragments and encodes them into n fragments (n > m). These fragments are saved on n distinct DataNodes such that the original object can be reconstructed from any m of them. The experimental results show that our scheme can save up to 33% of storage space while delivering 1.4 times the write performance of the original scheme. Our scheme provides the same read performance as the original scheme as long as data can be read from the primary DataNode, even under single-node or double-node failure. Otherwise, the read performance of HDFS+ decreases to some extent; however, we show that this degradation becomes negligible as the number of fragments increases. Index Terms: Erasure coding, Hadoop, HDFS, I/O performance, node failure, replication, space efficiency.
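The coding step described in the abstract (split a block into m data fragments, encode into n fragments with a Vandermonde matrix, recover from any m) can be illustrated with a minimal sketch. It is not the HDFS+ implementation: it assumes arithmetic in the prime field GF(257) instead of the GF(2^8) byte arithmetic that production Reed-Solomon codecs usually use, evaluation points 1..n, and non-systematic encoding. It also does not model HDFS+ writing the full block to the primary DataNode first. The names `encode` and `decode` are hypothetical.

```python
P = 257  # assumption: prime field GF(257) for simplicity, not GF(2^8)


def vandermonde(n, m):
    # Row i is [x^0, x^1, ..., x^(m-1)] with x = i + 1. Distinct x values
    # (n <= 256) make any m rows an invertible m x m matrix mod P.
    return [[pow(i + 1, j, P) for j in range(m)] for i in range(n)]


def split(block: bytes, m: int):
    # Zero-pad the block so it divides evenly into m data fragments.
    size = -(-len(block) // m)  # ceiling division
    padded = block + bytes(size * m - len(block))
    return [list(padded[j * size:(j + 1) * size]) for j in range(m)]


def encode(block: bytes, m: int, n: int):
    # Fragment i, symbol k = sum_j V[i][j] * data[j][k] mod P.
    data = split(block, m)
    v = vandermonde(n, m)
    size = len(data[0])
    return [[sum(v[i][j] * data[j][k] for j in range(m)) % P
             for k in range(size)] for i in range(n)]


def solve_mod(a, b):
    # Gauss-Jordan elimination mod P on A * D = F; returns D.
    m = len(a)
    a = [row[:] for row in a]
    b = [row[:] for row in b]
    for col in range(m):
        pivot = next(r for r in range(col, m) if a[r][col] % P)
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        inv = pow(a[col][col], P - 2, P)  # inverse via Fermat's little theorem
        a[col] = [x * inv % P for x in a[col]]
        b[col] = [x * inv % P for x in b[col]]
        for r in range(m):
            if r != col and a[r][col]:
                f = a[r][col]
                a[r] = [(x - f * y) % P for x, y in zip(a[r], a[col])]
                b[r] = [(x - f * y) % P for x, y in zip(b[r], b[col])]
    return b


def decode(fragments: dict, m: int, n: int, length: int) -> bytes:
    # fragments maps fragment index -> symbols; any m of the n suffice.
    ids = sorted(fragments)[:m]
    if len(ids) < m:
        raise ValueError("need at least m fragments")
    v = vandermonde(n, m)
    data = solve_mod([v[i] for i in ids], [fragments[i] for i in ids])
    return bytes(x for frag in data for x in frag)[:length]


# Usage: m = 4, n = 6, then lose two "DataNodes" (fragments 0 and 3).
block = b"HDFS+ erasure coding demo block"
frags = encode(block, 4, 6)
survivors = {i: frags[i] for i in (1, 2, 4, 5)}
assert decode(survivors, 4, 6, len(block)) == block
```

Each fragment holds 1/m of the block, so the n fragments together occupy n/m times the block size. By comparison, HDFS's default three-way replication occupies 3 times the block size; with m = 4 and n = 6 the sketch tolerates any two lost fragments at 1.5 times the block size.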
dc.language en
dc.publisher Elsevier
dc.subject Erasure coding
dc.subject Hadoop
dc.subject I/O performance
dc.subject HDFS
dc.subject Node failure
dc.subject Replication
dc.subject Space efficiency
dc.subject Hadoop Distributed File System
dc.title HDFS+: Erasure coding based Hadoop distributed file system
dc.type Article


Files in this item

Ishengoma 2013..pdf (626.6 KB, application/pdf)
