Openly accessible

Effective methods and strategies for massive small files processing based on Hadoop

Xia, D., Wang, B., Rong, Z., Li, Y. and Zhang, Zili 2014, Effective methods and strategies for massive small files processing based on Hadoop, ICIC Express Letters, vol. 8, no. 7, pp. 1935-1941.

Attached Files
Name Description MIMEType Size Downloads
xia-effectivemethodsand-2014.pdf Published version application/pdf 2.49MB 63

Title Effective methods and strategies for massive small files processing based on Hadoop
Author(s) Xia, D.
Wang, B.
Rong, Z.
Li, Y.
Zhang, Zili (ORCID iD: orcid.org/0000-0002-8721-9333)
Journal name ICIC Express Letters
Volume number 8
Issue number 7
Start page 1935
End page 1941
Total pages 7
Publisher ICIC International
Place of publication Kumamoto, Japan
Publication date 2014
ISSN 1881-803X
Keyword(s) Big Data
Hadoop distributed file system (HDFS)
Hadoop MapReduce
Small files problem
Summary The Hadoop framework provides a powerful way to handle Big Data. Because Hadoop suffers from high memory overhead and low computing performance when processing massive numbers of small files, we implement three methods and propose two strategies for solving the small files problem in this paper. First, we implement three methods, i.e., Hadoop Archives (HAR), Sequence Files (SF) and CombineFileInputFormat (CFIF), to compensate for the existing defects of Hadoop. Moreover, we propose two strategies for meeting the actual needs of different users. Finally, we evaluate the efficiency of the implemented methods and the validity of the proposed strategies. The experimental results show that our methods and strategies can improve the efficiency of massive small files processing, thereby enhancing the overall performance of Hadoop. © 2014 ISSN 1881-803X.
Language eng
Field of Research 080599 Distributed Computing not elsewhere classified
Socio Economic Objective 970108 Expanding Knowledge in the Information and Computing Sciences
HERDC Research category C1 Refereed article in a scholarly journal
ERA Research output type C Journal article
Copyright notice ©2014, ICIC International
Free to Read? Yes
Persistent URL http://hdl.handle.net/10536/DRO/DU:30072604
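
The Summary above attributes Hadoop's memory overhead to massive numbers of small files and names Sequence Files as one remedy. The following is a minimal Python sketch of that general idea, not the paper's implementation and not the real Hadoop SequenceFile format: many small files are packed into one key/value container (file name as key, contents as value), and a rough count shows how many namespace objects (file entries plus block entries) the NameNode would have to track before and after. The function names, the length-prefixed record layout, and the 128 MiB block size are all assumptions made for illustration.

```python
import io
import struct


def pack_small_files(files):
    """Pack {name: bytes} into one length-prefixed key/value byte stream.

    Illustrative layout only (hypothetical, not the SequenceFile format):
    [4-byte key length][4-byte value length][key bytes][value bytes]
    """
    buf = io.BytesIO()
    for name, data in sorted(files.items()):
        key = name.encode("utf-8")
        buf.write(struct.pack(">II", len(key), len(data)))
        buf.write(key)
        buf.write(data)
    return buf.getvalue()


def unpack_small_files(blob):
    """Recover the {name: bytes} mapping written by pack_small_files."""
    files = {}
    view = io.BytesIO(blob)
    while True:
        header = view.read(8)
        if not header:
            break
        key_len, data_len = struct.unpack(">II", header)
        name = view.read(key_len).decode("utf-8")
        files[name] = view.read(data_len)
    return files


def namespace_objects(file_sizes, block_size):
    """Count one entry per file plus one entry per block it occupies."""
    blocks = sum(-(-size // block_size) for size in file_sizes)  # ceiling division
    return len(file_sizes) + blocks


ASSUMED_BLOCK_SIZE = 128 * 1024 * 1024  # assumed HDFS block size, not from the paper
```

Under the assumed 128 MiB block size, `namespace_objects([1024] * 10000, ASSUMED_BLOCK_SIZE)` counts 20,000 objects for 10,000 files of 1 KiB each, while a single file holding the same 10,240,000 bytes counts as 2. The container adds a few bytes of header per record, which does not change the block count in this case.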

Document type: Journal Article
Collections: School of Information Technology
Open Access Collection
Unless expressly stated otherwise, the copyright for items in DRO is owned by the author, with all rights reserved.

Every reasonable effort has been made to ensure that permission has been obtained for items included in DRO. If you believe that your rights have been infringed by this repository, please contact drosupport@deakin.edu.au.

Citation counts: TR Web of Science Citation Count  Cited 0 times in TR Web of Science
Scopus Citation Count Cited 3 times in Scopus
Access Statistics: 211 Abstract Views, 65 File Downloads
Created: Thu, 23 Apr 2015, 13:33:38 EST
