File(s) under permanent embargo
Is high performance computing (HPC) ready to handle big data?
conference contributionposted on 2017-01-01, 00:00 authored by Biplob Ray, Morshed ChowdhuryMorshed Chowdhury, U Atif
In recent years big data has emerged as a universal term and its management has become a crucial research topic. The phrase ‘big data’ refers to data sets so large and complex that the processing of them requires collaborative High Performance Computing (HPC). How to effectively allocate resources is one of the prime challenges in HPC. This leads us to the question: are the existing HPC resource allocation techniques effective enough to support future big data challenges? In this context, we have investigated the effectiveness of HPC resource allocation using the Google cluster dataset and a number of data mining tools to determine the correlational coefficient between resource allocation, resource usages and priority. Our analysis initially focused on correlation between resource allocation and resource uses. The finding shows that a high volume of resources that are allocated by the system for a job are not being used by that same job. To investigate further, we analyzed the correlation between resource allocation, resource usages and priority. Our clustering, classification and prediction techniques identified that the allocation and uses of resources are very loosely correlated with priority of the jobs. This research shows that our current HPC scheduling needs improvement in order to accommodate the big data challenge efficiently.