International Journal of Applied Information Systems
Foundation of Computer Science (FCS), NY, USA
Volume 4 - Number 7
Year of Publication: 2012
Authors: Prabin R. Sahoo
DOI: 10.5120/ijais12-450799
Prabin R. Sahoo. Performance Overhead on Relational Join in Hadoop using Hive/Pig/Streaming - A Comparative Analysis. International Journal of Applied Information Systems. 4, 7 (December 2012), 15-20. DOI=10.5120/ijais12-450799
The Hadoop Distributed File System (HDFS) is quite popular in the big data world. It not only provides a framework for storing data in a distributed environment, but also has a set of tools to retrieve and process this data using the map-reduce concept. This paper presents the results of evaluating the major tools, namely Hive, Pig, and Hadoop Streaming, for solving problems from a relational perspective, and compares their performance. Although big data tools cannot match the strength of a relational database in solving relational problems, big data is still about data, so the relational nature of data access cannot be eliminated altogether. Fortunately, there are ways to deal with this, which are discussed in this paper from a performance perspective. This may help the big data community understand the performance challenges so that further optimization can be done, and may help application developers learn how to use relational operations strategically.
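To make the idea of a relational join under map-reduce concrete, the following is a minimal in-memory sketch of a reduce-side join, the general pattern that a hand-written streaming job might follow. It is not the implementation evaluated in the paper and does not call Hadoop; the function names (`map_phase`, `reduce_side_join`) and the sample `employees` and `orders` data are hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from itertools import product


def map_phase(records, tag, key_index):
    """Emit (join_key, (tag, record)) pairs, as a mapper would for one input."""
    for record in records:
        yield record[key_index], (tag, record)


def reduce_side_join(left, right, left_key=0, right_key=0):
    """Inner join of two lists of tuples, simulated in memory (assumed sketch)."""
    # Shuffle step: group the tagged records by join key.
    # In a real job the framework performs this grouping between map and reduce.
    groups = defaultdict(lambda: {"L": [], "R": []})
    for key, (tag, record) in map_phase(left, "L", left_key):
        groups[key][tag].append(record)
    for key, (tag, record) in map_phase(right, "R", right_key):
        groups[key][tag].append(record)

    # Reduce step: for each key, pair every left record with every right record.
    joined = []
    for key in sorted(groups):
        for l, r in product(groups[key]["L"], groups[key]["R"]):
            joined.append(l + r)
    return joined


# Hypothetical sample data: (id, name) and (employee_id, item).
employees = [("1", "Ann"), ("2", "Bob"), ("3", "Cid")]
orders = [("1", "book"), ("1", "pen"), ("3", "lamp")]
result = reduce_side_join(employees, orders)
```

In this sketch every record is tagged with its source table so the reducer can tell the two sides apart; keys that appear on only one side (here, employee "2") produce no output, which corresponds to an inner join.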