Hadoop Environment Setup for Big Data

Abstract

Hadoop is an open-source framework for the distributed storage and processing of large data sets across clusters of computers, which makes it a vital tool in the era of big data. Big data refers to vast volumes of data generated at high velocity from varied sources, which presents significant challenges in storage, analysis, and management. This paper outlines the steps required to install and set up a Hadoop environment on a Linux operating system, which provides a stable and efficient platform for running distributed applications. By giving an overview of both the installation and the operation of Hadoop, the paper serves as a practical guide for beginners and practitioners who want to process data efficiently and better understand big data management in a Linux ecosystem.

Country: India

Pooja S. Gadhave (1), Sanika D. Pangul (2), Tejaswini J. Bhande (3), Shilpa B. Sarvaiya (4)

  1. MCA-II, Department of MCA, Vidya Bharti Mahavidyalaya, Amravati, Maharashtra, India
  2. MCA-II, Department of MCA, Vidya Bharti Mahavidyalaya, Amravati, Maharashtra, India
  3. MCA-II, Department of MCA, Vidya Bharti Mahavidyalaya, Amravati, Maharashtra, India
  4. Head, Department of MCA, Vidya Bharti Mahavidyalaya, Amravati, Maharashtra, India

IRJIET, Volume 8, Issue 10, October 2024, pp. 182-185

doi.org/10.47001/IRJIET/2024.810025
