Big Data Analytics Using Machine Learning Techniques for Prediction on Datasets

Ankit Verma 1, Hansraj 2

Computational Intelligence and Machine Learning. 2023 April; 4(1): 6-10. Published online April 2023

doi.org/10.36647/CIML/04.01.A002

Abstract: Data analytics is the scientific and statistical analysis of raw data to turn it into information that supports knowledge discovery. A recent trend in feature abstraction combines computational techniques with big data analysis, which requires drawing on trustworthy data sources, processing information quickly, and making accurate predictions. The primary objective of this study is to identify, using the proposed model, the machine learning techniques that yield the most accurate predictions. Supervised and unsupervised techniques have previously been implemented in various ways with the MapReduce methodology; the proposed model instead uses the Apache Spark framework to compare these existing methods. The study focuses on characterising the datasets so that the machine learning techniques can analyse them as accurately as possible. The datasets are analysed with linear regression, decision tree, random forest, and gradient boosting tree algorithms. The results indicate that applying machine learning methods on the Spark framework improves the model's efficiency by seventy percent compared with the MapReduce paradigm.

Keywords: Apache Spark Framework, Big Data Analytics, Machine Learning Algorithms, MapReduce Paradigm.