Abstract:
In today's rapidly evolving landscape of higher education, the effective management and analysis of academic data have
become increasingly challenging, particularly in the context of the 3Vs of Big Data: volume, variety, and velocity. The amount of data
produced by educational institutions, including student records, has increased dramatically. This data originates from various
sources, such as learning management systems and student information systems, and takes several forms. Hence, in education, data
analytics and predictive modeling have become increasingly important for gaining insight into student performance, for example by
identifying at-risk students who are most likely to fail their courses. This study proposes a novel approach for predicting student
academic performance, particularly identifying at-risk students, by leveraging a data lake architecture. The proposed methodology
comprises the ingestion, transformation, and quality assessment of data combined from Universiti Putra Malaysia's Student
Information System and learning management system within the data lake environment. With its parallel processing capabilities, this
centralized data repository facilitates the training and evaluation of various machine learning models for prediction. To forecast
student performance, machine learning algorithms such as Support Vector Classifier, Naive Bayes, and Decision Trees are used to build
the prediction models, taking advantage of the data lake's scalability and parallel processing capabilities. This study lays the
groundwork for using a data lake architecture to help improve student performance.
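
As an illustration of the combination and quality-assessment steps described above, the following is a minimal single-machine sketch, not the study's implementation. All record fields (`student_id`, `cgpa`, `logins`, `quiz_avg`, and so on), the sample values, and the helper names `combine` and `completeness` are hypothetical; the abstract does not specify the actual schema, the quality metrics, or the processing engine used in the data lake.

```python
# Hypothetical student information system (SIS) records; None marks a missing value.
sis_records = [
    {"student_id": "S001", "cgpa": 3.4, "credits_earned": 60},
    {"student_id": "S002", "cgpa": None, "credits_earned": 45},
    {"student_id": "S003", "cgpa": 2.1, "credits_earned": 30},
]

# Hypothetical learning management system (LMS) records.
lms_records = [
    {"student_id": "S001", "logins": 42, "quiz_avg": 78.0},
    {"student_id": "S002", "logins": 5, "quiz_avg": None},
    {"student_id": "S004", "logins": 17, "quiz_avg": 64.5},
]


def combine(sis, lms):
    """Inner-join the two sources on student_id (assumed shared key)."""
    lms_by_id = {row["student_id"]: row for row in lms}
    return [
        {**row, **lms_by_id[row["student_id"]]}
        for row in sis
        if row["student_id"] in lms_by_id
    ]


def completeness(rows):
    """Return the fraction of non-missing values for each field."""
    if not rows:
        return {}
    return {
        field: sum(row[field] is not None for row in rows) / len(rows)
        for field in rows[0]
    }


combined = combine(sis_records, lms_records)
report = completeness(combined)

for field, ratio in report.items():
    print(f"{field}: {ratio:.0%} complete")
```

Completeness is only one possible quality measure, chosen here for brevity. In a data lake setting the same join and per-field aggregation would typically be expressed in a distributed engine so that it scales with the volume of records.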