Student Performance Prediction Using the UCI Dataset: A Comparison of Interpretable and Ensemble Models

Jiajun Guo; Chenghang Liu; Tianxing Ji; Yuyao Li; Quan Zhang

doi:10.11113/ajee2026.10n1.211

Authors

Jiajun Guo, Xi’an Jiaotong-Liverpool University
Chenghang Liu, Xi’an Jiaotong-Liverpool University
Tianxing Ji, Xi’an Jiaotong-Liverpool University
Yuyao Li, Xi’an Jiaotong-Liverpool University
Quan Zhang, Xi’an Jiaotong-Liverpool University

DOI:

https://doi.org/10.11113/ajee2026.10n1.211

Keywords:

machine learning, educational data mining, performance prediction, linear regression, feature importance

Abstract

AI technologies have transformed teaching methods and provided novel solutions for education management, assessment, and personalized learning. This study examines whether complex machine learning models consistently outperform simple interpretable approaches in predicting student outcomes. Using the UCI Student Performance dataset, five key predictors were selected from the 32 original attributes: first period grade (G1), second period grade (G2), number of past class failures (Failures), mother's education level (Medu), and higher education aspiration (Higher). Three models, Linear Regression (LR), Random Forest (RF), and an Ensemble Model (EM), were evaluated on the Mathematics and Portuguese subjects using MSE, RMSE, MAE, and R², with five-fold cross-validation to assess robustness. Experimental results show that Linear Regression achieved the best overall performance in both subjects, with R² = 0.779 for Mathematics and R² = 0.862 for Portuguese, whereas RF and EM did not yield consistent gains. Portuguese was generally more predictable than Mathematics under the same pipeline. Feature influence analysis indicates that early-term grades (G1 and G2) dominate predictive power, suggesting that the approach supports mid-semester/operational prediction rather than start-of-term early warning. Overall, the findings highlight the practical value of interpretable models for educational analytics when transparency and deployability are important.
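
For readers unfamiliar with the evaluation metrics named in the abstract, the following is a minimal sketch of how MSE, RMSE, MAE, and the coefficient of determination (R²) can be computed from paired true and predicted grades. The function name `regression_metrics` and the toy values are assumptions made for illustration only; they are not the authors' code, and the numbers are not taken from the UCI dataset or from the reported results.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, MAE and R^2 for paired true/predicted values.

    Hypothetical helper for illustration; not the authors' implementation.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    # Mean squared error and mean absolute error over all samples.
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n

    # R^2 = 1 - SS_res / SS_tot, where SS_tot is the spread around the mean.
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")

    return {"mse": mse, "rmse": math.sqrt(mse), "mae": mae, "r2": r2}


# Toy final grades, made up for this example (NOT from the dataset).
toy_true = [10, 12, 8, 15, 14]
toy_pred = [11, 12, 9, 14, 13]
toy_metrics = regression_metrics(toy_true, toy_pred)
```

In a five-fold cross-validation setting, a function like this would typically be applied to each held-out fold and the per-fold values averaged; the exact aggregation used in the study is not specified in the abstract.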
