Student Performance Prediction Using the UCI Dataset: A Comparison of Interpretable and Ensemble Models
DOI: https://doi.org/10.11113/ajee2026.10n1.211
Keywords: machine learning, educational data mining, performance prediction, linear regression, feature importance
Abstract
AI technologies have not only transformed teaching methods but have also provided novel solutions for education management, assessment, and personalized learning. This study examines whether complex machine learning models consistently outperform simple interpretable approaches in predicting student outcomes. Using the UCI Student Performance dataset, five key predictors were selected from 32 original attributes: first-period grade (G1), second-period grade (G2), number of past class failures (Failures), mother's education level (Medu), and higher-education aspiration (Higher). Three models, Linear Regression (LR), Random Forest (RF), and an Ensemble Model (EM), were evaluated on the Mathematics and Portuguese subjects using MSE, RMSE, MAE, and R², with five-fold cross-validation to assess robustness. Experimental results show that Linear Regression achieved the best overall performance in both subjects, with R² = 0.779 for Mathematics and R² = 0.862 for Portuguese, whereas RF and EM did not yield consistent gains. Portuguese grades were generally more predictable than Mathematics grades under the same pipeline. Feature influence analysis indicates that the early-term grades (G1 and G2) dominate predictive power, which means the approach supports mid-semester, operational prediction rather than start-of-term early warning. Overall, the findings highlight the practical value of interpretable models for educational analytics when transparency and deployability are important.
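
The four evaluation metrics and the five-fold split named in the abstract can be illustrated with a minimal, standard-library Python sketch. This is not the authors' code: the function names `regression_metrics` and `k_fold_indices`, the shuffling seed, and the toy grade values in the usage note are all assumptions made for illustration, and the abstract does not say how the folds were actually drawn.

```python
import math
import random


def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, MAE and R^2 for two equal-length sequences.

    Hypothetical helper for illustration; not taken from the paper.
    """
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)             # residual sum of squares
    mean_y = sum(y_true) / n
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)  # total sum of squares
    mse = ss_res / n
    return {
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "MAE": sum(abs(e) for e in errors) / n,
        "R2": 1.0 - ss_res / ss_tot,  # undefined if y_true is constant
    }


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train, test) index lists for shuffled k-fold cross-validation.

    The seed and the round-robin fold assignment are assumptions.
    """
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k nearly equal-sized folds
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]
```

As a usage example with made-up values, `regression_metrics([10, 12, 14, 16], [11, 12, 13, 16])` gives an MSE of 0.5, an MAE of 0.5, and an R² of 0.9. In a full pipeline, each model would be fitted on the `train` indices of every fold and scored on the matching `test` indices, and the per-fold metrics would then be averaged.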
















