Using Machine Learning to Advance High School Dropout Prediction and Prevention
The importance of high school completion for jobs and postsecondary opportunities is well-documented. Because federal laws also make the high school graduation rate a core performance indicator, schools, districts, and states face pressure to actively monitor and assess high school completion. This study employs machine learning techniques to identify students at risk of exiting high school in either 9th or 10th grade. I find increased precision when applying resampling techniques to balance the training data, and that logistic regression performs similarly to more complex algorithms. When assessing the algorithmic fairness of models, I find most models tend to discriminate against students based on group membership in the English proficiency, disability, and economic disadvantage attributes. Post-hoc analyses of the XGBoost model reveal that a student’s age in 8th grade, followed by middle grade absences (especially chronic absenteeism), is predictive of early exit. This study advances the current state of knowledge in the field by (1) generating synthetic data to improve model accuracy, (2) ensuring that model predictions do not deepen structural inequities, and (3) exploring novel approaches to enhance the explainability associated with “black box” models, ultimately generating actionable insights for practitioners and stakeholders.