A Multifaceted Data Mining Approach To Understanding What Factors Lead College Students To Persist And Graduate


Aparna Gopalakrishnan

Oral Defence Date: 

Monday, March 14, 2016 - 15:00


TH 434


Assoc. Prof. Hui Yang, Assistant Prof. Anagha Kulkarni & Celia Graterol, M.P.H.


This study describes a set of generalizable, data-mining-based approaches for identifying factors that contribute to student persistence and graduation, using data from the Metro College Success Program, an academic program at San Francisco State University, California. These approaches include (1) a visual analysis to identify bivariate relationships and to understand the flow of students through an educational institution, (2) an ensemble feature selection method to recognize factors that have a significant impact on a student's persistence and graduation, (3) classification and prediction algorithms to predict whether a student will persist in a given semester and ultimately graduate, and (4) a variety of association patterns to help education practitioners gain further insights into factors that affect persistence and graduation. Our analysis reveals the following main insights: (1) most students who drop out do so in the fourth and seventh terms, (2) the educational level of a student's mother, the ELM (Entry Level Mathematics) score, and race are the most influential factors in predicting a student's third-term persistence, (3) Naive Bayes is the most suitable model for predicting graduation, while AdaBoost and SVM models are best suited for predicting persistence, and (4) a low ELM score combined with Pell eligibility (an indicator of socioeconomic status) predicts a lower rate of graduation. By collaborating with practitioners and focusing on generating human-interpretable results, the study helped identify bottlenecks on a student's path towards graduation.
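To make the idea of an association pattern concrete, the following is a minimal sketch of how support, confidence, and lift could be computed for a rule such as "low ELM score and Pell eligibility imply not graduating". It is not the method used in the study: the toy records, the attribute labels (low_elm, pell, graduated, not_graduated), and the helper names support and rule_metrics are all hypothetical and chosen only for illustration.

```python
from typing import Dict, FrozenSet, Iterable, List

# Hypothetical toy records: each set lists the attributes observed for one student.
# These values are invented for illustration and are NOT data from the study.
students: List[FrozenSet[str]] = [
    frozenset({"low_elm", "pell", "not_graduated"}),
    frozenset({"low_elm", "pell", "not_graduated"}),
    frozenset({"low_elm", "pell", "graduated"}),
    frozenset({"low_elm", "graduated"}),
    frozenset({"pell", "graduated"}),
    frozenset({"graduated"}),
    frozenset({"graduated"}),
    frozenset({"not_graduated"}),
]


def support(itemset: Iterable[str], records: List[FrozenSet[str]]) -> float:
    """Fraction of records that contain every item in `itemset`."""
    items = frozenset(itemset)
    return sum(1 for record in records if items <= record) / len(records)


def rule_metrics(
    antecedent: Iterable[str],
    consequent: Iterable[str],
    records: List[FrozenSet[str]],
) -> Dict[str, float]:
    """Support, confidence, and lift of the rule antecedent -> consequent."""
    a = frozenset(antecedent)
    c = frozenset(consequent)
    support_a = support(a, records)
    support_c = support(c, records)
    support_rule = support(a | c, records)
    # Confidence: how often the consequent holds when the antecedent holds.
    confidence = support_rule / support_a if support_a else 0.0
    # Lift: confidence relative to the baseline rate of the consequent.
    lift = confidence / support_c if support_c else 0.0
    return {"support": support_rule, "confidence": confidence, "lift": lift}


metrics = rule_metrics({"low_elm", "pell"}, {"not_graduated"}, students)
print(metrics)
```

A lift above 1 in this sketch would mean that students matching the antecedent fail to graduate more often than the baseline rate in the same toy data; on real data, such a pattern would still need to be checked against sample size and confounding factors before being interpreted.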


Association patterns, college student persistence and graduation, classification, feature selection, data mining