Applications of Data Mining to Student Performance Prediction and Curriculum Design


Paul Previde

Oral Defence Date: 

Tuesday, May 7, 2019 - 14:10


TH 935


Prof. Hui Yang and Prof. Anagha Kulkarni


This thesis project analyzed the curriculum-level factors that affected the persistence and graduation outcomes of over 2,000 students in the Metro College Success Program at San Francisco State University and of their control-group counterparts. The work addressed four questions: (1) how the timing of students' Mathematics courses affected their performance, persistence, and graduation outcomes; (2) whether students who progressed farther through the program's prescribed foundation course sequences exhibited higher persistence and graduation rates; (3) which sequences of courses were taken most frequently, and whether students who progressed farther through those sequences exhibited higher persistence and graduation rates; and (4) whether greater progress was more important than other demographic and academic factors for predicting persistence and graduation. The study found that students who took their Mathematics course in the second year showed higher fifth-term and seventh-term persistence than students who took it in the first year. Students who progressed farther through course sequences also consistently exhibited higher persistence and graduation rates. Furthermore, a student's persistence was a more reliable predictor of graduation than the other features considered. Overall, these findings could inform an institution's strategies for maximizing persistence and graduation by emphasizing students' progress through the curriculum. This project also demonstrated the utility of sequential pattern mining as a technique for analyzing course sequences.


Learning Analytics, Educational Data Mining, Machine Learning

