From Explaining How Random Forest Classifier Predicts Learning of Software Engineering Teamwork to Guidance for Educators


Sabiha Barlaskar

Oral Defence Date: Monday, May 6, 2019, 13:45


TH 434


Prof. Dragutin Petkovic


Machine Learning (ML) has gained popularity for its promise and ability to solve problems in a wide range of application areas, such as health, medicine, banking, information dissemination (e.g. news filtering) and general business. However, ML decision-making techniques rarely provide transparency or explainability. This makes it difficult for non-expert and even expert adopters to place complete trust in ML systems, and to maintain and audit them. This in turn has raised many concerns at the technical, legal and regulatory levels, and has received significant (often negative) coverage in the popular press. To address these issues, Dr. Petkovic’s lab had already created RFEX, an approach to enhance the explainability of the Random Forest (RF) ML technique. RFEX consists of a series of steps that produce an explanation summary, and it had previously been applied to biomedical data. The goals of our project were: a) to improve RFEX by adding more robust steps to its pipeline; and b) to apply RFEX to data from the Software Engineering Teamwork Assessment and Prediction (SETAP) project, in which an RF classifier had previously been used to predict the learning of Software Engineering student teams working in courses taught at three universities (San Francisco State University, Fulda University and Florida Atlantic University). In this project, working with Prof. Petkovic and other team members, we significantly improved RFEX, creating RFEX 2.0. We then applied it to the SETAP data and, from the factors analyzed, obtained a number of practical recommendations that help Software Engineering educators identify weaker teams at risk of failure early. We published our work at a major educational conference. We are also finalizing a Jupyter notebook toolkit for RFEX 2.0, which we plan to release to the open-source community.


Keywords: Random Forest, explainability, educational assessment, SETAP