SETAP: SW Engineering Teamwork Assessment and Prediction using Machine Learning


Swati Arora

Oral Defence Date: 



TH 434


Profs. Dragutin Petkovic, Kaz Okada, & Lecturer Marc Sosnick


Effective teaching of teamwork skills in local and globally distributed Software Engineering (SE) teams is recognized as an important part of the education of current and future software engineers. Effective methods for assessment and early prediction of learning effectiveness in SE teamwork are not only a critical part of teaching but also of value in industrial training and project management. The SETAP project offers a novel analytical approach to the assessment and, most importantly, the prediction of learning outcomes in SE teamwork, based on data from the joint software engineering class concurrently taught at San Francisco State University (SFSU), Florida Atlantic University (FAU) and Fulda University, Germany (Fulda). This approach differs from existing work in two respects: a) it develops and uses only objective and quantitative measures of team activity from multiple sources, such as statistics of student time use, software engineering tool use, and instructor observations; b) it applies powerful machine learning (ML) techniques to these team activity measurements to identify and rank the quantitative and objective factors that can assess and predict student learning of software engineering teamwork skills. In this project we focus on applying the Random Forest (RF) ML approach to estimate how accurately teams that are likely to fail can be predicted, using data consisting of over 40 objective and quantitative measures extracted from students working on class projects. We also evaluate the measures used in RF decision making and rank them by their predictive power using RF's built-in functions for variable importance. The data were collected from 17 student teams in our joint software engineering classes in Fall 2012 and Spring 2013.
Our results are preliminary because the dataset is small, but they show that RF can predict failing teams with good recall and high precision (90%), and that several variables, such as time spent in meetings and the length of comments submitted to the source code repository, have high predictive power. This research is funded by NSF TUES Grant #1140172.
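As a reading aid for the precision and recall figures quoted above, the following minimal Python sketch shows how the two metrics are computed when "fail" is treated as the class of interest. The helper name `precision_recall` and the eight team labels are hypothetical and serve only as illustration; they are not data or code from the SETAP project.

```python
def precision_recall(actual, predicted, positive="fail"):
    """Return (precision, recall) for the class named by `positive`."""
    pairs = list(zip(actual, predicted))
    # True positives: failing teams correctly flagged as failing.
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    # False positives: passing teams wrongly flagged as failing.
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    # False negatives: failing teams the classifier missed.
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical labels for 8 made-up teams (NOT data from this study).
actual = ["fail", "fail", "fail", "pass", "pass", "pass", "pass", "pass"]
predicted = ["fail", "fail", "pass", "pass", "pass", "pass", "pass", "fail"]

precision, recall = precision_recall(actual, predicted)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Precision answers "of the teams flagged as failing, how many actually failed?", while recall answers "of the teams that actually failed, how many were flagged?". For early intervention in a class, both matter: low recall misses teams that need help, and low precision spends instructor attention on teams that are doing fine.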


Assessment, Software Engineering Teamwork, Machine Learning, Education

