Predicting Student Performance Using Data from an Auto-Grading System
As online auto-grading systems become common, the information they collect can enable researchers to build models that predict student behaviour and performance. At the University of Waterloo, the ECE150 (Introductory Programming) Instructional Team wants insight into how best to allocate its limited teaching resources, especially individual tutoring, to improve educational outcomes. Currently, however, the Instructional Team allocates tutoring time reactively, helping students "as requested". This approach serves students with the wherewithal to ask for help, but many struggling students never reach out. In the 2016 offering of ECE150, the Instructional Team hypothesized that assignment grades alone may not be an accurate predictor of student performance, and that an analysis of student behaviour might instead identify students for proactive intervention. As the Research Team, we therefore want to explore what can be inferred from students' behaviour in the auto-grading data, such as how frequently they submit and how early they make their first submission, in order to identify students who need help. Because the setup of auto-grading systems changes over time (for example, assignment content may differ from year to year), our goal is to explore the data and extract insights rather than to build a precise predictive model. We ask three questions:

1. If we place students into categories according to their midterm and final exam performance, can we build a model over the auto-grading data to understand students' behaviour and predict those categories, and in particular identify students who need help as early as possible?

2. Can we predict students' raw numerical midterm and final exam grades from their behaviour?

3.
Can we find any interesting relations between the features generated from the auto-grading system data (reflecting students' behaviour), grades, and student categories?

In our experiments, we generated different types of features from the raw data we collected from Marmoset for 428 first-year students in the 2016 offering of ECE150, such as the passing rate for each programming task, the test-case outcomes, the number of submissions, lab attendance, and the time interval of submissions. Experiments on these features are our first step in exploring the auto-grading data; the thesis describes further features that would be reasonable to study, and future experiments will cover them. We applied a decision-tree algorithm to all of the above features, and a linear-regression algorithm to the time-interval feature, to predict students' grades on the midterm and final exam. In all experiments, we split the data into a training set and a testing set, and balanced the training set using the Synthetic Minority Oversampling Technique (SMOTE). For regression, we used the time interval between a student's first reasonable submission and the deadline as the feature, and applied linear regression to predict the exam grades. The results showed that for the midterm, the mean difference between predicted and actual midterm grades (maximum 110 points) was -5.76 points with a standard deviation of 16.44 points; for the final exam, the mean difference between predicted and actual final exam grades (maximum 120 points) was 0.92 points with a standard deviation of 17.12 points. A power transformation was applied to stabilize the residual variance.
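The regression step can be sketched as follows. This is a minimal illustration, not the thesis's actual pipeline: the data below are synthetic, and the variable names and magnitudes are assumptions made only for the example. It fits exam grades on the interval between a student's first reasonable submission and the deadline, then reports the mean and standard deviation of the residuals, the same diagnostics quoted above.

```python
import numpy as np

# Hypothetical data: hours between each student's first reasonable
# submission and the assignment deadline, plus their exam grades.
# These numbers are illustrative, not the thesis dataset.
rng = np.random.default_rng(0)
interval_hours = rng.uniform(1.0, 120.0, size=50)            # feature
exam_grade = 40.0 + 0.5 * interval_hours + rng.normal(0.0, 8.0, size=50)

# Ordinary least-squares fit of grade on submission interval.
slope, intercept = np.polyfit(interval_hours, exam_grade, deg=1)
predicted = slope * interval_hours + intercept

# Residual diagnostics of the kind reported in the abstract:
# mean and standard deviation of (predicted - actual).
residuals = predicted - exam_grade
print(residuals.mean(), residuals.std())
```

With an intercept in the fit, the residual mean is zero by construction on the training data; the interesting figures in the thesis are the held-out residual mean and standard deviation, which measure bias and spread of the predictions.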
For classification, students were divided into three categories according to their midterm and final exam grades: good-performance, satisfactory-performance, and poor-performance students, and we classified them with the C4.5 decision-tree algorithm. To include the regression model in the comparison, we mapped its predicted midterm and final exam grades onto the same categories. The results showed that for both the midterm and the final exam, the regression model using the time interval between a student's first reasonable submission and the deadline gave the best precision and F-measure for predicting which students would perform poorly. During the experiments we found that, for predicting raw midterm or final exam grades, the time-interval information from the assignment due immediately before the exam correlated most strongly with those grades; however, when the midterm grade was considered as a feature for the final exam, its correlation was greater than that of any assignment. The results suggest that the linear-regression model using the submission time interval performs better than the other models, and further research on it may be the best next step. Since this is only a preliminary exploratory study of auto-grading data, the insight we can draw from the data and features is limited. Future work includes experiments that combine different features to explore the data, and as we collect more data we can reach more definitive conclusions.
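The comparison between the regression and classification routes can be sketched as below. The category thresholds and the toy grade lists are illustrative assumptions, not the thesis's actual cut-offs or data; the per-class precision, recall, and F-measure computation is the standard one used to evaluate the "poor-performance" class.

```python
def to_category(grade, max_points):
    """Bucket a raw exam grade into good / satisfactory / poor.
    The 75% and 50% thresholds are illustrative assumptions."""
    frac = grade / max_points
    if frac >= 0.75:
        return "good"
    if frac >= 0.50:
        return "satisfactory"
    return "poor"

def precision_recall_f(actual, predicted, target="poor"):
    """Precision, recall, and F-measure for a single class."""
    tp = sum(a == target and p == target for a, p in zip(actual, predicted))
    fp = sum(a != target and p == target for a, p in zip(actual, predicted))
    fn = sum(a == target and p != target for a, p in zip(actual, predicted))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = (2 * precision * recall / (precision + recall)
         if precision + recall else 0.0)
    return precision, recall, f

# Toy example: regression-predicted grades mapped onto the same three
# categories as the actual grades (maximum 120 points, as for the final).
actual_grades    = [95, 100, 62, 45, 30, 80, 55, 110]
predicted_grades = [92, 104, 70, 40, 50, 85, 58, 100]
actual = [to_category(g, 120) for g in actual_grades]
pred   = [to_category(g, 120) for g in predicted_grades]
print(precision_recall_f(actual, pred, target="poor"))
```

Mapping the regression output through `to_category` is what lets a numeric predictor be scored with the same precision and F-measure as the decision-tree classifier, which is how the two approaches were compared.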
Cite this version of the work
Huanyi Chen (2018). Predicting Student Performance Using Data from an Auto-Grading System. UWSpace. http://hdl.handle.net/10012/13435