Affects Version/s: 3.5
Fix Version/s: None
The learning analytics system makes predictions about educational outcomes such as student success or course effectiveness. Ideally, the model's predictions will be highly accurate. However, two kinds of errors should be reported and investigated. A Type I error, or "false positive," occurs when the model predicts an outcome that then fails to occur (e.g. a student is predicted to be at risk of dropping out, but continues in the course until completion). A Type II error, or "false negative," occurs when the model fails to predict an outcome that then occurs (e.g. a student is not predicted to be at risk, but then stops participating partway through the course). In either case, the model can only improve if some investigation is conducted into why it was not accurate.
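The four possible outcomes can be made concrete with a small sketch. This is illustrative Python, not Moodle's implementation (Moodle core is PHP), and the function and category names are assumptions chosen for clarity:

```python
def classify_prediction(predicted_at_risk: bool, actual_dropout: bool) -> str:
    """Return the confusion-matrix category for one student prediction."""
    if predicted_at_risk and actual_dropout:
        return "true_positive"    # model correctly flagged the student
    if predicted_at_risk and not actual_dropout:
        return "false_positive"   # Type I: flagged, but the student completed
    if not predicted_at_risk and actual_dropout:
        return "false_negative"   # Type II: missed an at-risk student
    return "true_negative"        # correctly left unflagged

# The Type I example from the text: predicted at risk, but completed the course
print(classify_prediction(True, False))  # → false_positive
```

Only the false_positive and false_negative categories would trigger the review workflow described below.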
To investigate these inaccuracies, the following steps should be taken:
1: In the transition from "prediction" to "training" data set, each element (student, course, cohort, etc.) should be checked to see whether its prediction was accurate. If not, the prediction should be marked as either a False Positive or a False Negative. This is especially important for cases with a large Cook's distance, since those observations exert disproportionate influence on the fitted model.
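For the influence check in step 1, the following is a minimal Cook's distance computation for an ordinary least squares fit, using only NumPy. The data and the common 4/n flagging heuristic are illustrative assumptions, not part of Moodle's analytics pipeline:

```python
import numpy as np

def cooks_distance(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Cook's distance for each observation of an OLS fit (y ~ X)."""
    n, p = X.shape
    H = X @ np.linalg.inv(X.T @ X) @ X.T   # hat matrix
    h = np.diag(H)                          # leverage of each observation
    resid = y - H @ y                       # residuals (H @ y = fitted values)
    s2 = resid @ resid / (n - p)            # residual variance estimate
    return resid**2 / (p * s2) * h / (1 - h) ** 2

# Toy data: the last observation is anomalous and highly influential
X = np.column_stack([np.ones(6), [1, 2, 3, 4, 5, 20]])
y = np.array([1.1, 2.0, 2.9, 4.2, 5.1, 3.0])
d = cooks_distance(X, y)

# Flag observations exceeding the common 4/n cut-off for closer review
flagged = np.where(d > 4 / len(y))[0]
print(flagged)  # → [5]
```

Observations flagged this way are exactly the cases where a wrong prediction most distorts the next training cycle, which is why the text singles them out.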
2: False predictions should be reviewed by a relevant user who might be able to identify a reason the prediction was not accurate. In the case of a prediction about the outcome of a student enrollment in a course, this could be the teacher, the student, or both. See MDL-62302.
3: To make analysis simple, probable causes for model inaccuracy should be presented as a checklist that the user can select from, in addition to providing a free-text response. Examples for a False Positive prediction of a student at risk of dropping out might include:
- Intervention by the teacher helped the student
- The prediction notice and advice provided to the student helped to change the outcome
- One or more of the predictors are considered inappropriate by the reviewer (variables should be presented in order of largest deviance residual to smallest)
- Circumstances unrelated to the course changed the outcome (e.g. the student suddenly had a lot more or less time to work on the course than before)
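One way the checklist in step 3 could be captured is sketched below. The cause codes, descriptions, and the shape of the stored record are all illustrative assumptions, not an existing Moodle data structure:

```python
# Example checklist of probable causes for a false-positive "at risk" prediction,
# mirroring the bullet list above; codes and wording are hypothetical.
FALSE_POSITIVE_CAUSES = {
    "teacher_intervention": "Intervention by the teacher helped the student",
    "notice_helped": "The prediction notice and advice changed the outcome",
    "bad_predictor": "One or more predictors considered inappropriate",
    "external_circumstances": "Circumstances unrelated to the course changed the outcome",
}

def record_review(selected: list, free_text: str = "") -> dict:
    """Validate checklist selections and bundle them with the free-text note."""
    unknown = [c for c in selected if c not in FALSE_POSITIVE_CAUSES]
    if unknown:
        raise ValueError(f"Unknown cause codes: {unknown}")
    return {"causes": selected, "note": free_text}

review = record_review(["teacher_intervention"],
                       "Arranged extra tutoring after the at-risk notice.")
```

Constraining reviewers to coded causes (plus free text) is what makes the step-4 summary reports aggregable.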
Ethically, it is critical at this point to disclose to any user about whom a prediction has been made: what the prediction was, on what basis it was made, and what prediction-related information the system will continue to store and possibly use in future predictions.
4: Summary reports for each model categorizing failures should be available to the system/analytics administrator.
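The summary report in step 4 reduces to counting reviewed failures by error type and by selected cause. A minimal sketch, assuming review records shaped like the hypothetical checklist structure above:

```python
from collections import Counter

# Hypothetical reviewed-failure records for one model; in practice these would
# come from the stored reviews, not be hard-coded.
reviews = [
    {"error": "false_positive", "causes": ["teacher_intervention"]},
    {"error": "false_positive", "causes": ["external_circumstances"]},
    {"error": "false_negative", "causes": ["bad_predictor"]},
]

# Per-model tallies an analytics administrator could be shown
error_counts = Counter(r["error"] for r in reviews)
cause_counts = Counter(c for r in reviews for c in r["causes"])

print(error_counts["false_positive"])  # → 2
print(cause_counts.most_common(1))     # most frequently selected cause
```

A recurring cause such as "bad_predictor" in these tallies is a direct signal that a model variable should be reconsidered.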
Note that it is also possible to conduct forensics on training data that do not fit the model. Care should be taken to avoid over-fitting the model, but examination of false positives and negatives may help to identify novel indicators that should be incorporated into the model.