I've been reviewing how analytics would work in clustered environments and imagining the problems they could run into (I've also added a couple of lines to https://docs.moodle.org/dev/Analytics_API#Clustered_environments).
Cron locks running tasks for us, and that prevents multiple simultaneous executions of the training process. This is important because it would not be nice to have 2 different processes training a machine learning algorithm at the same time (the 2nd one would overwrite the data saved by the 1st one). Having one process training while another one is predicting is not a problem, because the prediction process only reads the trained machine learning weights. There are a few other things we could improve, although I'm not very worried about them:
- We are not protected at API level. Someone could write a plugin that calls core_analytics\model::train() multiple times in parallel. The plugin author would need to lock manually.
- Evaluation of prediction models is executed via command line. Multiple admin users with superpowers could potentially evaluate the same version of the same model in parallel. Core machine learning backends shouldn't have any problem with this because we do not store the trained algorithm's weights during evaluation, but 3rd party backends could.
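For the first point, here is a minimal sketch of what "lock manually" could look like for a caller. It is Python used as a language-neutral illustration, not Moodle code: TrainingLockRegistry, train_exclusively and the resource key format are all hypothetical names, and the registry only works inside one process. In a real cluster the lock would have to come from a shared lock backend (in Moodle, presumably the core lock API), which this sketch does not model.

```python
import threading


class TrainingLockRegistry:
    """Hypothetical in-process stand-in for a cluster-wide lock factory."""

    def __init__(self):
        self._guard = threading.Lock()  # protects the set below
        self._held = set()              # resource keys currently locked

    def try_acquire(self, resource):
        """Return True if the lock was taken, False if someone else holds it."""
        with self._guard:
            if resource in self._held:
                return False
            self._held.add(resource)
            return True

    def release(self, resource):
        with self._guard:
            self._held.discard(resource)


def train_exclusively(registry, model_id, train):
    """Run train() only if nobody else is training the same model.

    Returns True if training ran, False if it was skipped because the
    lock was already held. The key format is an assumption.
    """
    resource = f"analytics_train_model_{model_id}"
    if not registry.try_acquire(resource):
        return False
    try:
        train()
        return True
    finally:
        # Always release, even if training raised an exception.
        registry.release(resource)
```

A second caller that arrives while training is in progress gets False back and can retry later, so the data saved by the first run is never overwritten by a concurrent one.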
Unless I have missed something important, these are 2 minor points we could improve, but I think we are fine. I will leave this issue open for the record and for any other points we may be missing.