1. There is no record of whether the last cron job completed in its entirety.
MDL-11170 caused the cron job to fail silently (unless someone frequently reads its output, which is unlikely). In the case of MDL-11170, the cron job crashed before any backups had the chance to run, which makes this kind of problem dangerous.
Timestamping and recording the running status of the cron job (the same way as with scheduled backups) will allow the detection of an incomplete cron job the next time one is started. This will make it possible to notify the administrators that something is wrong.
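The status recording proposed above could look roughly like the following. This is only a sketch in Python, not Moodle code: the function names `begin_cron_run` and `end_cron_run` and the JSON status file are hypothetical, and a real implementation would more likely store the record in the database, as the scheduled backups do.

```python
import json
import os
import time


def begin_cron_run(status_path, now=None):
    """Mark a new cron run as started.

    Returns the previous run's record if it never reached "finished",
    otherwise None. (Hypothetical helper, for illustration only.)
    """
    now = time.time() if now is None else now
    previous = None
    if os.path.exists(status_path):
        with open(status_path) as fh:
            previous = json.load(fh)
    # A record still marked "running" means the last run did not complete.
    incomplete = previous if previous and previous.get("status") == "running" else None
    with open(status_path, "w") as fh:
        json.dump({"status": "running", "started": now, "finished": None}, fh)
    return incomplete


def end_cron_run(status_path, now=None):
    """Mark the current cron run as finished, with a completion timestamp."""
    now = time.time() if now is None else now
    with open(status_path) as fh:
        record = json.load(fh)
    record["status"] = "finished"
    record["finished"] = now
    with open(status_path, "w") as fh:
        json.dump(record, fh)
```

If `begin_cron_run` returns a record, the caller can notify the administrators and include the `started` timestamp of the run that never finished. Note that a record still marked as running can also mean the previous run is simply still in progress, which is the overlap problem described in point 2.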
2. Every individual cron activity has a race condition, so any cron activity might be running simultaneously with itself.
This is a real problem if cron jobs are run very frequently (every minute in my case) and the server is under heavy load (while running all the nightly cron jobs, in my case). Under these load conditions, database requests took so long that several cron job invocations inadvertently became synchronized and even started backing up the same individual courses simultaneously.
There are at least three race conditions/critical sections in the code that is supposed to prevent simultaneous backups (the other cron activities have similar ones):
- While checking for a correctly terminated run of backups, in schedule_backup_cron between lines 26 and 47
- While restarting an incorrectly terminated run of backups, between schedule_backup_cron line 36 and the first invocation of schedule_backup_launch_backup line 217
- While backing up each course, in schedule_backup_cron between lines 89 and 137
These sections should be protected from being run by several processes at once with a proper mutual exclusion lock.
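As an illustration of the kind of lock meant here, the sketch below wraps a critical section in a non-blocking exclusive file lock. It is written in Python rather than Moodle's PHP, the name `cron_lock` is hypothetical, and it assumes a Unix host where all cron invocations run on the same machine. A setup with several web servers would need a lock that all of them share, for example one held in the database.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def cron_lock(lock_path):
    """Try to take an exclusive lock without waiting.

    Yields True if this invocation holds the lock, False if another
    invocation already holds it. (Hypothetical helper, illustration only.)
    """
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        try:
            # LOCK_NB makes flock fail immediately instead of blocking.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            yield False
            return
        try:
            yield True
        finally:
            fcntl.flock(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)
```

A cron activity would enter its critical section only when the lock was obtained, and skip the work otherwise:

```python
with cron_lock("/tmp/backup_cron.lock") as acquired:
    if acquired:
        pass  # check for, restart, or run the scheduled backups here
```

The operating system releases the lock when the process exits, so a crashed cron job cannot leave it held forever, unlike a status flag that is only cleared on a clean finish.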