I think I've found the culprit of this original bug:
the notify_login_failures() is failing (without error), causing the cron process to stop when clean-up tasks are executed! Not analised if it's due to one "excess" of records or what, but needs to be traced and fixed for sure!
Note that some cleanup-tasks like sync_metacourses(), setnew_password_and_mail(), tag_cron(), cleanup_contexts()... are never executed in some sites due to this problem.
I guess we can chnge current behaviour to one more robust (scaling well) alternative. Will research it this weekend.
Call this the NOTIFY_LOGIN_FAILURES_PROBLEM.
BTW, I've stopped those notifications in one big server to test the new implementation when ready.