Moodle

cron execution, review, fix and improve

Details

  • Type: Bug Bug
  • Status: Closed Closed
  • Priority: Minor Minor
  • Resolution: Fixed
  • Affects Version/s: 1.9
  • Fix Version/s: 1.9, 2.0
  • Component/s: Administration
  • Labels:
    None
  • Database:
    Any
  • Affected Branches:
    MOODLE_19_STABLE
  • Fixed Branches:
    MOODLE_19_STABLE, MOODLE_20_STABLE

Description

After the "Deleted old cache_text records" step of cron execution, something strange happens... no more output.....

Have to test it.

Issue Links

Activity

Hide
Eloy Lafuente (stronk7) added a comment -

Another thing: I was so lucky because get the daily send of digest mails in action. And this line:

if ($digestposts = get_records('forum_queue')) {

just crash if there are too many records in the table (right now one server has more than 800.000!).

I would propose to:

  • Change code to use one recordset.
  • Add one more query to delete old records say, one week, at the begining of the forum cron. just to avoid current and future accumulation!
  • Add some trace message about the digest process starting (was impossible to detect if before!).

And this is the called DIGEST_CRON_PROBLEM from now.

I continue advancing...

Show
Eloy Lafuente (stronk7) added a comment - Another thing: I was so lucky because get the daily send of digest mails in action. And this line: if ($digestposts = get_records('forum_queue')) { just crash if there are too many records in the table (right now one server has more than 800.000!). I would propose to:
  • Change code to use one recordset.
  • Add one more query to delete old records say, one week, at the begining of the forum cron. just to avoid current and future accumulation!
  • Add some trace message about the digest process starting (was impossible to detect if before!).
And this is the called DIGEST_CRON_PROBLEM from now. I continue advancing...
Hide
Eloy Lafuente (stronk7) added a comment -

One more problem found causing some of the rnadom breakages here and there:

There is one memory limit for the script: @raise_memory_limit('128M');

And, when the number of messages in forum are high, if grows, and grows, and grows... and break somewhere (in the forum cron itself) or later, when the script requires memory over the limit.

But the place were memory grows continuously is forum cron (just guessing if it's due to some cap/role/context cache) that, not having limits, causes this to happen. IMO, all those static caches, than can be executed from cron.... should have a limit.

Call this the RAISE_MEMORY_LIMIT_PROBLEM from now.

Continue...

Show
Eloy Lafuente (stronk7) added a comment - One more problem found causing some of the rnadom breakages here and there: There is one memory limit for the script: @raise_memory_limit('128M'); And, when the number of messages in forum are high, if grows, and grows, and grows... and break somewhere (in the forum cron itself) or later, when the script requires memory over the limit. But the place were memory grows continuously is forum cron (just guessing if it's due to some cap/role/context cache) that, not having limits, causes this to happen. IMO, all those static caches, than can be executed from cron.... should have a limit. Call this the RAISE_MEMORY_LIMIT_PROBLEM from now. Continue...
Hide
Eloy Lafuente (stronk7) added a comment -

After some chat in HQ with Petr it seems that the role_unassign() function itself includes the necessary code to delete, for COURSE contexts, the associated user_lastaccess record if the user hasn't anymore the course:view capabilty.

But some sites can show orphan records that never are deleted from that funcion, so I would introduce one new iterator in cron.php in order to detect and delete all those orphan records from user_lastaccess.

Call this the ORPHAN_USER_LASTACCESS_PROBLEM from now. Patch attached. Requires approval, plz.

Continue...

Show
Eloy Lafuente (stronk7) added a comment - After some chat in HQ with Petr it seems that the role_unassign() function itself includes the necessary code to delete, for COURSE contexts, the associated user_lastaccess record if the user hasn't anymore the course:view capabilty. But some sites can show orphan records that never are deleted from that funcion, so I would introduce one new iterator in cron.php in order to detect and delete all those orphan records from user_lastaccess. Call this the ORPHAN_USER_LASTACCESS_PROBLEM from now. Patch attached. Requires approval, plz. Continue...
Hide
Eloy Lafuente (stronk7) added a comment - - edited

ORPHAN_USER_LASTACCESS_PROBLEM has been fixed in cron.php

Petr has detected that on course deletion, records aren't deleted from the user_lastaccess table. Fixing includes:

  • delete such records in the delete_course() function
  • delete such records if necessary in reset_course()
  • delete when one user auto unenrol from one course?
  • as part of the upgrade, delete all records belonging to nonexistent courses.

Call this the ORPHAN_BY_COURSE_DELETE_PROBLEM.

Show
Eloy Lafuente (stronk7) added a comment - - edited ORPHAN_USER_LASTACCESS_PROBLEM has been fixed in cron.php Petr has detected that on course deletion, records aren't deleted from the user_lastaccess table. Fixing includes:
  • delete such records in the delete_course() function
  • delete such records if necessary in reset_course()
  • delete when one user auto unenrol from one course?
  • as part of the upgrade, delete all records belonging to nonexistent courses.
Call this the ORPHAN_BY_COURSE_DELETE_PROBLEM.
Hide
Eloy Lafuente (stronk7) added a comment -

I think I've found the culprit of this original bug:

the notify_login_failures() is failing (without error), causing the cron process to stop when clean-up tasks are executed! Not analised if it's due to one "excess" of records or what, but needs to be traced and fixed for sure!

Note that some cleanup-tasks like sync_metacourses(), setnew_password_and_mail(), tag_cron(), cleanup_contexts()... are never executed in some sites due to this problem.

I guess we can chnge current behaviour to one more robust (scaling well) alternative. Will research it this weekend.

Call this the NOTIFY_LOGIN_FAILURES_PROBLEM.

BTW, I've stopped those notifications in one big server to test the new implementation when ready.

Ciao

Show
Eloy Lafuente (stronk7) added a comment - I think I've found the culprit of this original bug: the notify_login_failures() is failing (without error), causing the cron process to stop when clean-up tasks are executed! Not analised if it's due to one "excess" of records or what, but needs to be traced and fixed for sure! Note that some cleanup-tasks like sync_metacourses(), setnew_password_and_mail(), tag_cron(), cleanup_contexts()... are never executed in some sites due to this problem. I guess we can chnge current behaviour to one more robust (scaling well) alternative. Will research it this weekend. Call this the NOTIFY_LOGIN_FAILURES_PROBLEM. BTW, I've stopped those notifications in one big server to test the new implementation when ready. Ciao
Hide
Eloy Lafuente (stronk7) added a comment -

and, after some repeated mistakes, this bug is reorganized in subtasks. grrr. :-P

Feel free to add new if needed (and to fix current, if possible).

Ciao

Show
Eloy Lafuente (stronk7) added a comment - and, after some repeated mistakes, this bug is reorganized in subtasks. grrr. :-P Feel free to add new if needed (and to fix current, if possible). Ciao
Hide
Eloy Lafuente (stronk7) added a comment -

Raising this to blocker because contains "blocker" sub-tasks (although we are near fixing all them).

Show
Eloy Lafuente (stronk7) added a comment - Raising this to blocker because contains "blocker" sub-tasks (although we are near fixing all them).
Hide
Martin Dougiamas added a comment -

I'm dropping the priority on this because the last issue doesn't appear serious compared to all the other bugs

Show
Martin Dougiamas added a comment - I'm dropping the priority on this because the last issue doesn't appear serious compared to all the other bugs
Hide
Eloy Lafuente (stronk7) added a comment -

With all the subtasks currently closed... closing this too. B-)

Ciao

Show
Eloy Lafuente (stronk7) added a comment - With all the subtasks currently closed... closing this too. B-) Ciao

People

Vote (2)
Watch (1)

Dates

  • Created:
    Updated:
    Resolved: