Uploaded image for project: 'Moodle'
  1. Moodle
  2. MDL-73293

Adhoc task metadata around running tasks is wrong when processes are killed

XMLWordPrintable

    • MOODLE_311_STABLE, MOODLE_400_STABLE
    • MOODLE_401_STABLE
    • MDL-73293-master
    • Hide

      Prerequisites

      1. Install tool_testtasks:

        git clone git@github.com:catalyst/moodle-tool_testtasks.git admin/tool/testtasks

      2. Modify this patch so tasks get cleaned up quicker - the example below makes it MINSECS (one minute) but you could make it higher (e.g., 5*MINSECS) if you have difficulty quickly switching between terminals and the web browser (which is required in later testing instructions):
        1. Open lib/classes/task/manager.php
        2. In the cleanup_metadata method, look for:

          if ($task->timestarted > time() - HOURSECS) {
              continue;
          }
          

        3. Modify it to:

          if ($task->timestarted > time() - MINSECS) {
              continue;
          }
          

      Testing (normal case)

      1. Browse to "Site administration" > "Server" > "Tasks running now" (you'll need to get back to this page quickly after the next steps)
      2. Queue up 2 1000 second tasks:

        php admin/tool/testtasks/cli/queue_multiple_adhoc_tasks.php -n=2 -c="tool_testtasks\task\one_thousand_second_task"

      3. In two terminals, start processing the two tasks (one in each terminal):

        php admin/cli/adhoc_task.php -e

      4. Kill one of the adhoc task runners in one of the terminals by pressing ctrl+z followed by the command: kill -9 %1
      5. Start a slow scheduled task in another terminal:

        php admin/cli/scheduled_task.php --execute=\tool_testtasks\\task\\slow_task  
        

      6. Refresh the "Tasks running now page" and verify you see two 1000 second tasks processing, and a slow scheduled task
      7. Run the cleanup task:

        php admin/cli/scheduled_task.php --execute=\\core\\task\\task_lock_cleanup_task 

      8. Verify both adhoc tasks and a slow task (for a total of 3 rows) are still shown on the "Tasks running now" page (you'll have to be quick and check this before the slow task completes)
      9. Once the slow task has completed, refresh the "Tasks running now" page and Verify the two 1000 second tasks are still there
      10. Wait for 5 minutes (keep checking the "Tasks running now page" until all tasks have been running for more than 5 minutes - also verify that the slow task goes away once it completes)
      11. Run the cleanup task again:

        php admin/cli/scheduled_task.php --execute=\\core\\task\\task_lock_cleanup_task 

      12. Verify only one 1000 second task is left on the "Tasks running now" page
      13. Check the mdl_task_adhoc table and verify there are two records there, and verify that one of them has a faildelay set
      14. Start another slow task:

        php admin/cli/scheduled_task.php --execute=\tool_testtasks\\task\\slow_task  
        

      15. Kill it by pressing ctrl+z followed by the command: kill -9 %1
      16. Check the "Tasks running now" page and verify you see the 1000 second task and the slow task
      17. Run the cleanup task:

        php admin/cli/scheduled_task.php --execute=\\core\\task\\task_lock_cleanup_task 

      18. Check the "Tasks running now" page and verify both tasks are still there
      19. Keep refreshing the "Tasks running now" page until it shows the slow task has been running for more than 5 minutes
      20. Run the cleanup task:

        php admin/cli/scheduled_task.php --execute=\\core\\task\\task_lock_cleanup_task 

      21. Check the "Tasks running now" page and verify only the 1000 second task remains

      Testing (race condition)

      There is a chance of a race condition. To test this a testing commit is provided that exposes the conditions required to race more easily. While the cleanup task is processing running tasks, it is possible for the data to become stale (if a task finishes while the cleanup task is in the process of cleaning another task). The testing commit adds a sleep in the main loop of the cleanup task, making it easy for a task to finish while processing is happening.

      1. Revert the changes applied in the prerequisites step and apply attached patch (it also adds some extra logging and reduces the time required for a task to be considered for cleanup)

        git apply /PATH/TOO/PATCH/MDL-73293.patch
        

      1. Stop any tasks that are still running (use ctrl+c) and cleanup the queue:

        php admin/tool/testtasks/cli/clear_adhoc_task_queue.php

      2. Queue up two 100 second tasks:

        php admin/tool/testtasks/cli/queue_multiple_adhoc_tasks.php -n=2 -c="tool_testtasks\task\one_hundred_second_task"

      3. In two terminals, start processing the two tasks (one in each terminal):

        php admin/cli/adhoc_task.php -e

      4. Wait for 5 seconds and then start the cleanup task:

        php admin/cli/scheduled_task.php --execute=\\core\\task\\task_lock_cleanup_task 

      5. Watch the logging messages carefully and ensure both adhoc tasks finish before the cleanup task
      6. Verify the output of the cleanup tasks show that an adhoc task completed and there's no need to do anything
      7. Queue up a single 100 second task and start processing it:

        php admin/tool/testtasks/cli/queue_multiple_adhoc_tasks.php -n=1 -c="tool_testtasks\task\one_hundred_second_task"
        php admin/cli/adhoc_task.php -e
        

      8. Start a slow scheduled task:

        php admin/cli/scheduled_task.php --execute=\tool_testtasks\\task\\slow_task

      9. Wait 5 seconds and run the cleanup task:

        php admin/cli/scheduled_task.php --execute=\\core\\task\\task_lock_cleanup_task 

      10. Verify the output of the cleanup tasks show that a scheduled task completed and there's no need to do anything
      11. Start a slow scheduled task:

        php admin/cli/scheduled_task.php --execute=\tool_testtasks\\task\\slow_task

      12. Queue up a single 100 second task and start processing it:

        php admin/tool/testtasks/cli/queue_multiple_adhoc_tasks.php -n=1 -c="tool_testtasks\task\one_hundred_second_task"
        php admin/cli/adhoc_task.php -e
        

      13. Wait for 5 seconds and then start the cleanup task:

        php admin/cli/scheduled_task.php --execute=\\core\\task\\task_lock_cleanup_task 

      14. Verify the output of the cleanup tasks show that an adhoc task completed and there's no need to do anything
      Show
      Prerequisites Install tool_testtasks: git clone git@github.com:catalyst/moodle-tool_testtasks.git admin/tool/testtasks Modify this patch so tasks get cleaned up quicker - the example below makes it MINSECS (one minute) but you could make it higher (e.g., 5*MINSECS) if you have difficulty quickly switching between terminals and the web browser (which is required in later testing instructions): Open lib/classes/task/manager.php In the cleanup_metadata method, look for: if ($task->timestarted > time() - HOURSECS) { continue; } Modify it to: if ($task->timestarted > time() - MINSECS) { continue; } Testing (normal case) Browse to "Site administration" > "Server" > "Tasks running now" (you'll need to get back to this page quickly after the next steps) Queue up 2 1000 second tasks: php admin/tool/testtasks/cli/queue_multiple_adhoc_tasks.php -n=2 -c="tool_testtasks\task\one_thousand_second_task" In two terminals, start processing the two tasks (one in each terminal): php admin/cli/adhoc_task.php -e Kill one of the adhoc task runners in one of the terminals by pressing ctrl+z followed by the command: kill -9 %1 Start a slow scheduled task in another terminal: php admin/cli/scheduled_task.php --execute=\tool_testtasks\\task\\slow_task Refresh the "Tasks running now page" and verify you see two 1000 second tasks processing, and a slow scheduled task Run the cleanup task: php admin/cli/scheduled_task.php --execute=\\core\\task\\task_lock_cleanup_task Verify both adhoc tasks and a slow task (for a total of 3 rows) are still shown on the "Tasks running now" page (you'll have to be quick and check this before the slow task completes) Once the slow task has completed, refresh the "Tasks running now" page and Verify the two 1000 second tasks are still there Wait for 5 minutes (keep checking the "Tasks running now page" until all tasks have been running for more than 5 minutes - also verify that the slow task goes away once it completes) Run the cleanup task again: php admin/cli/scheduled_task.php --execute=\\core\\task\\task_lock_cleanup_task Verify only one 1000 second task is left on the "Tasks running now" page Check the mdl_task_adhoc table and verify there are two records there, and verify that one of them has a faildelay set Start another slow task: php admin/cli/scheduled_task.php --execute=\tool_testtasks\\task\\slow_task Kill it by pressing ctrl+z followed by the command: kill -9 %1 Check the "Tasks running now" page and verify you see the 1000 second task and the slow task Run the cleanup task: php admin/cli/scheduled_task.php --execute=\\core\\task\\task_lock_cleanup_task Check the "Tasks running now" page and verify both tasks are still there Keep refreshing the "Tasks running now" page until it shows the slow task has been running for more than 5 minutes Run the cleanup task: php admin/cli/scheduled_task.php --execute=\\core\\task\\task_lock_cleanup_task Check the "Tasks running now" page and verify only the 1000 second task remains Testing (race condition) There is a chance of a race condition. To test this a testing commit is provided that exposes the conditions required to race more easily. While the cleanup task is processing running tasks, it is possible for the data to become stale (if a task finishes while the cleanup task is in the process of cleaning another task). The testing commit adds a sleep in the main loop of the cleanup task, making it easy for a task to finish while processing is happening. Revert the changes applied in the prerequisites step and apply attached patch (it also adds some extra logging and reduces the time required for a task to be considered for cleanup) git apply /PATH/TOO/PATCH/MDL- 73293 .patch Stop any tasks that are still running (use ctrl+c ) and cleanup the queue: php admin/tool/testtasks/cli/clear_adhoc_task_queue.php Queue up two 100 second tasks: php admin/tool/testtasks/cli/queue_multiple_adhoc_tasks.php -n=2 -c="tool_testtasks\task\one_hundred_second_task" In two terminals, start processing the two tasks (one in each terminal): php admin/cli/adhoc_task.php -e Wait for 5 seconds and then start the cleanup task: php admin/cli/scheduled_task.php --execute=\\core\\task\\task_lock_cleanup_task Watch the logging messages carefully and ensure both adhoc tasks finish before the cleanup task Verify the output of the cleanup tasks show that an adhoc task completed and there's no need to do anything Queue up a single 100 second task and start processing it: php admin/tool/testtasks/cli/queue_multiple_adhoc_tasks.php -n=1 -c="tool_testtasks\task\one_hundred_second_task" php admin/cli/adhoc_task.php -e Start a slow scheduled task: php admin/cli/scheduled_task.php --execute=\tool_testtasks\\task\\slow_task Wait 5 seconds and run the cleanup task: php admin/cli/scheduled_task.php --execute=\\core\\task\\task_lock_cleanup_task Verify the output of the cleanup tasks show that a scheduled task completed and there's no need to do anything Start a slow scheduled task: php admin/cli/scheduled_task.php --execute=\tool_testtasks\\task\\slow_task Queue up a single 100 second task and start processing it: php admin/tool/testtasks/cli/queue_multiple_adhoc_tasks.php -n=1 -c="tool_testtasks\task\one_hundred_second_task" php admin/cli/adhoc_task.php -e Wait for 5 seconds and then start the cleanup task: php admin/cli/scheduled_task.php --execute=\\core\\task\\task_lock_cleanup_task Verify the output of the cleanup tasks show that an adhoc task completed and there's no need to do anything

      Install test tasks:

      https://github.com/catalyst/moodle-tool_testtasks

      Queue up a long running task:

      php admin/tool/testtasks/cli/queue_adhoc_tasks.php -d=100

      Now run it:

      php admin/cli/adhoc_task.php -e

      Then kill the process to simulate a kill -9 or a container reap. The adhoc tasks table will still have a non null timestarted which means it thinks it is still running when you visit:

      /admin/tool/task/runningtasks.php

      There isn't much we can do about preventing it, but we can clean up the metadata after the fact.

      Proposing:

      1) we introduce a new scheduled task which can be run relatively infrequently

      2) all it does it look for all adhoc or scheduled tasks which clai to have been running for a long time, such as more than an hour

      3) Attempt to gain a lock for the task with a 0 timeout. If the lock can be gained then we know the task isn't running and we can clean up the metadata.

       

        1. MDL-73293_normal case.webm
          12.57 MB
        2. MDL-73293_race condition_1.webm
          2.27 MB
        3. MDL-73293_race condition_2.webm
          2.41 MB
        4. MDL-73293_race condition_3.webm
          1.26 MB
        5. MDL-73293.patch
          3 kB

            cameron1729 cameron1729
            brendanheywood Brendan Heywood
            Brendan Heywood Brendan Heywood
            Ilya Tregubov Ilya Tregubov
            Angelia Dela Cruz Angelia Dela Cruz
            Votes:
            0 Vote for this issue
            Watchers:
            9 Start watching this issue

              Created:
              Updated:
              Resolved:

                Estimated:
                Original Estimate - Not Specified
                Not Specified
                Remaining:
                Remaining Estimate - 0 minutes
                0m
                Logged:
                Time Spent - 1 hour, 40 minutes
                1h 40m

                  Error rendering 'clockify-timesheets-time-tracking-reports:timer-sidebar'. Please contact your Jira administrators.