Moodle / MDL-67483

Improve adhoc tasks quality of service at very high scale

    Details

    • Testing Instructions:
      0) Install these in admin tools to make the testing easier:

      https://github.com/catalyst/moodle-tool_testtasks

      https://github.com/catalyst/moodle-tool_lockstats

       

      Testing scenario 1. Test the case with a single task runner

      1) Queue up 1000 one-second tasks:

      php admin/tool/testtasks/cli/queue_adhoc_tasks.php -d=1 -n=1000
      

      2) Behind these, queue up a single adhoc task of the 'another' type:

      php admin/tool/testtasks/cli/queue_adhoc_tasks.php -d=1 -n=1 --class='tool_testtasks\task\another_timed_adhoc_task'

      3) Peek into the task queue by class:

      $ select count(*),classname from mdl_task_adhoc group by classname;
       count | classname 
      -------+-----------------------------------------------
       1000 | \tool_testtasks\task\timed_adhoc_task
       1 | \tool_testtasks\task\another_timed_adhoc_task
      (2 rows)

      4) Now start processing the queue:

      php admin/tool/task/cli/adhoc_task.php --execute 
      

      5) You should see that the first two tasks it picked up cycled through each type, so the single 'another' task is gone and very quickly you are left with fewer than 1000 tasks of the first type (e.g. 997):

      $ select count(*),classname from mdl_task_adhoc group by classname;
       count | classname 
      -------+---------------------------------------
       997 | \tool_testtasks\task\timed_adhoc_task
      (1 row)
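
The ordering that scenario 1 checks for can be illustrated with a minimal sketch. This is not Moodle's implementation; `interleave_by_class` is a hypothetical helper that simply round-robins across task classes while keeping FIFO order within each class, which is enough to show why the lone 'another' task should run second rather than 1001st.

```python
from collections import OrderedDict, deque


def interleave_by_class(queue):
    """Reorder (classname, task_id) pairs round-robin across classes.

    Hypothetical illustration only: FIFO order is kept within each class,
    and classes are visited in the order they first appear in the queue.
    """
    by_class = OrderedDict()
    for classname, task_id in queue:
        by_class.setdefault(classname, deque()).append((classname, task_id))

    ordered = []
    while by_class:
        # Take one task from every class that still has work queued.
        for classname in list(by_class):
            pending = by_class[classname]
            ordered.append(pending.popleft())
            if not pending:
                del by_class[classname]
    return ordered


# Same shape as scenario 1: 1000 tasks of one class, then 1 of another.
queue = [("timed_adhoc_task", i) for i in range(1000)]
queue.append(("another_timed_adhoc_task", 1000))

ordered = interleave_by_class(queue)
position = [classname for classname, _ in ordered].index("another_timed_adhoc_task")
print(position)  # 1, i.e. the second task to run
```

With a plain FIFO queue the same task would sit at position 1000, which is the starvation this test is meant to rule out.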

       

      Testing scenario 2. A bunch of runners in parallel

      1) First, set the adhoc task concurrency limit in config.php so that many runners can work in parallel, and set up the lock stats tool, which we'll use later:

      $CFG->task_adhoc_concurrency_limit = 1000;
      $CFG->lock_factory = '\tool_lockstats\proxy_lock_factory';
      $CFG->proxied_lock_factory = "auto";
      

      2) Queue up 22000 of type A and then another 22000 of type B behind it:

      $ php admin/tool/testtasks/cli/clear_adhoc_task_queue.php
      $ php admin/tool/testtasks/cli/queue_adhoc_tasks.php -d=1 -n=22000
      $ php admin/tool/testtasks/cli/queue_adhoc_tasks.php -d=1 -n=22000 --class='tool_testtasks\task\another_timed_adhoc_task' 
      

      3) Confirm the queues: 

      $ select count(*),classname from mdl_task_adhoc group by classname;
       count | classname 
      -------+-----------------------------------------------
       22000 | \tool_testtasks\task\timed_adhoc_task
       22000 | \tool_testtasks\task\another_timed_adhoc_task
      (2 rows)
      

      6) Now let's fire up several task runners. Run this, say, 4 times (you may need to open several terminals to do so):

      php admin/tool/task/cli/adhoc_task.php --execute &
      

      7) Recheck the queues and confirm both classes are being processed evenly (the two counts should be lower than in step 3 and roughly equal to each other):

      $ select count(*),classname from mdl_task_adhoc group by classname;
       count | classname 
      -------+-----------------------------------------------
       21944 | \tool_testtasks\task\timed_adhoc_task
       21944 | \tool_testtasks\task\another_timed_adhoc_task
      (2 rows) 
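
The "processed evenly" check in step 7 can be made explicit with a small sketch. The helper names and the 10% tolerance are assumptions for illustration; the counts are the ones shown in steps 3 and 7.

```python
def processed_per_class(before, after):
    """Return how many tasks of each class left the queue between two snapshots."""
    return {classname: before[classname] - after.get(classname, 0) for classname in before}


def is_even(processed, tolerance=0.1):
    """True if the busiest and least busy class differ by at most `tolerance`.

    The tolerance is a hypothetical threshold, not something defined by Moodle.
    """
    highest = max(processed.values())
    lowest = min(processed.values())
    return highest == 0 or (highest - lowest) / highest <= tolerance


# Queue snapshots from step 3 (before) and step 7 (after).
before = {"timed_adhoc_task": 22000, "another_timed_adhoc_task": 22000}
after = {"timed_adhoc_task": 21944, "another_timed_adhoc_task": 21944}

processed = processed_per_class(before, after)
print(processed, is_even(processed))
```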
      

       

      Testing scenario 3. NOT optional

      8) Lastly, and most importantly, keep adding runners and confirm that overall throughput still scales roughly linearly with the number of processes.

      I.e. fire up a number of processes and count how many you have running:

      php admin/tool/task/cli/adhoc_task.php --execute &

      Then, using the lock stats tool, you can see what is running right now in the GUI:

      /admin/tool/lockstats/

      or from a SQL shell:

      $ select count(*) from mdl_tool_lockstats_locks;
      count
      -------
      6
      (1 row) 
      

      Ideally you want to run this until it breaks, to find the maximum practical level of concurrency the system can handle. On my local box I saw something like:

       Cron processes | Tasks being processed
      ----------------+-----------------------
                    5 |                     5
                   10 |                    10
                   20 |                    19
                   30 |                    28
                   35 |                    32
                   40 |                    18

      When it hits the maximum threshold, each process may get a lock timeout and exit, so you'll see a sharp drop-off from linear back to something much smaller. This maximum concurrency is an issue with or without this patch, but we need to make sure it doesn't regress under similar conditions.
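
One way to read the measurements above is to compute the ratio of tasks being processed to cron processes and find where it collapses. The sketch below uses the figures from my local box; the function name and the 0.8 efficiency cutoff are assumptions chosen for illustration, so pick whatever threshold suits your environment.

```python
# (cron processes, tasks being processed) as measured above.
samples = [(5, 5), (10, 10), (20, 19), (30, 28), (35, 32), (40, 18)]


def max_practical_concurrency(samples, min_efficiency=0.8):
    """Return the largest process count before efficiency drops below the cutoff.

    Efficiency is tasks / processes; 1.0 means perfectly linear scaling.
    The 0.8 cutoff is a hypothetical threshold, not a Moodle setting.
    """
    best = None
    for processes, tasks in sorted(samples):
        if tasks / processes < min_efficiency:
            break
        best = processes
    return best


for processes, tasks in samples:
    print(processes, tasks, round(tasks / processes, 2))

print(max_practical_concurrency(samples))  # 35
```

Running the same measurement with and without the patch and comparing the two results gives a simple regression check for this scenario.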

       

       

       

    • Affected Branches:
      MOODLE_39_STABLE
    • Fixed Branches:
      MOODLE_39_STABLE
    • Pull Master Branch:
      MDL-67483-qos-perf

      Description

      When I tested MDL-67363 I didn't scale it up high enough. The algorithm works, but at very large scale it runs in O(n^2) and starts to choke.

      This is a tweak to get the performance back to as close to linear as we can get.

              People

              Assignee:
              brendanheywood Brendan Heywood
              Reporter:
              brendanheywood Brendan Heywood
              Peer reviewer:
              Dmitrii Metelkin
              Integrator:
              Jake Dallimore
              Tester:
              Jake Dallimore
              Participants:
              Component watchers:
              Amaia Anabitarte, Carlos Escobedo, Ferran Recio, Sara Arjona (@sarjona), Víctor Déniz Falcón
              Votes:
              0
              Watchers:
              8

                Dates

                Created:
                Updated:
                Resolved:
                Fix Release Date:
                15/Jun/20

                  Time Tracking

                  Estimated:
                  Not Specified
                  Remaining:
                  0m
                  Logged:
                  4h