  1. Moodle
  2. MDL-67648

Cron task manager quality of service (version 3)

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: Future Dev
    • Fix Version/s: None
    • Component/s: Tasks
    • Labels:

      Description

      This is more of a placeholder to collect ideas on a more holistic and performant approach, following on from MDL-67486, MDL-67211, MDL-67483 and MDL-67363.

      Things are now much better, but at high scale with very unequally sized adhoc tasks you can still end up with some tasks hogging cron and blocking the processing of other things. MDL-64610 will help a lot here, but in an ideal world the task manager would dynamically adjust the priorities of tasks based on as much information as it has, and would not need manual tuning by either the developer or the admin.

      Some example scenarios:

      1) A queue of very slow tasks, e.g. async backups that take 10 mins each, is followed by some small tasks like sending emails, which we generally want done fairly fast. Even with QoS the slow tasks end up pegging all of the available cron runners, because QoS only considers what should start next based on what is in the queue, not what is already running.

      2) You have, say, 2 or 3 types of heavy task and nothing else. We end up splitting the load evenly between them (50/50 for two types) and cron is pegged on heavy tasks. There are no 'spare' runners to start on a new type of task that comes along.

       

      The concept I'm thinking about is roughly:

      1) After MDL-67211 lands we have metadata on what is running and for how long in total, grouped by task type.

      2) When we look at what should be picked up next, we weight the priorities by the totals above, so task types that are already running get progressively lower priorities.

      3) We tune this so that if there are, say, 10 runners, no one task type can ever hog more than about 2/3 of the runners, so there is always something spare to start on new tasks, but without an explicit limit on any one type of task.

      4) If new types of task appear in the queue then we want to balance the runners across all of them. So if there are 5 types of task and 10 runners, each type should get roughly 2 processes, regardless of how long each specific task takes.

      5) The current QoS layer slows down at scale, so try to rebuild it in SQL as much as possible.
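
      To make steps 1 to 4 concrete, here is a minimal sketch in Python rather than Moodle's PHP. Everything in it is an assumption for illustration: `TypeStats`, `pick_next_type`, `fill_runners` and the `max_share` ratio are hypothetical names, not part of the Moodle task API. It reads the 2/3 figure from step 3 as a single global tunable ratio, which is only one possible interpretation of "no explicit limit on any one type".

      ```python
      from dataclasses import dataclass


      @dataclass
      class TypeStats:
          """Per-task-type metadata, as step 1 assumes will be available."""
          queued: int = 0              # tasks of this type waiting in the queue
          running: int = 0             # runners currently busy with this type
          running_seconds: float = 0   # total elapsed time of those running tasks


      def pick_next_type(stats, total_runners, max_share=2 / 3):
          """Return the task type a free runner should start next, or None.

          Types that already occupy more runners (and, as a tie-breaker, more
          total running time) are ranked lower. A type holding max_share of
          all runners is skipped, so some runners stay spare for new types.
          """
          soft_cap = max(1, int(total_runners * max_share))
          candidates = [
              name for name, s in stats.items()
              if s.queued > 0 and s.running < soft_cap
          ]
          if not candidates:
              return None
          # Fewest running first, then least running time, then name so the
          # result is deterministic.
          return min(
              candidates,
              key=lambda n: (stats[n].running, stats[n].running_seconds, n),
          )


      def fill_runners(stats, total_runners):
          """Assign each idle runner in turn; return running counts per type."""
          free = total_runners - sum(s.running for s in stats.values())
          for _ in range(free):
              name = pick_next_type(stats, total_runners)
              if name is None:
                  break  # nothing eligible: leave the remaining runners spare
              stats[name].queued -= 1
              stats[name].running += 1
          return {name: s.running for name, s in stats.items()}
      ```

      Under these assumptions, five task types with deep queues and 10 runners end up with 2 runners each, and a single heavy type with 10 runners stops at 6, leaving 4 spare. In a real implementation the ranking would ideally be computed in SQL, as step 5 suggests.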

       

       


              People

              Assignee:
              Unassigned
              Reporter:
              brendanheywood Brendan Heywood
              Participants:
              Component watchers:
              Amaia Anabitarte, Carlos Escobedo, Ferran Recio, Sara Arjona (@sarjona), Víctor Déniz Falcón
              Votes:
              0
              Watchers:
              4
