Uploaded image for project: 'Moodle'
  1. Moodle
  2. MDL-68577

Tune the redis session handler to reduce blocking wait time

    XMLWordPrintable

    Details

    • Testing Instructions:
      Hide

      This is something that should be stress testing in a variety of ways and compared before and after. Real world results will depend on a ton of variables.

       

      1) Install this testing plugin:

      https://github.com/brendanheywood/moodle-tool_sessionbreaker

      2) Open this page with the dev tools open looking at the network tab

      http://moodle.local/admin/tool/sessionbreaker/index.php?delay=1&countdelay=20

      You can play with different params, delay is the starting delay between each new ajax request and the large countdelay is the longer each requests hangs onto the lock each time.

      3) confirm you get better overall page loads than before the patch

      4) Tune lock retry to various levels, 550 will be slower with less load on redis around roughly similar to prior to the patch. 100 is the new default. 10 will be very close to theoretical ideal with higher load on redis:

      $CFG->session_redis_acquire_lock_retry = 550;

      $CFG->session_redis_acquire_lock_retry = 100;

      $CFG->session_redis_acquire_lock_retry = 10;

        

      Some harder things to test, but nice for the real world or load test:

      a) total php fpm ttfb / total run time before and after, should be lower

      b) total concurrent php fpm process count, should be lower

      c) total hits to redis checking for locks, will be higher ideally but not substantially

       

       

       

      Show
      This is something that should be stress testing in a variety of ways and compared before and after. Real world results will depend on a ton of variables.   1) Install this testing plugin: https://github.com/brendanheywood/moodle-tool_sessionbreaker 2) Open this page with the dev tools open looking at the network tab http://moodle.local/admin/tool/sessionbreaker/index.php?delay=1&countdelay=20 You can play with different params, delay is the starting delay between each new ajax request and the large countdelay is the longer each requests hangs onto the lock each time. 3) confirm you get better overall page loads than before the patch 4) Tune lock retry to various levels, 550 will be slower with less load on redis around roughly similar to prior to the patch. 100 is the new default. 10 will be very close to theoretical ideal with higher load on redis: $CFG->session_redis_acquire_lock_retry = 550; $CFG->session_redis_acquire_lock_retry = 100; $CFG->session_redis_acquire_lock_retry = 10;    Some harder things to test, but nice for the real world or load test: a) total php fpm ttfb / total run time before and after, should be lower b) total concurrent php fpm process count, should be lower c) total hits to redis checking for locks, will be higher ideally but not substantially      
    • Affected Branches:
      MOODLE_39_STABLE
    • Fixed Branches:
      MOODLE_39_STABLE
    • Pull Master Branch:
      MDL-68577-redis-lock-tuning

      Description

      Discovered in MDL-58018

      The redis session handle polls for a lock, and it waits a random time from 100-1000ms so on average each blocked http request waits 550ms. We aim for a performance budget of say 200ms for a small ajax request, and lets assume there are 5 queued up to run so in a perfect world they run in serial right after each other and take 1000ms. 

      Lets assume call A grabs the lock first, the others are all blocks and wait for random times: B: 100, C: 400, D: 900, E: 1000

      B will still be blocked by A after 100 and poll again, lets say it gets 550 the second time.

      C will run after 400ms, but there is already a 200ms gap from when A had finished that was wasted.

      B runs again at 750 but could have started at 600, so another 150ms wasted.

      D will run at 900ms but it blocked and so goes back again (say another +550)

      E runs at 1000ms

      D runs eventually on average at 1450.

      Visually, scenario above:

      A: ==== 0 + 200 = 200
      B: ==|===========|==== 100 + 550 + 200 =  850
      C: ========|==== 400 + 200 = 600
      D: ==================|===========|==== 900 + 550 + 200 = 1650
      E: ====================|==== 1000 + 200 = 1200

      Everything is finished at 1650 vs 1000ms in the ideal case so we're 65% worse off. But also we have 200 + 850 + 600 + 1650 + 1200 = 4500 total process time.

       

      With improved lock polling

      A: ==== 0 + 200 = 200
      B: ==|==|==== 100 x 2 + 200 = 400 (100% better)
      C: ==|==|==|==|==== 100 x 4 + 200 = 600
      D: ==|==|==|==|==|==|==== 100 x 6 + 200  = 800 (100% better)
      E: ==|==|==|==|==|==|==|==|==== 100 x 7 + 200 = 1000 (10% better)

      Everything is finished at 1000ms in the ideal case so we're 65% better off total elapsed time. Total process time is 200 + 400 + 600 + 800 + 1000 = 3000, or 33% better with 1500ms saved on server processes / php fpm workers.

       

      With pure readonly and no locks

      A: ==== 0 + 200 = 200
      B: ==== 0 + 200 = 200
      C: ==== 0 + 200 = 200
      D: ==== 0 + 200 = 200
      E: ==== 0 + 200 = 200

      Total time is 200ms, total process time 1000ms so another 2000ms better that this tracker alone.

       

      Some fraction of ajax requests will never be readonly, you may genuinely need the session lock, or it's all third party code and hard to improve.

        Attachments

          Issue Links

            Activity

              People

              Assignee:
              brendanheywood Brendan Heywood
              Reporter:
              brendanheywood Brendan Heywood
              Peer reviewer:
              Nadav Kavalerchik
              Integrator:
              Andrew Nicols
              Tester:
              Andrew Nicols
              Participants:
              Component watchers:
              Adrian Greeve, Jake Dallimore, Mathew May, Mihail Geshoski, Peter Dias, Matteo Scaramuccia, Andrew Nicols, Jun Pataleta, Michael Hawkins, Shamim Rezaie, Simey Lameze
              Votes:
              4 Vote for this issue
              Watchers:
              12 Start watching this issue

                Dates

                Created:
                Updated:
                Resolved:
                Fix Release Date:
                15/Jun/20

                  Time Tracking

                  Estimated:
                  Original Estimate - Not Specified
                  Not Specified
                  Remaining:
                  Remaining Estimate - 0 minutes
                  0m
                  Logged:
                  Time Spent - 2 hours, 46 minutes
                  2h 46m