Uploaded image for project: 'Moodle'
  1. Moodle
  2. MDL-78092

Cache: Modinfo cache locking is broken, especially with Redis store

XMLWordPrintable

    • Icon: Bug Bug
    • Resolution: Fixed
    • Icon: Minor Minor
    • 4.1.4, 4.2.1
    • 4.1.3
    • Caching
    • MOODLE_401_STABLE
    • MOODLE_401_STABLE, MOODLE_402_STABLE
    • MDL-78092-m401
    • MDL-78092-master
    • Hide
      • Prerequisite: you will need to have access to a Redis server. It can be running on your machine or elsewhere. Windows users can use Memurai if desired.
      Setup using Docker
      1. Install the PHP Redis extension (replace "X" as your server's PHP version, e.g. 8.0)

        sudo apt update && sudo apt install -y phpX-redis

        • If it does not work, try to install manually:

          cd ~/Downloads
          git clone https://www.github.com/phpredis/phpredis.git
          cd phpredis
          phpize && ./configure && make && sudo make install
          

      2. Enable the redis PHP extension.
        • If on Ubuntu, create a redis.ini file in "/etc/php/X/mods-available" with the contents "extension=redis.so". Then run

          sudo phpenmod -v X redis
          

        • Otherwise, add the line "extension=redis.so" to wherever your php.ini file is.
      3. Restart your web server.
      4. Confirm the redis PHP extension is enabled: "php -i | grep redis" - You should see a bunch of lines related to redis.
      5. Install redis. e.g. using Docker

        docker run --name redis -p 6379:6379 -d redis

      6. Setup redis as the session handler, e.g. copy the following to your config.php:

        $CFG->session_handler_class = '\core\session\redis';
        $CFG->session_redis_host = '127.0.0.1';
        $CFG->session_redis_port = 6379;
        

      Test
      1. Download attached modinfo_lock_test.php and place it in the root folder of your Moodle installation.
      2. Select a suitable course id from your test install (ideally one that has some content so it takes a second to build modinfo, but it doesn't have to be too much).
      3. Configure your MUC cache (via Site administration / Plugins / Caching / Configuration / Cache administration, /cache/admin.php) so that it has at least one Redis cache and at least one file cache:
        • By default, there is already a file cache set as 'Default application cache', so you don't need to add one of those.
        • You can add a Redis instance by clicking the Add instance link in the Redis line of the table under Installed cache stores.
        • Set the store name to whatever you like, the server name to wherever you have a Redis server installed (E.g. localhost), and remember to also set the 'key prefix' if you are using the same Redis for multiple Moodle instances.
        • Save settings
      4. Set the core/coursemodinfo cache to use the file cache first and then the Redis cache (i.e. simulating a system that is using local file cache with a shared Redis cache):
        • Under Known cache definitions. find the line with coursemodinfo (usually first) and click Edit mappings.
        • Change the Primary store to your file instance (e.g. default application) and Final store to your Redis instance (whatever you called it).
        • Save settings.
      5. Run the script like this:
        php modinfo_lock_test.php --wipelocal --wiperedis --courseid=12345 --repeats=10
        • If you get weird errors it's probably because the MUC setup isn't right - this test script is likely to fall apart horribly if you don't have the two-layer cache setup described above.
        • Based on how long it takes with the given number of repeats, you can increase the 'repeats' number - I actually used 100. However, try not to ctrl-c out of the script - the lock can get stuck (if you exit between the point where it acquires and release the lock), meaning you will need to manually purge Redis before continuing.
      6. You should see the script start to run through, repeatedly building modinfo for the course. As it is a repeated loop, it should nearly always hold the lock on rebuilding modinfo.
      7. In a second terminal window, run another copy of the same command.
        • EXPECTED: The second terminal window will get stuck waiting for the lock for a bit. Either the two processes will 'take turns' doing a batch each, or else one of them will wait until the other one completes; if this takes longer than a minute it will time out with an exception 'Unable to acquire a lock for caching'. If watching both windows you should be able to see that only one of them will be progressing at a time, indicating that locking is working.

      Note: It might seem that the result of this test (especially if it times out) is not ideal, but in practice it is not realistic for one process to repeatedly update the same modinfo. The test just demonstrates that we aren't allowing them both to do it at once any more.

      Before this fix, the second terminal window does not actually wait for the lock, and they both run at the same time, but one (usually second) throws a coding exception (usually after a few cycles): 'Coding error detected, it must be fixed by a programmer: Attempted to set cache key "12345" without a lock. Locking before writes is required for core/coursemodinfo'.

      Show
      Prerequisite: you will need to have access to a Redis server. It can be running on your machine or elsewhere. Windows users can use Memurai if desired. Setup using Docker Install the PHP Redis extension (replace " X " as your server's PHP version, e.g. 8.0) sudo apt update && sudo apt install -y phpX-redis If it does not work, try to install manually: cd ~/Downloads git clone https://www.github.com/phpredis/phpredis.git cd phpredis phpize && ./configure && make && sudo make install Enable the redis PHP extension. If on Ubuntu, create a redis.ini file in "/etc/php/X/mods-available" with the contents "extension=redis.so". Then run sudo phpenmod -v X redis Otherwise, add the line "extension=redis.so" to wherever your php.ini file is. Restart your web server. Confirm the redis PHP extension is enabled: " php -i | grep redis " - You should see a bunch of lines related to redis. Install redis. e.g. using Docker docker run --name redis -p 6379:6379 -d redis Setup redis as the session handler, e.g. copy the following to your config.php : $CFG ->session_handler_class = '\core\session\redis' ; $CFG ->session_redis_host = '127.0.0.1' ; $CFG ->session_redis_port = 6379; Test Download attached modinfo_lock_test.php and place it in the root folder of your Moodle installation. Select a suitable course id from your test install (ideally one that has some content so it takes a second to build modinfo, but it doesn't have to be too much). Configure your MUC cache (via Site administration / Plugins / Caching / Configuration / Cache administration , /cache/admin.php) so that it has at least one Redis cache and at least one file cache: By default, there is already a file cache set as 'Default application cache', so you don't need to add one of those. You can add a Redis instance by clicking the Add instance link in the Redis line of the table under Installed cache stores . Set the store name to whatever you like, the server name to wherever you have a Redis server installed (E.g. localhost), and remember to also set the 'key prefix' if you are using the same Redis for multiple Moodle instances. Save settings Set the core/coursemodinfo cache to use the file cache first and then the Redis cache (i.e. simulating a system that is using local file cache with a shared Redis cache): Under Known cache definitions . find the line with coursemodinfo (usually first) and click Edit mappings . Change the Primary store to your file instance (e.g. default application) and Final store to your Redis instance (whatever you called it). Save settings. Run the script like this: php modinfo_lock_test.php --wipelocal --wiperedis --courseid=12345 --repeats=10 If you get weird errors it's probably because the MUC setup isn't right - this test script is likely to fall apart horribly if you don't have the two-layer cache setup described above. Based on how long it takes with the given number of repeats, you can increase the 'repeats' number - I actually used 100. However, try not to ctrl-c out of the script - the lock can get stuck (if you exit between the point where it acquires and release the lock), meaning you will need to manually purge Redis before continuing. You should see the script start to run through, repeatedly building modinfo for the course. As it is a repeated loop, it should nearly always hold the lock on rebuilding modinfo. In a second terminal window, run another copy of the same command. EXPECTED: The second terminal window will get stuck waiting for the lock for a bit. Either the two processes will 'take turns' doing a batch each, or else one of them will wait until the other one completes; if this takes longer than a minute it will time out with an exception 'Unable to acquire a lock for caching'. If watching both windows you should be able to see that only one of them will be progressing at a time, indicating that locking is working. Note: It might seem that the result of this test (especially if it times out) is not ideal, but in practice it is not realistic for one process to repeatedly update the same modinfo. The test just demonstrates that we aren't allowing them both to do it at once any more. Before this fix, the second terminal window does not actually wait for the lock, and they both run at the same time, but one (usually second) throws a coding exception (usually after a few cycles): 'Coding error detected, it must be fixed by a programmer: Attempted to set cache key "12345" without a lock. Locking before writes is required for core/coursemodinfo'.

      There are two problems with cache locking (which has been around a while but not used much, and was implemented for modinfo and modified in 4.1), both related to acquire_lock function:

      1. The cachestore_redis and cachestore_file implementations of this function behave differently. cachestore_redis returns false immediately if the lock can't be obtained right now, whereas cachestore_file has a lock wait setting (60 seconds default) and will retry within that time limit and only return false if it still can't get it. (There is one other mechanism, the default one used if the store doesn't support locking, which also has a similar retry approach.)
      2. The acquire_lock function returns false if it doesn't acquire the lock, but the build_course_cache function ignores the return value and goes ahead to build the cache anyway. (Actually, other places that call this function do the same, it's not really obvious that it can return false...)

      As you can see the second problem may be more serious but both probably need fixing

      The actual consequence of this is that in a scenario with Redis cache, it is possible that requests which rebuild modinfo cache can result in errors about how the cache lock isn't acquired. (This happens when it was acquired, but because locking doesn't work, the system then lets somebody else acquire it before the first request gets around to actually using it to set the data.)

      On our test server (thankfully, we are not live with 4.1 yet) we see this in the system error logs as occasional, sporadic occurrences of the 'Locking before writes is required for core/coursemodinfo' error message.

      This patch currently makes the following changes:

      • The cachestore_redis lock implementation is changed so that it retries the lock for a given 'wait' period (default 1 minute). Once acquired, locks also time out (default 10 minutes), in case a process crashes without releasing a lock.
      • The code in modinfolib is changed to actually check that it acquired the lock, and if not, throw an exception.

        1. modinfo_lock_test.php
          3 kB
        2. redis-401.webm
          5.69 MB
        3. redis-402.webm
          8.96 MB
        4. redis-master.webm
          8.64 MB

            quen Sam Marshall
            quen Sam Marshall
            Mark Johnson Mark Johnson
            Jun Pataleta Jun Pataleta
            Ilya Tregubov Ilya Tregubov
            Votes:
            1 Vote for this issue
            Watchers:
            13 Start watching this issue

              Created:
              Updated:
              Resolved:

                Estimated:
                Original Estimate - Not Specified
                Not Specified
                Remaining:
                Remaining Estimate - 0 minutes
                0m
                Logged:
                Time Spent - 3 hours, 45 minutes
                3h 45m

                  Error rendering 'clockify-timesheets-time-tracking-reports:timer-sidebar'. Please contact your Jira administrators.