Moodle
  1. Moodle
  2. MDL-43530

Clusters with different realpath dirroot per node cause random crash

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 2.5.3
    • Fix Version/s: 2.5.4
    • Component/s: Caching
    • Labels:
    • Testing Instructions:
      Hide

      Run all PHPUnit tests.

      To confirm no regression;
      1. Login
      2. Turn on performance statistics at the bottom of the screen.
      3. Visit/create a course with at least 3 different types of module.
      4. Confirm that the cache (plugintypes) is being used with static acceleration and not experiencing large numbers of misses.
      5. Rename your moodle folder
      6. Visit the same course as above
      7. Again confirm the cache (plugintypes) is built (sets) and then used.

      Show
      Run all PHPUnit tests. To confirm no regression; 1. Login 2. Turn on performance statistics at the bottom of the screen. 3. Visit/create a course with at least 3 different types of module. 4. Confirm that the cache (plugintypes) is being used with static acceleration and not experiencing large numbers of misses. 5. Rename your moodle folder 6. Visit the same course as above 7. Again confirm the cache (plugintypes) is built (sets) and then used.
    • Affected Branches:
      MOODLE_25_STABLE
    • Fixed Branches:
      MOODLE_25_STABLE

      Description

      Amazon Web Services has the capability to use a technology OpsWorks.

      When you deploy PHP applications via the default OpsWorks chef script, it will deploy in a fashion similar;

      Node 1 -> /srv/www/moodle/20131231001123/
      Node 2 -> /srv/www/moodle/20131231001127/

      These folder names are based on the deployment time to that server. Which will be a few seconds per server as they are not deployed at the same instant.

      These real folders are then symlinked to current.

      All Nodes -> /srv/www/moodle/current/

      Moodles plugin cache uses the real folder name and does not expect it to be different between frontend nodes. Technically there is no server requirement for different nodes to be exactly the same as the above configuration shows.

      The crashes that happen are;

      Dec 24 11:06:11 www3 logger: [Tue Dec 24 11:06:11 2013] [error] [client 182.255.102.140] Default exception handler: Coding error detected, it must be fixed by a programmer: 
      Request for an unknown renderer class block_settings_renderer 
      Debug: \n
      Error code: codingerror\n
      * line 290 of /lib/outputfactories.php: coding_exception thrown\n
      * line 1402 of /lib/outputlib.php: call to theme_overridden_renderer_factory->get_renderer()\n
      * line 771 of /lib/pagelib.php: call to theme_config->get_renderer()\n
      * line 141 of /blocks/settings/block_settings.php: call to moodle_page->get_renderer()\n
      * line 292 of /blocks/moodleblock.class.php: call to block_settings->get_content()\n
      * line 238 of /blocks/moodleblock.class.php: call to block_base->formatted_contents()\n
      * line 951 of /lib/blocklib.php: call to block_base->get_content_for_output()\n
      * line 1003 of /lib/blocklib.php: call to block_manager->create_block_contents()\n
      * line 353 of /lib/blocklib.php: call to block_manager->ensure_content_created()\n
      * line 31 of /theme/vet/layout/header.php: call to b
      

      And only happen intermittently. It is either when you change nodes or when you hit a race condition with multiple accesses from different nodes. Even though I cannot prove which it is, the race condition appears to be the issue as the frequency we are seeing the error does not match up with the amount it should happen if every request was failing.

        Gliffy Diagrams

          Activity

          Hide
          Russell Smith added a comment - - edited

          I'm away from my setup to publish this easily. Here is my proposed patch in a text blob;

          diff --git a/lib/moodlelib.php b/lib/moodlelib.php
          index c71d5c2..66d0d66 100644
          --- a/lib/moodlelib.php
          +++ b/lib/moodlelib.php
          @@ -8261,14 +8261,9 @@ function get_plugin_types($fullpaths=true) {
               $cache = cache::make('core', 'plugintypes');
           
               if ($fullpaths) {
          -        // First confirm that dirroot and the stored dirroot match.
          -        if ($CFG->dirroot === $cache->get('dirroot')) {
          -            // They match we can use it.
          -            $cached = $cache->get(1);
          -        } else {
          -            // Oops they didn't match. The moodle directory has been moved on us.
          -            $cached = false;
          -        }
          +        // Cache each dirroot separately in case cluster nodes happen to be deployed to
          +        // different locations.
          +        $cached = $cache->get(sha1($CFG->dirroot));
               } else {
                   $cached = $cache->get(0);
               }
          @@ -8327,10 +8322,7 @@ function get_plugin_types($fullpaths=true) {
                   }
           
                   $cache->set(0, $info);
          -        $cache->set(1, $fullinfo);
          -        // We cache the dirroot as well so that we can compare it when we
          -        // retrieve full info from the cache.
          -        $cache->set('dirroot', $CFG->dirroot);
          +        $cache->set(sha1($CFG->dirroot), $fullinfo);
           
                   return ($fullpaths ? $fullinfo : $info);
               }
          

          I will publish the patch in a branch if it's likely this change will not be rejected.

          Show
          Russell Smith added a comment - - edited I'm away from my setup to publish this easily. Here is my proposed patch in a text blob; diff --git a/lib/moodlelib.php b/lib/moodlelib.php index c71d5c2..66d0d66 100644 --- a/lib/moodlelib.php +++ b/lib/moodlelib.php @@ -8261,14 +8261,9 @@ function get_plugin_types($fullpaths=true) { $cache = cache::make('core', 'plugintypes'); if ($fullpaths) { - // First confirm that dirroot and the stored dirroot match. - if ($CFG->dirroot === $cache->get('dirroot')) { - // They match we can use it. - $cached = $cache->get(1); - } else { - // Oops they didn't match. The moodle directory has been moved on us. - $cached = false; - } + // Cache each dirroot separately in case cluster nodes happen to be deployed to + // different locations. + $cached = $cache->get(sha1($CFG->dirroot)); } else { $cached = $cache->get(0); } @@ -8327,10 +8322,7 @@ function get_plugin_types($fullpaths=true) { } $cache->set(0, $info); - $cache->set(1, $fullinfo); - // We cache the dirroot as well so that we can compare it when we - // retrieve full info from the cache. - $cache->set('dirroot', $CFG->dirroot); + $cache->set(sha1($CFG->dirroot), $fullinfo); return ($fullpaths ? $fullinfo : $info); } I will publish the patch in a branch if it's likely this change will not be rejected.
          Hide
          Petr Skoda added a comment - - edited

          edit: I thought this is from 2.6 and I forgot how it worked in 2.5, looks like this should be fine for stable when using sha1()

          Show
          Petr Skoda added a comment - - edited edit: I thought this is from 2.6 and I forgot how it worked in 2.5, looks like this should be fine for stable when using sha1()
          Hide
          Russell Smith added a comment -

          2.6 rewrote this code, so this is a 2.5 only fix.

          Show
          Russell Smith added a comment - 2.6 rewrote this code, so this is a 2.5 only fix.
          Hide
          Petr Skoda added a comment -

          +1, thanks a lot for the report, explanation and patch

          Show
          Petr Skoda added a comment - +1, thanks a lot for the report, explanation and patch
          Hide
          Sam Hemelryk added a comment -

          Thanks Russell, this has been integrated now.

          Show
          Sam Hemelryk added a comment - Thanks Russell, this has been integrated now.
          Hide
          Andrew Davis added a comment -

          I believe this is working as it should. Passing.

          Show
          Andrew Davis added a comment - I believe this is working as it should. Passing.
          Hide
          Damyon Wiese added a comment -

          David built a framework for behat
          At first just to test this and that
          10000+ steps written
          Sounds like we're all smitten
          And David should be smiling at that

          Thanks for reporting, patching, and testing this issue. It has been released upstream along with 64 others today.

          Show
          Damyon Wiese added a comment - David built a framework for behat At first just to test this and that 10000+ steps written Sounds like we're all smitten And David should be smiling at that Thanks for reporting, patching, and testing this issue. It has been released upstream along with 64 others today.

            People

            • Votes:
              1 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: