Moodle
  1. Moodle
  2. MDL-43530

Clusters with different realpath dirroot per node cause random crash

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 2.5.3
    • Fix Version/s: 2.5.4
    • Component/s: Caching
    • Labels:
    • Testing Instructions:
      Hide

      Run all PHPUnit tests.

      To confirm no regression;
      1. Login
      2. Turn on performance statistics at the bottom of the screen.
      3. Visit/create a course with at least 3 different types of module.
      4. Confirm that the cache (plugintypes) is being used with static acceleration and not experiencing large numbers of misses.
      5. Rename your moodle folder
      6. Visit the same course as above
      7. Again confirm the cache (plugintypes) is built (sets) and then used.

      Show
      Run all PHPUnit tests. To confirm no regression; 1. Login 2. Turn on performance statistics at the bottom of the screen. 3. Visit/create a course with at least 3 different types of module. 4. Confirm that the cache (plugintypes) is being used with static acceleration and not experiencing large numbers of misses. 5. Rename your moodle folder 6. Visit the same course as above 7. Again confirm the cache (plugintypes) is built (sets) and then used.
    • Affected Branches:
      MOODLE_25_STABLE
    • Fixed Branches:
      MOODLE_25_STABLE
    • Pull 2.5 Branch:
    • Rank:
      55608

      Description

      Amazon Web Services has the capability to use a technology OpsWorks.

      When you deploy PHP applications via the default OpsWorks chef script, it will deploy in a fashion similar;

      Node 1 -> /srv/www/moodle/20131231001123/
      Node 2 -> /srv/www/moodle/20131231001127/

      These folder names are based on the deployment time to that server. Which will be a few seconds per server as they are not deployed at the same instant.

      These real folders are then symlinked to current.

      All Nodes -> /srv/www/moodle/current/

      Moodles plugin cache uses the real folder name and does not expect it to be different between frontend nodes. Technically there is no server requirement for different nodes to be exactly the same as the above configuration shows.

      The crashes that happen are;

      Dec 24 11:06:11 www3 logger: [Tue Dec 24 11:06:11 2013] [error] [client 182.255.102.140] Default exception handler: Coding error detected, it must be fixed by a programmer: 
      Request for an unknown renderer class block_settings_renderer 
      Debug: \n
      Error code: codingerror\n
      * line 290 of /lib/outputfactories.php: coding_exception thrown\n
      * line 1402 of /lib/outputlib.php: call to theme_overridden_renderer_factory->get_renderer()\n
      * line 771 of /lib/pagelib.php: call to theme_config->get_renderer()\n
      * line 141 of /blocks/settings/block_settings.php: call to moodle_page->get_renderer()\n
      * line 292 of /blocks/moodleblock.class.php: call to block_settings->get_content()\n
      * line 238 of /blocks/moodleblock.class.php: call to block_base->formatted_contents()\n
      * line 951 of /lib/blocklib.php: call to block_base->get_content_for_output()\n
      * line 1003 of /lib/blocklib.php: call to block_manager->create_block_contents()\n
      * line 353 of /lib/blocklib.php: call to block_manager->ensure_content_created()\n
      * line 31 of /theme/vet/layout/header.php: call to b
      

      And only happen intermittently. It is either when you change nodes or when you hit a race condition with multiple accesses from different nodes. Even though I cannot prove which it is, the race condition appears to be the issue as the frequency we are seeing the error does not match up with the amount it should happen if every request was failing.

        Activity

        Hide
        Russell Smith added a comment - - edited

        I'm away from my setup to publish this easily. Here is my proposed patch in a text blob;

        diff --git a/lib/moodlelib.php b/lib/moodlelib.php
        index c71d5c2..66d0d66 100644
        --- a/lib/moodlelib.php
        +++ b/lib/moodlelib.php
        @@ -8261,14 +8261,9 @@ function get_plugin_types($fullpaths=true) {
             $cache = cache::make('core', 'plugintypes');
         
             if ($fullpaths) {
        -        // First confirm that dirroot and the stored dirroot match.
        -        if ($CFG->dirroot === $cache->get('dirroot')) {
        -            // They match we can use it.
        -            $cached = $cache->get(1);
        -        } else {
        -            // Oops they didn't match. The moodle directory has been moved on us.
        -            $cached = false;
        -        }
        +        // Cache each dirroot separately in case cluster nodes happen to be deployed to
        +        // different locations.
        +        $cached = $cache->get(sha1($CFG->dirroot));
             } else {
                 $cached = $cache->get(0);
             }
        @@ -8327,10 +8322,7 @@ function get_plugin_types($fullpaths=true) {
                 }
         
                 $cache->set(0, $info);
        -        $cache->set(1, $fullinfo);
        -        // We cache the dirroot as well so that we can compare it when we
        -        // retrieve full info from the cache.
        -        $cache->set('dirroot', $CFG->dirroot);
        +        $cache->set(sha1($CFG->dirroot), $fullinfo);
         
                 return ($fullpaths ? $fullinfo : $info);
             }
        

        I will publish the patch in a branch if it's likely this change will not be rejected.

        Show
        Russell Smith added a comment - - edited I'm away from my setup to publish this easily. Here is my proposed patch in a text blob; diff --git a/lib/moodlelib.php b/lib/moodlelib.php index c71d5c2..66d0d66 100644 --- a/lib/moodlelib.php +++ b/lib/moodlelib.php @@ -8261,14 +8261,9 @@ function get_plugin_types($fullpaths= true ) { $cache = cache::make('core', 'plugintypes'); if ($fullpaths) { - // First confirm that dirroot and the stored dirroot match. - if ($CFG->dirroot === $cache->get('dirroot')) { - // They match we can use it. - $cached = $cache->get(1); - } else { - // Oops they didn't match. The moodle directory has been moved on us. - $cached = false ; - } + // Cache each dirroot separately in case cluster nodes happen to be deployed to + // different locations. + $cached = $cache->get(sha1($CFG->dirroot)); } else { $cached = $cache->get(0); } @@ -8327,10 +8322,7 @@ function get_plugin_types($fullpaths= true ) { } $cache->set(0, $info); - $cache->set(1, $fullinfo); - // We cache the dirroot as well so that we can compare it when we - // retrieve full info from the cache. - $cache->set('dirroot', $CFG->dirroot); + $cache->set(sha1($CFG->dirroot), $fullinfo); return ($fullpaths ? $fullinfo : $info); } I will publish the patch in a branch if it's likely this change will not be rejected.
        Hide
        Petr Škoda added a comment - - edited

        edit: I thought this is from 2.6 and I forgot how it worked in 2.5, looks like this should be fine for stable when using sha1()

        Show
        Petr Škoda added a comment - - edited edit: I thought this is from 2.6 and I forgot how it worked in 2.5, looks like this should be fine for stable when using sha1()
        Hide
        Russell Smith added a comment -

        2.6 rewrote this code, so this is a 2.5 only fix.

        Show
        Russell Smith added a comment - 2.6 rewrote this code, so this is a 2.5 only fix.
        Hide
        Petr Škoda added a comment -

        +1, thanks a lot for the report, explanation and patch

        Show
        Petr Škoda added a comment - +1, thanks a lot for the report, explanation and patch
        Hide
        Sam Hemelryk added a comment -

        Thanks Russell, this has been integrated now.

        Show
        Sam Hemelryk added a comment - Thanks Russell, this has been integrated now.
        Hide
        Andrew Davis added a comment -

        I believe this is working as it should. Passing.

        Show
        Andrew Davis added a comment - I believe this is working as it should. Passing.
        Hide
        Damyon Wiese added a comment -

        David built a framework for behat
        At first just to test this and that
        10000+ steps written
        Sounds like we're all smitten
        And David should be smiling at that

        Thanks for reporting, patching, and testing this issue. It has been released upstream along with 64 others today.

        Show
        Damyon Wiese added a comment - David built a framework for behat At first just to test this and that 10000+ steps written Sounds like we're all smitten And David should be smiling at that Thanks for reporting, patching, and testing this issue. It has been released upstream along with 64 others today.

          People

          • Votes:
            1 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved: