Uploaded image for project: 'Moodle'
  1. Moodle
  2. MDL-40947

Segmentation fault caused by get_fast_modinfo()

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.5.1
    • Fix Version/s: 2.4.6, 2.5.2
    • Component/s: Course
    • Labels:
    • Testing Instructions:
      Hide

      1. Go to /my page, having logged in as the user who enrolled in more than 80 courses.
      2. Make sure there are no errors on the page.
      3. Reverse patch for MDL-34785 and do the same (those two issues address different problems but both affect /my/ page)

      Show
      1. Go to /my page, having logged in as the user who enrolled in more than 80 courses. 2. Make sure there are no errors on the page. 3. Reverse patch for MDL-34785 and do the same (those two issues address different problems but both affect /my/ page)
    • Affected Branches:
      MOODLE_25_STABLE
    • Fixed Branches:
      MOODLE_24_STABLE, MOODLE_25_STABLE
    • Pull Master Branch:
      MDL-40947-master

      Description

      Some users reported 502 error on attempt to open /my page in Moodle. Apache error log only indicated:

      [Tue Jul 30 10:33:43 2013] [notice] child pid 8183 exit signal Segmentation fault (11)
      

      which was not useful enough. Seems like apache process just dies without reporting anything else in the log. The closer look indicated that problem appear with users who is enrolled to too many courses (above 80 in our case).

      The next logical step was to make Apache produce a core dump for analysis of the problem. As it was impossible to replicated issue on our test system, for this purpose we removed of the frontends from the live pool used it for testing. After spending some time playing with gdb and learning how it works, I moved further in analysis and got backtrace of events from the memory dump:

      [0x7f215e3e84b8] get_fast_modinfo() /usr/share/moodle-lu/lib/modinfolib.php:1399 
      [0x7f215e3e7c90] get_fast_modinfo(object[0x7f2160df0680], NULL) /usr/share/moodle-lu/lib/datalib.php:1424 
      [0x7f215e3e7478] get_all_instances_in_courses("chat", array(132)[0x7f216147c040]) /usr/share/moodle-lu/mod/chat/lib.php:1093 
      [0x7f215e3e7290] chat_print_overview(array(132)[0x7f216147c040], array(77)[0x7f2161478a10]) /usr/share/moodle-lu/blocks/course_overview/locallib.php:35 
      [0x7f215e3e67d0] block_course_overview_get_overviews(array(132)[0x7f216147c040]) /usr/share/moodle-lu/blocks/course_overview/block_course_overview.php:68 
      [0x7f215e3e6608] block_course_overview->get_content() /usr/share/moodle-lu/blocks/moodleblock.class.php:292 
      [0x7f215e3e5c48] block_base->formatted_contents(object[0x7f2160d86be8]) /usr/share/moodle-lu/blocks/moodleblock.class.php:238 
      [0x7f215e3e5258] block_base->get_content_for_output(object[0x7f2160d86be8]) /usr/share/moodle-lu/lib/blocklib.php:951 
      [0x7f215e3e4de0] block_manager->create_block_contents(array(1)[0x7f2160ad92d8], object[0x7f2160d86be8], "content") /usr/share/moodle-lu/lib/blocklib.php:1003 
      [0x7f215e3e4c58] block_manager->ensure_content_created("content", object[0x7f2160d86be8]) /usr/share/moodle-lu/lib/blocklib.php:315 
      [0x7f215e3e4560] block_manager->get_content_for_region("content", object[0x7f2160d86be8]) /usr/share/moodle-lu/lib/outputrenderers.php:1232 
      [0x7f215e3e30a8] core_renderer->blocks_for_region("content") /usr/share/moodle-lu/my/index.php:155
      

      The top line gave a clue what causes the problem:

       if (count($cache) > MAX_MODINFO_CACHE_SIZE) {
              reset($cache);
              $key = key($cache);
              unset($cache[$key]->instances);  // line 1399
              unset($cache[$key]->cms);
              unset($cache[$key]);
          }
      

      This has been introduced in MDL-25191 and had a good reason for being there, as we know that unsetting static variables in PHP is not trivial. They sort of remain set in the memory, just a reference being removed.

      The first thought, what Andrew suggested was to introduce gc_collect_cycles(); before doing unsets. This solved the problem resulting in script execution for particular user lasts 53 seconds:

      [Tue Jul 30 12:26:50 2013] [error] [client 148.88.22.10] PERF: /my/ time: 53.323946s memory_total: 28194552B (26.9MB) memory_growth: 27846176B (26.6MB) memory_peak: 31064032B (29.6MB) includecount: 448 contextswithfilters: 0 filterscreated: 0 textsfiltered: 0 stringsfiltered: 0 langcountgetstring: 908 includedyuimodules: 0 includedjsmodules: 3 db reads/writes: 20103/0 ticks: 5333 user: 2997 sys: 78 cuser: 0 csys: 0 serverload: 0.37 Session: 62.6KB Caches used (hits/misses/sets): core/databasemeta {** static persist **(18907/15/0) cachestore_memcached(6/9/9) } core/string {** static persist **(1028/30/0) cachestore_memcached(4/26/26) } core/plugintypes {** static persist **(463/3/0) cachestore_memcached(3/0/0) } core/pluginlist {** static persist **(2/4/0) cachestore_memcached(4/0/0) } core/yuimodules {cachestore_memcached(1/0/0) } core/htmlpurifier {cachestore_memcached(3/30/30) } core/groupdata {** static persist **(6/1/0) cachestore_memcached(0/1/1) }
      

      The second approach I tried was to remove what has been added in MDL-25191, leaving just single unset (unset($cache[$key]). Surprisingly, this worked as well taking less time:

      [Tue Jul 30 12:23:24 2013] [error] [client 148.88.22.10] PERF: /my/ time: 36.408919s memory_total: 29366872B (28MB) memory_growth: 29018496B (27.7MB) memory_peak: 37594232B (35.9MB) includecount: 415 contextswithfilters: 0 filterscreated: 0 textsfiltered: 0 stringsfiltered: 0 langcountgetstring: 908 includedyuimodules: 0 includedjsmodules: 3 db reads/writes: 20093/0 ticks: 3641 user: 1380 sys: 86 cuser: 0 csys: 0 serverload: 0.30 Session: 62.6KB Caches used (hits/misses/sets): core/databasemeta {** static persist **(18903/15/0) cachestore_memcached(12/3/3) } core/string {** static persist **(1028/30/0) cachestore_memcached(28/2/2) } core/plugintypes {** static persist **(435/3/0) cachestore_memcached(3/0/0) } core/pluginlist {** static persist **(2/4/0) cachestore_memcached(4/0/0) } core/yuimodules {cachestore_memcached(1/0/0) } core/htmlpurifier {cachestore_memcached(3/30/30) } core/groupdata {** static persist **(6/1/0) cachestore_memcached(0/1/1) }
      

      This seems like a problem related to particular system and Moodle configuration, which is difficult to replicate. We realised that if we reduce MAX_MODINFO_CACHE_SIZE (we had it set to 62) the problem also disappears. Having looked further into unsetting static variable, I found a solution that does both - killing the variable in memory and not causing problems in our installation:

           if (count($cache) > MAX_MODINFO_CACHE_SIZE) {
               reset($cache);
               $key = key($cache);
      -        unset($cache[$key]->instances);
      -        unset($cache[$key]->cms);
      +        $cache[$key] = '';
               unset($cache[$key]);
           }
      

      The performance remain the same as in the previous case where MDL-25191 changes are reverted. I suggest to implement it, though, given that problem is weird and not applicable to everyone, I am not 100% sure we need to change existing code.

      Comments are welcome.

        Gliffy Diagrams

          Attachments

            Issue Links

              Activity

                People

                • Votes:
                  2 Vote for this issue
                  Watchers:
                  11 Start watching this issue

                  Dates

                  • Created:
                    Updated:
                    Resolved:
                    Fix Release Date:
                    9/Sep/13