Moodle
  MDL-18768

Moodle has memory issues with large sites, e.g. in role_assign

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.9.4
    • Fix Version/s: 1.9.5
    • Component/s: Libraries
    • Labels:
      None
    • Affected Branches:
      MOODLE_19_STABLE
    • Fixed Branches:
      MOODLE_19_STABLE
    • Rank:
      31787

      Description

      If you role_assign somebody to the teacher role at site level on a large site (~5k courses, ~110k contexts), this runs out of memory.

      There are three causes:

      1) In forum_role_assign, it does get_records('course'). This should restrict to required fields (only id!) and should be a recordset anyway. Easily changed.

      but more seriously

      2) The get_fast_modinfo data is cached. This cache gradually expands to contain the complete modinfo for all courses.

      and

      3) get_fast_modinfo also causes contexts to be cached. This cache gradually expands to contain all contexts for courses, modules, and blocks (ie virtually the whole context table).

      Petr suggested that the best approach would be to limit the size of these caches, discarding old records. I have implemented this. The changes appear to work (the memory issues go away) on our test system.

      Could somebody review this patch please as it's for 1.9 and is not trivial?

      1. memoryissues.patch
        6 kB
        Sam Marshall
      2. memoryissues2.patch
        6 kB
        Sam Marshall

        Issue Links

          Activity

          Sam Marshall created issue -
          Sam Marshall added a comment -

          Note the reason for fixing it at this level is that this should avoid any similar memory issues in other areas of code that do similar operations on large numbers of courses.

          Martin Dougiamas added a comment -

          Adding Petr and Eloy for review.

          Petr Škoda added a comment -

          I think it might be better to do the check right before inserting into $context_cache and $context_cache_id: if the cache is over the maximum and the item is not already present, then shift() the oldest entries out of those arrays.
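A minimal sketch of what this suggestion could look like (hypothetical code, not the actual patch; the names $context_cache_id and MAX_CONTEXT_CACHE_SIZE follow this discussion):

```php
<?php
// Hypothetical sketch of the eviction-before-insert idea discussed here.
define('MAX_CONTEXT_CACHE_SIZE', 5000);
$context_cache_id = array();

function cache_context($context) {
    global $context_cache_id;
    if (isset($context_cache_id[$context->id])) {
        return; // already cached: no insert, so no eviction needed
    }
    while (count($context_cache_id) >= MAX_CONTEXT_CACHE_SIZE) {
        // Evict the oldest entry. unset() preserves the remaining keys,
        // unlike array_shift(), which renumbers numeric keys.
        reset($context_cache_id);
        unset($context_cache_id[key($context_cache_id)]);
    }
    $context_cache_id[$context->id] = $context;
}
```

With this shape, inserting a 5,001st distinct context evicts only the oldest entry, and the remaining entries keep their original ids as keys.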

          Sam Marshall added a comment -

          Thanks for looking at it.

          About doing the check before inserting, instead of afterward:

          Then I'd have to duplicate the code 4 times rather than keeping it in one function, which seems like a really bad idea.

          Or are you suggesting I make a function for adding into context cache (which includes the size check), instead of this current function for clearing it up? That doesn't seem like a bad idea, and reduces duplication, I can do that if you want.

          (It would be slightly slower due to doing the check and function call on every insert instead of at the end after a loop of inserts, but that's only CPU time.)

          Is that what you'd prefer?

          Sam Marshall added a comment -

          Here's a revised patch that uses array_shift and makes the suggested change about checking on each cache, and moving cache into a function.

          Sam Marshall made changes -
          Attachment: memoryissues2.patch [ 16757 ]
          Sam Marshall added a comment -

          If I get an OK on the revised patch, I'll work one up for Moodle 2 as well and check 'em both in. Thanks.

          Petr Škoda added a comment -

          nice, +3!

          Sam Marshall added a comment -

          OK, fixed. There were fewer changes in 2.0 because somebody had already made a cache_context function (duh, I should've looked at that first, but we used the same name anyway so it's the same).

          Sam Marshall made changes -
          Status: Open [ 1 ] -> Resolved [ 5 ]
          Resolution: Fixed [ 1 ]
          Colin Campbell added a comment -

          I am running into issues with this fix. The array_shift function renumbers numeric indexes. Consequently, as soon as we reach MAX_MODINFO_CACHE_SIZE in get_fast_modinfo, array_shift converts the cache keys from the course ids originally used to index the cache to sequential integers starting with zero. This causes all sorts of bad things to happen.
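The renumbering Colin describes is easy to reproduce in isolation. This standalone demo (with made-up keys standing in for course ids) shows PHP's array_shift() discarding the numeric keys:

```php
<?php
// Minimal demo of the array_shift() pitfall: numeric keys (here standing
// in for course ids) are renumbered sequentially from zero.
$cache = array(
    42  => 'modinfo for course 42',
    97  => 'modinfo for course 97',
    310 => 'modinfo for course 310',
);
array_shift($cache);               // drops the first element...
var_export(array_keys($cache));    // ...and renumbers the rest to 0, 1
// A lookup like $cache[97] now fails even though course 97's data is
// still in the array, just under the wrong key.
```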

          Petr Škoda made changes -
          Link This issue has been marked as being related by MDL-19288 [ MDL-19288 ]
          Petr Škoda added a comment - - edited

          I have filed a new issue; going to fix it ASAP.

          Petr Škoda added a comment -

          Fixed in CVS, please test.
          Thanks a lot for the report!

          Sam Marshall added a comment -

          Thanks Petr. We have been testing the previous code for some weeks and haven't noticed anything - oops!

          I applied these patches which look correct to me. We will be going live with a new version in a week's time so hopefully then we really will notice if it's still broken...

          cameron added a comment -

          Just curious to know how it was decided that MAX_CONTEXT_CACHE_SIZE should be 5000? I haven't looked deeply into it, but when I ran kcachegrind while assigning a role, it said the script spent approx. 75% of the time inside the cache_context function.

          Sam Marshall added a comment - - edited

          I don't remember who picked it, but basically 5,000 = big number but won't take up all your memory.

          It has to be bigger than the expected number of activities + blocks within any single course, because these are cached usually when you look at course pages. We have one course - a slightly silly special-case course - with about 1,500 contexts, I figured it's not likely anyone would go very much bigger.

          Basically, we want to cache all the contexts retrieved during normal use - but not cache all contexts, in case cron, or whatever admin task, does something that ends up iterating through many of the courses in the system, retrieving all their contexts.

          I'm not really familiar with profiling PHP, but insofar as I know anything about it at all, I generally assume PHP profilers are wrong... [and there is also a question as to whether you're running a PHP accelerator, which is necessary for good performance].

          (so if measured performance is poor, maybe somebody else should take a look at profiler results.)

          Jens Eremie made changes -
          Link This issue has a non-specific relationship to MDL-19702 [ MDL-19702 ]
          Jens Eremie added a comment -

          cameron's observation is correct:
          "[....] when I ran kcachegrind when assigning a role it said the script spent approx. 75% of the time inside the cache_context function [....]"

          When the cache reaches the MAX_CONTEXT_CACHE_SIZE limit of 5000, a single tuple is removed from the cache array, and this may happen 500, 1000... 3000 times, which is far too slow.

          Please also have a look at http://tracker.moodle.org/browse/MDL-19702

          Jens

          Sam Marshall added a comment -

          Why do you think removing a single tuple from the cache array is time-consuming? Here is a test case about removing a single thing from an array 10,000 times:

          require_once('config.php');
          
          $a = array();
          for($i = 0; $i < 10000; $i++) {
              $a[] = (object)array('hello'=>$i, 'test', 'whatever');
          }
          
          $before = microtime(true);
          for($i = 0; $i < 10000; $i++) {
              $count = count($a);
              if($count > 0) {
                  unset($a[$i]);
              }
          }
          $after = microtime(true);
          print_object($after-$before);
          exit;
          

          Between the 'before' and 'after' it counts the items in the array, then unsets one thing from it, 10,000 times. This takes, on our development server, 0.014 seconds. While that's a non-negligible time, it hardly makes much difference to the average Moodle request, let alone something that's churning through fifteen thousand contexts.

          If necessary, you might like to try this test case on your system to see if there is some weird setup that might make it slow? Otherwise it is more complicated than 'counting an array and removing things is slow', because it ain't - at most, maybe counting and removing things from that particular array is slow...

          Eloy Lafuente (stronk7) added a comment -

          I'm with Sam that counting shouldn't be slow at all. In any case... I agree that we are using that cache really badly - sometimes using cache_context(), other times adding directly to the cache global object... it should be reworked into a proper cache, always accessed through its API: add/clean... whatever.

          In fact, due to the "repeated" nature of contexts... one easy improvement would be to check in cache_context() whether the element already exists or not, before any further processing. Seems obvious.

          Also... note that xdebug, for example, tends to slow things down A LOT. Not sure if this is the case, but I've seen big slowdowns with xdebug running. And really, count() could be performed only once (outside the loop) and the result then decreased as we delete elements.

          Ciao

          Jens Eremie added a comment -

          Both of you are right... counting usually costs about... nothing.
          But in fact, the code in cache_context() is:

          function cache_context() {
              ...
              while (count($context_cache_id) >= MAX_CONTEXT_CACHE_SIZE) {
                  $first = reset($context_cache_id);
                  unset($context_cache_id[$first->id]);
                  unset($context_cache[$first->contextlevel][$first->instanceid]);
              }
              ...
          }

          1. Call cache_context().
          2. Call count($context_cache_id) -> count the array.
          3. If |$context_cache_id| >= MAX_CONTEXT_CACHE_SIZE, remove one single tuple.
          4. Go back to 1.

          How often is count run? For tuples 1-5000:
          1st tuple = count(1) = 1
          2nd tuple = count(2) = 2 // 1 + 2
          3rd tuple = count(3) = 3 // 1 + 2 + 3 ...
          ......
          5000th tuple ...
          = the sum from one to 5000 = (n^2+n)/2 = (5000^2+5000)/2 = 12,502,500

          Now let's assume we have 7000 tuples:
          We already have 5000 from above, 2000 remain, and |array| = 5000 // constant, because one single tuple is deleted when the array is "full" (5000)
          = 2000*5000 counts = 10,000,000

          Total: 12,502,500 + 10,000,000 = 22,202,500 tuple counts are done on the array.

          Now, simply counting how many sets/unsets are done on the array is definitely less work, am I wrong?

          That's why the whole thing is so expensive... in this particular case.
          Please also have a closer look at MDL-19702 for how and why I did it.

          Jens
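As a sanity check on the arithmetic above, under Jens's assumption that count() touches every element (i.e. costs O(n) per call), the same figures can be computed directly:

```php
<?php
// Reproduces the tally above, assuming counting an array of size n
// costs n element reads (Jens's model; PHP's actual count() may differ).
$fill_phase = (5000 * 5000 + 5000) / 2; // sizes 1..5000 while the cache fills
$full_phase = 2000 * 5000;              // 2000 more inserts at constant size 5000
echo $fill_phase, "\n";                 // 12502500
echo $full_phase, "\n";                 // 10000000
echo $fill_phase + $full_phase, "\n";   // 22202500
```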

          Sam Marshall added a comment -

          Sorry, I must be missing something. I don't follow your logic.

          It seems to me that for tuples 1-5000, 'count' will be called precisely once. Say it's tuple 37 for instance:

          while (count($context_cache_id) >= MAX_CONTEXT_CACHE_SIZE) {

          it will work out count (once!), it is 37, that is less than max context cache, so it does not enter the loop. End, no more counts. Count ran 1 time.

          OK now say it's tuple 6,400:

          while (count($context_cache_id) >= MAX_CONTEXT_CACHE_SIZE) {

          count($context_cache_id) will be 5,000, so the condition is true and it goes around the loop, removing one item. Next time count($context_cache_id) will be 4,999 so the condition is false and it stops. In other words count then runs 2 times.

          So the total counts =

          1 * (tuples 1-5000) + 2 * (tuples 5000+)

          e.g. for 7,000 tuples there will be 5,000 + 2,000 * 2 = 9,000 calls to count.

          Or let's say you bypassed the system and somehow made the cache larger than MAX_CONTEXT_CACHE_SIZE, so that (before calling this with entry 6,400) it begins with size 6,399 in there. In that case it will call count 1,400 times. However, the NEXT time you call cache_context, it only has to do it twice again - in fact this results in fewer calls to count than if you did it one at a time.

          At least that is how the code is supposed to work. Is it for some reason not working that way due to a bug I haven't spotted? For example, if you make a static $countcount and do $countcount++ each time around that loop, does it end up with these really big numbers like 22 million loops for the 7,000 added tuples?
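The behaviour Sam describes can be simulated directly. This hypothetical simulation tallies how many times the while-loop condition (and hence count()) runs for 7,000 inserts, assuming one count() call per condition test:

```php
<?php
// Simulates the intended eviction loop: the while condition calls
// count() once per test, and evicts at most one entry per pass.
$MAX = 5000;
$size = 0;          // simulated cache size
$count_calls = 0;   // how many times count() would run

for ($i = 0; $i < 7000; $i++) {
    while (true) {
        $count_calls++;      // the loop condition calls count() once
        if ($size < $MAX) {
            break;           // condition false: leave the loop
        }
        $size--;             // evict one entry
    }
    $size++;                 // insert the new entry
}
echo $count_calls, "\n";     // 9000, i.e. 5000*1 + 2000*2
```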

          Jens Eremie added a comment -

          OK, I took Sam's code and changed it to Moodle's way, as it is also done in cache_context() and get_fast_modinfo().

          Basically I just moved the "caching" inside a function call.
          If you want an estimate of the counted tuples, just uncomment the appropriate lines... I hope this helps to clear up the above calculation...

          <?php
          require_once('config.php');

          $cache = array();
          $tuples_counted = 0;

          function cache($to_cache) {
              global $cache, $tuples_counted;

              // $count = count($cache);
              // $tuples_counted += $count;

              while (count($cache) > 5000) {
                  $delete_me = reset($cache);
                  unset($cache[$delete_me->my_id_is]);
              }

              $cache[$to_cache->my_id_is] = $to_cache;
          }

          $before = microtime(true);
          for ($i = 0; $i < 7000; $i++) {
              $to_cache = (object)array('my_id_is'=>$i, 'test', 'whatever');
              cache($to_cache);
          }
          $after = microtime(true);
          print_object($after-$before);
          // echo 'tuples counted: '.$tuples_counted;

          exit;
          ?>
          

          microtime = approx. 3.40

          Now I just got rid of count($cache):

          <?php
          require_once('config.php');

          $cache = array();

          function cache($to_cache) {
              global $cache;
              static $cache_counter = 0;

              while ($cache_counter > 5000) {
                  $delete_me = reset($cache);
                  unset($cache[$delete_me->my_id_is]);
                  $cache_counter--;
              }

              $cache[$to_cache->my_id_is] = $to_cache;
              $cache_counter++;
          }

          $before = microtime(true);
          for ($i = 0; $i < 7000; $i++) {
              $to_cache = (object)array('my_id_is'=>$i, 'test', 'whatever');
              cache($to_cache);
          }
          $after = microtime(true);
          print_object($after-$before);

          exit;
          ?>

          microtime = approx. 0.021

          This led me to the point that there must be something wrong with using count() inside nested function calls.
          Maybe, doing it Sam's way, PHP optimizations come into effect - loop unrolling, or somesuch - which makes it pretty fast...
          But inside the nesting... no effect, even worse... In this case it's cheaper to count "by hand" and take care that there is no write access to the cache array from outside cache_context().

          And for 10,000+ tuples there is a performance plus from simply removing bigger chunks...

          <?php
          require_once('config.php');

          $cache = array();

          function cache($to_cache) {
              global $cache;
              static $cache_counter = 0;

              if ($cache_counter > 5000) {
                  while ($cache_counter > 4000) { // 80% cache fill-factor
                      $delete_me = reset($cache);
                      unset($cache[$delete_me->my_id_is]);
                      $cache_counter--;
                  }
              }

              $cache[$to_cache->my_id_is] = $to_cache;
              $cache_counter++;
          }

          $before = microtime(true);
          for ($i = 0; $i < 7000; $i++) {
              $to_cache = (object)array('my_id_is'=>$i, 'test', 'whatever');
              cache($to_cache);
          }
          $after = microtime(true);
          print_object($after-$before);

          exit;
          ?>
          Sam Marshall added a comment -

          Ah, I guess your calculations were assuming that the time it takes to count an array is proportional to array size? OK, that makes sense, if it's true. (If it were me, I'd make sure count was O(1), but...)

          I saw the following times (with a few reloads to ensure consistency):

          First code 0.023

          Second code 0.016

          Third code 0.018

          So it is faster, for sure, but not all that much of a difference. Also, the third (chunk) version is fractionally slower again.

          Are you perhaps not running a PHP accelerator? I think we are running one. A quick look at our phpinfo showed Zend Optimizer, which might be one, I guess, who knows.

          With the third, I really don't understand why removing bigger chunks would help in this case. Are you seeing significant differences there? You didn't quote the time. As noted, it was fractionally slower on my setup. Of course there are some silly optimisations possible there... And it might make things faster in other areas if the array hash lookups are faster with fewer items, I guess, with average size being 4,500 instead of 5,000...

          Basically I think maybe we could make it count this separately instead of calling count() each time, although it does seem a bit like micro-optimising which I'm not normally in favour of - but surely there must be something wrong with your system as well, if count is causing such a problem?

          Jens Eremie added a comment -

          Sam, yes: because of the significantly longer execution times on my machines, I assumed that counting an array must be proportional to the array size... but you're right, it should be O(1). Does anybody know how PHP handles this internally?

          I did some further measurements of the first sample on different hardware and OS platforms, with and without eAccelerator:

          Dell server (dual Xeon, Ubuntu, standard Apache + PHP 32-bit):
              microtime = approx. 9.16 (eAccelerator on)
              microtime = approx. 10.32 (eAccelerator off)
          Apple Xserve (dual G5, Mac OS X 10.5 Server, Apache + PHP 32-bit):
              microtime = approx. 20.71 (eAccelerator on/off)
          Apple MacBook Pro (Core 2 Duo, Mac OS X 10.5, Apache + PHP 64-bit):
              microtime = approx. 3.71 (eAccelerator on/off)
          Zend Server 4.0.4 without debugger:
              microtime = approx. 4.32

          The results speak for themselves... and the use of eAccelerator doesn't seem to matter at all. Maybe you could test the first sample without the Zend Optimizer? That is a commercial product, isn't it?

          For the third sample:
          Removing bigger chunks of the array (e.g. 500 tuples at once) is faster, since we don't know which tuples will be read next anyway... MAX_CONTEXT_CACHE_SIZE could also be reduced to about 2500 (+/- 500), because above 2500 there is no performance gain and the cache gets flushed several times anyway... so we save some "check and delete" time, since we always have space for another 500 tuples (please have a look at MDL-19702).
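          The chunk-eviction idea could be sketched roughly as follows. This is only an illustration of the technique with hypothetical names and constants (the real patch is attached to MDL-19702): when the cache grows past the cap, the oldest block of entries is dropped in one go, so the eviction work runs only once per chunk of inserts.

```php
<?php
// Hypothetical sketch: evict a whole chunk of old entries at once
// instead of one entry per insert. Constants mirror the values
// discussed in this thread but are illustrative only.
define('CACHE_MAX', 2500);   // suggested reduced cap
define('CACHE_CHUNK', 500);  // entries discarded per eviction

function cache_put_chunked(array &$cache, $key, $value) {
    $cache[$key] = $value;
    if (count($cache) > CACHE_MAX) {
        // Drop the oldest CACHE_CHUNK entries (PHP arrays preserve
        // insertion order); preserve_keys = true keeps the original keys.
        $cache = array_slice($cache, CACHE_CHUNK, null, true);
    }
}
```

          The trade-off is a slightly lower hit rate right after an eviction (recently evicted entries may need to be re-fetched), in exchange for performing the eviction only once per 500 inserts instead of on every insert once the cache is full.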

          If the whole thing were just a kind of "micro-optimisation", I agree it wouldn't be worth thinking about any further... but the problem seems to be real...

          If you're thinking about giving this approach a try, the work is already done -> MDL-19702

          Martin Dougiamas made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          Martin Dougiamas made changes -
          Workflow jira [ 31358 ] MDL Workflow [ 62242 ]
          Martin Dougiamas made changes -
          Workflow MDL Workflow [ 62242 ] MDL Full Workflow [ 91446 ]

            People

            • Votes: 0
              Watchers: 7