Moodle
  1. Moodle
  2. MDL-30761

Performance regression: slow login because get_user_access_sitewide takes way too long since moodle 2.2

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: 2.2, 2.3
    • Fix Version/s: 2.2.1
    • Component/s: Libraries, Performance
    • Labels:
    • Environment:

      Used server is a recent 5.1 mysql server on 64 bit Linux.
    • Database:
      MySQL
    • Testing Instructions:
      Hide

      Note to tester: See last comment below, about unit-tests already passed and other thoughts (before testing).

      1/ run /lib/simpletest/fulltestaccesslib.php for all 4 supported databases to make sure there are no regressions
      2/ find some large test site and verify it is faster on pg and mysql

      Show
      Note to tester: See last comment below, about unit-tests already passed and other thoughts (before testing). 1/ run /lib/simpletest/fulltestaccesslib.php for all 4 supported databases to make sure there are no regressions 2/ find some large test site and verify it is faster on pg and mysql
    • Workaround:
      Hide

      use PostgreSQL

      Show
      use PostgreSQL
    • Affected Branches:
      MOODLE_22_STABLE, MOODLE_23_STABLE
    • Fixed Branches:
      MOODLE_22_STABLE
    • Pull from Repository:
    • Pull Master Branch:
      w51_MDL-30761_m23_slowmysql
    • Rank:
      33689

      Description

      The query for getting "overrides of interesting roles in all interesting child contexts" takes very very long since the update to moodle 2.2

      With moodle 2.1.3 the created query (not captured) took about 0.7 seconds, with moodle 2.2 it takes about 40 seconds. This depends on the user, not all users are affected - seems to depend on the number of entries in the IN - clause, because with every entry the query execution time increases by about 1 second.

      query created by moodle 2.2 (which took measurably long)

      /lib/accesslib.php around line 770
      SELECT ctx.path, rc.roleid, rc.capability, rc.permission
                           FROM mdl_role_capabilities rc
                           JOIN mdl_context ctx
                                ON (ctx.id = rc.contextid)
                      LEFT JOIN mdl_context cctx
                                ON (cctx.contextlevel = 50
                                    AND ctx.path LIKE CONCAT(cctx.path, '/%'))
                           JOIN mdl_context pctx
                                ON (pctx.path IN (XXXX)
                                    AND (ctx.id = pctx.id
                                         OR ctx.path LIKE CONCAT(pctx.path, '/%')
                                         OR pctx.path LIKE CONCAT(ctx.path, '/%')))
                          WHERE rc.roleid = 'XXXX'
                                AND cctx.id IS NULL
      

      About 1200 entries in the mdl_role_capabilites, about 25000 in the mdl_context table, slow query log states "Query_time: 36.833683 Lock_time: 0.000498 Rows_sent: 449 Rows_examined: 29503116". Most time spent with this query.

        Issue Links

          Activity

          Hide
          Michael de Raadt added a comment -

          Hi, Himmelbauer.

          I suspect this function is called in a number of places. Could you let us know where in Moodle this is causing a problem?

          Show
          Michael de Raadt added a comment - Hi, Himmelbauer. I suspect this function is called in a number of places. Could you let us know where in Moodle this is causing a problem?
          Hide
          Himmelbauer added a comment - - edited

          As mentioned in the title of this bug this is causing the login process to take a long while (user enters username + password, clicks on login and waits for many seconds).

          Show
          Himmelbauer added a comment - - edited As mentioned in the title of this bug this is causing the login process to take a long while (user enters username + password, clicks on login and waits for many seconds).
          Hide
          Petr Škoda added a comment -

          Thanks a lot for the report, I have rewritten the query a bit to make it faster:

          • mysql before 120s, after 2s
          • pg before 4s, after 1s

          My test site info:

          • 5002 users
          • 1000 courses
          • 31 course cats
          • 9994 activities
          • 5004 blocks
          • 135675 enrolments
          • 135876 ras
          • 21032 contexts
          Show
          Petr Škoda added a comment - Thanks a lot for the report, I have rewritten the query a bit to make it faster: mysql before 120s, after 2s pg before 4s, after 1s My test site info: 5002 users 1000 courses 31 course cats 9994 activities 5004 blocks 135675 enrolments 135876 ras 21032 contexts
          Hide
          Petr Škoda added a comment -

          I have done some more profiling and fixed another query, now I get 0.3s for pg and 0.7s for mysql.

          Show
          Petr Škoda added a comment - I have done some more profiling and fixed another query, now I get 0.3s for pg and 0.7s for mysql.
          Hide
          Himmelbauer added a comment -

          Strange that postgres is so much faster in your tests - I copied the two queried tables to another system, where the "original" (=slow) query takes 10 seconds using mysql and 6 seconds using postgres.

          Using the function from moodle 2.1.3+ caused the query to be finished in about the same time (0.7 seconds), but unfortunately this could not be reliably used because it breaks the upload of files in assignments and on other places.

          Thanks for the fast response and the fix.

          Show
          Himmelbauer added a comment - Strange that postgres is so much faster in your tests - I copied the two queried tables to another system, where the "original" (=slow) query takes 10 seconds using mysql and 6 seconds using postgres. Using the function from moodle 2.1.3+ caused the query to be finished in about the same time (0.7 seconds), but unfortunately this could not be reliably used because it breaks the upload of files in assignments and on other places. Thanks for the fast response and the fix.
          Hide
          Petr Škoda added a comment -

          I've tested also MySQL 5.5 and MariaDB 5.3.2 using different query modifications and consistently PostgreSQL was faster. This is not the first time, we have similar problems in statistics processing, enrol plugins, amos (translations), etc. PostgreSQL seems to be better for unoptimised complex queries with many joins or subqueries.

          Show
          Petr Škoda added a comment - I've tested also MySQL 5.5 and MariaDB 5.3.2 using different query modifications and consistently PostgreSQL was faster. This is not the first time, we have similar problems in statistics processing, enrol plugins, amos (translations), etc. PostgreSQL seems to be better for unoptimised complex queries with many joins or subqueries.
          Hide
          Eloy Lafuente (stronk7) added a comment -

          It looks really, ok... I'm just running the fulltestaccesslib.php tests (over all DBs) before sending this upstream.

          In the mean time, it would be great if we can get some feedback from you, Himmelbauer, and how the fix is performing to you.

          Ciao

          Show
          Eloy Lafuente (stronk7) added a comment - It looks really, ok... I'm just running the fulltestaccesslib.php tests (over all DBs) before sending this upstream. In the mean time, it would be great if we can get some feedback from you, Himmelbauer, and how the fix is performing to you. Ciao
          Hide
          Eloy Lafuente (stronk7) added a comment -

          fulltestaccesslib.php working under all DBs but got 1 failure under Oracle:

          Fail: lib/simpletest/fulltestaccesslib.php / ▶ accesslib_test / ▶ test_everything_in_accesslib
          Equal expectation fails at character 0 with [3] and [2] at [lib/simpletest/fulltestaccesslib.php line 590]
          
          Show
          Eloy Lafuente (stronk7) added a comment - fulltestaccesslib.php working under all DBs but got 1 failure under Oracle: Fail: lib/simpletest/fulltestaccesslib.php / ▶ accesslib_test / ▶ test_everything_in_accesslib Equal expectation fails at character 0 with [3] and [2] at [lib/simpletest/fulltestaccesslib.php line 590]
          Hide
          Eloy Lafuente (stronk7) added a comment -

          Crap, changing this:

          diff --git a/lib/accesslib.php b/lib/accesslib.php
          index 8c4ecd1..56ce622 100644
          --- a/lib/accesslib.php
          +++ b/lib/accesslib.php
          @@ -1675,7 +1675,11 @@ function role_unassign($roleid, $userid, $contextid, $component = '', $itemid =
                   }
               }
           
          -    role_unassign_all(array('roleid'=>$roleid, 'userid'=>$userid, 'contextid'=>$contextid, 'component'=>$component, 'itemid'=>$itemid), false, false);
          +    $params = array('roleid'=>$roleid, 'userid'=>$userid, 'contextid'=>$contextid, 'itemid'=>$itemid);
          +    if (!empty($component)) {
          +        $params['component'] = $component;
          +    }
          +    role_unassign_all($params, false, false);
           }
          

          causes oracle test to pass... so there is something wrong in that 'component' column not sure if @ insert or query or whitespace hack or definition, but leading to missed role_unassign() for sure.

          Ciao

          Show
          Eloy Lafuente (stronk7) added a comment - Crap, changing this: diff --git a/lib/accesslib.php b/lib/accesslib.php index 8c4ecd1..56ce622 100644 --- a/lib/accesslib.php +++ b/lib/accesslib.php @@ -1675,7 +1675,11 @@ function role_unassign($roleid, $userid, $contextid, $component = '', $itemid = } } - role_unassign_all(array('roleid'=>$roleid, 'userid'=>$userid, 'contextid'=>$contextid, 'component'=>$component, 'itemid'=>$itemid), false , false ); + $params = array('roleid'=>$roleid, 'userid'=>$userid, 'contextid'=>$contextid, 'itemid'=>$itemid); + if (!empty($component)) { + $params['component'] = $component; + } + role_unassign_all($params, false , false ); } causes oracle test to pass... so there is something wrong in that 'component' column not sure if @ insert or query or whitespace hack or definition, but leading to missed role_unassign() for sure. Ciao
          Hide
          Eloy Lafuente (stronk7) added a comment -

          Oki, for Oracle:

          We only perform the automatic empty => 1whitespace (the dirty hack) automatically on insert/update/set_field operations. But never one get_recordxxxx ones.

          Why? Because due to the how those methods are chained, at the final one - get_recordset_sql() - we don't have easy access to the tablename, hence the hack does not transform anything.

          So there are 2 solutions for this:

          1) Or $component = $DB->sql_empty() is used by the caller. Like other parts of Moodle are doing now. Really it's not usual.
          2) Or we try to support the dirty hack in more operations, there are 2 possibilities:

          • a) support it for 1-table get_recordxxxx() operations (this will leave out get_recordxxxx_sql() ones, but seems easier and safer).
          • b) support it everywhere, so each time we get one === '' in param, it's converted always to 1whitespace, without introspecting into metadata at all, I think this is more risky, but if covered properly by tests, could work ok too IMO.

          I think any of 1), 2a) and 2b) are ok to get this fixed, just need to agree about the best one.

          Ciao

          Show
          Eloy Lafuente (stronk7) added a comment - Oki, for Oracle: We only perform the automatic empty => 1whitespace (the dirty hack) automatically on insert/update/set_field operations. But never one get_recordxxxx ones. Why? Because due to the how those methods are chained, at the final one - get_recordset_sql() - we don't have easy access to the tablename, hence the hack does not transform anything. So there are 2 solutions for this: 1) Or $component = $DB->sql_empty() is used by the caller. Like other parts of Moodle are doing now. Really it's not usual. 2) Or we try to support the dirty hack in more operations, there are 2 possibilities: a) support it for 1-table get_recordxxxx() operations (this will leave out get_recordxxxx_sql() ones, but seems easier and safer). b) support it everywhere, so each time we get one === '' in param, it's converted always to 1whitespace, without introspecting into metadata at all, I think this is more risky, but if covered properly by tests, could work ok too IMO. I think any of 1), 2a) and 2b) are ok to get this fixed, just need to agree about the best one. Ciao
          Hide
          Petr Škoda added a comment -

          wow! I thought the get_record() does the '' hackery, +10 for 2a)

          Show
          Petr Škoda added a comment - wow! I thought the get_record() does the '' hackery, +10 for 2a)
          Hide
          Eloy Lafuente (stronk7) added a comment -

          Solution 1), working here under the 4 DBs (fulltestaccesslib.php passes 100%) is:

          diff --git a/lib/accesslib.php b/lib/accesslib.php
          index 8c4ecd1..bf0616f 100644
          --- a/lib/accesslib.php
          +++ b/lib/accesslib.php
          @@ -1605,6 +1605,7 @@ function role_assign($roleid, $userid, $contextid, $component = '', $itemid = 0,
               }
           
           /// Check for existing entry
          +    $component = $component === '' ? $DB->sql_empty() : $component;
               $ras = $DB->get_records('role_assignments', array('roleid'=>$roleid, 'contextid'=>$context->id, 'userid'=>$userid, 'component'=>$component, 'itemid'=>$itemid), 'id');
           
               if ($ras) {
          @@ -1717,6 +1718,9 @@ function role_unassign_all(array $params, $subcontexts = false, $includemanual =
                   }
               }
           
          +    if (isset($params['component'])) {
          +        $params['component'] = $params['component'] === '' ? $DB->sql_empty() : $params['component'];
          +    }
               $ras = $DB->get_records('role_assignments', $params);
               foreach($ras as $ra) {
                   $DB->delete_records('role_assignments', array('id'=>$ra->id));
          

          Ciao

          Show
          Eloy Lafuente (stronk7) added a comment - Solution 1), working here under the 4 DBs (fulltestaccesslib.php passes 100%) is: diff --git a/lib/accesslib.php b/lib/accesslib.php index 8c4ecd1..bf0616f 100644 --- a/lib/accesslib.php +++ b/lib/accesslib.php @@ -1605,6 +1605,7 @@ function role_assign($roleid, $userid, $contextid, $component = '', $itemid = 0, } /// Check for existing entry + $component = $component === '' ? $DB->sql_empty() : $component; $ras = $DB->get_records('role_assignments', array('roleid'=>$roleid, 'contextid'=>$context->id, 'userid'=>$userid, 'component'=>$component, 'itemid'=>$itemid), 'id'); if ($ras) { @@ -1717,6 +1718,9 @@ function role_unassign_all(array $params, $subcontexts = false , $includemanual = } } + if (isset($params['component'])) { + $params['component'] = $params['component'] === '' ? $DB->sql_empty() : $params['component']; + } $ras = $DB->get_records('role_assignments', $params); foreach($ras as $ra) { $DB->delete_records('role_assignments', array('id'=>$ra->id)); Ciao
          Hide
          Himmelbauer added a comment -

          The new query boosted moodle's performance. At least nobody complained about too slow logins anymore.

          Show
          Himmelbauer added a comment - The new query boosted moodle's performance. At least nobody complained about too slow logins anymore.
          Hide
          Eloy Lafuente (stronk7) added a comment -

          Thanks Himmelbauer!

          Petr, so is it ok for you to add the small changes above, so we get it upstream now.

          And later, with calm, we implement/test the bindings/hack problem in Oracle (MDL-29765). ?

          Show
          Eloy Lafuente (stronk7) added a comment - Thanks Himmelbauer! Petr, so is it ok for you to add the small changes above, so we get it upstream now. And later, with calm, we implement/test the bindings/hack problem in Oracle ( MDL-29765 ). ?
          Hide
          Petr Škoda added a comment -

          I am sorry, I have to work on something else right now, feel free to add any oracle hack you want, ciao.

          Show
          Petr Škoda added a comment - I am sorry, I have to work on something else right now, feel free to add any oracle hack you want, ciao.
          Hide
          Eloy Lafuente (stronk7) added a comment -

          This has been integrated, thanks!

          Note: I've added extra commit with the 2 sql_empty() uses needed to make it work under Oracle, with TODO pointing to MDL-29765.

          To testers: I'm running the fullxxx unit tests against the 4 DBs right now. Will inform here about results.

          Show
          Eloy Lafuente (stronk7) added a comment - This has been integrated, thanks! Note: I've added extra commit with the 2 sql_empty() uses needed to make it work under Oracle, with TODO pointing to MDL-29765 . To testers: I'm running the fullxxx unit tests against the 4 DBs right now. Will inform here about results.
          Hide
          Petr Škoda added a comment -

          Eloy: thanks a lot!

          Show
          Petr Škoda added a comment - Eloy: thanks a lot!
          Hide
          Eloy Lafuente (stronk7) added a comment -

          Just finished running the fullxxx unit tests. They have passed under all DBs, so no need to run them again.

          Just guessing if Jordan may time-compare results before and after into some testing machine having a well-known big site Although the site is not really big, from a number of contexts POV.

          Else, I think that having unit test passing and feedback from Himmelbauer may be enough, for your consideration, dear tester.

          Ciao

          Show
          Eloy Lafuente (stronk7) added a comment - Just finished running the fullxxx unit tests. They have passed under all DBs, so no need to run them again. Just guessing if Jordan may time-compare results before and after into some testing machine having a well-known big site Although the site is not really big, from a number of contexts POV. Else, I think that having unit test passing and feedback from Himmelbauer may be enough, for your consideration, dear tester. Ciao
          Hide
          Aparup Banerjee added a comment - - edited

          Jordan and i are getting no diff when running the entire unit test under lib/simpletest/fulltestaccesslib.php in his local copy of moodle.org (roughly 2mins50secs+)

          the 'bad' query itself from the description is too fast! 5 secs on the moodleorg copy that i accessed so i don't know how to better see performance improvements than others have already comment in here about.

          anyway passing this as unit tests are running fine for us too.

          Show
          Aparup Banerjee added a comment - - edited Jordan and i are getting no diff when running the entire unit test under lib/simpletest/fulltestaccesslib.php in his local copy of moodle.org (roughly 2mins50secs+) the 'bad' query itself from the description is too fast! 5 secs on the moodleorg copy that i accessed so i don't know how to better see performance improvements than others have already comment in here about. anyway passing this as unit tests are running fine for us too.
          Hide
          Marcelo Cohen added a comment -

          People, just to add another confirmed case to the discussion: at our university we're migrating from 1.9 to 2.2 and have also experienced a big slowdown on CERTAIN logins (not everyone - around 22 seconds). This is a Moodle site with a database of 60K users and 5K courses. Note: we noted the SAME slowdown both in MySQL and PostgreSQL.

          Another issue which probably is related: the slowdown also happened when viewing the profiles of the affected users. EVERY time.

          We have applied Petr's fix and it's back to normal: ~1s login time for everyone.

          Show
          Marcelo Cohen added a comment - People, just to add another confirmed case to the discussion: at our university we're migrating from 1.9 to 2.2 and have also experienced a big slowdown on CERTAIN logins (not everyone - around 22 seconds). This is a Moodle site with a database of 60K users and 5K courses. Note: we noted the SAME slowdown both in MySQL and PostgreSQL. Another issue which probably is related: the slowdown also happened when viewing the profiles of the affected users. EVERY time. We have applied Petr's fix and it's back to normal: ~1s login time for everyone.
          Hide
          Eloy Lafuente (stronk7) added a comment -

          Whoever decided one week was worth 14 days had really one bad idea. Anyway, the nightmare is over, so thanks for your, once again, amazing contributions. Many, many thanks!

          Now... disconnect, relax and enjoy the next days, yay!

          Closing...ciao

          Show
          Eloy Lafuente (stronk7) added a comment - Whoever decided one week was worth 14 days had really one bad idea. Anyway, the nightmare is over, so thanks for your, once again, amazing contributions. Many, many thanks! Now... disconnect, relax and enjoy the next days, yay! Closing...ciao

            People

            • Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: