Moodle
  1. Moodle
  2. MDL-30761

Performance regression: slow login because get_user_access_sitewide takes way too long since moodle 2.2

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: 2.2, 2.3
    • Fix Version/s: 2.2.1
    • Component/s: Libraries, Performance
    • Labels:
    • Environment:

      Used server is a recent 5.1 mysql server on 64 bit Linux.
    • Database:
      MySQL
    • Testing Instructions:
      Hide

      Note to tester: See last comment below, about unit-tests already passed and other thoughts (before testing).

      1/ run /lib/simpletest/fulltestaccesslib.php for all 4 supported databases to make sure there are no regressions
      2/ find some large test site and verify it is faster on pg and mysql

      Show
      Note to tester: See last comment below, about unit-tests already passed and other thoughts (before testing). 1/ run /lib/simpletest/fulltestaccesslib.php for all 4 supported databases to make sure there are no regressions 2/ find some large test site and verify it is faster on pg and mysql
    • Workaround:
      Hide

      use PostgreSQL

      Show
      use PostgreSQL
    • Affected Branches:
      MOODLE_22_STABLE, MOODLE_23_STABLE
    • Fixed Branches:
      MOODLE_22_STABLE
    • Pull from Repository:
    • Pull Master Branch:
      w51_MDL-30761_m23_slowmysql

      Description

      The query for getting "overrides of interesting roles in all interesting child contexts" takes very very long since the update to moodle 2.2

      With moodle 2.1.3 the created query (not captured) took about 0.7 seconds, with moodle 2.2 it takes about 40 seconds. This depends on the user, not all users are affected - seems to depend on the number of entries in the IN - clause, because with every entry the query execution time increases by about 1 second.

      query created by moodle 2.2 (which took measurably long)

      /lib/accesslib.php around line 770

      SELECT ctx.path, rc.roleid, rc.capability, rc.permission
                           FROM mdl_role_capabilities rc
                           JOIN mdl_context ctx
                                ON (ctx.id = rc.contextid)
                      LEFT JOIN mdl_context cctx
                                ON (cctx.contextlevel = 50
                                    AND ctx.path LIKE CONCAT(cctx.path, '/%'))
                           JOIN mdl_context pctx
                                ON (pctx.path IN (XXXX)
                                    AND (ctx.id = pctx.id
                                         OR ctx.path LIKE CONCAT(pctx.path, '/%')
                                         OR pctx.path LIKE CONCAT(ctx.path, '/%')))
                          WHERE rc.roleid = 'XXXX'
                                AND cctx.id IS NULL

      About 1200 entries in the mdl_role_capabilites, about 25000 in the mdl_context table, slow query log states "Query_time: 36.833683 Lock_time: 0.000498 Rows_sent: 449 Rows_examined: 29503116". Most time spent with this query.

        Gliffy Diagrams

          Issue Links

            Activity

            Hide
            Michael de Raadt added a comment -

            Hi, Himmelbauer.

            I suspect this function is called in a number of places. Could you let us know where in Moodle this is causing a problem?

            Show
            Michael de Raadt added a comment - Hi, Himmelbauer. I suspect this function is called in a number of places. Could you let us know where in Moodle this is causing a problem?
            Hide
            Himmelbauer added a comment - - edited

            As mentioned in the title of this bug this is causing the login process to take a long while (user enters username + password, clicks on login and waits for many seconds).

            Show
            Himmelbauer added a comment - - edited As mentioned in the title of this bug this is causing the login process to take a long while (user enters username + password, clicks on login and waits for many seconds).
            Hide
            Petr Skoda added a comment -

            Thanks a lot for the report, I have rewritten the query a bit to make it faster:

            • mysql before 120s, after 2s
            • pg before 4s, after 1s

            My test site info:

            • 5002 users
            • 1000 courses
            • 31 course cats
            • 9994 activities
            • 5004 blocks
            • 135675 enrolments
            • 135876 ras
            • 21032 contexts
            Show
            Petr Skoda added a comment - Thanks a lot for the report, I have rewritten the query a bit to make it faster: mysql before 120s, after 2s pg before 4s, after 1s My test site info: 5002 users 1000 courses 31 course cats 9994 activities 5004 blocks 135675 enrolments 135876 ras 21032 contexts
            Hide
            Petr Skoda added a comment -

            I have done some more profiling and fixed another query, now I get 0.3s for pg and 0.7s for mysql.

            Show
            Petr Skoda added a comment - I have done some more profiling and fixed another query, now I get 0.3s for pg and 0.7s for mysql.
            Hide
            Himmelbauer added a comment -

            Strange that postgres is so much faster in your tests - I copied the two queried tables to another system, where the "original" (=slow) query takes 10 seconds using mysql and 6 seconds using postgres.

            Using the function from moodle 2.1.3+ caused the query to be finished in about the same time (0.7 seconds), but unfortunately this could not be reliably used because it breaks the upload of files in assignments and on other places.

            Thanks for the fast response and the fix.

            Show
            Himmelbauer added a comment - Strange that postgres is so much faster in your tests - I copied the two queried tables to another system, where the "original" (=slow) query takes 10 seconds using mysql and 6 seconds using postgres. Using the function from moodle 2.1.3+ caused the query to be finished in about the same time (0.7 seconds), but unfortunately this could not be reliably used because it breaks the upload of files in assignments and on other places. Thanks for the fast response and the fix.
            Hide
            Petr Skoda added a comment -

            I've tested also MySQL 5.5 and MariaDB 5.3.2 using different query modifications and consistently PostgreSQL was faster. This is not the first time, we have similar problems in statistics processing, enrol plugins, amos (translations), etc. PostgreSQL seems to be better for unoptimised complex queries with many joins or subqueries.

            Show
            Petr Skoda added a comment - I've tested also MySQL 5.5 and MariaDB 5.3.2 using different query modifications and consistently PostgreSQL was faster. This is not the first time, we have similar problems in statistics processing, enrol plugins, amos (translations), etc. PostgreSQL seems to be better for unoptimised complex queries with many joins or subqueries.
            Hide
            Eloy Lafuente (stronk7) added a comment -

            It looks really, ok... I'm just running the fulltestaccesslib.php tests (over all DBs) before sending this upstream.

            In the mean time, it would be great if we can get some feedback from you, Himmelbauer, and how the fix is performing to you.

            Ciao

            Show
            Eloy Lafuente (stronk7) added a comment - It looks really, ok... I'm just running the fulltestaccesslib.php tests (over all DBs) before sending this upstream. In the mean time, it would be great if we can get some feedback from you, Himmelbauer, and how the fix is performing to you. Ciao
            Hide
            Eloy Lafuente (stronk7) added a comment -

            fulltestaccesslib.php working under all DBs but got 1 failure under Oracle:

            Fail: lib/simpletest/fulltestaccesslib.php / ▶ accesslib_test / ▶ test_everything_in_accesslib
            Equal expectation fails at character 0 with [3] and [2] at [lib/simpletest/fulltestaccesslib.php line 590]
            

            Show
            Eloy Lafuente (stronk7) added a comment - fulltestaccesslib.php working under all DBs but got 1 failure under Oracle: Fail: lib/simpletest/fulltestaccesslib.php / ▶ accesslib_test / ▶ test_everything_in_accesslib Equal expectation fails at character 0 with [3] and [2] at [lib/simpletest/fulltestaccesslib.php line 590]
            Hide
            Eloy Lafuente (stronk7) added a comment -

            Crap, changing this:

            diff --git a/lib/accesslib.php b/lib/accesslib.php
            index 8c4ecd1..56ce622 100644
            --- a/lib/accesslib.php
            +++ b/lib/accesslib.php
            @@ -1675,7 +1675,11 @@ function role_unassign($roleid, $userid, $contextid, $component = '', $itemid =
                     }
                 }
             
            -    role_unassign_all(array('roleid'=>$roleid, 'userid'=>$userid, 'contextid'=>$contextid, 'component'=>$component, 'itemid'=>$itemid), false, false);
            +    $params = array('roleid'=>$roleid, 'userid'=>$userid, 'contextid'=>$contextid, 'itemid'=>$itemid);
            +    if (!empty($component)) {
            +        $params['component'] = $component;
            +    }
            +    role_unassign_all($params, false, false);
             }
            

            causes oracle test to pass... so there is something wrong in that 'component' column not sure if @ insert or query or whitespace hack or definition, but leading to missed role_unassign() for sure.

            Ciao

            Show
            Eloy Lafuente (stronk7) added a comment - Crap, changing this: diff --git a/lib/accesslib.php b/lib/accesslib.php index 8c4ecd1..56ce622 100644 --- a/lib/accesslib.php +++ b/lib/accesslib.php @@ -1675,7 +1675,11 @@ function role_unassign($roleid, $userid, $contextid, $component = '', $itemid = } } - role_unassign_all(array('roleid'=>$roleid, 'userid'=>$userid, 'contextid'=>$contextid, 'component'=>$component, 'itemid'=>$itemid), false, false); + $params = array('roleid'=>$roleid, 'userid'=>$userid, 'contextid'=>$contextid, 'itemid'=>$itemid); + if (!empty($component)) { + $params['component'] = $component; + } + role_unassign_all($params, false, false); } causes oracle test to pass... so there is something wrong in that 'component' column not sure if @ insert or query or whitespace hack or definition, but leading to missed role_unassign() for sure. Ciao
            Hide
            Eloy Lafuente (stronk7) added a comment -

            Oki, for Oracle:

            We only perform the automatic empty => 1whitespace (the dirty hack) automatically on insert/update/set_field operations. But never one get_recordxxxx ones.

            Why? Because due to the how those methods are chained, at the final one - get_recordset_sql() - we don't have easy access to the tablename, hence the hack does not transform anything.

            So there are 2 solutions for this:

            1) Or $component = $DB->sql_empty() is used by the caller. Like other parts of Moodle are doing now. Really it's not usual.
            2) Or we try to support the dirty hack in more operations, there are 2 possibilities:

            • a) support it for 1-table get_recordxxxx() operations (this will leave out get_recordxxxx_sql() ones, but seems easier and safer).
            • b) support it everywhere, so each time we get one === '' in param, it's converted always to 1whitespace, without introspecting into metadata at all, I think this is more risky, but if covered properly by tests, could work ok too IMO.

            I think any of 1), 2a) and 2b) are ok to get this fixed, just need to agree about the best one.

            Ciao

            Show
            Eloy Lafuente (stronk7) added a comment - Oki, for Oracle: We only perform the automatic empty => 1whitespace (the dirty hack) automatically on insert/update/set_field operations. But never one get_recordxxxx ones. Why? Because due to the how those methods are chained, at the final one - get_recordset_sql() - we don't have easy access to the tablename, hence the hack does not transform anything. So there are 2 solutions for this: 1) Or $component = $DB->sql_empty() is used by the caller. Like other parts of Moodle are doing now. Really it's not usual. 2) Or we try to support the dirty hack in more operations, there are 2 possibilities: a) support it for 1-table get_recordxxxx() operations (this will leave out get_recordxxxx_sql() ones, but seems easier and safer). b) support it everywhere, so each time we get one === '' in param, it's converted always to 1whitespace, without introspecting into metadata at all, I think this is more risky, but if covered properly by tests, could work ok too IMO. I think any of 1), 2a) and 2b) are ok to get this fixed, just need to agree about the best one. Ciao
            Hide
            Petr Skoda added a comment -

            wow! I thought the get_record() does the '' hackery, +10 for 2a)

            Show
            Petr Skoda added a comment - wow! I thought the get_record() does the '' hackery, +10 for 2a)
            Hide
            Eloy Lafuente (stronk7) added a comment -

            Solution 1), working here under the 4 DBs (fulltestaccesslib.php passes 100%) is:

            diff --git a/lib/accesslib.php b/lib/accesslib.php
            index 8c4ecd1..bf0616f 100644
            --- a/lib/accesslib.php
            +++ b/lib/accesslib.php
            @@ -1605,6 +1605,7 @@ function role_assign($roleid, $userid, $contextid, $component = '', $itemid = 0,
                 }
             
             /// Check for existing entry
            +    $component = $component === '' ? $DB->sql_empty() : $component;
                 $ras = $DB->get_records('role_assignments', array('roleid'=>$roleid, 'contextid'=>$context->id, 'userid'=>$userid, 'component'=>$component, 'itemid'=>$itemid), 'id');
             
                 if ($ras) {
            @@ -1717,6 +1718,9 @@ function role_unassign_all(array $params, $subcontexts = false, $includemanual =
                     }
                 }
             
            +    if (isset($params['component'])) {
            +        $params['component'] = $params['component'] === '' ? $DB->sql_empty() : $params['component'];
            +    }
                 $ras = $DB->get_records('role_assignments', $params);
                 foreach($ras as $ra) {
                     $DB->delete_records('role_assignments', array('id'=>$ra->id));
            

            Ciao

            Show
            Eloy Lafuente (stronk7) added a comment - Solution 1), working here under the 4 DBs (fulltestaccesslib.php passes 100%) is: diff --git a/lib/accesslib.php b/lib/accesslib.php index 8c4ecd1..bf0616f 100644 --- a/lib/accesslib.php +++ b/lib/accesslib.php @@ -1605,6 +1605,7 @@ function role_assign($roleid, $userid, $contextid, $component = '', $itemid = 0, } /// Check for existing entry + $component = $component === '' ? $DB->sql_empty() : $component; $ras = $DB->get_records('role_assignments', array('roleid'=>$roleid, 'contextid'=>$context->id, 'userid'=>$userid, 'component'=>$component, 'itemid'=>$itemid), 'id'); if ($ras) { @@ -1717,6 +1718,9 @@ function role_unassign_all(array $params, $subcontexts = false, $includemanual = } } + if (isset($params['component'])) { + $params['component'] = $params['component'] === '' ? $DB->sql_empty() : $params['component']; + } $ras = $DB->get_records('role_assignments', $params); foreach($ras as $ra) { $DB->delete_records('role_assignments', array('id'=>$ra->id)); Ciao
            Hide
            Himmelbauer added a comment -

            The new query boosted moodle's performance. At least nobody complained about too slow logins anymore.

            Show
            Himmelbauer added a comment - The new query boosted moodle's performance. At least nobody complained about too slow logins anymore.
            Hide
            Eloy Lafuente (stronk7) added a comment -

            Thanks Himmelbauer!

            Petr, so is it ok for you to add the small changes above, so we get it upstream now.

            And later, with calm, we implement/test the bindings/hack problem in Oracle (MDL-29765). ?

            Show
            Eloy Lafuente (stronk7) added a comment - Thanks Himmelbauer! Petr, so is it ok for you to add the small changes above, so we get it upstream now. And later, with calm, we implement/test the bindings/hack problem in Oracle ( MDL-29765 ). ?
            Hide
            Petr Skoda added a comment -

            I am sorry, I have to work on something else right now, feel free to add any oracle hack you want, ciao.

            Show
            Petr Skoda added a comment - I am sorry, I have to work on something else right now, feel free to add any oracle hack you want, ciao.
            Hide
            Eloy Lafuente (stronk7) added a comment -

            This has been integrated, thanks!

            Note: I've added extra commit with the 2 sql_empty() uses needed to make it work under Oracle, with TODO pointing to MDL-29765.

            To testers: I'm running the fullxxx unit tests against the 4 DBs right now. Will inform here about results.

            Show
            Eloy Lafuente (stronk7) added a comment - This has been integrated, thanks! Note: I've added extra commit with the 2 sql_empty() uses needed to make it work under Oracle, with TODO pointing to MDL-29765 . To testers: I'm running the fullxxx unit tests against the 4 DBs right now. Will inform here about results.
            Hide
            Petr Skoda added a comment -

            Eloy: thanks a lot!

            Show
            Petr Skoda added a comment - Eloy: thanks a lot!
            Hide
            Eloy Lafuente (stronk7) added a comment -

            Just finished running the fullxxx unit tests. They have passed under all DBs, so no need to run them again.

            Just guessing if Jordan may time-compare results before and after into some testing machine having a well-known big site Although the site is not really big, from a number of contexts POV.

            Else, I think that having unit test passing and feedback from Himmelbauer may be enough, for your consideration, dear tester.

            Ciao

            Show
            Eloy Lafuente (stronk7) added a comment - Just finished running the fullxxx unit tests. They have passed under all DBs, so no need to run them again. Just guessing if Jordan may time-compare results before and after into some testing machine having a well-known big site Although the site is not really big, from a number of contexts POV. Else, I think that having unit test passing and feedback from Himmelbauer may be enough, for your consideration, dear tester. Ciao
            Hide
            Aparup Banerjee added a comment - - edited

            Jordan and i are getting no diff when running the entire unit test under lib/simpletest/fulltestaccesslib.php in his local copy of moodle.org (roughly 2mins50secs+)

            the 'bad' query itself from the description is too fast! 5 secs on the moodleorg copy that i accessed so i don't know how to better see performance improvements than others have already comment in here about.

            anyway passing this as unit tests are running fine for us too.

            Show
            Aparup Banerjee added a comment - - edited Jordan and i are getting no diff when running the entire unit test under lib/simpletest/fulltestaccesslib.php in his local copy of moodle.org (roughly 2mins50secs+) the 'bad' query itself from the description is too fast! 5 secs on the moodleorg copy that i accessed so i don't know how to better see performance improvements than others have already comment in here about. anyway passing this as unit tests are running fine for us too.
            Hide
            Marcelo Cohen added a comment -

            People, just to add another confirmed case to the discussion: at our university we're migrating from 1.9 to 2.2 and have also experienced a big slowdown on CERTAIN logins (not everyone - around 22 seconds). This is a Moodle site with a database of 60K users and 5K courses. Note: we noted the SAME slowdown both in MySQL and PostgreSQL.

            Another issue which probably is related: the slowdown also happened when viewing the profiles of the affected users. EVERY time.

            We have applied Petr's fix and it's back to normal: ~1s login time for everyone.

            Show
            Marcelo Cohen added a comment - People, just to add another confirmed case to the discussion: at our university we're migrating from 1.9 to 2.2 and have also experienced a big slowdown on CERTAIN logins (not everyone - around 22 seconds). This is a Moodle site with a database of 60K users and 5K courses. Note: we noted the SAME slowdown both in MySQL and PostgreSQL. Another issue which probably is related: the slowdown also happened when viewing the profiles of the affected users. EVERY time. We have applied Petr's fix and it's back to normal: ~1s login time for everyone.
            Hide
            Eloy Lafuente (stronk7) added a comment -

            Whoever decided one week was worth 14 days had really one bad idea. Anyway, the nightmare is over, so thanks for your, once again, amazing contributions. Many, many thanks!

            Now... disconnect, relax and enjoy the next days, yay!

            Closing...ciao

            Show
            Eloy Lafuente (stronk7) added a comment - Whoever decided one week was worth 14 days had really one bad idea. Anyway, the nightmare is over, so thanks for your, once again, amazing contributions. Many, many thanks! Now... disconnect, relax and enjoy the next days, yay! Closing...ciao

              People

              • Votes:
                0 Vote for this issue
                Watchers:
                5 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: