Uploaded image for project: 'Moodle'
  1. Moodle
  2. MDL-34776

Some enrolment and role related queries are slow

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.3.1, 2.4
    • Fix Version/s: 2.3.2
    • Component/s: Enrolments
    • Labels:
    • Testing Instructions:
      Hide

      1/ try assigning roles for enrolled users bellow the course level ("locally assigned roles" in module settings block)
      2/ try "normal manual" user enrolling
      3/ try "normal" assigning of roles to not-enrolled users ("Other users" in course settings block)
      4/ disable JS and repeat user ernolment via the old two pane widget
      5/ disable JS and repeat other user role assignment via the old two pane widget
      6/ try to add/remove enrolled users from a group

      Show
      1/ try assigning roles for enrolled users bellow the course level ("locally assigned roles" in module settings block) 2/ try "normal manual" user enrolling 3/ try "normal" assigning of roles to not-enrolled users ("Other users" in course settings block) 4/ disable JS and repeat user ernolment via the old two pane widget 5/ disable JS and repeat other user role assignment via the old two pane widget 6/ try to add/remove enrolled users from a group
    • Affected Branches:
      MOODLE_23_STABLE, MOODLE_24_STABLE
    • Fixed Branches:
      MOODLE_23_STABLE
    • Pull from Repository:
    • Pull Master Branch:
      w34_MDL-34776_m24_notinsql

      Description

      I have a moodle with 600,000 users enrolled and this query is never completing when simply trying to enrol one user to a course using postgres.

      SELECT COUNT(1) FROM mdl_user u
      WHERE id <> $1 AND u.deleted = 0 AND u.confirmed = 1
      AND u.id NOT IN (SELECT ue.userid
      FROM mdl_user_enrolments ue
      JOIN mdl_enrol e ON (e.id = ue.enrolid AND e.id = $2))

      I'm wondering if we can avoid the subselect somehow, something like:

      SELECT COUNT(1) FROM mdl_user u 
      LEFT OUTER join mdl_user_enrolments ue on u.id = ue.userid and ue.enrolid = $2 
      WHERE ue IS NULL AND u.id <> $1 AND u.deleted = 0 AND u.confirmed = 1;

        Gliffy Diagrams

          Attachments

            Activity

            Hide
            poltawski Dan Poltawski added a comment - - edited

            Old query - cost *11058679686.88*!!

            integration=# explain SELECT COUNT(1) FROM mdl_user u
            WHERE id <> 1 AND u.deleted = 0 AND u.confirmed = 1
            AND u.id NOT IN (SELECT ue.userid
            FROM mdl_user_enrolments ue
            JOIN mdl_enrol e ON (e.id = ue.enrolid AND e.id = 1))
            ;
                                                             QUERY PLAN                                                 
            ------------------------------------------------------------------------------------------------------------
             Aggregate  (cost=11058679686.88..11058679686.89 rows=1 width=0)
               ->  Seq Scan on mdl_user u  (cost=0.00..11058678821.84 rows=346016 width=0)
                     Filter: ((id <> 1) AND (deleted = 0) AND (confirmed = 1) AND (NOT (SubPlan 1)))
                     SubPlan 1
                       ->  Materialize  (cost=0.00..30232.83 rows=690784 width=8)
                             ->  Nested Loop  (cost=0.00..24079.91 rows=690784 width=8)
                                   ->  Index Scan using mdl_enro_id_pk on mdl_enrol e  (cost=0.00..8.27 rows=1 width=8)
                                         Index Cond: (id = 1)
                                   ->  Seq Scan on mdl_user_enrolments ue  (cost=0.00..17163.80 rows=690784 width=16)
                                         Filter: (enrolid = 1)
            (10 rows)

            New query cost *129018.94*:

            integration=# explain ANALYZE SELECT COUNT(1) FROM mdl_user u 
            LEFT OUTER join mdl_user_enrolments ue on u.id = ue.userid and ue.enrolid = 1 
            WHERE ue IS NULL AND u.id <> 1 AND u.deleted = 0 AND u.confirmed = 1;
                                                                                 QUERY PLAN                                                                     
            ----------------------------------------------------------------------------------------------------------------------------------------------------
             Aggregate  (cost=129018.94..129018.95 rows=1 width=0) (actual time=3010.499..3010.499 rows=1 loops=1)
               ->  Hash Left Join  (cost=36592.60..129010.29 rows=3460 width=0) (actual time=2163.198..3010.492 rows=2 loops=1)
                     Hash Cond: (u.id = ue.userid)
                     Filter: (ue.* IS NULL)
                     ->  Seq Scan on mdl_user u  (cost=0.00..63252.58 rows=692032 width=8) (actual time=0.008..590.134 rows=690786 loops=1)
                           Filter: ((id <> 1) AND (deleted = 0) AND (confirmed = 1))
                     ->  Hash  (cost=17163.80..17163.80 rows=690784 width=104) (actual time=974.618..974.618 rows=690784 loops=1)
                           Buckets: 1024  Batches: 128  Memory Usage: 729kB
                           ->  Seq Scan on mdl_user_enrolments ue  (cost=0.00..17163.80 rows=690784 width=104) (actual time=0.016..351.870 rows=690784 loops=1)
                                 Filter: (enrolid = 1)
             Total runtime: 3010.555 ms

            Show
            poltawski Dan Poltawski added a comment - - edited Old query - cost * 11058679686.88 *!! integration=# explain SELECT COUNT(1) FROM mdl_user u WHERE id <> 1 AND u.deleted = 0 AND u.confirmed = 1 AND u.id NOT IN (SELECT ue.userid FROM mdl_user_enrolments ue JOIN mdl_enrol e ON (e.id = ue.enrolid AND e.id = 1)) ; QUERY PLAN ------------------------------------------------------------------------------------------------------------ Aggregate (cost=11058679686.88..11058679686.89 rows=1 width=0) -> Seq Scan on mdl_user u (cost=0.00..11058678821.84 rows=346016 width=0) Filter: ((id <> 1) AND (deleted = 0) AND (confirmed = 1) AND (NOT (SubPlan 1))) SubPlan 1 -> Materialize (cost=0.00..30232.83 rows=690784 width=8) -> Nested Loop (cost=0.00..24079.91 rows=690784 width=8) -> Index Scan using mdl_enro_id_pk on mdl_enrol e (cost=0.00..8.27 rows=1 width=8) Index Cond: (id = 1) -> Seq Scan on mdl_user_enrolments ue (cost=0.00..17163.80 rows=690784 width=16) Filter: (enrolid = 1) (10 rows) New query cost * 129018.94 *: integration=# explain ANALYZE SELECT COUNT(1) FROM mdl_user u LEFT OUTER join mdl_user_enrolments ue on u.id = ue.userid and ue.enrolid = 1 WHERE ue IS NULL AND u.id <> 1 AND u.deleted = 0 AND u.confirmed = 1; QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------------------- Aggregate (cost=129018.94..129018.95 rows=1 width=0) (actual time=3010.499..3010.499 rows=1 loops=1) -> Hash Left Join (cost=36592.60..129010.29 rows=3460 width=0) (actual time=2163.198..3010.492 rows=2 loops=1) Hash Cond: (u.id = ue.userid) Filter: (ue.* IS NULL) -> Seq Scan on mdl_user u (cost=0.00..63252.58 rows=692032 width=8) (actual time=0.008..590.134 rows=690786 loops=1) Filter: ((id <> 1) AND (deleted = 0) AND (confirmed = 1)) -> Hash (cost=17163.80..17163.80 rows=690784 width=104) (actual time=974.618..974.618 rows=690784 loops=1) Buckets: 1024 Batches: 128 Memory Usage: 729kB -> Seq Scan on mdl_user_enrolments ue (cost=0.00..17163.80 rows=690784 width=104) (actual time=0.016..351.870 rows=690784 loops=1) Filter: (enrolid = 1) Total runtime: 3010.555 ms
            Hide
            poltawski Dan Poltawski added a comment -

            The current query I gave up on timing after 20mins. As you can see the new query takes 3000ms.

            NOTE: I haven't tested the correctness of the query - just proving the concept.

            Show
            poltawski Dan Poltawski added a comment - The current query I gave up on timing after 20mins. As you can see the new query takes 3000ms. NOTE: I haven't tested the correctness of the query - just proving the concept.
            Hide
            skodak Petr Skoda added a comment -

            I have found a few places that use "NOT IN" where "LEFT JOIN" should be faster, thanks for the report.

            Show
            skodak Petr Skoda added a comment - I have found a few places that use "NOT IN" where "LEFT JOIN" should be faster, thanks for the report.
            Hide
            nebgor Aparup Banerjee added a comment -

            i was wondering about indexes needed here (if any).

            Show
            nebgor Aparup Banerjee added a comment - i was wondering about indexes needed here (if any).
            Hide
            poltawski Dan Poltawski added a comment -

            Well, we hav one on enrolid, userid which seems sensible. I'm a bit rusty on explain output and the query I was looking at is not one you do every page load so I don't think we necessarily should overoptimisme it.

            If it completeles in a few seconds rather than > 20mins then thats a win.

            Show
            poltawski Dan Poltawski added a comment - Well, we hav one on enrolid, userid which seems sensible. I'm a bit rusty on explain output and the query I was looking at is not one you do every page load so I don't think we necessarily should overoptimisme it. If it completeles in a few seconds rather than > 20mins then thats a win.
            Hide
            stronk7 Eloy Lafuente (stronk7) added a comment -

            The integration of this issue has been delayed to next week because the integration period is over (Monday, Tuesday) and testing must happen on Wednesday.

            This change to a more rigid timeframe on each integration/testing cycle aims to produce a better and clear separation and organization of tasks for everybody.

            This is a bulk-automated message, so if you want to blame somebody/thing/where, don't do it here (use git instead) :-D :-P

            Apologizes for the inconvenient, this will be integrated next week. Thanks for your collaboration & ciao

            Show
            stronk7 Eloy Lafuente (stronk7) added a comment - The integration of this issue has been delayed to next week because the integration period is over (Monday, Tuesday) and testing must happen on Wednesday. This change to a more rigid timeframe on each integration/testing cycle aims to produce a better and clear separation and organization of tasks for everybody. This is a bulk-automated message, so if you want to blame somebody/thing/where, don't do it here (use git instead) :-D :-P Apologizes for the inconvenient, this will be integrated next week. Thanks for your collaboration & ciao
            Hide
            stronk7 Eloy Lafuente (stronk7) added a comment -

            Integrated, thanks! (23 & master)

            Show
            stronk7 Eloy Lafuente (stronk7) added a comment - Integrated, thanks! (23 & master)
            Hide
            andyjdavis Andrew Davis added a comment -

            It seems to be working nicely. Passing.

            Show
            andyjdavis Andrew Davis added a comment - It seems to be working nicely. Passing.
            Hide
            stronk7 Eloy Lafuente (stronk7) added a comment -

            YEAR!*

            CAF*, TOT!*

            • Your effort amazingly resulted. (unbelievable :-P)
            • Closing as fixed.
            • Tons of thanks.
            Show
            stronk7 Eloy Lafuente (stronk7) added a comment - YEAR!* CAF*, TOT!* Your effort amazingly resulted. (unbelievable :-P) Closing as fixed. Tons of thanks.

              People

              • Votes:
                0 Vote for this issue
                Watchers:
                1 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved:
                  Fix Release Date:
                  10/Sep/12