Uploaded image for project: 'Moodle'
  1. Moodle
  2. MDL-55776

course_enrolment: get_users serious performance implications

    XMLWordPrintable

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 3.0.5, 3.1, 3.2
    • Fix Version/s: 3.0.7, 3.1.3
    • Component/s: Enrolments
    • Labels:
    • Testing Instructions:
      Hide

      Run corresponding unit tests included in this patch.

      To do the manual testing, create the course containing at least one group and have at least one user that is in the group and at least one user that is not in any groups. Then navigate to course enrolment interface and:

      • Make sure that all enrolled users are listed.
      • Apply "no groups" filter and make sure that only users not in any groups are listed.
      • Apply some group filter and make sure that only users in the corresponding group are listed.

      The good idea might be to use a test course generator for above manual testing to replicate scenario when there are many enrolled users in each group. Also testing before applying the patch and after to notice any performance changes could be useful as well.

      Show
      Run corresponding unit tests included in this patch. To do the manual testing, create the course containing at least one group and have at least one user that is in the group and at least one user that is not in any groups. Then navigate to course enrolment interface and: Make sure that all enrolled users are listed. Apply "no groups" filter and make sure that only users not in any groups are listed. Apply some group filter and make sure that only users in the corresponding group are listed. The good idea might be to use a test course generator for above manual testing to replicate scenario when there are many enrolled users in each group. Also testing before applying the patch and after to notice any performance changes could be useful as well.
    • Affected Branches:
      MOODLE_30_STABLE, MOODLE_31_STABLE, MOODLE_32_STABLE
    • Fixed Branches:
      MOODLE_30_STABLE, MOODLE_31_STABLE
    • Pull Master Branch:
      MDL-55776-master

      Description

      tl;dr: Here at Lancaster University we have some big courses with over 100 groups, which allowed us to identify slow course enrolments page loading issue and narrow it down to the performance of particular functions, namely get_users and get_total_users, both of which are sharing the same implementation of "expensive" sub-query designed for user groups membership filtering. I believe this is a regression introduced in MDL-51341 and MDL-51923, so versions of 3.0 onward are affected. We have not noticed it earlier because we skip one version in our deployment policy (upgrade from 2.9 to 3.1 this summer).

      Testing method

      I have put together a simplified version of the query used in the function to demonstrate the issue and possible solutions. The query I will be using in the examples is:

      SELECT u.id, u.username, count(gm.groupid) FROM mdl_user u
      JOIN mdl_user_enrolments ue ON (ue.userid = u.id)
      JOIN mdl_enrol e ON (e.id = ue.enrolid)
      LEFT JOIN mdl_groups_members gm ON u.id = gm.userid AND gm.groupid IN (
          SELECT g.id FROM mdl_groups g WHERE g.courseid = e.courseid
          )
      WHERE e.courseid = 7268 GROUP BY u.id, u.username ORDER BY u.id;
      

      Effectively, this query features the same method of user group membership filtering as the original query in the function. I replaced DISTINCT with GROUP BY here only to output the number of groups the user is enrolled into, and compare this column values in different query modifications to make sure user groups filtering remain consistent. You may remove GROUP BY if you wish (along with count in select statement) and add DISTINCT when replicating my tests, but you will see the same result, both in terms of output and relative performance of given query model.

      The course is in our production database running PostgreSQL 9.4 on Debian Jessie, it has 67 enrolled users, 57 groups, most users enrolled to more than 25 groups at the same time, so total number of mdl_groups_members for all groups in this course is 1133. This is not our largest course, but it is large enough and has a reach group structure to show the difference.

      Analysis

      Now, the most interesting part. We will run EXPLAIN ANALYSE for this query:

      EXPLAIN ANALYSE SELECT u.id, u.username, count(gm.groupid) FROM mdl_user u
      JOIN mdl_user_enrolments ue ON (ue.userid = u.id)
      JOIN mdl_enrol e ON (e.id = ue.enrolid)
      LEFT JOIN mdl_groups_members gm ON u.id = gm.userid AND gm.groupid IN (
      SELECT g.id FROM mdl_groups g WHERE g.courseid = e.courseid)
      WHERE e.courseid = 7268 GROUP BY u.id, u.username  ORDER BY u.id;
       
      QUERY PLAN                                                                                          
      ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
       GroupAggregate  (cost=539115.20..539164.34 rows=2457 width=39) (actual time=5056.833..5057.724 rows=67 loops=1)
         Group Key: u.id, u.username
         ->  Sort  (cost=539115.20..539121.34 rows=2457 width=39) (actual time=5056.825..5057.206 rows=1159 loops=1)
               Sort Key: u.id, u.username
               Sort Method: quicksort  Memory: 143kB
               ->  Hash Right Join  (cost=91.92..538976.83 rows=2457 width=39) (actual time=0.691..5055.025 rows=1159 loops=1)
                     Hash Cond: (gm.userid = u.id)
                     Join Filter: (SubPlan 1)
                     Rows Removed by Join Filter: 24359
                     ->  Seq Scan on mdl_groups_members gm  (cost=0.00..101576.67 rows=4811067 width=16) (actual time=0.004..1678.541 rows=4742639 loops=1)
                     ->  Hash  (cost=90.93..90.93 rows=79 width=39) (actual time=0.594..0.594 rows=68 loops=1)
                           Buckets: 1024  Batches: 1  Memory Usage: 5kB
                           ->  Nested Loop  (cost=1.13..90.93 rows=79 width=39) (actual time=0.051..0.554 rows=68 loops=1)
                                 ->  Nested Loop  (cost=0.71..53.08 rows=79 width=16) (actual time=0.043..0.159 rows=68 loops=1)
                                       ->  Index Scan using mdl_enro_cou_ix on mdl_enrol e  (cost=0.29..19.14 rows=4 width=16) (actual time=0.019..0.026 rows=4 loops=1)
                                             Index Cond: (courseid = 7268)
                                       ->  Index Only Scan using mdl_userenro_enruse_uix on mdl_user_enrolments ue  (cost=0.42..7.46 rows=102 width=16) (actual time=0.005..0.020 rows=17 loops=4)
                                             Index Cond: (enrolid = e.id)
                                             Heap Fetches: 23
                                 ->  Index Scan using mdl_user_id_pk on mdl_user u  (cost=0.42..0.47 rows=1 width=31) (actual time=0.004..0.004 rows=1 loops=68)
                                       Index Cond: (id = ue.userid)
                     SubPlan 1
                       ->  Bitmap Heap Scan on mdl_groups g  (cost=4.75..165.80 rows=42 width=8) (actual time=0.007..0.038 rows=56 loops=25501)
                             Recheck Cond: (courseid = e.courseid)
                             Heap Blocks: exact=695219
                             ->  Bitmap Index Scan on mdl_grou_cou_ix  (cost=0.00..4.74 rows=42 width=0) (actual time=0.005..0.005 rows=57 loops=25501)
                                   Index Cond: (courseid = e.courseid)
       Planning time: 0.776 ms
       Execution time: 5057.870 ms
      

      5 seconds, not a big deal, right? Notice that SubPlan 1 and number of loops it is running comparing with other parts.

      Let's see how this query worked before MDL-51341. Yes, it provides inaccurate results in terms of group membership filter which was fixed in MDL-51341, but it gives a good indication of speed of the same query in Moodle 2.9:

      EXPLAIN ANALYSE SELECT u.id, u.username, count(gm.groupid) FROM mdl_user u
      JOIN mdl_user_enrolments ue ON (ue.userid = u.id)
      JOIN mdl_enrol e ON (e.id = ue.enrolid)
      LEFT JOIN mdl_groups_members gm ON u.id = gm.userid
      WHERE e.courseid = 7268 GROUP BY u.id, u.username  ORDER BY u.id;
       
      QUERY PLAN                                                                                       
      ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
       GroupAggregate  (cost=1418.87..1517.13 rows=4913 width=39) (actual time=130.018..148.550 rows=67 loops=1)
         Group Key: u.id, u.username
         ->  Sort  (cost=1418.87..1431.16 rows=4913 width=39) (actual time=129.532..137.815 rows=25503 loops=1)
               Sort Key: u.id, u.username
               Sort Method: quicksort  Memory: 2816kB
               ->  Nested Loop Left Join  (cost=1.56..1117.65 rows=4913 width=39) (actual time=0.041..44.157 rows=25503 loops=1)
                     ->  Nested Loop  (cost=1.13..90.93 rows=79 width=31) (actual time=0.032..0.665 rows=68 loops=1)
                           ->  Nested Loop  (cost=0.71..53.08 rows=79 width=8) (actual time=0.025..0.174 rows=68 loops=1)
                                 ->  Index Scan using mdl_enro_cou_ix on mdl_enrol e  (cost=0.29..19.14 rows=4 width=8) (actual time=0.007..0.014 rows=4 loops=1)
                                       Index Cond: (courseid = 7268)
                                 ->  Index Only Scan using mdl_userenro_enruse_uix on mdl_user_enrolments ue  (cost=0.42..7.46 rows=102 width=16) (actual time=0.005..0.024 rows=17 loops=4)
                                       Index Cond: (enrolid = e.id)
                                       Heap Fetches: 23
                           ->  Index Scan using mdl_user_id_pk on mdl_user u  (cost=0.42..0.47 rows=1 width=31) (actual time=0.004..0.005 rows=1 loops=68)
                                 Index Cond: (id = ue.userid)
                     ->  Index Scan using mdl_groumemb_use_ix on mdl_groups_members gm  (cost=0.43..9.75 rows=325 width=16) (actual time=0.007..0.353 rows=375 loops=68)
                           Index Cond: (u.id = userid)
       Planning time: 0.517 ms
       Execution time: 148.826 ms
      

      0.15 seconds is clear difference. You can see the implications of sub-query.

      Discussion

      What can be optimised to make the existing query faster? The most immediate thing I spotted is sub-query, where "WHERE" condition is referenced to parent query (g.courseid = e.courseid), this is not optimal. As far as I can see, we are running get_users function for a single course as its class suggests. Given this advantage, it is possible explicitly say which course id we need (using $this->course->id for populating query parameter) and save SQL some computing power waste. Let's see if this will make a difference:

      EXPLAIN ANALYSE SELECT u.id, u.username, count(gm.groupid) FROM mdl_user u
      JOIN mdl_user_enrolments ue ON (ue.userid = u.id)
      JOIN mdl_enrol e ON (e.id = ue.enrolid)
      LEFT JOIN mdl_groups_members gm ON u.id = gm.userid AND gm.groupid IN (
      SELECT g.id FROM mdl_groups g WHERE g.courseid = 7268)
      WHERE e.courseid = 7268 GROUP BY u.id, u.username  ORDER BY u.id;
       
      QUERY PLAN                                                                                       
      ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
       GroupAggregate  (cost=1030.14..1031.72 rows=79 width=39) (actual time=37.699..38.583 rows=67 loops=1)
         Group Key: u.id, u.username
         ->  Sort  (cost=1030.14..1030.34 rows=79 width=39) (actual time=37.690..38.056 rows=1159 loops=1)
               Sort Key: u.id, u.username
               Sort Method: quicksort  Memory: 143kB
               ->  Nested Loop Left Join  (cost=133.86..1027.65 rows=79 width=39) (actual time=0.945..33.987 rows=1159 loops=1)
                     ->  Nested Loop  (cost=1.13..90.93 rows=79 width=31) (actual time=0.032..0.608 rows=68 loops=1)
                           ->  Nested Loop  (cost=0.71..53.08 rows=79 width=8) (actual time=0.026..0.152 rows=68 loops=1)
                                 ->  Index Scan using mdl_enro_cou_ix on mdl_enrol e  (cost=0.29..19.14 rows=4 width=8) (actual time=0.007..0.013 rows=4 loops=1)
                                       Index Cond: (courseid = 7268)
                                 ->  Index Only Scan using mdl_userenro_enruse_uix on mdl_user_enrolments ue  (cost=0.42..7.46 rows=102 width=16) (actual time=0.006..0.023 rows=17 loops=4)
                                       Index Cond: (enrolid = e.id)
                                       Heap Fetches: 23
                           ->  Index Scan using mdl_user_id_pk on mdl_user u  (cost=0.42..0.47 rows=1 width=31) (actual time=0.004..0.004 rows=1 loops=68)
                                 Index Cond: (id = ue.userid)
                     ->  Hash Semi Join  (cost=132.73..142.90 rows=1 width=16) (actual time=0.169..0.478 rows=17 loops=68)
                           Hash Cond: (gm.groupid = g.id)
                           ->  Index Scan using mdl_groumemb_use_ix on mdl_groups_members gm  (cost=0.43..9.75 rows=325 width=16) (actual time=0.007..0.322 rows=375 loops=68)
                                 Index Cond: (u.id = userid)
                           ->  Hash  (cost=131.89..131.89 rows=33 width=8) (actual time=0.112..0.112 rows=57 loops=1)
                                 Buckets: 1024  Batches: 1  Memory Usage: 3kB
                                 ->  Bitmap Heap Scan on mdl_groups g  (cost=4.68..131.89 rows=33 width=8) (actual time=0.019..0.082 rows=57 loops=1)
                                       Recheck Cond: (courseid = 7268)
                                       Heap Blocks: exact=28
                                       ->  Bitmap Index Scan on mdl_grou_cou_ix  (cost=0.00..4.67 rows=33 width=0) (actual time=0.014..0.014 rows=57 loops=1)
                                             Index Cond: (courseid = 7268)
       Planning time: 0.702 ms
       Execution time: 38.673 ms
      

      38 milliseconds - even faster than this query run in 2.9. But notice the explain output - sub-plan has disappeared. Knowing course id condition beforehand does not require to execute sub-query for every row of enrolment table, with expectation of e.courseid to change. We know it will stay the same, so we use it.

      More optimisation? Let's see if it is possible to avoid sub-query and use joins instead. Joining tables in a sequence will not give us the accurate result-set unfortunately, this is where developers like me start thinking about sub-queries. But wait - can we make some joins before the others in non-sequential order, this functionality makes code less clear to read though:

      EXPLAIN ANALYSE SELECT u.id, u.username, count(gm.groupid) FROM mdl_user u
      JOIN mdl_user_enrolments ue ON (ue.userid = u.id)
      JOIN mdl_enrol e ON (e.id = ue.enrolid)
      LEFT JOIN (mdl_groups_members gm JOIN mdl_groups g ON (g.id = gm.groupid AND g.courseid = 7268)) ON u.id = gm.userid
      WHERE e.courseid = 7268 GROUP BY u.id, u.username ORDER BY u.id;
       
       QUERY PLAN                                                                                       
      ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
       GroupAggregate  (cost=1059.78..1061.36 rows=79 width=39) (actual time=38.876..39.780 rows=67 loops=1)
         Group Key: u.id, u.username
         ->  Sort  (cost=1059.78..1059.97 rows=79 width=39) (actual time=38.867..39.256 rows=1159 loops=1)
               Sort Key: u.id, u.username
               Sort Method: quicksort  Memory: 143kB
               ->  Nested Loop Left Join  (cost=133.86..1057.29 rows=79 width=39) (actual time=0.949..35.387 rows=1159 loops=1)
                     ->  Nested Loop  (cost=1.13..90.93 rows=79 width=31) (actual time=0.035..0.631 rows=68 loops=1)
                           ->  Nested Loop  (cost=0.71..53.08 rows=79 width=8) (actual time=0.030..0.161 rows=68 loops=1)
                                 ->  Index Scan using mdl_enro_cou_ix on mdl_enrol e  (cost=0.29..19.14 rows=4 width=8) (actual time=0.008..0.013 rows=4 loops=1)
                                       Index Cond: (courseid = 7268)
                                 ->  Index Only Scan using mdl_userenro_enruse_uix on mdl_user_enrolments ue  (cost=0.42..7.46 rows=102 width=16) (actual time=0.006..0.023 rows=17 loops=4)
                                       Index Cond: (enrolid = e.id)
                                       Heap Fetches: 23
                           ->  Index Scan using mdl_user_id_pk on mdl_user u  (cost=0.42..0.47 rows=1 width=31) (actual time=0.004..0.005 rows=1 loops=68)
                                 Index Cond: (id = ue.userid)
                     ->  Hash Join  (cost=132.73..143.27 rows=1 width=16) (actual time=0.175..0.499 rows=17 loops=68)
                           Hash Cond: (gm.groupid = g.id)
                           ->  Index Scan using mdl_groumemb_use_ix on mdl_groups_members gm  (cost=0.43..9.75 rows=325 width=16) (actual time=0.008..0.337 rows=375 loops=68)
                                 Index Cond: (u.id = userid)
                           ->  Hash  (cost=131.89..131.89 rows=33 width=8) (actual time=0.125..0.125 rows=57 loops=1)
                                 Buckets: 1024  Batches: 1  Memory Usage: 3kB
                                 ->  Bitmap Heap Scan on mdl_groups g  (cost=4.68..131.89 rows=33 width=8) (actual time=0.022..0.091 rows=57 loops=1)
                                       Recheck Cond: (courseid = 7268)
                                       Heap Blocks: exact=28
                                       ->  Bitmap Index Scan on mdl_grou_cou_ix  (cost=0.00..4.67 rows=33 width=0) (actual time=0.018..0.018 rows=57 loops=1)
                                             Index Cond: (courseid = 7268)
       Planning time: 0.750 ms
       Execution time: 39.877 ms
      

      It seems Postgres is interpreting this query as the same (as previous one) from the query plan internals, there is no performance gain to use this form of query.

      Something can be done with the logic though. Notice that group filtering is only work when $this->groupfilter is in place to create corresponding conditions. If no filtering is needed, there is no need to include any reference to groups in the query, and the performance of the same query will be astonishing:

      EXPLAIN ANALYSE SELECT u.id, u.username FROM mdl_user u
      JOIN mdl_user_enrolments ue ON (ue.userid = u.id)
      JOIN mdl_enrol e ON (e.id = ue.enrolid)
      WHERE e.courseid = 7268 GROUP BY u.id, u.username  ORDER BY u.id;
       
      QUERY PLAN                                                                                    
      ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
       Group  (cost=93.42..94.01 rows=79 width=31) (actual time=0.598..0.672 rows=67 loops=1)
         Group Key: u.id, u.username
         ->  Sort  (cost=93.42..93.62 rows=79 width=31) (actual time=0.596..0.618 rows=68 loops=1)
               Sort Key: u.id, u.username
               Sort Method: quicksort  Memory: 29kB
               ->  Nested Loop  (cost=1.13..90.93 rows=79 width=31) (actual time=0.035..0.474 rows=68 loops=1)
                     ->  Nested Loop  (cost=0.71..53.08 rows=79 width=8) (actual time=0.029..0.131 rows=68 loops=1)
                           ->  Index Scan using mdl_enro_cou_ix on mdl_enrol e  (cost=0.29..19.14 rows=4 width=8) (actual time=0.008..0.012 rows=4 loops=1)
                                 Index Cond: (courseid = 7268)
                           ->  Index Only Scan using mdl_userenro_enruse_uix on mdl_user_enrolments ue  (cost=0.42..7.46 rows=102 width=16) (actual time=0.005..0.018 rows=17 loops=4)
                                 Index Cond: (enrolid = e.id)
                                 Heap Fetches: 23
                     ->  Index Scan using mdl_user_id_pk on mdl_user u  (cost=0.42..0.47 rows=1 width=31) (actual time=0.003..0.004 rows=1 loops=68)
                           Index Cond: (id = ue.userid)
       Planning time: 0.478 ms
       Execution time: 0.744 ms
      

      Less than 1 millisecond!

      I am not proposing the simple patch just now. Would be good to have some feedback, perhaps we can optimise it even more?

        Attachments

          Issue Links

            Activity

              People

              • Votes:
                1 Vote for this issue
                Watchers:
                8 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved:
                  Fix Release Date:
                  14/Nov/16