Moodle
  1. Moodle
  2. MDL-32340

Extremely slow query in calendar for large sites

    Details

    • Testing Instructions:
      Hide

      Set the calendar_adminseesall config setting to true
      Visit a calendar page
      Ensure that all entries are correctly shown

      Run the old query and the new query to ensure that the same results are always returned. For convenience, I've pasted them below:

      Old code:

      SELECT DISTINCT
          c.* , ctx.id AS ctxid, ctx.path AS ctxpath, ctx.depth AS ctxdepth, ctx.contextlevel AS ctxlevel, ctx.instanceid AS ctxinstance
      FROM mdl_course c
      JOIN mdl_event e ON e.courseid = c.id
      LEFT JOIN mdl_context ctx ON (ctx.instanceid = c.id AND ctx.contextlevel = 50)
      ;
      

      New code:

      SELECT c.* , ctx.id AS ctxid, ctx.path AS ctxpath, ctx.depth AS ctxdepth, ctx.contextlevel AS ctxlevel, ctx.instanceid AS ctxinstance
      FROM mdl_course c
      LEFT JOIN mdl_context ctx ON (ctx.instanceid = c.id AND ctx.contextlevel = 50)
      WHERE EXISTS (SELECT 1 FROM mdl_event e WHERE e.courseid = c.id)
      ;
      
      Show
      Set the calendar_adminseesall config setting to true Visit a calendar page Ensure that all entries are correctly shown Run the old query and the new query to ensure that the same results are always returned. For convenience, I've pasted them below: Old code: SELECT DISTINCT c.* , ctx.id AS ctxid, ctx.path AS ctxpath, ctx.depth AS ctxdepth, ctx.contextlevel AS ctxlevel, ctx.instanceid AS ctxinstance FROM mdl_course c JOIN mdl_event e ON e.courseid = c.id LEFT JOIN mdl_context ctx ON (ctx.instanceid = c.id AND ctx.contextlevel = 50) ; New code: SELECT c.* , ctx.id AS ctxid, ctx.path AS ctxpath, ctx.depth AS ctxdepth, ctx.contextlevel AS ctxlevel, ctx.instanceid AS ctxinstance FROM mdl_course c LEFT JOIN mdl_context ctx ON (ctx.instanceid = c.id AND ctx.contextlevel = 50) WHERE EXISTS (SELECT 1 FROM mdl_event e WHERE e.courseid = c.id) ;
    • Difficulty:
      Moderate
    • Affected Branches:
      MOODLE_21_STABLE, MOODLE_22_STABLE
    • Fixed Branches:
      MOODLE_21_STABLE, MOODLE_22_STABLE
    • Pull from Repository:
    • Pull Master Branch:
      MDL-32340-master-2
    • Rank:
      39163

      Description

      It seems MDL-31086 introduces a very slow query in calendar, for large sites.

      $sql = "SELECT DISTINCT c.* $select
      FROM {course} c
      JOIN {event} e ON e.courseid = c.id
      $join";
      

      In my case, after upgrading to 2.1.5 via CLI, I could not login to the site as an admin due to hitting 5 minute connection timeout. Checking postgres showed this query running for several minutes - it appears to try and sort by all columns in the query, then return the top 20 for the get_records_sql limit. (So, not a very good plan, but...)

      Since we know course id is unique enough, it is much nicer on the database to advise it only to make the results unique by c.id i.e:

      -        $sql = "SELECT DISTINCT c.* $select
      +        $sql = "SELECT DISTINCT ON (c.id) c.* $select
      

      This makes the query return in only 10ms for this site. I'm not sure if this is compatible enough syntax for core, however.

      Github incoming.

        Issue Links

          Activity

          Hide
          Tony Levi added a comment - - edited

          Raising to critical as this effectively blocks admin access to bigger sites.

          https://github.com/tlevi/moodle/tree/mdl32340_21

          Show
          Tony Levi added a comment - - edited Raising to critical as this effectively blocks admin access to bigger sites. https://github.com/tlevi/moodle/tree/mdl32340_21
          Hide
          Michael de Raadt added a comment -

          Thanks for spotting that and providing a solution.

          This would need to be tested against each DB.

          Show
          Michael de Raadt added a comment - Thanks for spotting that and providing a solution. This would need to be tested against each DB.
          Hide
          Dan Poltawski added a comment -

          Unfortunately this query doesn't work on all dbs:

          Debug info: Incorrect syntax near the keyword 'ON'.
          SELECT DISTINCT TOP 20 ON (c.id) c.* , ctx.id AS ctxid, ctx.path AS ctxpath, ctx.depth AS ctxdepth, ctx.contextlevel AS ctxlevel, ctx.instanceid AS ctxinstance
          FROM f__course c
          JOIN f__event e ON e.courseid = c.id
          LEFT JOIN f__context ctx ON (ctx.instanceid = c.id AND ctx.contextlevel = 50)
          [array (
          )]
          Stack trace:
          line 413 of /lib/dml/moodle_database.php: dml_read_exception thrown
          line 260 of /lib/dml/mssql_native_moodle_database.php: call to moodle_database->query_end()
          line 717 of /lib/dml/mssql_native_moodle_database.php: call to mssql_native_moodle_database->query_end()
          line 747 of /lib/dml/mssql_native_moodle_database.php: call to mssql_native_moodle_database->get_recordset_sql()
          line 1517 of /calendar/lib.php: call to mssql_native_moodle_database->get_records_sql()
          line 13 of /test.php: call to calendar_get_default_courses()

          Show
          Dan Poltawski added a comment - Unfortunately this query doesn't work on all dbs: Debug info: Incorrect syntax near the keyword 'ON'. SELECT DISTINCT TOP 20 ON (c.id) c.* , ctx.id AS ctxid, ctx.path AS ctxpath, ctx.depth AS ctxdepth, ctx.contextlevel AS ctxlevel, ctx.instanceid AS ctxinstance FROM f__course c JOIN f__event e ON e.courseid = c.id LEFT JOIN f__context ctx ON (ctx.instanceid = c.id AND ctx.contextlevel = 50) [array ( )] Stack trace: line 413 of /lib/dml/moodle_database.php: dml_read_exception thrown line 260 of /lib/dml/mssql_native_moodle_database.php: call to moodle_database->query_end() line 717 of /lib/dml/mssql_native_moodle_database.php: call to mssql_native_moodle_database->query_end() line 747 of /lib/dml/mssql_native_moodle_database.php: call to mssql_native_moodle_database->get_recordset_sql() line 1517 of /calendar/lib.php: call to mssql_native_moodle_database->get_records_sql() line 13 of /test.php: call to calendar_get_default_courses()
          Hide
          Andrew Nicols added a comment -

          DISTINCT ON doesn't work with several

          Show
          Andrew Nicols added a comment - DISTINCT ON doesn't work with several
          Hide
          Andrew Nicols added a comment -

          Taking this off Dan as I have real data to test with.
          Have a potential solution I've started working on...

          Show
          Andrew Nicols added a comment - Taking this off Dan as I have real data to test with. Have a potential solution I've started working on...
          Hide
          Andrew Nicols added a comment -

          Can I get a second set of eyes on this.

          This should return the same set of results but without using the distinct. Since the $join always joins contexts, which always come from courses, and there's only one course per event, we only need to limit on the number of events once.

          I've taken the liberty of updating the context calls to use context helper and friends. I haven't changed the context_instance_preload_sql as I haven't been able to find the best replacement for it yet.

          This will need separate branches for < master if it's applicable for it to be backported.

          Here are my EXPLAIN ANALYZE results from before and after on a production dataset. This returns 90 results from 296 course events:

          Before:

          moodle-lu=# \i /tmp/one.sql
                                                                               QUERY PLAN                                                                      
          -----------------------------------------------------------------------------------------------------------------------------------------------------
           HashAggregate  (cost=342.44..346.12 rows=368 width=508) (actual time=31.967..32.012 rows=90 loops=1)
             ->  Merge Right Join  (cost=126.91..307.48 rows=368 width=508) (actual time=1.557..2.646 rows=321 loops=1)
                   Merge Cond: (ctx.instanceid = c.id)
                   ->  Index Scan using mdl_cont_ins_ix on mdl_context ctx  (cost=0.00..1678.29 rows=866 width=42) (actual time=0.012..0.768 rows=620 loops=1)
                         Filter: (contextlevel = 50)
                   ->  Sort  (cost=126.91..127.83 rows=368 width=466) (actual time=1.538..1.576 rows=321 loops=1)
                         Sort Key: c.id
                         Sort Method:  quicksort  Memory: 289kB
                         ->  Hash Join  (cost=78.48..111.22 rows=368 width=466) (actual time=0.999..1.254 rows=321 loops=1)
                               Hash Cond: (e.courseid = c.id)
                               ->  Seq Scan on mdl_event e  (cost=0.00..27.68 rows=368 width=8) (actual time=0.003..0.065 rows=368 loops=1)
                               ->  Hash  (cost=67.66..67.66 rows=866 width=466) (actual time=0.991..0.991 rows=866 loops=1)
                                     Buckets: 1024  Batches: 1  Memory Usage: 438kB
                                     ->  Seq Scan on mdl_course c  (cost=0.00..67.66 rows=866 width=466) (actual time=0.003..0.469 rows=866 loops=1)
           Total runtime: 32.148 ms
          (15 rows)
          

          After:

          moodle-lu=# \i /tmp/two.sql
                                                                         QUERY PLAN                                                                
          -----------------------------------------------------------------------------------------------------------------------------------------
           Nested Loop Left Join  (cost=30.65..265.85 rows=91 width=508) (actual time=0.170..1.004 rows=90 loops=1)
             ->  Hash Join  (cost=30.65..105.24 rows=91 width=466) (actual time=0.161..0.726 rows=90 loops=1)
                   Hash Cond: (c.id = mdl_event.courseid)
                   ->  Seq Scan on mdl_course c  (cost=0.00..67.66 rows=866 width=466) (actual time=0.004..0.475 rows=866 loops=1)
                   ->  Hash  (cost=29.51..29.51 rows=91 width=8) (actual time=0.146..0.146 rows=91 loops=1)
                         Buckets: 1024  Batches: 1  Memory Usage: 4kB
                         ->  HashAggregate  (cost=28.60..29.51 rows=91 width=8) (actual time=0.125..0.137 rows=91 loops=1)
                               ->  Seq Scan on mdl_event  (cost=0.00..27.68 rows=368 width=8) (actual time=0.001..0.040 rows=368 loops=1)
             ->  Index Scan using mdl_cont_ins_ix on mdl_context ctx  (cost=0.00..1.75 rows=1 width=42) (actual time=0.002..0.003 rows=1 loops=90)
                   Index Cond: (ctx.instanceid = c.id)
                   Filter: (ctx.contextlevel = 50)
           Total runtime: 1.063 ms
          (12 rows)
          
          Show
          Andrew Nicols added a comment - Can I get a second set of eyes on this. This should return the same set of results but without using the distinct. Since the $join always joins contexts, which always come from courses, and there's only one course per event, we only need to limit on the number of events once. I've taken the liberty of updating the context calls to use context helper and friends. I haven't changed the context_instance_preload_sql as I haven't been able to find the best replacement for it yet. This will need separate branches for < master if it's applicable for it to be backported. Here are my EXPLAIN ANALYZE results from before and after on a production dataset. This returns 90 results from 296 course events: Before: moodle-lu=# \i /tmp/one.sql QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------------------------- HashAggregate (cost=342.44..346.12 rows=368 width=508) (actual time=31.967..32.012 rows=90 loops=1) -> Merge Right Join (cost=126.91..307.48 rows=368 width=508) (actual time=1.557..2.646 rows=321 loops=1) Merge Cond: (ctx.instanceid = c.id) -> Index Scan using mdl_cont_ins_ix on mdl_context ctx (cost=0.00..1678.29 rows=866 width=42) (actual time=0.012..0.768 rows=620 loops=1) Filter: (contextlevel = 50) -> Sort (cost=126.91..127.83 rows=368 width=466) (actual time=1.538..1.576 rows=321 loops=1) Sort Key: c.id Sort Method: quicksort Memory: 289kB -> Hash Join (cost=78.48..111.22 rows=368 width=466) (actual time=0.999..1.254 rows=321 loops=1) Hash Cond: (e.courseid = c.id) -> Seq Scan on mdl_event e (cost=0.00..27.68 rows=368 width=8) (actual time=0.003..0.065 rows=368 loops=1) -> Hash (cost=67.66..67.66 rows=866 width=466) (actual time=0.991..0.991 rows=866 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 438kB -> Seq Scan on mdl_course c (cost=0.00..67.66 rows=866 width=466) (actual time=0.003..0.469 rows=866 loops=1) Total runtime: 32.148 ms (15 rows) After: moodle-lu=# \i /tmp/two.sql QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------------- Nested Loop Left Join (cost=30.65..265.85 rows=91 width=508) (actual time=0.170..1.004 rows=90 loops=1) -> Hash Join (cost=30.65..105.24 rows=91 width=466) (actual time=0.161..0.726 rows=90 loops=1) Hash Cond: (c.id = mdl_event.courseid) -> Seq Scan on mdl_course c (cost=0.00..67.66 rows=866 width=466) (actual time=0.004..0.475 rows=866 loops=1) -> Hash (cost=29.51..29.51 rows=91 width=8) (actual time=0.146..0.146 rows=91 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 4kB -> HashAggregate (cost=28.60..29.51 rows=91 width=8) (actual time=0.125..0.137 rows=91 loops=1) -> Seq Scan on mdl_event (cost=0.00..27.68 rows=368 width=8) (actual time=0.001..0.040 rows=368 loops=1) -> Index Scan using mdl_cont_ins_ix on mdl_context ctx (cost=0.00..1.75 rows=1 width=42) (actual time=0.002..0.003 rows=1 loops=90) Index Cond: (ctx.instanceid = c.id) Filter: (ctx.contextlevel = 50) Total runtime: 1.063 ms (12 rows)
          Hide
          Tony Levi added a comment -

          Your dataset looks smaller, but this query does work for our case too: http://explain.depesz.com/s/0Jm

          84ms vs 164 seconds

          Show
          Tony Levi added a comment - Your dataset looks smaller, but this query does work for our case too: http://explain.depesz.com/s/0Jm 84ms vs 164 seconds
          Hide
          Andrew Nicols added a comment -

          A minor improvement then

          Show
          Andrew Nicols added a comment - A minor improvement then
          Hide
          Tony Levi added a comment -

          Yep, very nice, this has my +1 at least.

          Show
          Tony Levi added a comment - Yep, very nice, this has my +1 at least.
          Hide
          Eloy Lafuente (stronk7) added a comment -
          • Why courses and contexts are outer joined? Cannot imagine any reason, really, especially when later, it's used to preload contexts.
          • "IN" is evil. Use "EXISTS" instead. Experience says it cannot perform worse than "IN", in normal DBs but can help some instead.

          Ciao

          Show
          Eloy Lafuente (stronk7) added a comment - Why courses and contexts are outer joined? Cannot imagine any reason, really, especially when later, it's used to preload contexts. "IN" is evil. Use "EXISTS" instead. Experience says it cannot perform worse than "IN", in normal DBs but can help some instead. Ciao
          Hide
          Andrew Nicols added a comment -

          Cheers Eloy,

          I've changed it to use an EXISTS rather than IN. The explain seems fairly similar (see below).
          The LEFT JOIN stems from context_instance_preload_sql() - I'm happy to change it but I'm not sure what it needs to change to!

          moodle-master=# \i /tmp/three.sql
                                                                   QUERY PLAN                                                         
          ----------------------------------------------------------------------------------------------------------------------------
           Hash Semi Join  (cost=13.90..15.11 rows=6 width=428) (actual time=0.027..0.027 rows=0 loops=1)
             Hash Cond: (c.id = e.courseid)
             ->  Hash Left Join  (cost=1.88..3.00 rows=6 width=428) (actual time=0.023..0.023 rows=1 loops=1)
                   Hash Cond: (c.id = ctx.instanceid)
                   ->  Seq Scan on mdl_course c  (cost=0.00..1.06 rows=6 width=394) (actual time=0.003..0.003 rows=1 loops=1)
                   ->  Hash  (cost=1.76..1.76 rows=9 width=34) (actual time=0.012..0.012 rows=6 loops=1)
                         Buckets: 1024  Batches: 1  Memory Usage: 1kB
                         ->  Seq Scan on mdl_context ctx  (cost=0.00..1.76 rows=9 width=34) (actual time=0.005..0.009 rows=6 loops=1)
                               Filter: (contextlevel = 50)
             ->  Hash  (cost=10.90..10.90 rows=90 width=8) (actual time=0.000..0.000 rows=0 loops=1)
                   Buckets: 1024  Batches: 1  Memory Usage: 0kB
                   ->  Seq Scan on mdl_event e  (cost=0.00..10.90 rows=90 width=8) (actual time=0.000..0.000 rows=0 loops=1)
           Total runtime: 0.088 ms
          (13 rows)
          
          Show
          Andrew Nicols added a comment - Cheers Eloy, I've changed it to use an EXISTS rather than IN. The explain seems fairly similar (see below). The LEFT JOIN stems from context_instance_preload_sql() - I'm happy to change it but I'm not sure what it needs to change to! moodle-master=# \i /tmp/three.sql QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------- Hash Semi Join (cost=13.90..15.11 rows=6 width=428) (actual time=0.027..0.027 rows=0 loops=1) Hash Cond: (c.id = e.courseid) -> Hash Left Join (cost=1.88..3.00 rows=6 width=428) (actual time=0.023..0.023 rows=1 loops=1) Hash Cond: (c.id = ctx.instanceid) -> Seq Scan on mdl_course c (cost=0.00..1.06 rows=6 width=394) (actual time=0.003..0.003 rows=1 loops=1) -> Hash (cost=1.76..1.76 rows=9 width=34) (actual time=0.012..0.012 rows=6 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 1kB -> Seq Scan on mdl_context ctx (cost=0.00..1.76 rows=9 width=34) (actual time=0.005..0.009 rows=6 loops=1) Filter: (contextlevel = 50) -> Hash (cost=10.90..10.90 rows=90 width=8) (actual time=0.000..0.000 rows=0 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 0kB -> Seq Scan on mdl_event e (cost=0.00..10.90 rows=90 width=8) (actual time=0.000..0.000 rows=0 loops=1) Total runtime: 0.088 ms (13 rows)
          Hide
          Andrew Nicols added a comment -

          Some comments here for you Eloy. Thanks for the feedback

          Show
          Andrew Nicols added a comment - Some comments here for you Eloy. Thanks for the feedback
          Hide
          Tony Levi added a comment -
          Show
          Tony Levi added a comment - Good for us still. http://explain.depesz.com/s/nrr
          Hide
          Dan Poltawski added a comment -

          GRR! Another bad diff url.

          Show
          Dan Poltawski added a comment - GRR! Another bad diff url.
          Hide
          Dan Poltawski added a comment -

          This looks good to me and appears to be working in all DBs.

          Tony - how many contexts/events/courses does your site have?

          Maybe we could write a script to generate some big data so could try this on all the dbs

          Show
          Dan Poltawski added a comment - This looks good to me and appears to be working in all DBs. Tony - how many contexts/events/courses does your site have? Maybe we could write a script to generate some big data so could try this on all the dbs
          Hide
          Tony Levi added a comment -

          Not our biggest by far but the system I first saw the problem with, and I've done the tests with, has these numbers:
          275034, 3336, 34259

          We've been flirting with the idea of trying to anonymise one of these big DBs so HQ could potentially test with it - there are some wild things about production data vs anything synthetic. Not sure if this would be useful if we could manage it one day??

          Show
          Tony Levi added a comment - Not our biggest by far but the system I first saw the problem with, and I've done the tests with, has these numbers: 275034, 3336, 34259 We've been flirting with the idea of trying to anonymise one of these big DBs so HQ could potentially test with it - there are some wild things about production data vs anything synthetic. Not sure if this would be useful if we could manage it one day??
          Hide
          Dan Poltawski added a comment -

          I definitely think it would. Like you say, trying to synthesise the data produces it in one specific structure rather than reality. Though I think both are useful..

          Show
          Dan Poltawski added a comment - I definitely think it would. Like you say, trying to synthesise the data produces it in one specific structure rather than reality. Though I think both are useful..
          Hide
          Eloy Lafuente (stronk7) added a comment -

          The main moodle.git repository has just been updated with latest weekly modifications. You may wish to rebase your PULL branches to simplify history and avoid any possible merge conflicts. This would also make integrator's life easier next week.

          TIA and ciao

          Show
          Eloy Lafuente (stronk7) added a comment - The main moodle.git repository has just been updated with latest weekly modifications. You may wish to rebase your PULL branches to simplify history and avoid any possible merge conflicts. This would also make integrator's life easier next week. TIA and ciao
          Hide
          Sam Hemelryk added a comment -

          Thanks guys this has been integrated now

          Show
          Sam Hemelryk added a comment - Thanks guys this has been integrated now
          Hide
          Michael de Raadt added a comment - - edited

          The "old" query didn't seem to be running on SQL Server.

          It looks like the introduction of the DISTINCT keyword in MDL-31086 is the issue. SQL Server doesn't like comparing "ntext" fields in when producing distinct results. This query calls for all course fields c.*, in which the summary is a text field.

          The "new" query seems to produce results.

          Show
          Michael de Raadt added a comment - - edited The "old" query didn't seem to be running on SQL Server. It looks like the introduction of the DISTINCT keyword in MDL-31086 is the issue. SQL Server doesn't like comparing "ntext" fields in when producing distinct results. This query calls for all course fields c.*, in which the summary is a text field. The "new" query seems to produce results.
          Hide
          Michael de Raadt added a comment -

          Oracle seems to have the same quibble with the old query, but the new one works.

          Show
          Michael de Raadt added a comment - Oracle seems to have the same quibble with the old query, but the new one works.
          Hide
          Michael de Raadt added a comment -

          Test result: Success

          Tested on 2.1, 2.2 and master under Oracle, SQL Server, PostgreSQL and MySQL. (phew)

          Show
          Michael de Raadt added a comment - Test result: Success Tested on 2.1, 2.2 and master under Oracle, SQL Server, PostgreSQL and MySQL. (phew)
          Hide
          Eloy Lafuente (stronk7) added a comment -

          U P S T R E A M I Z E D !

          Many thanks for the hard work, closing this as fixed.

          Ciao

          Show
          Eloy Lafuente (stronk7) added a comment - U P S T R E A M I Z E D ! Many thanks for the hard work, closing this as fixed. Ciao

            People

            • Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: