Moodle
  1. Moodle
  2. MDL-29439

Quiz restoration is very slow if the backup contains lots of questions

    Details

    • Database:
      Any
    • Affected Branches:
      MOODLE_21_STABLE, MOODLE_23_STABLE, MOODLE_24_STABLE
    • Rank:
      18946

      Description

      When restoring a backup (Moodle 2 mbz) containing 120 quizzes it takes a long time to complete the restore. To restore the previous course it took 10 minutes on a 4 processor, 16 GB RAM MySQL Database Server.

      When the subject was backed up and restored in a Moodle 1.9 environment the restore takes 2 minutes.

      We can provide 10 backup samples but not on this public tracker. This is due the presence of copyrighted documents in the backup.

        Issue Links

          Activity

          Hide
          Eloy Lafuente (stronk7) added a comment -

          Hi Juan, could you send me one backup to be able to perform some profiling here to look (if / where) we can improve the process a bit? I've some preliminar ideas but surely one big-enough backup will help to find the bottlenecks / compare results.

          Of course, 100% privately, and will be erased from my dev environment once this gets sorted. stronk7.at.moodle.dot.org is me. TIA!

          Show
          Eloy Lafuente (stronk7) added a comment - Hi Juan, could you send me one backup to be able to perform some profiling here to look (if / where) we can improve the process a bit? I've some preliminar ideas but surely one big-enough backup will help to find the bottlenecks / compare results. Of course, 100% privately, and will be erased from my dev environment once this gets sorted. stronk7.at.moodle.dot.org is me. TIA!
          Hide
          Juan Leyva added a comment -

          Hi Eloy,

          I have sent you 4 courses backup with lots of quizzes

          Regards

          Show
          Juan Leyva added a comment - Hi Eloy, I have sent you 4 courses backup with lots of quizzes Regards
          Hide
          Eloy Lafuente (stronk7) added a comment -

          ACK, received! Thanks!

          Show
          Eloy Lafuente (stronk7) added a comment - ACK, received! Thanks!
          Hide
          Michael de Raadt added a comment -

          This seems to be affecting a few people. I'm bumping the issue.

          Eloy, if you are not able to work on the issue, please reassign.

          Show
          Michael de Raadt added a comment - This seems to be affecting a few people. I'm bumping the issue. Eloy, if you are not able to work on the issue, please reassign.
          Hide
          Russell Smith added a comment -

          This issue exists on Moodle 2.3.2 at least. The affecting code that produces part of the horrible performance is still present in 2.3.4. Even though we have not specifically tested on 2.3.4.

          What we are seeing in our environment is;

          The temp table query;

          SELECT f.id AS btfid, f.contextid, ... FROM mdl_backup_files_temp f JOIN mdl_backup_ids_temp i ON i.backupid = f.backupid AND i.itemid = f.itemid WHERE ...

          This code is in; backup/util/dbops/restore_dbops.class.php:866 ff

          Once you get a large number of rows in those tables it causes problems for us. The first appears on PostgreSQL as it doesn't analyze that data as it's a temp table. This results in a guess the tables are small and does a nested loop join. This results in 2seconds per query. Hundreds of these queries run to restore a course with large numbers of quizzes.

          The code for that can be found in;

          To reduce this time, we've tested hacking an analyze table into;

          backup/moodle2/restore_stepslib.php:3102

          $DB->execute("ANALYZE

          {backup_ids_temp}

          ");
          $DB->execute("ANALYZE

          {backup_files_temp}

          ");

          This dropped the query from 2 seconds to 35ms in examples where the number of rows in those tables were large.

          When you scale that up we have the following example running on our environment right now. A 1.5Mb backup of a course, it's currently been restoring for 5.5 hours and still not finished. On a different less powerful environment I put in the PostgreSQL only hack above and the restoration time was 30minutes. So that's at least a 7x increase without taking into consideration the following.

          The comments at the top of the restore_create_question_files function in restore_stepslib.php says there is a TODO to improve this code. It indicates there will be a huge amount of queries saved if we do this per context rather than per question. This clearly appears to be a significant location for improvement. When we are restoring subjects like this we see low CPU usage on both the database server process and the web server process. That indicates lots of queries and waiting on millisecond level network latency.

          We latency and high numbers of queries issues when upgraded from 2.0 to 2.2. We reduced the upgrade time from 12 hours to 5 hours by adjusting the quiz upgrade code to insert data in large chunks where possible, instead of thousands of small queries. I theorize this is also what's happening here.

          We are discussing options with our Moodle Partner at the moment to look at resolutions to our issue. Is anybody already looking at this as I'm sure there are lots who would like to see a benefit from this. I would also allow us and our Partner to get on the right track if somebody has already proposed a way to work towards a solution.

          Has anybody else got any theories or understand of other things that could be causing the slowness?

          Show
          Russell Smith added a comment - This issue exists on Moodle 2.3.2 at least. The affecting code that produces part of the horrible performance is still present in 2.3.4. Even though we have not specifically tested on 2.3.4. What we are seeing in our environment is; The temp table query; SELECT f.id AS btfid, f.contextid, ... FROM mdl_backup_files_temp f JOIN mdl_backup_ids_temp i ON i.backupid = f.backupid AND i.itemid = f.itemid WHERE ... This code is in; backup/util/dbops/restore_dbops.class.php:866 ff Once you get a large number of rows in those tables it causes problems for us. The first appears on PostgreSQL as it doesn't analyze that data as it's a temp table. This results in a guess the tables are small and does a nested loop join. This results in 2seconds per query. Hundreds of these queries run to restore a course with large numbers of quizzes. The code for that can be found in; To reduce this time, we've tested hacking an analyze table into; backup/moodle2/restore_stepslib.php:3102 $DB->execute("ANALYZE {backup_ids_temp} "); $DB->execute("ANALYZE {backup_files_temp} "); This dropped the query from 2 seconds to 35ms in examples where the number of rows in those tables were large. When you scale that up we have the following example running on our environment right now. A 1.5Mb backup of a course, it's currently been restoring for 5.5 hours and still not finished. On a different less powerful environment I put in the PostgreSQL only hack above and the restoration time was 30minutes. So that's at least a 7x increase without taking into consideration the following. The comments at the top of the restore_create_question_files function in restore_stepslib.php says there is a TODO to improve this code. It indicates there will be a huge amount of queries saved if we do this per context rather than per question. This clearly appears to be a significant location for improvement. When we are restoring subjects like this we see low CPU usage on both the database server process and the web server process. That indicates lots of queries and waiting on millisecond level network latency. We latency and high numbers of queries issues when upgraded from 2.0 to 2.2. We reduced the upgrade time from 12 hours to 5 hours by adjusting the quiz upgrade code to insert data in large chunks where possible, instead of thousands of small queries. I theorize this is also what's happening here. We are discussing options with our Moodle Partner at the moment to look at resolutions to our issue. Is anybody already looking at this as I'm sure there are lots who would like to see a benefit from this. I would also allow us and our Partner to get on the right track if somebody has already proposed a way to work towards a solution. Has anybody else got any theories or understand of other things that could be causing the slowness?
          Hide
          Russell Smith added a comment -

          For confirmation, the long running backup that was in progress finished in 6hours. So we are seeing 1/12 the time requirement with the hack.

          Show
          Russell Smith added a comment - For confirmation, the long running backup that was in progress finished in 6hours. So we are seeing 1/12 the time requirement with the hack.
          Hide
          Russell Smith added a comment -

          As Michael noted in September last year, is anybody looking at this? As I mentioned in my first comment it would be good to know any specific work done or direction this is heading in before developing a patch for it. It's usually annoying to develop a patch and find out it's not what you should have been doing or how you should have been doing it.

          Give the time analysis shown in my comment, are there any specific rules to follow with the refactor of the question restoration TODO. I know everyone is busy, but a short investigation will allow me to get a permanent solution developed and contributed.

          Show
          Russell Smith added a comment - As Michael noted in September last year, is anybody looking at this? As I mentioned in my first comment it would be good to know any specific work done or direction this is heading in before developing a patch for it. It's usually annoying to develop a patch and find out it's not what you should have been doing or how you should have been doing it. Give the time analysis shown in my comment, are there any specific rules to follow with the refactor of the question restoration TODO. I know everyone is busy, but a short investigation will allow me to get a permanent solution developed and contributed.
          Hide
          Russell Smith added a comment -

          Can we get 2.3 and 2.4 added as affected branches.

          Components should include; Performance and Quiz.

          As only a bug paticipant I'm unable to alter any of those even though I've inspected the code to confirm the requested changes.

          Show
          Russell Smith added a comment - Can we get 2.3 and 2.4 added as affected branches. Components should include; Performance and Quiz. As only a bug paticipant I'm unable to alter any of those even though I've inspected the code to confirm the requested changes.
          Hide
          Chris Follin added a comment -

          We're seeing this in 2.3.3. I also added 2.4 as an affected version based on Russell's comment.

          Show
          Chris Follin added a comment - We're seeing this in 2.3.3. I also added 2.4 as an affected version based on Russell's comment.
          Hide
          Chris Follin added a comment -

          We're seeing this again with another site. A course from that site has over 200 quizzes. For testing, we deleted all of the quizzes but still had problems with backup and restore. We then started deleting questions from the large question bank. This improved the backup and restore. So the problem may not necessarily be the quizzes but actually the question bank. Or maybe there's a problem with both in different situations.

          Show
          Chris Follin added a comment - We're seeing this again with another site. A course from that site has over 200 quizzes. For testing, we deleted all of the quizzes but still had problems with backup and restore. We then started deleting questions from the large question bank. This improved the backup and restore. So the problem may not necessarily be the quizzes but actually the question bank. Or maybe there's a problem with both in different situations.
          Hide
          Russell Smith added a comment -

          I've added Tim Hunt to the watcher list to see if he has any wisdom to shed on the issue. As this is very much related to quiz and question bank.

          Show
          Russell Smith added a comment - I've added Tim Hunt to the watcher list to see if he has any wisdom to shed on the issue. As this is very much related to quiz and question bank.
          Hide
          Kris Stokking added a comment - - edited

          This issue should be related to MDL-26442. The question restore process is doing a lookup on questions based on the category, stamp and version, but the table has no index on these fields. The resolution to 26442 includes a unique constraint using those same fields (to avoid duplicates), which would make question lookups near instantaneous. If you're looking for a quick fix, you may want to consider adding a non-unique index on those fields, BUT it must be removed once you receive the fix for 26442.

          Show
          Kris Stokking added a comment - - edited This issue should be related to MDL-26442 . The question restore process is doing a lookup on questions based on the category, stamp and version, but the table has no index on these fields. The resolution to 26442 includes a unique constraint using those same fields (to avoid duplicates), which would make question lookups near instantaneous. If you're looking for a quick fix, you may want to consider adding a non-unique index on those fields, BUT it must be removed once you receive the fix for 26442.
          Hide
          Russell Smith added a comment -

          It is related by the backup and restore process. However the table with the fault is completely different. The issue seen with this is the temporary table that is created. My reading of MDL-26442 is that is adds indexes to the questions tables.

          Also my earlier comments indicate that on PostgreSQL at least, the first main issue is that temporary tables don't have statistics and won't use indexes as it won't think it's valuable on a temp table with 1000 rows. Other database may behave differently with temporary tables.

          The second part of the issue is that as inserts are done 1 by one, you end up sending thousands of queries to the database. Not matter how fast they are, a couple of milliseconds latency adds up quickly. The network latency has proven to be a big issue when we did out 2.0 -> 2.2 upgrade as seen in my earlier comment.

          So all that put together still leaves me feeling we need a way to collect statistics for temporary tables on all databases. I can't confirm if MySQL, SQLServer or Oracle suffer from the statistics issue PostgreSQL does.

          Is my analysis of MDL-26442 correct, or am I missing the point of that issue?

          Show
          Russell Smith added a comment - It is related by the backup and restore process. However the table with the fault is completely different. The issue seen with this is the temporary table that is created. My reading of MDL-26442 is that is adds indexes to the questions tables. Also my earlier comments indicate that on PostgreSQL at least, the first main issue is that temporary tables don't have statistics and won't use indexes as it won't think it's valuable on a temp table with 1000 rows. Other database may behave differently with temporary tables. The second part of the issue is that as inserts are done 1 by one, you end up sending thousands of queries to the database. Not matter how fast they are, a couple of milliseconds latency adds up quickly. The network latency has proven to be a big issue when we did out 2.0 -> 2.2 upgrade as seen in my earlier comment. So all that put together still leaves me feeling we need a way to collect statistics for temporary tables on all databases. I can't confirm if MySQL, SQLServer or Oracle suffer from the statistics issue PostgreSQL does. Is my analysis of MDL-26442 correct, or am I missing the point of that issue?
          Hide
          Kris Stokking added a comment -

          Hi Russell - you are correct, MDL-26442 refers to a different performance problem within the quiz restore process regarding lookups to mdl_question.

          I'd agree with both of the optimizations you've identified in the ticket. I can't speak to how dramatic they would improve performance for non-PGSQL sites, the indexing should update just fine in MySQL on temporary tables, but at the very least you're in the right direction. Ideally, the restore process would:

          1. Create the temp tables, with keys turned off
          2. Bulk insert data into the temp tables
          3. Enable keys

          I will defer to Eloy Lafuente (stronk7) or Petr Škoda on how feasible that would be.

          Show
          Kris Stokking added a comment - Hi Russell - you are correct, MDL-26442 refers to a different performance problem within the quiz restore process regarding lookups to mdl_question. I'd agree with both of the optimizations you've identified in the ticket. I can't speak to how dramatic they would improve performance for non-PGSQL sites, the indexing should update just fine in MySQL on temporary tables, but at the very least you're in the right direction. Ideally, the restore process would: 1. Create the temp tables, with keys turned off 2. Bulk insert data into the temp tables 3. Enable keys I will defer to Eloy Lafuente (stronk7) or Petr Škoda on how feasible that would be.
          Hide
          Russell Smith added a comment -

          As discussed with Kris, MDL-26442 has been marked as related. However the relationship is not specific.

          Show
          Russell Smith added a comment - As discussed with Kris, MDL-26442 has been marked as related. However the relationship is not specific.
          Hide
          Russell Smith added a comment -

          So I did a bit of a database survey and came up with some information. PostgreSQL is the only db affected by the statistics collection issue and a search of the tracker didn't find any bugs submitted about it. I'm not sure if it's worth reporting as a separate issue or not.

          Here are the links and summary of affected DB's for Analyzing temporary tables;

          MySQL: No statistics Engine until 5.6, so I'm assuming not affecting people.
          PostgreSQL: Doesn't automatically update statistics on Temporary Tables
          Oracle: Does update statistics
          SQL Server: Does update statistics unless you disable it.

          MySQL Link: http://dev.mysql.com/doc/refman/5.6/en/glossary.html#glos_persistent_statistics
          PostgreSQL Link: http://www.postgresql.org/message-id/AANLkTin5Z3ie1XBCNs=sjDL=nsbXXERVF1xVnxcE_108@mail.gmail.com
          Oracle Link: http://docs.oracle.com/cd/B19306_01/server.102/b14231/tables.htm#i1006473
          SQL Server Link: http://social.msdn.microsoft.com/Forums/en-US/transactsql/thread/cfa8f983-8107-4af0-91a7-b2ee915e07ea

          However the current way that temp tables are managed doesn't lend itself to an easy fix for PostgreSQL or to being able to load data into temp tables in the way that Kris suggests.

          So I'm debating opening a related issue that is about the analysis of temp tables in general which specifically showed up here first. This issue could then be focused on the issue of quizzes not inserting data in bulk which is another significant contributor to the slow speed.

          Thoughts?

          Show
          Russell Smith added a comment - So I did a bit of a database survey and came up with some information. PostgreSQL is the only db affected by the statistics collection issue and a search of the tracker didn't find any bugs submitted about it. I'm not sure if it's worth reporting as a separate issue or not. Here are the links and summary of affected DB's for Analyzing temporary tables; MySQL: No statistics Engine until 5.6, so I'm assuming not affecting people. PostgreSQL: Doesn't automatically update statistics on Temporary Tables Oracle: Does update statistics SQL Server: Does update statistics unless you disable it. MySQL Link: http://dev.mysql.com/doc/refman/5.6/en/glossary.html#glos_persistent_statistics PostgreSQL Link: http://www.postgresql.org/message-id/AANLkTin5Z3ie1XBCNs=sjDL=nsbXXERVF1xVnxcE_108@mail.gmail.com Oracle Link: http://docs.oracle.com/cd/B19306_01/server.102/b14231/tables.htm#i1006473 SQL Server Link: http://social.msdn.microsoft.com/Forums/en-US/transactsql/thread/cfa8f983-8107-4af0-91a7-b2ee915e07ea However the current way that temp tables are managed doesn't lend itself to an easy fix for PostgreSQL or to being able to load data into temp tables in the way that Kris suggests. So I'm debating opening a related issue that is about the analysis of temp tables in general which specifically showed up here first. This issue could then be focused on the issue of quizzes not inserting data in bulk which is another significant contributor to the slow speed. Thoughts?
          Hide
          Tim Hunt added a comment -

          Russel, above, you say "Hack this into backup/moodle2/restore_stepslib.php:3102". The problem is, restore_stepslib.php has now changed, and you gave no contextual informaiton to help me work out the right place to add your suggested code now.

          Would it be possible for you to propose code changes as git patches? If not, make it clear what needs to be done, and I can get this code into git.

          Show
          Tim Hunt added a comment - Russel, above, you say "Hack this into backup/moodle2/restore_stepslib.php:3102". The problem is, restore_stepslib.php has now changed, and you gave no contextual informaiton to help me work out the right place to add your suggested code now. Would it be possible for you to propose code changes as git patches? If not, make it clear what needs to be done, and I can get this code into git.
          Hide
          Tim Hunt added a comment -

          I took a guess that this is what Russel meant: https://github.com/timhunt/moodle/compare/master...MDL-29439. Please confirm or deny.

          Show
          Tim Hunt added a comment - I took a guess that this is what Russel meant: https://github.com/timhunt/moodle/compare/master...MDL-29439 . Please confirm or deny.
          Hide
          Russell Smith added a comment -

          Thanks for the kick Tim. I've gotten less lazy and pushed up a git branch against 2.3, I'm not sure how the code has changed in master yet and the production system is still running 2.3.

          https://github.com/mr-russ/moodle/compare/MOODLE_23_STABLE...MDL-29439_slow_quiz_hack

          I put it my version in the define_execution as I knew the code just below that needed it. I don't clearly understand the flow of restores to know whether the location you put it in has the same effect on master.

          Further to the second part of the slow restore.

          The lines are part of the same restore function in 2.3 in the same file as in the github compare; The below code also exists in current master.

          /**
           * Execution step that will create all the question/answers/qtype-specific files for the restored
           * questions. It must be executed after {@link restore_move_module_questions_categories}
           * because only then each question is in its final category and only then the
           * context can be determined
           *
           * TODO: Improve this. Instead of looping over each question, it can be reduced to
           *       be done by contexts (this will save a huge ammount of queries)
           */
          class restore_create_question_files extends restore_execution_step {
          
              protected function define_execution() {
          

          That TODO is what I believe is causing the performance problem as we get huge numbers of queries as a result of each of the steps in functions called from the define_execution loop of questions. That just adds lots of backwards and forwards latency and query parsing time to the restore process. As mentioned in earlier comments, the ANALYZE fix gets you from 6.5hours to 0.5hours but reduced queries would certainly lower that a lot further. I expect the magnitude of queries is what the original reporter is experiencing due to their much lower restore times than mine. I have no idea how I would start the process of rewriting that bit to do as the TODO suggests to provide confirmation this can improve the performance.

          I hope that is clearer.

          Show
          Russell Smith added a comment - Thanks for the kick Tim. I've gotten less lazy and pushed up a git branch against 2.3, I'm not sure how the code has changed in master yet and the production system is still running 2.3. https://github.com/mr-russ/moodle/compare/MOODLE_23_STABLE...MDL-29439_slow_quiz_hack I put it my version in the define_execution as I knew the code just below that needed it. I don't clearly understand the flow of restores to know whether the location you put it in has the same effect on master. Further to the second part of the slow restore. The lines are part of the same restore function in 2.3 in the same file as in the github compare; The below code also exists in current master. /** * Execution step that will create all the question/answers/qtype-specific files for the restored * questions. It must be executed after {@link restore_move_module_questions_categories} * because only then each question is in its final category and only then the * context can be determined * * TODO: Improve this . Instead of looping over each question, it can be reduced to * be done by contexts ( this will save a huge ammount of queries) */ class restore_create_question_files extends restore_execution_step { protected function define_execution() { That TODO is what I believe is causing the performance problem as we get huge numbers of queries as a result of each of the steps in functions called from the define_execution loop of questions. That just adds lots of backwards and forwards latency and query parsing time to the restore process. As mentioned in earlier comments, the ANALYZE fix gets you from 6.5hours to 0.5hours but reduced queries would certainly lower that a lot further. I expect the magnitude of queries is what the original reporter is experiencing due to their much lower restore times than mine. I have no idea how I would start the process of rewriting that bit to do as the TODO suggests to provide confirmation this can improve the performance. I hope that is clearer.
          Hide
          Tim Hunt added a comment -

          I suggest we open a new issue for one part of the work. Let us use this issue for getting your patch https://github.com/mr-russ/moodle/compare/MOODLE_23_STABLE...MDL-29439_slow_quiz_hack integrated. Can you open a new issue for workgin on that TODO.

          I am trying to get one of the Moodle HQ developers to look help out here, but no luck yet.

          Show
          Tim Hunt added a comment - I suggest we open a new issue for one part of the work. Let us use this issue for getting your patch https://github.com/mr-russ/moodle/compare/MOODLE_23_STABLE...MDL-29439_slow_quiz_hack integrated. Can you open a new issue for workgin on that TODO. I am trying to get one of the Moodle HQ developers to look help out here, but no luck yet.
          Hide
          Tim Hunt added a comment -

          To answer your request for hints about handling the TODO.

          You really need to understand what restore_dbops::send_files_to_pool does. (I don't!). The $olditemid argument seems to be unnecessary, and if you don't pass it, it does all the different itemids for a particular context at once somehow.

          If you understand that, then presumably it is possible in restore_create_question_files::define_execution to call restore_dbops::send_files_to_pool one per context, instead of once per question.

          You will still need mupltiple calls to handle all the different file areas, including the backup_qtype_plugin::get_components_and_fileareas($question->qtype) loop for all qtypes that are in the context. (It may be OK to do that in a loop over all qtypes, whether or not there are any questions of that type in that context. Typically each qtype only has a few extra file areas.

          That is the summary anyway.

          My other hint, before you start, is to work out how you are going to test your changes to establish

          1. the code works correctly
          2. the code does acutally perform many fewer DB queries than before.

          Acutally, for 1. you probably just need to create a backup file containing at least on question of each type, with images embedded in every possible HTML input, and with a quiz using question from every possible context.

          For 2., for testing restore code, it is well worth making a script like http://docs.moodle.org/dev/Restore_2.0_for_developers#Automatically_triggering_restore_in_code. You could make that script output number of DB queries at the end. You could also make it rollback the transaction at the end, rather than committing, so you don't end up with a DB full of junk.

          Note that, for small courses, your imporved code may acutally do more DB queries, but we don't care. We care about the limiting behaviour for many questions.

          Finally, can I just say, if you take this on, you are a braver man than me . (But don't let me put you off. I am sure this is doable given a bit of time, and it really needs to be done.) Thank you.

          Show
          Tim Hunt added a comment - To answer your request for hints about handling the TODO. You really need to understand what restore_dbops::send_files_to_pool does. (I don't!). The $olditemid argument seems to be unnecessary, and if you don't pass it, it does all the different itemids for a particular context at once somehow. If you understand that, then presumably it is possible in restore_create_question_files::define_execution to call restore_dbops::send_files_to_pool one per context, instead of once per question. You will still need mupltiple calls to handle all the different file areas, including the backup_qtype_plugin::get_components_and_fileareas($question->qtype) loop for all qtypes that are in the context. (It may be OK to do that in a loop over all qtypes, whether or not there are any questions of that type in that context. Typically each qtype only has a few extra file areas. That is the summary anyway. My other hint, before you start, is to work out how you are going to test your changes to establish the code works correctly the code does acutally perform many fewer DB queries than before. Acutally, for 1. you probably just need to create a backup file containing at least on question of each type, with images embedded in every possible HTML input, and with a quiz using question from every possible context. For 2., for testing restore code, it is well worth making a script like http://docs.moodle.org/dev/Restore_2.0_for_developers#Automatically_triggering_restore_in_code . You could make that script output number of DB queries at the end. You could also make it rollback the transaction at the end, rather than committing, so you don't end up with a DB full of junk. Note that, for small courses, your imporved code may acutally do more DB queries, but we don't care. We care about the limiting behaviour for many questions. Finally, can I just say, if you take this on, you are a braver man than me . (But don't let me put you off. I am sure this is doable given a bit of time, and it really needs to be done.) Thank you.
          Hide
          Eloy Lafuente (stronk7) added a comment -

          This issue was assigned to me automatically, however I will not be able to work on this issue in the immediate future. In order to create a truer sense of the state of this issue and to allow other developers to have chance to become involved, I am removing myself as the assignee of this issue.

          For more information, see http://docs.moodle.org/dev/Changes_to_issue_assignment

          Show
          Eloy Lafuente (stronk7) added a comment - This issue was assigned to me automatically, however I will not be able to work on this issue in the immediate future. In order to create a truer sense of the state of this issue and to allow other developers to have chance to become involved, I am removing myself as the assignee of this issue. For more information, see http://docs.moodle.org/dev/Changes_to_issue_assignment
          Hide
          Eloy Lafuente (stronk7) added a comment - - edited

          While I don't like it much, I cannot imagine how we could put that under the DDL API umbrella, mainly because of the different behavior of each RDBMS, specially with temp tables.

          Some alternatives I can imagine are, surely to be researched into different issue:

          1) Implement the "auto-analyze" after XXX insertions/modifications, I think the count can be done easily (aprox) and would benefit other places automatically.
          2) Follow the existing TODO and try to reduce the queries.

          So, unless you think it is worth to implement 1) instead of current patch... my +1 for current, because benefits seem to be important enough. And back-porting to supported branches, as far as the operation seems to be 100% safe.

          Thanks and ciao

          Show
          Eloy Lafuente (stronk7) added a comment - - edited While I don't like it much, I cannot imagine how we could put that under the DDL API umbrella, mainly because of the different behavior of each RDBMS, specially with temp tables. Some alternatives I can imagine are, surely to be researched into different issue: 1) Implement the "auto-analyze" after XXX insertions/modifications, I think the count can be done easily (aprox) and would benefit other places automatically. 2) Follow the existing TODO and try to reduce the queries. So, unless you think it is worth to implement 1) instead of current patch... my +1 for current, because benefits seem to be important enough. And back-porting to supported branches, as far as the operation seems to be 100% safe. Thanks and ciao
          Hide
          Tim Hunt added a comment -

          Acutally, Eloy, another alternative would be to add an API like $DB->notify_many_rows_changed_in_table($table). That could be a no-op for all DBs except Postgres, and for postgres do the ANALYZE command?

          Show
          Tim Hunt added a comment - Acutally, Eloy, another alternative would be to add an API like $DB->notify_many_rows_changed_in_table($table). That could be a no-op for all DBs except Postgres, and for postgres do the ANALYZE command?
          Hide
          Russell Smith added a comment -

          I've created a new issue for PostgreSQL analyze on temporary tables. I consider it a general improvement and would think it to replace any fixes we completed on this bug.

          Show
          Russell Smith added a comment - I've created a new issue for PostgreSQL analyze on temporary tables. I consider it a general improvement and would think it to replace any fixes we completed on this bug.
          Hide
          Russell Smith added a comment -

          I have also now created an improvement to specifically deal with the TODO. I considered it an improvement as it's going to be too complex a fix to easily back-port in my opinion.

          If you are interested in either MDL-39725 or MDL-39726, you'll need to watch them separately as nobody seems to have been added as automatic watchers to those issues.

          Show
          Russell Smith added a comment - I have also now created an improvement to specifically deal with the TODO. I considered it an improvement as it's going to be too complex a fix to easily back-port in my opinion. If you are interested in either MDL-39725 or MDL-39726 , you'll need to watch them separately as nobody seems to have been added as automatic watchers to those issues.
          Hide
          Tim Hunt added a comment -

          Thanks Russell.

          Show
          Tim Hunt added a comment - Thanks Russell.
          Hide
          Russell Smith added a comment -

          Eloy Lafuente (stronk7) and Tim Hunt: Are we in agreement we will back-patch using the proposed postgresql 'hack' and will fix it properly in MDL-39725?

          Tim Hunt: My research of backup/restore indicates that after_execution runs after define_execution when restore is happening. I'm not convinced your location will provide the required performance improvement. Are you happy to place the analyze in the location I did?

          If you provide feedback on those, I'll make up the patch and post it for peer review. If I don't get comments in a reasonable time, I'll assume you agree and also post the patch.

          Show
          Russell Smith added a comment - Eloy Lafuente (stronk7) and Tim Hunt : Are we in agreement we will back-patch using the proposed postgresql 'hack' and will fix it properly in MDL-39725 ? Tim Hunt : My research of backup/restore indicates that after_execution runs after define_execution when restore is happening. I'm not convinced your location will provide the required performance improvement. Are you happy to place the analyze in the location I did? If you provide feedback on those, I'll make up the patch and post it for peer review. If I don't get comments in a reasonable time, I'll assume you agree and also post the patch.
          Hide
          Russell Smith added a comment -

          Juan Leyva: When you originally reported this bug, what database were you using? The answer to that would help my understanding of the scope of this report and whether you should really be following MDL-39726 now.

          Show
          Russell Smith added a comment - Juan Leyva : When you originally reported this bug, what database were you using? The answer to that would help my understanding of the scope of this report and whether you should really be following MDL-39726 now.
          Hide
          Juan Leyva added a comment -

          Russel: We were using MySQL

          Show
          Juan Leyva added a comment - Russel: We were using MySQL
          Hide
          Tim Hunt added a comment -

          I have to say I am failing to see what is the latest patch that I am supposed to be looking at.

          As far as I can see, MDL-39725 is the $DB->notify_many_rows_changed_in_table($table) proposal. I think we should do that now, rathar than putting Postgres-specific code inside backup. However, you may disagree.

          "My research of backup/restore indicates that after_execution runs after define_execution ..." I am sure you now understand this better than me.

          Sorry I can't be more helpful, and sorry it took me so long to get back to this.

          Show
          Tim Hunt added a comment - I have to say I am failing to see what is the latest patch that I am supposed to be looking at. As far as I can see, MDL-39725 is the $DB->notify_many_rows_changed_in_table($table) proposal. I think we should do that now, rathar than putting Postgres-specific code inside backup. However, you may disagree. "My research of backup/restore indicates that after_execution runs after define_execution ..." I am sure you now understand this better than me. Sorry I can't be more helpful, and sorry it took me so long to get back to this.
          Hide
          Russell Smith added a comment -

          I'm happy to proceed with the API change. I'm still learning the back-porting rules for things like that. So I don't know if the API change would be backported. Which was why I originally separated out MDL-39726.

          That said, would could make the API change here and put the function in just this location. MDL-39725 would be more focused on putting the function where other large temporary tables are made.

          That seems reasonable to me.

          However the proposed function name seems very long and doesn't really describe all of why you might use it. $DB->analyse() or $DB->update_stats_table($table) ? As significant data change should result in the same call. eg, adding backup/restore id's to a table with 2million rows should be re-analysed to give the best results.

          Thanks for your involvement Tim, it helps keep this moving and provides direction.

          Thoughts on that?

          Show
          Russell Smith added a comment - I'm happy to proceed with the API change. I'm still learning the back-porting rules for things like that. So I don't know if the API change would be backported. Which was why I originally separated out MDL-39726 . That said, would could make the API change here and put the function in just this location. MDL-39725 would be more focused on putting the function where other large temporary tables are made. That seems reasonable to me. However the proposed function name seems very long and doesn't really describe all of why you might use it. $DB->analyse() or $DB->update_stats_table($table) ? As significant data change should result in the same call. eg, adding backup/restore id's to a table with 2million rows should be re-analysed to give the best results. Thanks for your involvement Tim, it helps keep this moving and provides direction. Thoughts on that?
          Hide
          Tim Hunt added a comment -

          Well, adding a new method to a class is completely backwards compatible, so no worries about back-porting it, if adding it is the best way to fix a bug.

          I am sure you can come up with a better name than my suggestion.

          Show
          Tim Hunt added a comment - Well, adding a new method to a class is completely backwards compatible, so no worries about back-porting it, if adding it is the best way to fix a bug. I am sure you can come up with a better name than my suggestion.
          Hide
          Russell Smith added a comment -

          After having re-read the original description and understanding what was really being said there, I've come to the conclusion/belief MDL-39726 will resolve this issue completely. I chose to reword the summary and description to make sure I was clear that it was the restore with the failure, not backup. So now this issue and MDL-39726 are the same, ideally I'd use this as the back port request and the MDL-39726 as the improvement to be put in master. I'm still unclear on how you submit and manage back port requests.

          For those who know that process a little better, is my proposal for using this as a back port issue acceptable?

          Show
          Russell Smith added a comment - After having re-read the original description and understanding what was really being said there, I've come to the conclusion/belief MDL-39726 will resolve this issue completely. I chose to reword the summary and description to make sure I was clear that it was the restore with the failure, not backup. So now this issue and MDL-39726 are the same, ideally I'd use this as the back port request and the MDL-39726 as the improvement to be put in master. I'm still unclear on how you submit and manage back port requests. For those who know that process a little better, is my proposal for using this as a back port issue acceptable?
          Hide
          Russell Smith added a comment -

          While investigating further performance issues with Quiz restore, I came across two issues I'm investigating further. One is related to MDL-27120 and the very small cache sizes used there;

          backup/util/dbops/restore_dbops.class.php
          
          private static $backupidscachesize = 2048;
          private static $backupidsexistsize = 10240;
          

          Are inadequate for large volumes of questions. I'm not yet sure why those values were chosen. I know the theory is to reduce memory usage, but 2048 keys isn't much of a cache when you are talking about 20k backupids in a question related backup.

          In basic testing, I've raised those to 20480 and 50000 respectively and restore performance has gone up substantially.

          Another item that appeared as a result of the above change is that;

          backup/util/dbops/restore_dbops.class.php
          @@ -614,6 +614,11 @@ abstract class restore_dbops {
          foreach ($questions as $question) {
              $matchq = $DB->get_record('question', array(
          

          Now that does a db select for each question. Very slow when you are talking about a couple of thousand questions. So I'm looking at ways to change that loop so only a single SQL query is done in that section.

          I'm still putting it all together. Any feedback on cache size choices would be welcome.

          Also I'm not sure what $backupidsexistsize is doing. It doesn't seem to add any value. Can anybody explain that?

          Show
          Russell Smith added a comment - While investigating further performance issues with Quiz restore, I came across two issues I'm investigating further. One is related to MDL-27120 and the very small cache sizes used there; backup/util/dbops/restore_dbops.class.php private static $backupidscachesize = 2048; private static $backupidsexistsize = 10240; Are inadequate for large volumes of questions. I'm not yet sure why those values were chosen. I know the theory is to reduce memory usage, but 2048 keys isn't much of a cache when you are talking about 20k backupids in a question related backup. In basic testing, I've raised those to 20480 and 50000 respectively and restore performance has gone up substantially. Another item that appeared as a result of the above change is that; backup/util/dbops/restore_dbops.class.php @@ -614,6 +614,11 @@ abstract class restore_dbops { foreach ($questions as $question) { $matchq = $DB->get_record('question', array( Now that does a db select for each question. Very slow when you are talking about a couple of thousand questions. So I'm looking at ways to change that loop so only a single SQL query is done in that section. I'm still putting it all together. Any feedback on cache size choices would be welcome. Also I'm not sure what $backupidsexistsize is doing. It doesn't seem to add any value. Can anybody explain that?
          Hide
          Russell Smith added a comment -

          Juan Leyva: I've committed some fixes to head and 2.4, 2.5. There are further fixes under way however some of those will only land on master. Are you able to advise if there is any performance improvement compared with when you reported the issue?

          Show
          Russell Smith added a comment - Juan Leyva: I've committed some fixes to head and 2.4, 2.5. There are further fixes under way however some of those will only land on master. Are you able to advise if there is any performance improvement compared with when you reported the issue?
          Hide
          Juan Leyva added a comment -

          Hi Russell,

          well, I have to check if I still have the backups with the performance problems because a NDA we signed

          I'll keep you posted

          Regards

          Show
          Juan Leyva added a comment - Hi Russell, well, I have to check if I still have the backups with the performance problems because a NDA we signed I'll keep you posted Regards
          Hide
          Russell Smith added a comment -

          MDL-40581 backupids improvements will provide a significant improvement to the restore process of quizzes with lots of questions.

          Show
          Russell Smith added a comment - MDL-40581 backupids improvements will provide a significant improvement to the restore process of quizzes with lots of questions.
          Hide
          Bill Burgos added a comment -

          Hi,
          I am getting this on 2.6 as well. PM me for course file.

          Show
          Bill Burgos added a comment - Hi, I am getting this on 2.6 as well. PM me for course file.

            People

            • Votes:
              23 Vote for this issue
              Watchers:
              19 Start watching this issue

              Dates

              • Created:
                Updated: