Details

    • Type: Sub-task Sub-task
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: 2.4.5, 2.5.1
    • Fix Version/s: 2.4.7, 2.5.3
    • Component/s: Backup
    • Labels:
    • Rank:
      52223

      Description

      When restoring the L test course, the restore times out during the precheck stage, before it gets to the point of displaying the restore progress bar.

      Fatal error: Maximum execution time of 120 seconds exceeded in /fs1/www_root/sm449/coremoodle/lib/dml/moodle_database.php on line 783
      
      Call Stack:
          0.0005     638904   1. {main}() /fs1/www_root/sm449/coremoodle/backup/restore.php:0
         36.5749  131225240   2. restore_ui->requires_substage() /fs1/www_root/sm449/coremoodle/backup/restore.php:74
         36.5749  131226680   3. restore_ui_stage_process->process() /fs1/www_root/sm449/coremoodle/backup/util/ui/restore_ui.class.php:321
         37.9044  116754160   4. restore_controller->execute_precheck() /fs1/www_root/sm449/coremoodle/backup/util/ui/restore_ui_stage.class.php:710
         37.9044  116755616   5. restore_prechecks_helper::execute_prechecks() /fs1/www_root/sm449/coremoodle/backup/controller/restore_controller.class.php:366
         82.5110  152411816   6. restore_dbops::load_users_to_tempids() /fs1/www_root/sm449/coremoodle/backup/util/helper/restore_prechecks_helper.class.php:118
         82.5113  152430544   7. progressive_parser->process() /fs1/www_root/sm449/coremoodle/backup/util/dbops/restore_dbops.class.php:414
        205.9347  152564920   8. progressive_parser->parse() /fs1/www_root/sm449/coremoodle/backup/util/xml/parser/progressive_parser.class.php:137
        205.9347  152565000   9. xml_parse() /fs1/www_root/sm449/coremoodle/backup/util/xml/parser/progressive_parser.class.php:158
        205.9400  152553224  10. progressive_parser->end_tag() /fs1/www_root/sm449/coremoodle/backup/util/xml/parser/progressive_parser.class.php:158
        205.9400  152553232  11. progressive_parser->inform_end() /fs1/www_root/sm449/coremoodle/backup/util/xml/parser/progressive_parser.class.php:263
        205.9400  152553296  12. grouped_parser_processor->after_path() /fs1/www_root/sm449/coremoodle/backup/util/xml/parser/progressive_parser.class.php:183
        205.9400  152553264  13. restore_users_parser_processor->dispatch_chunk() /fs1/www_root/sm449/coremoodle/backup/util/xml/parser/processors/grouped_parser_processor.class.php:123
        205.9463  152555184  14. restore_dbops::set_backup_ids_record() /fs1/www_root/sm449/coremoodle/backup/util/helper/restore_users_parser_processor.class.php:65
        205.9465  152556544  15. restore_dbops::set_backup_ids_cached() /fs1/www_root/sm449/coremoodle/backup/util/dbops/restore_dbops.class.php:1518
        205.9465  152557352  16. moodle_database->get_record() /fs1/www_root/sm449/coremoodle/backup/util/dbops/restore_dbops.class.php:257
        205.9471  152558136  17. moodle_database->get_record_select() /fs1/www_root/sm449/coremoodle/lib/dml/moodle_database.php:1361
        205.9471  152558464  18. moodle_database->get_record_sql() /fs1/www_root/sm449/coremoodle/lib/dml/moodle_database.php:1382
        205.9471  152558680  19. pgsql_native_moodle_database->get_records_sql() /fs1/www_root/sm449/coremoodle/lib/dml/moodle_database.php:1410
        205.9471  152558832  20. moodle_database->fix_sql_params() /fs1/www_root/sm449/coremoodle/lib/dml/pgsql_native_moodle_database.php:741
      

        Issue Links

          Activity

          Hide
          Michael Aherne added a comment -

          We're seeing a similar problem (with a different backup file) on our 2.4.5+ server. It eventually times out, and always in the restore_dbops::load_users_to_tempids() method. The interesting thing is that, with xdebug profiling switched on this is producing an abnormally huge cachegrind file (2Gb+), which is making me think there might be some kind of recursion going on. Unfortunately, the profile file is too big to load, and I can't make head nor tail of the backup / restore code in Moodle any more, so I'm not sure where to start looking!

          Where would I find the "L test course"? I'd like to see if it gives the same result.

          Show
          Michael Aherne added a comment - We're seeing a similar problem (with a different backup file) on our 2.4.5+ server. It eventually times out, and always in the restore_dbops::load_users_to_tempids() method. The interesting thing is that, with xdebug profiling switched on this is producing an abnormally huge cachegrind file (2Gb+), which is making me think there might be some kind of recursion going on. Unfortunately, the profile file is too big to load, and I can't make head nor tail of the backup / restore code in Moodle any more, so I'm not sure where to start looking! Where would I find the "L test course"? I'd like to see if it gives the same result.
          Hide
          Michael Aherne added a comment -

          Ah, OK, I've found it thanks to MDL-38197.

          Show
          Michael Aherne added a comment - Ah, OK, I've found it thanks to MDL-38197 .
          Hide
          Koen Roggemans added a comment - - edited

          Same here - I can provide an example backup file if necessary to test. It's around 20MB in size.

          Error message:
          Fatal error: Maximum execution time of 30 seconds exceeded in /var/www/elo/moodle/lib/pear/HTML/QuickForm/element.php on line 0 Call Stack: 0.0000 649984 1.

          {main}

          () /var/www/elo/moodle/backup/restore.php:0 1.4904 58064120 2. restore_ui->display() /var/www/elo/moodle/backup/restore.php:67 1.4904 58064120 3. base_ui_stage->display() /var/www/elo/moodle/backup/util/ui/restore_ui.class.php:296 1.4904 58064120 4. restore_ui_stage_schema->initialise_stage_form() /var/www/elo/moodle/backup/util/ui/base_ui_stage.class.php:123 30.0354 69042464 5. base_moodleform->add_setting() /var/www/elo/moodle/backup/util/ui/restore_ui_stage.class.php:489 30.0355 69043448 6. base_moodleform->add_settings() /var/www/elo/moodle/backup/util/ui/base_moodleform.class.php:149 30.0399 69056232 7. HTML_QuickForm->setDefaults() /var/www/elo/moodle/backup/util/ui/base_moodleform.class.php:181 30.0626 69748888 8. HTML_QuickForm_static->onQuickFormEvent() /var/www/elo/moodle/lib/pear/HTML/QuickForm.php:437

          Show
          Koen Roggemans added a comment - - edited Same here - I can provide an example backup file if necessary to test. It's around 20MB in size. Error message: Fatal error: Maximum execution time of 30 seconds exceeded in /var/www/elo/moodle/lib/pear/HTML/QuickForm/element.php on line 0 Call Stack: 0.0000 649984 1. {main} () /var/www/elo/moodle/backup/restore.php:0 1.4904 58064120 2. restore_ui->display() /var/www/elo/moodle/backup/restore.php:67 1.4904 58064120 3. base_ui_stage->display() /var/www/elo/moodle/backup/util/ui/restore_ui.class.php:296 1.4904 58064120 4. restore_ui_stage_schema->initialise_stage_form() /var/www/elo/moodle/backup/util/ui/base_ui_stage.class.php:123 30.0354 69042464 5. base_moodleform->add_setting() /var/www/elo/moodle/backup/util/ui/restore_ui_stage.class.php:489 30.0355 69043448 6. base_moodleform->add_settings() /var/www/elo/moodle/backup/util/ui/base_moodleform.class.php:149 30.0399 69056232 7. HTML_QuickForm->setDefaults() /var/www/elo/moodle/backup/util/ui/base_moodleform.class.php:181 30.0626 69748888 8. HTML_QuickForm_static->onQuickFormEvent() /var/www/elo/moodle/lib/pear/HTML/QuickForm.php:437
          Hide
          Michael Aherne added a comment - - edited

          I've attached a backup file that reproduces this behaviour. It contains 4000 users and no content.

          This is consistently timing out on both my development machine (latest master) and test server (a production-quality environment running 2.4.5+), where max_execution_time is set to 30 seconds. Both servers are running MySQL because I don't have a Postgres testing environment.

          I've managed to get this to finish the restore process by setting max_execution_time to 300, so I think my initial comment about recursion is wrong, but when I switched on xdebug profiling it created a 5Gb file (which is by far the biggest cachegrind file I've ever come across!) so there's clearly a massive amount of processing going on. grep tells me it's 44,617,698 function calls, although I'm not sure how meaningful this is.

          Show
          Michael Aherne added a comment - - edited I've attached a backup file that reproduces this behaviour. It contains 4000 users and no content. This is consistently timing out on both my development machine (latest master) and test server (a production-quality environment running 2.4.5+), where max_execution_time is set to 30 seconds. Both servers are running MySQL because I don't have a Postgres testing environment. I've managed to get this to finish the restore process by setting max_execution_time to 300, so I think my initial comment about recursion is wrong, but when I switched on xdebug profiling it created a 5Gb file (which is by far the biggest cachegrind file I've ever come across!) so there's clearly a massive amount of processing going on. grep tells me it's 44,617,698 function calls, although I'm not sure how meaningful this is.
          Hide
          Michael Aherne added a comment -

          Sorry, I've just realised I'm making the assumption that there's something inefficient in the restore code that needs fixed, but I don't have any real evidence that that's the case.

          Could it be that all that's required here is for restore_controller::execute_precheck() to have the same preventative code as restore_controller::execute_plan() does, i.e.:

          set_time_limit(1 * 60 * 60); // 1 hour for 1 course initially granted
          raise_memory_limit(MEMORY_EXTRA);
          

          Can anyone who understands the backup/restore code comment on whether this is a sensible thing to do, or if it would just be masking an underlying problem?

          Show
          Michael Aherne added a comment - Sorry, I've just realised I'm making the assumption that there's something inefficient in the restore code that needs fixed, but I don't have any real evidence that that's the case. Could it be that all that's required here is for restore_controller::execute_precheck() to have the same preventative code as restore_controller::execute_plan() does, i.e.: set_time_limit(1 * 60 * 60); // 1 hour for 1 course initially granted raise_memory_limit(MEMORY_EXTRA); Can anyone who understands the backup/restore code comment on whether this is a sensible thing to do, or if it would just be masking an underlying problem?
          Hide
          Koen Roggemans added a comment -

          Can the priority be raised of this issue? It's not a minor bug: It's really bad to not be able to restore backups.
          In our use case: We make copies of master courses for different teachers/classes. The school year starts tomorrow

          Show
          Koen Roggemans added a comment - Can the priority be raised of this issue? It's not a minor bug: It's really bad to not be able to restore backups. In our use case: We make copies of master courses for different teachers/classes. The school year starts tomorrow
          Hide
          Michael Aherne added a comment -

          +1 for raising the priority. It doesn't seem like a minor bug to me either - it has been causing us significant problems in preparing for the coming academic session which starts in mid-September.

          Show
          Michael Aherne added a comment - +1 for raising the priority. It doesn't seem like a minor bug to me either - it has been causing us significant problems in preparing for the coming academic session which starts in mid-September.
          Hide
          Michael Aherne added a comment -

          I've added a pull request to simply raise the time and memory in execute_precheck() in the same way as in execute_plan().

          Show
          Michael Aherne added a comment - I've added a pull request to simply raise the time and memory in execute_precheck() in the same way as in execute_plan().
          Hide
          Petr Škoda added a comment -

          Thanks for the patch. The comment does not have the crazy full stop at the end of line, but I think it is not a problem in this area. Submitting for integration.

          Show
          Petr Škoda added a comment - Thanks for the patch. The comment does not have the crazy full stop at the end of line, but I think it is not a problem in this area. Submitting for integration.
          Hide
          Michael Aherne added a comment -

          Thanks for reviewing Petr. I just copied the code from execute_plan(), so I didn't think of checking the style guidelines!

          Show
          Michael Aherne added a comment - Thanks for reviewing Petr. I just copied the code from execute_plan(), so I didn't think of checking the style guidelines!
          Hide
          Sam Marshall added a comment -

          This change is not a complete solution but it might be worth applying anyhow.

          Simply setting the time limit very long is not a complete solution for two reasons:
          1. Multi-server Moodle installs will typically fail if anything takes longer than a few minutes because of how front-end servers work.
          2. Even if it doesn't fail, leaving a user's browser with absolutely no feedback other than a spinning thingy for ten minutes is not a very good interface.

          The real solution is to add a progress bar or some other indication that something is happening, which will then prevent timeout. I've been working on this in various parts of restore and will be submitting a solution that achieves it for this part too, soon, I hope.

          The reason it might still be worth integrating this fix is that the real solution will probably only be for master (2.6) whereas this fixes the bug for some people on existing supported versions 2.4 and 2.5.

          Show
          Sam Marshall added a comment - This change is not a complete solution but it might be worth applying anyhow. Simply setting the time limit very long is not a complete solution for two reasons: 1. Multi-server Moodle installs will typically fail if anything takes longer than a few minutes because of how front-end servers work. 2. Even if it doesn't fail, leaving a user's browser with absolutely no feedback other than a spinning thingy for ten minutes is not a very good interface. The real solution is to add a progress bar or some other indication that something is happening, which will then prevent timeout. I've been working on this in various parts of restore and will be submitting a solution that achieves it for this part too, soon, I hope. The reason it might still be worth integrating this fix is that the real solution will probably only be for master (2.6) whereas this fixes the bug for some people on existing supported versions 2.4 and 2.5.
          Hide
          Koen Roggemans added a comment -

          You are right Sam, but it would be nice if we could start up some of the waiting courses .
          It's administrators stuff, so it's not the end of the world if it doesn't have a progress bar. After the first time, you know you have to be patient.
          I don't know about multi server setups, but it seems worth while to make it work for them too as soon as possible. Some institutions rely massively on this functionality.
          Thanks a lot all of you for the quick response to this one - looking forward to upgrading.

          Show
          Koen Roggemans added a comment - You are right Sam, but it would be nice if we could start up some of the waiting courses . It's administrators stuff, so it's not the end of the world if it doesn't have a progress bar. After the first time, you know you have to be patient. I don't know about multi server setups, but it seems worth while to make it work for them too as soon as possible. Some institutions rely massively on this functionality. Thanks a lot all of you for the quick response to this one - looking forward to upgrading.
          Hide
          Michael Aherne added a comment -

          Thanks for the comments Sam! The patch wasn't really intended to be a complete solution, more just to get what is already there working. After putting this patch on our live server we haven't seen our load balancer timing out on any restores, and it isn't set to a particularly high timeout period. I'm not sure if this means much though, because the courses where the precheck was timing out weren't really big courses as such (i.e. they had a small or moderate amount of actual content but lots of users).

          Really looking forward to seeing a full solution for this in 2.6 as backup and restore has been one of our biggest headaches with Moodle

          Show
          Michael Aherne added a comment - Thanks for the comments Sam! The patch wasn't really intended to be a complete solution, more just to get what is already there working. After putting this patch on our live server we haven't seen our load balancer timing out on any restores, and it isn't set to a particularly high timeout period. I'm not sure if this means much though, because the courses where the precheck was timing out weren't really big courses as such (i.e. they had a small or moderate amount of actual content but lots of users). Really looking forward to seeing a full solution for this in 2.6 as backup and restore has been one of our biggest headaches with Moodle
          Hide
          Dan Poltawski added a comment -

          Hi Guys,

          Can we clarify the situation with this issue:

          1. Testing instructions
          2. Where this sits with the other changes going into master (Sam? Should we still apply this to master too?) Could you expand a bit on it compared to the other tasks?
          3. Should this be assigned to Michael?

          thanks

          Show
          Dan Poltawski added a comment - Hi Guys, Can we clarify the situation with this issue: Testing instructions Where this sits with the other changes going into master (Sam? Should we still apply this to master too?) Could you expand a bit on it compared to the other tasks? Should this be assigned to Michael? thanks
          Hide
          Sam Marshall added a comment - - edited

          3) Yes it should be assigned to Michael. I'm going to create a new issue for the changes I'm making. Reassigning now. I'll add a link to the new issue once I create it.

          2) Yes it should probably still be applied to master - the time/memory limit may be necessary in situations where restore is started programmatically and therefore won't have the new progress bar (or the other place where it sets memory limit) that will be included in the standard restore UI.

          1) Leaving test script to Michael or whoever.

          Show
          Sam Marshall added a comment - - edited 3) Yes it should be assigned to Michael. I'm going to create a new issue for the changes I'm making. Reassigning now. I'll add a link to the new issue once I create it. 2) Yes it should probably still be applied to master - the time/memory limit may be necessary in situations where restore is started programmatically and therefore won't have the new progress bar (or the other place where it sets memory limit) that will be included in the standard restore UI. 1) Leaving test script to Michael or whoever.
          Hide
          Michael Aherne added a comment -

          Thanks, I've added some testing instructions.

          Show
          Michael Aherne added a comment - Thanks, I've added some testing instructions.
          Hide
          Dan Poltawski added a comment -

          Thanks Michael, integrated to master, 25 and 24

          Show
          Dan Poltawski added a comment - Thanks Michael, integrated to master, 25 and 24
          Hide
          Jérôme Mouneyrac added a comment -

          All tested on the same server.

          Before patch:
          2.4: time exexution limit is reached (30s) (Passed)
          2.5: time execution limit is reached (30s) (Passed)
          master: restored (not expected by test instructions)

          After patch:
          2.4: restored (Passed)
          2.5: restored (Passed)
          master: time execution reached (120s) (Failed - sorry it looks like a gag when it works on stable, but it really happened - I attached a screenshot)

          PS: personally I don't find it great to be stuck on a page that load forever (almost 10 minutes). I think we should on 2.4/2.5 disable the restore button so the user can not click multiple time on it + having a dialogue explaining the restore is in progress. Same on master, instead to be stuck on a page with only a header (the rest of the page is blank), we should have some information expalining the process is in progress.

          Show
          Jérôme Mouneyrac added a comment - All tested on the same server. Before patch: 2.4: time exexution limit is reached (30s) (Passed) 2.5: time execution limit is reached (30s) (Passed) master: restored (not expected by test instructions) After patch: 2.4: restored (Passed) 2.5: restored (Passed) master: time execution reached (120s) (Failed - sorry it looks like a gag when it works on stable, but it really happened - I attached a screenshot) PS: personally I don't find it great to be stuck on a page that load forever (almost 10 minutes). I think we should on 2.4/2.5 disable the restore button so the user can not click multiple time on it + having a dialogue explaining the restore is in progress. Same on master, instead to be stuck on a page with only a header (the rest of the page is blank), we should have some information expalining the process is in progress.
          Hide
          Sam Marshall added a comment -

          Jerome - about your PS, I am doing a number of changes which add progress bars to potentially long-running operations in backup/restore, as you've seen some of them are already included, but I'll submit another one soon (for next week) that adds a progress bar in this particular situation. [Master only.]

          About the issue, I think there is new code in restore ui that sets time limit to 120s but I thought that happened after this.

          Show
          Sam Marshall added a comment - Jerome - about your PS, I am doing a number of changes which add progress bars to potentially long-running operations in backup/restore, as you've seen some of them are already included, but I'll submit another one soon (for next week) that adds a progress bar in this particular situation. [Master only.] About the issue, I think there is new code in restore ui that sets time limit to 120s but I thought that happened after this.
          Hide
          Jérôme Mouneyrac added a comment -

          thanks Sam, it will look great with the new progress bar.

          Show
          Jérôme Mouneyrac added a comment - thanks Sam, it will look great with the new progress bar.
          Hide
          Michael Aherne added a comment - - edited

          Is it possible to have this patch integrated into 2.4 and 2.5 only, given that it may not be required on master anyway (particularly if it's breaking it )?

          Show
          Michael Aherne added a comment - - edited Is it possible to have this patch integrated into 2.4 and 2.5 only, given that it may not be required on master anyway (particularly if it's breaking it )?
          Hide
          Sam Marshall added a comment -

          If my restore UI changes have have messed up the time limit for this (after they're all done, you won't be able to see this issue anyhow because it'll have progress bar and frequent updates), I suggest this code should still go into master because it will be useful when doing programmatic restore (not using the normal UI).

          Not sure how you would test it there though - maybe just a 'restore isn't broken' type test.

          Show
          Sam Marshall added a comment - If my restore UI changes have have messed up the time limit for this (after they're all done, you won't be able to see this issue anyhow because it'll have progress bar and frequent updates), I suggest this code should still go into master because it will be useful when doing programmatic restore (not using the normal UI). Not sure how you would test it there though - maybe just a 'restore isn't broken' type test.
          Hide
          Dan Poltawski added a comment -

          I'm trying this out now

          Show
          Dan Poltawski added a comment - I'm trying this out now
          Hide
          Dan Poltawski added a comment -

          Yep, I get the same as Jerome!

          Show
          Dan Poltawski added a comment - Yep, I get the same as Jerome!
          Hide
          Dan Poltawski added a comment -

          I get the failure in my production master too.

          Since this works on the stable branches i'm going to pass it.

          I don't think its worth reverting this change on master as, as Sam says it helps anyway, but we've moved the problem again..

          So, passing this.

          Show
          Dan Poltawski added a comment - I get the failure in my production master too. Since this works on the stable branches i'm going to pass it. I don't think its worth reverting this change on master as, as Sam says it helps anyway, but we've moved the problem again.. So, passing this.
          Hide
          Damyon Wiese added a comment -

          This issue along with 77 of it's friends has been sent upstream and released to the world.

          Thankyou for your contribution.

          Show
          Damyon Wiese added a comment - This issue along with 77 of it's friends has been sent upstream and released to the world. Thankyou for your contribution.

            People

            • Votes:
              5 Vote for this issue
              Watchers:
              8 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: