Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.4.5, 2.5.1
    • Fix Version/s: 2.4.7, 2.5.3
    • Component/s: Backup
    • Labels:
    • Testing Instructions:
      Hide
      • On a test server without the patch:
        • Attempt to restore attached backup MDL-41254-1.mbz on a server with PHP max_execution_time set to 30 seconds
        • Ensure that you see the maximum script execution time exceeded error. If backup restores successfully, lower max_execution_time and repeat until it fails
      • Apply the patch
        • Restore the backup file and confirm that it restores successfully without timing out
      Show
      On a test server without the patch: Attempt to restore attached backup MDL-41254 -1.mbz on a server with PHP max_execution_time set to 30 seconds Ensure that you see the maximum script execution time exceeded error. If backup restores successfully, lower max_execution_time and repeat until it fails Apply the patch Restore the backup file and confirm that it restores successfully without timing out
    • Affected Branches:
      MOODLE_24_STABLE, MOODLE_25_STABLE
    • Fixed Branches:
      MOODLE_24_STABLE, MOODLE_25_STABLE
    • Pull Master Branch:

      Description

      When restoring the L test course, the restore times out during the precheck stage, before it gets to the point of displaying the restore progress bar.

      Fatal error: Maximum execution time of 120 seconds exceeded in /fs1/www_root/sm449/coremoodle/lib/dml/moodle_database.php on line 783
       
      Call Stack:
          0.0005     638904   1. {main}() /fs1/www_root/sm449/coremoodle/backup/restore.php:0
         36.5749  131225240   2. restore_ui->requires_substage() /fs1/www_root/sm449/coremoodle/backup/restore.php:74
         36.5749  131226680   3. restore_ui_stage_process->process() /fs1/www_root/sm449/coremoodle/backup/util/ui/restore_ui.class.php:321
         37.9044  116754160   4. restore_controller->execute_precheck() /fs1/www_root/sm449/coremoodle/backup/util/ui/restore_ui_stage.class.php:710
         37.9044  116755616   5. restore_prechecks_helper::execute_prechecks() /fs1/www_root/sm449/coremoodle/backup/controller/restore_controller.class.php:366
         82.5110  152411816   6. restore_dbops::load_users_to_tempids() /fs1/www_root/sm449/coremoodle/backup/util/helper/restore_prechecks_helper.class.php:118
         82.5113  152430544   7. progressive_parser->process() /fs1/www_root/sm449/coremoodle/backup/util/dbops/restore_dbops.class.php:414
        205.9347  152564920   8. progressive_parser->parse() /fs1/www_root/sm449/coremoodle/backup/util/xml/parser/progressive_parser.class.php:137
        205.9347  152565000   9. xml_parse() /fs1/www_root/sm449/coremoodle/backup/util/xml/parser/progressive_parser.class.php:158
        205.9400  152553224  10. progressive_parser->end_tag() /fs1/www_root/sm449/coremoodle/backup/util/xml/parser/progressive_parser.class.php:158
        205.9400  152553232  11. progressive_parser->inform_end() /fs1/www_root/sm449/coremoodle/backup/util/xml/parser/progressive_parser.class.php:263
        205.9400  152553296  12. grouped_parser_processor->after_path() /fs1/www_root/sm449/coremoodle/backup/util/xml/parser/progressive_parser.class.php:183
        205.9400  152553264  13. restore_users_parser_processor->dispatch_chunk() /fs1/www_root/sm449/coremoodle/backup/util/xml/parser/processors/grouped_parser_processor.class.php:123
        205.9463  152555184  14. restore_dbops::set_backup_ids_record() /fs1/www_root/sm449/coremoodle/backup/util/helper/restore_users_parser_processor.class.php:65
        205.9465  152556544  15. restore_dbops::set_backup_ids_cached() /fs1/www_root/sm449/coremoodle/backup/util/dbops/restore_dbops.class.php:1518
        205.9465  152557352  16. moodle_database->get_record() /fs1/www_root/sm449/coremoodle/backup/util/dbops/restore_dbops.class.php:257
        205.9471  152558136  17. moodle_database->get_record_select() /fs1/www_root/sm449/coremoodle/lib/dml/moodle_database.php:1361
        205.9471  152558464  18. moodle_database->get_record_sql() /fs1/www_root/sm449/coremoodle/lib/dml/moodle_database.php:1382
        205.9471  152558680  19. pgsql_native_moodle_database->get_records_sql() /fs1/www_root/sm449/coremoodle/lib/dml/moodle_database.php:1410
        205.9471  152558832  20. moodle_database->fix_sql_params() /fs1/www_root/sm449/coremoodle/lib/dml/pgsql_native_moodle_database.php:741
      

        Gliffy Diagrams

          Issue Links

            Activity

            Hide
            maherne Michael Aherne added a comment -

            We're seeing a similar problem (with a different backup file) on our 2.4.5+ server. It eventually times out, and always in the restore_dbops::load_users_to_tempids() method. The interesting thing is that, with xdebug profiling switched on this is producing an abnormally huge cachegrind file (2Gb+), which is making me think there might be some kind of recursion going on. Unfortunately, the profile file is too big to load, and I can't make head nor tail of the backup / restore code in Moodle any more, so I'm not sure where to start looking!

            Where would I find the "L test course"? I'd like to see if it gives the same result.

            Show
            maherne Michael Aherne added a comment - We're seeing a similar problem (with a different backup file) on our 2.4.5+ server. It eventually times out, and always in the restore_dbops::load_users_to_tempids() method. The interesting thing is that, with xdebug profiling switched on this is producing an abnormally huge cachegrind file (2Gb+), which is making me think there might be some kind of recursion going on. Unfortunately, the profile file is too big to load, and I can't make head nor tail of the backup / restore code in Moodle any more, so I'm not sure where to start looking! Where would I find the "L test course"? I'd like to see if it gives the same result.
            Hide
            maherne Michael Aherne added a comment -

            Ah, OK, I've found it thanks to MDL-38197.

            Show
            maherne Michael Aherne added a comment - Ah, OK, I've found it thanks to MDL-38197 .
            Hide
            koen Koen Roggemans added a comment - - edited

            Same here - I can provide an example backup file if necessary to test. It's around 20MB in size.

            Error message:
            Fatal error: Maximum execution time of 30 seconds exceeded in /var/www/elo/moodle/lib/pear/HTML/QuickForm/element.php on line 0 Call Stack: 0.0000 649984 1.

            {main}

            () /var/www/elo/moodle/backup/restore.php:0 1.4904 58064120 2. restore_ui->display() /var/www/elo/moodle/backup/restore.php:67 1.4904 58064120 3. base_ui_stage->display() /var/www/elo/moodle/backup/util/ui/restore_ui.class.php:296 1.4904 58064120 4. restore_ui_stage_schema->initialise_stage_form() /var/www/elo/moodle/backup/util/ui/base_ui_stage.class.php:123 30.0354 69042464 5. base_moodleform->add_setting() /var/www/elo/moodle/backup/util/ui/restore_ui_stage.class.php:489 30.0355 69043448 6. base_moodleform->add_settings() /var/www/elo/moodle/backup/util/ui/base_moodleform.class.php:149 30.0399 69056232 7. HTML_QuickForm->setDefaults() /var/www/elo/moodle/backup/util/ui/base_moodleform.class.php:181 30.0626 69748888 8. HTML_QuickForm_static->onQuickFormEvent() /var/www/elo/moodle/lib/pear/HTML/QuickForm.php:437

            Show
            koen Koen Roggemans added a comment - - edited Same here - I can provide an example backup file if necessary to test. It's around 20MB in size. Error message: Fatal error: Maximum execution time of 30 seconds exceeded in /var/www/elo/moodle/lib/pear/HTML/QuickForm/element.php on line 0 Call Stack: 0.0000 649984 1. {main} () /var/www/elo/moodle/backup/restore.php:0 1.4904 58064120 2. restore_ui->display() /var/www/elo/moodle/backup/restore.php:67 1.4904 58064120 3. base_ui_stage->display() /var/www/elo/moodle/backup/util/ui/restore_ui.class.php:296 1.4904 58064120 4. restore_ui_stage_schema->initialise_stage_form() /var/www/elo/moodle/backup/util/ui/base_ui_stage.class.php:123 30.0354 69042464 5. base_moodleform->add_setting() /var/www/elo/moodle/backup/util/ui/restore_ui_stage.class.php:489 30.0355 69043448 6. base_moodleform->add_settings() /var/www/elo/moodle/backup/util/ui/base_moodleform.class.php:149 30.0399 69056232 7. HTML_QuickForm->setDefaults() /var/www/elo/moodle/backup/util/ui/base_moodleform.class.php:181 30.0626 69748888 8. HTML_QuickForm_static->onQuickFormEvent() /var/www/elo/moodle/lib/pear/HTML/QuickForm.php:437
            Hide
            maherne Michael Aherne added a comment - - edited

            I've attached a backup file that reproduces this behaviour. It contains 4000 users and no content.

            This is consistently timing out on both my development machine (latest master) and test server (a production-quality environment running 2.4.5+), where max_execution_time is set to 30 seconds. Both servers are running MySQL because I don't have a Postgres testing environment.

            I've managed to get this to finish the restore process by setting max_execution_time to 300, so I think my initial comment about recursion is wrong, but when I switched on xdebug profiling it created a 5Gb file (which is by far the biggest cachegrind file I've ever come across!) so there's clearly a massive amount of processing going on. grep tells me it's 44,617,698 function calls, although I'm not sure how meaningful this is.

            Show
            maherne Michael Aherne added a comment - - edited I've attached a backup file that reproduces this behaviour. It contains 4000 users and no content. This is consistently timing out on both my development machine (latest master) and test server (a production-quality environment running 2.4.5+), where max_execution_time is set to 30 seconds. Both servers are running MySQL because I don't have a Postgres testing environment. I've managed to get this to finish the restore process by setting max_execution_time to 300, so I think my initial comment about recursion is wrong, but when I switched on xdebug profiling it created a 5Gb file (which is by far the biggest cachegrind file I've ever come across!) so there's clearly a massive amount of processing going on. grep tells me it's 44,617,698 function calls, although I'm not sure how meaningful this is.
            Hide
            maherne Michael Aherne added a comment -

            Sorry, I've just realised I'm making the assumption that there's something inefficient in the restore code that needs fixed, but I don't have any real evidence that that's the case.

            Could it be that all that's required here is for restore_controller::execute_precheck() to have the same preventative code as restore_controller::execute_plan() does, i.e.:

            set_time_limit(1 * 60 * 60); // 1 hour for 1 course initially granted
            raise_memory_limit(MEMORY_EXTRA);

            Can anyone who understands the backup/restore code comment on whether this is a sensible thing to do, or if it would just be masking an underlying problem?

            Show
            maherne Michael Aherne added a comment - Sorry, I've just realised I'm making the assumption that there's something inefficient in the restore code that needs fixed, but I don't have any real evidence that that's the case. Could it be that all that's required here is for restore_controller::execute_precheck() to have the same preventative code as restore_controller::execute_plan() does, i.e.: set_time_limit(1 * 60 * 60); // 1 hour for 1 course initially granted raise_memory_limit(MEMORY_EXTRA); Can anyone who understands the backup/restore code comment on whether this is a sensible thing to do, or if it would just be masking an underlying problem?
            Hide
            koen Koen Roggemans added a comment -

            Can the priority be raised of this issue? It's not a minor bug: It's really bad to not be able to restore backups.
            In our use case: We make copies of master courses for different teachers/classes. The school year starts tomorrow

            Show
            koen Koen Roggemans added a comment - Can the priority be raised of this issue? It's not a minor bug: It's really bad to not be able to restore backups. In our use case: We make copies of master courses for different teachers/classes. The school year starts tomorrow
            Hide
            maherne Michael Aherne added a comment -

            +1 for raising the priority. It doesn't seem like a minor bug to me either - it has been causing us significant problems in preparing for the coming academic session which starts in mid-September.

            Show
            maherne Michael Aherne added a comment - +1 for raising the priority. It doesn't seem like a minor bug to me either - it has been causing us significant problems in preparing for the coming academic session which starts in mid-September.
            Hide
            maherne Michael Aherne added a comment -

            I've added a pull request to simply raise the time and memory in execute_precheck() in the same way as in execute_plan().

            Show
            maherne Michael Aherne added a comment - I've added a pull request to simply raise the time and memory in execute_precheck() in the same way as in execute_plan().
            Hide
            skodak Petr Skoda added a comment -

            Thanks for the patch. The comment does not have the crazy full stop at the end of line, but I think it is not a problem in this area. Submitting for integration.

            Show
            skodak Petr Skoda added a comment - Thanks for the patch. The comment does not have the crazy full stop at the end of line, but I think it is not a problem in this area. Submitting for integration.
            Hide
            maherne Michael Aherne added a comment -

            Thanks for reviewing Petr. I just copied the code from execute_plan(), so I didn't think of checking the style guidelines!

            Show
            maherne Michael Aherne added a comment - Thanks for reviewing Petr. I just copied the code from execute_plan(), so I didn't think of checking the style guidelines!
            Hide
            quen Sam Marshall added a comment -

            This change is not a complete solution but it might be worth applying anyhow.

            Simply setting the time limit very long is not a complete solution for two reasons:
            1. Multi-server Moodle installs will typically fail if anything takes longer than a few minutes because of how front-end servers work.
            2. Even if it doesn't fail, leaving a user's browser with absolutely no feedback other than a spinning thingy for ten minutes is not a very good interface.

            The real solution is to add a progress bar or some other indication that something is happening, which will then prevent timeout. I've been working on this in various parts of restore and will be submitting a solution that achieves it for this part too, soon, I hope.

            The reason it might still be worth integrating this fix is that the real solution will probably only be for master (2.6) whereas this fixes the bug for some people on existing supported versions 2.4 and 2.5.

            Show
            quen Sam Marshall added a comment - This change is not a complete solution but it might be worth applying anyhow. Simply setting the time limit very long is not a complete solution for two reasons: 1. Multi-server Moodle installs will typically fail if anything takes longer than a few minutes because of how front-end servers work. 2. Even if it doesn't fail, leaving a user's browser with absolutely no feedback other than a spinning thingy for ten minutes is not a very good interface. The real solution is to add a progress bar or some other indication that something is happening, which will then prevent timeout. I've been working on this in various parts of restore and will be submitting a solution that achieves it for this part too, soon, I hope. The reason it might still be worth integrating this fix is that the real solution will probably only be for master (2.6) whereas this fixes the bug for some people on existing supported versions 2.4 and 2.5.
            Hide
            koen Koen Roggemans added a comment -

            You are right Sam, but it would be nice if we could start up some of the waiting courses .
            It's administrators stuff, so it's not the end of the world if it doesn't have a progress bar. After the first time, you know you have to be patient.
            I don't know about multi server setups, but it seems worth while to make it work for them too as soon as possible. Some institutions rely massively on this functionality.
            Thanks a lot all of you for the quick response to this one - looking forward to upgrading.

            Show
            koen Koen Roggemans added a comment - You are right Sam, but it would be nice if we could start up some of the waiting courses . It's administrators stuff, so it's not the end of the world if it doesn't have a progress bar. After the first time, you know you have to be patient. I don't know about multi server setups, but it seems worth while to make it work for them too as soon as possible. Some institutions rely massively on this functionality. Thanks a lot all of you for the quick response to this one - looking forward to upgrading.
            Hide
            maherne Michael Aherne added a comment -

            Thanks for the comments Sam! The patch wasn't really intended to be a complete solution, more just to get what is already there working. After putting this patch on our live server we haven't seen our load balancer timing out on any restores, and it isn't set to a particularly high timeout period. I'm not sure if this means much though, because the courses where the precheck was timing out weren't really big courses as such (i.e. they had a small or moderate amount of actual content but lots of users).

            Really looking forward to seeing a full solution for this in 2.6 as backup and restore has been one of our biggest headaches with Moodle

            Show
            maherne Michael Aherne added a comment - Thanks for the comments Sam! The patch wasn't really intended to be a complete solution, more just to get what is already there working. After putting this patch on our live server we haven't seen our load balancer timing out on any restores, and it isn't set to a particularly high timeout period. I'm not sure if this means much though, because the courses where the precheck was timing out weren't really big courses as such (i.e. they had a small or moderate amount of actual content but lots of users). Really looking forward to seeing a full solution for this in 2.6 as backup and restore has been one of our biggest headaches with Moodle
            Hide
            poltawski Dan Poltawski added a comment -

            Hi Guys,

            Can we clarify the situation with this issue:

            1. Testing instructions
            2. Where this sits with the other changes going into master (Sam? Should we still apply this to master too?) Could you expand a bit on it compared to the other tasks?
            3. Should this be assigned to Michael?

            thanks

            Show
            poltawski Dan Poltawski added a comment - Hi Guys, Can we clarify the situation with this issue: Testing instructions Where this sits with the other changes going into master (Sam? Should we still apply this to master too?) Could you expand a bit on it compared to the other tasks? Should this be assigned to Michael? thanks
            Hide
            quen Sam Marshall added a comment - - edited

            3) Yes it should be assigned to Michael. I'm going to create a new issue for the changes I'm making. Reassigning now. I'll add a link to the new issue once I create it.

            2) Yes it should probably still be applied to master - the time/memory limit may be necessary in situations where restore is started programmatically and therefore won't have the new progress bar (or the other place where it sets memory limit) that will be included in the standard restore UI.

            1) Leaving test script to Michael or whoever.

            Show
            quen Sam Marshall added a comment - - edited 3) Yes it should be assigned to Michael. I'm going to create a new issue for the changes I'm making. Reassigning now. I'll add a link to the new issue once I create it. 2) Yes it should probably still be applied to master - the time/memory limit may be necessary in situations where restore is started programmatically and therefore won't have the new progress bar (or the other place where it sets memory limit) that will be included in the standard restore UI. 1) Leaving test script to Michael or whoever.
            Hide
            maherne Michael Aherne added a comment -

            Thanks, I've added some testing instructions.

            Show
            maherne Michael Aherne added a comment - Thanks, I've added some testing instructions.
            Hide
            poltawski Dan Poltawski added a comment -

            Thanks Michael, integrated to master, 25 and 24

            Show
            poltawski Dan Poltawski added a comment - Thanks Michael, integrated to master, 25 and 24
            Hide
            jerome Jérôme Mouneyrac added a comment -

            All tested on the same server.

            Before patch:
            2.4: time exexution limit is reached (30s) (Passed)
            2.5: time execution limit is reached (30s) (Passed)
            master: restored (not expected by test instructions)

            After patch:
            2.4: restored (Passed)
            2.5: restored (Passed)
            master: time execution reached (120s) (Failed - sorry it looks like a gag when it works on stable, but it really happened - I attached a screenshot)

            PS: personally I don't find it great to be stuck on a page that load forever (almost 10 minutes). I think we should on 2.4/2.5 disable the restore button so the user can not click multiple time on it + having a dialogue explaining the restore is in progress. Same on master, instead to be stuck on a page with only a header (the rest of the page is blank), we should have some information expalining the process is in progress.

            Show
            jerome Jérôme Mouneyrac added a comment - All tested on the same server. Before patch: 2.4: time exexution limit is reached (30s) (Passed) 2.5: time execution limit is reached (30s) (Passed) master: restored (not expected by test instructions) After patch: 2.4: restored (Passed) 2.5: restored (Passed) master: time execution reached (120s) (Failed - sorry it looks like a gag when it works on stable, but it really happened - I attached a screenshot) PS: personally I don't find it great to be stuck on a page that load forever (almost 10 minutes). I think we should on 2.4/2.5 disable the restore button so the user can not click multiple time on it + having a dialogue explaining the restore is in progress. Same on master, instead to be stuck on a page with only a header (the rest of the page is blank), we should have some information expalining the process is in progress.
            Hide
            quen Sam Marshall added a comment -

            Jerome - about your PS, I am doing a number of changes which add progress bars to potentially long-running operations in backup/restore, as you've seen some of them are already included, but I'll submit another one soon (for next week) that adds a progress bar in this particular situation. [Master only.]

            About the issue, I think there is new code in restore ui that sets time limit to 120s but I thought that happened after this.

            Show
            quen Sam Marshall added a comment - Jerome - about your PS, I am doing a number of changes which add progress bars to potentially long-running operations in backup/restore, as you've seen some of them are already included, but I'll submit another one soon (for next week) that adds a progress bar in this particular situation. [Master only.] About the issue, I think there is new code in restore ui that sets time limit to 120s but I thought that happened after this.
            Hide
            jerome Jérôme Mouneyrac added a comment -

            thanks Sam, it will look great with the new progress bar.

            Show
            jerome Jérôme Mouneyrac added a comment - thanks Sam, it will look great with the new progress bar.
            Hide
            maherne Michael Aherne added a comment - - edited

            Is it possible to have this patch integrated into 2.4 and 2.5 only, given that it may not be required on master anyway (particularly if it's breaking it )?

            Show
            maherne Michael Aherne added a comment - - edited Is it possible to have this patch integrated into 2.4 and 2.5 only, given that it may not be required on master anyway (particularly if it's breaking it )?
            Hide
            quen Sam Marshall added a comment -

            If my restore UI changes have have messed up the time limit for this (after they're all done, you won't be able to see this issue anyhow because it'll have progress bar and frequent updates), I suggest this code should still go into master because it will be useful when doing programmatic restore (not using the normal UI).

            Not sure how you would test it there though - maybe just a 'restore isn't broken' type test.

            Show
            quen Sam Marshall added a comment - If my restore UI changes have have messed up the time limit for this (after they're all done, you won't be able to see this issue anyhow because it'll have progress bar and frequent updates), I suggest this code should still go into master because it will be useful when doing programmatic restore (not using the normal UI). Not sure how you would test it there though - maybe just a 'restore isn't broken' type test.
            Hide
            poltawski Dan Poltawski added a comment -

            I'm trying this out now

            Show
            poltawski Dan Poltawski added a comment - I'm trying this out now
            Hide
            poltawski Dan Poltawski added a comment -

            Yep, I get the same as Jerome!

            Show
            poltawski Dan Poltawski added a comment - Yep, I get the same as Jerome!
            Hide
            poltawski Dan Poltawski added a comment -

            I get the failure in my production master too.

            Since this works on the stable branches i'm going to pass it.

            I don't think its worth reverting this change on master as, as Sam says it helps anyway, but we've moved the problem again..

            So, passing this.

            Show
            poltawski Dan Poltawski added a comment - I get the failure in my production master too. Since this works on the stable branches i'm going to pass it. I don't think its worth reverting this change on master as, as Sam says it helps anyway, but we've moved the problem again.. So, passing this.
            Hide
            damyon Damyon Wiese added a comment -

            This issue along with 77 of it's friends has been sent upstream and released to the world.

            Thankyou for your contribution.

            Show
            damyon Damyon Wiese added a comment - This issue along with 77 of it's friends has been sent upstream and released to the world. Thankyou for your contribution.

              People

              • Votes:
                5 Vote for this issue
                Watchers:
                8 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved:
                  Fix Release Date:
                  11/Nov/13