Moodle
  1. Moodle
  2. MDL-37877

Over-large backups fail transparently, then do not get deleted next time, so space fills up

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Critical Critical
    • Resolution: Fixed
    • Affects Version/s: 2.4, 2.6
    • Fix Version/s: 2.4.7, 2.5.3
    • Component/s: Backup
    • Labels:
    • Testing Instructions:
      Hide
      1. Turn on full debug.
      2. Set time limit in your php ini something very large like a day.
      3. set time limit for a backup step by changing core_backup_progress::TIME_LIMIT_WITHOUT_PROGRESS to a few hours. (located in backup>util>progress>core_backup_progress.class.php)
      4. Generate a course with lots of activities/sections etc.
      5. Add atleast 8-10 GB of media content to the course spread over multiple activities.
      6. Set the following configs:-
        1. backup_auto_active - manual
        2. backup_auto_storage - specified directory
        3. backup_auto_destination - path to some writeable dir
      7. Run the script admin/cli/automated_backup.php
      8. Make sure backup was created for all courses except the large one. The large one would fail.
      9. Goto My home ► Site administration ► Reports ► Backups
      10. Make sure the status of all courses except the large course is "OK"
      11. Re run the backup script.
      12. Again, make sure backup was created for all courses except the large one.
      13. Goto My home ► Site administration ► Reports ► Backups
      14. Make sure the status of all courses except the large course is "OK"
      15. Status for the large course is "ERROR"
      16. Try backing up the courses using web interface. The large course backup should fail, generating an error. Any other course should go as expected.
      Show
      Turn on full debug. Set time limit in your php ini something very large like a day. set time limit for a backup step by changing core_backup_progress::TIME_LIMIT_WITHOUT_PROGRESS to a few hours. (located in backup>util>progress>core_backup_progress.class.php) Generate a course with lots of activities/sections etc. Add atleast 8-10 GB of media content to the course spread over multiple activities. Set the following configs:- backup_auto_active - manual backup_auto_storage - specified directory backup_auto_destination - path to some writeable dir Run the script admin/cli/automated_backup.php Make sure backup was created for all courses except the large one. The large one would fail. Goto My home ► Site administration ► Reports ► Backups Make sure the status of all courses except the large course is "OK" Re run the backup script. Again, make sure backup was created for all courses except the large one. Goto My home ► Site administration ► Reports ► Backups Make sure the status of all courses except the large course is "OK" Status for the large course is "ERROR" Try backing up the courses using web interface. The large course backup should fail, generating an error. Any other course should go as expected.
    • Affected Branches:
      MOODLE_24_STABLE, MOODLE_26_STABLE
    • Fixed Branches:
      MOODLE_24_STABLE, MOODLE_25_STABLE
    • Pull Master Branch:
      MDL-37877-master
    • Story Points (Obsolete):
      40
    • Sprint:
      BACKEND Sprint 6

      Description

      This refers to scheduled backups, where the number of versions kept is set to one. This customer site has about half a dozen very large courses, where the backup system generates vary large files (two examples from the test I ran this morning are 7.8GB and 9.2GB). These are obviously too large to be successfully zipped, and when the backup processes them, it eventually fails with the error like : "Error - backup-moodle2-course-1681-20130205-1034.mbz does not appear to be a valid backup (missing_moodle_backup_xml_file)"

      Then (in the log) it says "complete - next execution (the next scheduled backup time)"

      So it fails, presumably because it is too big - but then carries on. This in itself is good, as previously, failures like this were stopping the backup schedule completely.

      However, the next time that the scheduled course backup is run, all of the old versions of the backups are replaced (remember, we are only keeping one of each) EXCEPT the failed backups. Moodle does not seem to recognise them as valid backups, despite their filenames, so they are NOT deleted. Over the course of several days, the multiple copies of large failed backups that result from this eventually fill up the server disk, and cause teh site to fail.

      I know that this relates to the larger subject of handling very large course backups, but can anything be done with the checking process, so that these large failed backup files do not get left to fill up the server?

        Gliffy Diagrams

          Issue Links

            Activity

            Hide
            Sean Keogh added a comment - - edited

            Could someone take a look at this please - it is now affecting more than one of our customers - and if we don't remember to go into their server every few days to delete the failed backups, their disk fills up and the site fails, and several thousand students and teaching staff want to jump up and down on my head!.

            Show
            Sean Keogh added a comment - - edited Could someone take a look at this please - it is now affecting more than one of our customers - and if we don't remember to go into their server every few days to delete the failed backups, their disk fills up and the site fails, and several thousand students and teaching staff want to jump up and down on my head!.
            Hide
            Michael de Raadt added a comment -

            Thanks for reporting that.

            Feel free to contribute to fixing this issue.

            Show
            Michael de Raadt added a comment - Thanks for reporting that. Feel free to contribute to fixing this issue.
            Hide
            Ankit Agarwal added a comment -

            Hi Sean,
            Can you please provide further information on this please.

            1. What do you get the status of backups in the emails sent?
            2. What are the status of the course in context when you go to My home ► Site administration ► Reports ► Backups? Is it "OK" or "unfinished" or "Error"?

            Thanks

            Show
            Ankit Agarwal added a comment - Hi Sean, Can you please provide further information on this please. What do you get the status of backups in the emails sent? What are the status of the course in context when you go to My home ► Site administration ► Reports ► Backups? Is it "OK" or "unfinished" or "Error"? Thanks
            Hide
            Sean Keogh added a comment -

            I don't get the emails, but I'll have a look at the reports and get back to you.

            Show
            Sean Keogh added a comment - I don't get the emails, but I'll have a look at the reports and get back to you.
            Hide
            Sean Keogh added a comment -

            Hiya,

            It reports in the backup log as completing OK. The .mbz file is almost 17GB in size, which I suspect is not a valid backup...

            Show
            Sean Keogh added a comment - Hiya, It reports in the backup log as completing OK. The .mbz file is almost 17GB in size, which I suspect is not a valid backup...
            Hide
            Ankit Agarwal added a comment - - edited

            I was finally able to replicate this at will. The problem was zip archiver didn't know an error occurred and it generated a corrupt file with a status of OK.
            So the first part of this patch deals to identify such cases and throw an exception. We monitor the generated backup file and see if it is valid zip or not. If found to be corrupt we throw an exception.
            Second part of the solution makes sure we delete any generated backup in case of any error. So we don't store half baked files.

            The issue in context should be resolved by this patch. However Zip_archive::open() throws a debug (developer level) notice when an corrupt backup is identified. I am not sure what is the best way to suppress this notice or even if we should suppress this notice. I will appreciate feedback on that from the reviewer.

            Show
            Ankit Agarwal added a comment - - edited I was finally able to replicate this at will. The problem was zip archiver didn't know an error occurred and it generated a corrupt file with a status of OK. So the first part of this patch deals to identify such cases and throw an exception. We monitor the generated backup file and see if it is valid zip or not. If found to be corrupt we throw an exception. Second part of the solution makes sure we delete any generated backup in case of any error. So we don't store half baked files. The issue in context should be resolved by this patch. However Zip_archive::open() throws a debug (developer level) notice when an corrupt backup is identified. I am not sure what is the best way to suppress this notice or even if we should suppress this notice. I will appreciate feedback on that from the reviewer.
            Hide
            Ankit Agarwal added a comment -

            Also Sean,
            If you are able to test the patch, that would be great.

            Thanks

            Show
            Ankit Agarwal added a comment - Also Sean, If you are able to test the patch, that would be great. Thanks
            Hide
            Sean Keogh added a comment -

            I can certainly give it a try. I will report back.

            Show
            Sean Keogh added a comment - I can certainly give it a try. I will report back.
            Hide
            Petr Skoda added a comment -

            looks ok, +1, submitting for integration

            Show
            Petr Skoda added a comment - looks ok, +1, submitting for integration
            Hide
            Ankit Agarwal added a comment -

            Thanks for the review Petr.

            I have backported this and added docs label.

            Also this needs to be mentioned on release notes once integrated.

            Show
            Ankit Agarwal added a comment - Thanks for the review Petr. I have backported this and added docs label. Also this needs to be mentioned on release notes once integrated.
            Hide
            Eloy Lafuente (stronk7) added a comment -

            The main moodle.git repository has just been updated with latest weekly modifications. You may wish to rebase your PULL branches to simplify history and avoid any possible merge conflicts. This would also make integrator's life easier next week.

            TIA and ciao

            Show
            Eloy Lafuente (stronk7) added a comment - The main moodle.git repository has just been updated with latest weekly modifications. You may wish to rebase your PULL branches to simplify history and avoid any possible merge conflicts. This would also make integrator's life easier next week. TIA and ciao
            Hide
            Damyon Wiese added a comment -

            Thanks Ankit - the patch looks OK - but I would like a small change.

            Assuming that the reason the backup failed is because of the size of the course is IMO an unhelpful guess - on master they could be using tgz backups which allow much larger course backups.

            I think you are catching a useful exception here and throwing a less useful one. Can you just rethrow the useful one?

            Show
            Damyon Wiese added a comment - Thanks Ankit - the patch looks OK - but I would like a small change. Assuming that the reason the backup failed is because of the size of the course is IMO an unhelpful guess - on master they could be using tgz backups which allow much larger course backups. I think you are catching a useful exception here and throwing a less useful one. Can you just rethrow the useful one?
            Hide
            Ankit Agarwal added a comment -

            Shouldn't the backup step throw only backup_step_exception? or is it aright to throw other exceptions here?

            Show
            Ankit Agarwal added a comment - Shouldn't the backup step throw only backup_step_exception? or is it aright to throw other exceptions here?
            Hide
            Damyon Wiese added a comment -

            That's for you to check and decide on (please). I think the catch is just based on "Exception" but I haven't looked at it closely. If you have to wrap the original exception that's fine.

            Show
            Damyon Wiese added a comment - That's for you to check and decide on (please). I think the catch is just based on "Exception" but I haven't looked at it closely. If you have to wrap the original exception that's fine.
            Hide
            Ankit Agarwal added a comment -

            code wise throwing any exception should be fine, however looking at rest of the backup steps we are throwing "backup_step"exception" in all those.

            Adding Eloy and Mark, to see if they have any strong opinion on which exception to throw here.

            Show
            Ankit Agarwal added a comment - code wise throwing any exception should be fine, however looking at rest of the backup steps we are throwing "backup_step"exception" in all those. Adding Eloy and Mark, to see if they have any strong opinion on which exception to throw here.
            Hide
            Damyon Wiese added a comment -

            I think it would make most sense from the calling code to be able to catch backup_step_exception only - so please wrap it.

            Show
            Damyon Wiese added a comment - I think it would make most sense from the calling code to be able to catch backup_step_exception only - so please wrap it.
            Hide
            Ankit Agarwal added a comment -

            updated, thanks

            Show
            Ankit Agarwal added a comment - updated, thanks
            Hide
            Damyon Wiese added a comment -

            Thanks Ankit,

            Looks good now. Integrated to 24, 25 and master.

            Show
            Damyon Wiese added a comment - Thanks Ankit, Looks good now. Integrated to 24, 25 and master.
            Hide
            Eloy Lafuente (stronk7) added a comment -
            Show
            Eloy Lafuente (stronk7) added a comment - Boom, this is leading to phpunit failures in stable branches (master passes apparently): 25_STABLE: http://integration.moodle.org/view/M25/job/20.%20Run%20phpunit%20UnitTests%20(25_STABLE)/470/ 24_STABLE: http://integration.moodle.org/view/M24/job/07.%20Run%20phpunit%20UnitTests%20(24_STABLE)/941/ Ciao
            Hide
            Eloy Lafuente (stronk7) added a comment -

            Aha, found it. Seems that some bits from the (2.6 only) backup progress tracking/bar were added to 24 and 25 stables… going to fix that and will push after test.

            Show
            Eloy Lafuente (stronk7) added a comment - Aha, found it. Seems that some bits from the (2.6 only) backup progress tracking/bar were added to 24 and 25 stables… going to fix that and will push after test.
            Hide
            Eloy Lafuente (stronk7) added a comment - - edited
            Show
            Eloy Lafuente (stronk7) added a comment - - edited Pushed a fix. It was simply this (to both 24 and 25): http://git.moodle.org/gw?p=integration.git;a=commitdiff;h=644a0d0023484e7277150235cf94e66e92ae310b Ciao
            Hide
            Ankit Agarwal added a comment -

            Thanks Eloy, for taking care of that.

            Show
            Ankit Agarwal added a comment - Thanks Eloy, for taking care of that.
            Hide
            Dan Poltawski added a comment -

            Testing, testing. 123?

            Show
            Dan Poltawski added a comment - Testing, testing. 123?
            Hide
            Mark Nelson added a comment -

            Is this thing on?

            Show
            Mark Nelson added a comment - Is this thing on?
            Hide
            Michael de Raadt added a comment -

            Check 1 2.

            I just caught up with this and I'm about to head home. I will test this first thing tomorrow morning.

            Show
            Michael de Raadt added a comment - Check 1 2. I just caught up with this and I'm about to head home. I will test this first thing tomorrow morning.
            Hide
            Michael de Raadt added a comment -

            Some notes as I progress through this test...

            I have now tested this in Master with a course with 8gb of files. Relevant to this issue:

            • The automated backups failed on the large course and this was shown in the email sent to the admin and in the report. There was no indication in the terminal output from the automated backup script that there was a fail, only that the process was complete.
            • During a manual backup, when the zipping failed due to running out of space, an error was shown.
            • During a manual backup, when the zipping failed due to its size, an error was shown.
            • With the experimental tgz setting on, a backup of the large course was possible. It's worth noting that I'm using Windows.
            • The script had trouble cleaning up the temp folder under windows. It was trying to delete a non-empty dir. This is mostly due to the OS, I think.

            Now onto lesser versions...

            Show
            Michael de Raadt added a comment - Some notes as I progress through this test... I have now tested this in Master with a course with 8gb of files. Relevant to this issue: The automated backups failed on the large course and this was shown in the email sent to the admin and in the report. There was no indication in the terminal output from the automated backup script that there was a fail, only that the process was complete. During a manual backup, when the zipping failed due to running out of space, an error was shown. During a manual backup, when the zipping failed due to its size, an error was shown. With the experimental tgz setting on, a backup of the large course was possible. It's worth noting that I'm using Windows. The script had trouble cleaning up the temp folder under windows. It was trying to delete a non-empty dir. This is mostly due to the OS, I think. Now onto lesser versions...
            Hide
            Michael de Raadt added a comment -

            I've tested this in 2.4 and 2.5. I ran into an issue there where automated course backups were failing after a large course backup failed. I don't think this is specifically related to this issue. I've created MDL-42422 to describe the problem. Ankit is having a quick look into this.

            Show
            Michael de Raadt added a comment - I've tested this in 2.4 and 2.5. I ran into an issue there where automated course backups were failing after a large course backup failed. I don't think this is specifically related to this issue. I've created MDL-42422 to describe the problem. Ankit is having a quick look into this.
            Hide
            Michael de Raadt added a comment -

            Test result: Success!

            The issue I found in 2.5 and 2.4 was pre-existing.

            Show
            Michael de Raadt added a comment - Test result: Success! The issue I found in 2.5 and 2.4 was pre-existing.
            Hide
            Dan Poltawski added a comment -

            Hurrah! Thanks for your contribution - this fix is part of Moodle.

            Show
            Dan Poltawski added a comment - Hurrah! Thanks for your contribution - this fix is part of Moodle.
            Hide
            Mary Cooch added a comment -

            Removing the docs_required label as there doesn't seem to be anything needing documenting, nor any screenshots needing updating. If not, please describe what needs doing in a comment and re-add the docs_required label.

            Show
            Mary Cooch added a comment - Removing the docs_required label as there doesn't seem to be anything needing documenting, nor any screenshots needing updating. If not, please describe what needs doing in a comment and re-add the docs_required label.
            Hide
            Ankit Agarwal added a comment -

            Looking back at this, doesn't look like it needs any further documentation.

            cheers

            Show
            Ankit Agarwal added a comment - Looking back at this, doesn't look like it needs any further documentation. cheers

              People

              • Votes:
                1 Vote for this issue
                Watchers:
                11 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved:

                  Agile