Moodle
  1. Moodle
  2. MDL-37877

Over-large backups fail transparently, then do not get deleted next time, so space fills up

    Details

    • Testing Instructions:
      Hide
      1. Turn on full debug.
      2. Set time limit in your php ini something very large like a day.
      3. set time limit for a backup step by changing core_backup_progress::TIME_LIMIT_WITHOUT_PROGRESS to a few hours. (located in backup>util>progress>core_backup_progress.class.php)
      4. Generate a course with lots of activities/sections etc.
      5. Add atleast 8-10 GB of media content to the course spread over multiple activities.
      6. Set the following configs:-
        1. backup_auto_active - manual
        2. backup_auto_storage - specified directory
        3. backup_auto_destination - path to some writeable dir
      7. Run the script admin/cli/automated_backup.php
      8. Make sure backup was created for all courses except the large one. The large one would fail.
      9. Goto My home ► Site administration ► Reports ► Backups
      10. Make sure the status of all courses except the large course is "OK"
      11. Re run the backup script.
      12. Again, make sure backup was created for all courses except the large one.
      13. Goto My home ► Site administration ► Reports ► Backups
      14. Make sure the status of all courses except the large course is "OK"
      15. Status for the large course is "ERROR"
      16. Try backing up the courses using web interface. The large course backup should fail, generating an error. Any other course should go as expected.
      Show
      Turn on full debug. Set time limit in your php ini something very large like a day. set time limit for a backup step by changing core_backup_progress::TIME_LIMIT_WITHOUT_PROGRESS to a few hours. (located in backup>util>progress>core_backup_progress.class.php) Generate a course with lots of activities/sections etc. Add atleast 8-10 GB of media content to the course spread over multiple activities. Set the following configs:- backup_auto_active - manual backup_auto_storage - specified directory backup_auto_destination - path to some writeable dir Run the script admin/cli/automated_backup.php Make sure backup was created for all courses except the large one. The large one would fail. Goto My home ► Site administration ► Reports ► Backups Make sure the status of all courses except the large course is "OK" Re run the backup script. Again, make sure backup was created for all courses except the large one. Goto My home ► Site administration ► Reports ► Backups Make sure the status of all courses except the large course is "OK" Status for the large course is "ERROR" Try backing up the courses using web interface. The large course backup should fail, generating an error. Any other course should go as expected.
    • Affected Branches:
      MOODLE_24_STABLE, MOODLE_26_STABLE
    • Fixed Branches:
      MOODLE_24_STABLE, MOODLE_25_STABLE
    • Pull 2.4 Branch:
      MDL-37877-m24
    • Pull 2.5 Branch:
      MDL-37877-m25
    • Pull Master Branch:
      MDL-37877-master
    • Story Points:
      40
    • Rank:
      52654
    • Sprint:
      BACKEND Sprint 6

      Description

      This refers to scheduled backups, where the number of versions kept is set to one. This customer site has about half a dozen very large courses, where the backup system generates vary large files (two examples from the test I ran this morning are 7.8GB and 9.2GB). These are obviously too large to be successfully zipped, and when the backup processes them, it eventually fails with the error like : "Error - backup-moodle2-course-1681-20130205-1034.mbz does not appear to be a valid backup (missing_moodle_backup_xml_file)"

      Then (in the log) it says "complete - next execution (the next scheduled backup time)"

      So it fails, presumably because it is too big - but then carries on. This in itself is good, as previously, failures like this were stopping the backup schedule completely.

      However, the next time that the scheduled course backup is run, all of the old versions of the backups are replaced (remember, we are only keeping one of each) EXCEPT the failed backups. Moodle does not seem to recognise them as valid backups, despite their filenames, so they are NOT deleted. Over the course of several days, the multiple copies of large failed backups that result from this eventually fill up the server disk, and cause teh site to fail.

      I know that this relates to the larger subject of handling very large course backups, but can anything be done with the checking process, so that these large failed backup files do not get left to fill up the server?

        Issue Links

          Activity

          Hide
          Sean Keogh added a comment - - edited

          Could someone take a look at this please - it is now affecting more than one of our customers - and if we don't remember to go into their server every few days to delete the failed backups, their disk fills up and the site fails, and several thousand students and teaching staff want to jump up and down on my head!.

          Show
          Sean Keogh added a comment - - edited Could someone take a look at this please - it is now affecting more than one of our customers - and if we don't remember to go into their server every few days to delete the failed backups, their disk fills up and the site fails, and several thousand students and teaching staff want to jump up and down on my head!.
          Hide
          Michael de Raadt added a comment -

          Thanks for reporting that.

          Feel free to contribute to fixing this issue.

          Show
          Michael de Raadt added a comment - Thanks for reporting that. Feel free to contribute to fixing this issue.
          Hide
          Ankit Agarwal added a comment -

          Hi Sean,
          Can you please provide further information on this please.

          1. What do you get the status of backups in the emails sent?
          2. What are the status of the course in context when you go to My home ► Site administration ► Reports ► Backups? Is it "OK" or "unfinished" or "Error"?

          Thanks

          Show
          Ankit Agarwal added a comment - Hi Sean, Can you please provide further information on this please. What do you get the status of backups in the emails sent? What are the status of the course in context when you go to My home ► Site administration ► Reports ► Backups? Is it "OK" or "unfinished" or "Error"? Thanks
          Hide
          Sean Keogh added a comment -

          I don't get the emails, but I'll have a look at the reports and get back to you.

          Show
          Sean Keogh added a comment - I don't get the emails, but I'll have a look at the reports and get back to you.
          Hide
          Sean Keogh added a comment -

          Hiya,

          It reports in the backup log as completing OK. The .mbz file is almost 17GB in size, which I suspect is not a valid backup...

          Show
          Sean Keogh added a comment - Hiya, It reports in the backup log as completing OK. The .mbz file is almost 17GB in size, which I suspect is not a valid backup...
          Hide
          Ankit Agarwal added a comment - - edited

          I was finally able to replicate this at will. The problem was zip archiver didn't know an error occurred and it generated a corrupt file with a status of OK.
          So the first part of this patch deals to identify such cases and throw an exception. We monitor the generated backup file and see if it is valid zip or not. If found to be corrupt we throw an exception.
          Second part of the solution makes sure we delete any generated backup in case of any error. So we don't store half baked files.

          The issue in context should be resolved by this patch. However Zip_archive::open() throws a debug (developer level) notice when an corrupt backup is identified. I am not sure what is the best way to suppress this notice or even if we should suppress this notice. I will appreciate feedback on that from the reviewer.

          Show
          Ankit Agarwal added a comment - - edited I was finally able to replicate this at will. The problem was zip archiver didn't know an error occurred and it generated a corrupt file with a status of OK. So the first part of this patch deals to identify such cases and throw an exception. We monitor the generated backup file and see if it is valid zip or not. If found to be corrupt we throw an exception. Second part of the solution makes sure we delete any generated backup in case of any error. So we don't store half baked files. The issue in context should be resolved by this patch. However Zip_archive::open() throws a debug (developer level) notice when an corrupt backup is identified. I am not sure what is the best way to suppress this notice or even if we should suppress this notice. I will appreciate feedback on that from the reviewer.
          Hide
          Ankit Agarwal added a comment -

          Also Sean,
          If you are able to test the patch, that would be great.

          Thanks

          Show
          Ankit Agarwal added a comment - Also Sean, If you are able to test the patch, that would be great. Thanks
          Hide
          Sean Keogh added a comment -

          I can certainly give it a try. I will report back.

          Show
          Sean Keogh added a comment - I can certainly give it a try. I will report back.
          Hide
          Petr Škoda added a comment -

          looks ok, +1, submitting for integration

          Show
          Petr Škoda added a comment - looks ok, +1, submitting for integration
          Hide
          Ankit Agarwal added a comment -

          Thanks for the review Petr.

          I have backported this and added docs label.

          Also this needs to be mentioned on release notes once integrated.

          Show
          Ankit Agarwal added a comment - Thanks for the review Petr. I have backported this and added docs label. Also this needs to be mentioned on release notes once integrated.
          Hide
          Eloy Lafuente (stronk7) added a comment -

          The main moodle.git repository has just been updated with latest weekly modifications. You may wish to rebase your PULL branches to simplify history and avoid any possible merge conflicts. This would also make integrator's life easier next week.

          TIA and ciao

          Show
          Eloy Lafuente (stronk7) added a comment - The main moodle.git repository has just been updated with latest weekly modifications. You may wish to rebase your PULL branches to simplify history and avoid any possible merge conflicts. This would also make integrator's life easier next week. TIA and ciao
          Hide
          Damyon Wiese added a comment -

          Thanks Ankit - the patch looks OK - but I would like a small change.

          Assuming that the reason the backup failed is because of the size of the course is IMO an unhelpful guess - on master they could be using tgz backups which allow much larger course backups.

          I think you are catching a useful exception here and throwing a less useful one. Can you just rethrow the useful one?

          Show
          Damyon Wiese added a comment - Thanks Ankit - the patch looks OK - but I would like a small change. Assuming that the reason the backup failed is because of the size of the course is IMO an unhelpful guess - on master they could be using tgz backups which allow much larger course backups. I think you are catching a useful exception here and throwing a less useful one. Can you just rethrow the useful one?
          Hide
          Ankit Agarwal added a comment -

          Shouldn't the backup step throw only backup_step_exception? or is it aright to throw other exceptions here?

          Show
          Ankit Agarwal added a comment - Shouldn't the backup step throw only backup_step_exception? or is it aright to throw other exceptions here?
          Hide
          Damyon Wiese added a comment -

          That's for you to check and decide on (please). I think the catch is just based on "Exception" but I haven't looked at it closely. If you have to wrap the original exception that's fine.

          Show
          Damyon Wiese added a comment - That's for you to check and decide on (please). I think the catch is just based on "Exception" but I haven't looked at it closely. If you have to wrap the original exception that's fine.
          Hide
          Ankit Agarwal added a comment -

          code wise throwing any exception should be fine, however looking at rest of the backup steps we are throwing "backup_step"exception" in all those.

          Adding Eloy and Mark, to see if they have any strong opinion on which exception to throw here.

          Show
          Ankit Agarwal added a comment - code wise throwing any exception should be fine, however looking at rest of the backup steps we are throwing "backup_step"exception" in all those. Adding Eloy and Mark, to see if they have any strong opinion on which exception to throw here.
          Hide
          Damyon Wiese added a comment -

          I think it would make most sense from the calling code to be able to catch backup_step_exception only - so please wrap it.

          Show
          Damyon Wiese added a comment - I think it would make most sense from the calling code to be able to catch backup_step_exception only - so please wrap it.
          Hide
          Ankit Agarwal added a comment -

          updated, thanks

          Show
          Ankit Agarwal added a comment - updated, thanks
          Hide
          Damyon Wiese added a comment -

          Thanks Ankit,

          Looks good now. Integrated to 24, 25 and master.

          Show
          Damyon Wiese added a comment - Thanks Ankit, Looks good now. Integrated to 24, 25 and master.
          Hide
          Eloy Lafuente (stronk7) added a comment -
          Show
          Eloy Lafuente (stronk7) added a comment - Boom, this is leading to phpunit failures in stable branches (master passes apparently): 25_STABLE: http://integration.moodle.org/view/M25/job/20.%20Run%20phpunit%20UnitTests%20(25_STABLE)/470/ 24_STABLE: http://integration.moodle.org/view/M24/job/07.%20Run%20phpunit%20UnitTests%20(24_STABLE)/941/ Ciao
          Hide
          Eloy Lafuente (stronk7) added a comment -

          Aha, found it. Seems that some bits from the (2.6 only) backup progress tracking/bar were added to 24 and 25 stables… going to fix that and will push after test.

          Show
          Eloy Lafuente (stronk7) added a comment - Aha, found it. Seems that some bits from the (2.6 only) backup progress tracking/bar were added to 24 and 25 stables… going to fix that and will push after test.
          Hide
          Eloy Lafuente (stronk7) added a comment - - edited
          Show
          Eloy Lafuente (stronk7) added a comment - - edited Pushed a fix. It was simply this (to both 24 and 25): http://git.moodle.org/gw?p=integration.git;a=commitdiff;h=644a0d0023484e7277150235cf94e66e92ae310b Ciao
          Hide
          Ankit Agarwal added a comment -

          Thanks Eloy, for taking care of that.

          Show
          Ankit Agarwal added a comment - Thanks Eloy, for taking care of that.
          Hide
          Dan Poltawski added a comment -

          Testing, testing. 123?

          Show
          Dan Poltawski added a comment - Testing, testing. 123?
          Hide
          Mark Nelson added a comment -

          Is this thing on?

          Show
          Mark Nelson added a comment - Is this thing on?
          Hide
          Michael de Raadt added a comment -

          Check 1 2.

          I just caught up with this and I'm about to head home. I will test this first thing tomorrow morning.

          Show
          Michael de Raadt added a comment - Check 1 2. I just caught up with this and I'm about to head home. I will test this first thing tomorrow morning.
          Hide
          Michael de Raadt added a comment -

          Some notes as I progress through this test...

          I have now tested this in Master with a course with 8gb of files. Relevant to this issue:

          • The automated backups failed on the large course and this was shown in the email sent to the admin and in the report. There was no indication in the terminal output from the automated backup script that there was a fail, only that the process was complete.
          • During a manual backup, when the zipping failed due to running out of space, an error was shown.
          • During a manual backup, when the zipping failed due to its size, an error was shown.
          • With the experimental tgz setting on, a backup of the large course was possible. It's worth noting that I'm using Windows.
          • The script had trouble cleaning up the temp folder under windows. It was trying to delete a non-empty dir. This is mostly due to the OS, I think.

          Now onto lesser versions...

          Show
          Michael de Raadt added a comment - Some notes as I progress through this test... I have now tested this in Master with a course with 8gb of files. Relevant to this issue: The automated backups failed on the large course and this was shown in the email sent to the admin and in the report. There was no indication in the terminal output from the automated backup script that there was a fail, only that the process was complete. During a manual backup, when the zipping failed due to running out of space, an error was shown. During a manual backup, when the zipping failed due to its size, an error was shown. With the experimental tgz setting on, a backup of the large course was possible. It's worth noting that I'm using Windows. The script had trouble cleaning up the temp folder under windows. It was trying to delete a non-empty dir. This is mostly due to the OS, I think. Now onto lesser versions...
          Hide
          Michael de Raadt added a comment -

          I've tested this in 2.4 and 2.5. I ran into an issue there where automated course backups were failing after a large course backup failed. I don't think this is specifically related to this issue. I've created MDL-42422 to describe the problem. Ankit is having a quick look into this.

          Show
          Michael de Raadt added a comment - I've tested this in 2.4 and 2.5. I ran into an issue there where automated course backups were failing after a large course backup failed. I don't think this is specifically related to this issue. I've created MDL-42422 to describe the problem. Ankit is having a quick look into this.
          Hide
          Michael de Raadt added a comment -

          Test result: Success!

          The issue I found in 2.5 and 2.4 was pre-existing.

          Show
          Michael de Raadt added a comment - Test result: Success! The issue I found in 2.5 and 2.4 was pre-existing.
          Hide
          Dan Poltawski added a comment -

          Hurrah! Thanks for your contribution - this fix is part of Moodle.

          Show
          Dan Poltawski added a comment - Hurrah! Thanks for your contribution - this fix is part of Moodle.

            People

            • Votes:
              1 Vote for this issue
              Watchers:
              8 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Agile