MDL-38189

META Backup/restore issues with large courses

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 2.3.4, 2.4.1
    • Fix Version/s: None
    • Component/s: Backup
    • Affected Branches:
      MOODLE_23_STABLE, MOODLE_24_STABLE
    • Rank:
      48024

      Description

      I wrote a document describing a range of issues identified with backup/restore that prevent it functioning for large courses (either 'large' as in many activities, or 'large' as in large file sizes).

      Michael asked us to file these in the Moodle tracker. I've made this issue to track them, and filed subtasks for all the specific items.

      We don't intend to implement these ourselves at the present time. Also to note, I have set 'Affects version' as 2.3.x because that's where testing was done, but any changes on this scale clearly would not happen until a future Moodle major version.

      Here is some context about use of backup and restore at the Open University, which may explain why these issues cause us problems. We expect some of these factors will apply to other institutions too, now or in future.

      Many of our sites have the following characteristics which affect backup and restore performance:

      • A large number of activities. (Max: 1,770 activities. Median: 70. 39 courses with 500+.)
      • A large number of sections. (Max: 754 sections. Median: 100. 42 courses with 200+.)
      • A large number of files (due to the use of interactive activities and per-user data). (Max: 57,000 different files. Median: 70. 66 courses with 1,000+.)
      • Large files. Due to the provision of high-quality videos (and other files that include videos, such as EPUB3 ebooks), some of our sites contain a number of large files which add up to a large total size. (Max: 11GB in total. Median: 50MB. 129 courses have 1GB+.) (Note: These sizes include duplicate files with same contenthash; I wasn’t able to make a fast enough query that takes this into account.)
      • A large number of users (several thousand).
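
      The note above about duplicate files can be made concrete with a small sketch. It assumes each stored file is available as a (course, contenthash, size) record; the `FileRecord` type, its field names, and the sample data are hypothetical and are not taken from the Moodle schema or from the query used at the OU. It compares the raw total with the total when each contenthash is counted once per course.

```python
from __future__ import annotations

from typing import Iterable, NamedTuple


class FileRecord(NamedTuple):
    """One stored-file row; field names are illustrative, not a real schema."""
    course_id: int
    contenthash: str
    filesize: int  # bytes


def course_totals(records: Iterable[FileRecord]) -> dict[int, tuple[int, int]]:
    """Return {course_id: (raw_bytes, deduplicated_bytes)}."""
    raw: dict[int, int] = {}
    dedup: dict[int, int] = {}
    seen: dict[int, set[str]] = {}
    for rec in records:
        # Raw total counts every row, including repeated content.
        raw[rec.course_id] = raw.get(rec.course_id, 0) + rec.filesize
        # Deduplicated total counts each contenthash once per course.
        hashes = seen.setdefault(rec.course_id, set())
        if rec.contenthash not in hashes:
            hashes.add(rec.contenthash)
            dedup[rec.course_id] = dedup.get(rec.course_id, 0) + rec.filesize
    return {cid: (raw[cid], dedup.get(cid, 0)) for cid in raw}


sample = [
    FileRecord(1, "aaa", 500),
    FileRecord(1, "aaa", 500),  # same content stored twice in course 1
    FileRecord(1, "bbb", 200),
    FileRecord(2, "aaa", 500),
]
totals = course_totals(sample)
```

      Keeping a set of hashes per course uses memory proportional to the number of distinct files, which is one reason the same calculation can be slow when done as a single database query across many courses.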

      We use backup and restore in two ways:

      • The standard Moodle backup and restore features for certain manual tasks.
      • A custom script to ‘roll forward’ a course to the next presentation (this uses the backup API and then the restore API, to automate the process).

        Sub-Tasks

        1. Backup and restore operations should display progress (Sub-task, Closed, Sam Marshall)
        2. Backup and restore should use large memory limit (Sub-task, Closed, Sam Marshall)
        3. Backup and restore: Allow selection by activity type (Sub-task, Closed, Sam Marshall)
        4. Backup and restore should be tested with large backup files (Sub-task, Closed, David Monllaó)
        5. Large backup files should be downloadable (Sub-task, Closed, Sam Marshall)
        6. Backup/restore API should have better option to skip 'zip' operation (Sub-task, Closed, moodle.com)
        7. Backup and restore should display log (Sub-task, Closed, Sam Marshall)
        8. Automated large-course generation for testing (Sub-task, Closed, Sam Marshall)
        9. Large course won't backup due to POST limit (Sub-task, Closed, Sam Marshall)
        10. Backup progress: add progress tracking inside long-running steps (Sub-task, Closed, Sam Marshall)
        11. Backup and restore progress: include file copies (Sub-task, Closed, Sam Marshall)
        12. Backup and restore progress: include forum discussions (Sub-task, Closed, Sam Marshall)
        13. Large course restore fails due to time limit when unzipping (Sub-task, Closed, Sam Marshall)
        14. Large course restore fails due to time limit on schema page (Sub-task, Closed, Sam Marshall)
        15. Large course restore times out on Review page (Sub-task, Closed, Sam Marshall)
        16. Restore: Need a way to display progress during UI stages (Sub-task, Closed, Sam Marshall)
        17. Restore: Processing restore times out in precheck (Sub-task, Closed, Michael Aherne)
        18. Restore: Progress bar needs to include more tasks (Sub-task, Closed, Sam Marshall)
        19. Backup: Very large course times out on user interface pages (Sub-task, Closed, Sam Marshall)
        20. Restore: Very large course times out (Sub-task, Closed, Sam Marshall)
        21. Improve time limit handling (including configuration for front-end servers) (Sub-task, Closed, Sam Marshall)

          Activity

          Michael de Raadt added a comment -

          Thanks for bringing this all together, Sam. And thanks to the other people who were involved at the OU. I will look for links to similar issues and triage each of these issues individually.

          Michael de Raadt added a comment -

          I've added a few issues that I thought were directly related as sub-tasks and linked a few more issues that were related, but not directly.

          Robert Russo added a comment - edited

          I've got an issue where, when a backup has user data selected, Moodle includes ALL user profiles in the backup.

          This only happens on some courses.

          With 45,000 plus users, this makes a backup of a minimum of 6.4GB from the profile pictures alone.

          Still digging. Affects 200 of our 12,000 courses.

          **UPDATE**
          This is related to a now-solved problem with our DB. Grade_grades records had been inserted for everyone in the system in the affected courses, causing all of those users to be backed up during any backup that included enrolled users.

          Please ignore.

          Sam Marshall added a comment -

          Info for people watching this issue: I have now started work on developments to address several of these problems (which I've assigned myself to), and intend to take on some more once I finish those.

          Once things are ready for review (on this list or on blockers for them), I'd appreciate reviews from HQ developers to speed the process. I'm hoping to do as much development as I can fit in during this week and next; after that I'm on holiday for a bit. Obviously I can continue when I return if I don't finish.

          Either way I am definitely hoping to get this problem at least largely solved (along with the things that other people have already fixed, thank you) in the Moodle 2.6 timeframe.

          Sam Marshall added a comment -

          I'm about to go away for two weeks so I've posted an update, in the file backup-issues.png, of where I've got to with this so far.

          The diagram shows what I've submitted (or in the case of the pink-border item, not yet submitted) and whether it's been reviewed or committed or not, along with dependencies between items.

          The pink-background area is stuff necessary for testing, by being able to generate large courses. This is nearly done.

          The yellow-background area is stuff required to successfully back up an 'L'-sized test course. With the commits shown, I have been able to achieve this backup. (It's a 900MB backup file.)

          The blue-background area is stuff required (in addition to the yellow-background changes) to successfully restore the backup created above. I have not yet been able to successfully achieve this restore, but I've got all the way to the final 'Processing restore' step.

          Please note that this diagram does NOT show all the issues; even when these issues are dealt with, I'm not finished. More changes will be necessary to make the 'XL' course work. However, making the 'L' course work would be a very good start - many of the real courses at our institution are approximately at the 'L' size.

          It will help if, during the fortnight in which I'm away, HQ developers could review the changes that are awaiting peer review - especially those with many dependencies. Getting these reviewed and into the code should significantly simplify development on the remaining tasks.

          I'd like to thank everyone who's helped comment on and review issues so far. This is a big effort for other people as well as me, but I think we should be able to get it to work in the end.

          Michael Aherne added a comment - edited

          It's interesting that this issue only mentions "'large' as in many activities, or 'large' as in large file sizes", as I've generally found that the courses we have trouble with (in terms of time to restore or timeout during restore) tend to be ones with large numbers of users. This has been borne out by some testing I did on MDL-41254, where an otherwise empty course with 4000 users was timing out, and also (maybe) by two of the subtasks here, MDL-41254 and MDL-41167, which both appear to have timed out during the user processing. When profiling the restore of the empty test class, I counted 44 million+ function calls, which is considerably higher than any other script I have ever seen!

          Does anyone have a feel for whether the user processing during restore is necessarily processor-intensive, or could there be some problem with this particular bit of code?

          In any case, it might be good to change the issue description to mention large numbers of users too.

          Sam Marshall added a comment -

          Michael: You're right, this can also be a problem. If you look at the test script I built for this, MDL-38197, you'll see that it does in fact also make courses with a large number of users, up to 100,000 in the largest option. So we didn't forget about it. I've added it to the description here.

          Sam Marshall added a comment -

          I've updated my diagram to show the current situation (as I understand it). Specifically:

          1. Some key changes have now been integrated.
          2. My other completed changes have now been submitted for integration review.
          3. I've changed the box relating to the max_input_vars fix as somebody else provided what looks almost certain to be a better solution than mine.

          Thanks very much to everyone who's done peer/integration reviews and contributed to this. It's getting significantly closer to working.

          I'm going to resume working on the remaining issues now.

          Sam Marshall added a comment -

          Progress update:

          After this week's set of peer and integration reviews and testing (thanks all!), many of my issues (and MDL-41451 fixed by Paul Nicholls) have been integrated.

          This means that, after just one more issue currently awaiting peer review (MDL-41669), the L test course will successfully backup/restore on my server.

          I've coded some minor/related issues and am looking at issues with the XL test course, but definitely we are getting to the end of this process now. Yay!

          Sam Marshall added a comment -

          Progress update:

          I have attached the latest version of the diagram showing all the issues I've been working on (or associated with) related to backup/restore with large courses.

          Thanks to everyone, especially the HQ developers who have done peer reviews and integration reviews, for helping with these changes.

          As of today's Moodle 2.6 master build, the changes that have already been integrated mean that it should be possible to backup and restore the 'L' test course without getting errors, which is a big improvement.

          Also of note - including the other changes (those awaiting peer review) I have even managed to backup the 'XL' test course on my developer server. For background, the 'XL' course is larger* than every real course we have on our system here. It has 5,000 activities in 1,000 sections; we don't have any real courses with more than 2,000 activities. So it really is quite a large course (and I will not be testing with the XXL course!)

          * Okay, one exception: in terms of filesize, we do have a course with 16GB of files compared to 'only' 9GB in the XL test course, but that's the only one, and it's smaller in every other respect.

          My development work on this is winding down (which is good because we're getting near code freeze). Next I intend to make it restore the XL course.

          Sam Marshall added a comment -

          I've attached a new version of the 'all bugs that are part of this' diagram.

          I've removed the dependency arrows (for the later changes, I couldn't really tell which of the earlier changes they depended on, so the arrows didn't make sense). There's a new indication next to each issue of whether it's already in Moodle 2.5.x or whether it's been backported into the OU code. (Will write second comment re backport.)

          Sam Marshall added a comment -

          Because fixing this problem involved several API changes, I didn't intend to backport most of these changes into core Moodle 2.5.x. However, I've done a backport for the OU as we want these changes on our Moodle 2.5 system.

          Because it might be useful for others, I'm providing it here as a patch which currently applies to MOODLE_25_STABLE (it will probably rot fairly quickly as so many files are affected, so unless you apply this week, I expect you'll need to resolve conflicts). However, there are some limitations to this backport because of the way we manage backports locally:

          • The change is in a single commit, not one commit per issue.
          • I've added 'ou-specific begins/ends' tags around every single changed chunk of code, and the original code is included (commented-out) where it was changed or deleted. This means that even though I did delete or modify code, the git stat shows 0 deletions (it's all still there but commented out).
          • Where a chunk of the change only affected comments (e.g. phpdoc corrections/additions/improvements, of which there were quite a lot), this has not been included.
          • This has been tested on OU Moodle (a bit), but I haven't tested it against MOODLE_25_STABLE, and there were a handful of conflicts when I applied it, which I fixed without trying the code again (so something might be broken).

          Here's a URL to the commit on my GitHub:

          https://github.com/sammarshallou/moodle/commit/6f1f72419f1f72cc0416ee9eafce1d1a868c37f2

          I don't really have time to do any more work on this backport at present (sorry) as I was expecting that, due to the API changes, this would be 2.6-only for core; but of course others should feel free, e.g. if you want to write a script to strip out the ou-specific junk and actually delete the deleted bits, it shouldn't be too hard... Or you could redo the backport yourself based on the issues shown in the diagram also attached to this issue, but bear in mind there are a few points where it needed changing for 2.5 rather than the straightforward port. (Not too many though!)

          For information, this change affects 52 files with 4,524 lines inserted (as noted a good number of these will be the ou-specific begins / ends and /* */ lines).
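
          A minimal sketch of the kind of cleanup script suggested above, written in Python. The marker format is an assumption, since the patch itself was not inspected: the sketch supposes each changed chunk is wrapped in line comments containing `ou-specific begins` and `ou-specific ends`, with the original code kept in a `/* ... */` block whose delimiters sit on their own lines. The function name `strip_ou_markers` is hypothetical.

```python
def strip_ou_markers(source: str) -> str:
    """Drop marker lines and commented-out original code inside marked regions.

    Assumed (hypothetical) layout of a changed chunk:
        // ou-specific begins
        /*
        ...original code...
        */
        ...replacement code...
        // ou-specific ends
    """
    out = []
    in_region = False   # between a 'begins' and an 'ends' marker line
    in_comment = False  # inside the commented-out original block
    for line in source.splitlines():
        stripped = line.strip()
        if not in_region:
            if "ou-specific begins" in stripped:
                in_region = True       # drop the marker line itself
            else:
                out.append(line)       # untouched code outside any region
            continue
        if in_comment:
            if stripped == "*/":
                in_comment = False     # end of the old code; drop delimiter
            continue                   # drop the old code lines
        if "ou-specific ends" in stripped:
            in_region = False          # drop the marker line itself
        elif stripped == "/*":
            in_comment = True          # start of the commented-out original
        else:
            out.append(line)           # keep the replacement code
    return "\n".join(out) + ("\n" if out else "")


example = """\
$a = 1;
// ou-specific begins
/*
$result = old_call();
*/
$result = new_call();
// ou-specific ends
$b = 2;
"""
cleaned = strip_ou_markers(example)
```

          Anything outside a marked region is left alone, so ordinary block comments elsewhere in a file survive. A legitimate comment opened by a bare `/*` line inside a marked region would be removed too, so the result should be reviewed as a diff before committing.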

          Tony Butler added a comment -

          I've been testing this backport this week. It applied cleanly enough to MOODLE_25_STABLE with no conflicts, but I've found a couple of bugs so far:

          The last step of import, just after the progress bar has reached 100%, errors out with "Coding error detected, it must be fixed by a programmer: Invalid state passed to moodle_page::set_state. We are in state 2 and state 1 was requested". The import completes successfully though, and it seems to be fixed by removing the (now duplicate) 'echo $OUTPUT->header();' line from backup/import.php (around line 194).

          Every backup ends with "Undefined variable: result in /srv/www/htdocs/backup/moodle2/backup_stepslib.php on line 1765", because the line '$result = $zippacker->archive_to_pathname($files, $zipfile, true, $this);' has been commented out.

          Cheers,
          Tony

          Sam Marshall added a comment -

          Thanks for testing this, Tony!

          Regarding your two points:

          1) Ooops! Sorry. Agree with your fix. I checked and this looks OK in master, so I must have just messed up in the backport.

          2) Again, your fix looks good. This is because of a difference in the OU Moodle version I was basing this on (2.5.2) and the current MOODLE_25_STABLE which introduced the '$result' value - MDL-37877.

          Michael de Raadt added a comment -

          As all the sub-tasks of this issue have now been resolved, I'm closing this meta issue.

          Well done to all involved, especially Sam.


            People

            • Votes: 7
            • Watchers: 13
