Moodle

Restore XML parsing: Improve for speed by preprocessing and splitting moodle.xml

Details

  • Type: Improvement
  • Status: Closed
  • Priority: Minor
  • Resolution: Fixed
  • Affects Version/s: 1.9.4
  • Fix Version/s: 1.9.5
  • Component/s: Backup
  • Labels:
    None
  • Affected Branches:
    MOODLE_19_STABLE
  • Fixed Branches:
    MOODLE_19_STABLE

Description

While researching http://docs.moodle.org/en/Development:Backup_2.0_-_Improve_XML_parsing and some important bugs like MDL-15489, one of the things being planned for 2.0 is to split the current moodle.xml file into a bunch of smaller files like:

  • info
  • course_header
  • course_structure
  • users
  • modules (one file per activity)
  • blocks
  • ...

This will allow the parser to do its job (reading contents from XML and pushing them to in-memory structures) much quicker than the current approach, where the whole moodle.xml file is parsed up to 19 times (19 is the number of "TODO"s, the parts present in a moodle.xml that are parsed separately).

While that split is going to be done in Moodle 2.0 backup (in order to allow restore to handle those smaller files directly), I thought that it could also be interesting to perform some split in Moodle 1.9.

After some (a lot of!) coding and testing, and comparing results, I've ended up with:

1) One parser (moodle_splitter_parser) able to split the moodle.xml file into 19 smaller files.
2) A hack to the main restore parser (MoodleParser) so that it uses those split files instead of the original moodle.xml.
3) All that code put under one experimental $CFG->experimentalsplitrestore configuration setting.
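
To illustrate the idea behind point 1, here is a minimal sketch in Python. It is not the actual moodle_splitter_parser code. It assumes, purely for illustration, that every part to be split is a direct child of the XML root element; the function name `split_top_level` and the output file naming are hypothetical.

```python
import os
import xml.etree.ElementTree as ET


def split_top_level(xml_path, out_dir):
    """Write each direct child of the root element to its own file.

    Returns the list of files written, in document order.
    """
    os.makedirs(out_dir, exist_ok=True)
    written = []
    depth = 0
    root = None
    # Stream the file so the whole document is never held in memory.
    for event, elem in ET.iterparse(xml_path, events=("start", "end")):
        if event == "start":
            depth += 1
            if depth == 1:
                root = elem
        else:
            depth -= 1
            if depth == 1:
                # elem is now a complete direct child of the root.
                elem.tail = None
                name = "%02d_%s.xml" % (len(written), elem.tag.lower())
                path = os.path.join(out_dir, name)
                ET.ElementTree(elem).write(
                    path, encoding="utf-8", xml_declaration=True
                )
                written.append(path)
                # Drop the section once it is on disk to keep memory small.
                root.remove(elem)
    return written
```

Each part is written once in a single streaming pass, so a later parser that only needs one part can open that small file instead of scanning the whole document again.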

While the results aren't noticeable in small backups, processing one 50MB file initially spent 50 seconds performing the split, but the rest of the restore then finished 350 seconds quicker, so we saved 300 seconds of useless parsing time overall. Note that results can vary a lot, depending on the contents of the backup, but in any case the split time should always be smaller than the time saved.
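
As a rough check of those figures, the trade-off is simply the parsing time saved later minus the time spent splitting. A tiny Python sketch, where the function name `net_saving` is hypothetical and the numbers are only the ones quoted for the 50MB backup:

```python
def net_saving(split_seconds, parsing_seconds_saved):
    """Seconds gained overall; a positive value means the split paid off."""
    return parsing_seconds_saved - split_seconds


# Figures quoted for the 50MB backup: 50s splitting, 350s quicker afterwards.
print(net_saving(50, 350))  # 300
```

A negative result would mean the split cost more than it saved, which is the case to watch for when testing very small backups.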

So, I'm going to commit this both to 19_STABLE and HEAD (for the "legacy" Moodle 1.9 => 2.0 restore). It would be great to have people using and testing it, in order to get some feedback before promoting the split strategy to be always enabled. I've restored at least 10 different courses of all sorts of types and sizes and everything is working OK here. Let's see how it evolves.

Ciao

Activity

Eloy Lafuente (stronk7) added a comment -

Attached basic patch that implements the splitter, based on $CFG->experimentalsplitrestore

I've optimised the splitter wherever I could. It has good throughput right now, IMO (and small memory usage).

Missing bits:

  • admin setting frontend.
  • lang string.

Ciao

Martin Dougiamas added a comment -

WOO HOO! My +1 for 1.9 under a non-default option for now. Will be happy to do some testing!

Ray Lawrence added a comment -

Eloy: does this mean 1.9 and 2.0 backups are incompatible?

Eloy Lafuente (stronk7) added a comment -

Code has been committed both to 19_STABLE and HEAD, with one "experimental" setting (under Admin) controlling it (defaults to disabled).

Ray, it depends on what you mean by "incompatible":

  • I'm 100% sure that Moodle 1.9.x backups will restore in 2.0 without problems. No doubt about that. It's a primary objective.
  • I'm 90% sure that Moodle 2.0 backups will have a different internal format (see http://docs.moodle.org/en/Development:Backup_2.0 - especially the ongoing research about parsing). My current idea is that Moodle 2.0 won't have one "monolithic" moodle.xml file containing all the backup info, but a bunch of smaller files containing "specialised parts". That will be a huge improvement both for backup and restore.

Resolving this as fixed... ciao

Helen Foster added a comment -

Eloy, thanks a lot for this improvement

Information added to the documentation wiki:

http://docs.moodle.org/en/Backup_and_restore_FAQ
http://docs.moodle.org/en/Moodle_1.9.5_release_notes

Eloy Lafuente (stronk7) added a comment -

Great! Thanks Helen! B-)


Dates

  • Created:
    Updated:
    Resolved: