
Key: MDL-18468
Type: Improvement
Status: Resolved
Resolution: Fixed
Priority: Minor
Assignee: Eloy Lafuente (stronk7)
Reporter: Eloy Lafuente (stronk7)
Votes: 1
Watchers: 4
Moodle

Restore XML parsing: Improve for speed by preprocessing and splitting moodle.xml

Created: 07/Mar/09 07:44 AM   Updated: 10/Mar/09 06:07 PM
Component/s: Backup
Affects Version/s: 1.9.4
Fix Version/s: 1.9.5

File Attachments: 1. Text File MDL-18468.basic.patch.txt (16 kB)


Participants: Eloy Lafuente (stronk7), Helen Foster, Martin Dougiamas and Ray Lawrence
Security Level: None
Resolved date: 10/Mar/09
Affected Branches: MOODLE_19_STABLE
Fixed Branches: MOODLE_19_STABLE


Description
While researching http://docs.moodle.org/en/Development:Backup_2.0_-_Improve_XML_parsing and some important bugs like MDL-15489, I found that one of the things planned for 2.0 is to split the current moodle.xml file into a bunch of smaller files like:
  • info
  • course_header
  • course_structure
  • users
  • modules (one file per activity)
  • blocks
  • ...

This will allow the parser to do its job (reading contents from XML and pushing them into in-memory structures) much faster than the current approach, where the whole moodle.xml file is parsed up to 19 times (19 is the number of "TODO"s, the parts of a moodle.xml file that are parsed separately).

While that split is going to be done in Moodle 2.0 backup (so that restore can handle those smaller files directly), I thought it could also be worthwhile to do some splitting in Moodle 1.9.

After some (a lot!) of coding and testing, and comparing results, I've ended up with:

1) One parser (moodle_splitter_parser) able to split the moodle.xml file into 19 smaller files.
2) A hack to the main restore parser (MoodleParser) so that it uses those split files instead of the original moodle.xml.
3) All that code placed under one experimental configuration setting, $CFG->experimentalsplitrestore.
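
For readers unfamiliar with the approach, here is a minimal sketch of the splitting idea in Python. It is not the code in the attached patch (the real moodle_splitter_parser is PHP inside Moodle). The function name split_top_level, the output file naming, and the assumption that every part is a direct child of the root element are all hypothetical, chosen only to illustrate one streaming pass that writes each part to its own file.

```python
import io
import os
import tempfile
import xml.etree.ElementTree as ET


def split_top_level(source, out_dir):
    """Write each direct child of the root element to its own file.

    Hypothetical helper: assumes every "part" is a direct child of the
    root, which may not match the real moodle.xml layout.
    Returns the list of file paths written, in document order.
    """
    written = []
    depth = 0
    root = None
    # One streaming pass: elements are handled as soon as they close.
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            if depth == 0:
                root = elem
            depth += 1
            continue
        depth -= 1
        if depth == 1:  # a direct child of the root has just closed
            name = f"{len(written):02d}_{elem.tag.lower()}.xml"
            path = os.path.join(out_dir, name)
            elem.tail = None  # do not copy inter-element whitespace
            ET.ElementTree(elem).write(
                path, encoding="utf-8", xml_declaration=True
            )
            written.append(path)
            root.remove(elem)  # the part is on disk, so free its memory
    return written


if __name__ == "__main__":
    sample = b"<BACKUP><INFO><NAME>demo</NAME></INFO><USERS/></BACKUP>"
    with tempfile.TemporaryDirectory() as tmp:
        for part in split_top_level(io.BytesIO(sample), tmp):
            print(os.path.basename(part))
```

Each later restore step could then open only the small file it needs instead of re-reading the whole document, which is where the saving is expected to come from.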

While the results aren't noticeable in small backups, when processing one 50MB file the split itself took 50 seconds, but the rest of the restore then finished 350 seconds sooner, so we saved a net 300 seconds of useless parsing time. Note that results can vary a lot depending on the contents of the backup, but in any case the split time should always be smaller than the time saved.

So, I'm going to commit this both to 19_STABLE and HEAD (for the "legacy" Moodle 1.9 => 2.0 restore). It would be great to have people using and testing it, to get some feedback before promoting the split strategy to be always enabled. I've restored at least 10 different courses of all sorts of types and sizes and everything is working OK here. Let's see how it evolves.

Ciao



Eloy Lafuente (stronk7) added a comment - 09/Mar/09 11:01 AM
Attached basic patch that implements the splitter, based on $CFG->experimentalsplitrestore

I've optimised the splitter wherever I could. It has good throughput right now, IMO (and low memory usage).

Missing bits:

  • admin setting frontend.
  • lang string.

Ciao


Martin Dougiamas added a comment - 09/Mar/09 11:12 AM
WOO HOO! My +1 for 1.9 under a non-default option for now. Will be happy to do some testing!

Ray Lawrence added a comment - 10/Mar/09 03:03 AM
Eloy: does this mean 1.9 and 2.0 backups are incompatible?

Eloy Lafuente (stronk7) added a comment - 10/Mar/09 08:52 AM
Code has been committed both to 19_STABLE and HEAD, with one "experimental" setting (under Admin) controlling it (defaults to disabled).

Ray, it depends on what you mean by "incompatible":

  • I'm 100% sure that Moodle 1.9.x backups will restore in 2.0 without problems. No doubt about that. It's a primary objective.
  • I'm 90% sure that Moodle 2.0 backups will have a different internal format (see http://docs.moodle.org/en/Development:Backup_2.0 - especially the ongoing research about parsing). My current idea is that Moodle 2.0 won't have one "monolithic" moodle.xml file containing all the backup info, but a bunch of smaller files containing "specialised parts". That will be a huge improvement both for backup and restore.

Resolving this as fixed... ciao


Helen Foster added a comment - 10/Mar/09 06:02 PM
Eloy, thanks a lot for this improvement

Information added to the documentation wiki:

http://docs.moodle.org/en/Backup_and_restore_FAQ
http://docs.moodle.org/en/Moodle_1.9.5_release_notes


Eloy Lafuente (stronk7) added a comment - 10/Mar/09 06:07 PM
Great! Thanks Helen! B-)