Hi Tim,
I don't really understand why I assigned this to you either. I surely read it too quickly and just assumed that 524 MB for one activity is far too much.
Talking about memory usage when restoring... I remember I started with a "pure SAX" design for the task, saving info sequentially to temp tables while reading it. And AFAIK that is still the approach now.
So, for example, when the MODULES section is processed, each time ONE particular MODULE is parsed (SAX) from the backup file, it is sent straight to a temp table, consuming only the memory needed by that MODULE.
Then, once all the individual MODULES have been stored in temp tables, the restore process starts processing them. And this is where xmlize converts each 1-MODULE XML file into one array.
I.e. one activity is the "atom" in the restore process. I took this decision because I never thought that one activity could need more than a few MB, so xmlizing it shouldn't really require hundreds of MB.
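To illustrate the two phases described above, here is a minimal sketch in Python (not the actual Moodle restore code). It assumes a hypothetical backup layout with MOD elements holding ID and NAME children, and uses an in-memory SQLite table named temp_modules as a stand-in for the temp tables. The first phase buffers only the module currently being read; the second phase expands one stored module at a time.

```python
import sqlite3
import xml.sax
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


class ModuleSplitter(xml.sax.ContentHandler):
    """Phase 1: buffer only the MOD element being read, then flush it to the table."""

    def __init__(self, db):
        super().__init__()
        self.db = db
        self.depth = 0    # nesting depth inside the current MOD (0 = outside any MOD)
        self.parts = []   # XML fragments of the current MOD only

    def startElement(self, name, attrs):
        # Attributes are ignored: the sketch assumes the format has none.
        if name == "MOD" or self.depth:
            self.depth += 1
            self.parts.append(f"<{name}>")

    def characters(self, content):
        if self.depth:
            self.parts.append(escape(content))

    def endElement(self, name):
        if not self.depth:
            return
        self.parts.append(f"</{name}>")
        self.depth -= 1
        if self.depth == 0:
            # One complete module: store it and release the buffer.
            self.db.execute(
                "INSERT INTO temp_modules (xml) VALUES (?)", ("".join(self.parts),)
            )
            self.parts = []


def split_backup(xml_text):
    """Stream the backup with SAX and store each module as its own row."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE temp_modules (id INTEGER PRIMARY KEY, xml TEXT)")
    xml.sax.parseString(xml_text.encode("utf-8"), ModuleSplitter(db))
    return db


def restore_modules(db):
    """Phase 2: expand one stored module (the "atom") at a time."""
    names = []
    for (xml_text,) in db.execute("SELECT xml FROM temp_modules ORDER BY id"):
        module = ET.fromstring(xml_text)  # only this atom is held as a tree
        names.append(module.findtext("NAME"))
    return names


# Hypothetical sample data, for illustration only.
BACKUP = (
    "<MOODLE_BACKUP><MODULES>"
    "<MOD><ID>1</ID><NAME>Forum A</NAME></MOD>"
    "<MOD><ID>2</ID><NAME>Quiz &amp; B</NAME></MOD>"
    "</MODULES></MOODLE_BACKUP>"
)
```

With this shape, peak memory in phase 2 depends on the largest single module rather than on the whole backup file, which is exactly why one oversized activity hurts so much.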
Certainly we can improve restore in a lot of ways... just to name some of them here:
- Replace a lot of get_records() loops with their get_recordset() counterparts.
- Provide some alternative to XMLIZE that saves as much memory as possible.
- Reduce the size of the array generated by XMLIZE (or its replacement). It seems to have too many levels, for example to support attributes, while the Moodle backup format has no attributes at all.
- Try to detect if some module is backing up "too much".
- Profile how memory grows along the process...
- Separate computations, debug and output throughout the process, to make it more readable and easier to trace.
- ...
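As a rough illustration of the first and fifth points, the following Python sketch compares loading a whole result set at once (the get_records() style) with iterating over it row by row (the get_recordset() style), and measures the peak with tracemalloc. It is only an analogy under stated assumptions: the log table, its columns and the helper names are hypothetical, SQLite stands in for the real database, and tracemalloc only counts Python-level allocations.

```python
import sqlite3
import tracemalloc


def make_db(rows=20000):
    """Build a throwaway in-memory table with some bulky rows (hypothetical data)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE log (id INTEGER PRIMARY KEY, info TEXT)")
    db.executemany(
        "INSERT INTO log (info) VALUES (?)", (("x" * 200,) for _ in range(rows))
    )
    return db


def total_length_all_at_once(db):
    """Materialise every row before looping, like a get_records() loop."""
    rows = db.execute("SELECT id, info FROM log").fetchall()
    return sum(len(info) for _id, info in rows)


def total_length_streaming(db):
    """Pull one row at a time from the cursor, like a get_recordset() loop."""
    total = 0
    for _id, info in db.execute("SELECT id, info FROM log"):
        total += len(info)
    return total


def peak_bytes(func, db):
    """Run func(db) and return (result, peak traced memory in bytes)."""
    tracemalloc.start()
    try:
        result = func(db)
        _current, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak
```

Calling peak_bytes() around each restore step in the same way would give a first picture of where memory grows during the process.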
Apart from this... we could also change the "atom" here, from "activity" to something within it (different for each activity), but I really think this would add complexity to the process. I would leave atoms as they currently are.
But yes, we are definitely SAX-parsing backup files (the best way, for sure; DOM is evil here). The problem is that we are reading some "atoms" that, due to the current implementation (xmlize, SQL...), eat too much memory (apart from other bugs, of course).
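To give an idea of how much the extra array levels cost, here is a small Python sketch. The xmlize_style() function is only loosely modeled on the '#' (content) and '@' (attributes) nesting that xmlize produces, not its exact output, and lean() is a hypothetical attribute-free alternative. The sample MOD element is made up for illustration.

```python
import xml.etree.ElementTree as ET


def xmlize_style(elem):
    """Verbose shape: every node carries '#' (content) and '@' (attributes)."""
    children = list(elem)
    if children:
        content = {}
        for child in children:
            content.setdefault(child.tag, []).append(xmlize_style(child))
    else:
        content = elem.text or ""
    return {"#": content, "@": dict(elem.attrib)}


def lean(elem):
    """Attribute-free shape: leaves are plain strings, branches are plain dicts."""
    children = list(elem)
    if not children:
        return elem.text or ""
    out = {}
    for child in children:
        out.setdefault(child.tag, []).append(lean(child))
    return out


def count_containers(node):
    """Count dicts and lists in a nested structure, as a rough proxy for overhead."""
    if isinstance(node, dict):
        return 1 + sum(count_containers(v) for v in node.values())
    if isinstance(node, list):
        return 1 + sum(count_containers(v) for v in node)
    return 0


# Hypothetical one-module fragment, for illustration only.
MOD = "<MOD><ID>1</ID><NAME>Forum A</NAME><POSTS><POST><ID>7</ID></POST></POSTS></MOD>"
root = ET.fromstring(MOD)
verbose_count = count_containers(xmlize_style(root))
lean_count = count_containers(lean(root))
```

The verbose shape allocates a wrapper plus an empty attribute container for every single element, so for a module with many small elements the difference adds up quickly.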
Ciao 
Wow. That's really a lot of memory for just 500 users... assigning to Tim...