Implement a progressive (SAX) parser that can handle large amounts of data without problems and can "push" meaningful pieces of information to the restore code easily. Keep parsing and processing separate.
We will most likely end up using a specialized processor for better and simpler integration with the restore infrastructure.
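To make the idea concrete, here is a minimal sketch of the parser/processor split, written in Python with the standard xml.sax module. Everything named here is an assumption for illustration only: ChunkHandler, parse_backup, the processor callback signature, and the sample element names (backup, users, user) are hypothetical and not taken from the actual restore code.

```python
import io
import xml.sax


class ChunkHandler(xml.sax.ContentHandler):
    """Hypothetical handler: gathers each element found at `chunk_path`
    into a dict and pushes it to `processor` as soon as it is complete."""

    def __init__(self, chunk_path, processor):
        super().__init__()
        self.chunk_path = chunk_path  # e.g. "/backup/users/user" (assumed layout)
        self.processor = processor    # callable(path, data); owns all processing
        self.path = []                # stack of currently open element names
        self.chunk = None             # dict being built, or None outside a chunk
        self.text = []                # character data of the current element

    def _current(self):
        return "/" + "/".join(self.path)

    def startElement(self, name, attrs):
        self.path.append(name)
        self.text = []
        if self._current() == self.chunk_path:
            # Start a new chunk, seeded with the element's attributes.
            self.chunk = dict(attrs.items())

    def characters(self, content):
        self.text.append(content)

    def endElement(self, name):
        if self._current() == self.chunk_path:
            # Chunk complete: push it out and forget it, so memory stays flat.
            self.processor(self.chunk_path, self.chunk)
            self.chunk = None
        elif self.chunk is not None:
            value = "".join(self.text).strip()
            if value:
                self.chunk[name] = value
        self.text = []
        self.path.pop()


def parse_backup(stream, chunk_path, processor):
    """Parse `stream` incrementally; the parser knows nothing about restore."""
    xml.sax.parse(stream, ChunkHandler(chunk_path, processor))


if __name__ == "__main__":
    sample = (b"<backup><users>"
              b"<user id='1'><name>ann</name></user>"
              b"<user id='2'><name>bob</name></user>"
              b"</users></backup>")
    # The "processor" here just prints; a real one would talk to restore code.
    parse_backup(io.BytesIO(sample), "/backup/users/user",
                 lambda path, data: print(path, data))
```

The parser only tracks paths and emits chunks, while the callback decides what to do with them, so a specialized processor could later replace the callback without touching the parsing side.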
Ciao