2.1, 2.2, 2.7.7
Running Windows 2003, PHP 5.3.6, MSSQL 2008, using the Microsoft MSSQL drivers for PHP
MOODLE_21_STABLE, MOODLE_22_STABLE, MOODLE_27_STABLE
1. Use MSSQL, preferably if previously you were able to reproduce this problem or
2. Create a course
3. Add a folder resource to the course
4. Backup the course
5. Restore the course
BACKEND Sprint 10
We encountered the problem when our automated backups weren't running properly; they'd get stuck part way through. Trying to backup the courses the auto backup got stuck at gave us the same problem.
Certain big courses (in the sense of lots of resources, not filesize) don't backup up properly either manually or through the automated system. The backup process just ran until it gets a fastcgi error. We tried to see if we could narrow it down to particular types of resources, but that wasn't it. As an example, one course which wouldn't backup had a category level question bank with 1000+ questions.
I did a lot of unsuccessful troubleshooting, which I've not reported-- here's where we're at now, hopefully it's useful to someone. Please note that the changes I detail below are NOT A FIX, they were done for troubleshooting, using them in production introduces other problems:
Tracing the backup process we found the problem would always occur in the
function move_annotations_to_final($backupid, $itemname)
$DB->set_field('backup_ids_temp', 'itemname', $itemname . 'final', array('id' => $annotation->id));
Adding traces here shows it ticking along fine but then just stopping. Multiple attempts to backup the same course (after waiting for cron to clear out temp stuff) ended up failing at the same point.
Tracing deeper into the set_field function, it calls the sqlsrv do_query,
To troubleshoot, I added a query timeout as follows (none was specified originally)
In sqlsrv_native_moodle_database function do_query
$result = sqlsrv_query($this->sqlsrv, $sql,array(),array('QueryTimeout'=>10));
the backups now return with error
Debug info: SQLState: HYT00<br>
Error Code: 0<br>
Message: [Microsoft][SQL Server Native Client 10.0]Query timeout expired<br>
UPDATE #mdl_backup_ids_temp SET itemname = N'filefinal' WHERE id = '1366'
0 => 'filefinal',
1 => '1366',
•line 397 of \lib\dml\moodle_database.php: dml_write_exception thrown
•line 252 of \lib\dml\sqlsrv_native_moodle_database.php: call to moodle_database->query_end()
•line 370 of \lib\dml\sqlsrv_native_moodle_database.php: call to sqlsrv_native_moodle_database->query_end()
•line 1098 of \lib\dml\sqlsrv_native_moodle_database.php: call to sqlsrv_native_moodle_database->do_query()
•line 1475 of \lib\dml\moodle_database.php: call to sqlsrv_native_moodle_database->set_field_select()
•line 146 of \backup\util\dbops\backup_structure_dbops.class.php: call to moodle_database->set_field()
•line 1338 of \backup\moodle2\backup_stepslib.php: call to backup_structure_dbops::move_annotations_to_final()
•line 34 of \backup\util\plan\backup_execution_step.class.php: call to move_info
•line 153 of \backup\util\plan\base_task.class.php: call to backup_execution_step->execute()
•line 148 of \backup\util\plan\base_plan.class.php: call to base_task->execute()
•line 105 of \backup\util\plan\backup_plan.class.php: call to base_plan->execute()
•line 296 of \backup\controller\backup_controller.class.php: call to backup_plan->execute()
•line 111 of \backup\util\ui\backup_ui.class.php: call to backup_controller->execute_plan()
•line 89 of \backup\backup.php: call to backup_ui->execute()
Varying the timeout had no effect; it failed at the same place, just would take longer...
So adding the timeout here gets the backup process to fail after the timeout (rather than waiting for a fastcgi timeout). It looks like mssql just not responding.
I wanted to see what was happening with the tables, which is hard for mssql and temp tables, since they're session based. I don't have access to the moodle db sessions' temp tables from my db admin tool.
To get around this, I modified \backup\util\dbops\backup_controller_dbops_class.php (around line 141&150), so that a real table was created/deleted, instead of using MSSQLs session based temp tables (just removed "temp" in the calls to dbman, ie
$dbman->create_temp_table($xmldb_table); becomes $dbman->create_table($xmldb_table);
and around 150
$dbman->drop_temp_table($table); becomes $dbman->drop_table($table);
There's a similar fix neccessary in
\backup\util\dbops\restore_controller_dbops_class.php, line 109,
With this change, the backup and restore work ok.
My hypothesis is that at some point during the long backup, another connection gets made to mssql; since temp tables are associated with connections, the link to the original temp table gets lost and the data isn't available. Of course this doesn't explain why the sqlsrv call ust times out rather than returning an error.
We've actually been using the system with this little change, but it's not really satisfactory – the problem, of course, is that only one backup/restore can be run on the system at any one time because there's just the one temp table; and multiple backups just conflict, requiring the temp table we created to be dropped.
Some related posts
http://moodle.org/mod/forum/discuss.php?d=177822 (I posted this early)
Possibly related http://tracker.moodle.org/browse/MDL-28162 other commenters on this issue were on MSSQL