Uploaded image for project: 'Moodle'
  1. Moodle
  2. MDL-57698

Backup and restore can cause deadlock with sqlsrv driver

    XMLWordPrintable

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.1.4, 3.2.4, 3.3
    • Fix Version/s: 3.2.5, 3.3.2
    • Component/s: Backup, Database SQL/XMLDB
    • Environment:
      Database servers:
      * Microsoft SQL Server 2008 R2
      * Microsoft SQL Server 2016

      Database driver:
      * sqlsrv

      No other details applicable.
    • Database:
      Microsoft SQL
    • Testing Instructions:
      Hide
      Prerequisites

      Test on sqlsrv driver only. No changes to other DBS present.

      This is a medium-to-large test with a lot of prerequisites. mssql, stats and ldap. If you think you need to, please ask for a test session to be created to share the load.

      Unit tests

      CI will do this too, but it's nice if we can get more environments running since it's such a hard-to-pin-down issue.

      Execute the PHPUnit tests

      1. --filter=\core_coursecatlib_test.
      2. --filter=search (uses temp tables)
      3. backup/controller/tests/controller_test.php
        Execute the PHPUnit tests, particularly those covering DML and those which trigger course backup and restore operations (e.g. \core_coursecatlib_test).
      Backup test
      1. Backup the biggest/most complex course you have from any moodle instance. Preferably from 3.2 or lower. Save the mbz ready for use later. Make sure it has a SCORM module, if not add one. Use the zip below
      2. Restore the deadlock.mbz backup file
      3. Ensure it restores successfully
      4. Try restoring your big backup
      5. Ensure it restores successfully
      Stats test
      Prerequisites:
      1. Make sure you set "Enabled statistics" on Site administration ► Advanced features
      2. Make sure you have some data on site and run cron to generate statistics data.
        • Use attached file to generate stats (Use it on dev server only.)
        • Ensure \core\task\stats_cron_task is executed for generating statistics data.
      Test
      1. Log in as admin.
      2. Go to Site administration ► Reports ► Course overview.
      3. View all reports and ensure there is a graph for following report type:
        • Most active courses
        • Most active courses (weighted)
        • Most participatory courses (enrolments)
        • Most participatory courses (views/posts)
      4. Go to Site administration ► Reports ► Statistics
      5. View report and make sure Line graph is generated.
      LDAP
      1. Set up an LDAP server with some users
      2. Turn on LDAP auth
      3. Run LDAP sync
      4. Make sure all the synced users have the correct information from LDAP
      5. Update a user in moodle
      6. Sync
      7. Check the user is updated in your LDAP server
      Show
      Prerequisites Test on sqlsrv driver only. No changes to other DBS present. This is a medium-to-large test with a lot of prerequisites. mssql, stats and ldap. If you think you need to, please ask for a test session to be created to share the load. Unit tests CI will do this too, but it's nice if we can get more environments running since it's such a hard-to-pin-down issue. Execute the PHPUnit tests --filter=\core_coursecatlib_test . --filter=search (uses temp tables) backup/controller/tests/controller_test.php Execute the PHPUnit tests, particularly those covering DML and those which trigger course backup and restore operations (e.g. \core_coursecatlib_test ). Backup test Backup the biggest/most complex course you have from any moodle instance. Preferably from 3.2 or lower. Save the mbz ready for use later. Make sure it has a SCORM module, if not add one. Use the zip below Restore the deadlock.mbz backup file Ensure it restores successfully Try restoring your big backup Ensure it restores successfully Stats test Prerequisites: Make sure you set "Enabled statistics" on Site administration ► Advanced features Make sure you have some data on site and run cron to generate statistics data. Use attached file to generate stats (Use it on dev server only.) Ensure \core\task\stats_cron_task is executed for generating statistics data. Test Log in as admin. Go to Site administration ► Reports ► Course overview. View all reports and ensure there is a graph for following report type: Most active courses Most active courses (weighted) Most participatory courses (enrolments) Most participatory courses (views/posts) Go to Site administration ► Reports ► Statistics View report and make sure Line graph is generated. LDAP Set up an LDAP server with some users Turn on LDAP auth Run LDAP sync Make sure all the synced users have the correct information from LDAP Update a user in moodle Sync Check the user is updated in your LDAP server
    • Workaround:
      Hide

      Use the mssql driver, and remain on PHP 5.x. Support for mssql has been dropped from PHP 7.0 onwards.

      Show
      Use the mssql driver, and remain on PHP 5.x. Support for mssql has been dropped from PHP 7.0 onwards.
    • Affected Branches:
      MOODLE_31_STABLE, MOODLE_32_STABLE, MOODLE_33_STABLE
    • Fixed Branches:
      MOODLE_32_STABLE, MOODLE_33_STABLE
    • Pull from Repository:
    • Pull Master Branch:
      MDL-57698-master
    • Sprint:
      3.4 Sprint 2, 3.4 Sprint 3

      Description

      During backup and restore operations, the connection to SQL Server can end up in a deadlocked state. This issue appears to affect only the sqlsrv driver, regardless of host operating system. I assume that a workaround must be applied in FreeTDS for mssql.

      This deadlock appears to occur because of SELECT...UPDATE loops such as the following in \backup_structure_dbops::move_annotations_to_final():

      $rs = $DB->get_recordset('backup_ids_temp', array('backupid' => $backupid, 'itemname' => $itemname));
      foreach ($rs as $annotation) {
          $DB->set_field('backup_ids_temp', 'itemname', $itemname . 'final', array('id' => $annotation->id));
      }
      

      Executing the above will, in some cases, cause two competing locks to be obtained on the #backup_ids_temp table:

      • A SELECT, waiting for ASYNC_NETWORK_IO.
      • An UPDATE, waiting for an intent lock (LCK_M_IX).

      Reviewing the process list in the SSMS Activity Monitor will show two SUSPENDED processes, and the profiler will show the SELECT and UPDATE queries as the last two to have started (BatchStarted event), with no corresponding complete events.

      In the end, I settled on adding an optional $hints parameter to the get_recordset() and get_recordset_select() methods on \moodle_database which accepts a bitmask of HINT_* values (just HINT_NO_LOCK for now). I then added an empty get_hints() implementation to the class and added an implementation in \sqlsrv_native_moodle_database.

      For reference, I've explained how I came to this approach.

      I first attempted to solve this problem by switching from the READ_COMMITTED isolation level to SNAPSHOT. This appeared to resolve the problem affecting the backup process, but I was unable to get this to behave correctly in the PHPUnit environment without calling \advanced_testcase::preventResetByRollback(). Upon investigation, I realised that SQL Server's SNAPSHOT isolation mode prevents certain DDL changes within a transaction, yielding error 3964 when applying CREATE INDEX statements on temporary tables.

      I then modified the specific case preventing our backup from completing (user creation) to include the WITH (NOLOCK) clause and saw that the backup completed successfully. Tests invoking the backup and restore process also completed successfully.

        Attachments

          Issue Links

            Activity

              People

              • Votes:
                14 Vote for this issue
                Watchers:
                18 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved:
                  Fix Release Date:
                  11/Sep/17