Uploaded image for project: 'Moodle'
  1. Moodle
  2. MDL-58729

Full unicode support conversion impractically slow

    XMLWordPrintable

Details

    • MOODLE_33_STABLE
    • MOODLE_32_STABLE, MOODLE_33_STABLE
    • wip-MDL-58729-master
    • Hide
      • Testing must be done on a MySQL (or MariaDB) database.
      • The collation for your database needs to be three byte UTF-8 (e.g. utf8_unicode_ci).
      • Create two large courses using the test course creator. (Site administration > Development > make test course)
      • Make one of the tables in the database use the 'Compact' row format.
        • You can update a table to use 'Compact' with the following sql - ALTER TABLE `mdl_auth_oauth2_linked_login` ROW_FORMAT=COMPACT
        • Use the following sql to check that row format

          SELECT row_format
          FROM information_schema.tables
          WHERE table_schema = {name of your database here} AND table_name = 'mdl_auth_oauth2_linked_login'
          

      1. To get an idea of how long the script took before make the following changes: line 122

        $sql = "ALTER DATABASE $CFG->dbname CONVERT TO CHARACTER SET $charset COLLATE $collation";
        

        back to

        $sql = "ALTER DATABASE $CFG->dbname DEFAULT CHARACTER SET $charset DEFAULT COLLATE = $collation";
        

      2. Add some additional code to check the time taken to run the script.
      3. Run the CLI script php admin/cli/mysql_collation collation=utf8mb4_unicode_ci
      4. Make note of how long it took to run the script.
      5. Revert the database back to utf8_unicode_ci.
      6. Change the script to use the CONVERT TO query.
      7. Run the script again and note the time taken to finish. The new change to the query should show a significant difference in time taken to execute.
      8. Run the sql query to check the row format of the previously 'Compact' table to ensure that it has been changed to 'Compressed'
      Show
      Testing must be done on a MySQL (or MariaDB) database. The collation for your database needs to be three byte UTF-8 (e.g. utf8_unicode_ci). Create two large courses using the test course creator. (Site administration > Development > make test course) Make one of the tables in the database use the 'Compact' row format. You can update a table to use 'Compact' with the following sql -  ALTER TABLE `mdl_auth_oauth2_linked_login` ROW_FORMAT=COMPACT Use the following sql to check that row format SELECT row_format FROM information_schema.tables WHERE table_schema = {name of your database here} AND table_name = 'mdl_auth_oauth2_linked_login' To get an idea of how long the script took before make the following changes: line 122 $sql = "ALTER DATABASE $CFG->dbname CONVERT TO CHARACTER SET $charset COLLATE $collation"; back to $sql = "ALTER DATABASE $CFG->dbname DEFAULT CHARACTER SET $charset DEFAULT COLLATE = $collation"; Add some additional code to check the time taken to run the script. Run the CLI script php admin/cli/mysql_collation collation=utf8mb4_unicode_ci Make note of how long it took to run the script. Revert the database back to utf8_unicode_ci. Change the script to use the CONVERT TO query. Run the script again and note the time taken to finish. The new change to the query should show a significant difference in time taken to execute. Run the sql query to check the row format of the previously 'Compact' table to ensure that it has been changed to 'Compressed'
    • 3.4 Sprint 1

    Description

      This is neither a bug nor an improvement strictly speaking but I think it would be wise to have a proper think about it.

      On a database of any significant size the conversion to full unicode support is impractically slow. My test site has around 30,000 users, 8,000 courses and several years of data. Experiments (on a reasonably well specified MariaDB server) indicate that this may well take the order of several days to complete. 

      I'm still looking into this but it looks like the conversion may use vast amounts of disk space too. It's probably for temporary tables but if your server doesn't have it...

      It's a big ask for large sites to go offline for this period of time. 

      I know it's optional just now but these things have a habit of being required down the line. 

       

      Attachments

        Issue Links

          Activity

            People

              abgreeve Adrian Greeve
              howardsmiller Howard Miller
              John Okely John Okely
              Jake Dallimore Jake Dallimore
              David Monllaó David Monllaó
              David Woloszyn, Huong Nguyen, Jake Dallimore, Meirza, Michael Hawkins, Raquel Ortega, Safat Shahin, Stevani Andolo
              Votes:
              7 Vote for this issue
              Watchers:
              23 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:
                10/Jul/17