Uploaded image for project: 'Moodle'
  1. Moodle
  2. MDL-48228

MySQL drivers only use partial UTF-8

    XMLWordPrintable

    Details

    • Database:
      MySQL
    • Testing Instructions:
      Hide

      Create a new install.

      1. Make sure that my.cnf (/etc/mysql) settings are as follows:

        [mysqld]
        innodb_file_format = barracuda
        innodb_large_prefix
        innodb_file_per_table = 1
        

      2. Reset the mysql server.
      3. Let the web interface create the database. This will happen if you do not create a database beforehand. If the settings give correct access to the database then it will create the database.
      4. Progressing through the installation pages you should reach the environment settings page. If you didn't have any of the above settings set, then errors will appear with information about the setting that needs to be enabled.
      5. Continue with the installation until you reach the home page.
      6. Add a four byte character to a form.
        1. Go to / create a course and turn editing on.
        2. Create a page or label or something similar.
        3. Use one of the characters contained in test.html that is attached to this issue.
      7. When saving you should not see a database error. When viewing the page / label / whatever you should see the characters that you entered. Visibility is determined by system / browser configuration.
      8. Check the database and look at the collation for each table. It should be 'utf8mb4_general_ci' or something similar to that. Not 'utf8_general_ci' or a similar derivative.

      Create a new install. This time with utf8.

      1. Do a web install. If you are not prompted to copy and paste the text for the config.php file then stop when reaching the environment check page.
      2. Update your config.php file with the following database options:

        $CFG->dboptions = array (
          'dbpersist' => 0,
          'dbport' => 3306,
          'dbsocket' => '',
          'dbcollation' => 'utf8_unicode_ci', // This is the important line to add.
        );
        

      3. After the config.php file has been updated continue on with the installation process.
      4. Add some random content to your moodle install such as courses with different activities. Use the test course generator if you like.
      5. Time to use the collation conversion tool.
      6. Go back into config.php and either remove the 'dbcollation' line or change the collation to 'utf8mb4_unicode_ci'
      7. From the command line run the following script: php admin/cli/mysql_collation.php --collation=utf8mb4_unicode_ci
      8. All of the tables should be converted to the new collation. Check the database to see that this is the case.
      9. Once the conversion is complete try to insert a 4 byte character as described above.

      Do a command line install.

      1. Run the following command: php admin/cli/install.php
      2. Enter in the required information.
      3. Once the installation is complete try to insert a four byte character.
      4. Check the database to ensure that the collation starts with utf8mb4.

      Do a command line install but skip the database installation.

      1. Run the following command: php admin/cli/install.php --skip-database
      2. Fill in all of the details.
      3. When this part of the installation is complete update the config.php file to include the database options as was done with the web install (plain utf8).
      4. Run the following command: {{php admin/cli/install_database.php --adminpass= {adminpassword}

        --agree-license}}

      5. When the installation has finished check that collation for the tables is set to utf8_ {blah}

        .

      Additional time consuming testing.

      • For each of the different installations mentioned above it would be good to test that errors are thrown when the database config settings are not set for use with utf8mb4. Disable / Change the settings in the my.cnf file and check when installing and using the cli collation script that errors are created.
      • The following should also be done on a windows apache mysql setup as well.
      • If that wasn't enough, please also check Maria DB.
      Show
      Create a new install. Make sure that my.cnf (/etc/mysql) settings are as follows: [mysqld] innodb_file_format = barracuda innodb_large_prefix innodb_file_per_table = 1 Reset the mysql server. Let the web interface create the database. This will happen if you do not create a database beforehand. If the settings give correct access to the database then it will create the database. Progressing through the installation pages you should reach the environment settings page. If you didn't have any of the above settings set, then errors will appear with information about the setting that needs to be enabled. Continue with the installation until you reach the home page. Add a four byte character to a form. Go to / create a course and turn editing on. Create a page or label or something similar. Use one of the characters contained in test.html that is attached to this issue. When saving you should not see a database error. When viewing the page / label / whatever you should see the characters that you entered. Visibility is determined by system / browser configuration. Check the database and look at the collation for each table. It should be 'utf8mb4_general_ci' or something similar to that. Not 'utf8_general_ci' or a similar derivative. Create a new install. This time with utf8. Do a web install. If you are not prompted to copy and paste the text for the config.php file then stop when reaching the environment check page. Update your config.php file with the following database options: $CFG->dboptions = array ( 'dbpersist' => 0, 'dbport' => 3306, 'dbsocket' => '', 'dbcollation' => 'utf8_unicode_ci', // This is the important line to add. ); After the config.php file has been updated continue on with the installation process. Add some random content to your moodle install such as courses with different activities. Use the test course generator if you like. Time to use the collation conversion tool. Go back into config.php and either remove the 'dbcollation' line or change the collation to 'utf8mb4_unicode_ci' From the command line run the following script: php admin/cli/mysql_collation.php --collation=utf8mb4_unicode_ci All of the tables should be converted to the new collation. Check the database to see that this is the case. Once the conversion is complete try to insert a 4 byte character as described above. Do a command line install. Run the following command: php admin/cli/install.php Enter in the required information. Once the installation is complete try to insert a four byte character. Check the database to ensure that the collation starts with utf8mb4. Do a command line install but skip the database installation. Run the following command: php admin/cli/install.php --skip-database Fill in all of the details. When this part of the installation is complete update the config.php file to include the database options as was done with the web install (plain utf8). Run the following command: {{php admin/cli/install_database.php --adminpass= {adminpassword} --agree-license}} When the installation has finished check that collation for the tables is set to utf8_ {blah} . Additional time consuming testing. For each of the different installations mentioned above it would be good to test that errors are thrown when the database config settings are not set for use with utf8mb4. Disable / Change the settings in the my.cnf file and check when installing and using the cli collation script that errors are created. The following should also be done on a windows apache mysql setup as well. If that wasn't enough, please also check Maria DB.
    • Difficulty:
      Moderate
    • Affected Branches:
      MOODLE_27_STABLE, MOODLE_28_STABLE, MOODLE_29_STABLE, MOODLE_31_STABLE, MOODLE_32_STABLE, MOODLE_33_STABLE
    • Fixed Branches:
      MOODLE_31_STABLE, MOODLE_32_STABLE
    • Pull from Repository:
    • Pull Master Branch:
      wip-MDL-48228-master-v2
    • Sprint:
      3.3 sprint 4
    • Issue size:
      Epic

      Description

      Thanks to Haruhiko Okumura for bringing this to our attention.

      This is a particularly bad bug that where posting a message containing a character outside of the Unicode BMP (16-bit Basic Multilingual Plane) such as emoji results in a "Error writing to database" message and a complete loss of what was being written (no draft is restored after the error message is displayed.

      It's not just emoji that are affected as there are some Japanese kanji that are beyond the scope of the UTF-16 BMP, e.g. the kanji for "shikaru" (to scold), so this isn't just for razzle-dazzle - this actually impedes functional communication.

      It'd be nice if Moodle could filter out offending characters before database error occurs, or perhaps fix the behavior entirely if possible.

      Current known affected DBs:
      • MySQL 5.5.40
      Current known unaffected DBs:
      • Postgres
      Current untested DBs:
      • Oracle
      • MSSQL
      • MariaDB

      Reproduction steps

      • Use a browser / OS with at least partial emoji support
      • Download / open the attached html file.
        • NOTE: the test characters can't be directly embedded in this description, as JIRA exhibits the same issue and will just throw your post away if there's anything in it it doesn't underrstand.
      • Set the autosave delay for Atto to something short, like 5 seconds.
      • Create a forum activity, and in your new forum...
        • Create a new discussion, with the body set to the first element in the attached html (i.e. [notebook] memo) and wait for autosave to kick in.
          • If your DB is affected: instead of a successful autosave, you'll see an error: "Could not connect to the server. If you submit this page now, your changes may be lost."
        • Hit submit.
          • If your DB is affected: instead of notifying you of a successful post, you'll see an error: "Error writing to database" and upon returning to the create discussion page, as no draft was saved, your entire post is gone.
        • Repeat test with second element in the attached html file.

        Attachments

          Issue Links

            Activity

              People

              • Votes:
                27 Vote for this issue
                Watchers:
                31 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved:
                  Fix Release Date:
                  13/Mar/17