Moodle / MDL-53273

Solr search uses unicode unsafe string truncation

    Details

    • Testing Instructions:
      1. Enable global search, with the Solr plugin
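      2. Make sure that indexing of Assign activities is on, and that cron isn't running
      3. Download the attached file and open it in a UTF-8 editor; it should contain one space followed by thousands of Japanese characters
        • The leading single space ensures that the truncation point falls in the middle of a multi-byte Unicode character (Japanese characters typically occupy 3 bytes each in UTF-8 and 32766 is a multiple of 3, so without the space the cut could land exactly on a character boundary)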
      4. Copy the contents of the file
      5. Create a new assignment, and in the description field paste the contents
      6. php admin/tool/task/cli/schedule_task.php --execute="\\core\\task\\search_task"

      7. Make sure no errors are thrown
      8. Search for "訳し"
      9. Confirm the new assignment is a result
      10. Comment out the following line in lib/classes/text.php::str_max_bytes():

        return mb_strcut($string, 0, $bytes, 'UTF-8');

      11. Repeat the instructions above
    • Affected Branches:
      MOODLE_31_STABLE
    • Fixed Branches:
      MOODLE_31_STABLE
    • Pull Master Branch:
      MDL-53273-master

      Description

      Solr, as configured, allows fields no larger than 32766 bytes. In search_solr\document::format_string_for_engine, substr is used to truncate the string.

      The problem is that substr is not multi-byte safe: it can cut the string in the middle of a character, which results in Solr server exceptions being thrown.
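
      A minimal sketch of the failure mode, assuming the mbstring extension is available and the source file is saved as UTF-8. The 5-byte limit and the sample string are illustrative stand-ins for the 32766-byte limit and the real document content; this is not the code from search_solr\document.

```php
<?php
// Illustrative stand-in for the 32766-byte field limit (assumption, kept tiny for readability).
$limit = 5;

// One ASCII space (1 byte) followed by three hiragana (3 bytes each in UTF-8): 10 bytes total.
$text = ' あいう';

// substr() counts bytes, so the cut lands after the first byte of "い".
$cut = substr($text, 0, $limit);

var_dump(strlen($cut));                     // int(5)
var_dump(mb_check_encoding($cut, 'UTF-8')); // bool(false): the result is not valid UTF-8
```

      The truncated string ends in an incomplete UTF-8 sequence, which is the kind of input the description says Solr rejects.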

      We can't use mb_substr either, because it returns up to 32766 characters, which can occupy far more than 32766 bytes.
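
      The difference between counting characters and counting bytes can be sketched as follows, again with an illustrative 5-byte limit and assuming the mbstring extension. The helper name truncate_to_bytes is hypothetical; it simply wraps mb_strcut, the function named in the testing instructions, and is not Moodle's actual implementation.

```php
<?php
// Hypothetical helper for illustration only.
function truncate_to_bytes(string $string, int $bytes): string {
    // mb_strcut() measures in bytes but moves the cut back to a character boundary.
    return mb_strcut($string, 0, $bytes, 'UTF-8');
}

$limit = 5;        // illustrative stand-in for 32766
$text = ' あいう';  // 4 characters, 10 bytes in UTF-8

// mb_substr() counts characters: all 4 characters fit within a "limit" of 5.
$bychars = mb_substr($text, 0, $limit, 'UTF-8');

// Byte-limited, character-safe truncation keeps only " あ".
$bybytes = truncate_to_bytes($text, $limit);

echo strlen($bychars), "\n"; // 10 (exceeds the 5-byte limit)
echo strlen($bybytes), "\n"; // 4 (within the limit)
echo mb_check_encoding($bybytes, 'UTF-8') ? "valid\n" : "invalid\n"; // valid
```

      The byte-limited result may be slightly shorter than the limit, since a partial trailing character is dropped rather than split.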

                Dates

                • Fix Release Date:
                  23/May/16