Moodle / MDL-53273

Solr search uses Unicode-unsafe string truncation


    • MOODLE_31_STABLE
    • MDL-53273-master
    • Testing instructions:
      1. Enable global search, with the Solr plugin
      2. Make sure that indexing of Assign activities is on, and cron isn't running
      3. Download the attached file and open it in a UTF-8 capable editor; it should contain one space followed by thousands of Japanese characters
        • The leading single space ensures that the truncation point falls in the middle of a multi-byte Unicode character
      4. Copy the contents of the file
      5. Create a new assignment, and in the description field paste the contents
      6. php admin/tool/task/cli/schedule_task.php --execute="\\core\\task\\search_task"

      7. Make sure no errors are thrown
      8. Search for "訳し"
      9. Confirm the new assignment is a result
      10. Comment out the following line in lib/classes/text.php::str_max_bytes():

            return mb_strcut($string, 0, $bytes, 'UTF-8');
        

      11. Repeat the instructions above
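
The arithmetic behind the leading space in step 3 can be sketched as follows. This is only an illustration, not part of the patch: the generated string is a hypothetical stand-in for the attached file, and it assumes every Japanese character occupies 3 bytes in UTF-8 (true for '訳', but not guaranteed for every character in the real attachment). It also assumes the mbstring extension is loaded and the script is saved as UTF-8.

```php
<?php
// Solr field limit in bytes, as stated in the issue description.
$limit = 32766;

// Hypothetical stand-in for the attached file: one space, then many 3-byte characters.
$text = ' ' . str_repeat('訳', 20000);

// Without the leading space, byte 32766 is a character boundary (32766 = 3 * 10922).
$remainderWithoutSpace = $limit % 3;        // 0

// With the 1-byte space, 32765 bytes are left for 3-byte characters,
// so the cut lands 2 bytes into a character.
$remainderWithSpace = ($limit - 1) % 3;     // 2

// A byte-wise cut at the limit therefore leaves an incomplete trailing character.
$cut = substr($text, 0, $limit);
$cutIsValidUtf8 = mb_check_encoding($cut, 'UTF-8');   // false

var_dump($remainderWithoutSpace, $remainderWithSpace, $cutIsValidUtf8);
```

Without the leading space, this particular input would be cut cleanly by chance, and the test would not exercise the bug.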

      Solr, as configured, allows fields no larger than 32766 bytes. In search_solr\document::format_string_for_engine, substr is used to truncate strings to that limit.

      The problem is that substr is not multi-byte safe: it can truncate the string mid-character, which results in Solr server exceptions being thrown.

      We can't use mb_substr either, because it counts characters rather than bytes: it would return up to 32766 characters, which can occupy far more than 32766 bytes.
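
A minimal sketch of the difference between the three functions, using a 6-byte limit as a small stand-in for the 32766-byte Solr limit. The variable names are illustrative only and do not come from the Moodle code; the sketch assumes the mbstring extension is loaded and the script is saved as UTF-8.

```php
<?php
// One space plus five 3-byte characters: 6 characters, 16 bytes.
$text  = ' ' . str_repeat('訳', 5);
$limit = 6;

// substr counts bytes and ignores character boundaries:
// space (1) + one full character (3) + 2 bytes of the next one.
$bySubstr = substr($text, 0, $limit);

// mb_substr counts characters, so the byte limit is not enforced.
$byMbSubstr = mb_substr($text, 0, $limit, 'UTF-8');

// mb_strcut counts bytes but backs up to the previous character boundary.
$byMbStrcut = mb_strcut($text, 0, $limit, 'UTF-8');

printf("substr:    %2d bytes, valid UTF-8: %s\n", strlen($bySubstr),
    var_export(mb_check_encoding($bySubstr, 'UTF-8'), true));
printf("mb_substr: %2d bytes, valid UTF-8: %s\n", strlen($byMbSubstr),
    var_export(mb_check_encoding($byMbSubstr, 'UTF-8'), true));
printf("mb_strcut: %2d bytes, valid UTF-8: %s\n", strlen($byMbStrcut),
    var_export(mb_check_encoding($byMbStrcut, 'UTF-8'), true));
```

Only the mb_strcut result is both within the byte limit and valid UTF-8, which is the property the Solr field limit requires.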

            Eric Merrill (emerrill)
            David Monllaó
            Andrew Lyons
            Adrian Greeve