MDL-53273: Solr search uses Unicode-unsafe string truncation

Details

    • MOODLE_31_STABLE
    • MOODLE_31_STABLE
    • MDL-53273-master
    • Testing instructions:
      1. Enable global search with the Solr plugin
      2. Make sure that indexing of Assign activities is enabled and that cron isn't running
      3. Download the attached file and open it in a UTF-8 editor; it should contain one space followed by thousands of Japanese characters
        • The single leading space shifts the text by one byte, so that the truncation point falls in the middle of a multi-byte Unicode character
      4. Copy the contents of the file
      5. Create a new assignment, and in the description field paste the contents
      6. Run php admin/tool/task/cli/schedule_task.php --execute="\\core\\task\\search_task"

      7. Make sure no errors are thrown
      8. Search for "訳し"
      9. Confirm the new assignment is a result
      10. Comment out the following line in lib/classes/text.php::str_max_bytes():

        return mb_strcut($string, 0, $bytes, 'UTF-8');

      11. Repeat the instructions above
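
The attachment itself is not reproduced here. As a rough stand-in, a payload of the shape described in step 3 could be generated with a sketch like the one below. The file name, the output location, and the choice of repeating the search term from step 8 are assumptions for illustration, not the contents of the original attachment.

```php
<?php
// Hypothetical stand-in for the attached test file; the real attachment may differ.

$limit = 32766; // Solr field limit in bytes, as quoted in the issue description.

// One ASCII space (1 byte) followed by 20,000 Japanese characters (3 bytes each in UTF-8).
$payload = ' ' . str_repeat('訳し', 10000);

// Hypothetical output location; the contents are what gets pasted into the assignment description.
$path = sys_get_temp_dir() . '/mdl53273_payload.txt';
file_put_contents($path, $payload);

// Sanity check: a raw byte cut at the limit should not be valid UTF-8 if the payload does its job.
$cut = substr($payload, 0, $limit);
echo strlen($payload), ' bytes written to ', $path, PHP_EOL;
echo mb_check_encoding($cut, 'UTF-8') ? 'cut is valid UTF-8' : 'cut splits a character', PHP_EOL;
```

Because every character after the leading space is three bytes wide, character boundaries sit at byte offsets 1, 4, 7 and so on, and 32766 is not one of them.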

    Description

      Solr, as configured, allows fields no larger than 32766 bytes. In search_solr\document::format_string_for_engine, substr is used to truncate the string.

      The problem is that substr is not multi-byte safe: it can truncate the string mid-character, which causes the Solr server to throw exceptions.

      We can't use mb_substr either, because it counts characters rather than bytes. It would return 32766 characters, which can occupy far more than 32766 bytes.
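
The difference between the three truncation approaches can be seen in a small standalone sketch. This is an illustration only, not the actual Moodle patch; the sample text and variable names are made up, and mb_strcut is used because it is the call named in the testing instructions.

```php
<?php
// Illustrative comparison only; not the code from search_solr\document.

$limit = 32766; // Byte limit quoted in the description.

// One space plus 40,000 three-byte characters, well over the limit.
$text = ' ' . str_repeat('訳', 40000);

// 1. substr() counts bytes and can stop in the middle of a character.
$bybytes = substr($text, 0, $limit);

// 2. mb_substr() counts characters, so the result can exceed the byte limit.
$bychars = mb_substr($text, 0, $limit, 'UTF-8');

// 3. mb_strcut() counts bytes but backs up to the previous character boundary.
$safe = mb_strcut($text, 0, $limit, 'UTF-8');

foreach (['substr' => $bybytes, 'mb_substr' => $bychars, 'mb_strcut' => $safe] as $name => $result) {
    printf(
        "%-10s %6d bytes, valid UTF-8: %s\n",
        $name,
        strlen($result),
        mb_check_encoding($result, 'UTF-8') ? 'yes' : 'no'
    );
}
```

Only the mb_strcut result should be both valid UTF-8 and within the byte limit; it may come out a byte or two short of the limit, since an incomplete trailing character is dropped.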

            People

              Eric Merrill (emerrill)
              David Monllaó
              Andrew Lyons
              Adrian Greeve
              David Woloszyn, Huong Nguyen, Jake Dallimore, Meirza, Michael Hawkins, Raquel Ortega, Safat Shahin, Stevani Andolo
              Votes: 1
              Watchers: 4

              Dates

                Resolved: 23/May/16