Uploaded image for project: 'Moodle'
  1. Moodle
  2. MDL-60943

core_search: Improve exception for preg_replace errors

    XMLWordPrintable

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 3.3.3, 3.4, 3.5
    • Fix Version/s: 3.3.4, 3.4.1
    • Component/s: Global search
    • Labels:
    • Testing Instructions:
      Hide

      NOTE: Probably most people don't have a system that can reproduce this as it only happens on certain software versions, so in order to test it, we have to hack a file to make the error occur. (I did do a local test of the real thing on an affected machine - the test script I used is the same as below, except instead of hacking the code, I did a database update to change the text of the label to include the character U+FDD0.)

      To test this you will need a system with working Moodle global search install, including a Solr search engine.

      Please make sure developer debugging is enabled in order to see the expected message in this test

      1. Run search indexing (e.g. using the 'Run now' on the scheduled tasks page for the 'Global search indexing' task) and ensure the index is current.
      2. Add a Label to a course page (the text doesn't matter).
      3. Add the hack code line (below) to search/classes/document.php, about line 283, immediately below the line '$this->data[$fieldname] = preg_replace("/\s+/u", " ", $value);'
      4. Run search indexing again.
        • EXPECTED: The task should fail with an error message that includes text similar to: '"content" value causes preg_replace error (may be caused by unusual characters) in document with id "mod_label-activity-1145"
      5. Don't forget to remove the added hack line.

      if ($fieldname === 'content') { $this->data[$fieldname] = null; }
      

      Show
      NOTE: Probably most people don't have a system that can reproduce this as it only happens on certain software versions, so in order to test it, we have to hack a file to make the error occur. (I did do a local test of the real thing on an affected machine - the test script I used is the same as below, except instead of hacking the code, I did a database update to change the text of the label to include the character U+FDD0.) To test this you will need a system with working Moodle global search install, including a Solr search engine. Please make sure developer debugging is enabled in order to see the expected message in this test Run search indexing (e.g. using the 'Run now' on the scheduled tasks page for the 'Global search indexing' task) and ensure the index is current. Add a Label to a course page (the text doesn't matter). Add the hack code line (below) to search/classes/document.php, about line 283, immediately below the line '$this->data [$fieldname] = preg_replace("/\s+/u", " ", $value);' Run search indexing again. EXPECTED: The task should fail with an error message that includes text similar to: '"content" value causes preg_replace error (may be caused by unusual characters) in document with id "mod_label-activity-1145" Don't forget to remove the added hack line. if ($fieldname === 'content') { $this->data[$fieldname] = null; }
    • Workaround:
      Hide

      In Postgres, you can find affected messages like this (example for OU ForumNG but you can change to any table):

      select * from mdl_forumng_posts
      where message ~ '[\uFDD0-\uFDEF]'

      You can then remove them like this:

      update mdl_forumng_posts
      set message = regexp_replace(message, '[\uFDD0-\uFDEF]', '', 'g')
      where message ~ '[\uFDD0-\uFDEF]'

      Show
      In Postgres, you can find affected messages like this (example for OU ForumNG but you can change to any table): select * from mdl_forumng_posts where message ~ ' [\uFDD0-\uFDEF] ' You can then remove them like this: update mdl_forumng_posts set message = regexp_replace(message, ' [\uFDD0-\uFDEF] ', '', 'g') where message ~ ' [\uFDD0-\uFDEF] '
    • Affected Branches:
      MOODLE_33_STABLE, MOODLE_34_STABLE, MOODLE_35_STABLE
    • Fixed Branches:
      MOODLE_33_STABLE, MOODLE_34_STABLE
    • Pull Master Branch:
      MDL-60943-master

      Description

      As part of \core_search\document::set function, it does a preg_replace with the Unicode /u modifier.

      On some systems (for example a server running RHEL 7 with PHP 5.6.31, PCRE library version 8.32) this function returns null if passed a source string containing the (non-)character U+FDD0.

      On other systems (for example my desktop running Windows 7 with PHP 7.0.14, PCRE 8.38) it works fine.

      We had a very few forum posts containing this character. I have fixed them in the database.

      I don't propose taking any specific action involving this character - it seems like this character is not really a problem for search alone and if we were to block it, it should be done somewhere when accepting user input; also it's very rare. However, I suggest we improve the exception message to make it easier to track down this problem if it occurs to others in future.

      Note: The following complete PHP script is sufficient to observe this issue. On systems with the problem it will output 'null' from the var_dump. On systems that don't it will output a string containing the character.

      <?php
      $str = '&#xfdd0;';
      $evil = html_entity_decode($str);
      var_dump(preg_replace('/x/u', '', $evil));
      

       Example of error message you get before this change:

      Scheduled task failed: Global search indexing (core\task\search_index_task),Coding error detected, it must be fixed by a programmer: Missing "content" field in document with id "mod_label-activity-1145"

      Example of error message you get after it:

      Scheduled task failed: Global search indexing (core\task\search_index_task),An error occurred while indexing
      Debug info:
      "content" value causes preg_replace error (may be caused by unusual characters) in document with id "mod_label-activity-7"

      Small change but it took me a long time to debug this because the unusual character wasn't visible to the naked eye so I didn't notice it; I had been assuming the content value was somehow blank.

        Attachments

          Issue Links

            Activity

              People

              • Votes:
                0 Vote for this issue
                Watchers:
                3 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved:
                  Fix Release Date:
                  15/Jan/18