MDL-37241 (Moodle)

HTML purifier is stripping out/mangling content causing issues with filters.

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Not a bug
    • Affects Version/s: 2.4
    • Fix Version/s: None
    • Labels:
      None
    • Workaround:
      Avoid the use of single HTML tags or invalid HTML tags within code snippets.

    • Affected Branches:
      MOODLE_24_STABLE

      Description

      Sorry about the title; I couldn't work out how to frame this in a single sentence.

      I noticed a bug this morning while replying to posts on Moodle.org forums.
      With Eloy's help we tracked it down to clean_text() and HTML Purifier, which strip invalid tags and try to repair the remaining HTML into valid XHTML.
      In conjunction with the GeSHi filter in particular, this leads to a mess.

      To quickly see what is going on:

      1. Enable the GeSHi filter on your site.
      2. Save the following as test.php and browse to it.

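      <?php
      // Bootstrap Moodle and load the GeSHi filter (config.php is expected alongside this script).
      require_once('config.php');
      require_once($CFG->dirroot.'/filter/geshi/filter.php');

      $PAGE->set_context(get_system_context());

      // A code snippet with an unclosed HTML tag inside a PHP string literal.
      $test = "<code>echo '<div>';</code>";

      // 1. Output of the GeSHi filter on its own.
      echo "<h1>Result of geshi_filter</h1><pre style='border: 1px solid #000;'>".geshi_filter(1, $test)."</pre>";
      // 2. Output of format_text(), as rendered by the browser.
      echo "<h1>Result of format_text</h1><pre style='border: 1px solid #000;'>".format_text($test, FORMAT_MOODLE, array('nocache' => true))."</pre>";
      // 3. The same format_text() output, escaped so the generated markup is visible.
      echo "<h1>Result of format_text (source)</h1><pre style='border: 1px solid #000;'>".htmlspecialchars(format_text($test, FORMAT_MOODLE, array('nocache' => true)))."</pre>";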

      What's of concern is that if you look at the third and final block you will see that:

      <code>echo '<div>';</code>

      Has been converted to:

      <code>echo '</code><div><code>';</code></div><code></code>

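      What's happened is that HTML Purifier has treated the <div> inside the string as real markup and tried to correct it in an effort to fix the HTML. A block-level <div> is not allowed inside an inline <code> element, so it closes the <code>, opens the <div>, and reopens <code> inside it.
      If you have it use <diva> instead, the tag gets stripped out entirely because diva is not a valid XHTML tag.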

      I hit this particular issue this morning while trying to share a code snippet.
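
      One possible way to sidestep the problem, sketched here as an assumption rather than a confirmed fix, is to entity-encode the snippet before wrapping it in <code>, so that the purifier sees plain text instead of tags. The helper name wrap_code_snippet() is hypothetical and not part of Moodle, and this has not been checked against how the GeSHi filter handles encoded input.

```php
<?php
// Hypothetical helper (not a Moodle API): escape a code snippet so that any
// HTML-like tags inside it reach the purifier as text rather than as markup.
function wrap_code_snippet($snippet) {
    // ENT_QUOTES also encodes single quotes, which appear in the example string.
    return '<code>' . htmlspecialchars($snippet, ENT_QUOTES, 'UTF-8') . '</code>';
}

// The snippet from this report, with its unclosed <div> inside a string literal.
echo wrap_code_snippet("echo '<div>';"), "\n";
// prints: <code>echo &#039;&lt;div&gt;&#039;;</code>
```

      With the angle brackets encoded there is no <div> element left for the purifier to re-nest or strip; whether the filter then displays the snippet as intended would still need testing.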

              People

              • Assignee:
                Petr Skoda (skodak)
              • Reporter:
                Sam Hemelryk (samhemelryk)
              • Component watchers:
                Andrew Nicols, Mathew May, Michael Hawkins, Shamim Rezaie, Simey Lameze, Amaia Anabitarte, Carlos Escobedo, Ferran Recio, Sara Arjona (@sarjona), Víctor Déniz Falcón
              • Votes:
                0
              • Watchers:
                3
