Moodle
  1. Moodle
  2. MDL-32427

HTML Purifier strips non standard protocol links such as rtsp

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: 2.2, 2.2.1, 2.2.2
    • Fix Version/s: 2.1.6, 2.2.3
    • Component/s: Other
    • Labels:
      None
    • Rank:
      39291

      Description

      When upgrading a site from 2.1 to 2.2, we find that the HTML Purifier setting is no longer listed in Site Policies. Un-checking this setting was allowing us to access to thousands of video links on our site, which were created as in-line text links inside Glossary entries (essentially forming a video library).

      The streaming links (both Quicktime and Flash) are stripped during the HTML Purification process to just <a>tagged words like this</a>. When we un-check the setting in 2.1, the full links are returned. The Enable Trusted Content setting is on, and so is the Embed/Object tag setting.

      With the setting gone in 2.2, we are facing a huge content access problem.

      Can anyone describe why the setting was removed, and/or let me know if this is something that can be brought back? If not, what can I do to ensure that my links are not cleaned? Thanks so much for your help!

      Laura Mikowychok

        Issue Links

          Activity

          Hide
          Petr Škoda added a comment -

          Hello,

          in older versions you could use select KSES text cleaning or HTML Purifier - the first option was removed because it was not secure, so there is no need for the option now.

          I suppose you are talking about video embedding which is handled via multimedia filter which is independent from HTML Purifier. HTML purifier prevents flash and java, it does note remove <a> style multimedia links, it strips only <iframe>, <object>, <embed> from text BEFORE it is sent to multimedia filter which converts some <a> links to media.

          If you use <object or embed> tags in html code then the only thing you could do is to edit moodle code and relax the text cleanup, but make sure to not introduce any security vulnerabilities...

          Petr

          Show
          Petr Škoda added a comment - Hello, in older versions you could use select KSES text cleaning or HTML Purifier - the first option was removed because it was not secure, so there is no need for the option now. I suppose you are talking about video embedding which is handled via multimedia filter which is independent from HTML Purifier. HTML purifier prevents flash and java, it does note remove <a> style multimedia links, it strips only <iframe>, <object>, <embed> from text BEFORE it is sent to multimedia filter which converts some <a> links to media. If you use <object or embed> tags in html code then the only thing you could do is to edit moodle code and relax the text cleanup, but make sure to not introduce any security vulnerabilities... Petr
          Hide
          Thomas Haines added a comment -

          Petr,

          I am a colleague of Laura's, and I'd like to ellaborate a bit on this particular use case of our client. They are entering links to videos into Glossary entries on a Moodle 1.9 instance.

          The HTML looks like this:
          7 Wonders: <a href="rtsp://stream.ourstreamingserver.org/_all/sci/env/env_11_o_7wondersbiomes.mov">Click Here</a>

          Then when we upgrade to Moodle 2.1, the HTML gets cleansed to:
          7 Wonders: <a>Click Here</a>

          Then when we turn on HTML Purifier, the HTML no longer gets cleansed, and the links work properly.

          But then, when we upgrade to Moodle 2.2, the HTML gets re-cleansed and there is no setting to turn on HTML Purifier is missing.

          Show
          Thomas Haines added a comment - Petr, I am a colleague of Laura's, and I'd like to ellaborate a bit on this particular use case of our client. They are entering links to videos into Glossary entries on a Moodle 1.9 instance. The HTML looks like this: 7 Wonders: <a href="rtsp://stream.ourstreamingserver.org/_all/sci/env/env_11_o_7wondersbiomes.mov">Click Here</a> Then when we upgrade to Moodle 2.1, the HTML gets cleansed to: 7 Wonders: <a>Click Here</a> Then when we turn on HTML Purifier, the HTML no longer gets cleansed, and the links work properly. But then, when we upgrade to Moodle 2.2, the HTML gets re-cleansed and there is no setting to turn on HTML Purifier is missing.
          Hide
          Petr Škoda added a comment -

          Oh, we explicitly tell htmlpurifier to allow rtsp, but for some reason it strips it. I am now able to reproduce the problem, thanks for the extra information.

          Show
          Petr Škoda added a comment - Oh, we explicitly tell htmlpurifier to allow rtsp, but for some reason it strips it. I am now able to reproduce the problem, thanks for the extra information.
          Hide
          Petr Škoda added a comment -

          confirming, I need to write new validator class for rtsp.

          Show
          Petr Škoda added a comment - confirming, I need to write new validator class for rtsp.
          Hide
          Dan Poltawski added a comment -

          The main moodle.git repository has just been updated with latest weekly modifications. You may wish to rebase your PULL branches to simplify history and avoid any possible merge conflicts. This would also make integrator's life easier next week.

          TIA and ciao

          Show
          Dan Poltawski added a comment - The main moodle.git repository has just been updated with latest weekly modifications. You may wish to rebase your PULL branches to simplify history and avoid any possible merge conflicts. This would also make integrator's life easier next week. TIA and ciao
          Hide
          Dan Poltawski added a comment -

          Hmm, I might be overthinking this - but just commenting my thoughts on this to see what you think..

          • I think calling the addition 'lib.php' a change is a bit misleading for upstreams (e.g. debian/ubuntu), because that lib is just an addition which is 'poluting' the same directory directory. Debian could strip our bundled library if they wanted and still include the lib.php and our 'customisations' would be in tact.
          • They might actually introduce their own lib.php in the future!
          • Should lib be MOODLE_INTERNAL restricted?

          So my suggestion would be to call the file moodle_htmlpurfifer.php (or something along those lines) to make it clear its 'our file'. It could be outside of the htmlpurifier directory (maybe?) and it could include htmlpurifer itself and weblib only includes that file.

          Show
          Dan Poltawski added a comment - Hmm, I might be overthinking this - but just commenting my thoughts on this to see what you think.. I think calling the addition 'lib.php' a change is a bit misleading for upstreams (e.g. debian/ubuntu), because that lib is just an addition which is 'poluting' the same directory directory. Debian could strip our bundled library if they wanted and still include the lib.php and our 'customisations' would be in tact. They might actually introduce their own lib.php in the future! Should lib be MOODLE_INTERNAL restricted? So my suggestion would be to call the file moodle_htmlpurfifer.php (or something along those lines) to make it clear its 'our file'. It could be outside of the htmlpurifier directory (maybe?) and it could include htmlpurifer itself and weblib only includes that file.
          Hide
          Petr Škoda added a comment - - edited

          1/ I was thinking about the file name too and did not like it, no problem if it gets renamed, let's see what Eloy says.
          2/ if downstream changes anything the tests should detect it - that is why I keep adding tests especially for our hacks in libs
          3/ yes, our libs should contain MOODLE_INTERNAL, fixing after we decide the file name change

          thanks!!

          Show
          Petr Škoda added a comment - - edited 1/ I was thinking about the file name too and did not like it, no problem if it gets renamed, let's see what Eloy says. 2/ if downstream changes anything the tests should detect it - that is why I keep adding tests especially for our hacks in libs 3/ yes, our libs should contain MOODLE_INTERNAL, fixing after we decide the file name change thanks!!
          Hide
          Eloy Lafuente (stronk7) added a comment - - edited

          ah, no probs with the name, perhaps we could add it into sort of lib/htmlpurifierlib.php (to keep the dir clean) or so? Or perhaps lib/htmlpurifier/locallib.php ?

          Maybe the 1st is the one used in other places like textlib... so +1 for it, I think.

          Ciao

          PS: Any of them +1

          Show
          Eloy Lafuente (stronk7) added a comment - - edited ah, no probs with the name, perhaps we could add it into sort of lib/htmlpurifierlib.php (to keep the dir clean) or so? Or perhaps lib/htmlpurifier/locallib.php ? Maybe the 1st is the one used in other places like textlib... so +1 for it, I think. Ciao PS: Any of them +1
          Hide
          Petr Škoda added a comment -

          I like the locallib.php to keep it out from /lib/, I try not to use the word 'moodle' because it is a trademark.

          Show
          Petr Škoda added a comment - I like the locallib.php to keep it out from /lib/, I try not to use the word 'moodle' because it is a trademark.
          Hide
          Petr Škoda added a comment -

          repos updated, thanks!

          Show
          Petr Škoda added a comment - repos updated, thanks!
          Hide
          Petr Škoda added a comment -

          we use the pdflib.php and the likes to expose public APIs for 3rd party libs, but in this case we add internal features to htmlpurifier, all the public api is actually in weblib.php which makes this a special case imho.

          Show
          Petr Škoda added a comment - we use the pdflib.php and the likes to expose public APIs for 3rd party libs, but in this case we add internal features to htmlpurifier, all the public api is actually in weblib.php which makes this a special case imho.
          Hide
          Dan Poltawski added a comment -

          Thanks, this has been integrated now

          Show
          Dan Poltawski added a comment - Thanks, this has been integrated now
          Hide
          Michael de Raadt added a comment -

          I ran the PHPUnit tests and, apart from some know issues, they passed.

          However, these changes were applied to other branches also. How should they be tested?

          Also, is there a way that we can functionally test these changes? We should not rely solely on unit tests.

          Show
          Michael de Raadt added a comment - I ran the PHPUnit tests and, apart from some know issues, they passed. However, these changes were applied to other branches also. How should they be tested? Also, is there a way that we can functionally test these changes? We should not rely solely on unit tests.
          Hide
          Petr Škoda added a comment -

          If you want to test manually add the code from unit tests to your forum post as student and see if it gets through. Then try some bogus protocol in url and make sure it does not get through.

          Show
          Petr Škoda added a comment - If you want to test manually add the code from unit tests to your forum post as student and see if it gets through. Then try some bogus protocol in url and make sure it does not get through.
          Hide
          Michael de Raadt added a comment -

          I tested the filter manually in 2.1 and 2.2. I added the following content to a forum post...

          <a href="http://www.example.com/course/view.php?id=5">link</a><br />
          <a href="https://www.example.com/course/view.php?id=5">link</a><br />
          <a href="ftp://user@ftp.example.com/some/file.txt">link</a><br />
          <a href="nntp://example.com/group/123">link</a><br />
          <a href="news:groupname">link</a><br />
          <a href="mailto:user@example.com">link</a><br />
          <a href="irc://irc.example.com/3213?pass">link</a><br />
          <a href="rtsp://www.example.com/movie.mov">link</a><br />
          <a href="teamspeak://speak.example.com/?par=val?par2=val2">link</a><br />
          <a href="gopher://gopher.example.com/resource">link</a><br />
          <a href="mms://www.example.com/movie.mms">link</a><br />
          <a href="javascript://www.example.com">link</a><br />
          <a href="hmmm://www.example.com">link</a>
          

          ...and the resulting stored post was (in both 2.1 and 2.2)...

          <a href="http://www.example.com/course/view.php?id=5">link</a><br />
          <a href="https://www.example.com/course/view.php?id=5">link</a><br />
          <a href="ftp://user@ftp.example.com/some/file.txt">link</a><br />
          <a href="nntp://example.com/group/123">link</a><br />
          <a href="news:groupname">link</a><br />
          <a href="mailto:user@example.com">link</a><br />
          <a href="irc://irc.example.com/3213?pass">link</a><br />
          <a href="rtsp://www.example.com/movie.mov">link</a><br />
          <a href="teamspeak://speak.example.com/?par=val?par2=val2">link</a><br />
          <a href="gopher://gopher.example.com/resource">link</a><br />
          <a href="mms://www.example.com/movie.mms">link</a><br />
          <a>link</a><br />
          <a>link</a>
          

          ...which I believe is the desired result.

          Show
          Michael de Raadt added a comment - I tested the filter manually in 2.1 and 2.2. I added the following content to a forum post... <a href= "http: //www.example.com/course/view.php?id=5" >link</a><br /> <a href= "https: //www.example.com/course/view.php?id=5" >link</a><br /> <a href= "ftp: //user@ftp.example.com/some/file.txt" >link</a><br /> <a href= "nntp: //example.com/group/123" >link</a><br /> <a href= "news:groupname" >link</a><br /> <a href= "mailto:user@example.com" >link</a><br /> <a href= "irc: //irc.example.com/3213?pass" >link</a><br /> <a href= "rtsp: //www.example.com/movie.mov" >link</a><br /> <a href= "teamspeak: //speak.example.com/?par=val?par2=val2" >link</a><br /> <a href= "gopher: //gopher.example.com/resource" >link</a><br /> <a href= "mms: //www.example.com/movie.mms" >link</a><br /> <a href= "javascript: //www.example.com" >link</a><br /> <a href= "hmmm: //www.example.com" >link</a> ...and the resulting stored post was (in both 2.1 and 2.2)... <a href= "http: //www.example.com/course/view.php?id=5" >link</a><br /> <a href= "https: //www.example.com/course/view.php?id=5" >link</a><br /> <a href= "ftp: //user@ftp.example.com/some/file.txt" >link</a><br /> <a href= "nntp: //example.com/group/123" >link</a><br /> <a href= "news:groupname" >link</a><br /> <a href= "mailto:user@example.com" >link</a><br /> <a href= "irc: //irc.example.com/3213?pass" >link</a><br /> <a href= "rtsp: //www.example.com/movie.mov" >link</a><br /> <a href= "teamspeak: //speak.example.com/?par=val?par2=val2" >link</a><br /> <a href= "gopher: //gopher.example.com/resource" >link</a><br /> <a href= "mms: //www.example.com/movie.mms" >link</a><br /> <a>link</a><br /> <a>link</a> ...which I believe is the desired result.
          Hide
          Michael de Raadt added a comment - - edited

          Test result: Success.

          Tested in 2.1, 2.2 and master.

          Show
          Michael de Raadt added a comment - - edited Test result: Success. Tested in 2.1, 2.2 and master.
          Hide
          Michael de Raadt added a comment -

          I thought I should confirm the results of the unit test and ran the same list of links through master and experienced the correct result.

          Show
          Michael de Raadt added a comment - I thought I should confirm the results of the unit test and ran the same list of links through master and experienced the correct result.
          Hide
          Eloy Lafuente (stronk7) added a comment -

          This has been near becoming rejected, because it's not the best code you are able to produce.

          But, luckily, at the end, it has landed and has been spread to all repos out there.

          Many thanks and, don't forget it, keep improving your skills, you can!

          Closing, ciao

          Show
          Eloy Lafuente (stronk7) added a comment - This has been near becoming rejected, because it's not the best code you are able to produce. But, luckily, at the end, it has landed and has been spread to all repos out there. Many thanks and, don't forget it, keep improving your skills, you can! Closing, ciao

            People

            • Votes:
              2 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: