Moodle
  1. Moodle
  2. MDL-32427

HTML Purifier strips non standard protocol links such as rtsp

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: 2.2, 2.2.1, 2.2.2
    • Fix Version/s: 2.1.6, 2.2.3
    • Component/s: Other
    • Labels:
      None

      Description

      When upgrading a site from 2.1 to 2.2, we find that the HTML Purifier setting is no longer listed in Site Policies. Un-checking this setting was allowing us to access to thousands of video links on our site, which were created as in-line text links inside Glossary entries (essentially forming a video library).

      The streaming links (both Quicktime and Flash) are stripped during the HTML Purification process to just <a>tagged words like this</a>. When we un-check the setting in 2.1, the full links are returned. The Enable Trusted Content setting is on, and so is the Embed/Object tag setting.

      With the setting gone in 2.2, we are facing a huge content access problem.

      Can anyone describe why the setting was removed, and/or let me know if this is something that can be brought back? If not, what can I do to ensure that my links are not cleaned? Thanks so much for your help!

      Laura Mikowychok

        Gliffy Diagrams

          Issue Links

            Activity

            Hide
            Petr Skoda added a comment -

            Hello,

            in older versions you could use select KSES text cleaning or HTML Purifier - the first option was removed because it was not secure, so there is no need for the option now.

            I suppose you are talking about video embedding which is handled via multimedia filter which is independent from HTML Purifier. HTML purifier prevents flash and java, it does note remove <a> style multimedia links, it strips only <iframe>, <object>, <embed> from text BEFORE it is sent to multimedia filter which converts some <a> links to media.

            If you use <object or embed> tags in html code then the only thing you could do is to edit moodle code and relax the text cleanup, but make sure to not introduce any security vulnerabilities...

            Petr

            Show
            Petr Skoda added a comment - Hello, in older versions you could use select KSES text cleaning or HTML Purifier - the first option was removed because it was not secure, so there is no need for the option now. I suppose you are talking about video embedding which is handled via multimedia filter which is independent from HTML Purifier. HTML purifier prevents flash and java, it does note remove <a> style multimedia links, it strips only <iframe>, <object>, <embed> from text BEFORE it is sent to multimedia filter which converts some <a> links to media. If you use <object or embed> tags in html code then the only thing you could do is to edit moodle code and relax the text cleanup, but make sure to not introduce any security vulnerabilities... Petr
            Hide
            Thomas Haines added a comment -

            Petr,

            I am a colleague of Laura's, and I'd like to ellaborate a bit on this particular use case of our client. They are entering links to videos into Glossary entries on a Moodle 1.9 instance.

            The HTML looks like this:
            7 Wonders: <a href="rtsp://stream.ourstreamingserver.org/_all/sci/env/env_11_o_7wondersbiomes.mov">Click Here</a>

            Then when we upgrade to Moodle 2.1, the HTML gets cleansed to:
            7 Wonders: <a>Click Here</a>

            Then when we turn on HTML Purifier, the HTML no longer gets cleansed, and the links work properly.

            But then, when we upgrade to Moodle 2.2, the HTML gets re-cleansed and there is no setting to turn on HTML Purifier is missing.

            Show
            Thomas Haines added a comment - Petr, I am a colleague of Laura's, and I'd like to ellaborate a bit on this particular use case of our client. They are entering links to videos into Glossary entries on a Moodle 1.9 instance. The HTML looks like this: 7 Wonders: <a href="rtsp://stream.ourstreamingserver.org/_all/sci/env/env_11_o_7wondersbiomes.mov">Click Here</a> Then when we upgrade to Moodle 2.1, the HTML gets cleansed to: 7 Wonders: <a>Click Here</a> Then when we turn on HTML Purifier, the HTML no longer gets cleansed, and the links work properly. But then, when we upgrade to Moodle 2.2, the HTML gets re-cleansed and there is no setting to turn on HTML Purifier is missing.
            Hide
            Petr Skoda added a comment -

            Oh, we explicitly tell htmlpurifier to allow rtsp, but for some reason it strips it. I am now able to reproduce the problem, thanks for the extra information.

            Show
            Petr Skoda added a comment - Oh, we explicitly tell htmlpurifier to allow rtsp, but for some reason it strips it. I am now able to reproduce the problem, thanks for the extra information.
            Hide
            Petr Skoda added a comment -

            confirming, I need to write new validator class for rtsp.

            Show
            Petr Skoda added a comment - confirming, I need to write new validator class for rtsp.
            Hide
            Dan Poltawski added a comment -

            The main moodle.git repository has just been updated with latest weekly modifications. You may wish to rebase your PULL branches to simplify history and avoid any possible merge conflicts. This would also make integrator's life easier next week.

            TIA and ciao

            Show
            Dan Poltawski added a comment - The main moodle.git repository has just been updated with latest weekly modifications. You may wish to rebase your PULL branches to simplify history and avoid any possible merge conflicts. This would also make integrator's life easier next week. TIA and ciao
            Hide
            Dan Poltawski added a comment -

            Hmm, I might be overthinking this - but just commenting my thoughts on this to see what you think..

            • I think calling the addition 'lib.php' a change is a bit misleading for upstreams (e.g. debian/ubuntu), because that lib is just an addition which is 'poluting' the same directory directory. Debian could strip our bundled library if they wanted and still include the lib.php and our 'customisations' would be in tact.
            • They might actually introduce their own lib.php in the future!
            • Should lib be MOODLE_INTERNAL restricted?

            So my suggestion would be to call the file moodle_htmlpurfifer.php (or something along those lines) to make it clear its 'our file'. It could be outside of the htmlpurifier directory (maybe?) and it could include htmlpurifer itself and weblib only includes that file.

            Show
            Dan Poltawski added a comment - Hmm, I might be overthinking this - but just commenting my thoughts on this to see what you think.. I think calling the addition 'lib.php' a change is a bit misleading for upstreams (e.g. debian/ubuntu), because that lib is just an addition which is 'poluting' the same directory directory. Debian could strip our bundled library if they wanted and still include the lib.php and our 'customisations' would be in tact. They might actually introduce their own lib.php in the future! Should lib be MOODLE_INTERNAL restricted? So my suggestion would be to call the file moodle_htmlpurfifer.php (or something along those lines) to make it clear its 'our file'. It could be outside of the htmlpurifier directory (maybe?) and it could include htmlpurifer itself and weblib only includes that file.
            Hide
            Petr Skoda added a comment - - edited

            1/ I was thinking about the file name too and did not like it, no problem if it gets renamed, let's see what Eloy says.
            2/ if downstream changes anything the tests should detect it - that is why I keep adding tests especially for our hacks in libs
            3/ yes, our libs should contain MOODLE_INTERNAL, fixing after we decide the file name change

            thanks!!

            Show
            Petr Skoda added a comment - - edited 1/ I was thinking about the file name too and did not like it, no problem if it gets renamed, let's see what Eloy says. 2/ if downstream changes anything the tests should detect it - that is why I keep adding tests especially for our hacks in libs 3/ yes, our libs should contain MOODLE_INTERNAL, fixing after we decide the file name change thanks!!
            Hide
            Eloy Lafuente (stronk7) added a comment - - edited

            ah, no probs with the name, perhaps we could add it into sort of lib/htmlpurifierlib.php (to keep the dir clean) or so? Or perhaps lib/htmlpurifier/locallib.php ?

            Maybe the 1st is the one used in other places like textlib... so +1 for it, I think.

            Ciao

            PS: Any of them +1

            Show
            Eloy Lafuente (stronk7) added a comment - - edited ah, no probs with the name, perhaps we could add it into sort of lib/htmlpurifierlib.php (to keep the dir clean) or so? Or perhaps lib/htmlpurifier/locallib.php ? Maybe the 1st is the one used in other places like textlib... so +1 for it, I think. Ciao PS: Any of them +1
            Hide
            Petr Skoda added a comment -

            I like the locallib.php to keep it out from /lib/, I try not to use the word 'moodle' because it is a trademark.

            Show
            Petr Skoda added a comment - I like the locallib.php to keep it out from /lib/, I try not to use the word 'moodle' because it is a trademark.
            Hide
            Petr Skoda added a comment -

            repos updated, thanks!

            Show
            Petr Skoda added a comment - repos updated, thanks!
            Hide
            Petr Skoda added a comment -

            we use the pdflib.php and the likes to expose public APIs for 3rd party libs, but in this case we add internal features to htmlpurifier, all the public api is actually in weblib.php which makes this a special case imho.

            Show
            Petr Skoda added a comment - we use the pdflib.php and the likes to expose public APIs for 3rd party libs, but in this case we add internal features to htmlpurifier, all the public api is actually in weblib.php which makes this a special case imho.
            Hide
            Dan Poltawski added a comment -

            Thanks, this has been integrated now

            Show
            Dan Poltawski added a comment - Thanks, this has been integrated now
            Hide
            Michael de Raadt added a comment -

            I ran the PHPUnit tests and, apart from some know issues, they passed.

            However, these changes were applied to other branches also. How should they be tested?

            Also, is there a way that we can functionally test these changes? We should not rely solely on unit tests.

            Show
            Michael de Raadt added a comment - I ran the PHPUnit tests and, apart from some know issues, they passed. However, these changes were applied to other branches also. How should they be tested? Also, is there a way that we can functionally test these changes? We should not rely solely on unit tests.
            Hide
            Petr Skoda added a comment -

            If you want to test manually add the code from unit tests to your forum post as student and see if it gets through. Then try some bogus protocol in url and make sure it does not get through.

            Show
            Petr Skoda added a comment - If you want to test manually add the code from unit tests to your forum post as student and see if it gets through. Then try some bogus protocol in url and make sure it does not get through.
            Hide
            Michael de Raadt added a comment -

            I tested the filter manually in 2.1 and 2.2. I added the following content to a forum post...

            <a href="http://www.example.com/course/view.php?id=5">link</a><br />
            <a href="https://www.example.com/course/view.php?id=5">link</a><br />
            <a href="ftp://user@ftp.example.com/some/file.txt">link</a><br />
            <a href="nntp://example.com/group/123">link</a><br />
            <a href="news:groupname">link</a><br />
            <a href="mailto:user@example.com">link</a><br />
            <a href="irc://irc.example.com/3213?pass">link</a><br />
            <a href="rtsp://www.example.com/movie.mov">link</a><br />
            <a href="teamspeak://speak.example.com/?par=val?par2=val2">link</a><br />
            <a href="gopher://gopher.example.com/resource">link</a><br />
            <a href="mms://www.example.com/movie.mms">link</a><br />
            <a href="javascript://www.example.com">link</a><br />
            <a href="hmmm://www.example.com">link</a>
            

            ...and the resulting stored post was (in both 2.1 and 2.2)...

            <a href="http://www.example.com/course/view.php?id=5">link</a><br />
            <a href="https://www.example.com/course/view.php?id=5">link</a><br />
            <a href="ftp://user@ftp.example.com/some/file.txt">link</a><br />
            <a href="nntp://example.com/group/123">link</a><br />
            <a href="news:groupname">link</a><br />
            <a href="mailto:user@example.com">link</a><br />
            <a href="irc://irc.example.com/3213?pass">link</a><br />
            <a href="rtsp://www.example.com/movie.mov">link</a><br />
            <a href="teamspeak://speak.example.com/?par=val?par2=val2">link</a><br />
            <a href="gopher://gopher.example.com/resource">link</a><br />
            <a href="mms://www.example.com/movie.mms">link</a><br />
            <a>link</a><br />
            <a>link</a>
            

            ...which I believe is the desired result.

            Show
            Michael de Raadt added a comment - I tested the filter manually in 2.1 and 2.2. I added the following content to a forum post... <a href="http://www.example.com/course/view.php?id=5">link</a><br /> <a href="https://www.example.com/course/view.php?id=5">link</a><br /> <a href="ftp://user@ftp.example.com/some/file.txt">link</a><br /> <a href="nntp://example.com/group/123">link</a><br /> <a href="news:groupname">link</a><br /> <a href="mailto:user@example.com">link</a><br /> <a href="irc://irc.example.com/3213?pass">link</a><br /> <a href="rtsp://www.example.com/movie.mov">link</a><br /> <a href="teamspeak://speak.example.com/?par=val?par2=val2">link</a><br /> <a href="gopher://gopher.example.com/resource">link</a><br /> <a href="mms://www.example.com/movie.mms">link</a><br /> <a href="javascript://www.example.com">link</a><br /> <a href="hmmm://www.example.com">link</a> ...and the resulting stored post was (in both 2.1 and 2.2)... <a href="http://www.example.com/course/view.php?id=5">link</a><br /> <a href="https://www.example.com/course/view.php?id=5">link</a><br /> <a href="ftp://user@ftp.example.com/some/file.txt">link</a><br /> <a href="nntp://example.com/group/123">link</a><br /> <a href="news:groupname">link</a><br /> <a href="mailto:user@example.com">link</a><br /> <a href="irc://irc.example.com/3213?pass">link</a><br /> <a href="rtsp://www.example.com/movie.mov">link</a><br /> <a href="teamspeak://speak.example.com/?par=val?par2=val2">link</a><br /> <a href="gopher://gopher.example.com/resource">link</a><br /> <a href="mms://www.example.com/movie.mms">link</a><br /> <a>link</a><br /> <a>link</a> ...which I believe is the desired result.
            Hide
            Michael de Raadt added a comment - - edited

            Test result: Success.

            Tested in 2.1, 2.2 and master.

            Show
            Michael de Raadt added a comment - - edited Test result: Success. Tested in 2.1, 2.2 and master.
            Hide
            Michael de Raadt added a comment -

            I thought I should confirm the results of the unit test and ran the same list of links through master and experienced the correct result.

            Show
            Michael de Raadt added a comment - I thought I should confirm the results of the unit test and ran the same list of links through master and experienced the correct result.
            Hide
            Eloy Lafuente (stronk7) added a comment -

            This has been near becoming rejected, because it's not the best code you are able to produce.

            But, luckily, at the end, it has landed and has been spread to all repos out there.

            Many thanks and, don't forget it, keep improving your skills, you can!

            Closing, ciao

            Show
            Eloy Lafuente (stronk7) added a comment - This has been near becoming rejected, because it's not the best code you are able to produce. But, luckily, at the end, it has landed and has been spread to all repos out there. Many thanks and, don't forget it, keep improving your skills, you can! Closing, ciao

              People

              • Votes:
                2 Vote for this issue
                Watchers:
                2 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: