Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Blocker Blocker
    • Resolution: Fixed
    • Affects Version/s: 2.0
    • Fix Version/s: 2.0.1
    • Component/s: Blocks, RSS
    • Labels:
    • Environment:
      LAMP
    • Database:
      MySQL
    • Difficulty:
      Moderate
    • Affected Branches:
      MOODLE_20_STABLE
    • Fixed Branches:
      MOODLE_20_STABLE
    • Rank:
      1602

      Description

      The RSS blocks for various forums & glossary are showing changed URLs as illustrated in the following example on upgrade from 1.9.5+ to 2.0 latest weekly:

      http://examplemoodlesite.com/mod/forum/discuss.php?d=2013&parent=5688
      http://examplemoodlesite.com/mod/forum/discuss.php?d=2013&parent=5688

      Because of the ampersand issue in the URL, all URLs are broken.

      This is true for internal feeds. There is no URL change issue noticed for external feeds.

      1. 20101208rssURLs.patch
        0.7 kB
        Andrew Davis
      2. 20101209rssURLs.patch
        0.8 kB
        Andrew Davis

        Activity

        Hide
        Andrew Davis added a comment - - edited

        Hi. What RSS client are you using? The ampersand in the URL has been escaped by that shouldn't cause the link to not work.

        As an experiment just tried not escaping the URLs in the feed and they cause the xml file to not be valid so its not as simple as just not escaping the URLs.

        Show
        Andrew Davis added a comment - - edited Hi. What RSS client are you using? The ampersand in the URL has been escaped by that shouldn't cause the link to not work. As an experiment just tried not escaping the URLs in the feed and they cause the xml file to not be valid so its not as simple as just not escaping the URLs.
        Hide
        Manish Verma added a comment -

        Hi Andrew, I just sent you message via moodle.org messsaging system since involved URLs involved the moodle test installation.

        Show
        Manish Verma added a comment - Hi Andrew, I just sent you message via moodle.org messsaging system since involved URLs involved the moodle test installation.
        Hide
        Andrew Davis added a comment - - edited

        Thanks Manish. I've had a look at your site and have now been able to reproduce the problem within my own development site. For the problem to occur you must have an RSS block which is using a glossary RSS feed as its source. An RSS feed displaying the feed from a forum, even though it uses the same URL escaping, works fine. Something is wrong with the glossary specifically.

        update: If the RSS block uses the RSS feed for a database activity as its source when you click the link it doesn't give an error but it just takes you to the activity instead of the specific entry you clicked on.

        Show
        Andrew Davis added a comment - - edited Thanks Manish. I've had a look at your site and have now been able to reproduce the problem within my own development site. For the problem to occur you must have an RSS block which is using a glossary RSS feed as its source. An RSS feed displaying the feed from a forum, even though it uses the same URL escaping, works fine. Something is wrong with the glossary specifically. update: If the RSS block uses the RSS feed for a database activity as its source when you click the link it doesn't give an error but it just takes you to the activity instead of the specific entry you clicked on.
        Hide
        Andrew Davis added a comment -

        I'm attaching a patch for discussion. Moodle 2.0 RSS feeds cache RSS feeds contents as an xml file. Elements like the URLs are escaped.

        When the RSS feed is accessed some clients automatically unescape the elements. For example if you access a Moodle RSS feed with Firefox itself everything works fine. However, our own RSS block doesn't do this.

        The patch Ive attached alters the RSS block to unescape the item links. However, this may introduce some security concerns if a Moodle site is configured to access an external RSS feed that has malicious content injected into it.

        An alternative way to handle this, which I haven't tested, would be for the RSS feed content to be partially or wholly unescaped before it is sent. This would increase server load for every RSS request so isnt particularly desirable.

        Show
        Andrew Davis added a comment - I'm attaching a patch for discussion. Moodle 2.0 RSS feeds cache RSS feeds contents as an xml file. Elements like the URLs are escaped. When the RSS feed is accessed some clients automatically unescape the elements. For example if you access a Moodle RSS feed with Firefox itself everything works fine. However, our own RSS block doesn't do this. The patch Ive attached alters the RSS block to unescape the item links. However, this may introduce some security concerns if a Moodle site is configured to access an external RSS feed that has malicious content injected into it. An alternative way to handle this, which I haven't tested, would be for the RSS feed content to be partially or wholly unescaped before it is sent. This would increase server load for every RSS request so isnt particularly desirable.
        Hide
        Sam Hemelryk added a comment - - edited

        Well this was a twisted wee issue.
        I have just been testing this with both forum and glossary. I found the issue does affect forum as well however there is only one reason an ampersand will be in a forum for the parent, you get id=6&parent=5 which leads to:

         $_GET = Array (
            [d] => 22
            [amp;parent] => 58
        )
        

        This doesn't break anything because parent is an optional arg.
        In glossary usually the entry id (eid) is second and is required so we get the error.

        So -1 for your patch at the moment Andrew.

        The problem to me looks like a production fault when we generate the RSS feed.
        Full tags get processed by lib/rsslib.php::rss_full_tag() which always runs the content of the tag through htmlspecialchars.
        Personally I think we should either add a sixth argument to that function $escape = true, so by default things get escapes but we can turn it off (and we will need to for all link attributes in RSS feeds.)
        Alternatively if the RSS spec doesn't require ampersands to be escaped in any fields then we could use get_html_translation_table() to get the translation table, drop the ampersand and then process the content with strtr().

        Anyway that all said and done I am not an RSS expert, perhaps I've missed something, yell out if you have any questions Andrew.

        Cheers
        Sam

        Show
        Sam Hemelryk added a comment - - edited Well this was a twisted wee issue. I have just been testing this with both forum and glossary. I found the issue does affect forum as well however there is only one reason an ampersand will be in a forum for the parent, you get id=6&parent=5 which leads to: $_GET = Array ( [d] => 22 [amp;parent] => 58 ) This doesn't break anything because parent is an optional arg. In glossary usually the entry id (eid) is second and is required so we get the error. So -1 for your patch at the moment Andrew. The problem to me looks like a production fault when we generate the RSS feed. Full tags get processed by lib/rsslib.php::rss_full_tag() which always runs the content of the tag through htmlspecialchars. Personally I think we should either add a sixth argument to that function $escape = true, so by default things get escapes but we can turn it off (and we will need to for all link attributes in RSS feeds.) Alternatively if the RSS spec doesn't require ampersands to be escaped in any fields then we could use get_html_translation_table() to get the translation table, drop the ampersand and then process the content with strtr(). Anyway that all said and done I am not an RSS expert, perhaps I've missed something, yell out if you have any questions Andrew. Cheers Sam
        Hide
        Andrew Davis added a comment -

        Sorry, yeah, should have come back and explained that forums are actually affected. Just that the second param is optional so the error isn't so apparent. Any RSS feed coming out of Moodle exhibits this behaviour.

        "Personally I think we should either add a sixth argument to that function $escape = true"

        I actually did exactly that as my first fix. I'll go back and figure out why however that caused the cached XML file version be invalid. Hence the slightly dodgy fixing the error on the way out. I'll go back and figure out what's going on there.

        Show
        Andrew Davis added a comment - Sorry, yeah, should have come back and explained that forums are actually affected. Just that the second param is optional so the error isn't so apparent. Any RSS feed coming out of Moodle exhibits this behaviour. "Personally I think we should either add a sixth argument to that function $escape = true" I actually did exactly that as my first fix. I'll go back and figure out why however that caused the cached XML file version be invalid. Hence the slightly dodgy fixing the error on the way out. I'll go back and figure out what's going on there.
        Hide
        Filter Manager added a comment -

        Hi guys,

        IMO the problem is clearly in the remote RSS block. Let me explain:

        When the RSS feed is generated it must be XML compliant so all the "&" must be converted to their & entity. And we are doing that perfectly.

        Later, if the RSS block fetches one feed, it must be able to decode back those entities AND add them back when generating correct X-HTML to be displayed in the block.

        If you take a look to the HTML generated by the block you will see that it's DOUBLE encoding entities, so it shows things like:

        href="http://xxxx.com/mod/glossary/showentry.php?courseid=2&amp;amp;eid=8">dog</a>
        

        And, IMO that's the cause of the problem (so $_GET['eid'] becomes "corrupt" and so on). As said, the feed reader (the block) should be able to handle both correct and incorrect XML feeds (note our one are strictly correct). And then output links properly (only entity-zed one, to fulfill Moodle XHTML req).

        Ciao

        Show
        Filter Manager added a comment - Hi guys, IMO the problem is clearly in the remote RSS block. Let me explain: When the RSS feed is generated it must be XML compliant so all the "&" must be converted to their & entity. And we are doing that perfectly. Later, if the RSS block fetches one feed, it must be able to decode back those entities AND add them back when generating correct X-HTML to be displayed in the block. If you take a look to the HTML generated by the block you will see that it's DOUBLE encoding entities, so it shows things like: href= "http: //xxxx.com/mod/glossary/showentry.php?courseid=2&amp;amp;eid=8" >dog</a> And, IMO that's the cause of the problem (so $_GET ['eid'] becomes "corrupt" and so on). As said, the feed reader (the block) should be able to handle both correct and incorrect XML feeds (note our one are strictly correct). And then output links properly (only entity-zed one, to fulfill Moodle XHTML req). Ciao
        Hide
        Filter Manager added a comment -

        Moving this to the STABLE backlog (although work should continue, plz, mark as work in progress if so) and marking as triaged. Ciao

        Show
        Filter Manager added a comment - Moving this to the STABLE backlog (although work should continue, plz, mark as work in progress if so) and marking as triaged. Ciao
        Hide
        Eloy Lafuente (stronk7) added a comment -

        Filter Manager was me, LOL!

        Show
        Eloy Lafuente (stronk7) added a comment - Filter Manager was me, LOL!
        Hide
        Andrew Davis added a comment - - edited

        Hey yeah, this mysterious Filter Manager person is right.

        url from cache
        string 'http://andrew.moodle.local/moodle/mod/forum/discuss.php?d=8&parent=42' (length=73)

        link going into the page
        string '<a onclick="this.target="_blank"" href="http://andrew.moodle.local/moodle/mod/forum/discuss.php?d=8&amp;parent=42">Re: bananas</a>' (length=144)

        Double escaped. Guess I have to add something a bit clever to see if the URL is already escaped.

        Show
        Andrew Davis added a comment - - edited Hey yeah, this mysterious Filter Manager person is right. url from cache string 'http://andrew.moodle.local/moodle/mod/forum/discuss.php?d=8&parent=42' (length=73) link going into the page string '<a onclick="this.target="_blank"" href="http://andrew.moodle.local/moodle/mod/forum/discuss.php?d=8&amp;parent=42">Re: bananas</a>' (length=144) Double escaped. Guess I have to add something a bit clever to see if the URL is already escaped.
        Hide
        Andrew Davis added a comment - - edited

        It seems like the escaping of the url is happening deep within html_writer::link(). rather than me messing with something used all over the place and barring a better suggestion Im re-proposing a slightly modified version of my original patch. same fix but with an explanation.

        Show
        Andrew Davis added a comment - - edited It seems like the escaping of the url is happening deep within html_writer::link(). rather than me messing with something used all over the place and barring a better suggestion Im re-proposing a slightly modified version of my original patch. same fix but with an explanation.
        Hide
        Andrew Davis added a comment -

        I have committed a fix for this. After more discussion with Sam I ultimately did this:

        $link = new moodle_url($link);

        rather than using htmlspecialchars_decode() as it has the desired effect while being less likely to unescape something nasty in another part of the URL string.

        Eloy, let me know if I've there's anything more or I've fixed this in the wrong way.

        Show
        Andrew Davis added a comment - I have committed a fix for this. After more discussion with Sam I ultimately did this: $link = new moodle_url($link); rather than using htmlspecialchars_decode() as it has the desired effect while being less likely to unescape something nasty in another part of the URL string. Eloy, let me know if I've there's anything more or I've fixed this in the wrong way.
        Hide
        Eloy Lafuente (stronk7) added a comment -

        Oh,

        if now the block isn't printing double-entitised HTML anymore and it produces correct XHTML contents (entitised-once) for any feed (strict XML one no), I haven't anything to argue.

        Note I just arrived to this because I was triaging blockers, so I'm not monitoring/reviewing it.

        Final note for asiggnees:

        If something from the BACKLOG (stable or dev) gets fixed, the developer must both:

        1) set the fixfor versions the issue was fixexed for.
        2) delete the BACKLOG xxx version, as far as, once resolved it doesn't belong to the BACKLOG anymore.

        Ciao

        Show
        Eloy Lafuente (stronk7) added a comment - Oh, if now the block isn't printing double-entitised HTML anymore and it produces correct XHTML contents (entitised-once) for any feed (strict XML one no), I haven't anything to argue. Note I just arrived to this because I was triaging blockers, so I'm not monitoring/reviewing it. Final note for asiggnees: If something from the BACKLOG (stable or dev) gets fixed, the developer must both: 1) set the fixfor versions the issue was fixexed for. 2) delete the BACKLOG xxx version, as far as, once resolved it doesn't belong to the BACKLOG anymore. Ciao
        Hide
        Andrew Davis added a comment -

        Thanks for the clarification Eloy

        Show
        Andrew Davis added a comment - Thanks for the clarification Eloy
        Hide
        Manish Verma added a comment -

        It appears that the current status of this issue is fixed & closed. However, using CVS update for 2.0, I do not find any change.

        Show
        Manish Verma added a comment - It appears that the current status of this issue is fixed & closed. However, using CVS update for 2.0, I do not find any change.

          People

          • Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved: