MDL-14907

UTF8 not folded correctly in ICAL export

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.8.4, 1.8.5, 1.9, 1.9.1
    • Fix Version/s: 1.9.6
    • Component/s: Calendar
    • Labels:
      None
    • Affected Branches:
      MOODLE_18_STABLE, MOODLE_19_STABLE
    • Fixed Branches:
      MOODLE_19_STABLE
    • Rank:
      30889

      Description

      http://xref.moodle.org/lib/bennu/iCalendar_rfc2445.php.html

      contains the function rfc2445_fold(), which does not distinguish between "characters" and "octets". RFC 2445 section 4.1 says:

      "Lines of text SHOULD NOT be longer than 75 octets"

      "That is, a long line can be split between any two characters ..."

      If one UTF-8 character consists of two or more octets, the function may put them on different lines. Importers such as Thunderbird/Lightning then reject the file because it is no longer valid UTF-8.

      The rfc2445_unfold() function in the same file ignores the difference consistently, so it is not affected.
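
The failure described above can be reproduced with a small sketch. `naive_fold()` and `FOLD_WIDTH` are hypothetical names used only for illustration; the loop is modelled on the two `substr()` lines quoted later in this issue and is not a copy of the bennu source.

```php
<?php
// Hypothetical stand-in for a byte-based fold; not the bennu implementation.
const FOLD_WIDTH = 75; // maximum octets per line (RFC 2445 section 4.1)

function naive_fold(string $line): string
{
    $out = '';
    while (strlen($line) > FOLD_WIDTH) {
        // substr() counts octets, so the cut can land inside a multi-byte character.
        $out .= substr($line, 0, FOLD_WIDTH - 1) . "\r\n ";
        $line = substr($line, FOLD_WIDTH - 1);
    }
    return $out . $line;
}

// "\u{00E4}" (a-umlaut) is two octets in UTF-8. The 9-octet property name
// puts the 74-octet cut in the middle of the 33rd character.
$line   = 'LOCATION:' . str_repeat("\u{00E4}", 40);
$folded = naive_fold($line);
$parts  = explode("\r\n", $folded);

foreach ($parts as $i => $part) {
    printf(
        "line %d: %d octets, %s\n",
        $i,
        strlen($part),
        mb_check_encoding($part, 'UTF-8') ? 'valid UTF-8' : 'INVALID UTF-8'
    );
}
```

With this input, neither physical line passes the UTF-8 check on its own: the first ends with half a character and the second starts with the other half.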

        Activity

        Tomasz Muras added a comment -

        This patch alters the rfc2445_fold function in lib bennu so that it uses mb_strlen and mb_substr to fold the text at 75 utf8 characters. This has solved an issue with Thunderbird importing the iCal file where previously it was failing due to the fold occurring within a multi-byte glyph.

        Should this fold at 75 octets while still preventing multi-byte characters from being broken? I'm not sure. libical splits chunks at 75 octets, and that is possibly correct even for multi-byte strings. Does the problem lie with Thunderbird? Should Thunderbird reconstruct the text so that the multi-byte characters are correctly recombined and then imported successfully?
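
On the question of whether the importer should recombine split characters: RFC 5545, the successor to RFC 2445, notes in section 3.1 that simple implementations may fold in the middle of a UTF-8 multi-octet sequence and asks implementations to unfold in a way that restores the original sequence. A minimal sketch of such an importer-side unfold follows, assuming it runs on raw octets before any character decoding. `unfold_octets()` is a hypothetical helper, not a bennu or Thunderbird function.

```php
<?php
// Hypothetical helper: unfold on raw octets, before decoding as UTF-8.
function unfold_octets(string $data): string
{
    // Remove every CRLF that is immediately followed by one space or tab.
    // No /u modifier, so the pattern is matched byte by byte.
    return preg_replace('/\r\n[ \t]/', '', $data);
}

$original = 'LOCATION:' . str_repeat("\u{00E4}", 40);

// Fold by hand at 74 octets, deliberately cutting a two-octet character in half.
$folded = substr($original, 0, 74) . "\r\n " . substr($original, 74);

$restored = unfold_octets($folded);
echo $restored === $original ? "restored\n" : "damaged\n";
```

Unfolding first and decoding second lets an importer accept files that were folded mid-character, although an exporter that never splits a character avoids depending on that behaviour.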

        Tomasz Muras added a comment -

        Tentatively resolving; the Thunderbird import of the Moodle export will work. I would like to delve deeper into this issue.

        Tim Hunt added a comment -

        I don't understand. Was anything changed in the code to fix this?

        If yes, why is there nothing on the version control tab?
        If no, why is the bug marked fixed?

        Thanks for clarifying.

        Martin Dougiamas added a comment -

        No, nothing's in CVS yet. http://cvs.moodle.org/moodle/lib/bennu/iCalendar_rfc2445.php?view=log

        I'm happy to check it in, just need someone to review the patch a little and verify it doesn't break anything.

        ska added a comment -

        I think the patch changes too much: the resulting lines now hold up to 74 characters, and in the worst case each character is 4 octets long.
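
The size concern can be checked with a short sketch. It assumes a line made only of four-octet characters and cuts it by character count with `mb_substr()`, as the first patch is described as doing; the variable names are illustrative.

```php
<?php
// U+1F600 needs four octets in UTF-8.
$wide = "\u{1F600}";
$line = str_repeat($wide, 100);

// Cutting by characters keeps every character whole...
$chunk = mb_substr($line, 0, 74, 'UTF-8');

// ...but the physical line is then far longer than 75 octets.
printf("%d characters, %d octets\n", mb_strlen($chunk, 'UTF-8'), strlen($chunk));
// prints "74 characters, 296 octets"
```

A character-based fold therefore produces valid UTF-8 but can exceed the 75-octet recommendation of RFC 2445 section 4.1.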

        I am not fluent in PHP, but I think that the only portion to change is here:

        mb_strcut()'s manual entry says:
        "It subtracts string from str that is shorter than length AND character that is not part of multi-byte string or not being middle of shift sequence."
        The topmost commenter says that start and length are byte offsets rather than character offsets, so how about:

        - $retval .= substr($string, 0, RFC2445_FOLDED_LINE_LENGTH - 1) . RFC2445_CRLF . ' ';
        - $string = substr($string, RFC2445_FOLDED_LINE_LENGTH - 1);

        + $str = mb_strcut($string, 0, RFC2445_FOLDED_LINE_LENGTH - strlen(RFC2445_WSP), 'utf-8');
        + $retval .= $str . RFC2445_CRLF . RFC2445_WSP;
        + $string = substr($string, strlen($str));

        mb_strcut() ensures that $str is valid UTF8; the remaining operations can work on octets safely then.
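
The proposal above can be sketched as a complete function. This is an illustration under stated assumptions, not the code that was checked in: `fold_utf8()`, `LINE_OCTETS`, `CRLF` and `WSP` are hypothetical names, `WSP` is assumed to be a single space (bennu defines its own `RFC2445_*` constants, which may differ), and the first line is allowed the full 75 octets because it carries no leading whitespace.

```php
<?php
// Assumed values for illustration only.
const LINE_OCTETS = 75;
const CRLF = "\r\n";
const WSP  = ' ';

function fold_utf8(string $string): string
{
    $retval = '';
    $room = LINE_OCTETS; // the first line has no leading whitespace
    while (strlen($string) > $room) {
        // mb_strcut() takes byte offsets but never ends inside a character,
        // so $str is at most $room octets and ends on a character boundary.
        $str = mb_strcut($string, 0, $room, 'UTF-8');
        if ($str === '') {
            break; // defensive: avoid looping forever on unexpected input
        }
        $retval .= $str . CRLF . WSP;
        // $str ends on a boundary, so plain octet arithmetic is safe here.
        $string = substr($string, strlen($str));
        // Continuation lines start with WSP, which counts toward the limit.
        $room = LINE_OCTETS - strlen(WSP);
    }
    return $retval . $string;
}

$input  = 'SUMMARY:' . str_repeat("\u{00E4}", 100);
$output = fold_utf8($input);

foreach (explode(CRLF, $output) as $i => $part) {
    printf(
        "line %d: %d octets, %s\n",
        $i,
        strlen($part),
        mb_check_encoding($part, 'UTF-8') ? 'valid UTF-8' : 'INVALID UTF-8'
    );
}
```

Tracking the remaining room per line keeps every physical line, including the last one with its leading whitespace, within 75 octets.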

        Martin Dougiamas added a comment -

        For review and checkin

        Dongsheng Cai added a comment -

        checked in, please review, thanks


          People

          • Votes: 2
          • Watchers: 5
