Moodle

UTF8 not folded correctly in ICAL export

Details

  • Type: Bug
  • Status: Closed
  • Priority: Minor
  • Resolution: Fixed
  • Affects Version/s: 1.8.4, 1.8.5, 1.9, 1.9.1
  • Fix Version/s: 1.9.6
  • Component/s: Calendar
  • Labels:
    None
  • Affected Branches:
    MOODLE_18_STABLE, MOODLE_19_STABLE
  • Fixed Branches:
    MOODLE_19_STABLE

Description

http://xref.moodle.org/lib/bennu/iCalendar_rfc2445.php.html

contains the function rfc2445_fold(), which does not distinguish between "characters" and "octets". RFC 2445, section 4.1, says:

"Lines of text SHOULD NOT be longer than 75 octets"

"That is, a long line can be split between any two characters ..."

If one UTF-8 character consists of two or more octets, the function may split those octets across two lines, causing importers such as Thunderbird/Lightning to reject the file because it is no longer valid UTF-8.

The rfc2445_unfold() function in the same file ignores the difference consistently, so it is not affected.
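
To make the failure concrete, here is a minimal sketch of a byte-oriented fold. It is not the bennu code: naive_fold(), naive_unfold(), FOLD_LEN and CRLF are hypothetical names, and the values are assumptions chosen to match the 75-octet limit quoted above. It requires the mbstring extension only for the validity check.

```php
<?php
// Hypothetical stand-ins; names and values are assumptions, not copied from bennu.
const FOLD_LEN = 75;
const CRLF = "\r\n";

// Fold by octets only, with no regard for character boundaries.
function naive_fold(string $s): string {
    $out = '';
    while (strlen($s) > FOLD_LEN) {
        // substr() counts octets, so the cut can land inside a character.
        $out .= substr($s, 0, FOLD_LEN - 1) . CRLF . ' ';
        $s = substr($s, FOLD_LEN - 1);
    }
    return $out . $s;
}

// Undo the fold by removing every CRLF followed by one space.
function naive_unfold(string $s): string {
    return str_replace(CRLF . ' ', '', $s);
}

// 73 ASCII octets, then U+00E4 (2 octets in UTF-8) straddling the cut.
$line = str_repeat('a', 73) . "\u{E4}" . str_repeat('b', 10);
$folded = naive_fold($line);
$first = explode(CRLF, $folded)[0];

var_dump(mb_check_encoding($line, 'UTF-8'));   // bool(true)
var_dump(mb_check_encoding($first, 'UTF-8'));  // bool(false): line ends mid-character
var_dump(naive_unfold($folded) === $line);     // bool(true): byte-wise unfold restores it
```

The last check illustrates why the unfold side is unaffected: removing the inserted octets puts the two halves of the character back together, so only a consumer that validates each folded line (or the file as a whole) before unfolding sees invalid UTF-8.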

Activity

Tomasz Muras added a comment -

This patch alters the rfc2445_fold function in lib bennu so that it uses mb_strlen and mb_substr to fold the text at 75 utf8 characters. This has solved an issue with Thunderbird importing the iCal file where previously it was failing due to the fold occurring within a multi-byte glyph.

Should this instead fold at 75 octets while still preventing the breaking of multi-byte characters? I'm not sure. libical splits chunks at 75 octets, and this is possibly correct even for multi-byte strings. Does the problem lie with Thunderbird? Should Thunderbird reconstruct the text in such a way that the multi-byte characters are correctly recombined and then imported successfully?
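
The difference between the two ways of counting can be seen in a short sketch. It assumes the mbstring extension and uses an arbitrary sample string; the variable names are illustrative only.

```php
<?php
// U+20AC (euro sign) occupies 3 octets in UTF-8.
$text = str_repeat("\u{20AC}", 80);

// Cutting by characters, as mb_substr() does, yields 75 characters...
$chunk = mb_substr($text, 0, 75, 'UTF-8');

var_dump(mb_strlen($chunk, 'UTF-8')); // int(75)  characters
var_dump(strlen($chunk));             // int(225) octets
```

A line folded at 75 characters therefore stays valid UTF-8, but it can be well over the 75 octets that the RFC text quoted in the description recommends.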

Tomasz Muras added a comment -

Tentatively resolving; the Thunderbird import of the Moodle export will work. I would like to delve deeper into this issue.

Tim Hunt added a comment -

I don't understand. Was anything changed in the code to fix this?

If yes, why is there nothing on the version control tab?
If no, why is the bug marked fixed?

Thanks for clarifying.

Martin Dougiamas added a comment -

No, nothing's in CVS yet. http://cvs.moodle.org/moodle/lib/bennu/iCalendar_rfc2445.php?view=log

I'm happy to check it in, just need someone to review the patch a little and verify it doesn't break anything.

ska added a comment -

I think the patch changes too much: the resulting lines now have up to 74 characters, each of which can be up to 4 octets long in the worst case.

I am not fluent in PHP, but I think that the only portion that needs to change is here:

mb_strcut()'s manual entry says:
"It subtracts string from str that is shorter than length AND character that is not part of multi-byte string or not being middle of shift sequence."
The topmost user comment on that manual page says that start and length are byte offsets rather than character offsets, so how about:

- $retval .= substr($string, 0, RFC2445_FOLDED_LINE_LENGTH - 1) . RFC2445_CRLF . ' ';
- $string = substr($string, RFC2445_FOLDED_LINE_LENGTH - 1);
+ $str = mb_strcut($string, 0, RFC2445_FOLDED_LINE_LENGTH - strlen(RFC2445_WSP), 'utf-8');
+ $retval .= $str . RFC2445_CRLF . RFC2445_WSP;
+ $string = substr($string, strlen($str));

mb_strcut() ensures that $str is valid UTF8; the remaining operations can work on octets safely then.
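
For anyone who wants to try the idea outside of bennu, the following is a self-contained sketch of an octet-limited fold built on mb_strcut(). It is not the code that was checked in: fold_octets_utf8(), LINE_LIMIT, LINE_BREAK and CONTINUATION are hypothetical names, and their values are assumptions (a single space as the continuation octet, 75 octets per line). It requires the mbstring extension.

```php
<?php
// Assumed values for this sketch; bennu defines its own constants.
const LINE_LIMIT = 75;       // maximum octets per folded line
const LINE_BREAK = "\r\n";
const CONTINUATION = ' ';    // single whitespace octet starting a continuation line

function fold_octets_utf8(string $string): string {
    // Leave room for the continuation octet on every line.
    $cut = LINE_LIMIT - strlen(CONTINUATION);
    $retval = '';
    while (strlen($string) > $cut) {
        // mb_strcut() takes byte offsets but never ends inside a character.
        $str = mb_strcut($string, 0, $cut, 'UTF-8');
        if ($str === '') {
            break; // guard against looping forever on malformed input
        }
        $retval .= $str . LINE_BREAK . CONTINUATION;
        // From here on plain octet arithmetic is safe.
        $string = substr($string, strlen($str));
    }
    return $retval . $string;
}

// U+00E4 (2 octets) sits exactly where a byte-wise cut at octet 74 would land.
$line = str_repeat('a', 73) . "\u{E4}" . str_repeat('b', 100);
$folded = fold_octets_utf8($line);

foreach (explode(LINE_BREAK, $folded) as $l) {
    // Each folded line is at most 75 octets and valid UTF-8 on its own.
    printf("%d octets, valid UTF-8: %s\n", strlen($l),
           mb_check_encoding($l, 'UTF-8') ? 'yes' : 'no');
}
```

Because the cut is made at the last character boundary at or before the octet limit, a line may come out one to three octets shorter than the maximum, which is the price for keeping every line independently decodable.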

Martin Dougiamas added a comment -

For review and checkin

Dongsheng Cai added a comment -

checked in, please review, thanks

