Moodle

Email bodies written with the HTML editor and sent by forum_cron() are broken in multibyte character set environments

Details

  • Type: Bug
  • Status: Closed
  • Priority: Critical
  • Resolution: Duplicate
  • Affects Version/s: 1.9.5, 2.0
  • Fix Version/s: 1.9.6, 2.0
  • Component/s: Libraries, Unicode
  • Labels:
    None
  • Difficulty:
    Easy
  • Affected Branches:
    MOODLE_19_STABLE, MOODLE_20_STABLE
  • Fixed Branches:
    MOODLE_19_STABLE, MOODLE_20_STABLE

Description

The cause of this problem is the utf8_encode() call in lib/html2text.php (present since 12 June 2009).

function _convert()
{
    // Variables used for building the link list
    $this->_link_count = 0;
    $this->_link_list = '';

    $text = trim(stripslashes($this->html));

    // Convert <PRE>
    $this->_convert_pre($text);

    // Run our defined search-and-replace
    $text = preg_replace($this->search, $this->replace, $text);
    $text = preg_replace_callback($this->callback_search, array(&$this, '_preg_callback'), $text);

    // Replace known html entities
    // $text = utf8_encode(html_entity_decode($text)); // Here!!!
    $text = html_entity_decode($text, ENT_COMPAT, 'UTF-8');

    // Remove unknown/unhandled entities (this cannot be done in search-and-replace block)
    $text = preg_replace('/&[^&;]+;/i', '', $text);

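utf8_encode() converts a string from ISO-8859-1 to UTF-8.
Because the text is already UTF-8, each byte of a multibyte character is treated as a separate ISO-8859-1 character and encoded again, so multibyte UTF-8 characters are converted to wrong, unexpected characters.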
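
The double encoding can be reproduced outside Moodle. The following is a minimal sketch, not the html2text code itself: latin1_to_utf8() is a hypothetical stand-in that reimplements the ISO-8859-1 to UTF-8 mapping performed by utf8_encode(), and the sample string is an assumption chosen for illustration. The old decode step is emulated with an explicit 'ISO-8859-1' charset so the result does not depend on the PHP version's default.

```php
<?php
// Hypothetical stand-in for utf8_encode(): treats every input byte as one
// ISO-8859-1 character and re-encodes it as UTF-8.
function latin1_to_utf8(string $bytes): string
{
    $out = '';
    $len = strlen($bytes);
    for ($i = 0; $i < $len; $i++) {
        $c = ord($bytes[$i]);
        if ($c < 0x80) {
            // ASCII bytes are identical in both encodings.
            $out .= $bytes[$i];
        } else {
            // Bytes 0x80-0xFF become a two-byte UTF-8 sequence.
            $out .= chr(0xC0 | ($c >> 6)) . chr(0x80 | ($c & 0x3F));
        }
    }
    return $out;
}

// Sample input (an assumption): the UTF-8 bytes of U+65E5, then an entity.
$html = "\xE6\x97\xA5 caf&eacute;";

// Old behaviour: entities decoded as ISO-8859-1, then the whole string is
// encoded again, which also re-encodes the bytes that were already UTF-8.
$broken = latin1_to_utf8(html_entity_decode($html, ENT_COMPAT, 'ISO-8859-1'));

// Fixed behaviour: decode entities straight to UTF-8, no second encoding.
$fixed = html_entity_decode($html, ENT_COMPAT, 'UTF-8');

// Print both results as hex so the byte-level difference is visible.
echo bin2hex($broken), "\n";
echo bin2hex($fixed), "\n";
```

In this sketch the entity is decoded correctly in both cases; only the character that was already multibyte UTF-8 is corrupted, which is consistent with the problem showing up in multibyte environments.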

Activity

Tatsuya Shirai added a comment -

I had posted the same comment for CVS.

Helen Foster added a comment -

Tatsuya, thanks for your report. Reassigning to Eloy for consideration.

Helen Foster added a comment -

Please see linked issue MDLSITE-764 for a further report of this problem thanks to Dmitry.

Eloy Lafuente (stronk7) added a comment -

Moved to Moodle normal bugs (from site ones).

Assigned to Francois Marier, as I think this is related to the new html2text library recently introduced in both Moodle 1.9 and 2.0.

IMO, html_entity_decode() is 100% evil, and that is why we have this function in our nice textlib library (it was the one used by the old html2text library):

function entities_to_utf8($str, $htmlent=true)

It is able to convert both numerical and HTML entities to UTF-8 and has been working fine until now, so something like:

$tl=textlib_get_instance();
$text = $tl->entities_to_utf8($text,true);

should do the trick, without needing further processing outside the function (see weblib!). It will convert all entities.

Ciao

Eloy Lafuente (stronk7) added a comment -

Note also that there are some tests related to this:

function test_format_text_email()

These should pass, to check that everything is in place. Ciao

Eloy Lafuente (stronk7) added a comment -

Raising this to critical as it is breaking lots of mailouts badly. Confirmed that the test "lib/simpletest/testweblib.php" is broken in 19_STABLE and HEAD.

Francois Marier added a comment -

I've got a patch for this already on MDL-2794 (html_entity_decode_utf8.patch), but I'm waiting for someone to test it on a non-latin system (Japanese, Chinese, or something like that).

Dmitry Pupinin added a comment -

Mail from moodle.org is still broken!

Today two posts by different people in the same topic were delivered differently:
http://moodle.org/mod/forum/post.php?reply=551628 - good
http://moodle.org/mod/forum/post.php?reply=551644 - broken

Eloy Lafuente (stronk7) added a comment -

Hi Dmitry,

I guess moodle.org will be updated tomorrow (weekly build). So in 24h from now it should be working ok.

Thanks and ciao

Jordan Tomkinson added a comment -

moodle.org has been updated to build 20090617

