Moodle / MDL-2794

html2text not compatible with utf-8

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 1.5, 1.8.9, 1.9.5
    • Fix Version/s: 1.8.10, 1.9.6
    • Component/s: General
    • Labels:
      None
    • Environment:
      Both PHP4 and PHP5
    • Affected Branches:
      MOODLE_15_STABLE, MOODLE_18_STABLE, MOODLE_19_STABLE
    • Fixed Branches:
      MOODLE_18_STABLE, MOODLE_19_STABLE

      Description

      The html2text function, called indirectly from the format_text_email function, is not compatible with UTF-8. At the end of the function, html2text replaces every chr(160) byte with ' ', but in UTF-8 a lone 0xA0 byte is not a white space: it can be one byte of a multibyte character. As a result, some characters, such as 'da' (U+3060, encoded as the bytes E3 81 A0) in ja_utf8, are garbled in plain-text email.

      --- ../20050325/moodle/lib/html2text.php Sun Jan 23 11:18:50 2005
      +++ html2text.php Sat Mar 26 16:56:06 2005
      @@ -157,12 +157,12 @@
       $goodStr = wordwrap( $goodStr, 78 );
       //make sure there are no more than 3 linebreaks in a row and trim whitespace
      -$goodStr = str_replace(chr(160), ' ', $goodStr );
      +// $goodStr = str_replace(chr(160), ' ', $goodStr );
       $goodStr = preg_replace("/\r\n?|\f/", "\n", $goodStr);
       $goodStr = preg_replace("/\n(\s*\n){2}/", "\n\n\n", $goodStr);
       $goodStr = preg_replace("/[ \t]+(\n|$)/", "$1", $goodStr);
       $goodStr = preg_replace("/^\n*|\n*$/", '', $goodStr);
      -$goodStr = str_replace(chr(160), ' ', $goodStr );
      +// $goodStr = str_replace(chr(160), ' ', $goodStr );
       return $goodStr;
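
The corruption described above can be reproduced outside Moodle. The following is a minimal sketch, not the Moodle code itself: it applies the same byte-level replacement to the UTF-8 encoding of 'da' (U+3060), then shows one possible UTF-8-aware alternative that replaces only the two-byte no-break space sequence. The variable names and the alternative are assumptions made for illustration; the patch above simply comments the replacement out.

```php
<?php
// U+3060 (HIRAGANA LETTER DA) is encoded in UTF-8 as the bytes E3 81 A0.
$da = "\xE3\x81\xA0";

// The replacement html2text performed: every 0xA0 byte becomes a space.
// The third byte of the character is overwritten, leaving invalid UTF-8.
$broken = str_replace(chr(160), ' ', $da);
echo bin2hex($broken), "\n"; // e38120

// Assumption for illustration: if no-break spaces still need normalising,
// match the full UTF-8 sequence for U+00A0 (C2 A0) instead of a single byte.
$text = "a\xC2\xA0b " . $da;
$safe = str_replace("\xC2\xA0", ' ', $text);
echo bin2hex($safe), "\n"; // 61206220e381a0
```

Matching the two-byte sequence is safe because C2 is a lead byte in UTF-8, so C2 A0 cannot appear inside the encoding of another character.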

        Attachments

        1. html_entity_decode_utf8.patch
          2 kB
          Francois Marier
        2. html2text_utf8_fixes.patch
          2 kB
          Francois Marier
        3. html2text.20090608.patch
          0.8 kB
          Juan Segarra Montesinos
        4. html2text.20090610.patch
          0.8 kB
          Juan Segarra Montesinos

            Activity

            dougiamas Martin Dougiamas added a comment -

            From Martin Dougiamas (martin at moodle.com) Sunday, 27 March 2005, 01:50 PM:

            Thanks! Fixed in 1.5 CVS

            mblake Michael Blake added a comment -

            assign to a valid user

            rezeau Joseph Rézeau added a comment -

            I have just one question: what is the use of the html_to_text($html) function in Moodle's weblib? That function uses the html2text.php library, which mangles UTF-8 text.

            I found this by accident, because html_to_text() IS used in the Questionnaire plugin (where it causes a problem in non-ASCII languages), but nowhere else.

            In MDL-17542, François Marier reported "I have committed an updated version of this file to CVS, along with a readme describing its origin." but again, what's the point of it all if that library is not being used anywhere in Moodle core files?

            I remain puzzled,

            Joseph

            francois Francois Marier added a comment -

            It's also used for example when emailing out forum posts.

            rezeau Joseph Rézeau added a comment -

            François, sorry but in moodle 1.9 I cannot find a reference to the html_to_text($html) function being used anywhere. In which script is it actually used?

            jsegarra Juan Segarra Montesinos added a comment -

            Hi

            html2text is not working correctly in 1.9.5. To reproduce the problem:

            1. Write a forum email with non-ASCII characters
            2. Send the email

            Look at the text/plain part. Part of the body is incorrectly encoded.

            The problem is in the _convert() method in html2text.php. html_entity_decode() works in latin1 by default, so either the text should first be converted to latin1 or the input encoding should be specified.

            The patch attached seems to solve the problem.

            Thanks in advance
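
To illustrate the second option mentioned in this comment, specifying the encoding, here is a small sketch using the third argument of PHP's html_entity_decode(). It is not the attached patch, and it assumes a PHP version whose html_entity_decode() accepts UTF-8 as the target charset. The sample string is made up for the example.

```php
<?php
// HTML as it might arrive from a forum post: a named entity next to raw
// UTF-8 text ("cafe" with an accented e written as an entity, then U+3060).
$html = "caf&eacute; \xE3\x81\xA0";

// Passing the charset explicitly makes the decoded entity come out as the
// UTF-8 bytes C3 A9, consistent with the rest of the string.
$utf8 = html_entity_decode($html, ENT_QUOTES, 'UTF-8');
echo bin2hex($utf8), "\n";   // 636166c3a920e381a0

// Decoding as ISO-8859-1 emits a single E9 byte for the entity, which is
// not valid UTF-8 once it sits inside an otherwise UTF-8 string.
$latin1 = html_entity_decode($html, ENT_QUOTES, 'ISO-8859-1');
echo bin2hex($latin1), "\n"; // 636166e920e381a0
```

The second result is the kind of mixed-encoding output that shows up as a partly garbled text/plain part.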

            francois Francois Marier added a comment -

            Joseph, if you look in lib/weblib.php, the html_to_text() function is used once within the format_text_email() function:

            function format_text_email($text, $format) {
                switch ($format) {
                    ...
                    case FORMAT_HTML:
                        return html_to_text($text);
                        break;
                    ...
                }
            }

            see: http://git.moodle.org/gw?p=moodle.git;a=blob;f=lib/weblib.php;h=0a6dec42bcb2ea5288aacad9822b292a6e9a3460;hb=MOODLE_19_STABLE#l1775

            francois Francois Marier added a comment -

            Alright, I've got a patch which seems to work both on PHP5 and PHP4.

            Can people please test it and confirm whether or not it fixes their issues? I'm particularly interested to hear whether it works on non-latin locales (for example, Japanese).

            Cheers,
            Francois

            jsegarra Juan Segarra Montesinos added a comment -

            Sorry for the noise guys, but I submitted a wrong patch the other day... bad day

            This solves our issues with plain text email...

            I'll try to provide feedback for the other patch too.

            bye

            francois Francois Marier added a comment -

            Hi Juan,

            I'm not sure about the conversion to latin1. What happens if there are characters (e.g. Japanese characters) which fall outside of that range?

            This is why I'd like someone using a non-latin1 locale to confirm that the html_entity_decode_utf8 patch works.

            Cheers,
            Francois
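
The concern about latin1 can be made concrete: ISO-8859-1 only covers code points U+0000 to U+00FF, so text outside that range cannot survive a round trip through it. The sketch below uses a hypothetical helper, fits_in_latin1(), which is not part of Moodle or of any patch on this issue, to check whether a UTF-8 string stays within that range.

```php
<?php
// Hypothetical helper: true when every character of a UTF-8 string has a
// code point in U+0000..U+00FF, the only range ISO-8859-1 can represent.
// Invalid UTF-8 input makes preg_match() fail, which also yields false.
function fits_in_latin1(string $utf8): bool
{
    return preg_match('/^[\x{00}-\x{FF}]*$/u', $utf8) === 1;
}

var_dump(fits_in_latin1("caf\xC3\xA9"));  // accented e is U+00E9
var_dump(fits_in_latin1("\xE3\x81\xA0")); // 'da' is U+3060
```

The first call reports true and the second false, which is why a conversion to latin1 cannot be a general fix for Japanese text.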

            jsegarra Juan Segarra Montesinos added a comment -

            Well done Francois

            mlzjens Jens Eremie added a comment -

            Thanks for the Fix Francois and Juan!

            Works on m1.9.5+ (php 5.2.9)

            Cheers,
            Jens

            stronk7 Eloy Lafuente (stronk7) added a comment -

            Hi,

            as commented in MDL-19499, +1 to use current textlib->entities_to_utf8(), plus tests in lib/simpletest/testweblib.php to see if it works ok. (-1 to add new function into html2text).

            Ciao
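
As a rough illustration of what converting entities straight to UTF-8 involves, the sketch below decodes numeric character references without passing through latin1. It is not the textlib implementation: codepoint_to_utf8() and numeric_entities_to_utf8() are hypothetical names, and named entities such as &eacute; are deliberately left alone.

```php
<?php
// Hypothetical helper: encode a single Unicode code point as UTF-8 bytes.
function codepoint_to_utf8(int $cp): string
{
    if ($cp < 0x80) {
        return chr($cp);
    }
    if ($cp < 0x800) {
        return chr(0xC0 | ($cp >> 6)) . chr(0x80 | ($cp & 0x3F));
    }
    if ($cp < 0x10000) {
        return chr(0xE0 | ($cp >> 12))
             . chr(0x80 | (($cp >> 6) & 0x3F))
             . chr(0x80 | ($cp & 0x3F));
    }
    return chr(0xF0 | ($cp >> 18))
         . chr(0x80 | (($cp >> 12) & 0x3F))
         . chr(0x80 | (($cp >> 6) & 0x3F))
         . chr(0x80 | ($cp & 0x3F));
}

// Hypothetical helper: replace decimal (&#12384;) and hexadecimal (&#x3060;)
// character references with their UTF-8 encoding.
function numeric_entities_to_utf8(string $text): string
{
    return preg_replace_callback(
        '/&#(?:x([0-9a-fA-F]{1,6})|([0-9]{1,7}));/',
        function (array $m): string {
            // Group 1 holds hex digits, group 2 decimal digits.
            $cp = $m[1] !== '' ? (int) hexdec($m[1]) : (int) $m[2];
            // Leave surrogates and out-of-range values untouched.
            if ($cp > 0x10FFFF || ($cp >= 0xD800 && $cp <= 0xDFFF)) {
                return $m[0];
            }
            return codepoint_to_utf8($cp);
        },
        $text
    );
}

echo bin2hex(numeric_entities_to_utf8('&#12384; &#x3060; &#160;')), "\n";
// e381a020e381a020c2a0
```

Note that &#160; becomes the two bytes C2 A0 here, so a later single-byte chr(160) replacement would still break the output.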

            francois Francois Marier added a comment -

            Thanks for that Eloy, I'm going to have a look at textlib and test it on PHP4.

            francois Francois Marier added a comment -

            New patch based on Eloy's suggestion.

            francois Francois Marier added a comment -

            Fixed in 1.8 and 1.9 (HEAD was not affected by this problem).

            I have updated the unit tests to match the output of this new library. They all pass now.

            stronk7 Eloy Lafuente (stronk7) added a comment -

            I've backported tests to 18_STABLE and they are passing ok under all branches.

            So closing as reviewed. Thanks, Francois B)

            Ciao


                Dates

                • Fix Release Date: 21/Oct/09