Moodle MDL-2794

html2text not compatible with utf-8

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 1.5, 1.8.9, 1.9.5
    • Fix Version/s: 1.8.10, 1.9.6
    • Component/s: General
    • Labels:
      None
    • Environment:
      Both PHP4 and PHP5
    • Affected Branches:
      MOODLE_15_STABLE, MOODLE_18_STABLE, MOODLE_19_STABLE
    • Fixed Branches:
      MOODLE_18_STABLE, MOODLE_19_STABLE
    • Rank:
      27457

      Description

      The html2text function, called indirectly from the format_text_email function, is not compatible with UTF-8 encoding. At the end of the function, html2text replaces every chr(160) byte with ' ', but in UTF-8 a 0xA0 byte is not necessarily a non-breaking space: it can also be a continuation byte of a multibyte character. As a result, characters such as 'da' (U+3060), whose UTF-8 encoding is E3 81 A0, are garbled in text-formatted email in ja_utf8.

      --- ../20050325/moodle/lib/html2text.php Sun Jan 23 11:18:50 2005
      +++ html2text.php Sat Mar 26 16:56:06 2005
      @@ -157,12 +157,12 @@
       $goodStr = wordwrap( $goodStr, 78 );

       //make sure there are no more than 3 linebreaks in a row and trim whitespace
      -$goodStr = str_replace(chr(160), ' ', $goodStr );
      +// $goodStr = str_replace(chr(160), ' ', $goodStr );
       $goodStr = preg_replace("/\r\n?|\f/", "\n", $goodStr);
       $goodStr = preg_replace("/\n(\s*\n){2}/", "\n\n\n", $goodStr);
       $goodStr = preg_replace("/[ \t]+(\n|$)/", "$1", $goodStr);
       $goodStr = preg_replace("/^\n*|\n*$/", '', $goodStr);
      -$goodStr = str_replace(chr(160), ' ', $goodStr );
      +// $goodStr = str_replace(chr(160), ' ', $goodStr );
       return $goodStr;
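
The byte-level effect described above can be reproduced outside Moodle. The following is a minimal sketch in Python rather than PHP, with Python `bytes` standing in for PHP's byte strings. The helper names (`replace_byte_160`, `replace_nbsp_chars`, `is_valid_utf8`) are hypothetical and are not part of html2text.php.

```python
# Illustration only: Python bytes stand in for PHP byte strings.
NBSP_BYTE = bytes([160])  # the single byte 0xA0, which is what chr(160) yields in PHP


def replace_byte_160(data: bytes) -> bytes:
    """Mimic str_replace(chr(160), ' ', $str) applied to a raw byte string."""
    return data.replace(NBSP_BYTE, b" ")


def replace_nbsp_chars(text: str) -> str:
    """Character-aware alternative: replace only the code point U+00A0."""
    return text.replace("\u00a0", " ")


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


da = "\u3060"                         # HIRAGANA LETTER DA
encoded = da.encode("utf-8")          # three bytes; the last one is 0xA0
damaged = replace_byte_160(encoded)   # last byte replaced by an ASCII space

print(encoded.hex(" "), "->", damaged.hex(" "))
print("still valid UTF-8:", is_valid_utf8(damaged))
```

The byte-wise replacement is only safe in a single-byte encoding such as Latin-1, where 0xA0 always means a non-breaking space. On decoded text, `replace_nbsp_chars` leaves U+3060 untouched.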

      1. html_entity_decode_utf8.patch
        2 kB
        Francois Marier
      2. html2text_utf8_fixes.patch
        2 kB
        Francois Marier
      3. html2text.20090608.patch
        0.8 kB
        Juan Segarra Montesinos
      4. html2text.20090610.patch
        0.8 kB
        Juan Segarra Montesinos

          Activity

          Martin Dougiamas added a comment -

          From Martin Dougiamas (martin at moodle.com) Sunday, 27 March 2005, 01:50 PM:

          Thanks! Fixed in 1.5 CVS

          Michael Blake added a comment -

          assign to a valid user

          Joseph Rézeau added a comment -

          I have just one question: what is the use of the html_to_text($html) function in moodle weblib ? That function uses the html2text.php library, which mangles utf8 text.

          I found this by accident, because html_to_text() IS used in the Questionnaire plugin (where it causes a problem in non-ASCII languages), but nowhere else.

          In MDL-17542, François Marier reported "I have committed an updated version of this file to CVS, along with a readme describing its origin." but again, what's the point of it all if that library is not being used anywhere in Moodle core files?

          I remain puzzled,

          Joseph

          Francois Marier added a comment -

          It's also used for example when emailing out forum posts.

          Joseph Rézeau added a comment -

          François, sorry but in moodle 1.9 I cannot find a reference to the html_to_text($html) function being used anywhere. In which script is it actually used?

          Juan Segarra Montesinos added a comment -

          Hi

          html2text is not working correctly in 1.9.5. To reproduce the problem:

          1. Write a forum email with non-ASCII characters
          2. Send the email

          Look at the text/plain part. Part of the body is incorrectly encoded.

          The problem is in the _convert() method in html2text.php. html_entity_decode() works in latin1 by default, so either the text should first be converted to latin1 or the input encoding should be specified.

          The patch attached seems to solve the problem.

          Thanks in advance
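
The mismatch described in this comment can be simulated without PHP. The sketch below is in Python and assumes a hypothetical helper, `decode_named_entities`, that decodes named HTML entities inside a byte string and emits them in a chosen charset. It is not the code in html2text.php. It only shows what happens when Latin-1 bytes are spliced into a UTF-8 body.

```python
import re
from html.entities import name2codepoint

# Matches named entities such as &eacute; in a byte string.
ENTITY = re.compile(rb"&([A-Za-z]+);")


def decode_named_entities(data: bytes, charset: str) -> bytes:
    """Hypothetical helper: decode named entities, emitting them in `charset`.

    All other bytes pass through unchanged, mimicking an entity decoder
    that is unaware of the encoding of the surrounding text.
    """
    def repl(match):
        codepoint = name2codepoint.get(match.group(1).decode("ascii"))
        if codepoint is None:
            return match.group(0)  # unknown entity: leave as-is
        return chr(codepoint).encode(charset, errors="replace")

    return ENTITY.sub(repl, data)


# A UTF-8 body containing Japanese text and one HTML entity.
body = "\u65e5\u672c\u8a9e caf&eacute;".encode("utf-8")

as_latin1 = decode_named_entities(body, "latin-1")  # entity becomes byte 0xE9
as_utf8 = decode_named_entities(body, "utf-8")      # entity becomes bytes C3 A9

# The Latin-1 result is no longer valid UTF-8; the final byte cannot be decoded.
print(as_latin1.decode("utf-8", errors="replace"))
print(as_utf8.decode("utf-8"))
```

Both remedies mentioned in the comment remove the mismatch: either the whole body and the decoder agree on Latin-1, or the decoder is told that the input is UTF-8.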

          Francois Marier added a comment -

          Joseph, if you look in lib/weblib.php, the html_to_text() function is used once within the format_text_email() function:

          function format_text_email($text, $format) {
              switch ($format) {
                  ...
                  case FORMAT_HTML:
                      return html_to_text($text);
                      break;
                  ...
              }
          }

          see: http://git.moodle.org/gw?p=moodle.git;a=blob;f=lib/weblib.php;h=0a6dec42bcb2ea5288aacad9822b292a6e9a3460;hb=MOODLE_19_STABLE#l1775

          Francois Marier added a comment -

          Alright, I've got a patch which seems to work both on PHP5 and PHP4.

          Can people please test it and confirm whether or not it fixes their issues? I'm particularly interested to hear whether it works on non-latin locales (for example, Japanese).

          Cheers,
          Francois

          Juan Segarra Montesinos added a comment -

          Sorry for the noise guys, but I submitted a wrong patch the other day... bad day

          This solves our issues with plain text email...

          I'll try to provide feedback for the other patch too.

          bye

          Francois Marier added a comment -

          Hi Juan,

          I'm not sure about the conversion to latin1. What happens if there are characters (e.g. Japanese characters) which fall outside of that range?

          This is why I'd like someone using a non-latin1 locale to confirm that the html_entity_decode_utf8 patch works.

          Cheers,
          Francois
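
The concern about characters outside the latin1 range can be checked directly. This is a small Python sketch, not Moodle code, and the helper names `fits_latin1` and `to_latin1_lossy` are hypothetical. How a particular PHP conversion routine treats such characters is not shown here. The sketch only demonstrates that Latin-1 has no representation for them.

```python
def fits_latin1(text: str) -> bool:
    """Return True if every character of `text` exists in Latin-1."""
    try:
        text.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


def to_latin1_lossy(text: str) -> bytes:
    """Encode to Latin-1, substituting '?' for unrepresentable characters."""
    return text.encode("latin-1", errors="replace")


text = "caf\u00e9 \u3060"              # "cafe" with e-acute, then HIRAGANA LETTER DA
lossy = to_latin1_lossy(text)          # the Japanese character becomes '?'
round_trip = lossy.decode("latin-1")   # the original character is gone for good

print(fits_latin1("caf\u00e9"), fits_latin1(text))
print(round_trip == text)
```

Any pipeline that passes through Latin-1 therefore cannot round-trip Japanese text, whatever substitution the conversion uses.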

          Juan Segarra Montesinos added a comment -

          Well done Francois

          Jens Eremie added a comment -

          Thanks for the fix, Francois and Juan!

          Works on m1.9.5+ (php 5.2.9)

          Cheers,
          Jens

          Eloy Lafuente (stronk7) added a comment -

          Hi,

          As commented in MDL-19499, +1 to using the current textlib->entities_to_utf8(), plus tests in lib/simpletest/testweblib.php to check that it works OK (-1 to adding a new function to html2text).

          Ciao
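
The general shape of the suggested approach (decode entities straight to UTF-8, then cover the behaviour with tests) can be sketched as follows. This is Python using the standard library's `html.unescape`. The function name `entities_to_utf8_bytes` is hypothetical and is not Moodle's textlib implementation, whose internals are not shown in this issue.

```python
import html


def entities_to_utf8_bytes(markup: str) -> bytes:
    """Hypothetical sketch: decode HTML entities, then encode as UTF-8."""
    return html.unescape(markup).encode("utf-8")


# Named, numeric and non-breaking-space entities in one sample.
sample = "caf&eacute; &#12384;&nbsp;ok"
out = entities_to_utf8_bytes(sample)

# Regression-style checks in the spirit of the proposed unit tests.
assert out.decode("utf-8") == "caf\u00e9 \u3060\u00a0ok"
assert b"\xe3\x81\xa0" in out   # U+3060 survives intact
assert b"\xc2\xa0" in out       # U+00A0 is two bytes in UTF-8, not a lone 0xA0
print(out)
```

The last check is the reason a later byte-wise replacement of 0xA0 would still be unsafe on this output: the non-breaking space itself is encoded as C2 A0.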

          Francois Marier added a comment -

          Thanks for that Eloy, I'm going to have a look at textlib and test it on PHP4.

          Francois Marier added a comment -

          New patch based on Eloy's suggestion.

          Francois Marier added a comment -

          Fixed in 1.8 and 1.9 (HEAD was not affected by this problem).

          I have updated the unit tests to match the output of this new library. They all pass now.

          Eloy Lafuente (stronk7) added a comment -

          I've backported tests to 18_STABLE and they are passing ok under all branches.

          So closing as reviewed. Thanks, Francois B)

          Ciao


            People

            • Votes: 1
            • Watchers: 5