  Moodle / MDL-2794

html2text not compatible with utf-8

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 1.5, 1.8.9, 1.9.5
    • Fix Version/s: 1.8.10, 1.9.6
    • Component/s: General
    • Labels:
      None
    • Environment:
      Both PHP4 and PHP5
    • Affected Branches:
      MOODLE_15_STABLE, MOODLE_18_STABLE, MOODLE_19_STABLE
    • Fixed Branches:
      MOODLE_18_STABLE, MOODLE_19_STABLE

      Description

      The html2text function, called indirectly from format_text_email(), is not compatible with the UTF-8 charset. Near the end of the function, html2text replaces every chr(160) byte with ' '. In UTF-8, however, a lone 0xA0 byte is not a non-breaking space (U+00A0 is encoded as the two bytes C2 A0), and 0xA0 also occurs as a continuation byte inside other multibyte characters. As a result, some UTF-8 characters, such as 'da' (U+3060, bytes E3 81 A0) in ja_utf8, are garbled in plain-text email.
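The following PHP sketch reproduces the failure described above. The variable and function names are illustrative and are not taken from html2text.php. 'da' is written as explicit bytes, so the result does not depend on how the file is saved. preg_match('//u', ...) serves as a UTF-8 validity check because it returns false when the subject is malformed UTF-8.

```php
<?php
// 'da' (U+3060), written as explicit UTF-8 bytes so the file encoding does not matter.
$da = "\xE3\x81\xA0";

// The final byte is 0xA0, i.e. chr(160).
echo bin2hex($da), "\n"; // e381a0

// What the old html2text cleanup did: replace every chr(160) byte with a space.
$mangled = str_replace(chr(160), ' ', $da);
echo bin2hex($mangled), "\n"; // e38120

// Illustrative helper: preg_match() with /u returns false for malformed UTF-8 input.
function is_valid_utf8($s) {
    return preg_match('//u', $s) === 1;
}

var_dump(is_valid_utf8($da));      // bool(true)
var_dump(is_valid_utf8($mangled)); // bool(false): E3 81 is now followed by 0x20
```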

      --- ../20050325/moodle/lib/html2text.php Sun Jan 23 11:18:50 2005
      +++ html2text.php Sat Mar 26 16:56:06 2005
      @@ -157,12 +157,12 @@
        $goodStr = wordwrap( $goodStr, 78 );
        //make sure there are no more than 3 linebreaks in a row and trim whitespace
      - $goodStr = str_replace(chr(160), ' ', $goodStr );
      +// $goodStr = str_replace(chr(160), ' ', $goodStr );
        $goodStr = preg_replace("/\r\n?|\f/", "\n", $goodStr);
        $goodStr = preg_replace("/\n(\s*\n){2}/", "\n\n\n", $goodStr);
        $goodStr = preg_replace("/[ \t]+(\n|$)/", "$1", $goodStr);
        $goodStr = preg_replace("/^\n*|\n*$/", '', $goodStr);
      - $goodStr = str_replace(chr(160), ' ', $goodStr );
      +// $goodStr = str_replace(chr(160), ' ', $goodStr );
        return $goodStr;


              Activity

              Martin Dougiamas added a comment:

              From Martin Dougiamas (martin at moodle.com) Sunday, 27 March 2005, 01:50 PM:

              Thanks! Fixed in 1.5 CVS

              Michael Blake added a comment:

              assign to a valid user

              Joseph Rézeau added a comment:

              I have just one question: what is the use of the html_to_text($html) function in Moodle weblib? That function uses the html2text.php library, which mangles UTF-8 text.

              I found this by accident, because html_to_text() IS used in the Questionnaire plugin (where it causes a problem in non-ASCII languages), but nowhere else.

              In MDL-17542, François Marier reported "I have committed an updated version of this file to CVS, along with a readme describing its origin." but again, what's the point of it all if that library is not being used anywhere in Moodle core files?

              I remain puzzled,

              Joseph

              Francois Marier added a comment:

              It's also used for example when emailing out forum posts.

              Joseph Rézeau added a comment:

              François, sorry, but in Moodle 1.9 I cannot find a reference to the html_to_text($html) function being used anywhere. In which script is it actually used?

              Juan Segarra Montesinos added a comment:

              Hi

              html2text is not working correctly in 1.9.5. To reproduce the problem:

              1. Write a forum email with non-ASCII characters
              2. Send the email

              Look at the text/plain part. Part of the body is incorrectly encoded.

              The problem is in the _convert() method in html2text.php. html_entity_decode() works in Latin-1 (ISO-8859-1) by default, so either the text should first be converted to Latin-1, or the input encoding should be passed to html_entity_decode() explicitly.

              The attached patch seems to solve the problem.

              Thanks in advance
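The following minimal sketch shows the second option in Juan's comment: passing the input charset to html_entity_decode() explicitly. In the PHP 4/5 releases of that time the default charset was ISO-8859-1 (later PHP releases changed the default), so naming 'UTF-8' avoids relying on it. The sample string is invented.

```php
<?php
// Text as html2text might receive it: raw UTF-8 bytes mixed with HTML entities.
$html = "caf&eacute; &#12384; \xE3\x81\xA0";

// With an explicit 'UTF-8' charset, named and numeric entities decode to UTF-8 bytes,
// whatever the PHP version's default charset is.
$text = html_entity_decode($html, ENT_QUOTES, 'UTF-8');

echo bin2hex($text), "\n"; // 636166c3a920e381a020e381a0
```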

              Francois Marier added a comment:

              Joseph, if you look in lib/weblib.php, the html_to_text() function is used once within the format_text_email() function:

              function format_text_email($text, $format) {
                  switch ($format) {
                      ...
                      case FORMAT_HTML:
                          return html_to_text($text);
                          break;
                      ...
                  }
              }

              see: http://git.moodle.org/gw?p=moodle.git;a=blob;f=lib/weblib.php;h=0a6dec42bcb2ea5288aacad9822b292a6e9a3460;hb=MOODLE_19_STABLE#l1775

              Francois Marier added a comment:

              Alright, I've got a patch which seems to work both on PHP5 and PHP4.

              Can people please test it and confirm whether or not it fixes their issues? I'm particularly interested to hear whether it works on non-latin locales (for example, Japanese).

              Cheers,
              Francois

              Juan Segarra Montesinos added a comment:

              Sorry for the noise, guys, but I submitted the wrong patch the other day... bad day.

              This solves our issues with plain text email...

              I'll try to provide feedback for the other patch too.

              bye

              Francois Marier added a comment:

              Hi Juan,

              I'm not sure about the conversion to latin1. What happens if there are characters (e.g. Japanese characters) which fall outside of that range?

              This is why I'd like someone using a non-latin1 locale to confirm that the html_entity_decode_utf8 patch works.

              Cheers,
              Francois
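Francois's concern can be checked with html_entity_decode() itself. In the sketch below (sample string invented), decoding into ISO-8859-1 turns &eacute; into the single byte 0xE9, which is not valid UTF-8 inside a UTF-8 email. ISO-8859-1 also cannot represent &#12384; (U+3060) at all, so PHP leaves that entity undecoded. Decoding into UTF-8 handles both.

```php
<?php
$html = "caf&eacute; &#12384;";

// Latin-1 target: é fits in one byte (0xE9), but U+3060 has no Latin-1 code,
// so PHP leaves the numeric entity as-is.
$latin1 = html_entity_decode($html, ENT_QUOTES, 'ISO-8859-1');

// UTF-8 target: both entities are decoded to UTF-8 byte sequences.
$utf8 = html_entity_decode($html, ENT_QUOTES, 'UTF-8');

var_dump($latin1 === "caf\xE9 &#12384;");        // bool(true)
var_dump($utf8 === "caf\xC3\xA9 \xE3\x81\xA0");  // bool(true)
```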

              Juan Segarra Montesinos added a comment:

              Well done, Francois.

              Jens Eremie added a comment:

              Thanks for the fix, Francois and Juan!

              Works on Moodle 1.9.5+ (PHP 5.2.9).

              Cheers,
              Jens

              Eloy Lafuente (stronk7) added a comment:

              Hi,

              As commented in MDL-19499: +1 to using the existing textlib->entities_to_utf8(), plus tests in lib/simpletest/testweblib.php to check that it works correctly. (-1 to adding a new function to html2text.)

              Ciao
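textlib->entities_to_utf8() is Moodle's own code and is not reproduced here. As a rough illustration of what converting numeric entities to UTF-8 involves, the sketch below decodes &#NNN; and &#xHHHH; entities by encoding each code point into UTF-8 bytes using the RFC 3629 layout. Both function names are hypothetical, and named entities such as &eacute; are not handled. The sketch also uses preg_replace_callback() with a closure, which requires PHP 5.3 or later, so it would not have run on the PHP 4 installs discussed in this issue.

```php
<?php
// Hypothetical helper (NOT Moodle's textlib code): encode one Unicode code point as UTF-8.
function codepoint_to_utf8($cp) {
    if ($cp >= 0xD800 && $cp <= 0xDFFF) {
        return '';                                 // UTF-16 surrogates are not valid code points
    } elseif ($cp < 0x80) {
        return chr($cp);                           // 1 byte:  0xxxxxxx
    } elseif ($cp < 0x800) {
        return chr(0xC0 | ($cp >> 6))              // 2 bytes: 110xxxxx 10xxxxxx
             . chr(0x80 | ($cp & 0x3F));
    } elseif ($cp < 0x10000) {
        return chr(0xE0 | ($cp >> 12))             // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
             . chr(0x80 | (($cp >> 6) & 0x3F))
             . chr(0x80 | ($cp & 0x3F));
    } elseif ($cp <= 0x10FFFF) {
        return chr(0xF0 | ($cp >> 18))             // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
             . chr(0x80 | (($cp >> 12) & 0x3F))
             . chr(0x80 | (($cp >> 6) & 0x3F))
             . chr(0x80 | ($cp & 0x3F));
    }
    return '';                                     // beyond U+10FFFF: drop it
}

// Hypothetical helper: replace decimal (&#12384;) and hex (&#x3060;) entities with UTF-8 bytes.
function numeric_entities_to_utf8($str) {
    return preg_replace_callback(
        '/&#(x[0-9a-f]+|[0-9]+);/i',
        function ($m) {
            $num = $m[1];
            $cp = ($num[0] === 'x' || $num[0] === 'X') ? hexdec(substr($num, 1)) : intval($num);
            return codepoint_to_utf8($cp);
        },
        $str
    );
}

echo bin2hex(numeric_entities_to_utf8('&#12384;&#x3060;&#160;')), "\n"; // e381a0e381a0c2a0
```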

              Francois Marier added a comment:

              Thanks for that Eloy, I'm going to have a look at textlib and test it on PHP4.

              Francois Marier added a comment:

              New patch based on Eloy's suggestion.

              Francois Marier added a comment:

              Fixed in 1.8 and 1.9 (HEAD was not affected by this problem).

              I have updated the unit tests to match the output of this new library. They all pass now.

              Eloy Lafuente (stronk7) added a comment:

              I've backported tests to 18_STABLE and they are passing ok under all branches.

              So closing as reviewed. Thanks, Francois B)

              Ciao


                People

                • Votes: 1
                • Watchers: 5

                  Dates

                  • Created:
                    Updated:
                    Resolved:
                    Fix Release Date:
                    21/Oct/09