Moodle MDL-25018: html_to_text destroys UTF-8 strings

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 1.9.10
    • Fix Version/s: 1.9.11
    • Component/s: Libraries
    • Labels: None
    • Affected Branches: MOODLE_19_STABLE
    • Fixed Branches: MOODLE_19_STABLE

      Description

      html_to_text destroys UTF-8.

      Here is an example

      <?php
      require_once('config.php');
      header('Content-type: text/plain; charset=UTF-8');
      $htmlstring = "<p>Choose the correct answer for this occasion:</p><p><strong>How do you say you are sorry e.g.. at someone's misfortunes?</strong><strong></strong></p>";
      echo "Testing input to see if it is UTF-8: ", iconv("UTF-8", "UTF-8", $htmlstring);
      $textstring = html_to_text($htmlstring);
      echo "Testing output to see if it is UTF-8: ", iconv("UTF-8", "UTF-8", $textstring);

          Activity

          timhunt Tim Hunt added a comment -

          Grr! I think Jira just screwed that up. The problem is the ' in someone's, which was a curly quote in the input I copied.

          <?php
          require_once('config.php');
          header('Content-type: text/plain; charset=UTF-8');
          $htmlstring = "<p>How do you say you are sorry for <strong>someone's</strong> misfortunes?</p>";
          echo "Testing input to see if it is UTF-8: ", iconv("UTF-8", "UTF-8", $htmlstring);
          $textstring = html_to_text($htmlstring);
          echo "Testing output to see if it is UTF-8: ", iconv("UTF-8", "UTF-8", $textstring);

          timhunt Tim Hunt added a comment -

          And it was not just {code} messing it up. That quote in someone's should be a right single quotation mark, Unicode U+2019 (decimal: 8217).

          Anyway, the problem is real, and I am then getting a fatal error when trying to store the result in Postgres.
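
Because the tracker keeps replacing the curly quote with an ASCII apostrophe, one way to make the reproduction paste-proof is to write U+2019 as its three UTF-8 bytes. The following is a minimal sketch along those lines, not the script from the report: `is_valid_utf8()` is a hypothetical helper (not a Moodle function), and `html_to_text()` is only called when Moodle's libraries are already loaded.

```php
<?php
// U+2019 RIGHT SINGLE QUOTATION MARK, spelled out as its three UTF-8 bytes so
// that no editor or issue tracker can replace it with an ASCII apostrophe.
$rsquo = "\xE2\x80\x99";
$htmlstring = "<p>How do you say you are sorry for <strong>someone{$rsquo}s</strong> misfortunes?</p>";

// Hypothetical helper (not a Moodle function): with the /u modifier,
// preg_match() returns false instead of 1 when the subject is not valid UTF-8.
function is_valid_utf8($string) {
    return preg_match('//u', $string) === 1;
}

echo "Input is valid UTF-8: ", var_export(is_valid_utf8($htmlstring), true), "\n";

// html_to_text() only exists once Moodle's libraries are loaded, for example
// after require_once('config.php') as in the scripts above.
if (function_exists('html_to_text')) {
    $textstring = html_to_text($htmlstring);
    echo "Output is valid UTF-8: ", var_export(is_valid_utf8($textstring), true), "\n";
} else {
    echo "html_to_text() not loaded; run this inside a Moodle install.\n";
}
```

Checking validity with a boolean helper, rather than echoing the result of `iconv()`, makes a failure explicit instead of relying on a truncated string or a PHP notice.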

          timhunt Tim Hunt added a comment -

          Patch to fix this seems pleasingly simple. Please could someone review the attached.

          mudrd8mz David Mudrak added a comment -

          Good catch Tim. Once committed, please do not forget to comment in lib/html2text_readme.txt again.

          timhunt Tim Hunt added a comment -

          I should also add a unit test, I suppose.

          I am sufficiently confident that this is right to commit it. Thanks for reviewing David, and reminding me to update the readme.
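
The unit test mentioned here is not shown in the ticket. As a rough sketch of what such a regression check could assert, the hypothetical function below (not Moodle's actual test code) feeds a string containing U+2019 through any converter and reports whether the result is still valid UTF-8 and still contains the character. `strip_tags` and the deliberately broken closure are stand-ins so the example runs without Moodle; inside Moodle one would presumably pass `'html_to_text'` instead.

```php
<?php
/**
 * Hypothetical check, not Moodle's actual test: run $convert on HTML that
 * contains U+2019 and report whether the output is still valid UTF-8 and
 * still contains that character.
 */
function conversion_preserves_utf8($convert) {
    $rsquo = "\xE2\x80\x99"; // U+2019 as UTF-8 bytes.
    $html = "<p>sorry for <strong>someone{$rsquo}s</strong> misfortunes?</p>";
    $text = call_user_func($convert, $html);

    $valid = preg_match('//u', $text) === 1;    // false on malformed UTF-8.
    $kept = strpos($text, $rsquo) !== false;    // the character survived.
    return $valid && $kept;
}

// Stand-in for a broken converter: it drops the last byte of U+2019,
// which leaves a truncated multi-byte sequence behind.
$drop_last_byte = function ($html) {
    return str_replace("\x99", '', strip_tags($html));
};

var_dump(conversion_preserves_utf8('strip_tags'));    // bool(true)
var_dump(conversion_preserves_utf8($drop_last_byte)); // bool(false)
```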

          stronk7 Eloy Lafuente (stronk7) added a comment -

          Nice one Tim,

          anyway, is it just me, or have you garbled some other tests (the ones using UTF-8) in 19_STABLE?

          http://cvs.moodle.org/moodle/lib/simpletest/testweblib.php?r1=1.6.4.15&r2=1.6.4.16

          Ciao

          timhunt Tim Hunt added a comment -

          Grrrrrr! I was worried about that, so I carefully reviewed the patch in Eclipse before committing it to CVS, to ensure it only changed the lines I wanted changed. However, it seems that either Eclipse lied to me, or CVS is crap.

          And why did it only break 1.9, not 2.0? That is crazy.

          Anyway, re-fixed now. Possibly. If CVS cooperated. The CVS commit email looks OK, at any rate.


              Dates

              • Fix Release Date: 21/Feb/11