Moodle / MDL-25018

html_to_text destroys UTF-8 strings

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 1.9.10
    • Fix Version/s: 1.9.11
    • Component/s: Libraries
    • Labels:
      None
    • Affected Branches:
      MOODLE_19_STABLE
    • Fixed Branches:
      MOODLE_19_STABLE
    • Rank:
      1327

      Description

      html_to_text destroys UTF-8.

      Here is an example

      <?php
      require_once('config.php');
      header('Content-type: text/plain; charset=UTF-8');
      $htmlstring = "<p>Choose the correct answer for this occasion:</p><p><strong>How do you say you are sorry e.g.. at someone's misfortunes?</strong><strong></strong></p>";
      echo "Testing input to see if it is UTF-8: ", iconv("UTF-8", "UTF-8", $htmlstring);
      $textstring = html_to_text($htmlstring);
      echo "Testing output to see if it is UTF-8: ", iconv("UTF-8", "UTF-8", $textstring);
      

        Activity

        Tim Hunt added a comment -

        Grr! I think Jira just screwed that up. The problem is the ' in someone's, which was a curly quote in the input I copied.

        <?php
        require_once('config.php');
        header('Content-type: text/plain; charset=UTF-8');
        $htmlstring = "<p>How do you say you are sorry for <strong>someone's</strong> misfortunes?</p>";
        echo "Testing input to see if it is UTF-8: ", iconv("UTF-8", "UTF-8", $htmlstring);
        $textstring = html_to_text($htmlstring);
        echo "Testing output to see if it is UTF-8: ", iconv("UTF-8", "UTF-8", $textstring);

        Tim Hunt added a comment -

        And it was not just {code} messing it up. That quote in someone's should be a right single quotation mark, Unicode U+2019 (decimal 8217).

        Anyway, the problem is real, and I am then getting a fatal error when trying to store the result in Postgres.
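
To illustrate the character in question: U+2019 is encoded in UTF-8 as the three bytes E2 80 99, and a string stops being valid UTF-8 if such a sequence is cut short. The sketch below is not the attached patch and makes no claim about what html_to_text does internally. `is_valid_utf8` is a hypothetical helper name, and the damaged string is a made-up example.

```php
<?php
// Hypothetical helper (not part of Moodle): reports whether $s is well-formed UTF-8.
// preg_match() with the /u modifier returns false when the subject is not valid UTF-8.
function is_valid_utf8($s) {
    return preg_match('//u', $s) === 1;
}

// U+2019 RIGHT SINGLE QUOTATION MARK written as its three UTF-8 bytes.
$quote = "\xE2\x80\x99";
$intact = "someone{$quote}s";

// Made-up damage for illustration: the last byte of the sequence is missing.
$damaged = "someone\xE2\x80s";

echo bin2hex($quote), "\n";
echo is_valid_utf8($intact) ? "intact: valid\n" : "intact: invalid\n";
echo is_valid_utf8($damaged) ? "damaged: valid\n" : "damaged: invalid\n";
```

A database using UTF-8 encoding will typically reject a byte sequence like the damaged one, which would be consistent with the fatal error reported here.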

        Tim Hunt added a comment -

        Patch to fix this seems pleasingly simple. Please could someone review the attached.

        David Mudrak added a comment -

        Good catch Tim. Once committed, please do not forget to comment in lib/html2text_readme.txt again.

        Tim Hunt added a comment -

        I should also add a unit test. I suppose.

        I am sufficiently confident that this is right to commit it. Thanks for reviewing David, and reminding me to update the readme.

        Eloy Lafuente (stronk7) added a comment -

        Nice one Tim,

        Anyway, is it just me, or have you garbled some other tests (ones using UTF-8) in 19_STABLE?

        http://cvs.moodle.org/moodle/lib/simpletest/testweblib.php?r1=1.6.4.15&r2=1.6.4.16

        Ciao

        Tim Hunt added a comment -

        Grrrrrr! I was worried about that, so I carefully reviewed the patch in Eclipse before committing it to CVS, to ensure it only changed the lines I wanted changed. However, it seems that either Eclipse lied to me, or CVS is crap.

        And why did it only break 1.9, not 2.0? That is crazy.

        Anyway, re-fixed now. Possibly. If CVS cooperated. The CVS commit email looks OK, at any rate.

