Moodle / MDL-22896

bad regular expression in html2text library causes text to go missing from forum emails


Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects versions: 1.9.9, 2.1.4, 2.2.1, 2.3
    • Fix versions: 2.1.5, 2.2.2
    • Component: Libraries
    • Affected branches: MOODLE_19_STABLE, MOODLE_21_STABLE, MOODLE_22_STABLE, MOODLE_23_STABLE
    • Fixed branches: MOODLE_21_STABLE, MOODLE_22_STABLE
    • Branch: wip-mdl-22896
    • Difficulty: Easy
    • Testing instructions:

      Note: To test this, forum email must be working.

      1. Set "Email format" to plain text in your profile.
      2. Add a forum post with the following text, with "Mail now" checked:

        Gin & Tonic
        - 2oz gin;
        - 5oz tonic water;
        - 5 cubes of ice;
        - 1 lime wedge.

      3. After 1 minute, run cron (/admin/cron.php).
      4. Make sure no text is missing from the plain-text email.

    Description

      Greetings.. I believe I've found and fixed a bug in the html2text library.

      In /lib/html2text.php...
      ---------------------------
      478 // Remove unknown/unhandled entities (this cannot be done in search-and-replace block)
      479 $text = preg_replace('/&[^&;]+;/i', '', $text);
      ---------------------------

      That regular expression is too greedy... it matches an ampersand followed by any run of characters (spaces and line breaks included) up to the next semicolon, as long as there is no other ampersand or semicolon in between.

      We've had numerous reports from users that huge chunks of forum posts are missing from the plain-text emails they receive by subscription.

      The problem occurs when someone happens to include an ampersand in their text, and also a semicolon somewhere. Anything between those two characters is filtered out.

      Here's an example...

      Gin & Tonic

      • 2oz gin;
      • 5oz tonic water;
      • 5 cubes of ice;
      • 1 lime wedge.

      If you ran that through html2text, it would output this:

      Gin

      • 5oz tonic water;
      • 5 cubes of ice;
      • 1 lime wedge.
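
The behaviour described above can be reproduced in isolation. This is a minimal sketch, assuming only the pattern quoted from line 479: it calls preg_replace directly rather than going through the html2text library, and the $post string is an assumed plain-text rendering of the example post (with "-" standing in for the bullets).

```php
<?php
// Pattern as quoted in the report (lib/html2text.php, line 479).
$pattern = '/&[^&;]+;/i';

// Assumed plain-text rendering of the example forum post.
$post = "Gin & Tonic\n- 2oz gin;\n- 5oz tonic water;\n- 5 cubes of ice;\n- 1 lime wedge.";

// Everything from the "&" up to and including the first ";" is removed,
// because [^&;] also matches spaces and line breaks.
$stripped = preg_replace($pattern, '', $post);

echo $stripped, "\n";
```

With this input, "& Tonic" and the whole "2oz gin;" line disappear, which matches the missing text users reported.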

      The simple fix I am testing now is this:
      479 $text = preg_replace('/&[^&;\s]+;/i', '', $text);

      The additional \s makes sure the match stops on whitespace.
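
A quick way to sanity-check the proposed pattern is to run it against both ordinary text and a real unhandled entity. This is only a sketch under assumptions: the sample strings ($post, $withEntity, $edgeCase) are made up for illustration, and preg_replace is called directly instead of through html2text.

```php
<?php
// Proposed pattern from the report: whitespace now ends a candidate entity.
$proposed = '/&[^&;\s]+;/i';

// Ordinary text with an ampersand and, later, a semicolon.
$post = "Gin & Tonic\n- 2oz gin;\n- 5oz tonic water;";

// Hypothetical unhandled entity that should still be stripped.
$withEntity = "Gin &unknownentity; Tonic";

// Hypothetical edge case: no whitespace between "&" and ";".
$edgeCase = "AT&T; see above";

$fixedPost   = preg_replace($proposed, '', $post);       // left unchanged
$fixedEntity = preg_replace($proposed, '', $withEntity); // entity removed
$fixedEdge   = preg_replace($proposed, '', $edgeCase);   // "&T;" still removed

echo $fixedPost, "\n", $fixedEntity, "\n", $fixedEdge, "\n";
```

The last case suggests the change narrows the problem rather than eliminating it: text such as "&T;" that happens to look like an entity would still be dropped.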

      Best regards,
      -Garret

            People

              Rajesh Taneja (rajeshtaneja)
              Garret Gengler (garretg)
              Gerard Caulfield
              Aparup Banerjee
              Ankit Agarwal
              Amaia Anabitarte, Carlos Escobedo, Ferran Recio, Ilya Tregubov, Sara Arjona (@sarjona)
              Votes: 9
              Watchers: 9

              Dates

                Resolved: 12/Mar/12