Details
Type: Bug
Status: Closed
Priority: Blocker
Resolution: Fixed
Affects Version/s: 1.9.9, 2.1.4, 2.2.1, 2.3
Component/s: Libraries
Difficulty: Easy
Affected Branches: MOODLE_19_STABLE, MOODLE_21_STABLE, MOODLE_22_STABLE, MOODLE_23_STABLE
Fixed Branches: MOODLE_21_STABLE, MOODLE_22_STABLE
Pull Master Branch: wip-mdl-22896
Description
Greetings. I believe I've found and fixed a bug in the html2text library.
In /lib/html2text.php, at lines 478-479:
---------------------------
// Remove unknown/unhandled entities (this cannot be done in search-and-replace block)
$text = preg_replace('/&[^&;]+;/i', '', $text);
---------------------------
That regular expression is too permissive: after the ampersand it accepts any run of characters other than & and ;, including spaces and line breaks, up to the next semicolon.
We've had numerous reports from users that huge chunks of forum posts are missing from the plain-text emails they receive by subscription.
The problem occurs when someone happens to include an ampersand in their text and a semicolon somewhere after it. Everything from the ampersand through that semicolon is filtered out.
Here's an example:
Gin & Tonic
- 2oz gin;
- 5oz tonic water;
- 5 cubes of ice;
- 1 lime wedge.
If you ran that through html2text, it would output this:
Gin
- 5oz tonic water;
- 5 cubes of ice;
- 1 lime wedge.
The simple fix I am testing now is this change to line 479:
$text = preg_replace('/&[^&;\s]+;/i', '', $text);
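The additional \s in the negated character class makes sure a match cannot span whitespace, so text such as "& Tonic ... gin;" is no longer treated as an entity.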
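For anyone who wants to check the difference outside Moodle, here is a minimal standalone sketch that runs both patterns over the recipe above. The function names strip_entities_original and strip_entities_patched are made up for this illustration and are not part of html2text.php; only the two regular expressions come from this report.

```php
<?php
// Hypothetical helpers for illustration only; they are not part of html2text.php.
// Each wraps one of the two patterns quoted in this report.

// Pattern currently at line 479: allows whitespace between & and ;
function strip_entities_original(string $text): string {
    return preg_replace('/&[^&;]+;/i', '', $text);
}

// Proposed pattern: \s in the negated class stops a match from spanning whitespace
function strip_entities_patched(string $text): string {
    return preg_replace('/&[^&;\s]+;/i', '', $text);
}

$recipe = "Gin & Tonic\n- 2oz gin;\n- 5oz tonic water;\n- 5 cubes of ice;\n- 1 lime wedge.";

// Original pattern removes "& Tonic\n- 2oz gin;" as if it were one entity.
$before = strip_entities_original($recipe);

// Patched pattern finds no match, because a space follows the ampersand.
$after = strip_entities_patched($recipe);

// An unknown entity with no whitespace inside is still removed by the patched pattern.
$entity = strip_entities_patched("a &bogus; b");

echo $before, "\n---\n", $after, "\n---\n", $entity, "\n";
```

With the original pattern the first two lines of the recipe collapse to "Gin " and the "2oz gin" line disappears; with the patched pattern the recipe comes back unchanged. Note that the patched pattern would still remove something like "&D;" in "R&D; costs", since there is no whitespace between the ampersand and the semicolon there.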
Best regards,
-Garret