|
[
Permalink
| « Hide
]
Petr ?koda added a comment - 07/Dec/08 05:47 AM
thanks, going to review it later this week
Assigning to Petr
Here's a patch which replaces the html2text library with the attached GPL one.
The output is slightly different and we could help with more testing with different HTML inputs. However, I will need to commit this to the Debian package today as it is considered a release-critical issue for Debian. I'm cool with the patch going into Lenny.
It can also go into HEAD for further testing (but not 1.9.x yet). Passing to Dongsheng to run some performance comparisons (that's the main thing I'm worried about).
The new one is faster to process <p></p>
New: 0.0113739967346 Old: 0.183360099792 (500 p tags) but slower to process <img and <table> > 500 a tags
There is a setting to disable the processing of links (create a list of URLs at the end I think). Perhaps we should disable that to make it faster? Hi, Francois
Another test on links only: New: 0.126650094986 Old: 0.449314832687 Actually, the new lib faster in everything except <img, the funnest thing is the new lib directly ignore img tags, should we make it support process img tags? I modified this file, now, the new lib wins on all tests
Hi Dongsheng,
Here's a new version of the file. There are a few code changes, but the most important one is a security fix for possible code execution (see bottom of http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=508909 Can you please run your script against this one as well? Cheers, both OOP html2text versions are fater than the proceduring html2text,
new version is a bit slower the the old version, I guess it use more regex in code. Old OOP: 0.0414950847626 New OOP: 0.055447101593 proceduring: 0.39603304863 the testing text includes 500 links, images, tables and p tags each. New patch against 1.9 with the fixed version of html2text.php
New patch against 1.8 with the new version of html2text.php
New patch against HEAD with the fixed version of html2text.php
|
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||