Uploaded image for project: 'Moodle'
  1. Moodle
  2. MDL-17542

html2text appears to be incompatible with the GPL

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.6.8, 1.7.6, 1.8.7, 1.9.3
    • Fix Version/s: 1.8.9, 1.9.5
    • Component/s: Libraries
    • Labels:
      None

      Description

      This library was reported in http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=507947

      The license for html2text is not included in the code, but rather on the website of the author:

      http://www.howtocreate.co.uk/jslibs/termsOfUse.html

      The most problematic bit of html2text's license is the requirement to get written permission to distribute:

      "However, if you want to include the scripts [...] in bundled or distributed scripts/software (such as Opera or Dashboard widgets, or the Formativ Academic Timetable), you should contact me first and obtain my express written permission."

      Then, there is a bit of discrimination for commercial sites through the addition of an "advertising clause":

      "Commercial/profit-making public websites:

      You should put a note saying that the script was provided for free by http://www.howtocreate.co.uk.

      You put a plain text (ie. not script) link to my site on your policy page or other appropriate page, to help me with my search engine rankings."

      The Debian developer who filed the bug suggested that we take a look at replacing this library with the one I have attached here. It's got the same name but is under the GPLv2+.

      Based on a quick 1.9 grep, that file is only used in lib/weblib.php:html_to_text(), which in turn is only used in format_text_email() of the same library file.

        Gliffy Diagrams

        1. html2text_replacement18.patch
          28 kB
          Francois Marier
        2. html2text_replacement19.patch
          28 kB
          Francois Marier
        3. html2text_replacement20.patch
          28 kB
          Francois Marier
        4. html2text.new.php
          20 kB
          Francois Marier
        5. html2text.php
          15 kB
          Dongsheng Cai

          Issue Links

            Activity

            francois Francois Marier created issue -
            Hide
            skodak Petr Skoda added a comment -

            thanks, going to review it later this week

            Show
            skodak Petr Skoda added a comment - thanks, going to review it later this week
            skodak Petr Skoda made changes -
            Field Original Value New Value
            Fix Version/s 1.9.4 [ 10300 ]
            Fix Version/s 2.0 [ 10122 ]
            stronk7 Eloy Lafuente (stronk7) made changes -
            Assignee moodle.com [ moodle.com ] Petr ?koda [ skodak ]
            Hide
            stronk7 Eloy Lafuente (stronk7) added a comment -

            Assigning to Petr

            Show
            stronk7 Eloy Lafuente (stronk7) added a comment - Assigning to Petr
            Hide
            francois Francois Marier added a comment -

            Here's a patch which replaces the html2text library with the attached GPL one.

            The output is slightly different and we could help with more testing with different HTML inputs.

            However, I will need to commit this to the Debian package today as it is considered a release-critical issue for Debian.

            Show
            francois Francois Marier added a comment - Here's a patch which replaces the html2text library with the attached GPL one. The output is slightly different and we could help with more testing with different HTML inputs. However, I will need to commit this to the Debian package today as it is considered a release-critical issue for Debian.
            francois Francois Marier made changes -
            Attachment html2text_replacement.patch [ 15888 ]
            Hide
            dougiamas Martin Dougiamas added a comment -

            I'm cool with the patch going into Lenny.

            It can also go into HEAD for further testing (but not 1.9.x yet).

            Show
            dougiamas Martin Dougiamas added a comment - I'm cool with the patch going into Lenny. It can also go into HEAD for further testing (but not 1.9.x yet).
            Hide
            dougiamas Martin Dougiamas added a comment -

            Passing to Dongsheng to run some performance comparisons (that's the main thing I'm worried about).

            Show
            dougiamas Martin Dougiamas added a comment - Passing to Dongsheng to run some performance comparisons (that's the main thing I'm worried about).
            dougiamas Martin Dougiamas made changes -
            Assignee Petr ?koda [ skodak ] Dongsheng Cai [ dongsheng ]
            Hide
            dongsheng Dongsheng Cai added a comment -

            The new one is faster to process <p></p>
            New: 0.0113739967346
            Old: 0.183360099792
            (500 p tags)

            but slower to process <img and <table>
            New: 0.755731105804
            Old: 0.208492994308
            (500 tables, 500 imgs and 500 a tags)

            Show
            dongsheng Dongsheng Cai added a comment - The new one is faster to process <p></p> New: 0.0113739967346 Old: 0.183360099792 (500 p tags) but slower to process <img and <table> New: 0.755731105804 Old: 0.208492994308 (500 tables, 500 imgs and 500 a tags)
            Hide
            francois Francois Marier added a comment -

            > 500 a tags

            There is a setting to disable the processing of links (create a list of URLs at the end I think). Perhaps we should disable that to make it faster?

            Show
            francois Francois Marier added a comment - > 500 a tags There is a setting to disable the processing of links (create a list of URLs at the end I think). Perhaps we should disable that to make it faster?
            Hide
            dongsheng Dongsheng Cai added a comment -

            Hi, Francois
            Another test on links only:
            New: 0.126650094986
            Old: 0.449314832687

            Actually, the new lib faster in everything except <img, the funnest thing is the new lib directly ignore img tags, should we make it support process img tags?

            Show
            dongsheng Dongsheng Cai added a comment - Hi, Francois Another test on links only: New: 0.126650094986 Old: 0.449314832687 Actually, the new lib faster in everything except <img, the funnest thing is the new lib directly ignore img tags, should we make it support process img tags?
            Hide
            dongsheng Dongsheng Cai added a comment -

            I modified this file, now, the new lib wins on all tests

            Show
            dongsheng Dongsheng Cai added a comment - I modified this file, now, the new lib wins on all tests
            dongsheng Dongsheng Cai made changes -
            Attachment html2text.php [ 15889 ]
            Hide
            francois Francois Marier added a comment -

            Hi Dongsheng,

            Here's a new version of the file. There are a few code changes, but the most important one is a security fix for possible code execution (see bottom of http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=508909 for the details).

            Can you please run your script against this one as well?

            Cheers,
            Francois

            Show
            francois Francois Marier added a comment - Hi Dongsheng, Here's a new version of the file. There are a few code changes, but the most important one is a security fix for possible code execution (see bottom of http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=508909 for the details). Can you please run your script against this one as well? Cheers, Francois
            francois Francois Marier made changes -
            Attachment html2text.new.php [ 15897 ]
            Hide
            dongsheng Dongsheng Cai added a comment -

            both OOP html2text versions are fater than the proceduring html2text,
            new version is a bit slower the the old version, I guess it use more regex in code.

            Old OOP: 0.0414950847626

            New OOP: 0.055447101593

            proceduring: 0.39603304863

            the testing text includes 500 links, images, tables and p tags each.

            Show
            dongsheng Dongsheng Cai added a comment - both OOP html2text versions are fater than the proceduring html2text, new version is a bit slower the the old version, I guess it use more regex in code. Old OOP: 0.0414950847626 New OOP: 0.055447101593 proceduring: 0.39603304863 the testing text includes 500 links, images, tables and p tags each.
            francois Francois Marier made changes -
            Attachment html2text.inc [ 15825 ]
            francois Francois Marier made changes -
            Attachment html2text_replacement.patch [ 15888 ]
            Hide
            francois Francois Marier added a comment -

            New patch against 1.9 with the fixed version of html2text.php

            Show
            francois Francois Marier added a comment - New patch against 1.9 with the fixed version of html2text.php
            francois Francois Marier made changes -
            Attachment html2text_replacement19.patch [ 15898 ]
            Hide
            francois Francois Marier added a comment -

            New patch against 1.8 with the new version of html2text.php

            Show
            francois Francois Marier added a comment - New patch against 1.8 with the new version of html2text.php
            francois Francois Marier made changes -
            Attachment html2text_replacement18.patch [ 15899 ]
            Hide
            francois Francois Marier added a comment -

            New patch against HEAD with the fixed version of html2text.php

            Show
            francois Francois Marier added a comment - New patch against HEAD with the fixed version of html2text.php
            francois Francois Marier made changes -
            Attachment html2text_replacement20.patch [ 15900 ]
            francois Francois Marier made changes -
            Status Open [ 1 ] Resolved [ 5 ]
            Fix Version/s 1.9.4 [ 10300 ]
            Resolution Fixed [ 1 ]
            Assignee Dongsheng Cai [ dongsheng ] Francois Marier [ francois ]
            aborrow Anthony Borrow made changes -
            Link This issue will help resolve MDL-18368 [ MDL-18368 ]
            Hide
            aborrow Anthony Borrow added a comment -

            Francois - Since Martin's concern was with performance and the new function was shown to be faster and because of the licensing issues, should we not also apply the changes to 1.9 and 1.8? I came across this file when looking at another issue (MDL-18368). Applying the 1.9 patch cleared up the issue I was experiencing in MDL-18368 so it would resolve licensing, be faster, and less prone to PHP notices. I see no reason not to apply it to 18STABLE and 19STABLE. Peace - Anthony

            Show
            aborrow Anthony Borrow added a comment - Francois - Since Martin's concern was with performance and the new function was shown to be faster and because of the licensing issues, should we not also apply the changes to 1.9 and 1.8? I came across this file when looking at another issue ( MDL-18368 ). Applying the 1.9 patch cleared up the issue I was experiencing in MDL-18368 so it would resolve licensing, be faster, and less prone to PHP notices. I see no reason not to apply it to 18STABLE and 19STABLE. Peace - Anthony
            Hide
            aborrow Anthony Borrow added a comment -

            I am re-opening in the hopes that we can apply this to 1.8 and 1.9 and thus resolve MDL-18368. Thanks for your consideration. Peace - Anthony

            Show
            aborrow Anthony Borrow added a comment - I am re-opening in the hopes that we can apply this to 1.8 and 1.9 and thus resolve MDL-18368 . Thanks for your consideration. Peace - Anthony
            aborrow Anthony Borrow made changes -
            Resolution Fixed [ 1 ]
            Status Resolved [ 5 ] Reopened [ 4 ]
            Hide
            aborrow Anthony Borrow added a comment -

            One last thought/question, if the licensing is an issue for Debian will they not want this fixed in the 1.8 Moodle package?

            Show
            aborrow Anthony Borrow added a comment - One last thought/question, if the licensing is an issue for Debian will they not want this fixed in the 1.8 Moodle package?
            Hide
            francois Francois Marier added a comment -

            I don't mind applying it to 1.8 and 1.9. There are a few differences in the conversions from HTML to text but as far as I remember, it was a matter of the output being "different" (as opposed to being "wrong").

            The Debian packages all use the new code already.

            Francois

            Show
            francois Francois Marier added a comment - I don't mind applying it to 1.8 and 1.9. There are a few differences in the conversions from HTML to text but as far as I remember, it was a matter of the output being "different" (as opposed to being "wrong"). The Debian packages all use the new code already. Francois
            Hide
            francois Francois Marier added a comment -

            Committed the fix to 1.8 and 1.9 stable.

            Show
            francois Francois Marier added a comment - Committed the fix to 1.8 and 1.9 stable.
            francois Francois Marier made changes -
            Status Reopened [ 4 ] Resolved [ 5 ]
            Fix Version/s 1.8.9 [ 10322 ]
            Fix Version/s 1.9.5 [ 10320 ]
            Fix Version/s 2.0 [ 10122 ]
            Resolution Fixed [ 1 ]
            skodak Petr Skoda made changes -
            Link This issue has been marked as being related by MDL-19266 [ MDL-19266 ]
            Hide
            skodak Petr Skoda added a comment -

            looks like we have a serious regression, please review the linked bug report, thanks

            Show
            skodak Petr Skoda added a comment - looks like we have a serious regression, please review the linked bug report, thanks
            skodak Petr Skoda made changes -
            Resolution Fixed [ 1 ]
            Status Resolved [ 5 ] Reopened [ 4 ]
            Hide
            francois Francois Marier added a comment -

            I have committed an updated version of this file to CVS, along with a readme describing its origin.

            Show
            francois Francois Marier added a comment - I have committed an updated version of this file to CVS, along with a readme describing its origin.
            francois Francois Marier made changes -
            Status Reopened [ 4 ] Resolved [ 5 ]
            Resolution Fixed [ 1 ]
            skodak Petr Skoda made changes -
            Link This issue has been marked as being related by MDL-19447 [ MDL-19447 ]
            Hide
            tsala Helen Foster added a comment -

            Thanks Francois

            Show
            tsala Helen Foster added a comment - Thanks Francois
            tsala Helen Foster made changes -
            Status Resolved [ 5 ] Closed [ 6 ]
            QA Assignee tsala
            dougiamas Martin Dougiamas made changes -
            Workflow jira [ 29777 ] MDL Workflow [ 61552 ]
            dougiamas Martin Dougiamas made changes -
            Workflow MDL Workflow [ 61552 ] MDL Full Workflow [ 90746 ]

              People

              • Votes:
                0 Vote for this issue
                Watchers:
                3 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved:
                  Fix Release Date:
                  13/May/09