Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.2
    • Fix Version/s: 2.3
    • Component/s: General

      Description

      internally clean_text() uses HTMLPurifier for all html texts, we could probably skip this expensive cleaning in case of simple well formed strings...

        Gliffy Diagrams

          Issue Links

            Activity

            skodak Petr Skoda created issue -
            skodak Petr Skoda made changes -
            Field Original Value New Value
            Assignee moodle.com [ moodle.com ] Petr Škoda (skodak) [ skodak ]
            skodak Petr Skoda made changes -
            Status Open [ 1 ] Development in progress [ 3 ]
            skodak Petr Skoda made changes -
            Fix Version/s DEV backlog [ 10464 ]
            Labels triaged
            Hide
            skodak Petr Skoda added a comment - - edited

            Used memory on forum discussion page (macports 64 bit PHP 5.3.10):

            • before - 67MB
            • after (simple texts in forum posts, HTMLPurifier not used on page) - 60MB
            Show
            skodak Petr Skoda added a comment - - edited Used memory on forum discussion page (macports 64 bit PHP 5.3.10): before - 67MB after (simple texts in forum posts, HTMLPurifier not used on page) - 60MB
            Hide
            skodak Petr Skoda added a comment -

            Loop test:

             
            $text = "<br />abc\n<p>def<em>efg</em><strong>hi<br />j</strong></p>";
             
            $start = time();
            for ($i=0;$i<10000;$i++) {
                purify_html($text);
            }
            echo "purifier: ".(time()-$start)." s<br />";
             
            $start = time();
            for ($i=0;$i<10000;$i++) {
                is_purify_html_necessary($text);
            }
            echo "detection only: ".(time()-$start)." s<br />";

            Result:

            purifier: 29 s
            detection only: 0 s

            Show
            skodak Petr Skoda added a comment - Loop test:   $text = "<br />abc\n<p>def<em>efg</em><strong>hi<br />j</strong></p>";   $start = time(); for ($i=0;$i<10000;$i++) { purify_html($text); } echo "purifier: ".(time()-$start)." s<br />";   $start = time(); for ($i=0;$i<10000;$i++) { is_purify_html_necessary($text); } echo "detection only: ".(time()-$start)." s<br />"; Result: purifier: 29 s detection only: 0 s
            skodak Petr Skoda made changes -
            Status Development in progress [ 3 ] Waiting for integration review [ 10010 ]
            Pull Master Diff URL https://github.com/skodak/moodle/compare/master...wip_MDL-32109_m23_fastpurify https://github.com/skodak/moodle/compare/master...w12_MDL-32109_m23_fastpurify
            Pull Master Branch w12_MDL-32109_m23_fastpurify
            Pull from Repository git://github.com/skodak/moodle.git
            Fix Version/s 2.3 [ 10657 ]
            Fix Version/s DEV backlog [ 10464 ]
            Testing Instructions 1/ run lib/simpletest/testhtmlpurifier.php
            skodak Petr Skoda made changes -
            Labels triaged performance triaged
            skodak Petr Skoda made changes -
            Comment [ A comment with security level 'Integrators' was removed. ]
            Hide
            samhemelryk Sam Hemelryk added a comment -

            The main moodle.git repository has just been updated with latest weekly modifications. You may wish to rebase your PULL branches to simplify history and avoid any possible merge conflicts. This would also make integrator's life easier next week.

            TIA and ciao

            Show
            samhemelryk Sam Hemelryk added a comment - The main moodle.git repository has just been updated with latest weekly modifications. You may wish to rebase your PULL branches to simplify history and avoid any possible merge conflicts. This would also make integrator's life easier next week. TIA and ciao
            nebgor Aparup Banerjee made changes -
            Status Waiting for integration review [ 10010 ] Integration review in progress [ 10004 ]
            Integrator nebgor
            Currently in integration Yes [ 10041 ]
            Hide
            nebgor Aparup Banerjee added a comment -

            Hi Petr,
            this looks really cool. We should have more of these simple/quick checks for overall performance improvements i think.

            I did wonder how much does the checking add to load/time in complex/large pages with large strings ?

            I would just suggest renaming is_purify_html_necessary() to something like is_purify_html_required() or is_html_complex(since thats what the function is really checking)

            Show
            nebgor Aparup Banerjee added a comment - Hi Petr, this looks really cool. We should have more of these simple/quick checks for overall performance improvements i think. I did wonder how much does the checking add to load/time in complex/large pages with large strings ? I would just suggest renaming is_purify_html_necessary() to something like is_purify_html_required() or is_html_complex(since thats what the function is really checking)
            Hide
            skodak Petr Skoda added a comment -

            Hi,

            1/ did you see:

            if (strpos($text, '&') !== false or preg_match('|<[^pesb/]|', $text)) 

            It should kick in an vast majority of complex texts. The cost should be very tiny compared to the following htmlpurifier processing.

            2/ I guess the name is not important as long as it is grammatically correct because it is used form a single place only and is marked as @private which should exclude it from docs.

            Show
            skodak Petr Skoda added a comment - Hi, 1/ did you see: if (strpos($text, '&') !== false or preg_match('|<[^pesb/]|', $text)) It should kick in an vast majority of complex texts. The cost should be very tiny compared to the following htmlpurifier processing. 2/ I guess the name is not important as long as it is grammatically correct because it is used form a single place only and is marked as @private which should exclude it from docs.
            Hide
            skodak Petr Skoda added a comment -

            rebased + migrated tests to phpunit

            Show
            skodak Petr Skoda added a comment - rebased + migrated tests to phpunit
            Hide
            nebgor Aparup Banerjee added a comment -

            Thanks for this, its been integrated into master.

            Show
            nebgor Aparup Banerjee added a comment - Thanks for this, its been integrated into master.
            nebgor Aparup Banerjee made changes -
            Status Integration review in progress [ 10004 ] Waiting for testing [ 10005 ]
            nebgor Aparup Banerjee made changes -
            Testing Instructions 1/ run lib/simpletest/testhtmlpurifier.php
            1/ run lib/simpletest/testhtmlpurifier.php

            tester: some notable speed improvements would be great feedback.
            abgreeve Adrian Greeve made changes -
            Status Waiting for testing [ 10005 ] Testing in progress [ 10011 ]
            Tester abgreeve
            Hide
            abgreeve Adrian Greeve added a comment -

            I got no errors in the unit tests. I didn't notice any difference in speed.
            I tested this patch with a discussion that had 100 posts in it to see what the statistics were like they are as follows:

            pre-patch
            5.086309 secs RAM: 69MB RAM peak: 69.9MB
            5.719542 secs RAM: 69MB RAM peak: 69.9MB
            5.465588 secs RAM: 68.9MB RAM peak: 69.9MB
            5.34252 secs RAM: 69MB RAM peak: 69.9MB

            post-patch
            5.388531 secs RAM: 61.9MB RAM peak: 62.8MB
            4.854898 secs RAM: 61.8MB RAM peak: 62.7MB
            5.165334 secs RAM: 61.8MB RAM peak: 62.7MB
            5.124471 secs RAM: 61.8MB RAM peak: 62.7MB

            So there is a bit of improvement with the RAM.

            Show
            abgreeve Adrian Greeve added a comment - I got no errors in the unit tests. I didn't notice any difference in speed. I tested this patch with a discussion that had 100 posts in it to see what the statistics were like they are as follows: pre-patch 5.086309 secs RAM: 69MB RAM peak: 69.9MB 5.719542 secs RAM: 69MB RAM peak: 69.9MB 5.465588 secs RAM: 68.9MB RAM peak: 69.9MB 5.34252 secs RAM: 69MB RAM peak: 69.9MB post-patch 5.388531 secs RAM: 61.9MB RAM peak: 62.8MB 4.854898 secs RAM: 61.8MB RAM peak: 62.7MB 5.165334 secs RAM: 61.8MB RAM peak: 62.7MB 5.124471 secs RAM: 61.8MB RAM peak: 62.7MB So there is a bit of improvement with the RAM.
            abgreeve Adrian Greeve made changes -
            Status Testing in progress [ 10011 ] Tested [ 10006 ]
            Hide
            stronk7 Eloy Lafuente (stronk7) added a comment -

            And this has landed upstream, finally! Yay!

            תודה רבה && شكرا جزيلا



            Closing, ciao

            Show
            stronk7 Eloy Lafuente (stronk7) added a comment - And this has landed upstream, finally! Yay! תודה רבה && شكرا جزيلا Closing, ciao
            stronk7 Eloy Lafuente (stronk7) made changes -
            Status Tested [ 10006 ] Closed [ 6 ]
            Resolution Fixed [ 1 ]
            Currently in integration Yes [ 10041 ]
            Integration date 29/Mar/12
            skodak Petr Skoda made changes -
            Link This issue has been marked as being related by MDL-29981 [ MDL-29981 ]

              People

              • Votes:
                0 Vote for this issue
                Watchers:
                1 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved:
                  Fix Release Date:
                  25/Jun/12