Moodle
  1. Moodle
  2. MDL-25826

HTMLPurifier rewrites definition cache on every request

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 2.0
    • Fix Version/s: 2.0.3
    • Component/s: Performance
    • Environment:
      Produced on linux cluster under Apache httpd, PHP 5.2.x and PostgreSQL 8.4, but should be irrelevant.
    • Database:
      Any
    • Affected Branches:
      MOODLE_20_STABLE
    • Fixed Branches:
      MOODLE_20_STABLE
    • Rank (Obsolete):
      15333

      Description

      During performance testing – with HTMLPurifier enabled, we witnessed Moodle 2.x re-wrote the HTMLPurifier cached serialised definition file for every request which used the purifier; this thoroughly destroyed performance of the entire application.

      This is a major point of contention performance-wise, as all other pages being purified will block until completion of this write. This is heavily exaggerated when used in an environment with shared storage or a clustered filesystem used across multiple webservers and is essentially the worst use-case for a shared filesystem (multiple writes to the same file in the same directory, concurrently from multiple nodes).

      I'm not sure if this is "correct" behaviour from HTMLPurifier (i.e. does it need to re-write this file frequently?), but I suggest this needs to be repaired to not re-write the serialised definition file and use the already cached file (if possible), otherwise alternate config options would need to be provided so that these cached serialised definition files can be written to alternate directories (e.g. one for each webserver perhaps).

      The poor performing call stack is:

      lib/weblib.php: purify_html()
      => lib/htmlpurifier/HTMLPurifier.php: HTMLPurifier->purify()
      => lib/htmlpurifier/HTMLPurifier/Generator.php: HTMLPurifier_Generator->__construct()
      => lib/htmlpurifier/HTMLPurifier/Config.php: HTMLPurifier_Config->getHTMLDefinition()
      => lib/htmlpurifier/HTMLPurifier/Config.php: HTMLPurifier_Config->getDefinition()
      => lib/htmlpurifier/HTMLPurifier/DefinitionCache/Decorator/Cleanup.php: HTMLPurifier_DefinitionCache_Decorator_Cleanup->set()
      => lib/htmlpurifier/HTMLPurifier/DefinitionCache/Decorator.php: HTMLPurifier_DefinitionCache_Decorator->set()
      => lib/htmlpurifier/HTMLPurifier/DefinitionCache/Serializer.php: HTMLPurifier_DefinitionCache_Serializer->set()
      => lib/htmlpurifier/HTMLPurifier/DefinitionCache/Serializer.php: HTMLPurifier_DefinitionCache_Serializer->_write()

        Activity

        Hide
        Ashley Holman added a comment -

        HTMLPurifier does not appear to have a public bug tracker so I've posted to their forums: http://htmlpurifier.org/phorum/read.php?5,5164

        Will see what they have to say.

        Show
        Ashley Holman added a comment - HTMLPurifier does not appear to have a public bug tracker so I've posted to their forums: http://htmlpurifier.org/phorum/read.php?5,5164 Will see what they have to say.
        Hide
        Ashley Holman added a comment -

        Good thing the HTML Purifier author is extremely responsive and helpful!

        He has developed a fix for this: http://repo.or.cz/w/htmlpurifier.git/commit/f3d050c517aab64a24f077ee6ea9009381db9de4

        That patch should eventually get bundled into a new release of htmlpurifier which should in turn get packaged into Moodle.

        I have attached two patches:

        • htmlpurifier-cache-definitions.patch: this applies the upstream fix from html purifier
        • weblib-html-purify-cache-fix.patch: this changes weblib's html_purify() function to use the new fixed API
        Show
        Ashley Holman added a comment - Good thing the HTML Purifier author is extremely responsive and helpful! He has developed a fix for this: http://repo.or.cz/w/htmlpurifier.git/commit/f3d050c517aab64a24f077ee6ea9009381db9de4 That patch should eventually get bundled into a new release of htmlpurifier which should in turn get packaged into Moodle. I have attached two patches: htmlpurifier-cache-definitions.patch: this applies the upstream fix from html purifier weblib-html-purify-cache-fix.patch: this changes weblib's html_purify() function to use the new fixed API
        Hide
        Petr Skoda added a comment -

        The new release should be available soon, it should also contain our permission improvements. I will integrate it as soon if possible, if it is not ready before 2.0.2 we will have to use the backport patches.

        Thanks a lot!

        Show
        Petr Skoda added a comment - The new release should be available soon, it should also contain our permission improvements. I will integrate it as soon if possible, if it is not ready before 2.0.2 we will have to use the backport patches. Thanks a lot!
        Hide
        Eloy Lafuente (stronk7) added a comment -

        +1.99

        Show
        Eloy Lafuente (stronk7) added a comment - +1.99
        Hide
        Petr Skoda added a comment -

        I am afraid this will have to wait till after the 2.0.2 release which will happen sometime next week, because it was not yet released upstream and we do not have enough time to test for potential regressions.

        Sorry.

        Show
        Petr Skoda added a comment - I am afraid this will have to wait till after the 2.0.2 release which will happen sometime next week, because it was not yet released upstream and we do not have enough time to test for potential regressions. Sorry.
        Hide
        Petr Skoda added a comment -

        This issues should be finally solved in the next weekly build, thanks for the report and patches.

        Show
        Petr Skoda added a comment - This issues should be finally solved in the next weekly build, thanks for the report and patches.

          People

          • Votes:
            3 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved: