Moodle
  1. Moodle
  2. MDL-25826

HTMLPurifier rewrites definition cache on every request

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 2.0
    • Fix Version/s: 2.0.3
    • Component/s: Performance
    • Environment:
      Produced on linux cluster under Apache httpd, PHP 5.2.x and PostgreSQL 8.4, but should be irrelevant.
    • Database:
      Any
    • Affected Branches:
      MOODLE_20_STABLE
    • Fixed Branches:
      MOODLE_20_STABLE

      Description

      During performance testing – with HTMLPurifier enabled, we witnessed Moodle 2.x re-wrote the HTMLPurifier cached serialised definition file for every request which used the purifier; this thoroughly destroyed performance of the entire application.

      This is a major point of contention performance-wise, as all other pages being purified will block until completion of this write. This is heavily exaggerated when used in an environment with shared storage or a clustered filesystem used across multiple webservers and is essentially the worst use-case for a shared filesystem (multiple writes to the same file in the same directory, concurrently from multiple nodes).

      I'm not sure if this is "correct" behaviour from HTMLPurifier (i.e. does it need to re-write this file frequently?), but I suggest this needs to be repaired to not re-write the serialised definition file and use the already cached file (if possible), otherwise alternate config options would need to be provided so that these cached serialised definition files can be written to alternate directories (e.g. one for each webserver perhaps).

      The poor performing call stack is:

      lib/weblib.php: purify_html()
      => lib/htmlpurifier/HTMLPurifier.php: HTMLPurifier->purify()
      => lib/htmlpurifier/HTMLPurifier/Generator.php: HTMLPurifier_Generator->__construct()
      => lib/htmlpurifier/HTMLPurifier/Config.php: HTMLPurifier_Config->getHTMLDefinition()
      => lib/htmlpurifier/HTMLPurifier/Config.php: HTMLPurifier_Config->getDefinition()
      => lib/htmlpurifier/HTMLPurifier/DefinitionCache/Decorator/Cleanup.php: HTMLPurifier_DefinitionCache_Decorator_Cleanup->set()
      => lib/htmlpurifier/HTMLPurifier/DefinitionCache/Decorator.php: HTMLPurifier_DefinitionCache_Decorator->set()
      => lib/htmlpurifier/HTMLPurifier/DefinitionCache/Serializer.php: HTMLPurifier_DefinitionCache_Serializer->set()
      => lib/htmlpurifier/HTMLPurifier/DefinitionCache/Serializer.php: HTMLPurifier_DefinitionCache_Serializer->_write()

        Gliffy Diagrams

          Activity

          Hide
          Ashley Holman added a comment -

          HTMLPurifier does not appear to have a public bug tracker so I've posted to their forums: http://htmlpurifier.org/phorum/read.php?5,5164

          Will see what they have to say.

          Show
          Ashley Holman added a comment - HTMLPurifier does not appear to have a public bug tracker so I've posted to their forums: http://htmlpurifier.org/phorum/read.php?5,5164 Will see what they have to say.
          Hide
          Ashley Holman added a comment -

          Good thing the HTML Purifier author is extremely responsive and helpful!

          He has developed a fix for this: http://repo.or.cz/w/htmlpurifier.git/commit/f3d050c517aab64a24f077ee6ea9009381db9de4

          That patch should eventually get bundled into a new release of htmlpurifier which should in turn get packaged into Moodle.

          I have attached two patches:

          • htmlpurifier-cache-definitions.patch: this applies the upstream fix from html purifier
          • weblib-html-purify-cache-fix.patch: this changes weblib's html_purify() function to use the new fixed API
          Show
          Ashley Holman added a comment - Good thing the HTML Purifier author is extremely responsive and helpful! He has developed a fix for this: http://repo.or.cz/w/htmlpurifier.git/commit/f3d050c517aab64a24f077ee6ea9009381db9de4 That patch should eventually get bundled into a new release of htmlpurifier which should in turn get packaged into Moodle. I have attached two patches: htmlpurifier-cache-definitions.patch: this applies the upstream fix from html purifier weblib-html-purify-cache-fix.patch: this changes weblib's html_purify() function to use the new fixed API
          Hide
          Petr Skoda added a comment -

          The new release should be available soon, it should also contain our permission improvements. I will integrate it as soon if possible, if it is not ready before 2.0.2 we will have to use the backport patches.

          Thanks a lot!

          Show
          Petr Skoda added a comment - The new release should be available soon, it should also contain our permission improvements. I will integrate it as soon if possible, if it is not ready before 2.0.2 we will have to use the backport patches. Thanks a lot!
          Hide
          Eloy Lafuente (stronk7) added a comment -

          +1.99

          Show
          Eloy Lafuente (stronk7) added a comment - +1.99
          Hide
          Petr Skoda added a comment -

          I am afraid this will have to wait till after the 2.0.2 release which will happen sometime next week, because it was not yet released upstream and we do not have enough time to test for potential regressions.

          Sorry.

          Show
          Petr Skoda added a comment - I am afraid this will have to wait till after the 2.0.2 release which will happen sometime next week, because it was not yet released upstream and we do not have enough time to test for potential regressions. Sorry.
          Hide
          Petr Skoda added a comment -

          This issues should be finally solved in the next weekly build, thanks for the report and patches.

          Show
          Petr Skoda added a comment - This issues should be finally solved in the next weekly build, thanks for the report and patches.

            People

            • Votes:
              3 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: