Moodle / MDL-70631

Poor performance of zip_packer::extract_to_pathname()

    Details

    • Testing Instructions:
      Exploratory testing welcome. Please try a wide range of ZIP files, such as:

      • Moodle plugins - see MDLSITE-6129 for an example of one that was causing particular trouble
      • Personal data exported from a Moodle site
      • etc.

      You can use the attached unzip.php script to test the functionality and measure how long the extraction takes, e.g.:

      $ time php unzip.php test.zip /tmp

      Run the command above with and without the patch, using several different "test.zip" files, for instance Moodle plugins, personal data exports and large H5P files.

      Verify that the extraction takes less time with the patch than without it. You can compare both manually or with your favourite diff utility.

      (Optional) Extra bonus for having profiling enabled while performing your tests and comparisons.
       

    • Affected Branches:
      MOODLE_310_STABLE
    • Fixed Branches:
      MOODLE_310_STABLE, MOODLE_39_STABLE
    • Pull from Repository:
    • Pull 3.9 Branch:
      MDL-70631-39-unzip
    • Pull 3.10 Branch:
      MDL-70631-310-unzip
    • Pull 3.11 Branch:
      MDL-70631-311-unzip
    • Pull Master Branch:
      MDL-70631-master-unzip

      Description

      Extracting a ZIP archive takes an extremely long time, especially if the archive contains many files.

      This was originally raised as MDLSITE-6114 and MDLSITE-6129 where plugin developers experienced timeouts when they were submitting plugins to the Plugins directory. Moodle did not manage to extract the submitted ZIP and timed out.

      Comments in MDLSITE-6129 have the whole story; the executive summary follows.

      Eloy Lafuente (stronk7) correctly identified the bottleneck in the current implementation of zip_packer::extract_to_pathname(). The method iterates over all files in the archive, obtains a stream resource for each file, reads from the stream in 256 KB blocks and writes them to the target location. ZipArchive::getStream() takes a significant share of the time in this chain, and when the archive contains many files (e.g. a plugin with a vendor folder), the difference becomes significant.

      It was suggested to switch to an alternative implementation that makes use of ZipArchive::extractTo(). That was confirmed to improve performance significantly. During development, a PHP bug in the ZipArchive extension was discovered and reported upstream.

      This issue brings a new version of the method which works significantly faster than the previous implementation and includes a workaround for that upstream PHP bug.
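
      The two approaches described above can be sketched roughly as follows. This is an illustration only, not the code of zip_packer::extract_to_pathname(): the function names sketch_extract_streaming() and sketch_extract_bulk() are hypothetical, the sketch does not validate entry names (so it should not be run on untrusted archives), and it does not include the workaround for the upstream PHP bug mentioned above.

```php
<?php
// Illustrative sketch only; function names are hypothetical and this is not Moodle code.

/**
 * Per-entry approach: obtain a stream for every file and copy it in 256 KB blocks.
 */
function sketch_extract_streaming(string $archive, string $target): void {
    $zip = new ZipArchive();
    if ($zip->open($archive) !== true) {
        throw new RuntimeException("Cannot open archive: $archive");
    }
    for ($i = 0; $i < $zip->numFiles; $i++) {
        $name = $zip->getNameIndex($i);
        if ($name === false || substr($name, -1) === '/') {
            continue; // Skip unreadable entries and directory entries.
        }
        $dest = $target . '/' . $name;
        if (!is_dir(dirname($dest))) {
            mkdir(dirname($dest), 0777, true);
        }
        $in = $zip->getStream($name); // One stream per file: the costly call.
        $out = ($in === false) ? false : fopen($dest, 'wb');
        if ($in === false || $out === false) {
            $zip->close();
            throw new RuntimeException("Cannot extract entry: $name");
        }
        // Copy in 256 KB blocks until the stream is exhausted.
        while (($chunk = fread($in, 262144)) !== false && $chunk !== '') {
            fwrite($out, $chunk);
        }
        fclose($out);
        fclose($in);
    }
    $zip->close();
}

/**
 * Bulk approach: let the extension extract the whole archive in one call.
 */
function sketch_extract_bulk(string $archive, string $target): void {
    $zip = new ZipArchive();
    if ($zip->open($archive) !== true) {
        throw new RuntimeException("Cannot open archive: $archive");
    }
    $ok = $zip->extractTo($target);
    $zip->close();
    if (!$ok) {
        throw new RuntimeException("Cannot extract archive: $archive");
    }
}
```

      Wrapping each call in microtime(true) measurements on an archive with many small files is one way to see the difference between the two; the actual numbers will depend on the archive and the PHP version.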

        Attachments

        1. image-2021-02-11-10-10-17-282.png (19 kB)
        2. screenshot-1.png (62 kB)
        3. unzip.php (0.8 kB)

              People

              Assignee:
              David Mudrák (@mudrd8mz)
              Reporter:
              David Mudrák (@mudrd8mz)
              Peer reviewer:
              Víctor Déniz Falcón
              Integrator:
              Sara Arjona (@sarjona)
              Tester:
              Janelle Barcega
              Participants:
              Component watchers:
              Matteo Scaramuccia, Andrew Nicols, Dongsheng Cai, Huong Nguyen, Jun Pataleta, Michael Hawkins, Shamim Rezaie, Simey Lameze, Amaia Anabitarte, Carlos Escobedo, Ferran Recio, Ilya Tregubov, Sara Arjona (@sarjona)
              Votes:
              0
              Watchers:
              9

                Dates

                Created:
                Updated:
                Resolved:
                Fix Release Date:
                8/Mar/21

                  Time Tracking

                  Estimated:
                  Not Specified
                  Remaining:
                  0m
                  Logged:
                  1d 3h 45m