Moodle
  1. Moodle
  2. MDL-39339

zip_archive should ignore system files

    Details

    • Testing Instructions:
      Hide
      • Ensure that you have the PHP INTL extension installed (the OSX test zip requires it to standardise unicode)
      • Run the zip_packer unit tests (lib/filestorage/tests/zip_packer_test.php)
      • Drag and drop lib/filestorage/tests/fixtures/test_osx_compress.zip into a Moodle course, choosing to extract it to create a Folder
      • Confirm that the resulting Folder does not contain a __MACOSX folder or any .DS_Store files
      • Drag and drop lib/filestorage/tests/fixtures/test_thumbsdb.zip into a Moodle course, choosing to extract it to create a Folder
      • Confirm that the resulting Folder does not contain a Thumbs.db file

      If you have access to a Mac and/or a computer running WinXP, you can also create and test your own zip files. WinXP requires that thumbnail caching be enabled, and you can then trigger creation of Thumbs.db files by opening a folder containing images (JPG works, and some other formats may too). OSX is a bit trickier; you need to open a folder in Finder, move the Finder window around and resize it, change the view mode and then hold your tongue in the right position while you zip the folder up ("Compress" from the context menu).

      Show
      Ensure that you have the PHP INTL extension installed (the OSX test zip requires it to standardise unicode) Run the zip_packer unit tests (lib/filestorage/tests/zip_packer_test.php) Drag and drop lib/filestorage/tests/fixtures/test_osx_compress.zip into a Moodle course, choosing to extract it to create a Folder Confirm that the resulting Folder does not contain a __MACOSX folder or any .DS_Store files Drag and drop lib/filestorage/tests/fixtures/test_thumbsdb.zip into a Moodle course, choosing to extract it to create a Folder Confirm that the resulting Folder does not contain a Thumbs.db file If you have access to a Mac and/or a computer running WinXP, you can also create and test your own zip files. WinXP requires that thumbnail caching be enabled, and you can then trigger creation of Thumbs.db files by opening a folder containing images (JPG works, and some other formats may too). OSX is a bit trickier; you need to open a folder in Finder, move the Finder window around and resize it, change the view mode and then hold your tongue in the right position while you zip the folder up ("Compress" from the context menu).
    • Affected Branches:
      MOODLE_24_STABLE, MOODLE_25_STABLE
    • Fixed Branches:
      MOODLE_26_STABLE
    • Pull from Repository:
    • Pull 2.4 Branch:
    • Pull 2.5 Branch:
    • Pull Master Branch:
      MDL-39339_master
    • Rank:
      49967

      Description

      The zip_archive class should ignore OS files like

      • Thumbs.db (Windows)
      • .DS_Store (Mac OS)
      • __MACOSX (Mac OS)
      • Possibly others, but nobody seems to know of any right now (and no good list has been found)

      as typical users do not see these files or know that they exist. Their presence in uploaded zip files may cause issues in some automated processing situations, or may cause confusion for users who do not know what they are or where they have come from.

      This will be beneficial in places such as:

      • Folder drag/drop (drop a zip in and extract to a Folder resource)
      • Uploading and extracting zip files into file managers / file pickers
      • Bulk-uploading feedback files for assignments (particularly useful in conjunction with MDL-39359 - OSX tends to put its files in the root of the zip, which can mess up the checks done there)
      • Anywhere else that zip_archive is used

        Issue Links

          Activity

          Hide
          Marina Glancy added a comment -

          This issue was assigned to me automatically, however I will not be able to work on this issue in the immediate future. In order to create a truer sense of the state of this issue and to allow other developers to have chance to become involved, I am removing myself as the assignee of this issue.

          For more information, see http://docs.moodle.org/dev/Changes_to_issue_assignment

          Show
          Marina Glancy added a comment - This issue was assigned to me automatically, however I will not be able to work on this issue in the immediate future. In order to create a truer sense of the state of this issue and to allow other developers to have chance to become involved, I am removing myself as the assignee of this issue. For more information, see http://docs.moodle.org/dev/Changes_to_issue_assignment
          Hide
          Paul Nicholls added a comment -

          Shifting development to this ticket from MDL-39666, at Michael de Raadt's request.

          Show
          Paul Nicholls added a comment - Shifting development to this ticket from MDL-39666 , at Michael de Raadt 's request.
          Hide
          Paul Nicholls added a comment -

          The OSX system files can cause confusion when MDL-39359 is implemented - because it sees more than just one folder in the zip, it assumes that the feedback files are in the root of the zip (and therefore finds no matching files). I'd therefore recommend not integrating MDL-39359 without first integrating this (MDL-39339).

          Show
          Paul Nicholls added a comment - The OSX system files can cause confusion when MDL-39359 is implemented - because it sees more than just one folder in the zip, it assumes that the feedback files are in the root of the zip (and therefore finds no matching files). I'd therefore recommend not integrating MDL-39359 without first integrating this ( MDL-39339 ).
          Hide
          Tim Hunt added a comment -

          I don't get the bit about UTF8-normalisation. Is that something in the existing code, not something changed by this patch?

          Show
          Tim Hunt added a comment - I don't get the bit about UTF8-normalisation. Is that something in the existing code, not something changed by this patch?
          Hide
          Paul Nicholls added a comment -

          Yes, it's existing code (I just added another test case which happens to fall under its purview). My understanding is that OSX uses metadata to store unicode filenames in a non-standard way (and infozip presumably does something similar, given that it's bundled into the INTL check with the OSX test cases), and the INTL extension is therefore required to normalise them. It'll still extract without INTL, but the filenames will be incorrect if they contain unicode characters - so the test cases would fail (as they match filenames against an array of expected ones), and are therefore only run if you have INTL and are therefore equipped to extract them properly.

          Show
          Paul Nicholls added a comment - Yes, it's existing code (I just added another test case which happens to fall under its purview). My understanding is that OSX uses metadata to store unicode filenames in a non-standard way (and infozip presumably does something similar, given that it's bundled into the INTL check with the OSX test cases), and the INTL extension is therefore required to normalise them. It'll still extract without INTL, but the filenames will be incorrect if they contain unicode characters - so the test cases would fail (as they match filenames against an array of expected ones), and are therefore only run if you have INTL and are therefore equipped to extract them properly.
          Hide
          Tim Hunt added a comment -

          Thanks for the explanation. This change now makes sense to me.

          However, I am not an expert on this part of the code, so I would feel happier if someone else could peer review this as well as me.

          Show
          Tim Hunt added a comment - Thanks for the explanation. This change now makes sense to me. However, I am not an expert on this part of the code, so I would feel happier if someone else could peer review this as well as me.
          Hide
          Petr Škoda added a comment -

          Hi, I am not sure it is ok to unconditionally remove all these extra files, the patch looks ok otherwise. +0.33333333

          Show
          Petr Škoda added a comment - Hi, I am not sure it is ok to unconditionally remove all these extra files, the patch looks ok otherwise. +0.33333333
          Hide
          Petr Škoda added a comment -

          I suppose this should be master only. I suppose you need to get a few more +1 to push this through the integration.

          Anyway thanks for the report and patch.

          Show
          Petr Škoda added a comment - I suppose this should be master only. I suppose you need to get a few more +1 to push this through the integration. Anyway thanks for the report and patch.
          Hide
          Paul Nicholls added a comment -

          Hi Petr,
          I've made it check pretty thoroughly that Thumbs.db is actually a Windows thumbnail cache (it would be possible to go a little further, as there's a flag buried deeper into the file that apparently flags it as specifically being a thumbnail cache rather than some other kind of OLE Compound File, but I can't find any suggestion online that there is any other software which will create OLE Compound Files with the .db extension - so it's extremely unlikely that somebody will have a legitimate OLE Compound File called Thumbs.db. There is no valid reason to upload a thumbnail cache to Moodle, as any Windows system which makes use of such caches will generate the cache for itself if it's missing. In fact, the Windows thumbnail cache files can contain references to (and thumbnails of) files which have been deleted or otherwise removed from the folder in which they were generated, so stripping them out could be considered a security/privacy measure.

          As for the OSX files, .DS_Store files may get ignored by some parts of Moodle already (I recently had a case, albeit on our 1.9 instance, where a student uploaded a file for an assignment which had a name starting with a dot; it was completely inaccessible from the assignment grading interface until I removed the dot). These files are used to store custom folder attributes "such as the position of icons or the choice of a background image" (see Wikipedia) - so are about as useless as Windows thumbnail cache files. It's also incredibly unlikely that anyone would have a legitimate file called .DS_Store. That leaves the __MACOSX folder, which is the only contentious one in my mind - but even so, though it's quite possible for somebody to create a legitimate directory called __MACOSX, I think that it's sufficiently unlikely that there will be a legitimate directory of that name at the top level of a zip file that we can safely ditch it. If you have any suggestions as to what else we can do to increase confidence that it's an unwanted OSX system folder (that won't slow it down substantially), I'm all ears.

          As for making this master-only, that certainly fits with its current classification as an improvement - but it could be seen as a bug instead (due to the unexpected/unwanted effects that these system files can have), and therefore integrated into other supported branches; if somebody really wanted it to be backported even further, you could use the security/privacy issue I mentioned with Thumbs.db files to classify this as a security issue...

          Show
          Paul Nicholls added a comment - Hi Petr, I've made it check pretty thoroughly that Thumbs.db is actually a Windows thumbnail cache (it would be possible to go a little further, as there's a flag buried deeper into the file that apparently flags it as specifically being a thumbnail cache rather than some other kind of OLE Compound File, but I can't find any suggestion online that there is any other software which will create OLE Compound Files with the .db extension - so it's extremely unlikely that somebody will have a legitimate OLE Compound File called Thumbs.db. There is no valid reason to upload a thumbnail cache to Moodle, as any Windows system which makes use of such caches will generate the cache for itself if it's missing. In fact, the Windows thumbnail cache files can contain references to (and thumbnails of) files which have been deleted or otherwise removed from the folder in which they were generated, so stripping them out could be considered a security/privacy measure. As for the OSX files, .DS_Store files may get ignored by some parts of Moodle already (I recently had a case, albeit on our 1.9 instance, where a student uploaded a file for an assignment which had a name starting with a dot; it was completely inaccessible from the assignment grading interface until I removed the dot). These files are used to store custom folder attributes "such as the position of icons or the choice of a background image" (see Wikipedia ) - so are about as useless as Windows thumbnail cache files. It's also incredibly unlikely that anyone would have a legitimate file called .DS_Store. That leaves the __MACOSX folder, which is the only contentious one in my mind - but even so, though it's quite possible for somebody to create a legitimate directory called __MACOSX, I think that it's sufficiently unlikely that there will be a legitimate directory of that name at the top level of a zip file that we can safely ditch it. If you have any suggestions as to what else we can do to increase confidence that it's an unwanted OSX system folder (that won't slow it down substantially), I'm all ears. As for making this master-only, that certainly fits with its current classification as an improvement - but it could be seen as a bug instead (due to the unexpected/unwanted effects that these system files can have), and therefore integrated into other supported branches; if somebody really wanted it to be backported even further, you could use the security/privacy issue I mentioned with Thumbs.db files to classify this as a security issue...
          Hide
          Petr Škoda added a comment -

          Thanks a lot for the explanation, the backports of new features are now handled as separate issues. My +2x0.3333333

          Show
          Petr Škoda added a comment - Thanks a lot for the explanation, the backports of new features are now handled as separate issues. My +2x0.3333333
          Hide
          Tim Hunt added a comment -

          I think the concept is sound, so I will add the remaining 0.3333334 () necessary to get this integrated into master, and then later back-ported.

          Show
          Tim Hunt added a comment - I think the concept is sound, so I will add the remaining 0.3333334 ( ) necessary to get this integrated into master, and then later back-ported.
          Hide
          Paul Nicholls added a comment -

          If we now have (at least) a full +1, can somebody with the power to do so please put it up for integration?

          Please let me know if I need to rebase.

          Show
          Paul Nicholls added a comment - If we now have (at least) a full +1, can somebody with the power to do so please put it up for integration? Please let me know if I need to rebase.
          Hide
          Tim Hunt added a comment -

          Submitting for integration as requested.

          Show
          Tim Hunt added a comment - Submitting for integration as requested.
          Hide
          Dan Poltawski added a comment -

          I've added api_change to this issue, mostly so that it gets noted in the release notes, as its a nice change.

          Show
          Dan Poltawski added a comment - I've added api_change to this issue, mostly so that it gets noted in the release notes, as its a nice change.
          Hide
          Dan Poltawski added a comment -

          Thanks Paul, I really like this change and i've integrated it to master only.

          I don't regard this as a bugfix and so if you would like it backporting, please follow the process for requesting a non-bugfix backport: http://docs.moodle.org/dev/Integration_Review#Process_for_requesting_a_non_bug-fix_backport

          I would not be in favour of back-porting this issue because I want to give our users some incentives to upgrade :-P . ^HHHHH I mean....I think its a relatively minor issue and I think it good if we don't change subtleties like this within stable branch upgrades.

          Show
          Dan Poltawski added a comment - Thanks Paul, I really like this change and i've integrated it to master only. I don't regard this as a bugfix and so if you would like it backporting, please follow the process for requesting a non-bugfix backport: http://docs.moodle.org/dev/Integration_Review#Process_for_requesting_a_non_bug-fix_backport I would not be in favour of back-porting this issue because I want to give our users some incentives to upgrade :-P . ^HHHHH I mean....I think its a relatively minor issue and I think it good if we don't change subtleties like this within stable branch upgrades.
          Hide
          Paul Nicholls added a comment -

          Thanks, Dan - I'm not terribly concerned about backporting, as I already have it on our branch
          If somebody else thinks that it ought to be backported, I'll leave it to them to request it...

          Show
          Paul Nicholls added a comment - Thanks, Dan - I'm not terribly concerned about backporting, as I already have it on our branch If somebody else thinks that it ought to be backported, I'll leave it to them to request it...
          Hide
          Jason Fowler added a comment -

          Works fine Paul, thank you

          Show
          Jason Fowler added a comment - Works fine Paul, thank you
          Hide
          Dan Poltawski added a comment -

          Thanks for your contributions!

          _main:
          @ BB#0:
                  push    {r7, lr}
                  mov     r7, sp
                  sub     sp, #4
                  movw    r0, :lower16:(L_.str-(LPC0_0+4))
                  movt    r0, :upper16:(L_.str-(LPC0_0+4))
          LPC0_0:
                  add     r0, pc
                  bl      _printf
                  movs    r1, #0
                  movt    r1, #0
                  str     r0, [sp]                @ 4-byte Spill
                  mov     r0, r1
                  add     sp, #4
                  pop     {r7, pc}
          
                  .section        __TEXT,__cstring,cstring_literals
          L_.str:                                 @ @.str
                  .asciz   "This code is now upstream!"
          
          Show
          Dan Poltawski added a comment - Thanks for your contributions! _main: @ BB#0: push {r7, lr} mov r7, sp sub sp, #4 movw r0, :lower16:(L_.str-(LPC0_0+4)) movt r0, :upper16:(L_.str-(LPC0_0+4)) LPC0_0: add r0, pc bl _printf movs r1, #0 movt r1, #0 str r0, [sp] @ 4- byte Spill mov r0, r1 add sp, #4 pop {r7, pc} .section __TEXT,__cstring,cstring_literals L_.str: @ @.str .asciz "This code is now upstream!"
          Hide
          Petr Škoda added a comment - - edited

          Oh, it looks like nobody tested this on a 32-bit OS - the unpack of 8byte number fails badly there. The Thumbs.db hack needs to be rewritten. I will fix that as part of MDL-40900.

          Show
          Petr Škoda added a comment - - edited Oh, it looks like nobody tested this on a 32-bit OS - the unpack of 8byte number fails badly there. The Thumbs.db hack needs to be rewritten. I will fix that as part of MDL-40900 .

            People

            • Votes:
              8 Vote for this issue
              Watchers:
              8 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: