Moodle / MDL-37407

Windows filenames in zip files disappear or turn to gibberish when unzipped in Moodle running on Linux

    Details

    • Testing Instructions:
      1/ If you understand Chinese, switch to zh_cn and unzip the attached 16.zip in Moodle.
      2/ Alternatively, open a filepicker and upload 16.zip, switch to another browser tab and select zh_cn there, then switch back and extract the file (the point is to keep the filepicker UI in English while the AJAX call runs in Chinese).
    • Affected Branches:
      MOODLE_23_STABLE, MOODLE_24_STABLE
    • Fixed Branches:
      MOODLE_24_STABLE
    • Pull from Repository:
    • Pull Master Branch:
      w04_MDL-37407_m25_chineseunzip

      Description

      When a .zip file created under Microsoft Windows is unzipped, file names containing double-byte characters disappear or turn to gibberish. See the screenshots.
      The fix for MDL-33068 has not resolved the problem.
      More information:
      Both the files in the zip and the zip file itself were created under MS Windows 7.
      Moodle 2.4 is running on CentOS.


      The problem was that the attached file is not compatible with Unicode, and Moodle did not yet contain the heuristics needed to detect the encoding when Chinese is selected as the current language.

          Attachments

          1. 16.zip (0.3 kB)
          2. a.zip (43 kB)
          3. fixed_unzip.png (38 kB)
          4. screenshot-1.jpg (39 kB)
          5. screenshot-2.jpg (29 kB)

              Activity

              xaero xaero added a comment - - edited

              【16.zip】in the attachments is an example zip file created under MS Windows by WinRAR

              xaero xaero added a comment -

              MDL-33068 hasn't fixed the problem

              xaero xaero added a comment - - edited

              【screenshot-1】zip file

              xaero xaero added a comment - - edited

              【screenshot-2】After unzipping, the Chinese characters disappear

              tsala Helen Foster added a comment -

              Setting priority according to http://docs.moodle.org/dev/Tracker_guide

              Assigning to Marina as filepicker component lead and MDL-33068 assignee.

              marina Marina Glancy added a comment -

              Fred, can you please take a look at it and say how bad/urgent it is? I think you know much more than me about zips

              fred Frédéric Massart added a comment -

              Marina, the things I worked on with Zip files are not related to their encodings. I think Petr is the expert in that area (MDL-24928). From what I have seen, even opening the file in Ubuntu's file manager does not display the correct file names. Also, these are the results from zip/7-zip CLI tools:

              fred@fred:~/Downloads$ 7z l 16.zip 
               
              7-Zip [64] 9.20  Copyright (c) 1999-2010 Igor Pavlov  2010-11-18
              p7zip Version 9.20 (locale=en_AU.UTF-8,Utf16=on,HugeFiles=on,4 CPUs)
               
              Listing archive: 16.zip
               
              --
              Path = 16.zip
              Type = zip
              Physical Size = 283
               
                 Date      Time    Attr         Size   Compressed  Name
              ------------------- ----- ------------ ------------  ------------------------
              2013-01-07 15:46:30 ....A           13           13  12Öìæºâù_185_kongfu_bg
              2013-01-07 15:46:20 ....A            8            8  12Öìè´Ï¼_164_kongfu_bg
              ------------------- ----- ------------ ------------  ------------------------
                                                  21           21  2 files, 0 folders
              fred@fred:~/Downloads$ zipinfo 16.zip 
              Archive:  16.zip
              Zip file size: 283 bytes, number of entries: 2
              -rw-a--     2.0 fat       13 b- stor 13-Jan-07 15:46 12???????????????_185_kongfu_bg
              -rw-a--     2.0 fat        8 b- stor 13-Jan-07 15:46 12????????????????_164_kongfu_bg
              2 files, 21 bytes uncompressed, 21 bytes compressed:  0.0%

              (Added Petr as watcher)
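
The garbled names in the 7z listing are consistent with GBK bytes being shown one byte per character. A minimal Python sketch of that diagnosis follows. It assumes the listing renders each raw byte as its Latin-1 character and that the archive stored the names in GBK; neither assumption is something the tools report.

```python
# Name exactly as it appears in the 7z listing above.
shown = "12Öìæºâù_185_kongfu_bg"

# Assumption: each displayed character stands for one raw byte (Latin-1).
raw = shown.encode("latin-1")

# Assumption: the archive stored the name in GBK, the usual code page on
# Simplified Chinese Windows.
recovered = raw.decode("gbk")

print(len(raw))   # 22 bytes for the stored name
print(recovered)  # the name with the Chinese characters restored
```

If the second decode raised UnicodeDecodeError or produced implausible text, the GBK assumption would be the first thing to revisit.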

              xaero xaero added a comment - - edited

              Yes, the files in the zip were created under MS Windows 7 by WinRAR with its default settings.
              I think MS Windows files with double-byte characters in their names would always display incorrectly under Linux, as the filenames in MS Windows are encoded as GBK, not UTF-8.

              skodak Petr Skoda added a comment -

              The attached zip archive is not compatible with Unicode; you need to use some other, more standards-compliant software, sorry.

              Moodle includes a tool that tells you whether a zip file is OK; it is in lib/filestorage/tests/fixtures/zip_info.php

              skodak:fixtures skodak$ php zip_info.php 16.zip 
              Archive:         16.zip
              Number of files: 2
              Archive comment: "" (0 bytes)
              ======== File 1 ==============================================
                Name:           "12?????_185_kongfu_bg" (22 bytes) - CRC 3214817444
                Version:        0x0014
                Required:       0x000a
                Method:         0x0000 (Stored)
                General:        0000000000000000
                Modified:       Monday, 7 January 2013, 3:46 pm
                Size:           13 ==> 13 bytes
                CRC-32:         3523151500
              ======== File 2 ==============================================
                Name:           "12???ϼ_164_kongfu_bg" (22 bytes) - CRC 673260215
                Version:        0x0014
                Required:       0x000a
                Method:         0x0000 (Stored)
                General:        0000000000000000
                Modified:       Monday, 7 January 2013, 3:46 pm
                Size:           8 ==> 8 bytes
                CRC-32:         1714574663
              skodak:fixtures skodak$ 

              If there is no "(Unicode name)" flag, it might work only if you set the same language in Moodle as was used in Windows.

              I am going to add a hack that tries to detect the Simplified Chinese encoding.
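
The check behind the "(Unicode name)" flag can be sketched in Python. In the zip format, bit 11 (0x800) of an entry's general purpose bit flag marks the name as UTF-8; both entries in the output above show all-zero flags. This is only an illustration, not Moodle's code: the helper names and the `gbk` fallback are assumptions.

```python
import zipfile

UTF8_FLAG = 0x800  # bit 11 of the general purpose bit flag: name is UTF-8


def decode_entry_name(raw, flag_bits, fallback="gbk"):
    """Decode a zip entry name from its raw bytes (hypothetical helper)."""
    if flag_bits & UTF8_FLAG:
        return raw.decode("utf-8")
    try:
        # No Unicode flag: guess the legacy charset (an assumption).
        return raw.decode(fallback)
    except UnicodeDecodeError:
        # The guess was wrong; cp437 maps every byte, so this cannot fail.
        return raw.decode("cp437")


def list_names(path, fallback="gbk"):
    """List entry names, re-decoding those stored without the UTF-8 flag."""
    names = []
    with zipfile.ZipFile(path) as archive:
        for info in archive.infolist():
            if info.flag_bits & UTF8_FLAG:
                names.append(info.filename)
            else:
                # By default zipfile decodes unflagged names as cp437,
                # so encoding back recovers the stored bytes.
                raw = info.filename.encode("cp437")
                names.append(decode_entry_name(raw, info.flag_bits, fallback))
    return names
```

Calling `list_names("16.zip")` on the attached archive would exercise the fallback path. The guess can still be wrong for archives written under another Windows language, which is why the comment above ties it to the language selected in Moodle.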

              skodak Petr Skoda added a comment -

              Thanks for the report.

              poltawski Dan Poltawski added a comment -

              Please can you clarify if this should be fixed in 2.3 or not.

              cibot CiBoT added a comment -

              Moving this reopened issue out from current integration. Please, re-submit it for integration once ready.

              skodak Petr Skoda added a comment - - edited

              Oh, there is no UTF-8 zip support in 2.3, so there is no way to backport this.

              stronk7 Eloy Lafuente (stronk7) added a comment -

              (Oops, sorry for the noise, I mixed up a bunch of issues here)

              poltawski Dan Poltawski added a comment -

              The main moodle.git repository has just been updated with latest weekly modifications. You may wish to rebase your PULL branches to simplify history and avoid any possible merge conflicts. This would also make integrator's life easier next week.

              TIA and ciao

              poltawski Dan Poltawski added a comment - - edited

              Hi Petr,

              Looking at this code, it is not clear to me that what you are doing is intended: in the case of an empty localewincharset (e.g. English), $encoding is set to an empty string rather than to oldcharset (ISO-8859-1 in English).

              Is it right that we should be setting this to an empty string? Should you be testing if localewincharset is empty before setting $encoding to it?

              xaero xaero added a comment - - edited

              Thanks to all of you!
              It really is a zip filename encoding problem! I zipped some files on my MS Windows machine with WinZip 17.0, and they unzipped correctly in Moodle!
              The [a.zip] in the attachments was created by WinZip 17.0.

              poltawski Dan Poltawski added a comment -

              [Ping Petr]

              poltawski Dan Poltawski added a comment -

              The integration of this issue has been delayed to next week because the integration period is over (Monday, Tuesday) and testing must happen on Wednesday.

              This change to a more rigid timeframe on each integration/testing cycle aims to produce a better and clear separation and organization of tasks for everybody.

              This is a bulk-automated message, so if you want to blame somebody/thing/where, don't do it here (use git instead)

              cibot CiBoT added a comment -

              Moving this reopened issue out from current integration. Please, re-submit it for integration once ready.

              skodak Petr Skoda added a comment -

              Thanks Dan, I have updated the patch to skip empty charsets. I looked at other languages that do not use ISO charsets, and it seems the only future-proof solution might be to add a new zipcharset string to all language packs and let translators supply the proper encoding (that would be 2.5dev only).
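
The guard described here, which skips the legacy re-decode when the language pack defines no Windows charset, might look like the following Python sketch. `localewincharset` and `oldcharset` are the language strings discussed in this thread; the function name, the dictionary shape and the sample value for Chinese are hypothetical, and this is not the actual patch.

```python
from typing import Mapping, Optional


def legacy_zip_charset(lang_strings: Mapping[str, str]) -> Optional[str]:
    """Return the charset to try for non-Unicode zip names, or None to skip."""
    charset = lang_strings.get("localewincharset", "").strip()
    # An empty value (as in English) means "no guess": the caller should
    # leave the name alone instead of converting with an empty charset.
    return charset or None


english = {"localewincharset": "", "oldcharset": "ISO-8859-1"}
chinese = {"localewincharset": "GBK"}  # sample value, assumed for illustration

print(legacy_zip_charset(english))  # None
print(legacy_zip_charset(chinese))  # GBK
```

Returning None instead of an empty string forces the caller to handle the "no charset" case explicitly, which is the situation the review comment above was probing.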

              samhemelryk Sam Hemelryk added a comment -

              Thanks Petr - this has been integrated now.

              andyjdavis Andrew Davis added a comment -

              Seems to be working fine. Passing.

              stronk7 Eloy Lafuente (stronk7) added a comment -

              A brilliant future is awaiting us out there, better with your code. Let's look towards the future together, this is now closed.

              (and won't be revisiting it unless some regression is found)

              Thanks and ciao


                People

                • Votes: 0
                • Watchers: 10

                  Dates

                  • Created:
                  • Updated:
                  • Resolved:
                  • Fix Release Date: 11/Mar/13