Moodle / MDL-37407

Windows filenames in zip files disappear or turn to gibberish when unzipped in Moodle running on Linux

    Details

    • Rank:
      47031

      Description

      When a .zip file created under Microsoft Windows is unzipped, filenames containing double-byte characters disappear or turn to gibberish. See the screenshots.
      The fix for MDL-33068 has not resolved this problem.
      More information:
      Both the zip file and the files inside it were created under MS Windows 7.
      Moodle 2.4 is running on CentOS.


      The problem was that the attached file is not Unicode-compatible, and Moodle did not yet contain the heuristics needed to detect the filename encoding when Chinese is selected as the current language in Moodle.

        Issue Links

          Activity

          xaero added a comment - - edited

          【16.zip】in the attachments is an example zip file created under MS Windows by WinRAR

          xaero added a comment -

          MDL-33068 hasn't fixed the problem

          xaero added a comment - - edited

          【screenshot-1】zip file

          xaero added a comment - - edited

          【screenshot-2】unzip it, and the Chinese characters disappear

          Helen Foster added a comment -

          Setting priority according to http://docs.moodle.org/dev/Tracker_guide

          Assigning to Marina as filepicker component lead and MDL-33068 assignee.

          Marina Glancy added a comment -

          Fred, can you please take a look at it and say how bad/urgent it is? I think you know much more than me about zips

          Frédéric Massart added a comment -

          Marina, the things I worked on with Zip files are not related to their encodings. I think Petr is the expert in that area (MDL-24928). From what I have seen, even opening the file in Ubuntu's file manager does not display the correct file names. Also, these are the results from zip/7-zip CLI tools:

          fred@fred:~/Downloads$ 7z l 16.zip 
          
          7-Zip [64] 9.20  Copyright (c) 1999-2010 Igor Pavlov  2010-11-18
          p7zip Version 9.20 (locale=en_AU.UTF-8,Utf16=on,HugeFiles=on,4 CPUs)
          
          Listing archive: 16.zip
          
          --
          Path = 16.zip
          Type = zip
          Physical Size = 283
          
             Date      Time    Attr         Size   Compressed  Name
          ------------------- ----- ------------ ------------  ------------------------
          2013-01-07 15:46:30 ....A           13           13  12Öìæºâù_185_kongfu_bg
          2013-01-07 15:46:20 ....A            8            8  12Öìè´Ï¼_164_kongfu_bg
          ------------------- ----- ------------ ------------  ------------------------
                                              21           21  2 files, 0 folders
          fred@fred:~/Downloads$ zipinfo 16.zip 
          Archive:  16.zip
          Zip file size: 283 bytes, number of entries: 2
          -rw-a--     2.0 fat       13 b- stor 13-Jan-07 15:46 12???????????????_185_kongfu_bg
          -rw-a--     2.0 fat        8 b- stor 13-Jan-07 15:46 12????????????????_164_kongfu_bg
          2 files, 21 bytes uncompressed, 21 bytes compressed:  0.0%
          

          (Added Petr as watcher)

          xaero added a comment - - edited

          Yes, the files in the zip were created under MS Windows 7 by WinRAR, with default settings.
          I think MS Windows files with double-byte character filenames will always display incorrectly under Linux, as the filenames in MS Windows are encoded as GBK, not UTF-8.
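The GBK explanation can be checked against the 7-Zip listing earlier in this thread. The sketch below is illustrative only and rests on two assumptions: that 7-Zip displayed each stored byte as one Latin-1 character, and that the archive was written with the GBK code page. Neither is confirmed by the tools' output.

```python
# Name exactly as 7-Zip printed it in the listing (mojibake).
garbled = "12Öìæºâù_185_kongfu_bg"

# Assumption: each displayed character stands for one raw byte (Latin-1),
# so encoding back to Latin-1 recovers the bytes stored in the archive.
raw = garbled.encode("latin-1")

# Assumption: the archive was written on a Simplified Chinese Windows,
# where the legacy code page is GBK.
recovered = raw.decode("gbk")

print(len(raw))   # 22
print(recovered)  # the name as it presumably appeared on Windows
```

If the assumptions hold, the six garbled characters collapse into three Chinese characters, which would explain why the same bytes look fine on Windows and broken on a UTF-8 Linux system.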

          Petr Škoda added a comment -

          The attached zip archive is not compatible with Unicode; you need to use some other, more standards-compliant software, sorry.

          In Moodle there is a tool that tells you if the zip file is OK; it is in lib/filestorage/tests/fixtures/zip_info.php

          skodak:fixtures skodak$ php zip_info.php 16.zip 
          Archive:         16.zip
          Number of files: 2
          Archive comment: "" (0 bytes)
          ======== File 1 ==============================================
            Name:           "12?????_185_kongfu_bg" (22 bytes) - CRC 3214817444
            Version:        0x0014
            Required:       0x000a
            Method:         0x0000 (Stored)
            General:        0000000000000000
            Modified:       Monday, 7 January 2013, 3:46 pm
            Size:           13 ==> 13 bytes
            CRC-32:         3523151500
          ======== File 2 ==============================================
            Name:           "12???ϼ_164_kongfu_bg" (22 bytes) - CRC 673260215
            Version:        0x0014
            Required:       0x000a
            Method:         0x0000 (Stored)
            General:        0000000000000000
            Modified:       Monday, 7 January 2013, 3:46 pm
            Size:           8 ==> 8 bytes
            CRC-32:         1714574663
          skodak:fixtures skodak$ 
          

          If there is no "(Unicode name)" flag, it might work only if you set the same language in Moodle as there was in Windows.
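For readers without a Moodle checkout, a similar check can be sketched with Python's standard `zipfile` module. This is not the `zip_info.php` tool; it only inspects general purpose bit 11 (0x800), which the ZIP format uses to mark entry names as UTF-8. The helper name `unicode_name_flags` and the sample file names are made up for illustration.

```python
import io
import zipfile

UTF8_NAME_FLAG = 0x800  # general purpose bit 11: name and comment are UTF-8


def unicode_name_flags(zip_bytes: bytes) -> dict[str, bool]:
    """Map each entry name to whether its UTF-8 name flag is set."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return {
            info.filename: bool(info.flag_bits & UTF8_NAME_FLAG)
            for info in archive.infolist()
        }


# Build a small archive in memory. Python's zipfile sets the flag
# itself whenever an entry name is not plain ASCII.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("plain.txt", b"ascii name")
    archive.writestr("功夫.txt", b"non-ASCII name")

flags = unicode_name_flags(buffer.getvalue())
for name, is_utf8 in flags.items():
    print(name, "UTF-8 flag set" if is_utf8 else "no UTF-8 flag")
```

In the `zip_info.php` output for 16.zip, the "General" field is all zeros for both files, which appears consistent with the flag being absent.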

          I am going to add a hack that tries to detect the Simplified Chinese encoding.
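A detection hack of this kind might look roughly like the sketch below. It is not the patch that was integrated into Moodle; the function name `decode_zip_name`, the candidate order and the cp437 last resort are all assumptions chosen for illustration.

```python
UTF8_NAME_FLAG = 0x800  # general purpose bit 11: name is UTF-8


def decode_zip_name(raw: bytes, flag_bits: int = 0, legacy: str = "gbk") -> str:
    """Guess the text of a zip entry name from its raw bytes.

    `legacy` is the code page assumed for archives without the UTF-8 flag,
    for example the one matching the site's current language.
    """
    if flag_bits & UTF8_NAME_FLAG:
        # The archive declares UTF-8, so no guessing is needed.
        return raw.decode("utf-8", errors="replace")
    # Strict UTF-8 first: legacy multi-byte names rarely pass as valid UTF-8.
    for encoding in ("utf-8", legacy):
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Last resort: the original DOS code page, which accepts any byte.
    return raw.decode("cp437", errors="replace")


# Raw bytes of the first entry in 16.zip, rebuilt from the 7-Zip listing
# under the assumption that it showed one Latin-1 character per byte.
raw_name = "12Öìæºâù_185_kongfu_bg".encode("latin-1")
print(decode_zip_name(raw_name))
```

The obvious weakness is that a wrong `legacy` guess can still decode without error and produce different gibberish, which is why the flag remains the only reliable signal.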

          Petr Škoda added a comment -

          Thanks for the report.

          Dan Poltawski added a comment -

          Please can you clarify if this should be fixed in 2.3 or not.

          CiBoT added a comment -

          Moving this reopened issue out from current integration. Please, re-submit it for integration once ready.

          Petr Škoda added a comment - - edited

          oh, there is no utf-8 zip support in 2.3, there is no way to backport this

          Eloy Lafuente (stronk7) added a comment -

          (oops, sorry for the noise, I mixed a bunch of issues here)

          Dan Poltawski added a comment -

          The main moodle.git repository has just been updated with latest weekly modifications. You may wish to rebase your PULL branches to simplify history and avoid any possible merge conflicts. This would also make integrator's life easier next week.

          TIA and ciao

          Dan Poltawski added a comment - - edited

          Hi Petr,

          Looking at this code, it is not clear to me that the behaviour is intended: when localewincharset is empty (e.g. in English), the encoding is set to an empty string rather than to oldcharset (ISO-8859-1 in English).

          Is it right that we should be setting this to an empty string? Should you be testing if localewincharset is empty before setting $encoding to it?
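The guard being asked about can be sketched as follows. This is an illustration in Python, not the Moodle code under review: `localewincharset` and `oldcharset` are the language-pack values named in the comment, while the function name `choose_encoding` and the `default` parameter are hypothetical.

```python
def choose_encoding(localewincharset: str, oldcharset: str,
                    default: str = "cp437") -> str:
    """Return the first non-empty charset, never an empty string."""
    for candidate in (localewincharset, oldcharset):
        # Skip values that are empty or only whitespace.
        if candidate and candidate.strip():
            return candidate.strip()
    return default


# English: no Windows charset configured, so fall back to oldcharset.
print(choose_encoding("", "ISO-8859-1"))
# Simplified Chinese: a Windows charset is configured, so it wins.
print(choose_encoding("GBK", "GB2312"))
```

Testing for emptiness before assigning keeps a missing language-pack entry from silently turning the encoding into an empty string.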

          xaero added a comment - - edited

          Thanks to all of you!
          It really is a zip filename encoding problem! I zipped some files on my MS Windows machine with WinZip 17.0, and they unzipped correctly in Moodle!
          The [a.zip] in the attachments was created by WinZip 17.0

          Dan Poltawski added a comment -

          [Ping Petr]

          Dan Poltawski added a comment -

          The integration of this issue has been delayed to next week because the integration period is over (Monday, Tuesday) and testing must happen on Wednesday.

          This change to a more rigid timeframe on each integration/testing cycle aims to produce a better and clear separation and organization of tasks for everybody.

          This is a bulk-automated message, so if you want to blame somebody/thing/where, don't do it here (use git instead)

          CiBoT added a comment -

          Moving this reopened issue out from current integration. Please, re-submit it for integration once ready.

          Petr Škoda added a comment -

          Thanks Dan, I have updated the patch to skip for empty charsets. I looked at other languages that do not use iso charsets and it seems the only futureproof solution might be to add new zipcharset to all language packs and let translators add proper encoding (that would be 2.5dev only).

          Sam Hemelryk added a comment -

          Thanks Petr - this has been integrated now.

          Andrew Davis added a comment -

          Seems to be working fine. Passing.

          Eloy Lafuente (stronk7) added a comment -

          A brilliant future is awaiting us out there, better with your code. Let's look towards the future together, this is now closed.

          (and won't be revisiting it unless some regression is found)

          Thanks and ciao


            People

            • Votes:
              0
              Watchers:
              10
