  1. Moodle
  2. MDL-37407

Windows filenames in zip files disappear or turn to gibberish when unzipped in Moodle running on Linux

    Details

    • Testing Instructions:

      1/ If you understand Chinese, switch to zh_cn and unzip the attached 16.zip in Moodle.
      2/ Alternatively, open a filepicker and upload the 16.zip file, switch to another browser tab and select zh_cn there, then switch back and extract the file (the point is to keep the filepicker UI in English while the ajax call runs in Chinese).

    • Affected Branches:
      MOODLE_23_STABLE, MOODLE_24_STABLE
    • Fixed Branches:
      MOODLE_24_STABLE
    • Pull from Repository:
    • Pull Master Branch:
      w04_MDL-37407_m25_chineseunzip

      Description

      When a .zip file created under Microsoft Windows is unzipped, filenames containing double-byte characters disappear or turn to gibberish. See the screenshots.
      The fix for MDL-33068 has not solved this problem.
      More information:
      Both the zip file and the files inside it were created under MS Windows 7.
      Moodle 2.4 is running on CentOS.


      The problem was that the attached file is not compatible with Unicode, and Moodle did not yet contain the heuristics needed to detect the encoding when Chinese is selected as the current language.


        1. fixed_unzip.png
          38 kB
        2. screenshot-1.jpg
          39 kB
        3. screenshot-2.jpg
          29 kB


            Activity

            xaero xaero added a comment - - edited

            【16.zip】in the attachments is an example zip file created under MS Windows by WinRAR

            xaero xaero added a comment -

            MDL-33068 hasn't fixed the problem

            xaero xaero added a comment - - edited

            【screenshot-1】zip file

            xaero xaero added a comment - - edited

            【screenshot-2】unzip it, and the Chinese characters disappear

            tsala Helen Foster added a comment -

            Setting priority according to http://docs.moodle.org/dev/Tracker_guide

            Assigning to Marina as filepicker component lead and MDL-33068 assignee.

            marina Marina Glancy added a comment -

            Fred, can you please take a look at it and say how bad/urgent it is? I think you know much more than me about zips

            fred Frédéric Massart added a comment -

            Marina, the things I worked on with Zip files are not related to their encodings. I think Petr is the expert in that area (MDL-24928). From what I have seen, even opening the file in Ubuntu's file manager does not display the correct file names. Also, these are the results from zip/7-zip CLI tools:

            fred@fred:~/Downloads$ 7z l 16.zip 
             
            7-Zip [64] 9.20  Copyright (c) 1999-2010 Igor Pavlov  2010-11-18
            p7zip Version 9.20 (locale=en_AU.UTF-8,Utf16=on,HugeFiles=on,4 CPUs)
             
            Listing archive: 16.zip
             
            --
            Path = 16.zip
            Type = zip
            Physical Size = 283
             
               Date      Time    Attr         Size   Compressed  Name
            ------------------- ----- ------------ ------------  ------------------------
            2013-01-07 15:46:30 ....A           13           13  12Öìæºâù_185_kongfu_bg
            2013-01-07 15:46:20 ....A            8            8  12Öìè´Ï¼_164_kongfu_bg
            ------------------- ----- ------------ ------------  ------------------------
                                                21           21  2 files, 0 folders
            fred@fred:~/Downloads$ zipinfo 16.zip 
            Archive:  16.zip
            Zip file size: 283 bytes, number of entries: 2
            -rw-a--     2.0 fat       13 b- stor 13-Jan-07 15:46 12???????????????_185_kongfu_bg
            -rw-a--     2.0 fat        8 b- stor 13-Jan-07 15:46 12????????????????_164_kongfu_bg
            2 files, 21 bytes uncompressed, 21 bytes compressed:  0.0%

            (Added Petr as watcher)

            xaero xaero added a comment - - edited

            Yes, the files in the zip were created under MS Windows 7 by WinRAR with its default settings.
            I think MS Windows files with double-byte character filenames will always display incorrectly under Linux, because the filenames in MS Windows are encoded as GBK, not UTF-8.
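The mismatch described in this comment can be reproduced with a short Python sketch. It takes the first name exactly as `7z l` printed it earlier in this thread and assumes that each displayed character stands for one raw byte (a Latin-1 view of the stored name). That assumption, and the choice of GBK as the real encoding, come from the comments here and were not verified against the attachment itself.

```python
# Name exactly as printed by `7z l 16.zip` earlier in this thread.
shown_by_7z = "12Öìæºâù_185_kongfu_bg"

# Assumption: every displayed character corresponds to one raw byte,
# i.e. the tool showed the stored name through a Latin-1 style mapping.
raw_name = shown_by_7z.encode("latin-1")

# Reinterpret the same bytes as GBK, the encoding the reporter says
# Windows used when the archive was created.
decoded = raw_name.decode("gbk")

print(len(raw_name), "bytes ->", len(decoded), "characters")
print(decoded)

# Going the other way reproduces the gibberish from the readable name.
mojibake = decoded.encode("gbk").decode("latin-1")
print(mojibake == shown_by_7z)
```

The byte count matches the "(22 bytes)" reported by zip_info.php later in the thread, which is consistent with two-byte characters stored without any Unicode marker.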

            skodak Petr Skoda added a comment -

            The attached zip archive is not compatible with Unicode; you need to use other, more standards-compliant software, sorry.

            In Moodle there is a tool that tells you whether a zip file is OK; it is in lib/filestorage/tests/fixtures/zip_info.php

            skodak:fixtures skodak$ php zip_info.php 16.zip 
            Archive:         16.zip
            Number of files: 2
            Archive comment: "" (0 bytes)
            ======== File 1 ==============================================
              Name:           "12?????_185_kongfu_bg" (22 bytes) - CRC 3214817444
              Version:        0x0014
              Required:       0x000a
              Method:         0x0000 (Stored)
              General:        0000000000000000
              Modified:       Monday, 7 January 2013, 3:46 pm
              Size:           13 ==> 13 bytes
              CRC-32:         3523151500
            ======== File 2 ==============================================
              Name:           "12???ϼ_164_kongfu_bg" (22 bytes) - CRC 673260215
              Version:        0x0014
              Required:       0x000a
              Method:         0x0000 (Stored)
              General:        0000000000000000
              Modified:       Monday, 7 January 2013, 3:46 pm
              Size:           8 ==> 8 bytes
              CRC-32:         1714574663
            skodak:fixtures skodak$ 

            If there is no "(Unicode name)" flag, it might work only if you set the same language in Moodle as was used in Windows.

            I am going to add a hack that tries to detect the Simplified Chinese encoding.
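As a rough illustration of the check described in this comment, the Python sketch below reports, per entry, whether bit 11 (0x800) of the zip general purpose flags is set. The ZIP format defines that bit as meaning the name is stored as UTF-8. Whether zip_info.php's "(Unicode name)" label corresponds to exactly this bit is an assumption, and `unicode_name_report` is a hypothetical helper, not part of Moodle.

```python
import io
import zipfile

UTF8_NAME_FLAG = 0x800  # bit 11 of the general purpose bit flag


def unicode_name_report(zip_bytes):
    """Map each entry name to True if its UTF-8 name flag is set."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return {
            info.filename: bool(info.flag_bits & UTF8_NAME_FLAG)
            for info in archive.infolist()
        }


# Build a small archive in memory. Python's zipfile sets the flag only
# for names that cannot be stored as plain ASCII.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("plain_ascii.txt", "a")
    archive.writestr("中文.txt", "b")

report = unicode_name_report(buffer.getvalue())
for name, flagged in report.items():
    print(name, "UTF-8 flag set" if flagged else "no UTF-8 flag")
```

An archive like 16.zip, whose "General" field above is all zeros, would report no flag for either entry, so a reader has to guess the encoding of the name bytes.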

            skodak Petr Skoda added a comment -

            Thanks for the report.

            poltawski Dan Poltawski added a comment -

            Please can you clarify if this should be fixed in 2.3 or not.

            cibot CiBoT added a comment -

            Moving this reopened issue out from current integration. Please, re-submit it for integration once ready.

            skodak Petr Skoda added a comment - - edited

            Oh, there is no UTF-8 zip support in 2.3, so there is no way to backport this.

            stronk7 Eloy Lafuente (stronk7) added a comment -

            (Oops, sorry for the noise, I mixed up a bunch of issues here)

            poltawski Dan Poltawski added a comment -

            The main moodle.git repository has just been updated with latest weekly modifications. You may wish to rebase your PULL branches to simplify history and avoid any possible merge conflicts. This would also make integrator's life easier next week.

            TIA and ciao

            poltawski Dan Poltawski added a comment - - edited

            Hi Petr,

            Looking at this code, it is not clear to me that what you are doing is intended: when localewincharset is empty (e.g. for English), $encoding is set to an empty string rather than to oldcharset (ISO-8859-1 for English).

            Is it right that we should be setting this to an empty string? Should you be testing whether localewincharset is empty before setting $encoding to it?

            xaero xaero added a comment - - edited

            Thanks all of you!
            It truly is a zip filename encoding problem! I zipped some files on MS Windows with WinZip 17.0, and they unzipped correctly in Moodle!
            The [a.zip] in the attachments was created by WinZip 17.0.

            poltawski Dan Poltawski added a comment -

            [Ping Petr]

            poltawski Dan Poltawski added a comment -

            The integration of this issue has been delayed to next week because the integration period is over (Monday, Tuesday) and testing must happen on Wednesday.

            This change to a more rigid timeframe on each integration/testing cycle aims to produce a better and clear separation and organization of tasks for everybody.

            This is a bulk-automated message, so if you want to blame somebody/thing/where, don't do it here (use git instead)

            cibot CiBoT added a comment -

            Moving this reopened issue out from current integration. Please, re-submit it for integration once ready.

            skodak Petr Skoda added a comment -

            Thanks Dan, I have updated the patch to skip empty charsets. I looked at other languages that do not use ISO charsets, and it seems the only future-proof solution might be to add a new zipcharset string to all language packs and let translators supply the proper encoding (that would be 2.5dev only).
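The fallback order discussed in the last few comments could look roughly like the Python sketch below. It is not the Moodle patch. The function names are hypothetical, the parameter names only borrow the language-string names mentioned above, and "GBK" as the value for zh_cn is an assumption made for illustration.

```python
def pick_legacy_encoding(localewincharset, oldcharset):
    """Prefer the Windows charset of the current language, skipping empty values."""
    for candidate in (localewincharset, oldcharset):
        if candidate and candidate.strip():
            return candidate.strip()
    return None


def decode_entry_name(raw_name, localewincharset, oldcharset):
    """Decode a zip entry name that carries no Unicode marker."""
    try:
        # Names that are valid UTF-8 (plain ASCII included) are taken as-is.
        return raw_name.decode("utf-8")
    except UnicodeDecodeError:
        pass
    encoding = pick_legacy_encoding(localewincharset, oldcharset)
    if encoding is None:
        # Last resort: Latin-1 maps every byte, so this cannot fail.
        return raw_name.decode("latin-1")
    return raw_name.decode(encoding, errors="replace")


raw = "中文.txt".encode("gbk")

# Current language is Chinese: the Windows charset is tried and the name is readable.
print(decode_entry_name(raw, "GBK", "ISO-8859-1"))

# Current language is English with an empty localewincharset: the empty value
# is skipped and oldcharset is used, which yields gibberish for this archive.
print(decode_entry_name(raw, "", "ISO-8859-1"))
```

The second call shows why the fix depends on the language selected in Moodle: without a matching charset hint, the bytes decode without error but the name is still wrong.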

            samhemelryk Sam Hemelryk added a comment -

            Thanks Petr - this has been integrated now.

            andyjdavis Andrew Davis added a comment -

            Seems to be working fine. Passing.

            stronk7 Eloy Lafuente (stronk7) added a comment -

            A brilliant future is awaiting us out there, better with your code. Let's look towards the future together, this is now closed.

            (and won't be revisiting it unless some regression is found)

            Thanks and ciao


              People

              • Votes:
                0
                Watchers:
                10

                Dates

                • Created:
                  Updated:
                  Resolved:
                  Fix Release Date:
                  11/Mar/13