Issue Details (XML | Word | Printable)

Key: MDL-2307
Type: Bug Bug
Status: Resolved Resolved
Resolution: Fixed
Priority: Minor Minor
Assignee: Petr Skoda
Reporter: Daniel Miksik
Votes: 12
Watchers: 5
Operations

Add/Edit UI Mockup to this issue
If you were logged in you would be able to see more operations.
Moodle

Special characters left in filenames after unzipping

Created: 23/Dec/04 01:01 AM   Updated: 11/May/09 11:48 PM
Return to search
Component/s: Administration
Affects Version/s: 1.5.3
Fix Version/s: 1.8.9, 1.9.5

File Attachments: 1. File unzip_clean_filenames.diff (0.9 kB)
2. File unzip_clean_filenames.diff (1.0 kB)

Environment: All
Issue Links:
Duplicate
 
Relates
 

Participants: Charles Fulton, Daniel Miksik, Deb Sarlin, Kenneth Newquist, Martin Dougiamas, Petr Skoda, Sara Arjona and Teresa Gibbison
Security Level: None
Resolved date: 08/May/09
Affected Branches: MOODLE_15_STABLE
Fixed Branches: MOODLE_18_STABLE, MOODLE_19_STABLE


 Description  « Hide
When an uploaded zip file is unzipped in the Files area, special characters are left in the unzipped files - resulting in Sorry, the requested file could not be found message when trying to access those files. Tested at my site and moodle.org today.

 All   Comments   Change History   Version Control      Sort Order: Ascending order - Click to sort in descending order
Martin Dougiamas added a comment - 17/Nov/05 07:39 AM
From Jaroslav Šeděnka (jaroslav at sedenka.net) Sunday, 16 October 2005, 03:24 AM:

I uploaded a simple patch replacing ilegal characters with underscore (on Linux; might be neccessary on other Unixes as well, but I can't test it).

From Jaroslav Šeděnka (jaroslav at sedenka.net) Sunday, 16 October 2005, 03:25 AM:

PS. the code was written by Dan Miksik, I't not like to forget the credits


Kenneth Newquist added a comment - 05/Mar/08 11:38 PM
This problem still exists in Moodle 1.8.4+.

Teresa Gibbison added a comment - 30/Jul/08 07:44 AM
This also still exists in Moodle 1.9 - it would be great if this can be fixed as the files are basically useless.

Charles Fulton added a comment - 14/Jan/09 11:35 AM
The problem can be traced to the fact that the file display function and the unzip function don't use the same "cleaning" method: unzip_clean_filename() has several built in regexs, while file.php calls clean_param (with PARAM_PATH), which has its own more extensive regexs. I've added a patch which changes unzip_clean_filename(); it now also calls clean_param() as part of its cleaning routine.

The desired behavior, then, is to remove strange characters from the get-go, so they'll never cause problems down the road.


Kenneth Newquist added a comment - 14/Jan/09 11:26 PM
Thanks for looking into this; I'd meant to look into this and never got around to it. I agree that the desired behavior is for it to strip any special characters, which is what happens when you upload a single file. This bug does render zip upload useless for a goodly number of faculty who end up using some sort of special character in their file name (who are then extremely annoyed at their inability to move/delete a file).

I'll check out the patch ASAP.


Deb Sarlin added a comment - 16/Jan/09 12:48 AM
Thanks Charles,
Really helpful - will try unzip_clean_filenames.diff as well.
Deb

Petr Skoda added a comment - 22/Jan/09 03:01 AM
this is already solved in HEAD

Sara Arjona added a comment - 11/Mar/09 05:41 PM
This solution doesn't work for us. We work with Moodle 1.8.7 with Oracle database and this solution still left some special characters (like ç, ñ, ' ...). Besides, we have solved this problem adding the following lines in the unzip_cleanfilename function (lib/moodlelib.php file):

...
$p_header['filename'] = cleardoubleslashes($p_header['filename']); //normalize the slashes/backslashes
// ******* Added
$tmp = split("/", $p_header['filename']);
$tmp[sizeof($tmp)-1]=clean_filename($tmp[sizeof($tmp)-1]);
$p_header['filename']=implode("/", $tmp);
// ***** End added
return 1;
...

It would be possible to revise the code and add this lines to solve definitively the problem? Thank you in advance!


Charles Fulton added a comment - 15/Apr/09 05:21 AM
Updated per Sara Arjona's comment. clean_filename() is even more restrictive than clean_param(), even when PARAM_FILE is passed (which doesn't make much sense to me).

Petr Skoda added a comment - 06/May/09 04:52 PM
should be fixed now, please test latest cvs
thanks the report

Daniel Miksik added a comment - 07/May/09 06:03 PM
I have tested the latest MOODLE_18_STABLE code from CVS, still getting illegal characters (like space or Czech characters with diacritic marks) in the names of the unzipped files.

Petr Skoda added a comment - 07/May/09 06:17 PM
which zip method are you using?

Petr Skoda added a comment - 07/May/09 06:50 PM
reclosing, the internal zipping should work fine too now.
International characters, spaces, etc. are not forbidden, only chars like '"<> are - this is required for backwards compatibility and also they will be allowed in 2.0

thanks a lot for the testing


Daniel Miksik added a comment - 07/May/09 08:03 PM
I don't know what exactly is meant by "zip method", sorry. I'm creating the ZIP files using 7-Zip (compression method: deflate, dictionary size: 32 KB, word size: 64).

I think something has got broken by the last fix. I get the following error trying to unzip any ZIP file now:
Notice: Undefined variable: temppath in /home/elfdev/lib/moodlelib.php on line 6562
PCLZIP_ERR_INVALID_PARAMETER (-3) : Invalid optional parameter '' for this method

And a final note: If international characters are not forbidden, then why are they removed from the names of files uploaded directly to course files?


Petr Skoda added a comment - 08/May/09 12:39 AM
zip mehods are two, internal using pclzip and external infozib binary,
reopening and sorry for the regression in 1.8.x branch

Petr Skoda added a comment - 08/May/09 12:45 AM
fixed, thanks

Daniel Miksik added a comment - 11/May/09 09:26 PM
Thanks for the quick fixes, Petr. Could you please also respond to the question from above: If international characters are not forbidden, then why are they removed from the names of files uploaded directly to course files? Thank you.

Petr Skoda added a comment - 11/May/09 11:48 PM
They are not forbidden for security reasons. They do break often if you use them. Unfortunately we can not remove them because it would break links in uploaded html files.

In 2.0 all the "not forbidden" chars should finally "sort of" work.