Moodle
  1. Moodle
  2. MDL-27125

Bug in repository->get_file causing data corruption for downloaded files

    Details

    • Testing Instructions:
      Hide

      Regression testing:

      1. Enable the 'URL downloader' repository
      2. Download a file to my private files
      3. Check the file in private files is not corrupt.
      Show
      Regression testing: Enable the 'URL downloader' repository Download a file to my private files Check the file in private files is not corrupt.
    • Difficulty:
      Easy
    • Affected Branches:
      MOODLE_21_STABLE, MOODLE_22_STABLE
    • Fixed Branches:
      MOODLE_21_STABLE, MOODLE_22_STABLE, MOODLE_23_STABLE
    • Pull from Repository:
      git@github.com:dongsheng/moodle.git
    • Pull Master Branch:
      MDL-27125_master_2

      Description

      Here is the code for get_file() method for repository class in repository/lib.php:

         public function get_file($url, $filename = '') {
              global $CFG;
              $path = $this->prepare_file($filename);
              $fp = fopen($path, 'w');
              $c = new curl;
              $c->download(array(array('url'=>$url, 'file'=>$fp)));
              return array('path'=>$path, 'url'=>$url);
          }

      Notice that the file is opened but not closed in this function. This will cause that within the request, after calling this function not whole file may be flushed to the disk. We have tested it on linux and were getting different results, but very often a partial file would be copied from temp/downloads area into the filestorage. The fix is very simple - adding fclose($fp) fixes the issue:

          public function get_file($url, $filename = '') {
              global $CFG;
              $path = $this->prepare_file($filename);
              $fp = fopen($path, 'w');
              $c = new curl;
              $c->download(array(array('url'=>$url, 'file'=>$fp)));
              fclose($fp);
              return array('path'=>$path, 'url'=>$url);
          }

        Gliffy Diagrams

        1. MDL-27125_jh1.diff
          0.6 kB
          Jonathan Harker
        2. MDL-27125-patch-1.diff
          1 kB
          Jonathan Harker

          Issue Links

            Activity

            Hide
            Tomasz Muras added a comment -

            Patch is available for pull in the repository:
            git://github.com/enovation/moodle.git

            branch:
            MDL-27125

            Show
            Tomasz Muras added a comment - Patch is available for pull in the repository: git://github.com/enovation/moodle.git branch: MDL-27125
            Hide
            Dongsheng Cai added a comment -

            Thanks Tomasz

            Adding this issue to stable backlog.

            Show
            Dongsheng Cai added a comment - Thanks Tomasz Adding this issue to stable backlog.
            Hide
            Petr Skoda added a comment -

            Hello,
            hmmm, there seem to be more similar problems in other places where this curl class is user (btw the name is too short for moodle specific stuff, I can image any other project using this class name). It might be better to fix it inside the download method and friends, but then it might break BC...

            Petr

            Show
            Petr Skoda added a comment - Hello, hmmm, there seem to be more similar problems in other places where this curl class is user (btw the name is too short for moodle specific stuff, I can image any other project using this class name). It might be better to fix it inside the download method and friends, but then it might break BC... Petr
            Hide
            Tomasz Muras added a comment - - edited

            I agree, download() function looks like much better place to fix it. It seems more natural (to me at least) to pass file path to a function like this instead of the file descriptor.
            If we are afraid about BC, what about adding support for new array key 'filepath' - the function could be then called like this:

            download(array(array('url'=>$url, 'filepath'=>$path)));

            'file' key should be deprecated and removed (in 2.2?)

            Or maybe let's be bold and rename the class to more unique name and change the function at the same time?

            Tomasz Muras

            Show
            Tomasz Muras added a comment - - edited I agree, download() function looks like much better place to fix it. It seems more natural (to me at least) to pass file path to a function like this instead of the file descriptor. If we are afraid about BC, what about adding support for new array key 'filepath' - the function could be then called like this: download(array(array('url'=>$url, 'filepath'=>$path))); 'file' key should be deprecated and removed (in 2.2?) Or maybe let's be bold and rename the class to more unique name and change the function at the same time? Tomasz Muras
            Hide
            Petr Skoda added a comment -

            Yes, I would prefer to pass file path there too. The handler passing is good if you do not want to disclose the file location such as in file_storage class.

            Show
            Petr Skoda added a comment - Yes, I would prefer to pass file path there too. The handler passing is good if you do not want to disclose the file location such as in file_storage class.
            Hide
            Tomasz Muras added a comment -

            Dongsheng/Petrt - do we agree on adding the 'filepath' key and changing download() as per my comment:
            http://tracker.moodle.org/browse/MDL-27125?focusedCommentId=108784&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-108784

            If so, then let me know if you're going to work on it - or do you want me tocreate a patch that does this, and updates the whole core code, where download() is being used.

            Show
            Tomasz Muras added a comment - Dongsheng/Petrt - do we agree on adding the 'filepath' key and changing download() as per my comment: http://tracker.moodle.org/browse/MDL-27125?focusedCommentId=108784&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-108784 If so, then let me know if you're going to work on it - or do you want me tocreate a patch that does this, and updates the whole core code, where download() is being used.
            Hide
            Dongsheng Cai added a comment -

            Hello Tomasz,

            Thanks for all comments, I started work on this, will add filepath.

            Regards,
            Dongsheng Cai

            Show
            Dongsheng Cai added a comment - Hello Tomasz, Thanks for all comments, I started work on this, will add filepath. Regards, Dongsheng Cai
            Hide
            Dongsheng Cai added a comment -

            Tomasz, could you please review my patch, see if it works for you?

            Thanks

            Show
            Dongsheng Cai added a comment - Tomasz, could you please review my patch, see if it works for you? Thanks
            Hide
            Tomasz Muras added a comment -

            I've used the CLI code as shown below to test it.
            It works OK but I'm not sure if we should be closing file descriptors passed into download() function. This is really just a side-effect and it may not be something that a user (well developer) would expect.

            <?php
            define('CLI_SCRIPT', true);
             
            require(dirname(dirname(dirname(__FILE__))).'/config.php');
            error_reporting(E_ALL);
            require_once($CFG->libdir.'/clilib.php');      // cli only functions
            require_once($CFG->libdir.'/filelib.php');      // cli only functions
             
            $c = new curl;
            $pid = getmypid();
             
            echo `ls -l /proc/$pid/fd`;
            $testh = fopen('/tmp/fileX', 'wb');
            fwrite($testh,'test');
             
            echo `ls -l /proc/$pid/fd`;
             
            $c->download(array(
              array('url'=>'http://localhost/20/', 'filepath'=>'/tmp/file2.tmp'),
              array('url'=>'http://localhost/20/','file'=>fopen('/tmp/file3', 'wb')),
            ));
            echo `ls -l /proc/$pid/fd`;
            

            Show
            Tomasz Muras added a comment - I've used the CLI code as shown below to test it. It works OK but I'm not sure if we should be closing file descriptors passed into download() function. This is really just a side-effect and it may not be something that a user (well developer) would expect. <?php define('CLI_SCRIPT', true);   require(dirname(dirname(dirname(__FILE__))).'/config.php'); error_reporting(E_ALL); require_once($CFG->libdir.'/clilib.php'); // cli only functions require_once($CFG->libdir.'/filelib.php'); // cli only functions   $c = new curl; $pid = getmypid();   echo `ls -l /proc/$pid/fd`; $testh = fopen('/tmp/fileX', 'wb'); fwrite($testh,'test');   echo `ls -l /proc/$pid/fd`;   $c->download(array( array('url'=>'http://localhost/20/', 'filepath'=>'/tmp/file2.tmp'), array('url'=>'http://localhost/20/','file'=>fopen('/tmp/file3', 'wb')), )); echo `ls -l /proc/$pid/fd`;
            Hide
            Tomasz Muras added a comment -

            I think we should not be introducing a side-effect of closing the file handle that was passed to the function.

            Show
            Tomasz Muras added a comment - I think we should not be introducing a side-effect of closing the file handle that was passed to the function.
            Hide
            Dongsheng Cai added a comment -

            Hi, Tomasz

            I fixed the code to close the file handler outside of function.

            Show
            Dongsheng Cai added a comment - Hi, Tomasz I fixed the code to close the file handler outside of function.
            Hide
            Tomasz Muras added a comment - - edited

            Hi Dongsheng,

            It looks OK now - I only looked at s11_MDL-27125_curl_file_handler_master, as I imagine that's the only one that will be integrated.
            I would just add one improvement in the comment, to prevent ppl from doing the same mistake of not closing the files, instead of:

                 * <code>
                 * $c = new curl;
                 * $c->download(array(
                 * array('url'=>'http://localhost/', 'file'=>fopen('a', 'wb')),
                 * array('url'=>'http://localhost/20/', 'file'=>fopen('b', 'wb'))
                 * ));
                 * </code>
            

            Let's put:

                 * <code>
                 * $c = new curl;
                 * $file1 = fopen('a', 'wb');
                 * $file2 = fopen('b', 'wb');
                 * $c->download(array(
                 * array('url'=>'http://localhost/', 'file'=>$file1),
                 * array('url'=>'http://localhost/20/', 'file'=>$file2)
                 * ));
                 * fclose($file1);
                 * fclose($file2);
                 * </code>
            

            Show
            Tomasz Muras added a comment - - edited Hi Dongsheng, It looks OK now - I only looked at s11_ MDL-27125 _curl_file_handler_master, as I imagine that's the only one that will be integrated. I would just add one improvement in the comment, to prevent ppl from doing the same mistake of not closing the files, instead of: * <code> * $c = new curl; * $c->download(array( * array('url'=>'http://localhost/', 'file'=>fopen('a', 'wb')), * array('url'=>'http://localhost/20/', 'file'=>fopen('b', 'wb')) * )); * </code> Let's put: * <code> * $c = new curl; * $file1 = fopen('a', 'wb'); * $file2 = fopen('b', 'wb'); * $c->download(array( * array('url'=>'http://localhost/', 'file'=>$file1), * array('url'=>'http://localhost/20/', 'file'=>$file2) * )); * fclose($file1); * fclose($file2); * </code>
            Hide
            Dongsheng Cai added a comment -

            Thanks Tomasz, I fixed the comment

            Show
            Dongsheng Cai added a comment - Thanks Tomasz, I fixed the comment
            Hide
            Sam Hemelryk added a comment -

            Hi Dongsheng,

            Failing this at the moment, in general the changes look good however I think there certainly needs to be some sort of checking to ensure that if the code is passing both file and filepath it doesn't just overwrite file with a new file handle which would introcude problems.
            Probably check file to make sure it is a file handle or not set before overwriting it plus some debugging (if possible) to explain to the developer that they should use one or the other and not both.

            Cheers
            Sam

            Show
            Sam Hemelryk added a comment - Hi Dongsheng, Failing this at the moment, in general the changes look good however I think there certainly needs to be some sort of checking to ensure that if the code is passing both file and filepath it doesn't just overwrite file with a new file handle which would introcude problems. Probably check file to make sure it is a file handle or not set before overwriting it plus some debugging (if possible) to explain to the developer that they should use one or the other and not both. Cheers Sam
            Hide
            Jonathan Harker added a comment - - edited

            Why not use the download_file_content function which does this already? It has the added bonus of respecting any configured Moodle cache/proxy settings.

            Show
            Jonathan Harker added a comment - - edited Why not use the download_file_content function which does this already? It has the added bonus of respecting any configured Moodle cache/proxy settings.
            Hide
            Dan Poltawski added a comment -

            The curl class should do this too. In fact download_file_content should really be converted internally to use the curl class so everything is following the same code path.

            Show
            Dan Poltawski added a comment - The curl class should do this too. In fact download_file_content should really be converted internally to use the curl class so everything is following the same code path.
            Hide
            Jonathan Harker added a comment -

            How about we assert that if there is a specified file handle, then that takes precedence, and filepath is ignored? If it has been passed both 'filepath' and 'file' then the existing open file handle takes precedence over the filepath. If only 'file' is specified, then it continues as normal. If only 'filepath' is specified, it creates the file handle under the 'file' parameter, adds an 'auto-handle' flag, and closes any flagged file handles at the end. This should kill all the parrots with one rock.

            Show
            Jonathan Harker added a comment - How about we assert that if there is a specified file handle, then that takes precedence, and filepath is ignored? If it has been passed both 'filepath' and 'file' then the existing open file handle takes precedence over the filepath. If only 'file' is specified, then it continues as normal. If only 'filepath' is specified, it creates the file handle under the 'file' parameter, adds an 'auto-handle' flag, and closes any flagged file handles at the end. This should kill all the parrots with one rock.
            Hide
            Jonathan Harker added a comment -

            Thing is, I still think that since the original bug doesn't even need the multiple download functionality anyway, this is solving a different problem. The simplest solution to the bug reported here (file corruption in get_file) is to use download_file_content and point it at a path, e.g. attached patch.

            Show
            Jonathan Harker added a comment - Thing is, I still think that since the original bug doesn't even need the multiple download functionality anyway, this is solving a different problem. The simplest solution to the bug reported here (file corruption in get_file) is to use download_file_content and point it at a path, e.g. attached patch.
            Hide
            Jonathan Harker added a comment -

            Patch on dongsheng's code to not clobber incoming file handles.

            Show
            Jonathan Harker added a comment - Patch on dongsheng's code to not clobber incoming file handles.
            Hide
            Jonathan Harker added a comment -

            Let's review this master branch before I spend time doing rebases to 2.0 2.1 and 2.2

            Show
            Jonathan Harker added a comment - Let's review this master branch before I spend time doing rebases to 2.0 2.1 and 2.2
            Hide
            Aaron Barnes added a comment -

            Jonathan,

            Reviewed your master patch - looks like a good improvement. Only comments I can make are to remove the unnecessary unset() calls and fix the constructor syntax used in the doc block.

            Thanks and +1
            Aaron

            Show
            Aaron Barnes added a comment - Jonathan, Reviewed your master patch - looks like a good improvement. Only comments I can make are to remove the unnecessary unset() calls and fix the constructor syntax used in the doc block. Thanks and +1 Aaron
            Hide
            Aaron Barnes added a comment -

            See prev comment

            Show
            Aaron Barnes added a comment - See prev comment
            Hide
            Jonathan Harker added a comment -

            Amended - see above.

            Show
            Jonathan Harker added a comment - Amended - see above.
            Hide
            Sam Hemelryk added a comment -

            Thanks Jonathan + Aaron, this has been integrated now.
            I've merged master and cherry-picked back to 21, 22, and 23.

            Cheers
            Sam

            Show
            Sam Hemelryk added a comment - Thanks Jonathan + Aaron, this has been integrated now. I've merged master and cherry-picked back to 21, 22, and 23. Cheers Sam
            Hide
            Adrian Greeve added a comment -

            Tested with branches 2.0, 2.1 and master. I didn't encounter any file corruption with the different downloaded files. No other issues either.
            Test passed.

            Show
            Adrian Greeve added a comment - Tested with branches 2.0, 2.1 and master. I didn't encounter any file corruption with the different downloaded files. No other issues either. Test passed.
            Hide
            Sam Hemelryk added a comment -

            Congratulations your code is upstream - gold star for you!

            This issue + 79 others made it in in time for the minor releases.
            Thank you everyone involved for your exuberant efforts.

            Show
            Sam Hemelryk added a comment - Congratulations your code is upstream - gold star for you! This issue + 79 others made it in in time for the minor releases. Thank you everyone involved for your exuberant efforts.

              People

              • Votes:
                1 Vote for this issue
                Watchers:
                5 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: