Uploaded image for project: 'Moodle'
  1. Moodle
  2. MDL-27125

Bug in repository->get_file causing data corruption for downloaded files

    Details

    • Testing Instructions:
      Hide

      Regression testing:

      1. Enable the 'URL downloader' repository
      2. Download a file to my private files
      3. Check the file in private files is not corrupt.
      Show
      Regression testing: Enable the 'URL downloader' repository Download a file to my private files Check the file in private files is not corrupt.
    • Difficulty:
      Easy
    • Affected Branches:
      MOODLE_21_STABLE, MOODLE_22_STABLE
    • Fixed Branches:
      MOODLE_21_STABLE, MOODLE_22_STABLE, MOODLE_23_STABLE
    • Pull from Repository:
      git@github.com:dongsheng/moodle.git
    • Pull Master Branch:
      MDL-27125_master_2

      Description

      Here is the code for get_file() method for repository class in repository/lib.php:

         public function get_file($url, $filename = '') {
              global $CFG;
              $path = $this->prepare_file($filename);
              $fp = fopen($path, 'w');
              $c = new curl;
              $c->download(array(array('url'=>$url, 'file'=>$fp)));
              return array('path'=>$path, 'url'=>$url);
          }

      Notice that the file is opened but not closed in this function. This will cause that within the request, after calling this function not whole file may be flushed to the disk. We have tested it on linux and were getting different results, but very often a partial file would be copied from temp/downloads area into the filestorage. The fix is very simple - adding fclose($fp) fixes the issue:

          public function get_file($url, $filename = '') {
              global $CFG;
              $path = $this->prepare_file($filename);
              $fp = fopen($path, 'w');
              $c = new curl;
              $c->download(array(array('url'=>$url, 'file'=>$fp)));
              fclose($fp);
              return array('path'=>$path, 'url'=>$url);
          }

        Gliffy Diagrams

          Attachments

            Issue Links

              Activity

              Hide
              tmuras Tomasz Muras added a comment -

              Patch is available for pull in the repository:
              git://github.com/enovation/moodle.git

              branch:
              MDL-27125

              Show
              tmuras Tomasz Muras added a comment - Patch is available for pull in the repository: git://github.com/enovation/moodle.git branch: MDL-27125
              Hide
              dongsheng Dongsheng Cai added a comment -

              Thanks Tomasz

              Adding this issue to stable backlog.

              Show
              dongsheng Dongsheng Cai added a comment - Thanks Tomasz Adding this issue to stable backlog.
              Hide
              skodak Petr Skoda added a comment -

              Hello,
              hmmm, there seem to be more similar problems in other places where this curl class is user (btw the name is too short for moodle specific stuff, I can image any other project using this class name). It might be better to fix it inside the download method and friends, but then it might break BC...

              Petr

              Show
              skodak Petr Skoda added a comment - Hello, hmmm, there seem to be more similar problems in other places where this curl class is user (btw the name is too short for moodle specific stuff, I can image any other project using this class name). It might be better to fix it inside the download method and friends, but then it might break BC... Petr
              Hide
              tmuras Tomasz Muras added a comment - - edited

              I agree, download() function looks like much better place to fix it. It seems more natural (to me at least) to pass file path to a function like this instead of the file descriptor.
              If we are afraid about BC, what about adding support for new array key 'filepath' - the function could be then called like this:

              download(array(array('url'=>$url, 'filepath'=>$path)));

              'file' key should be deprecated and removed (in 2.2?)

              Or maybe let's be bold and rename the class to more unique name and change the function at the same time?

              Tomasz Muras

              Show
              tmuras Tomasz Muras added a comment - - edited I agree, download() function looks like much better place to fix it. It seems more natural (to me at least) to pass file path to a function like this instead of the file descriptor. If we are afraid about BC, what about adding support for new array key 'filepath' - the function could be then called like this: download(array(array('url'=>$url, 'filepath'=>$path))); 'file' key should be deprecated and removed (in 2.2?) Or maybe let's be bold and rename the class to more unique name and change the function at the same time? Tomasz Muras
              Hide
              skodak Petr Skoda added a comment -

              Yes, I would prefer to pass file path there too. The handler passing is good if you do not want to disclose the file location such as in file_storage class.

              Show
              skodak Petr Skoda added a comment - Yes, I would prefer to pass file path there too. The handler passing is good if you do not want to disclose the file location such as in file_storage class.
              Hide
              tmuras Tomasz Muras added a comment -

              Dongsheng/Petrt - do we agree on adding the 'filepath' key and changing download() as per my comment:
              http://tracker.moodle.org/browse/MDL-27125?focusedCommentId=108784&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-108784

              If so, then let me know if you're going to work on it - or do you want me tocreate a patch that does this, and updates the whole core code, where download() is being used.

              Show
              tmuras Tomasz Muras added a comment - Dongsheng/Petrt - do we agree on adding the 'filepath' key and changing download() as per my comment: http://tracker.moodle.org/browse/MDL-27125?focusedCommentId=108784&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-108784 If so, then let me know if you're going to work on it - or do you want me tocreate a patch that does this, and updates the whole core code, where download() is being used.
              Hide
              dongsheng Dongsheng Cai added a comment -

              Hello Tomasz,

              Thanks for all comments, I started work on this, will add filepath.

              Regards,
              Dongsheng Cai

              Show
              dongsheng Dongsheng Cai added a comment - Hello Tomasz, Thanks for all comments, I started work on this, will add filepath. Regards, Dongsheng Cai
              Hide
              dongsheng Dongsheng Cai added a comment -

              Tomasz, could you please review my patch, see if it works for you?

              Thanks

              Show
              dongsheng Dongsheng Cai added a comment - Tomasz, could you please review my patch, see if it works for you? Thanks
              Hide
              tmuras Tomasz Muras added a comment -

              I've used the CLI code as shown below to test it.
              It works OK but I'm not sure if we should be closing file descriptors passed into download() function. This is really just a side-effect and it may not be something that a user (well developer) would expect.

              <?php
              define('CLI_SCRIPT', true);
               
              require(dirname(dirname(dirname(__FILE__))).'/config.php');
              error_reporting(E_ALL);
              require_once($CFG->libdir.'/clilib.php');      // cli only functions
              require_once($CFG->libdir.'/filelib.php');      // cli only functions
               
              $c = new curl;
              $pid = getmypid();
               
              echo `ls -l /proc/$pid/fd`;
              $testh = fopen('/tmp/fileX', 'wb');
              fwrite($testh,'test');
               
              echo `ls -l /proc/$pid/fd`;
               
              $c->download(array(
                array('url'=>'http://localhost/20/', 'filepath'=>'/tmp/file2.tmp'),
                array('url'=>'http://localhost/20/','file'=>fopen('/tmp/file3', 'wb')),
              ));
              echo `ls -l /proc/$pid/fd`;

              Show
              tmuras Tomasz Muras added a comment - I've used the CLI code as shown below to test it. It works OK but I'm not sure if we should be closing file descriptors passed into download() function. This is really just a side-effect and it may not be something that a user (well developer) would expect. <?php define('CLI_SCRIPT', true);   require(dirname(dirname(dirname(__FILE__))).'/config.php'); error_reporting(E_ALL); require_once($CFG->libdir.'/clilib.php'); // cli only functions require_once($CFG->libdir.'/filelib.php'); // cli only functions   $c = new curl; $pid = getmypid();   echo `ls -l /proc/$pid/fd`; $testh = fopen('/tmp/fileX', 'wb'); fwrite($testh,'test');   echo `ls -l /proc/$pid/fd`;   $c->download(array( array('url'=>'http://localhost/20/', 'filepath'=>'/tmp/file2.tmp'), array('url'=>'http://localhost/20/','file'=>fopen('/tmp/file3', 'wb')), )); echo `ls -l /proc/$pid/fd`;
              Hide
              tmuras Tomasz Muras added a comment -

              I think we should not be introducing a side-effect of closing the file handle that was passed to the function.

              Show
              tmuras Tomasz Muras added a comment - I think we should not be introducing a side-effect of closing the file handle that was passed to the function.
              Hide
              dongsheng Dongsheng Cai added a comment -

              Hi, Tomasz

              I fixed the code to close the file handler outside of function.

              Show
              dongsheng Dongsheng Cai added a comment - Hi, Tomasz I fixed the code to close the file handler outside of function.
              Hide
              tmuras Tomasz Muras added a comment - - edited

              Hi Dongsheng,

              It looks OK now - I only looked at s11_MDL-27125_curl_file_handler_master, as I imagine that's the only one that will be integrated.
              I would just add one improvement in the comment, to prevent ppl from doing the same mistake of not closing the files, instead of:

                   * <code>
                   * $c = new curl;
                   * $c->download(array(
                   * array('url'=>'http://localhost/', 'file'=>fopen('a', 'wb')),
                   * array('url'=>'http://localhost/20/', 'file'=>fopen('b', 'wb'))
                   * ));
                   * </code>

              Let's put:

                   * <code>
                   * $c = new curl;
                   * $file1 = fopen('a', 'wb');
                   * $file2 = fopen('b', 'wb');
                   * $c->download(array(
                   * array('url'=>'http://localhost/', 'file'=>$file1),
                   * array('url'=>'http://localhost/20/', 'file'=>$file2)
                   * ));
                   * fclose($file1);
                   * fclose($file2);
                   * </code>

              Show
              tmuras Tomasz Muras added a comment - - edited Hi Dongsheng, It looks OK now - I only looked at s11_ MDL-27125 _curl_file_handler_master, as I imagine that's the only one that will be integrated. I would just add one improvement in the comment, to prevent ppl from doing the same mistake of not closing the files, instead of: * <code> * $c = new curl; * $c->download(array( * array('url'=>'http://localhost/', 'file'=>fopen('a', 'wb')), * array('url'=>'http://localhost/20/', 'file'=>fopen('b', 'wb')) * )); * </code> Let's put: * <code> * $c = new curl; * $file1 = fopen('a', 'wb'); * $file2 = fopen('b', 'wb'); * $c->download(array( * array('url'=>'http://localhost/', 'file'=>$file1), * array('url'=>'http://localhost/20/', 'file'=>$file2) * )); * fclose($file1); * fclose($file2); * </code>
              Hide
              dongsheng Dongsheng Cai added a comment -

              Thanks Tomasz, I fixed the comment

              Show
              dongsheng Dongsheng Cai added a comment - Thanks Tomasz, I fixed the comment
              Hide
              samhemelryk Sam Hemelryk added a comment -

              Hi Dongsheng,

              Failing this at the moment, in general the changes look good however I think there certainly needs to be some sort of checking to ensure that if the code is passing both file and filepath it doesn't just overwrite file with a new file handle which would introcude problems.
              Probably check file to make sure it is a file handle or not set before overwriting it plus some debugging (if possible) to explain to the developer that they should use one or the other and not both.

              Cheers
              Sam

              Show
              samhemelryk Sam Hemelryk added a comment - Hi Dongsheng, Failing this at the moment, in general the changes look good however I think there certainly needs to be some sort of checking to ensure that if the code is passing both file and filepath it doesn't just overwrite file with a new file handle which would introcude problems. Probably check file to make sure it is a file handle or not set before overwriting it plus some debugging (if possible) to explain to the developer that they should use one or the other and not both. Cheers Sam
              Hide
              jonathan Jonathan Harker added a comment - - edited

              Why not use the download_file_content function which does this already? It has the added bonus of respecting any configured Moodle cache/proxy settings.

              Show
              jonathan Jonathan Harker added a comment - - edited Why not use the download_file_content function which does this already? It has the added bonus of respecting any configured Moodle cache/proxy settings.
              Hide
              poltawski Dan Poltawski added a comment -

              The curl class should do this too. In fact download_file_content should really be converted internally to use the curl class so everything is following the same code path.

              Show
              poltawski Dan Poltawski added a comment - The curl class should do this too. In fact download_file_content should really be converted internally to use the curl class so everything is following the same code path.
              Hide
              jonathan Jonathan Harker added a comment -

              How about we assert that if there is a specified file handle, then that takes precedence, and filepath is ignored? If it has been passed both 'filepath' and 'file' then the existing open file handle takes precedence over the filepath. If only 'file' is specified, then it continues as normal. If only 'filepath' is specified, it creates the file handle under the 'file' parameter, adds an 'auto-handle' flag, and closes any flagged file handles at the end. This should kill all the parrots with one rock.

              Show
              jonathan Jonathan Harker added a comment - How about we assert that if there is a specified file handle, then that takes precedence, and filepath is ignored? If it has been passed both 'filepath' and 'file' then the existing open file handle takes precedence over the filepath. If only 'file' is specified, then it continues as normal. If only 'filepath' is specified, it creates the file handle under the 'file' parameter, adds an 'auto-handle' flag, and closes any flagged file handles at the end. This should kill all the parrots with one rock.
              Hide
              jonathan Jonathan Harker added a comment -

              Thing is, I still think that since the original bug doesn't even need the multiple download functionality anyway, this is solving a different problem. The simplest solution to the bug reported here (file corruption in get_file) is to use download_file_content and point it at a path, e.g. attached patch.

              Show
              jonathan Jonathan Harker added a comment - Thing is, I still think that since the original bug doesn't even need the multiple download functionality anyway, this is solving a different problem. The simplest solution to the bug reported here (file corruption in get_file) is to use download_file_content and point it at a path, e.g. attached patch.
              Hide
              jonathan Jonathan Harker added a comment -

              Patch on dongsheng's code to not clobber incoming file handles.

              Show
              jonathan Jonathan Harker added a comment - Patch on dongsheng's code to not clobber incoming file handles.
              Hide
              jonathan Jonathan Harker added a comment -

              Let's review this master branch before I spend time doing rebases to 2.0 2.1 and 2.2

              Show
              jonathan Jonathan Harker added a comment - Let's review this master branch before I spend time doing rebases to 2.0 2.1 and 2.2
              Hide
              sry_not4sale Aaron Barnes added a comment -

              Jonathan,

              Reviewed your master patch - looks like a good improvement. Only comments I can make are to remove the unnecessary unset() calls and fix the constructor syntax used in the doc block.

              Thanks and +1
              Aaron

              Show
              sry_not4sale Aaron Barnes added a comment - Jonathan, Reviewed your master patch - looks like a good improvement. Only comments I can make are to remove the unnecessary unset() calls and fix the constructor syntax used in the doc block. Thanks and +1 Aaron
              Hide
              sry_not4sale Aaron Barnes added a comment -

              See prev comment

              Show
              sry_not4sale Aaron Barnes added a comment - See prev comment
              Hide
              jonathan Jonathan Harker added a comment -

              Amended - see above.

              Show
              jonathan Jonathan Harker added a comment - Amended - see above.
              Hide
              samhemelryk Sam Hemelryk added a comment -

              Thanks Jonathan + Aaron, this has been integrated now.
              I've merged master and cherry-picked back to 21, 22, and 23.

              Cheers
              Sam

              Show
              samhemelryk Sam Hemelryk added a comment - Thanks Jonathan + Aaron, this has been integrated now. I've merged master and cherry-picked back to 21, 22, and 23. Cheers Sam
              Hide
              abgreeve Adrian Greeve added a comment -

              Tested with branches 2.0, 2.1 and master. I didn't encounter any file corruption with the different downloaded files. No other issues either.
              Test passed.

              Show
              abgreeve Adrian Greeve added a comment - Tested with branches 2.0, 2.1 and master. I didn't encounter any file corruption with the different downloaded files. No other issues either. Test passed.
              Hide
              samhemelryk Sam Hemelryk added a comment -

              Congratulations your code is upstream - gold star for you!

              This issue + 79 others made it in in time for the minor releases.
              Thank you everyone involved for your exuberant efforts.

              Show
              samhemelryk Sam Hemelryk added a comment - Congratulations your code is upstream - gold star for you! This issue + 79 others made it in in time for the minor releases. Thank you everyone involved for your exuberant efforts.

                People

                • Votes:
                  1 Vote for this issue
                  Watchers:
                  5 Start watching this issue

                  Dates

                  • Created:
                    Updated:
                    Resolved:
                    Fix Release Date:
                    9/Jul/12