Moodle
  1. Moodle
  2. MDL-27125

Bug in repository->get_file causing data corruption for downloaded files

    Details

    • Testing Instructions:
      Hide

      Regression testing:

      1. Enable the 'URL downloader' repository
      2. Download a file to my private files
      3. Check the file in private files is not corrupt.
      Show
      Regression testing: Enable the 'URL downloader' repository Download a file to my private files Check the file in private files is not corrupt.
    • Difficulty:
      Easy
    • Affected Branches:
      MOODLE_21_STABLE, MOODLE_22_STABLE
    • Fixed Branches:
      MOODLE_21_STABLE, MOODLE_22_STABLE, MOODLE_23_STABLE
    • Pull from Repository:
      git@github.com:dongsheng/moodle.git
    • Pull Master Branch:
      MDL-27125_master_2
    • Rank:
      16765

      Description

      Here is the code for get_file() method for repository class in repository/lib.php:

         public function get_file($url, $filename = '') {
              global $CFG;
              $path = $this->prepare_file($filename);
              $fp = fopen($path, 'w');
              $c = new curl;
              $c->download(array(array('url'=>$url, 'file'=>$fp)));
              return array('path'=>$path, 'url'=>$url);
          }
      

      Notice that the file is opened but not closed in this function. This will cause that within the request, after calling this function not whole file may be flushed to the disk. We have tested it on linux and were getting different results, but very often a partial file would be copied from temp/downloads area into the filestorage. The fix is very simple - adding fclose($fp) fixes the issue:

          public function get_file($url, $filename = '') {
              global $CFG;
              $path = $this->prepare_file($filename);
              $fp = fopen($path, 'w');
              $c = new curl;
              $c->download(array(array('url'=>$url, 'file'=>$fp)));
              fclose($fp);
              return array('path'=>$path, 'url'=>$url);
          }
      
      1. MDL-27125_jh1.diff
        0.6 kB
        Jonathan Harker
      2. MDL-27125-patch-1.diff
        1 kB
        Jonathan Harker

        Issue Links

          Activity

          Hide
          Tomasz Muras added a comment -

          Patch is available for pull in the repository:
          git://github.com/enovation/moodle.git

          branch:
          MDL-27125

          Show
          Tomasz Muras added a comment - Patch is available for pull in the repository: git://github.com/enovation/moodle.git branch: MDL-27125
          Hide
          Dongsheng Cai added a comment -

          Thanks Tomasz

          Adding this issue to stable backlog.

          Show
          Dongsheng Cai added a comment - Thanks Tomasz Adding this issue to stable backlog.
          Hide
          Petr Škoda added a comment -

          Hello,
          hmmm, there seem to be more similar problems in other places where this curl class is user (btw the name is too short for moodle specific stuff, I can image any other project using this class name). It might be better to fix it inside the download method and friends, but then it might break BC...

          Petr

          Show
          Petr Škoda added a comment - Hello, hmmm, there seem to be more similar problems in other places where this curl class is user (btw the name is too short for moodle specific stuff, I can image any other project using this class name). It might be better to fix it inside the download method and friends, but then it might break BC... Petr
          Hide
          Tomasz Muras added a comment - - edited

          I agree, download() function looks like much better place to fix it. It seems more natural (to me at least) to pass file path to a function like this instead of the file descriptor.
          If we are afraid about BC, what about adding support for new array key 'filepath' - the function could be then called like this:

          download(array(array('url'=>$url, 'filepath'=>$path)));

          'file' key should be deprecated and removed (in 2.2?)

          Or maybe let's be bold and rename the class to more unique name and change the function at the same time?

          Tomasz Muras

          Show
          Tomasz Muras added a comment - - edited I agree, download() function looks like much better place to fix it. It seems more natural (to me at least) to pass file path to a function like this instead of the file descriptor. If we are afraid about BC, what about adding support for new array key 'filepath' - the function could be then called like this: download(array(array('url'=>$url, 'filepath'=>$path))); 'file' key should be deprecated and removed (in 2.2?) Or maybe let's be bold and rename the class to more unique name and change the function at the same time? Tomasz Muras
          Hide
          Petr Škoda added a comment -

          Yes, I would prefer to pass file path there too. The handler passing is good if you do not want to disclose the file location such as in file_storage class.

          Show
          Petr Škoda added a comment - Yes, I would prefer to pass file path there too. The handler passing is good if you do not want to disclose the file location such as in file_storage class.
          Hide
          Tomasz Muras added a comment -

          Dongsheng/Petrt - do we agree on adding the 'filepath' key and changing download() as per my comment:
          http://tracker.moodle.org/browse/MDL-27125?focusedCommentId=108784&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-108784

          If so, then let me know if you're going to work on it - or do you want me tocreate a patch that does this, and updates the whole core code, where download() is being used.

          Show
          Tomasz Muras added a comment - Dongsheng/Petrt - do we agree on adding the 'filepath' key and changing download() as per my comment: http://tracker.moodle.org/browse/MDL-27125?focusedCommentId=108784&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-108784 If so, then let me know if you're going to work on it - or do you want me tocreate a patch that does this, and updates the whole core code, where download() is being used.
          Hide
          Dongsheng Cai added a comment -

          Hello Tomasz,

          Thanks for all comments, I started work on this, will add filepath.

          Regards,
          Dongsheng Cai

          Show
          Dongsheng Cai added a comment - Hello Tomasz, Thanks for all comments, I started work on this, will add filepath. Regards, Dongsheng Cai
          Hide
          Dongsheng Cai added a comment -

          Tomasz, could you please review my patch, see if it works for you?

          Thanks

          Show
          Dongsheng Cai added a comment - Tomasz, could you please review my patch, see if it works for you? Thanks
          Hide
          Tomasz Muras added a comment -

          I've used the CLI code as shown below to test it.
          It works OK but I'm not sure if we should be closing file descriptors passed into download() function. This is really just a side-effect and it may not be something that a user (well developer) would expect.

          <?php
          define('CLI_SCRIPT', true);
          
          require(dirname(dirname(dirname(__FILE__))).'/config.php');
          error_reporting(E_ALL);
          require_once($CFG->libdir.'/clilib.php');      // cli only functions
          require_once($CFG->libdir.'/filelib.php');      // cli only functions
          
          $c = new curl;
          $pid = getmypid();
          
          echo `ls -l /proc/$pid/fd`;
          $testh = fopen('/tmp/fileX', 'wb');
          fwrite($testh,'test');
          
          echo `ls -l /proc/$pid/fd`;
          
          $c->download(array(
            array('url'=>'http://localhost/20/', 'filepath'=>'/tmp/file2.tmp'),
            array('url'=>'http://localhost/20/','file'=>fopen('/tmp/file3', 'wb')),
          ));
          echo `ls -l /proc/$pid/fd`;
          
          Show
          Tomasz Muras added a comment - I've used the CLI code as shown below to test it. It works OK but I'm not sure if we should be closing file descriptors passed into download() function. This is really just a side-effect and it may not be something that a user (well developer) would expect. <?php define('CLI_SCRIPT', true ); require(dirname(dirname(dirname(__FILE__))).'/config.php'); error_reporting(E_ALL); require_once($CFG->libdir.'/clilib.php'); // cli only functions require_once($CFG->libdir.'/filelib.php'); // cli only functions $c = new curl; $pid = getmypid(); echo `ls -l /proc/$pid/fd`; $testh = fopen('/tmp/fileX', 'wb'); fwrite($testh,'test'); echo `ls -l /proc/$pid/fd`; $c->download(array( array('url'=>'http: //localhost/20/', 'filepath'=>'/tmp/file2.tmp'), array('url'=>'http: //localhost/20/','file'=>fopen('/tmp/file3', 'wb')), )); echo `ls -l /proc/$pid/fd`;
          Hide
          Tomasz Muras added a comment -

          I think we should not be introducing a side-effect of closing the file handle that was passed to the function.

          Show
          Tomasz Muras added a comment - I think we should not be introducing a side-effect of closing the file handle that was passed to the function.
          Hide
          Dongsheng Cai added a comment -

          Hi, Tomasz

          I fixed the code to close the file handler outside of function.

          Show
          Dongsheng Cai added a comment - Hi, Tomasz I fixed the code to close the file handler outside of function.
          Hide
          Tomasz Muras added a comment - - edited

          Hi Dongsheng,

          It looks OK now - I only looked at s11_MDL-27125_curl_file_handler_master, as I imagine that's the only one that will be integrated.
          I would just add one improvement in the comment, to prevent ppl from doing the same mistake of not closing the files, instead of:

               * <code>
               * $c = new curl;
               * $c->download(array(
               * array('url'=>'http://localhost/', 'file'=>fopen('a', 'wb')),
               * array('url'=>'http://localhost/20/', 'file'=>fopen('b', 'wb'))
               * ));
               * </code>
          

          Let's put:

               * <code>
               * $c = new curl;
               * $file1 = fopen('a', 'wb');
               * $file2 = fopen('b', 'wb');
               * $c->download(array(
               * array('url'=>'http://localhost/', 'file'=>$file1),
               * array('url'=>'http://localhost/20/', 'file'=>$file2)
               * ));
               * fclose($file1);
               * fclose($file2);
               * </code>
          
          Show
          Tomasz Muras added a comment - - edited Hi Dongsheng, It looks OK now - I only looked at s11_ MDL-27125 _curl_file_handler_master, as I imagine that's the only one that will be integrated. I would just add one improvement in the comment, to prevent ppl from doing the same mistake of not closing the files, instead of: * <code> * $c = new curl; * $c->download(array( * array('url'=>'http: //localhost/', 'file'=>fopen('a', 'wb')), * array('url'=>'http: //localhost/20/', 'file'=>fopen('b', 'wb')) * )); * </code> Let's put: * <code> * $c = new curl; * $file1 = fopen('a', 'wb'); * $file2 = fopen('b', 'wb'); * $c->download(array( * array('url'=>'http: //localhost/', 'file'=>$file1), * array('url'=>'http: //localhost/20/', 'file'=>$file2) * )); * fclose($file1); * fclose($file2); * </code>
          Hide
          Dongsheng Cai added a comment -

          Thanks Tomasz, I fixed the comment

          Show
          Dongsheng Cai added a comment - Thanks Tomasz, I fixed the comment
          Hide
          Sam Hemelryk added a comment -

          Hi Dongsheng,

          Failing this at the moment, in general the changes look good however I think there certainly needs to be some sort of checking to ensure that if the code is passing both file and filepath it doesn't just overwrite file with a new file handle which would introcude problems.
          Probably check file to make sure it is a file handle or not set before overwriting it plus some debugging (if possible) to explain to the developer that they should use one or the other and not both.

          Cheers
          Sam

          Show
          Sam Hemelryk added a comment - Hi Dongsheng, Failing this at the moment, in general the changes look good however I think there certainly needs to be some sort of checking to ensure that if the code is passing both file and filepath it doesn't just overwrite file with a new file handle which would introcude problems. Probably check file to make sure it is a file handle or not set before overwriting it plus some debugging (if possible) to explain to the developer that they should use one or the other and not both. Cheers Sam
          Hide
          Jonathan Harker added a comment - - edited

          Why not use the download_file_content function which does this already? It has the added bonus of respecting any configured Moodle cache/proxy settings.

          Show
          Jonathan Harker added a comment - - edited Why not use the download_file_content function which does this already? It has the added bonus of respecting any configured Moodle cache/proxy settings.
          Hide
          Dan Poltawski added a comment -

          The curl class should do this too. In fact download_file_content should really be converted internally to use the curl class so everything is following the same code path.

          Show
          Dan Poltawski added a comment - The curl class should do this too. In fact download_file_content should really be converted internally to use the curl class so everything is following the same code path.
          Hide
          Jonathan Harker added a comment -

          How about we assert that if there is a specified file handle, then that takes precedence, and filepath is ignored? If it has been passed both 'filepath' and 'file' then the existing open file handle takes precedence over the filepath. If only 'file' is specified, then it continues as normal. If only 'filepath' is specified, it creates the file handle under the 'file' parameter, adds an 'auto-handle' flag, and closes any flagged file handles at the end. This should kill all the parrots with one rock.

          Show
          Jonathan Harker added a comment - How about we assert that if there is a specified file handle, then that takes precedence, and filepath is ignored? If it has been passed both 'filepath' and 'file' then the existing open file handle takes precedence over the filepath. If only 'file' is specified, then it continues as normal. If only 'filepath' is specified, it creates the file handle under the 'file' parameter, adds an 'auto-handle' flag, and closes any flagged file handles at the end. This should kill all the parrots with one rock.
          Hide
          Jonathan Harker added a comment -

          Thing is, I still think that since the original bug doesn't even need the multiple download functionality anyway, this is solving a different problem. The simplest solution to the bug reported here (file corruption in get_file) is to use download_file_content and point it at a path, e.g. attached patch.

          Show
          Jonathan Harker added a comment - Thing is, I still think that since the original bug doesn't even need the multiple download functionality anyway, this is solving a different problem. The simplest solution to the bug reported here (file corruption in get_file) is to use download_file_content and point it at a path, e.g. attached patch.
          Hide
          Jonathan Harker added a comment -

          Patch on dongsheng's code to not clobber incoming file handles.

          Show
          Jonathan Harker added a comment - Patch on dongsheng's code to not clobber incoming file handles.
          Hide
          Jonathan Harker added a comment -

          Let's review this master branch before I spend time doing rebases to 2.0 2.1 and 2.2

          Show
          Jonathan Harker added a comment - Let's review this master branch before I spend time doing rebases to 2.0 2.1 and 2.2
          Hide
          Aaron Barnes added a comment -

          Jonathan,

          Reviewed your master patch - looks like a good improvement. Only comments I can make are to remove the unnecessary unset() calls and fix the constructor syntax used in the doc block.

          Thanks and +1
          Aaron

          Show
          Aaron Barnes added a comment - Jonathan, Reviewed your master patch - looks like a good improvement. Only comments I can make are to remove the unnecessary unset() calls and fix the constructor syntax used in the doc block. Thanks and +1 Aaron
          Hide
          Aaron Barnes added a comment -

          See prev comment

          Show
          Aaron Barnes added a comment - See prev comment
          Hide
          Jonathan Harker added a comment -

          Amended - see above.

          Show
          Jonathan Harker added a comment - Amended - see above.
          Hide
          Sam Hemelryk added a comment -

          Thanks Jonathan + Aaron, this has been integrated now.
          I've merged master and cherry-picked back to 21, 22, and 23.

          Cheers
          Sam

          Show
          Sam Hemelryk added a comment - Thanks Jonathan + Aaron, this has been integrated now. I've merged master and cherry-picked back to 21, 22, and 23. Cheers Sam
          Hide
          Adrian Greeve added a comment -

          Tested with branches 2.0, 2.1 and master. I didn't encounter any file corruption with the different downloaded files. No other issues either.
          Test passed.

          Show
          Adrian Greeve added a comment - Tested with branches 2.0, 2.1 and master. I didn't encounter any file corruption with the different downloaded files. No other issues either. Test passed.
          Hide
          Sam Hemelryk added a comment -

          Congratulations your code is upstream - gold star for you!

          This issue + 79 others made it in in time for the minor releases.
          Thank you everyone involved for your exuberant efforts.

          Show
          Sam Hemelryk added a comment - Congratulations your code is upstream - gold star for you! This issue + 79 others made it in in time for the minor releases. Thank you everyone involved for your exuberant efforts.

            People

            • Votes:
              1 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: