Uploaded image for project: 'Moodle'
  1. Moodle
  2. MDL-27125

Bug in repository->get_file causing data corruption for downloaded files

    Details

    • Testing Instructions:
      Hide

      Regression testing:

      1. Enable the 'URL downloader' repository
      2. Download a file to my private files
      3. Check the file in private files is not corrupt.
      Show
      Regression testing: Enable the 'URL downloader' repository Download a file to my private files Check the file in private files is not corrupt.
    • Difficulty:
      Easy
    • Affected Branches:
      MOODLE_21_STABLE, MOODLE_22_STABLE
    • Fixed Branches:
      MOODLE_21_STABLE, MOODLE_22_STABLE, MOODLE_23_STABLE
    • Pull from Repository:
      git@github.com:dongsheng/moodle.git
    • Pull Master Branch:
      MDL-27125_master_2

      Description

      Here is the code for get_file() method for repository class in repository/lib.php:

         public function get_file($url, $filename = '') {
              global $CFG;
              $path = $this->prepare_file($filename);
              $fp = fopen($path, 'w');
              $c = new curl;
              $c->download(array(array('url'=>$url, 'file'=>$fp)));
              return array('path'=>$path, 'url'=>$url);
          }

      Notice that the file is opened but not closed in this function. This will cause that within the request, after calling this function not whole file may be flushed to the disk. We have tested it on linux and were getting different results, but very often a partial file would be copied from temp/downloads area into the filestorage. The fix is very simple - adding fclose($fp) fixes the issue:

          public function get_file($url, $filename = '') {
              global $CFG;
              $path = $this->prepare_file($filename);
              $fp = fopen($path, 'w');
              $c = new curl;
              $c->download(array(array('url'=>$url, 'file'=>$fp)));
              fclose($fp);
              return array('path'=>$path, 'url'=>$url);
          }

        Gliffy Diagrams

        1. MDL-27125_jh1.diff
          0.6 kB
          Jonathan Harker
        2. MDL-27125-patch-1.diff
          1 kB
          Jonathan Harker

          Issue Links

            Activity

            tmuras Tomasz Muras created issue -
            tmuras Tomasz Muras made changes -
            Field Original Value New Value
            Description Here is the code for get_file() method for repository class in repository/lib.php:
            {{
                public function get_file($url, $filename = '') {
                    global $CFG;
                    $path = $this->prepare_file($filename);
                    $fp = fopen($path, 'w');
                    $c = new curl;
                    $c->download(array(array('url'=>$url, 'file'=>$fp)));
                    return array('path'=>$path, 'url'=>$url);
                }
            }}

            Notice that the file is opened but not closed in this function. This will cause that within the request, after calling this function not whole file may be flushed to the disk. We have tested it on linux and were getting different results, but very often a partial file would be copied from temp/downloads area into the filestorage. The fix is very simple - adding fclose($fp) fixes the issue:
            {{
                public function get_file($url, $filename = '') {
                    global $CFG;
                    $path = $this->prepare_file($filename);
                    $fp = fopen($path, 'w');
                    $c = new curl;
                    $c->download(array(array('url'=>$url, 'file'=>$fp)));
                    fclose($fp);
                    return array('path'=>$path, 'url'=>$url);
                }
            }}
            Here is the code for get_file() method for repository class in repository/lib.php:
            {code}
               public function get_file($url, $filename = '') {
                    global $CFG;
                    $path = $this->prepare_file($filename);
                    $fp = fopen($path, 'w');
                    $c = new curl;
                    $c->download(array(array('url'=>$url, 'file'=>$fp)));
                    return array('path'=>$path, 'url'=>$url);
                }
            {code}

            Notice that the file is opened but not closed in this function. This will cause that within the request, after calling this function not whole file may be flushed to the disk. We have tested it on linux and were getting different results, but very often a partial file would be copied from temp/downloads area into the filestorage. The fix is very simple - adding fclose($fp) fixes the issue:
            {code}
                public function get_file($url, $filename = '') {
                    global $CFG;
                    $path = $this->prepare_file($filename);
                    $fp = fopen($path, 'w');
                    $c = new curl;
                    $c->download(array(array('url'=>$url, 'file'=>$fp)));
                    fclose($fp);
                    return array('path'=>$path, 'url'=>$url);
                }
            {code}
            Hide
            tmuras Tomasz Muras added a comment -

            Patch is available for pull in the repository:
            git://github.com/enovation/moodle.git

            branch:
            MDL-27125

            Show
            tmuras Tomasz Muras added a comment - Patch is available for pull in the repository: git://github.com/enovation/moodle.git branch: MDL-27125
            Hide
            dongsheng Dongsheng Cai added a comment -

            Thanks Tomasz

            Adding this issue to stable backlog.

            Show
            dongsheng Dongsheng Cai added a comment - Thanks Tomasz Adding this issue to stable backlog.
            dongsheng Dongsheng Cai made changes -
            Labels triaged
            Fix Version/s STABLE backlog [ 10463 ]
            Priority Major [ 3 ] Blocker [ 1 ]
            Hide
            skodak Petr Skoda added a comment -

            Hello,
            hmmm, there seem to be more similar problems in other places where this curl class is user (btw the name is too short for moodle specific stuff, I can image any other project using this class name). It might be better to fix it inside the download method and friends, but then it might break BC...

            Petr

            Show
            skodak Petr Skoda added a comment - Hello, hmmm, there seem to be more similar problems in other places where this curl class is user (btw the name is too short for moodle specific stuff, I can image any other project using this class name). It might be better to fix it inside the download method and friends, but then it might break BC... Petr
            Hide
            tmuras Tomasz Muras added a comment - - edited

            I agree, download() function looks like much better place to fix it. It seems more natural (to me at least) to pass file path to a function like this instead of the file descriptor.
            If we are afraid about BC, what about adding support for new array key 'filepath' - the function could be then called like this:

            download(array(array('url'=>$url, 'filepath'=>$path)));

            'file' key should be deprecated and removed (in 2.2?)

            Or maybe let's be bold and rename the class to more unique name and change the function at the same time?

            Tomasz Muras

            Show
            tmuras Tomasz Muras added a comment - - edited I agree, download() function looks like much better place to fix it. It seems more natural (to me at least) to pass file path to a function like this instead of the file descriptor. If we are afraid about BC, what about adding support for new array key 'filepath' - the function could be then called like this: download(array(array('url'=>$url, 'filepath'=>$path))); 'file' key should be deprecated and removed (in 2.2?) Or maybe let's be bold and rename the class to more unique name and change the function at the same time? Tomasz Muras
            Hide
            skodak Petr Skoda added a comment -

            Yes, I would prefer to pass file path there too. The handler passing is good if you do not want to disclose the file location such as in file_storage class.

            Show
            skodak Petr Skoda added a comment - Yes, I would prefer to pass file path there too. The handler passing is good if you do not want to disclose the file location such as in file_storage class.
            Hide
            tmuras Tomasz Muras added a comment -

            Dongsheng/Petrt - do we agree on adding the 'filepath' key and changing download() as per my comment:
            http://tracker.moodle.org/browse/MDL-27125?focusedCommentId=108784&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-108784

            If so, then let me know if you're going to work on it - or do you want me tocreate a patch that does this, and updates the whole core code, where download() is being used.

            Show
            tmuras Tomasz Muras added a comment - Dongsheng/Petrt - do we agree on adding the 'filepath' key and changing download() as per my comment: http://tracker.moodle.org/browse/MDL-27125?focusedCommentId=108784&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-108784 If so, then let me know if you're going to work on it - or do you want me tocreate a patch that does this, and updates the whole core code, where download() is being used.
            dougiamas Martin Dougiamas made changes -
            Workflow MDL Workflow [ 69162 ] MDL Full Workflow [ 76316 ]
            Hide
            dongsheng Dongsheng Cai added a comment -

            Hello Tomasz,

            Thanks for all comments, I started work on this, will add filepath.

            Regards,
            Dongsheng Cai

            Show
            dongsheng Dongsheng Cai added a comment - Hello Tomasz, Thanks for all comments, I started work on this, will add filepath. Regards, Dongsheng Cai
            dongsheng Dongsheng Cai made changes -
            Labels triaged dongsheng-patch triaged
            salvetore Michael de Raadt made changes -
            Fix Version/s DEV Sprint 2.1 [ 10650 ]
            Fix Version/s STABLE backlog [ 10463 ]
            Affects Version/s 2.0 [ 10122 ]
            dongsheng Dongsheng Cai made changes -
            Status Open [ 1 ] Development in progress [ 3 ]
            Hide
            dongsheng Dongsheng Cai added a comment -

            Tomasz, could you please review my patch, see if it works for you?

            Thanks

            Show
            dongsheng Dongsheng Cai added a comment - Tomasz, could you please review my patch, see if it works for you? Thanks
            dongsheng Dongsheng Cai made changes -
            Status Development in progress [ 3 ] Waiting for peer review [ 10012 ]
            Pull Master Diff URL https://github.com/dongsheng/moodle/compare/master...s11_MDL-27125_curl_file_handler_master
            Pull Master Branch s11_MDL-27125_curl_file_handler_master
            Pull 2.0 Diff URL https://github.com/dongsheng/moodle/compare/MOODLE_20_STABLE...s11_MDL-27125_curl_file_handler_20
            Pull 2.0 Branch s11_MDL-27125_curl_file_handler_20
            Pull from Repository git@github.com:dongsheng/moodle.git
            Peer reviewer moodle.com
            Testing Instructions It's not easy to test it from moodle interface, it needs a developer review.
            Hide
            tmuras Tomasz Muras added a comment -

            I've used the CLI code as shown below to test it.
            It works OK but I'm not sure if we should be closing file descriptors passed into download() function. This is really just a side-effect and it may not be something that a user (well developer) would expect.

            <?php
            define('CLI_SCRIPT', true);
             
            require(dirname(dirname(dirname(__FILE__))).'/config.php');
            error_reporting(E_ALL);
            require_once($CFG->libdir.'/clilib.php');      // cli only functions
            require_once($CFG->libdir.'/filelib.php');      // cli only functions
             
            $c = new curl;
            $pid = getmypid();
             
            echo `ls -l /proc/$pid/fd`;
            $testh = fopen('/tmp/fileX', 'wb');
            fwrite($testh,'test');
             
            echo `ls -l /proc/$pid/fd`;
             
            $c->download(array(
              array('url'=>'http://localhost/20/', 'filepath'=>'/tmp/file2.tmp'),
              array('url'=>'http://localhost/20/','file'=>fopen('/tmp/file3', 'wb')),
            ));
            echo `ls -l /proc/$pid/fd`;

            Show
            tmuras Tomasz Muras added a comment - I've used the CLI code as shown below to test it. It works OK but I'm not sure if we should be closing file descriptors passed into download() function. This is really just a side-effect and it may not be something that a user (well developer) would expect. <?php define('CLI_SCRIPT', true);   require(dirname(dirname(dirname(__FILE__))).'/config.php'); error_reporting(E_ALL); require_once($CFG->libdir.'/clilib.php'); // cli only functions require_once($CFG->libdir.'/filelib.php'); // cli only functions   $c = new curl; $pid = getmypid();   echo `ls -l /proc/$pid/fd`; $testh = fopen('/tmp/fileX', 'wb'); fwrite($testh,'test');   echo `ls -l /proc/$pid/fd`;   $c->download(array( array('url'=>'http://localhost/20/', 'filepath'=>'/tmp/file2.tmp'), array('url'=>'http://localhost/20/','file'=>fopen('/tmp/file3', 'wb')), )); echo `ls -l /proc/$pid/fd`;
            Hide
            tmuras Tomasz Muras added a comment -

            I think we should not be introducing a side-effect of closing the file handle that was passed to the function.

            Show
            tmuras Tomasz Muras added a comment - I think we should not be introducing a side-effect of closing the file handle that was passed to the function.
            tmuras Tomasz Muras made changes -
            Original Estimate 0 minutes [ 0 ]
            Remaining Estimate 0 minutes [ 0 ]
            Status Waiting for peer review [ 10012 ] Development in progress [ 3 ]
            moodle.com moodle.com made changes -
            Fix Version/s STABLE Sprint 11 [ 10751 ]
            Fix Version/s DEV Sprint 2.1 [ 10650 ]
            dongsheng Dongsheng Cai made changes -
            dongsheng Dongsheng Cai made changes -
            Status Development in progress [ 3 ] Waiting for peer review [ 10012 ]
            Hide
            dongsheng Dongsheng Cai added a comment -

            Hi, Tomasz

            I fixed the code to close the file handler outside of function.

            Show
            dongsheng Dongsheng Cai added a comment - Hi, Tomasz I fixed the code to close the file handler outside of function.
            Hide
            tmuras Tomasz Muras added a comment - - edited

            Hi Dongsheng,

            It looks OK now - I only looked at s11_MDL-27125_curl_file_handler_master, as I imagine that's the only one that will be integrated.
            I would just add one improvement in the comment, to prevent ppl from doing the same mistake of not closing the files, instead of:

                 * <code>
                 * $c = new curl;
                 * $c->download(array(
                 * array('url'=>'http://localhost/', 'file'=>fopen('a', 'wb')),
                 * array('url'=>'http://localhost/20/', 'file'=>fopen('b', 'wb'))
                 * ));
                 * </code>

            Let's put:

                 * <code>
                 * $c = new curl;
                 * $file1 = fopen('a', 'wb');
                 * $file2 = fopen('b', 'wb');
                 * $c->download(array(
                 * array('url'=>'http://localhost/', 'file'=>$file1),
                 * array('url'=>'http://localhost/20/', 'file'=>$file2)
                 * ));
                 * fclose($file1);
                 * fclose($file2);
                 * </code>

            Show
            tmuras Tomasz Muras added a comment - - edited Hi Dongsheng, It looks OK now - I only looked at s11_ MDL-27125 _curl_file_handler_master, as I imagine that's the only one that will be integrated. I would just add one improvement in the comment, to prevent ppl from doing the same mistake of not closing the files, instead of: * <code> * $c = new curl; * $c->download(array( * array('url'=>'http://localhost/', 'file'=>fopen('a', 'wb')), * array('url'=>'http://localhost/20/', 'file'=>fopen('b', 'wb')) * )); * </code> Let's put: * <code> * $c = new curl; * $file1 = fopen('a', 'wb'); * $file2 = fopen('b', 'wb'); * $c->download(array( * array('url'=>'http://localhost/', 'file'=>$file1), * array('url'=>'http://localhost/20/', 'file'=>$file2) * )); * fclose($file1); * fclose($file2); * </code>
            tmuras Tomasz Muras made changes -
            Status Waiting for peer review [ 10012 ] Development in progress [ 3 ]
            Hide
            dongsheng Dongsheng Cai added a comment -

            Thanks Tomasz, I fixed the comment

            Show
            dongsheng Dongsheng Cai added a comment - Thanks Tomasz, I fixed the comment
            dongsheng Dongsheng Cai made changes -
            Status Development in progress [ 3 ] Waiting for integration review [ 10010 ]
            dongsheng Dongsheng Cai made changes -
            Testing Instructions It's not easy to test it from moodle interface, it needs a developer review. It's not easy to test it from moodle interface, it needs a developer review. Please see Tomasz's comment:
            http://tracker.moodle.org/browse/MDL-27125?focusedCommentId=111873&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-111873
            skodak Petr Skoda made changes -
            Currently in integration Yes
            samhemelryk Sam Hemelryk made changes -
            Status Waiting for integration review [ 10010 ] Integration review in progress [ 10004 ]
            Integrator samhemelryk
            Hide
            samhemelryk Sam Hemelryk added a comment -

            Hi Dongsheng,

            Failing this at the moment, in general the changes look good however I think there certainly needs to be some sort of checking to ensure that if the code is passing both file and filepath it doesn't just overwrite file with a new file handle which would introcude problems.
            Probably check file to make sure it is a file handle or not set before overwriting it plus some debugging (if possible) to explain to the developer that they should use one or the other and not both.

            Cheers
            Sam

            Show
            samhemelryk Sam Hemelryk added a comment - Hi Dongsheng, Failing this at the moment, in general the changes look good however I think there certainly needs to be some sort of checking to ensure that if the code is passing both file and filepath it doesn't just overwrite file with a new file handle which would introcude problems. Probably check file to make sure it is a file handle or not set before overwriting it plus some debugging (if possible) to explain to the developer that they should use one or the other and not both. Cheers Sam
            samhemelryk Sam Hemelryk made changes -
            Status Integration review in progress [ 10004 ] Reopened [ 4 ]
            skodak Petr Skoda made changes -
            Currently in integration Yes
            tmuras Tomasz Muras made changes -
            Affects Version/s 2.2 [ 10656 ]
            Hide
            jonathan Jonathan Harker added a comment - - edited

            Why not use the download_file_content function which does this already? It has the added bonus of respecting any configured Moodle cache/proxy settings.

            Show
            jonathan Jonathan Harker added a comment - - edited Why not use the download_file_content function which does this already? It has the added bonus of respecting any configured Moodle cache/proxy settings.
            jonathan Jonathan Harker made changes -
            Attachment MDL-27125_jh1.diff [ 27566 ]
            salvetore Michael de Raadt made changes -
            Link This issue has been marked as being related by MDL-31925 [ MDL-31925 ]
            Hide
            poltawski Dan Poltawski added a comment -

            The curl class should do this too. In fact download_file_content should really be converted internally to use the curl class so everything is following the same code path.

            Show
            poltawski Dan Poltawski added a comment - The curl class should do this too. In fact download_file_content should really be converted internally to use the curl class so everything is following the same code path.
            Hide
            jonathan Jonathan Harker added a comment -

            How about we assert that if there is a specified file handle, then that takes precedence, and filepath is ignored? If it has been passed both 'filepath' and 'file' then the existing open file handle takes precedence over the filepath. If only 'file' is specified, then it continues as normal. If only 'filepath' is specified, it creates the file handle under the 'file' parameter, adds an 'auto-handle' flag, and closes any flagged file handles at the end. This should kill all the parrots with one rock.

            Show
            jonathan Jonathan Harker added a comment - How about we assert that if there is a specified file handle, then that takes precedence, and filepath is ignored? If it has been passed both 'filepath' and 'file' then the existing open file handle takes precedence over the filepath. If only 'file' is specified, then it continues as normal. If only 'filepath' is specified, it creates the file handle under the 'file' parameter, adds an 'auto-handle' flag, and closes any flagged file handles at the end. This should kill all the parrots with one rock.
            Hide
            jonathan Jonathan Harker added a comment -

            Thing is, I still think that since the original bug doesn't even need the multiple download functionality anyway, this is solving a different problem. The simplest solution to the bug reported here (file corruption in get_file) is to use download_file_content and point it at a path, e.g. attached patch.

            Show
            jonathan Jonathan Harker added a comment - Thing is, I still think that since the original bug doesn't even need the multiple download functionality anyway, this is solving a different problem. The simplest solution to the bug reported here (file corruption in get_file) is to use download_file_content and point it at a path, e.g. attached patch.
            Hide
            jonathan Jonathan Harker added a comment -

            Patch on dongsheng's code to not clobber incoming file handles.

            Show
            jonathan Jonathan Harker added a comment - Patch on dongsheng's code to not clobber incoming file handles.
            jonathan Jonathan Harker made changes -
            Attachment MDL-27125-patch-1.diff [ 28649 ]
            Hide
            jonathan Jonathan Harker added a comment -

            Let's review this master branch before I spend time doing rebases to 2.0 2.1 and 2.2

            Show
            jonathan Jonathan Harker added a comment - Let's review this master branch before I spend time doing rebases to 2.0 2.1 and 2.2
            jonathan Jonathan Harker made changes -
            Status Reopened [ 4 ] Waiting for peer review [ 10012 ]
            Hide
            sry_not4sale Aaron Barnes added a comment -

            Jonathan,

            Reviewed your master patch - looks like a good improvement. Only comments I can make are to remove the unnecessary unset() calls and fix the constructor syntax used in the doc block.

            Thanks and +1
            Aaron

            Show
            sry_not4sale Aaron Barnes added a comment - Jonathan, Reviewed your master patch - looks like a good improvement. Only comments I can make are to remove the unnecessary unset() calls and fix the constructor syntax used in the doc block. Thanks and +1 Aaron
            sry_not4sale Aaron Barnes made changes -
            Status Waiting for peer review [ 10012 ] Peer review in progress [ 10013 ]
            Hide
            sry_not4sale Aaron Barnes added a comment -

            See prev comment

            Show
            sry_not4sale Aaron Barnes added a comment - See prev comment
            sry_not4sale Aaron Barnes made changes -
            Status Peer review in progress [ 10013 ] Development in progress [ 3 ]
            Hide
            jonathan Jonathan Harker added a comment -

            Amended - see above.

            Show
            jonathan Jonathan Harker added a comment - Amended - see above.
            jonathan Jonathan Harker made changes -
            Status Development in progress [ 3 ] Waiting for integration review [ 10010 ]
            poltawski Dan Poltawski made changes -
            Currently in integration Yes [ 10041 ]
            samhemelryk Sam Hemelryk made changes -
            Status Waiting for integration review [ 10010 ] Integration review in progress [ 10004 ]
            Hide
            samhemelryk Sam Hemelryk added a comment -

            Thanks Jonathan + Aaron, this has been integrated now.
            I've merged master and cherry-picked back to 21, 22, and 23.

            Cheers
            Sam

            Show
            samhemelryk Sam Hemelryk added a comment - Thanks Jonathan + Aaron, this has been integrated now. I've merged master and cherry-picked back to 21, 22, and 23. Cheers Sam
            samhemelryk Sam Hemelryk made changes -
            Status Integration review in progress [ 10004 ] Waiting for testing [ 10005 ]
            Fix Version/s 2.1.7 [ 12161 ]
            Fix Version/s 2.2.4 [ 12162 ]
            Fix Version/s 2.3.1 [ 12253 ]
            abgreeve Adrian Greeve made changes -
            Status Waiting for testing [ 10005 ] Testing in progress [ 10011 ]
            Tester abgreeve
            poltawski Dan Poltawski made changes -
            Testing Instructions It's not easy to test it from moodle interface, it needs a developer review. Please see Tomasz's comment:
            http://tracker.moodle.org/browse/MDL-27125?focusedCommentId=111873&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-111873
            Regression testing:

            # Enable the 'URL downloader' repository
            # Download a file to my private files
            # Check the file in private files is not corrupt.
            Tester abgreeve
            Hide
            abgreeve Adrian Greeve added a comment -

            Tested with branches 2.0, 2.1 and master. I didn't encounter any file corruption with the different downloaded files. No other issues either.
            Test passed.

            Show
            abgreeve Adrian Greeve added a comment - Tested with branches 2.0, 2.1 and master. I didn't encounter any file corruption with the different downloaded files. No other issues either. Test passed.
            abgreeve Adrian Greeve made changes -
            Status Testing in progress [ 10011 ] Tested [ 10006 ]
            Hide
            samhemelryk Sam Hemelryk added a comment -

            Congratulations your code is upstream - gold star for you!

            This issue + 79 others made it in in time for the minor releases.
            Thank you everyone involved for your exuberant efforts.

            Show
            samhemelryk Sam Hemelryk added a comment - Congratulations your code is upstream - gold star for you! This issue + 79 others made it in in time for the minor releases. Thank you everyone involved for your exuberant efforts.
            samhemelryk Sam Hemelryk made changes -
            Status Tested [ 10006 ] Closed [ 6 ]
            Resolution Fixed [ 1 ]
            Currently in integration Yes [ 10041 ]
            Integration date 06/Jul/12
            stronk7 Eloy Lafuente (stronk7) made changes -
            Fix Version/s STABLE Sprint 11 [ 10751 ]
            Subversion JIRA

            Links Hierarchy

             Documentation

            Invalid license: EXPIRED

              People

              • Votes:
                1 Vote for this issue
                Watchers:
                5 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved:
                  Fix Release Date:
                  9/Jul/12