Moodle
  1. Moodle
  2. MDL-33989

Improve Dropbox caching of file serving

    Details

    • Affected Branches:
      MOODLE_23_STABLE
    • Rank:
      42101

      Description

      As for now we cache Dropbox files for an arbitrary time. Time during which there is no check on Dropbox side to see if the file has been updated. This can cause us to serve an out-of-date file.

      Let's try to find a way to gather consistent data from Dropbox without having to download the whole file.

        Issue Links

          Activity

          Hide
          Frédéric Massart added a comment -

          It is quite easy to unvalidate the cache in get_file_by_reference(), by tweaking the reference to include the revision of the file. The only problem is that we have to probe Dropbox everytime to get the last revision number. I am now looking to do that using sync_external_file()

          Show
          Frédéric Massart added a comment - It is quite easy to unvalidate the cache in get_file_by_reference(), by tweaking the reference to include the revision of the file. The only problem is that we have to probe Dropbox everytime to get the last revision number. I am now looking to do that using sync_external_file()
          Hide
          Frédéric Massart added a comment -

          Adding Marina as a watcher (Don't know if there is another Files API, Repository guru).

          The following patch resolves the Dropbox problem in a dirty-not-to-use way but I'd like to use that as a start to find the best solution. Looking at sync_external_file(), which is not used apart from one local repository, I was wondering where we should store the information retrieved from the repositories. The files table should contain more fields imo. When we look at the Dropbox repository, we're able to get the revision number from the file, which is great to check if we have the file or not, instead of checking the modification date.

          Using the revision number in the reference would do the trick for Dropbox as it serializes more than one field, but what about other ones which only provide a URL? What would be the best approach to get external information such as revision/modification date in the get_file_by_reference if they're not included in the reference? When should we ideally run the sync_external_file() function?

          Thanks for your feedback.

          https://github.com/FMCorz/moodle/compare/MDL-33989-master

          (This patch fixes a bug in Oauth which crashes when executing a second request, it apparently also crashes when setting arguments to the request)

          Show
          Frédéric Massart added a comment - Adding Marina as a watcher (Don't know if there is another Files API, Repository guru). The following patch resolves the Dropbox problem in a dirty-not-to-use way but I'd like to use that as a start to find the best solution. Looking at sync_external_file(), which is not used apart from one local repository, I was wondering where we should store the information retrieved from the repositories. The files table should contain more fields imo. When we look at the Dropbox repository, we're able to get the revision number from the file, which is great to check if we have the file or not, instead of checking the modification date. Using the revision number in the reference would do the trick for Dropbox as it serializes more than one field, but what about other ones which only provide a URL? What would be the best approach to get external information such as revision/modification date in the get_file_by_reference if they're not included in the reference? When should we ideally run the sync_external_file() function? Thanks for your feedback. https://github.com/FMCorz/moodle/compare/MDL-33989-master (This patch fixes a bug in Oauth which crashes when executing a second request, it apparently also crashes when setting arguments to the request)
          Hide
          Marina Glancy added a comment -

          Fred,
          I like the way you are going. Please note that get_file_by_reference() does not need to retrieve file at all, we just need contenthash and filesize.

          I'm pretty sure we can use the dropbox 'hash' in our field 'contenthash'.

          We might even get rid of file caching for dropbox completely. If dropbox api allows us to retrieve just metadata, we don't need to download file to moodle at all. Just implement correctly get_file and send_file

          Show
          Marina Glancy added a comment - Fred, I like the way you are going. Please note that get_file_by_reference() does not need to retrieve file at all, we just need contenthash and filesize. I'm pretty sure we can use the dropbox 'hash' in our field 'contenthash'. We might even get rid of file caching for dropbox completely. If dropbox api allows us to retrieve just metadata, we don't need to download file to moodle at all. Just implement correctly get_file and send_file
          Hide
          Marina Glancy added a comment -

          by the way, since you created this as subissue of MDL-32474, maybe you can as well include more information about files in repository output?

          Show
          Marina Glancy added a comment - by the way, since you created this as subissue of MDL-32474 , maybe you can as well include more information about files in repository output?
          Hide
          Marina Glancy added a comment -

          This issue appeared to be much worse than it seemed. There is a big issue MDL-34290 that deals with many things including file caching.
          Thanks a lot to Fred for initial work!

          Show
          Marina Glancy added a comment - This issue appeared to be much worse than it seemed. There is a big issue MDL-34290 that deals with many things including file caching. Thanks a lot to Fred for initial work!

            People

            • Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: