Moodle
  1. Moodle
  2. MDL-39688

X-Sendfile with byte-served content breaks when ETag header is added

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: 2.3.7
    • Fix Version/s: 2.3.8, 2.4.5, 2.5.1
    • Component/s: Files API
    • Labels:
    • Testing Instructions:
      Hide

      You will need to rerun each test multiple times with:

      • debug_developer disabled
      • your browser cache enabled

      The first time that you run the test, it should seed the file to cache. On subsequent times, the file should be picked up from the cache.

      1/ test normal file sending - for example resource activity files, there should be Etag header
      2/ test byteserving - for example resource with large PDF in Adobe viewer, the Etag should not be in range requests
      3/ repeat the tests with and without xsendfile multiple times.

      Show
      You will need to rerun each test multiple times with: debug_developer disabled your browser cache enabled The first time that you run the test, it should seed the file to cache. On subsequent times, the file should be picked up from the cache. 1/ test normal file sending - for example resource activity files, there should be Etag header 2/ test byteserving - for example resource with large PDF in Adobe viewer, the Etag should not be in range requests 3/ repeat the tests with and without xsendfile multiple times.
    • Affected Branches:
      MOODLE_23_STABLE
    • Fixed Branches:
      MOODLE_23_STABLE, MOODLE_24_STABLE, MOODLE_25_STABLE
    • Pull from Repository:
    • Pull 2.4 Branch:
      w22_MDL-39688_m24_etag
    • Pull 2.5 Branch:
      w22_MDL-39688_m25_etag
    • Pull Master Branch:
      w22_MDL-39688_m26_etag
    • Rank:
      50415

      Description

      I've just been investigating an issue whereby MP4 content served using X-Accel-Redirect/X-Sendfile is being served using Byte Ranges, but the partial content is cached by the browser.

      It appears that the ETag header is still added, and so the browser is caching some of the content, but not all of it. On subsequent attempts to view the same content, the incorrect partial range is returned and it fails to play.

        Issue Links

          Activity

          Hide
          Andrew Nicols added a comment -

          Have a patch for this - will submit shortly.

          Show
          Andrew Nicols added a comment - Have a patch for this - will submit shortly.
          Hide
          Andrew Nicols added a comment -

          According to RFC 2616, Section 10.2.7:

          The response MUST include the following header fields:

          • Either a Content-Range header field (section 14.16) indicating
            the range included with this response, or a multipart/byteranges
            Content-Type including Content-Range fields for each part. If a
            Content-Length header field is present in the response, its
            value MUST match the actual number of OCTETs transmitted in the
            message-body.
          • Date
          • ETag and/or Content-Location, if the header would have been sent
            in a 200 response to the same request

          Therefore, although removing the ETag does correct the behaviour, it is not correct.

          Show
          Andrew Nicols added a comment - According to RFC 2616, Section 10.2.7: The response MUST include the following header fields: Either a Content-Range header field (section 14.16) indicating the range included with this response, or a multipart/byteranges Content-Type including Content-Range fields for each part. If a Content-Length header field is present in the response, its value MUST match the actual number of OCTETs transmitted in the message-body. Date ETag and/or Content-Location, if the header would have been sent in a 200 response to the same request Therefore, although removing the ETag does correct the behaviour, it is not correct.
          Hide
          Petr Škoda added a comment -

          Thanks for the report, I have added a patch that removes the Etag from range requests, hopefully that should resolve this. I was looking for more info in RFCs and google but did not find much.

          Show
          Petr Škoda added a comment - Thanks for the report, I have added a patch that removes the Etag from range requests, hopefully that should resolve this. I was looking for more info in RFCs and google but did not find much.
          Hide
          Matteo Scaramuccia added a comment -

          Hi All,
          interesting issue: reading at Petr's comments in the code have raised me the curiosity about this vertical matter. I'm using this comment to fix/share my thoughts.

          mod_xsendfile does "almost" the same as in this patch, https://github.com/nmaier/mod_xsendfile/blob/master/mod_xsendfile.c#L545: there, it optionally drops the entire file related ETag regardless HTTP_RANGE to avoid issues when the ETag is provided by the script (as here) and give Apache the responsibility to eventually set the correct ETag.

          http://en.wikipedia.org/wiki/HTTP_ETag#Strong_and_weak_validation tells that "Strong ETags permit the caching and reassembly of partial responses, as with byte-range requests" and RFC2616, http://tools.ietf.org/html/rfc2616#section-14.19, tells that "The ETag response-header field provides the current value of the entity tag for the requested variant." which sounds as a strong ETag must always be referred to the exact piece of the content being delivered to let the caching mechanisms work as expected (http://tools.ietf.org/html/rfc2616#section-13.5.3, http://tools.ietf.org/html/rfc2616#section-10.2.7).

          Moodle, with these new Petr's PRs, creates strong ETags, being content hashes but not for partial responses which would require content hashing for the piece of content being served i.e. slowing the response => .

          TNX Petr to give me food for thoughts and for another cool commit !
          Matteo

          Show
          Matteo Scaramuccia added a comment - Hi All, interesting issue: reading at Petr's comments in the code have raised me the curiosity about this vertical matter. I'm using this comment to fix/share my thoughts. mod_xsendfile does "almost" the same as in this patch, https://github.com/nmaier/mod_xsendfile/blob/master/mod_xsendfile.c#L545: there, it optionally drops the entire file related ETag regardless HTTP_RANGE to avoid issues when the ETag is provided by the script (as here) and give Apache the responsibility to eventually set the correct ETag . http://en.wikipedia.org/wiki/HTTP_ETag#Strong_and_weak_validation tells that "Strong ETags permit the caching and reassembly of partial responses, as with byte-range requests" and RFC2616 , http://tools.ietf.org/html/rfc2616#section-14.19 , tells that "The ETag response-header field provides the current value of the entity tag for the requested variant." which sounds as a strong ETag must always be referred to the exact piece of the content being delivered to let the caching mechanisms work as expected ( http://tools.ietf.org/html/rfc2616#section-13.5.3 , http://tools.ietf.org/html/rfc2616#section-10.2.7 ). Moodle, with these new Petr's PRs, creates strong ETags , being content hashes but not for partial responses which would require content hashing for the piece of content being served i.e. slowing the response => . TNX Petr to give me food for thoughts and for another cool commit ! Matteo
          Hide
          Petr Škoda added a comment - - edited

          Thanks Matteo for the link to xsendfile source code, I suppose my hack should resolve it for all xsendfile alternatives.

          I was considering some Etags for the partial content, but it seemed technically hard to get anything unique from the range. But now I just realised we might append the range itself to the Etag value, hmmmm...

          In any case the etags are removed for a limited number of resource types - byteserving of large pdfs and videos, I do not expect any major performance problems here, right?

          Show
          Petr Škoda added a comment - - edited Thanks Matteo for the link to xsendfile source code, I suppose my hack should resolve it for all xsendfile alternatives. I was considering some Etags for the partial content, but it seemed technically hard to get anything unique from the range. But now I just realised we might append the range itself to the Etag value, hmmmm... In any case the etags are removed for a limited number of resource types - byteserving of large pdfs and videos, I do not expect any major performance problems here, right?
          Hide
          Matteo Scaramuccia added a comment -

          Cool, Petr! <content hash> + <byte range> keeps on being a strong ETag with no impact in evaluating it before serving the file!
          On the performance side I guess that the hit here is on the bandwidth at both side, server and client: it could be nice if Andrew Nicols could test the performance when using the code w/ and w/o the (new) partial content ETag. As per now, byte serving is broken regarding to caching so, strictly talking, no hit at all in performances.

          Matteo

          Show
          Matteo Scaramuccia added a comment - Cool, Petr! <content hash> + <byte range> keeps on being a strong ETag with no impact in evaluating it before serving the file! On the performance side I guess that the hit here is on the bandwidth at both side, server and client: it could be nice if Andrew Nicols could test the performance when using the code w/ and w/o the (new) partial content ETag . As per now, byte serving is broken regarding to caching so, strictly talking, no hit at all in performances. Matteo
          Hide
          Andrew Nicols added a comment -

          Cheers all,

          I agree strong ETags are correct here - and I guess that they may also be required for our use case - e.g. if another user tries to view the same content but doesn't have permission, we should use a different ETag to force cache invalidation to ensure that the content is not available. I may of course be wrong here. These have always been strong ETags (well for quite some time).

          The nginx XSendfile system strips out the ETags by default, but for all content and not just byte-ranges. I decided to pass the ETag header through regardless .

          I must have misread the specification I guess - my understanding of the RFC was that you must specify the same ETag for all parts of the content, and that the browser should then fill in the gaps to make a complete file based on those ETags. Section 13.5.4 of http://www.rfc-editor.org/rfc/rfc2616.txt suggests that the cache validators must match for different parts of the same file. If those don't match then:

          If either requirement is not met, the cache MUST use only the most
          recent partial response (based on the Date values transmitted with
          every response, and using the incoming response if these values are
          equal or missing), and MUST discard the other partial information.

          I should be able to check the performance side, but not for a fortnight. What is it you'd like to test exactly? With Petr's suggested method of adding the byte-range to the ETag, there should be almost no performance overhead over the existing system. As you say Matteo, it's currently broken and the workaround is to disable byte serving entirely for Moodle.

          Andrew

          Show
          Andrew Nicols added a comment - Cheers all, I agree strong ETags are correct here - and I guess that they may also be required for our use case - e.g. if another user tries to view the same content but doesn't have permission, we should use a different ETag to force cache invalidation to ensure that the content is not available. I may of course be wrong here. These have always been strong ETags (well for quite some time). The nginx XSendfile system strips out the ETags by default, but for all content and not just byte-ranges. I decided to pass the ETag header through regardless . I must have misread the specification I guess - my understanding of the RFC was that you must specify the same ETag for all parts of the content, and that the browser should then fill in the gaps to make a complete file based on those ETags. Section 13.5.4 of http://www.rfc-editor.org/rfc/rfc2616.txt suggests that the cache validators must match for different parts of the same file. If those don't match then: If either requirement is not met, the cache MUST use only the most recent partial response (based on the Date values transmitted with every response, and using the incoming response if these values are equal or missing), and MUST discard the other partial information. I should be able to check the performance side, but not for a fortnight. What is it you'd like to test exactly? With Petr's suggested method of adding the byte-range to the ETag, there should be almost no performance overhead over the existing system. As you say Matteo, it's currently broken and the workaround is to disable byte serving entirely for Moodle. Andrew
          Hide
          Petr Škoda added a comment -

          See MDL-38743 for the "file privacy" issue when proxies used - that is imo completely separate topic.

          Our ETag is content based, your browser must already have the whole file, I do not think we need to include the user id in ETag.

          Show
          Petr Škoda added a comment - See MDL-38743 for the "file privacy" issue when proxies used - that is imo completely separate topic. Our ETag is content based, your browser must already have the whole file, I do not think we need to include the user id in ETag.
          Hide
          Petr Škoda added a comment -

          MDL-39832 contains a sample patch for the strong ETags for partial content, could you please help me with review & testing? I would like to get this to master once it is open for new features. Thanks.

          Show
          Petr Škoda added a comment - MDL-39832 contains a sample patch for the strong ETags for partial content, could you please help me with review & testing? I would like to get this to master once it is open for new features. Thanks.
          Hide
          Matteo Scaramuccia added a comment - - edited

          Hi Andrew,
          the idea of my testing proposal - and I'm sure you know what I'm talking about, I'm the newbie here for sure - is looking at the HTTP Responses of different accesses to the same MP4 file, searching for not serving that file for subsequent (partial) requests while the user is still happily playing the MP4 several new times after the first.
          Since Petr has created a new separate issue for adding the strong ETag for partial responses too, my "request" is now related to that new issue (MDL-39832).

          Regarding with your comment, yes, it seems we're reading RFC2616 in two different and opposite ways: mine depends on the idea that caching single partial responses requires ETag to be related to that piece of content ("requested variant") and not to the whole of it: not sure right now - I'll try to find more refs looking at different proxy caching implementations - but I guess that partial responses are single unrelated entries in the cache i.e. with no relation with the parent file i.e. caching is not able, in case of a plain request, to reconstruct the parent file using the different chunks already served via byte serving - e.g. like rsync is able to do - while it is able to correctly manage serving the chunk itself. Shortly, the cache will always have n+1 entries for a single file, if n is the number of partial requests for that file (here, read file as URL).

          EDIT: my comments about the way caching works with ETag and Ranges are totally wrong. Details in https://tracker.moodle.org/browse/MDL-39832?focusedCommentId=225363&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-225363 and next comments. My comments being wrong does not luckily affect the quality of the solution provided here.

          Show
          Matteo Scaramuccia added a comment - - edited Hi Andrew, the idea of my testing proposal - and I'm sure you know what I'm talking about, I'm the newbie here for sure - is looking at the HTTP Responses of different accesses to the same MP4 file, searching for not serving that file for subsequent (partial) requests while the user is still happily playing the MP4 several new times after the first. Since Petr has created a new separate issue for adding the strong ETag for partial responses too, my "request" is now related to that new issue ( MDL-39832 ). Regarding with your comment, yes, it seems we're reading RFC2616 in two different and opposite ways: mine depends on the idea that caching single partial responses requires ETag to be related to that piece of content ("requested variant") and not to the whole of it: not sure right now - I'll try to find more refs looking at different proxy caching implementations - but I guess that partial responses are single unrelated entries in the cache i.e. with no relation with the parent file i.e. caching is not able, in case of a plain request, to reconstruct the parent file using the different chunks already served via byte serving - e.g. like rsync is able to do - while it is able to correctly manage serving the chunk itself. Shortly, the cache will always have n+1 entries for a single file, if n is the number of partial requests for that file (here, read file as URL ). EDIT: my comments about the way caching works with ETag and Ranges are totally wrong. Details in https://tracker.moodle.org/browse/MDL-39832?focusedCommentId=225363&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-225363 and next comments. My comments being wrong does not luckily affect the quality of the solution provided here.
          Hide
          Damyon Wiese added a comment -

          The main moodle.git repository has just been updated with latest weekly modifications. You may wish to rebase your PULL branches to simplify history and avoid any possible merge conflicts. This would also make integrator's life easier next week.

          TIA and ciao

          Show
          Damyon Wiese added a comment - The main moodle.git repository has just been updated with latest weekly modifications. You may wish to rebase your PULL branches to simplify history and avoid any possible merge conflicts. This would also make integrator's life easier next week. TIA and ciao
          Hide
          Dan Poltawski added a comment -

          Integrated to master, 25, 24 and 23 - thanks guys. Seems safest to remove it like you've concluded.

          Show
          Dan Poltawski added a comment - Integrated to master, 25, 24 and 23 - thanks guys. Seems safest to remove it like you've concluded.
          Hide
          Sam Hemelryk added a comment -

          Thanks guys tested and confirmed the ETag was correctly removed for partial [206] requests.

          Testing was done using mod_xsendfile for Apache 2.4.

          Many thanks
          Sam

          Show
          Sam Hemelryk added a comment - Thanks guys tested and confirmed the ETag was correctly removed for partial [206] requests. Testing was done using mod_xsendfile for Apache 2.4. Many thanks Sam
          Hide
          Damyon Wiese added a comment -

          Thanks for your hard work. This issue has now been sent upstream and will soon be downloaded by millions of Moodlers!

          Regards, Damyon

          Show
          Damyon Wiese added a comment - Thanks for your hard work. This issue has now been sent upstream and will soon be downloaded by millions of Moodlers! Regards, Damyon

            People

            • Votes:
              1 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: