Details

    • Type: Sub-task Sub-task
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: 2.4
    • Fix Version/s: 2.4
    • Component/s: Caching
    • Labels:
    • Rank:
      44886

      Description

      The current 'file' cache store has been optimised for situations where:

      • The maximum number of things stored is quite low (32-bit CRC used as the key), and where performance is of the essence (we cache things like lang strings).

      There are other situations where we cache things that are more expensive to compute, and we may be going to cache many more of them (e.g. question definitions).

      For these, we probably want a cache store which offers a different trade-off:

      • Stronger, slower hash function
      • Store keys in subfolders, to limit the number of files in any one folder.

      This issue is to work on such a thing.

      Here is relevant link that gives chance of key collision for various size keys: http://en.wikipedia.org/wiki/Birthday_attack

        Activity

        Hide
        Tim Hunt added a comment -

        Sam, is this going in the right direction? I might need more of you help to make sha1 for hash keys a possibility.

        Show
        Tim Hunt added a comment - Sam, is this going in the right direction? I might need more of you help to make sha1 for hash keys a possibility.
        Hide
        Dan Poltawski added a comment -

        Sending all 'waiting for peer review' issues to integration before freeze, as agreed in Integrators Meeting 19/10/12. We are doing this to ensure any 'integratable issues' will not got missed before freeze..

        Show
        Dan Poltawski added a comment - Sending all 'waiting for peer review' issues to integration before freeze, as agreed in Integrators Meeting 19/10/12. We are doing this to ensure any 'integratable issues' will not got missed before freeze..
        Hide
        Tim Hunt added a comment -

        I think this needs more work, and I am not going to have time to do it for at least the next two weeks, so re-assigning to Sam H. Also this needs to be merged with the work on alternative hash algorithms that Sam is doing.

        Show
        Tim Hunt added a comment - I think this needs more work, and I am not going to have time to do it for at least the next two weeks, so re-assigning to Sam H. Also this needs to be merged with the work on alternative hash algorithms that Sam is doing.
        Hide
        Dan Poltawski added a comment -

        Taking this out of integration based on Tims comments about needing more work.

        Show
        Dan Poltawski added a comment - Taking this out of integration based on Tims comments about needing more work.
        Hide
        Sam Hemelryk added a comment -

        Ok I've been working on this issue today and have come up with a solution.
        I decided against the new cache store plugin for a couple of reasons:

        1. The default file store needs to be able to deal with all application caches that may be used. The idea of separating into sub directories to avoid file system restrictions is a good idea that I think should be implemented on the current file cache.
        2. I don't believe cache stores should be responsible for hashing the key, the current design of the system is that the store receives the key and data ready to use.

        The solution I've come up with does the following:

        • The file store now uses sub directories like the file API. Two levels of subdirectories each of the 2 letters.
        • The file store has a new option singledirectory. When enabled we store everything in a single directory rather than the sub directories. This is off by default and as you can't edit the default file store instance we can still be sure that the default instance will serve all purposes.
        • There is a new option for definitions simplekeys. It should be set to true by the definition when we know that the keys are going to be 'simple' such that they consist of only a-zA-Z0-9_. If simple keys are enabled then we don't hash the key. This is a performance win.
        • The hash algorithm has been changed from crc32 to sha1.

        Putting this straight up for integration.

        Many thanks
        Sam

        Show
        Sam Hemelryk added a comment - Ok I've been working on this issue today and have come up with a solution. I decided against the new cache store plugin for a couple of reasons: The default file store needs to be able to deal with all application caches that may be used. The idea of separating into sub directories to avoid file system restrictions is a good idea that I think should be implemented on the current file cache. I don't believe cache stores should be responsible for hashing the key, the current design of the system is that the store receives the key and data ready to use. The solution I've come up with does the following: The file store now uses sub directories like the file API. Two levels of subdirectories each of the 2 letters. The file store has a new option singledirectory. When enabled we store everything in a single directory rather than the sub directories. This is off by default and as you can't edit the default file store instance we can still be sure that the default instance will serve all purposes. There is a new option for definitions simplekeys. It should be set to true by the definition when we know that the keys are going to be 'simple' such that they consist of only a-zA-Z0-9_. If simple keys are enabled then we don't hash the key. This is a performance win. The hash algorithm has been changed from crc32 to sha1. Putting this straight up for integration. Many thanks Sam
        Hide
        Tim Hunt added a comment -

        Here are my peer review comments (even though you did not ask for them):

        1. I think the heuristics for the total number of files we might ever want to store in the filepool is several orders of magnitude greater than for any given cache. Therefore, I think that what I did, with only one level of folder nesting, is a better solution for the caching API. It will give fractionally better performance.
        2. For non-sha1 keys, (e.g. keys starting 1, 2, 3, ...) then logic like $subdir1 = substr($key, 0, 2); $subdir2 = substr($key, 2, 2); is going to go horribly wrong. (Unit tests for the method to make the key used from the raw key, and the method to make the path from the raw key, should have made this very obvious) Also, I don't know what the generate_single_key_prefix method does, but it might also screw this up by causing all files to always end up in the same bucket.
        3. https://github.com/samhemelryk/moodle/compare/master...wip-MDL-36120-m24#L1R522 bad copy and paste in this comment.
        4. Bad whitespace: https://github.com/samhemelryk/moodle/compare/master...wip-MDL-36120-m24#L7R135 (there are 4 lines around here with double space after the =.)
        5. double / here must be wrong: https://github.com/samhemelryk/moodle/compare/master...wip-MDL-36120-m24#L8R555
        Show
        Tim Hunt added a comment - Here are my peer review comments (even though you did not ask for them): I think the heuristics for the total number of files we might ever want to store in the filepool is several orders of magnitude greater than for any given cache. Therefore, I think that what I did, with only one level of folder nesting, is a better solution for the caching API. It will give fractionally better performance. For non-sha1 keys, (e.g. keys starting 1, 2, 3, ...) then logic like $subdir1 = substr($key, 0, 2); $subdir2 = substr($key, 2, 2); is going to go horribly wrong. (Unit tests for the method to make the key used from the raw key, and the method to make the path from the raw key, should have made this very obvious) Also, I don't know what the generate_single_key_prefix method does, but it might also screw this up by causing all files to always end up in the same bucket. https://github.com/samhemelryk/moodle/compare/master...wip-MDL-36120-m24#L1R522 bad copy and paste in this comment. Bad whitespace: https://github.com/samhemelryk/moodle/compare/master...wip-MDL-36120-m24#L7R135 (there are 4 lines around here with double space after the =.) double / here must be wrong: https://github.com/samhemelryk/moodle/compare/master...wip-MDL-36120-m24#L8R555
        Hide
        Sam Hemelryk added a comment -

        Haha thanks Tim, was Friday afternoon and I had ruled out finding anyone to look at it.
        Good spotting on all points, I'll make those changes and have them up today (Sunday).

        Many thanks
        Sam

        Show
        Sam Hemelryk added a comment - Haha thanks Tim, was Friday afternoon and I had ruled out finding anyone to look at it. Good spotting on all points, I'll make those changes and have them up today (Sunday). Many thanks Sam
        Hide
        Sam Hemelryk added a comment -

        Alright all tidied up now, in response to each point just to be clear:

        1. I've reverted to a single level of nesting. I had upped it to two so that it functions similarly to the current File API storage assuming someone had done the thinking about the likelihood of hitting file system limitations there. No real reason other than that and no probs lessening the nesting so all done.
        2. Good thinking. The unit tests were passing because that generate_single_key_prefix appends hashed information about the definition onto the key. This happens as definitions can have identifiers to separate data set when the cahce is requested. However you are correct in noting this wasn't working in that they would always end up in the same basket. I've now addressed that so that we end up with the final key starting with the given simple key.
        3. Cleaned up.
        4. Cleaned up.
        5. Cleaned up.

        Many thanks
        Sam

        Show
        Sam Hemelryk added a comment - Alright all tidied up now, in response to each point just to be clear: I've reverted to a single level of nesting. I had upped it to two so that it functions similarly to the current File API storage assuming someone had done the thinking about the likelihood of hitting file system limitations there. No real reason other than that and no probs lessening the nesting so all done. Good thinking. The unit tests were passing because that generate_single_key_prefix appends hashed information about the definition onto the key. This happens as definitions can have identifiers to separate data set when the cahce is requested. However you are correct in noting this wasn't working in that they would always end up in the same basket. I've now addressed that so that we end up with the final key starting with the given simple key. Cleaned up. Cleaned up. Cleaned up. Many thanks Sam
        Hide
        Dan Poltawski added a comment -

        Hi Sam,

        This is conflicting with MDL-36362, any chance you could resolve that?

        Show
        Dan Poltawski added a comment - Hi Sam, This is conflicting with MDL-36362 , any chance you could resolve that?
        Hide
        Matteo Scaramuccia added a comment -

        MDL-36360 passed the testing stage => see also http://tracker.moodle.org/browse/MDL-36360?focusedCommentId=186365&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-186365

        Shortly, due to https://github.com/samhemelryk/moodle/compare/master...wip-MDL-36120-m24#L5R56 there is the need of 'rebasing' the current pull by adding the new singledirectory param:

        diff --git a/cache/stores/file/lib.php b/cache/stores/file/lib.php
        index 03d2b32..73757c9 100644
        --- a/cache/stores/file/lib.php
        +++ b/cache/stores/file/lib.php
        @@ -471,6 +471,9 @@ class cachestore_file implements cache_store, cache_is_key_aware {
                 if (isset($data->autocreate)) {
                     $config['autocreate'] = $data->autocreate;
                 }
        +        if (isset($data->singledirectory)) {
        +            $config['singledirectory'] = $data->singledirectory;
        +        }
                 if (isset($data->prescan)) {
                     $config['prescan'] = $data->prescan;
                 }
        
        Show
        Matteo Scaramuccia added a comment - MDL-36360 passed the testing stage => see also http://tracker.moodle.org/browse/MDL-36360?focusedCommentId=186365&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-186365 Shortly, due to https://github.com/samhemelryk/moodle/compare/master...wip-MDL-36120-m24#L5R56 there is the need of 'rebasing' the current pull by adding the new singledirectory param: diff --git a/cache/stores/file/lib.php b/cache/stores/file/lib.php index 03d2b32..73757c9 100644 --- a/cache/stores/file/lib.php +++ b/cache/stores/file/lib.php @@ -471,6 +471,9 @@ class cachestore_file implements cache_store, cache_is_key_aware { if (isset($data->autocreate)) { $config['autocreate'] = $data->autocreate; } + if (isset($data->singledirectory)) { + $config['singledirectory'] = $data->singledirectory; + } if (isset($data->prescan)) { $config['prescan'] = $data->prescan; }
        Hide
        Sam Hemelryk added a comment -

        Hi Dan & Matteo,

        I've pushed up a branch now that is based on integration master. It both fixes the conflicts and adds the singledirectory check Matteo mentioned.
        All ready to go once more

        Cheers
        Sam

        Show
        Sam Hemelryk added a comment - Hi Dan & Matteo, I've pushed up a branch now that is based on integration master. It both fixes the conflicts and adds the singledirectory check Matteo mentioned. All ready to go once more Cheers Sam
        Hide
        Dan Poltawski added a comment -

        Integrated, thanks Sam.

        Show
        Dan Poltawski added a comment - Integrated, thanks Sam.
        Hide
        Eloy Lafuente (stronk7) added a comment -

        Amazed. Inspired. Grateful. That’s how your generosity makes me feel.

        (not really)

        Closing, thanks!

        Show
        Eloy Lafuente (stronk7) added a comment - Amazed. Inspired. Grateful. That’s how your generosity makes me feel. (not really) Closing, thanks!

          People

          • Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved: