  1. Moodle
  2. MDL-68456

Create cache localization helper



    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: Future Dev
    • Fix Version/s: None
    • Component/s: Caching
    • Labels:


      Following on from the Telegram chat about localizable caches: I think it would go a long way not only to document how to do them correctly, but also to provide a helper class that does most of the heavy lifting for you. That would greatly improve consistency across Moodle.


      So here is a wish list for a best-in-class cache for high scaling / performance. Even if we can't get them all, it helps guide how such a helper API should look:

      A) can be localized of course

      B) when localized is guaranteed to always be correct out of the box

      C) have cache write locking at generation time, not write time, to avoid cache stampedes

      D) a simple API that is easy to understand, and hard to use wrong


      E) capability to always have a warm cache, or pre warming


      Because of C) we must have a single chunk of code responsible for generating the data for a cache. We already have that, theoretically, in the cache_data_source, so all of this is either an extension to that interface, or a helper which works with that interface, and only with that interface, to force it to be used correctly. We probably want a localized_cache_data_source interface which builds on the existing one.

      This is a rough brain dump and not 100% thought through, but something like:

      1) Code has a "global" key (eg a course id = 42), and nothing else, it asks the helper for the data. Global is just a 'normal' cache key, I'm just introducing the term to be distinct from a local key.

      2) The helper needs to know the localized key for the data (more on this below), so it delegates to the cache_data_source with a new method, localized_cache_data_source->get_local_key($key), to grab the latest local key for a key (i.e. course 42 is at version xyzabc1234)

      3) Then it calls the cache to see if data for the local key xyzabc1234 exists. If it does, return it and that's it, we're done.

      4) But if the data doesn't exist, then we need to generate it.

      5) First use the 'cache lock api' (which needs love) so that only 1 process is writing to the cache at any point

      6) Call the existing cache_data_source->load_for_cache($key) with the global key which generates the data

      7) Before we can save it to a localizable cache we need a new localized key, so we take the hash of the data (probably sha256) and now we have a key which is unique to the data being stored. We can use the whole hash or we can truncate it down to say 20 chars, we are not using this for strong crypto, just uniqueness.

      8) Now we can finally cache the data using the local key

      9) Now we also need to persist the local key itself so everyone is aware of the new version using localized_cache_data_source->set_local_key($key, $hash)

      10) There would be a base class you inherit from which gives you an out-of-the-box get_local_key / set_local_key implementation that stores the keys in its own application-scope, non-localizable cache. So we are grabbing an id from the shared cache, and then the bulk of the IO is to the local cache.

      11) In many cases we can avoid the overhead of the tiny shared cache by storing the version hash in a place which we will already have access to. This could be get_config() for your plugin (which is also in shared MUC, but we have probably already called it), or it could be stored / persisted in any other way. Plugin config is only a good place if you have a small number of global keys (e.g. the theme rev, js rev etc conceptually only ever have 1 global key). A good practice would be to override the setter so that it writes to both MUC and wherever else you want it to go, so that the default getter always continues working.

      Then step 1) now looks like: our code has both a "global" key (e.g. a course id = 42) and a local key, so the helper can skip the lookup in step 2.
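To make steps 1) to 11) a bit more concrete, here is a minimal sketch of the flow. It is written in Python rather than Moodle PHP purely for illustration, and it is not a proposed implementation. The class names, the in-memory dicts standing in for the local and shared caches, the in-process lock standing in for the cache lock API, the JSON serialization used before hashing and the `localkey_` config name are all assumptions.

```python
import hashlib
import json
import threading


class LocalizedCacheHelper:
    """Hypothetical helper following steps 1) to 9); every name is illustrative."""

    def __init__(self, load_for_cache):
        # Stands in for cache_data_source->load_for_cache($key).
        self.load_for_cache = load_for_cache
        self.local_cache = {}   # stands in for the localized cache store
        self.shared_keys = {}   # stands in for the small shared, non-localizable cache
        self.lock = threading.Lock()  # stands in for the cache lock API

    def get_local_key(self, key):
        return self.shared_keys.get(key)

    def set_local_key(self, key, local_key):
        self.shared_keys[key] = local_key

    def get(self, key, local_key=None):
        # Step 2: look up the latest local key, unless the caller already has it.
        if local_key is None:
            local_key = self.get_local_key(key)
        # Step 3: if data for that local key exists, we are done.
        if local_key is not None and local_key in self.local_cache:
            return self.local_cache[local_key]
        # Step 5: lock at generation time so only one caller builds the data.
        with self.lock:
            # Someone else may have generated it while we waited for the lock.
            local_key = self.get_local_key(key)
            if local_key is not None and local_key in self.local_cache:
                return self.local_cache[local_key]
            data = self.load_for_cache(key)  # step 6
            # Step 7: the local key is a truncated sha256 of the data.
            # Assumes the data is JSON-serializable.
            payload = json.dumps(data, sort_keys=True).encode("utf-8")
            new_local_key = hashlib.sha256(payload).hexdigest()[:20]
            self.local_cache[new_local_key] = data   # step 8
            self.set_local_key(key, new_local_key)   # step 9
            return data


class ConfigBackedHelper(LocalizedCacheHelper):
    """Step 11: also persist the local key somewhere the caller already reads."""

    def __init__(self, load_for_cache, plugin_config):
        super().__init__(load_for_cache)
        self.plugin_config = plugin_config  # stands in for plugin config

    def set_local_key(self, key, local_key):
        super().set_local_key(key, local_key)  # the default getter keeps working
        self.plugin_config["localkey_%s" % key] = local_key
```

A caller that only has the global key would call `helper.get(42)`. A caller that has already read the local key from config would call `helper.get(42, local_key=...)` and skip the shared lookup.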

      Last, we can tick off E) with pre-warmed caches. The helper would expose a rebuild() method which forces a load_for_cache() instead of merely invalidating the cache. Because of the order of steps 8 & 9 above, the new cache version is warm before the local key is updated, so everyone flips from a warm old version to a warm new version, completely sidestepping cache stampede issues and avoiding the bad glitches that happen when cache misses spike under load. I suspect the main reason this pattern has been avoided in the past is the lack of a single responsible code path, with often many bits of code purging an item, which should be moot with the new helper and interface. Also, if debug = developer and rebuild() was called but it turned out that the content and hash didn't change, then it should emit a warning saying you must call this "IF and ONLY IF the content has changed".
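A rough sketch of how rebuild() might behave, again in Python for illustration only. The class name, the `debug_developer` flag, the dict-backed stores and the warning text are assumptions, and eviction of old versions and the miss path are left out.

```python
import hashlib
import json
import warnings


def content_hash(data):
    # Truncated sha256 of the serialized data; used for uniqueness, not crypto.
    payload = json.dumps(data, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:20]


class PrewarmedCache:
    """Hypothetical sketch of rebuild(); not an existing Moodle class."""

    def __init__(self, load_for_cache, debug_developer=False):
        self.load_for_cache = load_for_cache    # stands in for load_for_cache($key)
        self.debug_developer = debug_developer  # stands in for debug = developer
        self.local_cache = {}   # localized store, keyed by content hash
        self.shared_keys = {}   # global key -> current local key

    def rebuild(self, key):
        data = self.load_for_cache(key)
        new_local_key = content_hash(data)
        if self.debug_developer and new_local_key == self.shared_keys.get(key):
            warnings.warn("rebuild() must be called if and only if the content has changed")
        # Warm the new version first (step 8) ...
        self.local_cache[new_local_key] = data
        # ... and only then flip readers over to it (step 9).
        self.shared_keys[key] = new_local_key
        return new_local_key

    def get(self, key):
        # Readers see either the warm old version or the warm new one.
        local_key = self.shared_keys.get(key)
        if local_key is None:
            return None
        return self.local_cache.get(local_key)
```

Between the two assignments in rebuild(), readers still resolve the old local key, whose data is still in the store, so there is no window in which a lookup misses.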

      There is also one more theoretical advantage to this new API: if you have written a high performance cache using it, but it is installed into a simple Moodle which doesn't even have a localized cache, you can shortcut the local keys to always be the same as the original global key and effectively turn the whole system off, which would save a bunch of cache space. This is a pure space vs speed trade-off, and it would apply to the specific cache store configured. The only hitch is that we currently only know whether a cache store is localizable, not whether it actually is local, which must be admin-level config as there is no way to infer it. This adds more complexity to the cache admin GUI, so it may not be worth it.
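The shortcut could be as small as the following sketch (Python, illustration only). The `store_is_local` flag is a hypothetical admin-level setting, and `make_local_key` is an invented name.

```python
import hashlib
import json


def make_local_key(global_key, data, store_is_local):
    """Pick the key to store data under; store_is_local is a hypothetical admin setting."""
    if not store_is_local:
        # Shortcut: reuse the global key, so each new version overwrites the old one.
        return str(global_key)
    # Localized store: key by a truncated content hash, so versions sit side by side.
    payload = json.dumps(data, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:20]
```

With the flag off, a store only ever holds one entry per global key. With it on, each distinct version of the content gets its own entry, which is the space cost being traded for speed.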








            Assignee: Unassigned
            Reporter: Brendan Heywood (brendanheywood)
            Component watchers:
            Matteo Scaramuccia, Amaia Anabitarte, Carlos Escobedo, Ferran Recio, Ilya Tregubov, Sara Arjona (@sarjona)