Uploaded image for project: 'Moodle'
  1. Moodle
  2. MDL-71531

Moodle file cache - excessive locking around cache reads

XMLWordPrintable

    • MOODLE_310_STABLE, MOODLE_311_STABLE, MOODLE_400_STABLE
    • MOODLE_311_STABLE, MOODLE_400_STABLE
    • MDL-71531-master
    • Hide

      I wish I could provide manual testing instructions here, but this really is a smoke test scenario and I think that we can trust Behat to this.

      As I mentioned, this has been in used by MoodleCloud for several years.

      Show
      I wish I could provide manual testing instructions here, but this really is a smoke test scenario and I think that we can trust Behat to this. As I mentioned, this has been in used by MoodleCloud for several years.

      TL;DR:

      In the source code  of moodle/cache/stores/file/lib.php lines 357-374:

              // Open ensuring the file for reading in binary format.
              if (!$handle = fopen($file, 'rb')) {
                  return false;
              }
              // Lock it up!
              // We don't care if this succeeds or not, on some systems it will, on some it won't, meah either way.
              flock($handle, LOCK_SH);
              $data = '';
              // Read the data in 1Mb chunks. Small caches will not loop more than once.  We don't use filesize as it may
              // be cached with a different value than what we need to read from the file.
              do {
                  $data .= fread($handle, 1048576);
              } while (!feof($handle));
       
              // Unlock it.
              flock($handle, LOCK_UN);
              // Return it unserialised.
              return $this->prep_data_after_read($data);
      

       `flock` operations (file locking/unlocking in lines 363 and 372) significantly effect performance of Moodle on NFS shares. I question if these locks around file read operations are needed by design and can be safely removed.

      Background story

      I was looking at Moodle performance on AWS EFS. From statistical analysis of NFS operations you can see that moodle in its normal behavior spend about 45% time on NFS Sequence operations of locking/unlocking files: Screenshot: NFS Sequences

      If we extract specific NFS operations from the NFS Sequences the picture is about the same: Screenshot: NFS Operations

      So, after digging into the source code and looking for the lock/unlock operations I found that the only place responsible for this behavior is coded in the file mentioned above (moodle/cache/stores/file/lib.php), which is part of get() method of file cache functionality.

      Files that are cached are usually pretty small (<4kB) in size. Read operation from the line 368 -  $data .= fread($handle, 1048576); reads 1MB chunk if file larger than 1MB or just the whole file content if it is smaller than 1MB. So, from NFS operations point these reads can be considered atomic. So, the only concern I have is in case of large files (>1MB). Although, I have not seen these to be cached in my vanilla Moodle testing setup.

      So, the question is: is it safe to disable locking around cache reads for the benefit of almost doubling the NFS I/O performance?

      Another argument I can make is that with NFS/EFS share of moodledata directory the moodle data as well as cached files are all shared in EFS/NFS. So, moodle downloads files from EFS and then cache them back to EFS, which sounds like not the most optimized setup for the performance. Global shared caching made sense with very expensive DB requests. But when EFS/NFS used for data storage as well as cache sharing - this data flow design IMHO is not optimal.

      PS: I had a conversation with Sam Hemelryk - author of this file caching implementation. He agreed with my arguments but also suggested to spread the circle to make sure that we don't miss some conner cases.
       

      Thank you,

      -Vadim Karpusenko

       

            dobedobedoh Andrew Lyons
            vadikus Vadikus
            Brendan Heywood Brendan Heywood
            Jun Pataleta Jun Pataleta
            CiBoT CiBoT
            Votes:
            4 Vote for this issue
            Watchers:
            10 Start watching this issue

              Created:
              Updated:
              Resolved:

                Estimated:
                Original Estimate - Not Specified
                Not Specified
                Remaining:
                Remaining Estimate - 0 minutes
                0m
                Logged:
                Time Spent - 15 minutes
                15m

                  Error rendering 'clockify-timesheets-time-tracking-reports:timer-sidebar'. Please contact your Jira administrators.