Affects Version/s: 3.9.4
Fix Version/s: None
So, I was encouraged to post my story here, to prevent it from happening to others,
and to help others find it in case someone else runs into this. If this leads to a discussion or code changes, feel free of course to change Issue Type and Summary.
Please do not judge. They were young and didn't know what they were doing.
This is a follow-up to my forum post about HTTP 206 partial content requests flooding Apache.
So, what happened: January 5th. A mail comes in from a student. He doesn't see an image in a quiz question using Safari.
Over time, we notice a lot of repeated requests for files, mainly images, which are then delivered with HTTP 206 partial content statuses. We also see lots of requests to the database.
Things got bad, and then worse, as the semester drew to a close and exams started: lots of users in the queue, waiting times of half a minute to get from one quiz page to the next, and so on. Average page loading times kept rising.
Measures: checked the database. Tweaked cache and parameters. Added processors. Split the setup across several systems. (We're on PHP7-FPM, PostgreSQL 12, Moodle 3.9.3 btw).
Finally, February 1st, the reason for this turns up. As a quick fix, setting $CFG->disablebyteserving = true; brings performance back instantly (thanks to Guy Thomas of Titus Learning!). But Safari is still not displaying all images.
The cause was a system engineer who felt they had to do something about a shortage of free disk space and compressed all PNG files in moodledata on December 23rd.
The effect didn't kick in until after the Christmas break, when students returned in large numbers.
The checksums of the compressed files no longer corresponded to their filenames. That must have set off the chain reaction that finally overloaded the DB: some browsers hammered the server with requests, these were served with 206 partial content, and that let them keep firing requests.
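For anyone who wants to check their own site for the same damage, here is a minimal sketch in Python. It assumes that every file below moodledata/filedir is named after the SHA-1 hash of its own content, which is how the mismatch described above comes about. The function names are made up for this example, and the script only reads files, it changes nothing.

```python
import hashlib
import os


def sha1_of_file(path, chunk_size=1 << 20):
    """Return the SHA-1 hex digest of a file, read in chunks to save memory."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatched_files(filedir):
    """List files whose name looks like a SHA-1 digest but does not match their content."""
    mismatches = []
    for root, _dirs, files in os.walk(filedir):
        for name in files:
            # Assumption: pool files are named with 40 lowercase hex characters.
            # Anything else (README files, warnings, ...) is skipped.
            if len(name) != 40 or any(c not in "0123456789abcdef" for c in name):
                continue
            path = os.path.join(root, name)
            if sha1_of_file(path) != name:
                mismatches.append(path)
    return sorted(mismatches)
```

Calling find_mismatched_files() with the path to your filedir returns the list of files that were altered after Moodle stored them. On a large site this reads every file once, so it is better run outside peak hours.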
That's my own naïve way of putting this.
Surely someone more savvy could break that down more wisely.
I'm not saying we should do something about this behaviour, since I don't know the reason for it. Just putting it down here.