-
Bug
-
Resolution: Fixed
-
Blocker
-
3.11.10, 4.0.4, 4.1
The bump_submission_for_stale_conversions adhoc task contains a guard clause intended to prevent it from running on sites where document conversion has never been enabled. However it always returns true regardless of whether the $earliestconversion query returns a value or not. $earliestconversion is never null, it's always an object with a min member - it is this min member that can be null. So the guard clause should be: if (isset($earliestconversion->min))
The value of $earliestconversion->min is used in a subsequent query on the files table - and it appears when this value is null, it causes some DBs with a large number of records in the files table to spend an absurd amount of time on this query.
Original report
evsoldatkin commented this on MDL-68943, and I decided to post this as an issue, because it seemed like an urgent matter to me:
Seems that SQL request has no chance to finish for medium DB on SSD with 4 million files. 3 days hanging and restart. f1 with left join on f2 is too big load to execute. I suggest spliting sql in several requests. That resulted into speed increase on the tested DB from never to 2 min 57 sec. Also adding raise_memory_limit(MEMORY_HUGE) can be used for safety, but was not needed for the tested DB.
public function execute() { |
global $DB; |
// Used to only get records after whenever document conversion was enabled for this site. |
$earliestconversion = $DB->get_record_sql("SELECT MIN(timecreated) AS min |
FROM {files}
|
WHERE filearea = 'documentconversion'"); |
if ($earliestconversion) { |
['sql' => $extensionsql, 'params' => $extensionparams] = array_reduce( |
['doc', 'docx', 'rtf', 'xls', 'xlsx', 'ppt', 'pptx', 'html', 'odt', 'ods', 'png', 'jpg', 'txt', 'gif'], |
function(array $c, string $ext) use ($DB): array { |
return [ |
'sql' => $c['sql'] . ($c['sql'] ? ' OR ' : '') . $DB->sql_like('f1.filename', ':' . $ext), |
'params' => $c['params'] + [$ext => '%.' . $ext] |
];
|
},
|
['sql' => '', 'params' => []] |
);
|
// A converted file has its filename set to the contenthash of the file it converted. |
// Find all files in the relevant file areas for which there is no corresponding |
// file with the contenthash as the file name. |
// |
// Also check if the file has a greater modified time than the submission, if it does |
// that means it is both stale (as per the above) and will never be reconverted. |
$sql = "SELECT * FROM mdl_files f1 WHERE f1.filearea = 'submission_files' AND ($extensionsql)"; |
$files1 = $DB->get_records_sql($sql, $extensionparams); |
$files2 = $DB->get_records('files', ['component' => 'core', 'filearea' => 'documentconversion']); |
$files2 = array_map( |
function($file2) { |
return $file2->filename; |
},
|
$files2 |
);
|
$files = []; |
foreach ($files1 as $file1) { |
if (!in_array($file1->contenthash, $files2)) { |
$files[] = $file1; |
}
|
}
|
$sql = "SELECT f3.id, f3.timemodified as fmodified, asu.id as submissionid |
FROM {assign_submission} asu
|
JOIN {assign_grades} asg ON asg.userid = asu.userid AND asg.assignment = asu.assignment
|
JOIN {files} f3 ON f3.itemid = asg.id
|
WHERE asu.id = :itemid
|
AND f3.timecreated >= :earliest
|
AND f3.component = 'assignfeedback_editpdf' |
AND f3.filearea = 'combined' |
AND f3.filename = 'combined.pdf' |
AND f3.timemodified >= asu.timemodified";
|
foreach ($files as $file) |
{
|
$submission = $DB->get_record_sql($sql, ['earliest' => $earliestconversion->min, 'itemid' => $file->itemid]); |
if ($submission) |
{
|
// Set the submission modified time to one second later than the |
// converted files modified time, this will cause assign to reconvert |
// everything and delete the old files when the assignment grader is |
// viewed. See get_page_images_for_attempt in document_services.php. |
$newmodified = $submission->fmodified + 1; |
$record = (object)[ |
'id' => $submission->submissionid, |
'timemodified' => $newmodified |
];
|
mtrace('Set submission ' . $submission->submissionid . ' timemodified to ' . $newmodified); |
$DB->update_record('assign_submission', $record); |
}
|
}
|
}
|
}
|
I don't have a large instance to test this on, but mikhailgolenkov commented that the original step already took 18 minutes for him, so it seems definitely possible that it takes exceedingly long on larger instances.
- has a non-specific relationship to
-
MDL-68943 Converted PDF file does not update when submission file is overwritten
- Closed