Moodle

grading creates thousands of entries in mdl_grade_grades for only 2 students!!!

Details

  • Type: Bug Bug
  • Status: Closed Closed
  • Priority: Critical Critical
  • Resolution: Fixed
  • Affects Version/s: 1.9.2
  • Fix Version/s: 1.9.4
  • Component/s: Gradebook
  • Labels:
    None
  • Database:
    MySQL
  • Affected Branches:
    MOODLE_19_STABLE
  • Fixed Branches:
    MOODLE_19_STABLE

Description

course with categories and calculations in the gradebook creates after grading an assignment (qucik grading) for one student (only two students in this course) thousands of entries in mdl_grades_grade for users that are not participants of the course - got 116000 entries for the course in the table for only two students - should have beean about 45 entries.
The grading is I think because of these entries very slow and not really useable...

I hope someone can help me!

thx Katarzyna Potocka

  1. bogus_grades_cleanup_3.patch
    25/Oct/08 6:00 AM
    5 kB
    Petr Škoda (skodak)
  2. grade_item.php.patch.txt
    21/Oct/08 2:41 PM
    1 kB
    Michael Spall
  3. gradebook.patch
    23/Oct/08 12:42 AM
    2 kB
    Ann Adamcik

Issue Links

Activity

Hide
Michael Spall added a comment -

I can confirm this issue. On a test site with ~10,000 users. I created a course and added 3 faculty and 2 students. I added 6 assignments, 3 categories and 3 graded items. I entered in three grades total. I entered a calculation in each of the three grade items. The mdl_grade_grades table grew from 1152 records to 2,450 and mdl_grade_grades_history grew from 10,114 records to 12,715. There was no other activity on the test site while I was doing this.

On our production site we have
mdl_grade_grades with 536,018 records and
mdl_grade_grades_history with 2,184,929 records

These large tables are causing our production server to have one type of query go over 5 seconds:
Count: 254 Time=27.43s (6967s) Lock=1.27s (323s) Rows=66.3 (16840),
SELECT DISTINCT go.userid
FROM mdl_grade_grades go
JOIN mdl_grade_items gi
ON gi.id = go.itemid
LEFT OUTER JOIN mdl_grade_grades g
ON (g.userid = go.userid AND g.itemid = N)
WHERE gi.id <> N AND g.id IS NULL

Show
Michael Spall added a comment - I can confirm this issue. On a test site with ~10,000 users. I created a course and added 3 faculty and 2 students. I added 6 assignments, 3 categories and 3 graded items. I entered in three grades total. I entered a calculation in each of the three grade items. The mdl_grade_grades table grew from 1152 records to 2,450 and mdl_grade_grades_history grew from 10,114 records to 12,715. There was no other activity on the test site while I was doing this. On our production site we have mdl_grade_grades with 536,018 records and mdl_grade_grades_history with 2,184,929 records These large tables are causing our production server to have one type of query go over 5 seconds: Count: 254 Time=27.43s (6967s) Lock=1.27s (323s) Rows=66.3 (16840), SELECT DISTINCT go.userid FROM mdl_grade_grades go JOIN mdl_grade_items gi ON gi.id = go.itemid LEFT OUTER JOIN mdl_grade_grades g ON (g.userid = go.userid AND g.itemid = N) WHERE gi.id <> N AND g.id IS NULL
Hide
Ann Adamcik added a comment -

This is quickly becoming a huge issue here. I've tracked it down to the regrading function. When a calculation is added to the gradebook, a regrade occurs, which calls the compute function in lib/grade/grade_items.php. That function does a query to see if there are missing records, and if so, it adds them. The problem is that the query returns a userid for EVERY USER that is found in the mdl_grade_grades table, not just users in the current course. It then adds an entry for the current grade item for each 'missing' user. Thus, we end up with tons of extra entries in the table.

Here's the code in question, about line 1615 in lib/grade/grade_items.php:

// precreate grades - we need them to exist
$sql = "SELECT DISTINCT go.userid
FROM {$CFG->prefix}grade_grades go
JOIN {$CFG->prefix}grade_items gi
ON gi.id = go.itemid
LEFT OUTER JOIN {$CFG->prefix}grade_grades g
ON (g.userid = go.userid AND g.itemid = $this->id)
WHERE gi.id <> $this->id AND g.id IS NULL";
if ($missing = get_records_sql($sql)) {
foreach ($missing as $m) { $grade = new grade_grade(array('itemid'=>$this->id, 'userid'=>$m->userid), false); $grade->grade_item =& $this; $grade->insert('system'); }
}

Show
Ann Adamcik added a comment - This is quickly becoming a huge issue here. I've tracked it down to the regrading function. When a calculation is added to the gradebook, a regrade occurs, which calls the compute function in lib/grade/grade_items.php. That function does a query to see if there are missing records, and if so, it adds them. The problem is that the query returns a userid for EVERY USER that is found in the mdl_grade_grades table, not just users in the current course. It then adds an entry for the current grade item for each 'missing' user. Thus, we end up with tons of extra entries in the table. Here's the code in question, about line 1615 in lib/grade/grade_items.php: // precreate grades - we need them to exist $sql = "SELECT DISTINCT go.userid FROM {$CFG->prefix}grade_grades go JOIN {$CFG->prefix}grade_items gi ON gi.id = go.itemid LEFT OUTER JOIN {$CFG->prefix}grade_grades g ON (g.userid = go.userid AND g.itemid = $this->id) WHERE gi.id <> $this->id AND g.id IS NULL"; if ($missing = get_records_sql($sql)) { foreach ($missing as $m) { $grade = new grade_grade(array('itemid'=>$this->id, 'userid'=>$m->userid), false); $grade->grade_item =& $this; $grade->insert('system'); } }
Hide
Eloy Lafuente (stronk7) added a comment -

Raising to critical and addressing to 1.9.4 (1.9.3+)

Agree with Ann, that query seems to return ALL the users having some grade in any other activity in the site and not having grade in target activity.

Apart from fixing the query to retrieve the correct users... also it seems to be a nice candidate to use recordset functions if the number of users can be big.

Ciao

Show
Eloy Lafuente (stronk7) added a comment - Raising to critical and addressing to 1.9.4 (1.9.3+) Agree with Ann, that query seems to return ALL the users having some grade in any other activity in the site and not having grade in target activity. Apart from fixing the query to retrieve the correct users... also it seems to be a nice candidate to use recordset functions if the number of users can be big. Ciao
Hide
Michael Spall added a comment -

This is a quick and dirty patch that limits the users returned to those that have mdl_grade_grades records in the course instead of sitewide. This will not fix courses that already have created "extra" mdl_grade_grades records for users outside the course. You will need to run a query like:
DELETE FROM mdl_grade_grades where rawgrade IS NULL AND rawscaleid IS NULL AND usermodified IS NULL AND feedback IS NULL AND information IS NULL AND timecreated IS NULL AND timemodified IS NULL AND (finalgrade=0 or finalgrade IS NULL);
Because that query might be too aggressive you should back up your DB first and/or put more thought into which rows to delete.

It seems to me that the real fix would use the "Graded roles / gradebookroles" that are actually enrolled in the course.

Show
Michael Spall added a comment - This is a quick and dirty patch that limits the users returned to those that have mdl_grade_grades records in the course instead of sitewide. This will not fix courses that already have created "extra" mdl_grade_grades records for users outside the course. You will need to run a query like: DELETE FROM mdl_grade_grades where rawgrade IS NULL AND rawscaleid IS NULL AND usermodified IS NULL AND feedback IS NULL AND information IS NULL AND timecreated IS NULL AND timemodified IS NULL AND (finalgrade=0 or finalgrade IS NULL); Because that query might be too aggressive you should back up your DB first and/or put more thought into which rows to delete. It seems to me that the real fix would use the "Graded roles / gradebookroles" that are actually enrolled in the course.
Hide
Ann Adamcik added a comment -

Michael's patch looks reasonable, as long as the grade_grades table doesn't already have a bunch of extra entries for users not in the course (or the table has been adequately cleaned up!). However, any 'extra' users that exist in the table for other grade items in the course will continue to be brought forward whenever new grade items are added.

I'm attaching another patch that makes sure the query results are limited to actual course users. This way, any new entries in the table will be correct, even if there are still old entries for users not in the course.

Show
Ann Adamcik added a comment - Michael's patch looks reasonable, as long as the grade_grades table doesn't already have a bunch of extra entries for users not in the course (or the table has been adequately cleaned up!). However, any 'extra' users that exist in the table for other grade items in the course will continue to be brought forward whenever new grade items are added. I'm attaching another patch that makes sure the query results are limited to actual course users. This way, any new entries in the table will be correct, even if there are still old entries for users not in the course.
Hide
Petr Škoda (skodak) added a comment -

unfortunately we can not use following, because it will not work for large numbers of users
+ if (!empty($users)) { + $sql .= " AND go.userid in (".$users.")"; + }

Thanks everybody for working on this, I agree this must be fixed ASAP.

Show
Petr Škoda (skodak) added a comment - unfortunately we can not use following, because it will not work for large numbers of users + if (!empty($users)) { + $sql .= " AND go.userid in (".$users.")"; + } Thanks everybody for working on this, I agree this must be fixed ASAP.
Hide
Petr Škoda (skodak) added a comment -

Attaching upgrade cleanup patch, it is very slow - please review it.
It is intentional that grades of unenrolled students are not removed, grades are deleted only when deleting user.

Petr

Show
Petr Škoda (skodak) added a comment - Attaching upgrade cleanup patch, it is very slow - please review it. It is intentional that grades of unenrolled students are not removed, grades are deleted only when deleting user. Petr
Hide
Petr Škoda (skodak) added a comment -

going to improve the patch a bit more tomorrow so that it keeps grades fetched from activities

Show
Petr Škoda (skodak) added a comment - going to improve the patch a bit more tomorrow so that it keeps grades fetched from activities
Hide
Petr Škoda (skodak) added a comment -

closing this issue and filing a separate issue for cleanup of bogus grades created by this bug
thanks for the report a lot

Show
Petr Škoda (skodak) added a comment - closing this issue and filing a separate issue for cleanup of bogus grades created by this bug thanks for the report a lot
Hide
Tim Hunt added a comment -

I assume that since the fix was checked in in October, and we have not had complaints, this must be fixed.

Therefore, closing, as the remaining work has been moved to a separate issue.

Show
Tim Hunt added a comment - I assume that since the fix was checked in in October, and we have not had complaints, this must be fixed. Therefore, closing, as the remaining work has been moved to a separate issue.
Hide
Fred Woolard added a comment -

I am assuming the final patch (third attached file, gradebook.patch) is the official fix for this problem and I think I understand the goal of that fragment, however, I'd like to suggest what I think to be a simpler single query, rather than using two separate queries to the db:

Assume that the $gradebookroles variable contains an appropriate SQL constraint for either a single or multiple key values for the roleid.

select distinct ra.userid
from mdl_role_assignments ra
inner join mdl_context ctx on ra.contextid = ctx.id
left join mdl_grade_grades gg on gg.itemid = $this->id and gg.userid = ra.userid
where ra.roleid $gradebookroles
and ctx.instanceid = $this->courseid
and ctx.contextlevel = CONTEXT_COURSE
and gg.id is null

The first possible difference from the patch's query is that this one does not consider any "parent" contexts as pulled in from the get_related_contexts_string. But, from what I can tell, a course's parentage can only consist of categories on up to the site context (/1). I'm fairly ignorant of moodle still, but I'm reasoning that I don't need to consider a course's parent contexts when determining which student's grades are not present for one particular grade_item.

The other difference is that the population of user id values is coming from a set that has more of an expectation of being complete--enrollments, whereas the patch's approach is to look for user id values in the mdl_grade_grades table corresponding to any possible grade_items other than the ONE in which we're interested. If there happen to be no grades for a user with a different grade_item, then that user's id won't be in the starting population against which we filter using the left join, and so won't be in the result.

Show
Fred Woolard added a comment - I am assuming the final patch (third attached file, gradebook.patch) is the official fix for this problem and I think I understand the goal of that fragment, however, I'd like to suggest what I think to be a simpler single query, rather than using two separate queries to the db: Assume that the $gradebookroles variable contains an appropriate SQL constraint for either a single or multiple key values for the roleid. select distinct ra.userid from mdl_role_assignments ra inner join mdl_context ctx on ra.contextid = ctx.id left join mdl_grade_grades gg on gg.itemid = $this->id and gg.userid = ra.userid where ra.roleid $gradebookroles and ctx.instanceid = $this->courseid and ctx.contextlevel = CONTEXT_COURSE and gg.id is null The first possible difference from the patch's query is that this one does not consider any "parent" contexts as pulled in from the get_related_contexts_string. But, from what I can tell, a course's parentage can only consist of categories on up to the site context (/1). I'm fairly ignorant of moodle still, but I'm reasoning that I don't need to consider a course's parent contexts when determining which student's grades are not present for one particular grade_item. The other difference is that the population of user id values is coming from a set that has more of an expectation of being complete--enrollments, whereas the patch's approach is to look for user id values in the mdl_grade_grades table corresponding to any possible grade_items other than the ONE in which we're interested. If there happen to be no grades for a user with a different grade_item, then that user's id won't be in the starting population against which we filter using the left join, and so won't be in the result.
Hide
Tim Hunt added a comment -

It is better not to trust the patches attached to an issue. Instead, click on the Version Control (or All) tab just above the comments, and you will be able to see exactly what changes were made in CVS relating to this issue, for example

MODIFY lib/grade/grade_item.php Rev. 1.130.2.28 (+2 -2 lines)

clicking on the (+2 -2 lines) link gets you the patch.

You do need to consider roles assigned in parent contexts - that is how the roles system works.

Show
Tim Hunt added a comment - It is better not to trust the patches attached to an issue. Instead, click on the Version Control (or All) tab just above the comments, and you will be able to see exactly what changes were made in CVS relating to this issue, for example MODIFY lib/grade/grade_item.php Rev. 1.130.2.28 (+2 -2 lines) clicking on the (+2 -2 lines) link gets you the patch. You do need to consider roles assigned in parent contexts - that is how the roles system works.
Hide
Fred Woolard added a comment -

Thanks Tim for the tips on discerning what's what with regard to the committed changes in CVS. Looking at those changes.

I still see the second described difference in the approaches as being pertinent. The starting population of user id values is coming from mdl_grade_grades rather than mdl_role_assignments. The change I would make to accommodate role assignments made to the parent course categories is:

select distinct ra.userid
from mdl_role_assignments ra
inner join mdl_context ctx on ra.contextid = ctx.id
left join mdl_grade_grades gg on gg.itemid = $this->id and gg.userid = ra.userid
where ra.roleid $gradebookroles
and ( (ctx.instanceid = $this->courseid and ctx.contextlevel = CONTEXT_COURSE)
or (ctx.id $relatedcontexts and ctx.contextlevel = CONTEXT_COURSECAT)
)
and gg.id is null

Where $relatedcontexts has the appropriate SQL constraint for the parent context(s), not including the course context id since that is handled in the first part of the compound and/or. The third file attachment uses a call to get_related_contexts_string to get the $relatedcontexts assignment which will have the course's context included, so if that function is used, the result would have to be massaged to eliminate the course's context id. I apologize that the SQL code is psuedo.

Show
Fred Woolard added a comment - Thanks Tim for the tips on discerning what's what with regard to the committed changes in CVS. Looking at those changes. I still see the second described difference in the approaches as being pertinent. The starting population of user id values is coming from mdl_grade_grades rather than mdl_role_assignments. The change I would make to accommodate role assignments made to the parent course categories is: select distinct ra.userid from mdl_role_assignments ra inner join mdl_context ctx on ra.contextid = ctx.id left join mdl_grade_grades gg on gg.itemid = $this->id and gg.userid = ra.userid where ra.roleid $gradebookroles and ( (ctx.instanceid = $this->courseid and ctx.contextlevel = CONTEXT_COURSE) or (ctx.id $relatedcontexts and ctx.contextlevel = CONTEXT_COURSECAT) ) and gg.id is null Where $relatedcontexts has the appropriate SQL constraint for the parent context(s), not including the course context id since that is handled in the first part of the compound and/or. The third file attachment uses a call to get_related_contexts_string to get the $relatedcontexts assignment which will have the course's context included, so if that function is used, the result would have to be massaged to eliminate the course's context id. I apologize that the SQL code is psuedo.
Hide
Fred Woolard added a comment -

Sorry, now back at dev machine where I can think clearly. In the previous comment's query, the constraint on the CONTEXT_COURSECAT is superfluous since the first part of the condition is on the id (unique key) column, so no further constraint is needed. Since the course context has to be fetched anyway as part of finding the related contexts, the first part of the compound constraint can be replaced with the same key constraint used for the course categories. The code fragment becomes:

if (strpos($CFG->gradebookroles, ',') === false) {
$gradebookroles = " = {$CFG->gradebookroles}";
} else {
$gradebookroles = " IN ({$CFG->gradebookroles})";
}

$relatedcontexts = get_related_contexts_string(get_context_instance(CONTEXT_COURSE, $this->courseid));

$sql = "SELECT DISTINCT ra.userid
FROM {$CFG->prefix}role_assignments ra
INNER JOIN {$CFG->prefix}context ctx
ON ra.contextid = ctx.id
LEFT JOIN {$CFG->prefix}grade_grades gg
ON gg.itemid = {$this->id}
AND gg.userid = ra.userid
WHERE ra.roleid {$gradebookroles}
AND ctx.id {$relatedcontexts}
AND gg.id is null";

if ($missing = get_records_sql($sql)) {

Show
Fred Woolard added a comment - Sorry, now back at dev machine where I can think clearly. In the previous comment's query, the constraint on the CONTEXT_COURSECAT is superfluous since the first part of the condition is on the id (unique key) column, so no further constraint is needed. Since the course context has to be fetched anyway as part of finding the related contexts, the first part of the compound constraint can be replaced with the same key constraint used for the course categories. The code fragment becomes: if (strpos($CFG->gradebookroles, ',') === false) { $gradebookroles = " = {$CFG->gradebookroles}"; } else { $gradebookroles = " IN ({$CFG->gradebookroles})"; } $relatedcontexts = get_related_contexts_string(get_context_instance(CONTEXT_COURSE, $this->courseid)); $sql = "SELECT DISTINCT ra.userid FROM {$CFG->prefix}role_assignments ra INNER JOIN {$CFG->prefix}context ctx ON ra.contextid = ctx.id LEFT JOIN {$CFG->prefix}grade_grades gg ON gg.itemid = {$this->id} AND gg.userid = ra.userid WHERE ra.roleid {$gradebookroles} AND ctx.id {$relatedcontexts} AND gg.id is null"; if ($missing = get_records_sql($sql)) {

Dates

  • Created:
    Updated:
    Resolved: