-
Bug
-
Resolution: Fixed
-
Blocker
-
1.9.3, 1.9.4, 1.9.5, 1.9.6, 1.9.7, 1.9.11, 2.0, 2.1
-
Any
-
MOODLE_19_STABLE, MOODLE_20_STABLE, MOODLE_21_STABLE
-
MOODLE_19_STABLE, MOODLE_20_STABLE
-
wip-
MDL-24355-master -
Moderate
-
The tag_compute_correlations() function in "tag/lib.php" is not computing tags correctly. This is the query it's currently using to compute the correlated tags for a specified tag:
$query = "SELECT tb.tagid ".
|
"FROM {$CFG->prefix}tag_instance ta INNER JOIN {$CFG->prefix}tag_instance tb ON ta.itemid = tb.itemid ".
|
"WHERE ta.tagid = {$tag->id} AND tb.tagid != {$tag->id} ".
|
"GROUP BY tb.tagid ".
|
"HAVING COUNT(*) > $min_correlation ".
|
"ORDER BY COUNT(*) DESC";
|
However, in the 'tag_instance' table, different items can have the same 'itemid' but different 'itemtype'. The query above does not account for that. To fix only this query, you could change it to this:
$query = "SELECT tb.tagid ".
|
"FROM {$CFG->prefix}tag_instance ta INNER JOIN {$CFG->prefix}tag_instance tb ON (ta.itemtype = tb.itemtype AND ta.itemid = tb.itemid) ".
|
"WHERE ta.tagid = {$tag->id} AND tb.tagid != {$tag->id} ".
|
"GROUP BY tb.tagid ".
|
"HAVING COUNT(*) > $min_correlation ".
|
"ORDER BY COUNT(*) DESC";
|
The only difference is the join condition (the part after "ON" in the 2nd line).
However, there's another problem with the tag_compute_correlations() function. It runs a minimum of 2*N+1 queries where N is the number of rows in the 'tag' table. 1 query to get a list of all the tags, 1 query per tag to get a list of correlated tags, and 1 query per tag to get the cached correlated tags from the 'tag_correlation' table.
I work for Remote-Learner and many of our clients have at least hundreds of rows in that table, and some have thousands. The particular client that caused me to discover this currently has 9649 rows in their 'tag' table. So every time that function gets called for their site, it runs a minimum of 19299 queries. I know that it's only supposed to run 20% of the times that 'admin/cron.php' runs, but still, that's a lot of queries! Especially when the equivalent can be done with 1 single query. I've attached a rewrite of the tag_compute_correlations() function that accomplishes this. I've thoroughly tested it on a few different sites. Let me know if you have any questions about it.
Sorry I didn't upload a patch file (if that's what you prefer). I thought it wouldn't really make sense as a patch since I just rewrote the whole function.
- has been marked as being related by
-
MDLSITE-1000 Tag functions tripping get_records warning about duplicate keys
- Closed
-
MDL-27861 Tag ID may still be in the correlatedtags column for other tags
- Closed
-
MDL-27859 add unit tests for tagcron()
- Closed
-
MDL-27864 tag_get_tags() is inaccurately named
- Closed
- is duplicated by
-
MDL-26586 Cron DB error on postgres : column "co.id" must appear in the GROUP BY clause or be used in an aggregate function
- Closed
-
MDL-26884 Buggy SELECT statement in function tag_compute_correlations()
- Closed
-
MDL-21819 MySQL error in tag/lib.php cron - Invalid use of group function
- Closed