Moodle
  1. Moodle
  2. MDL-24355

Computation of tag correlations is incorrect and very inefficient

    Details

    • Database:
      Any
    • Testing Instructions:
      Hide

      check that /admin/cron.php runs without error. Note that the tag correlation calculation doesn't run every time. To ensure it has been run you may wish to edit /lib/cronlib.php at line 197 to be:

      if (true) {
      

      1) Add 3 blog entries. List 2 tags with each entry. The same 2 for each entry.
      2) run the modified version of cron (see above).
      3) Go to the manage tags page. Site administration -> appearance -> manage tags. Check that your two tags are listed.
      4) Click on one of the tags you added. On the tags page your other tag should be listed in the "related tags" area.
      5) Click "Manage tags" to go back. This link can be hard to find. Its in the top left of the centre page area.
      6) Tick the box next to one of your tags and select "delete" from the "with selected tags" drop down. Click ok.
      7) Click your remaining tag and check that the related tags area is gone.
      8) Click through to one of your blog posts and check that the deleted tag is gone from the tags list for that blog post.

      This testing does need to be repeated for 1.9, 2 stable and 2.1 :|

      Show
      check that /admin/cron.php runs without error. Note that the tag correlation calculation doesn't run every time. To ensure it has been run you may wish to edit /lib/cronlib.php at line 197 to be: if ( true ) { 1) Add 3 blog entries. List 2 tags with each entry. The same 2 for each entry. 2) run the modified version of cron (see above). 3) Go to the manage tags page. Site administration -> appearance -> manage tags. Check that your two tags are listed. 4) Click on one of the tags you added. On the tags page your other tag should be listed in the "related tags" area. 5) Click "Manage tags" to go back. This link can be hard to find. Its in the top left of the centre page area. 6) Tick the box next to one of your tags and select "delete" from the "with selected tags" drop down. Click ok. 7) Click your remaining tag and check that the related tags area is gone. 8) Click through to one of your blog posts and check that the deleted tag is gone from the tags list for that blog post. This testing does need to be repeated for 1.9, 2 stable and 2.1 :|
    • Difficulty:
      Moderate
    • Affected Branches:
      MOODLE_19_STABLE, MOODLE_20_STABLE, MOODLE_21_STABLE
    • Fixed Branches:
      MOODLE_19_STABLE, MOODLE_20_STABLE
    • Pull Master Branch:
      wip-MDL-24355-master
    • Rank:
      6328

      Description

      The tag_compute_correlations() function in "tag/lib.php" is not computing tags correctly. This is the query it's currently using to compute the correlated tags for a specified tag:

      $query = "SELECT tb.tagid ".
          "FROM {$CFG->prefix}tag_instance ta INNER JOIN {$CFG->prefix}tag_instance tb ON ta.itemid = tb.itemid ".
          "WHERE ta.tagid = {$tag->id} AND tb.tagid != {$tag->id} ".
          "GROUP BY tb.tagid ".
          "HAVING COUNT(*) > $min_correlation ".
          "ORDER BY COUNT(*) DESC";

      However, in the 'tag_instance' table, different items can have the same 'itemid' but different 'itemtype'. The query above does not account for that. To fix only this query, you could change it to this:

      $query = "SELECT tb.tagid ".
          "FROM {$CFG->prefix}tag_instance ta INNER JOIN {$CFG->prefix}tag_instance tb ON (ta.itemtype = tb.itemtype AND ta.itemid = tb.itemid) ".
          "WHERE ta.tagid = {$tag->id} AND tb.tagid != {$tag->id} ".
          "GROUP BY tb.tagid ".
          "HAVING COUNT(*) > $min_correlation ".
          "ORDER BY COUNT(*) DESC";

      The only difference is the join condition (the part after "ON" in the 2nd line).

      However, there's another problem with the tag_compute_correlations() function. It runs a minimum of 2*N+1 queries where N is the number of rows in the 'tag' table. 1 query to get a list of all the tags, 1 query per tag to get a list of correlated tags, and 1 query per tag to get the cached correlated tags from the 'tag_correlation' table.

      I work for Remote-Learner and many of our clients have at least hundreds of rows in that table, and some have thousands. The particular client that caused me to discover this currently has 9649 rows in their 'tag' table. So every time that function gets called for their site, it runs a minimum of 19299 queries. I know that it's only supposed to run 20% of the times that 'admin/cron.php' runs, but still, that's a lot of queries! Especially when the equivalent can be done with 1 single query. I've attached a rewrite of the tag_compute_correlations() function that accomplishes this. I've thoroughly tested it on a few different sites. Let me know if you have any questions about it.

      Sorry I didn't upload a patch file (if that's what you prefer). I thought it wouldn't really make sense as a patch since I just rewrote the whole function.

        Issue Links

          Activity

          Hide
          Eloy Lafuente (stronk7) added a comment -

          Sending to stable backlog and assigning to Sam for evaluation of the patch and implementation. I think it can make a difference in sites with large tags collections (and we are certainly increasing "tag uses" over the whole Moodle specially in 2.0.x). Would be great to have this included in next performance-focussed sprint.

          Note this should be fixed both for 1.9.x and 2.0.x

          Thanks for the report and ciao

          Show
          Eloy Lafuente (stronk7) added a comment - Sending to stable backlog and assigning to Sam for evaluation of the patch and implementation. I think it can make a difference in sites with large tags collections (and we are certainly increasing "tag uses" over the whole Moodle specially in 2.0.x). Would be great to have this included in next performance-focussed sprint. Note this should be fixed both for 1.9.x and 2.0.x Thanks for the report and ciao
          Hide
          Sam Hemelryk added a comment -

          Hi guys,

          Jonathan thanks for providing your improved function. I'm presently going through testing and tweaking things and I believe I have got the 2.0 version in a state I am happy with.

          There is only one element of functionality that I am changing, and that is to remove the having clause from the SQL query and instead handle the checking of $mincorrelations in PHP. This fixes the issues with the having clause and allows us to also clean up any invalid correlations at the same time.
          I should add presently I have only tested this with a limited data set. I'll generate a huge dataset this afternoon to test on a mass scale.

          Andrew could you please peer-review it for me: https://github.com/samhemelryk/moodle/compare/master...wip-MDL-24355-M2.0
          I'll now work on getting a branch set up with the 1.9 changes.

          Cheers
          Sam

          Show
          Sam Hemelryk added a comment - Hi guys, Jonathan thanks for providing your improved function. I'm presently going through testing and tweaking things and I believe I have got the 2.0 version in a state I am happy with. There is only one element of functionality that I am changing, and that is to remove the having clause from the SQL query and instead handle the checking of $mincorrelations in PHP. This fixes the issues with the having clause and allows us to also clean up any invalid correlations at the same time. I should add presently I have only tested this with a limited data set. I'll generate a huge dataset this afternoon to test on a mass scale. Andrew could you please peer-review it for me: https://github.com/samhemelryk/moodle/compare/master...wip-MDL-24355-M2.0 I'll now work on getting a branch set up with the 1.9 changes. Cheers Sam
          Hide
          Sam Hemelryk added a comment -

          Alright, the 1.9 branch containing changes is now available: https://github.com/samhemelryk/moodle/compare/master...wip-MDL-24355-M1.9
          Andrew could you please peer-review that as well.

          I'm off to stress test things

          Cheers
          Sam

          Show
          Sam Hemelryk added a comment - Alright, the 1.9 branch containing changes is now available: https://github.com/samhemelryk/moodle/compare/master...wip-MDL-24355-M1.9 Andrew could you please peer-review that as well. I'm off to stress test things Cheers Sam
          Hide
          Andrew Davis added a comment -

          Being finicky but the phpdocs for tag_compute_correlations() should probably state that the function doesnt return anything.

          line 891, "relfected"?

          Random possibly bad idea. Rather that deleting the correlations one by one in tag_process_computed_correlation() would it be better to build up an array of correlations to delete them get rid of them all at once using $DB->delete_records_list() (or whatever)? tag_process_computed_correlation() could return the correlation id if it needs to be deleted with the calling code adding to an array then deleting them in one hit after its done looping through the correlations. Dont know if its worth it. Depends how many deletions typically occur.

          The comment starting on line 894 doesnt seem to make sense. Tag ID isnt being dealt with there.

          Otherwise, +1.

          Show
          Andrew Davis added a comment - Being finicky but the phpdocs for tag_compute_correlations() should probably state that the function doesnt return anything. line 891, "relfected"? Random possibly bad idea. Rather that deleting the correlations one by one in tag_process_computed_correlation() would it be better to build up an array of correlations to delete them get rid of them all at once using $DB->delete_records_list() (or whatever)? tag_process_computed_correlation() could return the correlation id if it needs to be deleted with the calling code adding to an array then deleting them in one hit after its done looping through the correlations. Dont know if its worth it. Depends how many deletions typically occur. The comment starting on line 894 doesnt seem to make sense. Tag ID isnt being dealt with there. Otherwise, +1.
          Hide
          Sam Hemelryk added a comment -

          Thanks Andrew, PULL-202 and PULL-203 have been created for this issue to see it integrated.

          Cheers
          Sam

          Show
          Sam Hemelryk added a comment - Thanks Andrew, PULL-202 and PULL-203 have been created for this issue to see it integrated. Cheers Sam
          Hide
          Eloy Lafuente (stronk7) added a comment -

          Hi,

          I'm reopening this and sending it again to the STABLE backlog because I think both PULL-202 and PULL-203 were introducing one "new" way to consider "mincorrelation" that isn't correct IMO. Let's discuss it.

          Show
          Eloy Lafuente (stronk7) added a comment - Hi, I'm reopening this and sending it again to the STABLE backlog because I think both PULL-202 and PULL-203 were introducing one "new" way to consider "mincorrelation" that isn't correct IMO. Let's discuss it.
          Hide
          Eloy Lafuente (stronk7) added a comment -

          Rationale about mincorrelation...imagine this tags (id, name pairs):

          1 - dog
          2 - cat
          3 - mouse
          4 - flea

          And imagine 3 users have set these pairs in the same itemtype, itemid (course tags, user tags... whatever):

          1, 2 (dog and cat, once)
          1, 3 (dog and mouse, once)
          1, 4 (dog and flea, once)

          If I didn't read your patch wrongly... you will end calling (for the tagid =1, dog) to tag_process_computed_correlation() with $tagcorrelation being something like:

          $tagcorrelation = stdClass {
              id    = null,                   // imagine that there are NO previous correlation records
              tagid = 1,                      // aka, dog
              correlatedtags = array(2, 3, 4) // aka, cat, mouse and flea
          }
          

          Then, within tag_process_computed_correlation() you simply check this condition:

          if (count($tagcorrelation->correlatedtags) > $mincorrelation)
          

          and if, it's true, you update/insert the correlations for dog. So at the end, you get dog correlated with cat, mouse and flea.

          And that's is 100% incorrect IMO, you are correlating dog with everything in the moment you find 2 or more (different) correlations.

          Instead has to be correlated with cat only if it is really correlated 2 or more times with it, i.e. if the (1,2) pair happens 2 or more times in the query (not the global count at all).

          IMO it's more or less simple to fix, surely all you need to do is to populate $tagcorrelation->correlatedtags with the keys being the correlated tags and the values being the count (incrementing each time you get a repeated correlation). Then tag_process_computed_correlation() will only create correlations for $tagcorrelation->correlatedtags elements having 2 or more in the value (i.e. have been correlated at least $mincorrelation times).

          How does it sound, do you get the difference? Perhaps I'm wrong but, after re-reading the patch a bunch of times I think your interpretation of $mincorrelation is incorrect as stated above.

          Ciao

          Show
          Eloy Lafuente (stronk7) added a comment - Rationale about mincorrelation...imagine this tags (id, name pairs): 1 - dog 2 - cat 3 - mouse 4 - flea And imagine 3 users have set these pairs in the same itemtype, itemid (course tags, user tags... whatever): 1, 2 (dog and cat, once) 1, 3 (dog and mouse, once) 1, 4 (dog and flea, once) If I didn't read your patch wrongly... you will end calling (for the tagid =1, dog) to tag_process_computed_correlation() with $tagcorrelation being something like: $tagcorrelation = stdClass { id = null , // imagine that there are NO previous correlation records tagid = 1, // aka, dog correlatedtags = array(2, 3, 4) // aka, cat, mouse and flea } Then, within tag_process_computed_correlation() you simply check this condition: if (count($tagcorrelation->correlatedtags) > $mincorrelation) and if, it's true, you update/insert the correlations for dog. So at the end, you get dog correlated with cat, mouse and flea. And that's is 100% incorrect IMO, you are correlating dog with everything in the moment you find 2 or more (different) correlations. Instead has to be correlated with cat only if it is really correlated 2 or more times with it, i.e. if the (1,2) pair happens 2 or more times in the query (not the global count at all). IMO it's more or less simple to fix, surely all you need to do is to populate $tagcorrelation->correlatedtags with the keys being the correlated tags and the values being the count (incrementing each time you get a repeated correlation). Then tag_process_computed_correlation() will only create correlations for $tagcorrelation->correlatedtags elements having 2 or more in the value (i.e. have been correlated at least $mincorrelation times). How does it sound, do you get the difference? Perhaps I'm wrong but, after re-reading the patch a bunch of times I think your interpretation of $mincorrelation is incorrect as stated above. Ciao
          Hide
          Sam Hemelryk added a comment -

          Ahhh thanks for picking that out Eloy, indeed I have misunderstood the tag correlations, I'll go back to it now.

          Cheers
          Sam

          Show
          Sam Hemelryk added a comment - Ahhh thanks for picking that out Eloy, indeed I have misunderstood the tag correlations, I'll go back to it now. Cheers Sam
          Hide
          Sam Hemelryk added a comment -

          Alright, got it solved!
          My apologies Jonathan the query you had in your patch was in fact correct and it was me who broke it by not paying close enough attention to how min correlations was working.
          I'll create two new pull requests to see the 2.0 and 1.9 changes go in.

          Cheers
          Sam

          Show
          Sam Hemelryk added a comment - Alright, got it solved! My apologies Jonathan the query you had in your patch was in fact correct and it was me who broke it by not paying close enough attention to how min correlations was working. I'll create two new pull requests to see the 2.0 and 1.9 changes go in. Cheers Sam
          Hide
          Jonathan Robson added a comment -

          Glad to see this getting some attention. Thanks guys.

          Show
          Jonathan Robson added a comment - Glad to see this getting some attention. Thanks guys.
          Hide
          Eloy Lafuente (stronk7) added a comment -

          I'm reopening this because the last pull requests contained one non-cross-db query (only working under MySQL), sorry.

          Resolution pasted from pull-requests follows:

          1) it's mandatory to group by all the columns not being aggregated ones.
          2) there should be one COUNT in the select clause.
          3) you cannot group/select distinct by TEXT columns
          4) incidentally, there is a bunch of whitespace in the commits.

          Ciao

          Sending this to stable backlog.

          Show
          Eloy Lafuente (stronk7) added a comment - I'm reopening this because the last pull requests contained one non-cross-db query (only working under MySQL), sorry. Resolution pasted from pull-requests follows: 1) it's mandatory to group by all the columns not being aggregated ones. 2) there should be one COUNT in the select clause. 3) you cannot group/select distinct by TEXT columns 4) incidentally, there is a bunch of whitespace in the commits. Ciao Sending this to stable backlog.
          Hide
          Brent Graveland added a comment -

          Similar to what Eloy Lafuente reported, I'm seeing this in my PostgreSQL logs:

          ERROR: column "co.id" must appear in the GROUP BY clause or be used in an aggregate function at character 43

          Show
          Brent Graveland added a comment - Similar to what Eloy Lafuente reported, I'm seeing this in my PostgreSQL logs: ERROR: column "co.id" must appear in the GROUP BY clause or be used in an aggregate function at character 43
          Hide
          Eloy Lafuente (stronk7) added a comment -

          Wow, not sure what/how I managed this, sorry!

          1) I switched the fix version back to STABLE backlog, so everybody missed this, my fault.
          2) I rejected the pull requests but the associated commits ended @ upstream, my re-fault!
          3) I'm idiot, my fault!

          Going to take a look to this! Stay tuned!

          Ciao

          Show
          Eloy Lafuente (stronk7) added a comment - Wow, not sure what/how I managed this, sorry! 1) I switched the fix version back to STABLE backlog, so everybody missed this, my fault. 2) I rejected the pull requests but the associated commits ended @ upstream, my re-fault! 3) I'm idiot, my fault! Going to take a look to this! Stay tuned! Ciao
          Hide
          Eloy Lafuente (stronk7) added a comment -

          Raising this to blocker, setting it for Sprint 9 (replacing dupe MDL-26884) and needs to be fixed for 19_STABLE, 20_STABLE and master.

          Going to look to it later today, in case I can help to fix my mistakes. Re-sorry!

          Ciao

          Show
          Eloy Lafuente (stronk7) added a comment - Raising this to blocker, setting it for Sprint 9 (replacing dupe MDL-26884 ) and needs to be fixed for 19_STABLE, 20_STABLE and master. Going to look to it later today, in case I can help to fix my mistakes. Re-sorry! Ciao
          Hide
          Eloy Lafuente (stronk7) added a comment -

          Update:

          1) I've created 2 emergency pull requests to get the SQL working in a cross-db:

          • PULL-658: for master
          • PULL-659: for 20_STABLE

          2) While looking @ the logic after the query... it seems that while correlation records are inserted and updated, they aren't ever deleted, nor when the correlation goes <= 2, nor when the tag instances are deleted.

          So, TODO (SamH?):

          0) Integrate the 2 emergency pull requests above (Petr is on them right now).
          1) Backport the query and code to MOODLE_19_STABLE
          2) Analyse what happens with correlation when < 2. Fix for all branches.
          3) Analyse what happens with correlation when tag instances are deleted. Fix for all branches.

          Ciao

          Show
          Eloy Lafuente (stronk7) added a comment - Update: 1) I've created 2 emergency pull requests to get the SQL working in a cross-db: PULL-658: for master PULL-659: for 20_STABLE 2) While looking @ the logic after the query... it seems that while correlation records are inserted and updated, they aren't ever deleted, nor when the correlation goes <= 2, nor when the tag instances are deleted. So, TODO (SamH?): 0) Integrate the 2 emergency pull requests above (Petr is on them right now). 1) Backport the query and code to MOODLE_19_STABLE 2) Analyse what happens with correlation when < 2. Fix for all branches. 3) Analyse what happens with correlation when tag instances are deleted. Fix for all branches. Ciao
          Hide
          Sam Hemelryk added a comment -

          Thanks Eloy, I'll look to get a solution sorted for the next weekly reviews.

          Cheers
          Sam

          Show
          Sam Hemelryk added a comment - Thanks Eloy, I'll look to get a solution sorted for the next weekly reviews. Cheers Sam
          Hide
          Sam Hemelryk added a comment -

          Hi Eloy,

          Can you please review my purposed changes:

          Repo: git://github.com/samhemelryk/moodle.git

          Branch: wip-MDL-24355-master
          Diff: https://github.com/samhemelryk/moodle/compare/master...wip-MDL-24355-master

          Branch: wip-MDL-24355-MOODLE_20_STABLE
          Diff: https://github.com/samhemelryk/moodle/compare/MOODLE_20_STABLE...wip-MDL-24355-MOODLE_20_STABLE

          Branch: wip-MDL-24355-MOODLE_19_STABLE
          Diff: https://github.com/samhemelryk/moodle/compare/MOODLE_19_STABLE...wip-MDL-24355-MOODLE_19_STABLE

          Cheers
          Sam

          Show
          Sam Hemelryk added a comment - Hi Eloy, Can you please review my purposed changes: Repo: git://github.com/samhemelryk/moodle.git Branch: wip- MDL-24355 -master Diff: https://github.com/samhemelryk/moodle/compare/master...wip-MDL-24355-master Branch: wip- MDL-24355 -MOODLE_20_STABLE Diff: https://github.com/samhemelryk/moodle/compare/MOODLE_20_STABLE...wip-MDL-24355-MOODLE_20_STABLE Branch: wip- MDL-24355 -MOODLE_19_STABLE Diff: https://github.com/samhemelryk/moodle/compare/MOODLE_19_STABLE...wip-MDL-24355-MOODLE_19_STABLE Cheers Sam
          Hide
          Andrew Davis added a comment -

          Ive pulled these branches into my repository. Ive added 1 new commit to both master and 20 stable however its just a few modifications to some comments to make it a bit clearer what is going on. 1.9 stable branch is unchanged from Sam's version. I just pulled it in to mine for the sake of them all coming from the same repository.

          Show
          Andrew Davis added a comment - Ive pulled these branches into my repository. Ive added 1 new commit to both master and 20 stable however its just a few modifications to some comments to make it a bit clearer what is going on. 1.9 stable branch is unchanged from Sam's version. I just pulled it in to mine for the sake of them all coming from the same repository.
          Hide
          Andrew Davis added a comment -

          Just noticed this. Fixing it now. If there are no correlations the code that tries to delete correlations that no longer exists gives this error:

          Coding error detected, it must be fixed by a programmer: moodle_database::get_in_or_equal() does not accept empty arrays !!!
          !! Stack trace: * line 581 of /lib/dml/moodle_database.php: coding_exception thrown

          • line 887 of /tag/lib.php: call to moodle_database->get_in_or_equal()
          • line 924 of /tag/lib.php: call to tag_compute_correlations()
          • line 312 of /lib/cronlib.php: call to tag_cron()
          • line 79 of /admin/cron.php: call to cron_run()
          Show
          Andrew Davis added a comment - Just noticed this. Fixing it now. If there are no correlations the code that tries to delete correlations that no longer exists gives this error: Coding error detected, it must be fixed by a programmer: moodle_database::get_in_or_equal() does not accept empty arrays !!! !! Stack trace: * line 581 of /lib/dml/moodle_database.php: coding_exception thrown line 887 of /tag/lib.php: call to moodle_database->get_in_or_equal() line 924 of /tag/lib.php: call to tag_compute_correlations() line 312 of /lib/cronlib.php: call to tag_cron() line 79 of /admin/cron.php: call to cron_run()
          Hide
          Andrew Davis added a comment -

          Error fixed in all 3 branches. I just need to write some testing instructions and this can be submitted for integration (I think).

          Show
          Andrew Davis added a comment - Error fixed in all 3 branches. I just need to write some testing instructions and this can be submitted for integration (I think).
          Hide
          Eloy Lafuente (stronk7) added a comment -

          Ho, some comments about this before integration could happen:

          1) +1 to create tag/simpletest/testtag.php and work some unit tests creating needed tables and running tag_cron() (and possibly other bits in that library. I think it is the only way to check everything continues working as expected. I think we cannot do this right now, but would suggest to create new issue at least for 2.0 and upwards.
          2) tag_delete(): I can see that the function deletes records from tag_instance, tag_correlation and tag tables and that's correct. Just guessing what happens with records in tag_correlation having the tag being deleted in the correlatedtags column. Is the rest of the tag stuff (display...) ready to find non-existing tags in that column? Note that looking for the tag there can be really expensive (it is a text field), so surely the way to go is to "auto-fix" on display or something like that. For now all we need is to verify that nothing breaks if orphan tags are present in the column. New issue for the "auto-fix" thingy is ok IMO.
          3) Also, conditions + delete_records() calls have no sense at all, the method always return boolean true since 2.0. I've created MDL-27841 for review and fix all the current uses.
          4) The deletion of not identified correlations is done by BIG get_in_or_equal() use. Not sure what will happen if there are, say, 10000 correlations to delete. There are limits in the size of the SQL / number of params. Uhm, perhaps we should put some limit known as supported by all DBs and let that work "incrementally". Thoughts?
          5) There is one query with comment: "// this is (and has to) return the same fields as the query in tag_get_tags" and it seems that we have changed it and now "ti.ordering" is out. Or the comment is wrong of the change is wrong.
          6) As far as 1) above does not exist.. I'd suggest to add some more detailed testing instructions (which tags to introduce), what to expect and so on. For your consideration. Note this will need testing in 1.9, 2.0 and 2.1dev.

          I'm keeping this under integration for now, til the points above are created / commented / fixed. Thanks and ciao

          Show
          Eloy Lafuente (stronk7) added a comment - Ho, some comments about this before integration could happen: 1) +1 to create tag/simpletest/testtag.php and work some unit tests creating needed tables and running tag_cron() (and possibly other bits in that library. I think it is the only way to check everything continues working as expected. I think we cannot do this right now, but would suggest to create new issue at least for 2.0 and upwards. 2) tag_delete(): I can see that the function deletes records from tag_instance, tag_correlation and tag tables and that's correct. Just guessing what happens with records in tag_correlation having the tag being deleted in the correlatedtags column. Is the rest of the tag stuff (display...) ready to find non-existing tags in that column? Note that looking for the tag there can be really expensive (it is a text field), so surely the way to go is to "auto-fix" on display or something like that. For now all we need is to verify that nothing breaks if orphan tags are present in the column. New issue for the "auto-fix" thingy is ok IMO. 3) Also, conditions + delete_records() calls have no sense at all, the method always return boolean true since 2.0. I've created MDL-27841 for review and fix all the current uses. 4) The deletion of not identified correlations is done by BIG get_in_or_equal() use. Not sure what will happen if there are, say, 10000 correlations to delete. There are limits in the size of the SQL / number of params. Uhm, perhaps we should put some limit known as supported by all DBs and let that work "incrementally". Thoughts? 5) There is one query with comment: "// this is (and has to) return the same fields as the query in tag_get_tags" and it seems that we have changed it and now "ti.ordering" is out. Or the comment is wrong of the change is wrong. 6) As far as 1) above does not exist.. I'd suggest to add some more detailed testing instructions (which tags to introduce), what to expect and so on. For your consideration. Note this will need testing in 1.9, 2.0 and 2.1dev. I'm keeping this under integration for now, til the points above are created / commented / fixed. Thanks and ciao
          Hide
          Andrew Davis added a comment - - edited

          1) I've raised MDL-27859 to do the unit tests.
          2) I cant guarantee that having a deleted tag id appear in the list of correlated tags for non-deleted tag won't cause problems. However, i havent been able to make a problem occur. I've raised MDL-27861 to look further into this.
          3) ok.
          4) Im having trouble finding out what these limits are. Some guy on the net says its 1000 for Oracle. No link supporting that. Apparently SQL Server's limit is based on memory use. I found some guy saying he had a query working against his SQL server with 50000 although much above that and he started getting an "out of memory" type exception. All of this is little better than anecdotal. Needs more investigating.

          Show
          Andrew Davis added a comment - - edited 1) I've raised MDL-27859 to do the unit tests. 2) I cant guarantee that having a deleted tag id appear in the list of correlated tags for non-deleted tag won't cause problems. However, i havent been able to make a problem occur. I've raised MDL-27861 to look further into this. 3) ok. 4) Im having trouble finding out what these limits are. Some guy on the net says its 1000 for Oracle. No link supporting that. Apparently SQL Server's limit is based on memory use. I found some guy saying he had a query working against his SQL server with 50000 although much above that and he started getting an "out of memory" type exception. All of this is little better than anecdotal. Needs more investigating.
          Hide
          Andrew Davis added a comment - - edited

          #5, Im not sure. tags and tag instances are not the same thing but looking at the code they are treated as being interchangeable.

          if ( !$result = $DB->get_records_sql("SELECT DISTINCT tg.id, tg.tagtype, tg.name, tg.rawname, tg.flag, ti.ordering
          FROM {tag} tg INNER JOIN {tag_instance} ti ON tg.id = ti.tagid
          WHERE tg.id IN ({$tag_correlation->correlatedtags})") ) {
          

          This is the original query. It looks wrong. You'll get multiple rows per tag by joining with the tag_instance table. The addition of "distinct" looks like a half baked attempt to avoid this. But its the tag_instance table that has the ordering column as a tag instance has an ordering. A tag does not.

          Update: Ive put the old query back in although it still looks wrong. I think there probably should be two functions, tag_get_tags() and tag_get_tag_instances(). The function tag_get_tags() that currently exists should actually be called tag_get_tag_instances(). and tag_get_correlated() has nothing to do with it. MDL-27864 to investigate more.

          Show
          Andrew Davis added a comment - - edited #5, Im not sure. tags and tag instances are not the same thing but looking at the code they are treated as being interchangeable. if ( !$result = $DB->get_records_sql("SELECT DISTINCT tg.id, tg.tagtype, tg.name, tg.rawname, tg.flag, ti.ordering FROM {tag} tg INNER JOIN {tag_instance} ti ON tg.id = ti.tagid WHERE tg.id IN ({$tag_correlation->correlatedtags})") ) { This is the original query. It looks wrong. You'll get multiple rows per tag by joining with the tag_instance table. The addition of "distinct" looks like a half baked attempt to avoid this. But its the tag_instance table that has the ordering column as a tag instance has an ordering. A tag does not. Update: Ive put the old query back in although it still looks wrong. I think there probably should be two functions, tag_get_tags() and tag_get_tag_instances(). The function tag_get_tags() that currently exists should actually be called tag_get_tag_instances(). and tag_get_correlated() has nothing to do with it. MDL-27864 to investigate more.
          Hide
          Andrew Davis added a comment -

          6) testing instructions expanded.

          Show
          Andrew Davis added a comment - 6) testing instructions expanded.
          Hide
          Andrew Davis added a comment -

          Linked all of the issues Ive raised to this one.

          Show
          Andrew Davis added a comment - Linked all of the issues Ive raised to this one.
          Hide
          Andrew Davis added a comment -

          I think this is all done except for the issue of how many tag correlations can we delete in one query.

          Show
          Andrew Davis added a comment - I think this is all done except for the issue of how many tag correlations can we delete in one query.
          Hide
          Eloy Lafuente (stronk7) added a comment -

          Ho Andrew,

          agree with everything, specially:

          5) let's keep the query as it was and analyze the differences @ MDL-27864 to investigate properly.
          4) about limits, perhaps it's really hard to get them in real life. I can imagine correlations will be progressively created / deleted so each cron will process "some" but hardly thousands (only some sort of massive deletion of courses or so could produce that in real life). So +1 for keep it as is and only re-thought the deletion if the problem happens.

          Thanks!

          Show
          Eloy Lafuente (stronk7) added a comment - Ho Andrew, agree with everything, specially: 5) let's keep the query as it was and analyze the differences @ MDL-27864 to investigate properly. 4) about limits, perhaps it's really hard to get them in real life. I can imagine correlations will be progressively created / deleted so each cron will process "some" but hardly thousands (only some sort of massive deletion of courses or so could produce that in real life). So +1 for keep it as is and only re-thought the deletion if the problem happens. Thanks!
          Hide
          Eloy Lafuente (stronk7) added a comment -

          Sad, I was going to integrate and suddenly detected that the merge from 20_STABLE had some problems, not modifying the tag_get_correlated() query as agreed by 5). Curiously the diff @ github shows the change but the merge doesn't apply it.

          So I'm failing this, plz fix your branches or whatever is necessary. I'll be happy to integrate this once that's solved.

          Thanks!

          Show
          Eloy Lafuente (stronk7) added a comment - Sad, I was going to integrate and suddenly detected that the merge from 20_STABLE had some problems, not modifying the tag_get_correlated() query as agreed by 5). Curiously the diff @ github shows the change but the merge doesn't apply it. So I'm failing this, plz fix your branches or whatever is necessary. I'll be happy to integrate this once that's solved. Thanks!
          Hide
          Andrew Davis added a comment -

          Ive created a new 2.0 stable branch and cherry picked the commits. If it doesn't work this time around I'm not sure what is going on.

          Show
          Andrew Davis added a comment - Ive created a new 2.0 stable branch and cherry picked the commits. If it doesn't work this time around I'm not sure what is going on.
          Hide
          Eloy Lafuente (stronk7) added a comment -

          Aha, now branches seemed to apply the correct changes, thanks! (I think my problems were because of the wip- and non wip- differences or so when merging here).

          Show
          Eloy Lafuente (stronk7) added a comment - Aha, now branches seemed to apply the correct changes, thanks! (I think my problems were because of the wip- and non wip- differences or so when merging here).
          Hide
          Eloy Lafuente (stronk7) added a comment -

          tested under 19_STABLE, 20_STABLE and master, creating correlations, deleting tags, deleting correlations, running the cron as specified above. Everything seems to be ok.

          So passing, ciao

          Show
          Eloy Lafuente (stronk7) added a comment - tested under 19_STABLE, 20_STABLE and master, creating correlations, deleting tags, deleting correlations, running the cron as specified above. Everything seems to be ok. So passing, ciao
          Hide
          Eloy Lafuente (stronk7) added a comment -

          And now this is part of the best Moodle weeklies ever, thanks!

          Closing.

          Show
          Eloy Lafuente (stronk7) added a comment - And now this is part of the best Moodle weeklies ever, thanks! Closing.

            People

            • Votes:
              4 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: