|
[
Permalink
| « Hide
]
Mathieu Petit-Clair added a comment - 12/Feb/08 12:28 PM
This cron job is actually in the code, but there was a problem in the query, along with the fact that blog posts have an itemtype of "blog" while they are saved in table "post".
The code now remove old tags as they become "unused", which will prevent this problem from happening.
I'm leaving this ticket open though, because the current code doesn't clean up what was left before. I just committed a new tag_cleanup() function that cleans up the tag database. This should fix both this issue, MDL-1544 and probably
I didn't make it automatic now: this is potentially destructive. I tested this extensively, but I would like other people to review this patch and post comments, before calling this function from the cron. To test: Eloy, can you have a look at the code and tell me if there's a horrible error in there? Otherwise, it's going in (and in the cron) before 1.9.1, hopefully.
Hi Mathieu,
the tag_cleanup() looks nearly perfect IMO, some comments: 1) The first $tags = get_records('tag'): I will delete (can become really BIG) it and perform record_exists() while iterating. SELECT tg.id or, alternatively: SELECT tg.id Not tested! But any of them will avoid the iteration and executing one query for each tag. Hope this helps, ciao Muchas gracias Eloy! New diff: http://cvs.moodle.org/moodle/tag/lib.php?r1=1.43.2.34&r2=1.43.2.35
Looks perfect now! Great job. Ciao
What is the status, can we resolve this as fixed?
oh, the performance of tag_cleanup() still seems less than optimal, looking at it...
Sending untested patch, the idea is that if no cleanup done the number of queries should be constant (does not depend on number of tags or instances).
Also we should always try to use LEFT JOIN instead of "AND NOT EXISTS (SELECT..)" - the time difference can be in orders of magnitudes when dealing with large tables. I think that with this type of changes the tag cleanup could be done in each cron invocation. I did not test the patch at all because my tag* tables are empty and I am no tag expert Thanks Petr, I'll do tests and commit this.
Uhm... when was reviewing this bug... initially I thought about to change it in a similar way than the proposed by Petr's patch, but finally decided to keep it that way for better readability. But agree that, in the long term, could be better to have it working as suggested. So +1 for that.
IMO, to keep it working in the 20% section is ok, because with current (solid) API is difficult to have those "orphan" instances anymore (in fact the original bug was about to make the clean as part of the upgrade process, only once, and not in cron). And then this:
Don't agree at all! I think they are 99.9% the same under all DBs, in fact NOT EXISTS will work better under some circumstances (not indexes in join column, for example). And it's far more readable. Real differences happen with "AND NOT IN (SELECT)" because the IN clause is executed per-each record! BUT NOT EXISTS is, as commented, perfect, IMO. Ciao Thanks ELoy, I guess I hit the 0.1% accidentally with some weird subquery in stats and also mixed it up with "NOT IN" which was causing more trouble (or more likely I did some wrong measuring).
Do we have this "NOT IN" and "NOT EXISTS" in coding guide somewhere? Maybe we should add a new wiki page with tips for writing of efficient sql queries Moodle-round-the-clock did it again.
It would be amazingly great to have a guide to writing efficent sql in Moodle, especially with all the JOINs that are in the code. I guess you two should be the ones writing it.. Closing... I'm really empty of ideas about what to write there. Perhaps we should start annotating them as they happen, hehe.
|
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||