Details
-
Type:
Sub-task
-
Status:
Closed
-
Priority:
Major
-
Resolution: Fixed
-
Affects Version/s: 1.9
-
Fix Version/s: 1.9
-
Component/s: Roles / Access
-
Labels:None
-
Affected Branches:MOODLE_19_STABLE
-
Fixed Branches:MOODLE_19_STABLE
Description
build_context_path(true) is called from the upgrade to populate the context table.
This sort of SQL breaks MySQL:
(mysql): UPDATE mdl_context SET depth=it.pdepth+1, path=CONCAT(it.ppath,'/',id) FROM (SELECT c.id AS instanceid, pctx.path AS ppath, pctx.depth as pdepth FROM mdl_course c JOIN mdl_context pctx ON (c.category=pctx.instanceid AND pctx.contextlevel=40) WHERE c.id != 1) it WHERE contextlevel=50 AND mdl_context.instanceid=it.instanceid 1064: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'FROM (SELECT c.id AS instanceid, pctx.path AS ppath, ' at line 3
adodb_mysql._execute(UPDATE mdl_context
SET depth=it.pdepth+1, path=CONCAT(it.ppath,'/',id)
FROM (SELECT c.id AS in..., false) % line 891, file: adodb.inc.php
adodb_mysql.execute(UPDATE mdl_context
SET depth=it.pdepth+1, path=CONCAT(it.ppath,'/',id)
FROM (SELECT c.id AS in...) % line 89, file: dmllib.php
execute_sql(UPDATE mdl_context
SET depth=it.pdepth+1, path=CONCAT(it.ppath,'/',id)
FROM (SELECT c.id AS in..., true) % line 4479, file: accesslib.php
build_context_path(true) % line 8, file: test.php
Attachments
-
- bcp_mysql2.patch
- 19/Sep/07 11:53 PM
- 10 kB
- Petr Škoda (skodak)
-
- moodl_19_perf.diff
- 20/Sep/07 5:56 AM
- 0.4 kB
- Bryce Thornton
Issue Links
Activity
- All
- Comments
- History
- Activity
- Source
- Test Sessions
Does not look good, the MySQL manual clearly states that:
Currently, you cannot update a table and select from the same table in a subquery.
Which means we might have to do recordset looping for mysql family, which could be really slow...
Just to confirm (now that I've understood what the UPDATE does):
It's (should be) impossible to update records in one table based in other records of the same table. 100% sure. It's a matter of immutability and blah, blah,
So possible solutions are (from HQ):
1) PHP Recordset (as Petr commented)
2) Temp table usage to store calculated values: One insert with id, deph+1 and sql_concat(it.ppath,'/',id). And one simple update by id (PK).
2 is the recommended way for large sets (like this).
Committed temporary patch into cvs, it is ugly but works both with pq and mysql.
Assigning to Martín, please review and improve the use of temporary table or create a new permanent table or rewrite it completely ![]()
Thanks, Petr - good job.
Bringing down from critical because the urgency is off. Still needs more review though.
Eloy discovered that the current create_temp_table() code can not work for MSSQL because it prepends the # - which means we can not use the {$CFG->prefix}tablename - I think we should use normal table instead to play it safe for 1.9
also, I think that:
TRUNCATE tablename
isn't cross-db, but:
TRUNCATE TABLE tablename
is.
Looking to create_temp_table() to see if we can make it 100% cross-db... uhm...
Also I would change statements of this type (there are some of them):
UPDATE {$CFG->prefix}context
SET depth=2, path=".sql_concat("'$base/'", 'id')."
WHERE contextlevel=".CONTEXT_USER."
AND instanceid IN
(SELECT id FROM {$CFG->prefix}user)
to:
UPDATE {$CFG->prefix}context
SET depth=2, path=".sql_concat("'$base/'", 'id')."
WHERE contextlevel=".CONTEXT_USER."
AND EXISTS
(SELECT 'x' FROM {$CFG->prefix}user WHERE user.id = {$CFG->prefix}context.instanceid)
(not tested!)
But, generally, the EXISTS should perform better than IN
Ciao ![]()
And, one detail. When using that "JOIN" syntax... IMO the only conditions present in the "ON" clause must be those performing the join, while the rest must go to the WHERE part.
BTW why are we checking he emptiness of path? we could instead use the depth field == 0 - no ora hack + could be faster
also why not declare the path as nullable?
I found a small bug in build_context_path while updating from 1.8.2+. I'll attach a patch.
This is the patch for the small bug I found in build_context_path.
And +1 for NULL in path instead of empty.
NULL is cross-db (though will imply to detect - and modify - all the places in code where we are relying in those empties).
Awake and reviewing this now. Thanks Petr for the fix.
Temptable prefixes:
We need to fix the temptable thing across DB. The temptable name having a # is a bit of a problem with Oracle because the # will conflict with table prefixes in our coding style – we mix calls to functions that could do tablename magic like get_records() and straight SQL like SELECT ... FROM {$CFG->prefix}foo.
One way to "fix" it is to return the tablename in the create temp table call, and say that we don't support table prefixes in Oracle. If there are no prefixes, then get_records($temptable) and SELECT ... FROM {$CFG->prefix}$temptable both work perfectly well with $temptable = '#foobar'
EXISTS vs WHERE id IN():
Yes - EXISTS is faster, I was avoiding it because I thought it was less portable, and it's a bit more magic and limited than WHERE {CONDITION}. If Eloy confirms that EXISTS is more portable and predictable... great!
NULLS:
I don't know. Are we NULL-safe these days? If so, NULLs are ok.
Checking for depth instead of path:
Yes - completely agreed.
I've committed a fix that makes things much faster and simpler in Pg and MySQL. After a bit more sleuthing, it looks like the UPDATE SQL queries, to be fast and reasonably efficient, need to be split actross 2 families:
PG & MSSQL support using FROM
UPDATE tbl
SET col = ...
FROM tbl
JOIN othertbl ...
whereas MySQL and Oracle go down the track of listing the tables in the UPDATE
UPDATE carts c, prices p
SET c.format_code = p.format_code
WHERE c.cart_item_id IN
(SELECT t.cart_item_id from prices r, carts t
WHERE r.<price_table_key> = p.<price_table_key>
AND blah blah blah.....)
The latest change to accesslib
http://moodle.cvs.sourceforge.net/moodle/moodle/lib/accesslib.php?r1=1.394&r2=1.395
broke MySQL again (going back one revision let it upgrade correctly). Here is the error I had:
UPDATE mdl_context SET path = ct.path, depth = ct.depth FROM mdl_context_temp ct WHERE ct.id=mdl_context.id
You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'FROM mdl_context_temp ct WHERE ct.id=mdl_context.id' at line 4
1064: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'FROM mdl_context_temp ct WHERE ct.id=mdl_context.id' at line 4
With Petr's changes (and mine on top), the upgrade of test.moodle.com works well for me. Good.
We do have a few pending bits
- sanity check of the UPDATE SQL as discussed above on MSSQL/Oracle
- temp table handling, specially wrt the '#' on Oracle
I've also drafted a "populate_contexts()" function, but I am thinking this through – from my tests, we don't actually need it.
- sanity check of the UPDATE SQL as discussed above on MSSQL/Oracle
- temp table handling, specially wrt the '#' on Oracle
Fixed MD's error with a switch from dbtype to dbfamily when picking what SQL to use.
Four minor note:
- I really think that latest changes on $updatesql = ... leave Oracle 100% out. Missing $dabfamily == ?
- Why the "mysql" one has such a a excess of prefixes?
- Also, IMO, no one of those updates will work under Oracle. It doesn't support FROMs without SELECTs (1st update) not multiple tables in the UPDATE (2nd update). But it supports multiple fields in an statement, something like (not tested!):
$updatesql = "UPDATE {$CFG->prefix}context ct
SET (ct.path, ct.depth) =
(SELECT temp.path, temp.depth
FROM {$CFG->prefix}$temptable
WHERE temp.id=ct.id";
- Perhaps it would be good to launch additional "TRUNCATE TABLE temptable" statements between new inserts? I will make the UPDATEs lighter, IMO.
- I really think that latest changes on $updatesql = ... leave Oracle 100% out. Missing $dabfamily == ?
- Why the "mysql" one has such a a excess of prefixes?
- Also, IMO, no one of those updates will work under Oracle. It doesn't support FROMs without SELECTs (1st update) not multiple tables in the UPDATE (2nd update). But it supports multiple fields in an statement, something like (not tested!):
- Perhaps it would be good to launch additional "TRUNCATE TABLE temptable" statements between new inserts? I will make the UPDATEs lighter, IMO.
I'm reopening this (knowing that, in teory, is working against all DBs but Oracle and that temp table support under MSSQL is pending... will check later...
I agree with Eloy - I don't think the current SQL works well in Oracle – in real life. However, my understanding is that
- we can get an UPDATE that works in Oracle and MySQL
- we can get an UPDATE that works in MSSQL and Pg
but I'll need Eloy's confirmation...
- we can get an UPDATE that works in Oracle and MySQL
- we can get an UPDATE that works in MSSQL and Pg
looking at syntax scheme for Oracle http://download-uk.oracle.com/docs/cd/B14117_01/server.101/b10759/statements_10007.htm#SQLRF01708
it looks like we could use:
- either mysql style (watcom?) with two tables after UPDATE
- or SET () = (SELECT) which is not compatible with PG
- either mysql style (watcom?) with two tables after UPDATE
- or SET () = (SELECT) which is not compatible with PG
Sorry guys. Am I forgetting anything?
Why is it so important to get one update that works both under Oracle and MySQL, ML ? Isn't only a matter of specify one more $CFG->dbfamily condition and nothing else?
Or where do you read in Oracle SQL Ref the two tables can be used in the UPDATE statement, Petr? I'm 95% sure that they cannot.
Sincerely, I would use the approach described above:
$updatesql = "UPDATE {$CFG->prefix}context ct
SET (ct.path, ct.depth) =
(SELECT temp.path, temp.depth
FROM {$CFG->prefix}$temptable temp
WHERE temp.id=ct.id)
WHERE EXISTS
(SELECT 'x'
FROM {$CFG->prefix}$temptable temp
WHERE temp.id = ct.id);
I guess that Oracle will need an extra EXISTS clause (present in the above sql) to update ONLY records with temp counterparts. The rest of DBs, with their own UPDATE syntaxes should be restricting ok to matching records only (I hope).
So, we'll end with one SQL for MySQL, another to both PG and MSSQL and another to Oracle. Yep, nop ?
you are right Eloy, I was not reading the schema properly - grrr - sorry
I was thinking about using four separate updates with subselects for "other databases" than pg and mysql.
Maybe a stupid question, why not merge those statements into several bigger semicolon separated queries? Would it work? Would it be faster?
BTW yesterday I was fighting with 2 subqueries in one update statement - it did not work for me in mysql+temptable - is this allowed in Oracle? is this a mysql bug?
Petr:
> why not merge those statements into several bigger semicolon separated queries?
- don't think it will help much – it'll save almost nothing, and it will make debugging harder. Also - not all DB backend drivers allow you to pass many statements.
Eloy:
> Why is it so important to get one update that works both under Oracle and MySQL, ML ?
Oh, ideally, I'd want to have just one SQL update for all of them. The less SQL variations we have to maintain, the easier... as long as it's fast...
In that sense, I believe that the original syntax with the subselect that MySQL didn't like is only invalid for MySQL – and much faster elsewhere. Oracle and MSSQL can definitely update from subselects on the same table. Pg is around 25 times faster – I have no idea what the difference is on MSSQL and Oracle.
That would mean one code flow using the temp table for MySQL, one codeflow for everyone else using a subselect. Which nicely sidesteps our temp table issues with Oracle...
- don't think it will help much – it'll save almost nothing, and it will make debugging harder. Also - not all DB backend drivers allow you to pass many statements.
Eloy - does the syntax you propose work everywhere? If it does, +1!
$updatesql = "UPDATE {$CFG->prefix}context ct
SET (ct.path, ct.depth) =
(SELECT temp.path, temp.depth
FROM {$CFG->prefix}$temptable temp
WHERE temp.id=ct.id)
WHERE EXISTS
(SELECT 'x'
FROM {$CFG->prefix}$temptable temp
WHERE temp.id = ct.id);
Sounds like a good plan to me:
- special version with temp table+UPDATE with two tables for MySQL
- original version from ML for all other db families
- special version with temp table+UPDATE with two tables for MySQL
- original version from ML for all other db families
Hi, the Oracle variation above is Oracle only! (and I think it's impossible to have the same update working against all DBs unless using two subqueries). So I would vote to keep the three different updates, after all, each one is the "best for each DB" and any other attempt will be less than optimal. So +1 for the 2 updates.
So, sumarizing, current updates in CVS (MySQL, PG and MSSQL) are working fine. Absolutely! I've tested that:
- MySQL works fine (and the syntax itself is clever enough to update only the required records).
- PostgreSQL and MSSQL work fine with the same statement (and the syntax itself is clever enough to update only the required records).
So I really think that we only need to:
1) Add the Oracle variation above (that works and has the clever harcoded
).
2) Fix the temp tables generation once and forever (I think that the problematic is the MSSQL one, with that horrendous # prefix). Tomorrow I'll try to add full support for temp tables properly from XMLDB in a cross-db fashion, ok? I've designed the basics today.
And I would propose also this, to your consideration:
3) Truncate the temp table after each insert & update iteration. I will make the update to be really lighter. Currently we are "repeating" the same records update a lot of times (at least for courses categories, because previous iteration records continue there).
4) Test it again under all DBs
So, if nobody is against 2) I get it (tomorrow).
- MySQL works fine (and the syntax itself is clever enough to update only the required records).
- PostgreSQL and MSSQL work fine with the same statement (and the syntax itself is clever enough to update only the required records).
Petr - thanks!
Eloy - Thumbs up for 1, 2 & 3 ( I had seen your proposal to truncate earlier, and forgot to reply - I think I removed the truncates yesterday, it was a mistake!)
I'm a bit bogged down today/tomorrow - but this is looking good!
Well, I updated moodle.org and guess what - the mysql user that site was using didn't have access for temporary tables (doh) and the upgrade failed.
This is probably not uncommon, so can I suggest we either:
1) Test for this ability before we use it (like in the environment check).
2) Create this same table as a permanent table (usually empty)
3) Create a generic permanent table that can be used by different bits of code as a temporary space, something like:
id
sessionid
data1
data2
data3 .... etc
I like (3) most , (2) a little less, and (1) the least.
Oki. What about:
1) Create a PERMANENT mdl_context_temp table (after all, this is going to be used continuously).
2) Concurrency will be avoided by changing slightly the INSERTS to avoid duplicates (duplicate UPDATES shouldn't be a problem)
3) Drop temp tables buggy support from ddllib? Or implement it for 2.0 (new bug)
4) Add the Oracle above.
5) Add the missing TRUNCATES
6) Test/test/test
Say YES and I'll do 1-6. Ciao ![]()
Done. Now the permanent table is installed and being used by build_context_paths()
Anyway I've found this:
SQL broken: Operand type clash: ntext is incompatible with int
SELECT name, value FROM mdl_config_plugins WHERE plugin='accesslib/dirtycontexts' AND value > (1190364087 - 2)
I guess same will happen with Oracle. Could we change that to one PHP resultser iteration instead?
After some research and talk in HQ chat:
It seems that all the uses of that 'accesslib/dirtycontexts' is pretty well centralized in three functions:
- get_dirty_contexts() to get all the currently dirty contexts.
- mark_context_dirty() to mark one context as diry.
- cleanup_dirty_contexts() to unmark dirty plugins
I would propose to move from the 'config_plugins' table to the 'contexts' one itself (it's the natural place, IMO)
Not sure if it can be performed with a simple (0/1) flag (it seems that some 'timeout' concept is in the whole thing). I guess that, instead, to preserve that behaviour, we must:
1) Add one 'timedirty' field to the 'context' table. Integer10 and indexed!
2) Modify the 3 functions above to work against the context table.
It should provide us, at least:
1) cross-db. Always interesting ![]()
2) Speed, by going against one numeric & indexed field, instead of a text-unindexed one.
To your consideration. Ciao ![]()
- get_dirty_contexts() to get all the currently dirty contexts.
- mark_context_dirty() to mark one context as diry.
- cleanup_dirty_contexts() to unmark dirty plugins
Hi all – I hadn't seen the activity on this bug earlier today, sorry! Ok - down to business
- I don't mind moving this to a dedicated temp table, however, let's not drop temptable support from ddllib. As we move to more batched SQL work, the need for temptables will only grow... including tricks like creating the temp table, populating it and adding indexes after all the inserts...
- Context-dirtyness: we fix the "time - 2" to php, and make it a string comparison as Eloy suggested. I tried to fix things to not need CASTs and I didn't fix it completely.
Moving "dirty contexts" from config_plugins to context table will have a huge cost, and specially since we want to read the dirty contexts on each pageload, this should be cheap cheap cheap.
The context table is huge, and the dirty contexts will always be a few, and changing lots. So the index for dirtty on the context table will be huge on disk, have high costs updating, and the statistics will be off most of the time. And actually, we don't need that info when reading the contexts table either. We need it for contexts we might not even read directly (example: if you are checking a cap in a module instance context, you probably never read the coursecat contexts above it, as accessdata has all you need... but if one of those coursecats is dirty, that should trigger an accessdata reload, so it makes sense to decouple the data).
A small table can deal with a lot of movement at a much lower cost because the indexes are really small. config_plugins is great for this – as it fits comfortable in the memory of the DB server, and its whole index can probable be read and written-to with a single disk-seek.
[ I gave this matter quite a bit of thought and testing before deciding on using config_plugins ]
In fact, there's something we can do to make it cheaper: use a funky bit of SQL (with a UNION ALL) to read both config and config_plugins->dirtycontexts in one go during lib/setup.php. That will save us 1DBq per pageload...
- I don't mind moving this to a dedicated temp table, however, let's not drop temptable support from ddllib. As we move to more batched SQL work, the need for temptables will only grow... including tricks like creating the temp table, populating it and adding indexes after all the inserts...
- Context-dirtyness: we fix the "time - 2" to php, and make it a string comparison as Eloy suggested. I tried to fix things to not need CASTs and I didn't fix it completely.
Thinking about it some more, I remembered that having dirty paths decoupled from the contexts table is also important because we can mark dirty paths that are no longer there, or that have moved. Have a look at context_moved() to see what I mean – and also delete_context() of course.
This is very important - as it allows us to force the reload of accessdata for users that are logged in when courses/coursecats are moved around.
Aha,
so, you need to be able to store dirty contexts somewhere out from the context table itself because of deleted contexts. Oki. And they are only a few. Perfect!
I really like the idea of loading all the "accesslib/dirtycontexts" from setup.php (agreeing it can be performed in one UNION ALL statement to save one query I would have it separated in one function to have it well encapsulated) , then handle them in PHP (the time condition), updating/deleting from DB as necessary.
+1 for that
Great! One minor thing I'm not sure we've understood eachother (spanish speakers trying to understand eachother in English!? WTF, we haveto teach these infidels some spanish)
When you say:
> then handle them in PHP (the time condition),
what I think will be fastest is to prepare in PHP the time condition, but to "enforce it" in the WHERE clause of the SQL query. In both cases the row has to be read into the DB memory (it'll only be a few rows at any given time!), but it's faster to do the > on the DB side. Is that what you are saying...?
(this is important mainly in the every-page query, in cron.php we could do it with an abbaccus...)
Hehe,
what I was thinking was this:
1) NEW function: load_dirtycontexts(), executed in setup.php, that will load ALL 'accesslib/dirtycontexts' records and store them for the life of the REQUEST.
2) Change get_dirty_contexts() to work 100% in memory. Could fallback to DB if necessary for cron although I'd add 1) to cron too.
3) Change mark_context_dirty() to perform the mark both in REQUEST and DB.
4) Change cleanup_dirty_contexts() to delete from REQUEST and from DB.
That way we won't have to execute those casted, inequalities... against DB at all, so cross-db will be easier and, hopefully, quicker.
That was my idea, summarized, with the phrase "then handle them in PHP (the time condition)" above, that's Spanish :-D :-P :-D
Hmmmm.
1 - We need to update the "$CFG = get_config();"line in lib/setup.php with something that populates $CFG and $DIRRTYPATHS in one SQL query, retrieving the least DB rows. So we can use a UNION ALL, and the dirty contexts subselect must say WHERE value > '$timecutoff'. This is the cheapest way.
Why push the work to the DB? Because the dirty paths will never be many (say, between 0 and 3 or 4) for each user if we use the cutoff, because as soon as the user "hits" a dirty path accessdata is reloaded. But those entries "live" in the table for 4hs - and on a busy site that could add up to a few hundred entries. The number is low enough to not need an index or a dedicated table, and to be fast to do in an in-memory-scan on the database side but it's quite a bit of data to move between DB and PHP just to do the exact same comparison on PHP.
So
- do the '<' work on the DB side – it's faster!
- avoid transferring data rows we don't need!
2 - get_dirty_contexts() must disappear! No longer needed. T
3 - Yes, it's a good idea to have mark_context_dirty() update $DIRTYCONTEXTS
4 - cleanup_dirty_contexts() doesn't need to do it from the request – there'll never be any overlap. It will be deleting stuff over 4hs old that will not be in the $DIRTYPATHS, ever. Way too old.
To summarise:
- preserve original strategy, it pushes the job to the DB and moves the least data between DB and PHP
- fix my error in cleanup_dirty_contexts()
- do #1 to save 1DBq
- do the '<' work on the DB side – it's faster!
- avoid transferring data rows we don't need!
- preserve original strategy, it pushes the job to the DB and moves the least data between DB and PHP
- fix my error in cleanup_dirty_contexts()
- do #1 to save 1DBq
Actually - #3 will bring a bit of trouble methinks.
Any HTTP request that calls mark_context_dirty() must either finish soon (redirect saying "yeah, the job is done") which is what we do now – OR do the reload of accessdata itself if it's long-lived like cron.php . Do we have anything in cron.php that calls mark_context_dirty()?
One of the nasty things of DIRTYPATHS is that it's got a granularity of 1s – actually of 3s when we look at the DB. Which means that during a single long-lived process (cron) we could end up calling reload_all_capabilities() a billion times before the 1s ticks over.
If we document that the semantics the way I've described it above ("finish soon, or do the homework"), it also means taht the caller is responsible for making sure that the reload happens once at the end of all the muddying of the paths, rather than hitting reload_all_capabilities (with its 3 DBq cost) a few hundred times.
Hmm, I don't understand why config_plugins should be used here at all.
Firstly, these are not config variables for any plugin. Using that table to pass these temporary flags between concurrent sessions is just not right semantically. Imagine if every developer who needed to do something similar used the table as a place for scratch data, it would rapidly become large and full of all kinds of stuff.
Secondly, it's a generic table ... integers are being stored in a text field.
I think this stuff should go in a new custom table (context_dirty, say) with one (fast and very db-cacheable) query.
yep, I must recognize that I was getting 100% lost with latest messages ... one specific table to store the dirty contexts can be the best. With pure speed and cross-db ints and so on.
Agree.
When I wrote the DIRTYPATHs thing I did think of a dedicated table, but in the end I didn't. A bit because I'm lazy (w00t!) and a bit because I think we need a general table where we can store values & timestamps.
The dirty paths are a case of "time-sensitive" flags. We also have some locks that are time sensitive in the code. And if we are doing more batch / bulk processing, we'll need this.
So I am thinking - could we add an optional 'timestamp' to config_plugins? Or have a similar table, called "config_flags_locks_and_other_things" with a structure of
-id,
- plugin(or "scope"/"realm"/"whatever"),
- name,
- timestamp
- value
It'd also be useful for the auth/ldap NTLM support patches I'm planning to merge (which currently also use plugin_config, and are time-sensitive)
- plugin(or "scope"/"realm"/"whatever"),
- name,
- timestamp
- value
Oh, really... I think we must end with a solution for this. My "personal requirement" about this is really simple:
"Being 100% cross-db"
just that. Nothing else. If finally we add one new BIGINT field to the config_plugins table or if we create one context_dirtypaths new table, great, go, go, go!
But this should be fixed ASAP (it order to be able to continue testing MSSQL and Oracle - currently broken).
Also, I must vote -1 here for timestamps (date / time fields in general), in case you are talking about them, instead of BIGINTs (like the rest 99.9% of Moodle).
Ciao ![]()
Complete agreement: 100% cross DB is the requirement.
In terms of timestamps vs bigints, I was of course thinking of a "moodle timestamp", a bigint
(... but I have to confess that since mysl v5 is out I have been wondering about what would we need to migrate from storing epochs to storing timestamps, and doing the time/date and TZ work in the DB. It's really hard
but perhaps, one day... )
So - to clarify - I'd be very happy with adding a bigint field to config_plugins (preferred) and also very happy if people want a different generic table for this kind of use (just give me a name for it!).
Also - Eloy, it'd be great to flesh this out... http://moodle.org/mod/forum/post.php?reply=357989
Hehe, while you was witting here, I was replying there! :-D
Nice to see that we were talking about BIGINTs. Now somebody (MD) has to decide about new field or generic new table.
Perhaps, IMO, thinking in other developments needing such type of data, I would give my +0.5 for a new generic "flags_locks_and_other_dirty_things" table, leaving config_plugins for its proper use.
Ciao ![]()
Two quick notes about the "cache_timestamps " proposal above:
1) Perhaps it should contain one int, one varchar and one text field to be able to store different data.
2) And, after a bit of thinking, it seems that we are going to use such table for two really different things:
- As "temp storage". For really temp (whose life is the connection life) information (like the used by auth/enrol plugins).
- As "persistent storage". For non-temp information (whose life isn't the connection life, but some sort of limited life). Like the used in the dirty contexts.
For temp storage, pehaps we need the help of some cron, to delete everything older than, say, 24 hours.
For persistent storage, each API using it must be responsible to keep those records deleted when necessary.
So, we'll need one more field to mark if the record is "temp storage" or "persistent storage" in order to allow cron to know what to delete.
Those are the cons of using one table for two different things (real temp usage and persistent storage) in the same table...
Just to think a bit about that....ciao ![]()
- As "temp storage". For really temp (whose life is the connection life) information (like the used by auth/enrol plugins).
- As "persistent storage". For non-temp information (whose life isn't the connection life, but some sort of limited life). Like the used in the dirty contexts.
+1 for Eloy's idea of including both timestamp and "value" varchar/text field.
And I think we can use it for at least
- "temp storage" the only real idea I have here is to use it for locks
- persistent – but time-limited – storage, like dirty paths and the "mini-session" entries from the new auth/ldap
- configuration entries just like config_plugins - the timestamp is useful because certain configuration changes may trigger costly operations. it us good to be able to say "recompute only X if confguration value Y changed since the last run".
In terms of the cronjobs, each type of data we store there will have its own rules. Some entries will be from plugins/modules that have a cronjob that will take care of clearing it up. Others we may need to add support for a cronjob – I think we are covered with modules, auth and enrol... (do blocks have a cron() method?)
The name cache_timestamps is alright, I'll try to see I if I can think of a clearer alternative - but it's hard, as we want to use it for various things :-/
- "temp storage" the only real idea I have here is to use it for locks
- persistent – but time-limited – storage, like dirty paths and the "mini-session" entries from the new auth/ldap
- configuration entries just like config_plugins - the timestamp is useful because certain configuration changes may trigger costly operations. it us good to be able to say "recompute only X if confguration value Y changed since the last run".
I'd really rather keep config variables separate in the config tables. Add a timestamp there if we must.
For this one, which is for limited-time values and flags, how about we call it:
cache_flags
with fields:
id
recordtype
name
value
timestamp (if reserved then timecreated)
Let's try and get this settled by tomorrow ... time's a-wasting.
Yes. In the beginning of a moot right now, but I'm happy with cache_flags (might try to talk MD into calling it "flag" or "cache_flag" - note the singular), and I'll implement it ASAP.
WRT build_context_path() in itself
- subselect-update not happy with mysq
- temp tables very hard to support
--> I''m tempted to revert to the slower-but-works-everywhere version, except that upgrades of large sites are going to hurt. I don't remember if the code uses a loop, if it does, it'll hurt a ton on large sites.
- subselect-update not happy with mysq
- temp tables very hard to support --> I''m tempted to revert to the slower-but-works-everywhere version, except that upgrades of large sites are going to hurt. I don't remember if the code uses a loop, if it does, it'll hurt a ton on large sites.
Martin, since this is so crucial, can you let me know an ETA for this? If you are busy on other things in the next 24 hours I can certainly do it.
Eloy - looking at the temptable you've implemented - there's nothing preventing concurrent usage...?
Quick status update:
- Looks like Eloy & Petr have (a) added a "permanent" temp table and fixed all the SQL to work with it. Are you guys happy with it in terms of cross-db-support? Or do you want me to review & do more work on it...?
- cache_flags table in place (on the git branch till cvs is up again – review time!), plus associated functions in moodlelib
- dirty contexts handling is using cache_flags table now
- better/more consistent sql error handling in dmllib (in git branch). Eloy - do you want to review these?
- Looks like Eloy & Petr have (a) added a "permanent" temp table and fixed all the SQL to work with it. Are you guys happy with it in terms of cross-db-support? Or do you want me to review & do more work on it...?
- cache_flags table in place (on the git branch till cvs is up again – review time!), plus associated functions in moodlelib
- dirty contexts handling is using cache_flags table now
- better/more consistent sql error handling in dmllib (in git branch). Eloy - do you want to review these?
Yep, I think that the build_context_path() should be working properly cross-db using that context_temp permanent table, at leat I tested all the problematic updates and were working under all DBs.
I think that I only modified the updates of depth and path (to be cross-db) and the inserts into context_temp table (to avoid duplicates in case of concurrency - that should never happing - but prevented them).
I would suggest you a quick review to it, anyway, to check that everything is in place.
About the rest, cool! If you want, paste here the link to dmllib changes under git to take a look to it (although sure it's perfect).
Ciao ![]()
Eloy - great, will review your code, and...
- link to general change series http://repo.or.cz/w/moodle-pu.git?a=log;h=mdl19-perf-4
- link to general change series http://repo.or.cz/w/moodle-pu.git?a=log;h=mdl19-perf-4
Re-reading my own patch, I think that we should change the semantics of delete_records*() and set_field*() to
- return false in case of error
- return $db->Rows_Affected() on success
This allows the callers to handle the 3 possible cases
- changed X rows successfully (report number of rows?)
- changed 0 rows successfully (nothing matched, but no error)
- b0rked SQL!
but "not right now". It's a 2.0 change ![]()
- return false in case of error
- return $db->Rows_Affected() on success
- changed X rows successfully (report number of rows?)
- changed 0 rows successfully (nothing matched, but no error)
- b0rked SQL!
Oki.
I've seen ML initial version and Petr's change some time later (due to the MySQL hardlmit of 1000 bytes in index size = 1000/3 = 333 chars for utf-8 char fields).
Current code splits the original UNIQUE index into 2 separate indexes (not critical but far from perfect).
So, if possible I would:
- Reduce such 255 fields to something smaller.
- Create the proper UNIQUE index.
Ciao ![]()
P.S.: Also, I'm going to introduce one index length checker in the XMLDB editor. MDL-11569
- Reduce such 255 fields to something smaller.
- Create the proper UNIQUE index.
ML messaged me when he woke up, proposed something similar 200+100 field length - should be working on that tomorrow...
Cool about to change fields length and reintroduce the unique index!
One minor (I hope) note:
I was fixing MDL-11571 and MDL-11487 when I've detected one call to the now non-existing cleanup_dirty_contexts() in admin/cron.php. And I've replaced it with one call to the new gc_cache_flags() instead.
I think it's ok, but please confirm that. TIA and ciao
Assigning to Petr who is studying this at the moment to find a different way to do the same thing.