Moodle

build_context_path has some funky SQL that breaks on MySQL

Details

  • Type: Sub-task Sub-task
  • Status: Closed Closed
  • Priority: Major Major
  • Resolution: Fixed
  • Affects Version/s: 1.9
  • Fix Version/s: 1.9
  • Component/s: Roles / Access
  • Labels:
    None
  • Affected Branches:
    MOODLE_19_STABLE
  • Fixed Branches:
    MOODLE_19_STABLE

Description

build_context_path(true) is called from the upgrade to populate the context table.

This sort of SQL breaks MySQL:

(mysql): UPDATE mdl_context SET depth=it.pdepth+1, path=CONCAT(it.ppath,'/',id) FROM (SELECT c.id AS instanceid, pctx.path AS ppath, pctx.depth as pdepth FROM mdl_course c JOIN mdl_context pctx ON (c.category=pctx.instanceid AND pctx.contextlevel=40) WHERE c.id != 1) it WHERE contextlevel=50 AND mdl_context.instanceid=it.instanceid 1064: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'FROM (SELECT c.id AS instanceid, pctx.path AS ppath, ' at line 3

adodb_mysql._execute(UPDATE mdl_context
SET depth=it.pdepth+1, path=CONCAT(it.ppath,'/',id)
FROM (SELECT c.id AS in..., false) % line 891, file: adodb.inc.php
adodb_mysql.execute(UPDATE mdl_context
SET depth=it.pdepth+1, path=CONCAT(it.ppath,'/',id)
FROM (SELECT c.id AS in...) % line 89, file: dmllib.php
execute_sql(UPDATE mdl_context
SET depth=it.pdepth+1, path=CONCAT(it.ppath,'/',id)
FROM (SELECT c.id AS in..., true) % line 4479, file: accesslib.php
build_context_path(true) % line 8, file: test.php

  1. bcp_mysql2.patch
    19/Sep/07 11:53 PM
    10 kB
    Petr Škoda (skodak)
  2. moodl_19_perf.diff
    20/Sep/07 5:56 AM
    0.4 kB
    Bryce Thornton

Issue Links

Activity

Hide
Martin Dougiamas added a comment -

Assigning to Petr who is studying this at the moment to find a different way to do the same thing.

Show
Martin Dougiamas added a comment - Assigning to Petr who is studying this at the moment to find a different way to do the same thing.
Hide
Petr Škoda (skodak) added a comment -

Does not look good, the MySQL manual clearly states that:
Currently, you cannot update a table and select from the same table in a subquery.

Which means we might have to do recordset looping for mysql family, which could be really slow...

Show
Petr Škoda (skodak) added a comment - Does not look good, the MySQL manual clearly states that: Currently, you cannot update a table and select from the same table in a subquery. Which means we might have to do recordset looping for mysql family, which could be really slow...
Hide
Eloy Lafuente (stronk7) added a comment -

Just to confirm (now that I've understood what the UPDATE does):

It's (should be) impossible to update records in one table based in other records of the same table. 100% sure. It's a matter of immutability and blah, blah,

So possible solutions are (from HQ):

1) PHP Recordset (as Petr commented)
2) Temp table usage to store calculated values: One insert with id, deph+1 and sql_concat(it.ppath,'/',id). And one simple update by id (PK).

2 is the recommended way for large sets (like this).

Show
Eloy Lafuente (stronk7) added a comment - Just to confirm (now that I've understood what the UPDATE does): It's (should be) impossible to update records in one table based in other records of the same table. 100% sure. It's a matter of immutability and blah, blah, So possible solutions are (from HQ): 1) PHP Recordset (as Petr commented) 2) Temp table usage to store calculated values: One insert with id, deph+1 and sql_concat(it.ppath,'/',id). And one simple update by id (PK). 2 is the recommended way for large sets (like this).
Hide
Petr Škoda (skodak) added a comment -

Committed temporary patch into cvs, it is ugly but works both with pq and mysql.

Assigning to Martín, please review and improve the use of temporary table or create a new permanent table or rewrite it completely

Show
Petr Škoda (skodak) added a comment - Committed temporary patch into cvs, it is ugly but works both with pq and mysql. Assigning to Martín, please review and improve the use of temporary table or create a new permanent table or rewrite it completely
Hide
Martin Dougiamas added a comment - - edited

Thanks, Petr - good job.

Bringing down from critical because the urgency is off. Still needs more review though.

Show
Martin Dougiamas added a comment - - edited Thanks, Petr - good job. Bringing down from critical because the urgency is off. Still needs more review though.
Hide
Petr Škoda (skodak) added a comment -

Eloy discovered that the current create_temp_table() code can not work for MSSQL because it prepends the # - which means we can not use the {$CFG->prefix}tablename - I think we should use normal table instead to play it safe for 1.9

Show
Petr Škoda (skodak) added a comment - Eloy discovered that the current create_temp_table() code can not work for MSSQL because it prepends the # - which means we can not use the {$CFG->prefix}tablename - I think we should use normal table instead to play it safe for 1.9
Hide
Eloy Lafuente (stronk7) added a comment -

also, I think that:

TRUNCATE tablename

isn't cross-db, but:

TRUNCATE TABLE tablename

is.

Looking to create_temp_table() to see if we can make it 100% cross-db... uhm...

Show
Eloy Lafuente (stronk7) added a comment - also, I think that: TRUNCATE tablename isn't cross-db, but: TRUNCATE TABLE tablename is. Looking to create_temp_table() to see if we can make it 100% cross-db... uhm...
Hide
Eloy Lafuente (stronk7) added a comment -

Also I would change statements of this type (there are some of them):

UPDATE {$CFG->prefix}context
SET depth=2, path=".sql_concat("'$base/'", 'id')."
WHERE contextlevel=".CONTEXT_USER."
AND instanceid IN
(SELECT id FROM {$CFG->prefix}user)

to:

UPDATE {$CFG->prefix}context
SET depth=2, path=".sql_concat("'$base/'", 'id')."
WHERE contextlevel=".CONTEXT_USER."
AND EXISTS
(SELECT 'x' FROM {$CFG->prefix}user WHERE user.id = {$CFG->prefix}context.instanceid)

(not tested!)

But, generally, the EXISTS should perform better than IN

Ciao

Show
Eloy Lafuente (stronk7) added a comment - Also I would change statements of this type (there are some of them): UPDATE {$CFG->prefix}context SET depth=2, path=".sql_concat("'$base/'", 'id')." WHERE contextlevel=".CONTEXT_USER." AND instanceid IN (SELECT id FROM {$CFG->prefix}user) to: UPDATE {$CFG->prefix}context SET depth=2, path=".sql_concat("'$base/'", 'id')." WHERE contextlevel=".CONTEXT_USER." AND EXISTS (SELECT 'x' FROM {$CFG->prefix}user WHERE user.id = {$CFG->prefix}context.instanceid) (not tested!) But, generally, the EXISTS should perform better than IN Ciao
Hide
Eloy Lafuente (stronk7) added a comment -

And, one detail. When using that "JOIN" syntax... IMO the only conditions present in the "ON" clause must be those performing the join, while the rest must go to the WHERE part.

Show
Eloy Lafuente (stronk7) added a comment - And, one detail. When using that "JOIN" syntax... IMO the only conditions present in the "ON" clause must be those performing the join, while the rest must go to the WHERE part.
Hide
Petr Škoda (skodak) added a comment - - edited

BTW why are we checking he emptiness of path? we could instead use the depth field == 0 - no ora hack + could be faster
also why not declare the path as nullable?

Show
Petr Škoda (skodak) added a comment - - edited BTW why are we checking he emptiness of path? we could instead use the depth field == 0 - no ora hack + could be faster also why not declare the path as nullable?
Hide
Bryce Thornton added a comment -

I found a small bug in build_context_path while updating from 1.8.2+. I'll attach a patch.

Show
Bryce Thornton added a comment - I found a small bug in build_context_path while updating from 1.8.2+. I'll attach a patch.
Hide
Bryce Thornton added a comment -

This is the patch for the small bug I found in build_context_path.

Show
Bryce Thornton added a comment - This is the patch for the small bug I found in build_context_path.
Hide
Eloy Lafuente (stronk7) added a comment -

Thanks Bryce. Patch applied! B-)

Show
Eloy Lafuente (stronk7) added a comment - Thanks Bryce. Patch applied! B-)
Hide
Eloy Lafuente (stronk7) added a comment -

And +1 for NULL in path instead of empty.

NULL is cross-db (though will imply to detect - and modify - all the places in code where we are relying in those empties).

Show
Eloy Lafuente (stronk7) added a comment - And +1 for NULL in path instead of empty. NULL is cross-db (though will imply to detect - and modify - all the places in code where we are relying in those empties).
Hide
Martín Langhoff added a comment -

Awake and reviewing this now. Thanks Petr for the fix.

Temptable prefixes:

We need to fix the temptable thing across DB. The temptable name having a # is a bit of a problem with Oracle because the # will conflict with table prefixes in our coding style – we mix calls to functions that could do tablename magic like get_records() and straight SQL like SELECT ... FROM {$CFG->prefix}foo.

One way to "fix" it is to return the tablename in the create temp table call, and say that we don't support table prefixes in Oracle. If there are no prefixes, then get_records($temptable) and SELECT ... FROM {$CFG->prefix}$temptable both work perfectly well with $temptable = '#foobar'

EXISTS vs WHERE id IN():

Yes - EXISTS is faster, I was avoiding it because I thought it was less portable, and it's a bit more magic and limited than WHERE {CONDITION}. If Eloy confirms that EXISTS is more portable and predictable... great!

NULLS:

I don't know. Are we NULL-safe these days? If so, NULLs are ok.

Checking for depth instead of path:

Yes - completely agreed.

Show
Martín Langhoff added a comment - Awake and reviewing this now. Thanks Petr for the fix. Temptable prefixes: We need to fix the temptable thing across DB. The temptable name having a # is a bit of a problem with Oracle because the # will conflict with table prefixes in our coding style – we mix calls to functions that could do tablename magic like get_records() and straight SQL like SELECT ... FROM {$CFG->prefix}foo. One way to "fix" it is to return the tablename in the create temp table call, and say that we don't support table prefixes in Oracle. If there are no prefixes, then get_records($temptable) and SELECT ... FROM {$CFG->prefix}$temptable both work perfectly well with $temptable = '#foobar' EXISTS vs WHERE id IN(): Yes - EXISTS is faster, I was avoiding it because I thought it was less portable, and it's a bit more magic and limited than WHERE {CONDITION}. If Eloy confirms that EXISTS is more portable and predictable... great! NULLS: I don't know. Are we NULL-safe these days? If so, NULLs are ok. Checking for depth instead of path: Yes - completely agreed.
Hide
Martín Langhoff added a comment -

I've committed a fix that makes things much faster and simpler in Pg and MySQL. After a bit more sleuthing, it looks like the UPDATE SQL queries, to be fast and reasonably efficient, need to be split actross 2 families:

PG & MSSQL support using FROM

UPDATE tbl
SET col = ...
FROM tbl
JOIN othertbl ...

whereas MySQL and Oracle go down the track of listing the tables in the UPDATE

UPDATE carts c, prices p
SET c.format_code = p.format_code
WHERE c.cart_item_id IN
(SELECT t.cart_item_id from prices r, carts t
WHERE r.<price_table_key> = p.<price_table_key>
AND blah blah blah.....)

Show
Martín Langhoff added a comment - I've committed a fix that makes things much faster and simpler in Pg and MySQL. After a bit more sleuthing, it looks like the UPDATE SQL queries, to be fast and reasonably efficient, need to be split actross 2 families: PG & MSSQL support using FROM UPDATE tbl SET col = ... FROM tbl JOIN othertbl ... whereas MySQL and Oracle go down the track of listing the tables in the UPDATE UPDATE carts c, prices p SET c.format_code = p.format_code WHERE c.cart_item_id IN (SELECT t.cart_item_id from prices r, carts t WHERE r.<price_table_key> = p.<price_table_key> AND blah blah blah.....)
Hide
Martin Dougiamas added a comment -

The latest change to accesslib

http://moodle.cvs.sourceforge.net/moodle/moodle/lib/accesslib.php?r1=1.394&r2=1.395

broke MySQL again (going back one revision let it upgrade correctly). Here is the error I had:

UPDATE mdl_context SET path = ct.path, depth = ct.depth FROM mdl_context_temp ct WHERE ct.id=mdl_context.id

You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'FROM mdl_context_temp ct WHERE ct.id=mdl_context.id' at line 4
1064: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'FROM mdl_context_temp ct WHERE ct.id=mdl_context.id' at line 4

Show
Martin Dougiamas added a comment - The latest change to accesslib http://moodle.cvs.sourceforge.net/moodle/moodle/lib/accesslib.php?r1=1.394&r2=1.395 broke MySQL again (going back one revision let it upgrade correctly). Here is the error I had: UPDATE mdl_context SET path = ct.path, depth = ct.depth FROM mdl_context_temp ct WHERE ct.id=mdl_context.id You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'FROM mdl_context_temp ct WHERE ct.id=mdl_context.id' at line 4 1064: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'FROM mdl_context_temp ct WHERE ct.id=mdl_context.id' at line 4
Hide
Martín Langhoff added a comment -

With Petr's changes (and mine on top), the upgrade of test.moodle.com works well for me. Good.

We do have a few pending bits

  • sanity check of the UPDATE SQL as discussed above on MSSQL/Oracle
  • temp table handling, specially wrt the '#' on Oracle

I've also drafted a "populate_contexts()" function, but I am thinking this through – from my tests, we don't actually need it.

Show
Martín Langhoff added a comment - With Petr's changes (and mine on top), the upgrade of test.moodle.com works well for me. Good. We do have a few pending bits
  • sanity check of the UPDATE SQL as discussed above on MSSQL/Oracle
  • temp table handling, specially wrt the '#' on Oracle
I've also drafted a "populate_contexts()" function, but I am thinking this through – from my tests, we don't actually need it.
Hide
Martín Langhoff added a comment -

Fixed MD's error with a switch from dbtype to dbfamily when picking what SQL to use.

Show
Martín Langhoff added a comment - Fixed MD's error with a switch from dbtype to dbfamily when picking what SQL to use.
Hide
Martin Dougiamas added a comment -

Cool, I think we can call this resolved

Show
Martin Dougiamas added a comment - Cool, I think we can call this resolved
Hide
Martin Dougiamas added a comment -

Just fixing resolution

Show
Martin Dougiamas added a comment - Just fixing resolution
Hide
Eloy Lafuente (stronk7) added a comment -

Four minor note:

  • I really think that latest changes on $updatesql = ... leave Oracle 100% out. Missing $dabfamily == ?
  • Why the "mysql" one has such a a excess of prefixes?
  • Also, IMO, no one of those updates will work under Oracle. It doesn't support FROMs without SELECTs (1st update) not multiple tables in the UPDATE (2nd update). But it supports multiple fields in an statement, something like (not tested!):

$updatesql = "UPDATE {$CFG->prefix}context ct
SET (ct.path, ct.depth) =
(SELECT temp.path, temp.depth
FROM {$CFG->prefix}$temptable
WHERE temp.id=ct.id";

  • Perhaps it would be good to launch additional "TRUNCATE TABLE temptable" statements between new inserts? I will make the UPDATEs lighter, IMO.
Show
Eloy Lafuente (stronk7) added a comment - Four minor note:
  • I really think that latest changes on $updatesql = ... leave Oracle 100% out. Missing $dabfamily == ?
  • Why the "mysql" one has such a a excess of prefixes?
  • Also, IMO, no one of those updates will work under Oracle. It doesn't support FROMs without SELECTs (1st update) not multiple tables in the UPDATE (2nd update). But it supports multiple fields in an statement, something like (not tested!):
$updatesql = "UPDATE {$CFG->prefix}context ct SET (ct.path, ct.depth) = (SELECT temp.path, temp.depth FROM {$CFG->prefix}$temptable WHERE temp.id=ct.id";
  • Perhaps it would be good to launch additional "TRUNCATE TABLE temptable" statements between new inserts? I will make the UPDATEs lighter, IMO.
Hide
Eloy Lafuente (stronk7) added a comment -

I'm reopening this (knowing that, in teory, is working against all DBs but Oracle and that temp table support under MSSQL is pending... will check later...

Show
Eloy Lafuente (stronk7) added a comment - I'm reopening this (knowing that, in teory, is working against all DBs but Oracle and that temp table support under MSSQL is pending... will check later...
Hide
Martín Langhoff added a comment -

I agree with Eloy - I don't think the current SQL works well in Oracle – in real life. However, my understanding is that

  • we can get an UPDATE that works in Oracle and MySQL
  • we can get an UPDATE that works in MSSQL and Pg

but I'll need Eloy's confirmation...

Show
Martín Langhoff added a comment - I agree with Eloy - I don't think the current SQL works well in Oracle – in real life. However, my understanding is that
  • we can get an UPDATE that works in Oracle and MySQL
  • we can get an UPDATE that works in MSSQL and Pg
but I'll need Eloy's confirmation...
Hide
Petr Škoda (skodak) added a comment -

looking at syntax scheme for Oracle http://download-uk.oracle.com/docs/cd/B14117_01/server.101/b10759/statements_10007.htm#SQLRF01708
it looks like we could use:

  • either mysql style (watcom?) with two tables after UPDATE
  • or SET () = (SELECT) which is not compatible with PG
Show
Petr Škoda (skodak) added a comment - looking at syntax scheme for Oracle http://download-uk.oracle.com/docs/cd/B14117_01/server.101/b10759/statements_10007.htm#SQLRF01708 it looks like we could use:
  • either mysql style (watcom?) with two tables after UPDATE
  • or SET () = (SELECT) which is not compatible with PG
Hide
Eloy Lafuente (stronk7) added a comment -

Sorry guys. Am I forgetting anything?

Why is it so important to get one update that works both under Oracle and MySQL, ML ? Isn't only a matter of specify one more $CFG->dbfamily condition and nothing else?

Or where do you read in Oracle SQL Ref the two tables can be used in the UPDATE statement, Petr? I'm 95% sure that they cannot.

Sincerely, I would use the approach described above:

$updatesql = "UPDATE {$CFG->prefix}context ct
SET (ct.path, ct.depth) =
(SELECT temp.path, temp.depth
FROM {$CFG->prefix}$temptable temp
WHERE temp.id=ct.id)
WHERE EXISTS
(SELECT 'x'
FROM {$CFG->prefix}$temptable temp
WHERE temp.id = ct.id);

I guess that Oracle will need an extra EXISTS clause (present in the above sql) to update ONLY records with temp counterparts. The rest of DBs, with their own UPDATE syntaxes should be restricting ok to matching records only (I hope).

So, we'll end with one SQL for MySQL, another to both PG and MSSQL and another to Oracle. Yep, nop ?

Show
Eloy Lafuente (stronk7) added a comment - Sorry guys. Am I forgetting anything? Why is it so important to get one update that works both under Oracle and MySQL, ML ? Isn't only a matter of specify one more $CFG->dbfamily condition and nothing else? Or where do you read in Oracle SQL Ref the two tables can be used in the UPDATE statement, Petr? I'm 95% sure that they cannot. Sincerely, I would use the approach described above: $updatesql = "UPDATE {$CFG->prefix}context ct SET (ct.path, ct.depth) = (SELECT temp.path, temp.depth FROM {$CFG->prefix}$temptable temp WHERE temp.id=ct.id) WHERE EXISTS (SELECT 'x' FROM {$CFG->prefix}$temptable temp WHERE temp.id = ct.id); I guess that Oracle will need an extra EXISTS clause (present in the above sql) to update ONLY records with temp counterparts. The rest of DBs, with their own UPDATE syntaxes should be restricting ok to matching records only (I hope). So, we'll end with one SQL for MySQL, another to both PG and MSSQL and another to Oracle. Yep, nop ?
Hide
Petr Škoda (skodak) added a comment -

you are right Eloy, I was not reading the schema properly - grrr - sorry
I was thinking about using four separate updates with subselects for "other databases" than pg and mysql.

Maybe a stupid question, why not merge those statements into several bigger semicolon separated queries? Would it work? Would it be faster?

BTW yesterday I was fighting with 2 subqueries in one update statement - it did not work for me in mysql+temptable - is this allowed in Oracle? is this a mysql bug?

Show
Petr Škoda (skodak) added a comment - you are right Eloy, I was not reading the schema properly - grrr - sorry I was thinking about using four separate updates with subselects for "other databases" than pg and mysql. Maybe a stupid question, why not merge those statements into several bigger semicolon separated queries? Would it work? Would it be faster? BTW yesterday I was fighting with 2 subqueries in one update statement - it did not work for me in mysql+temptable - is this allowed in Oracle? is this a mysql bug?
Hide
Martín Langhoff added a comment -

Petr:
> why not merge those statements into several bigger semicolon separated queries?

  • don't think it will help much – it'll save almost nothing, and it will make debugging harder. Also - not all DB backend drivers allow you to pass many statements.

Eloy:
> Why is it so important to get one update that works both under Oracle and MySQL, ML ?

Oh, ideally, I'd want to have just one SQL update for all of them. The less SQL variations we have to maintain, the easier... as long as it's fast...

In that sense, I believe that the original syntax with the subselect that MySQL didn't like is only invalid for MySQL – and much faster elsewhere. Oracle and MSSQL can definitely update from subselects on the same table. Pg is around 25 times faster – I have no idea what the difference is on MSSQL and Oracle.

That would mean one code flow using the temp table for MySQL, one codeflow for everyone else using a subselect. Which nicely sidesteps our temp table issues with Oracle...

Show
Martín Langhoff added a comment - Petr: > why not merge those statements into several bigger semicolon separated queries?
  • don't think it will help much – it'll save almost nothing, and it will make debugging harder. Also - not all DB backend drivers allow you to pass many statements.
Eloy: > Why is it so important to get one update that works both under Oracle and MySQL, ML ? Oh, ideally, I'd want to have just one SQL update for all of them. The less SQL variations we have to maintain, the easier... as long as it's fast... In that sense, I believe that the original syntax with the subselect that MySQL didn't like is only invalid for MySQL – and much faster elsewhere. Oracle and MSSQL can definitely update from subselects on the same table. Pg is around 25 times faster – I have no idea what the difference is on MSSQL and Oracle. That would mean one code flow using the temp table for MySQL, one codeflow for everyone else using a subselect. Which nicely sidesteps our temp table issues with Oracle...
Hide
Martín Langhoff added a comment -

Eloy - does the syntax you propose work everywhere? If it does, +1!

$updatesql = "UPDATE {$CFG->prefix}context ct
SET (ct.path, ct.depth) =
(SELECT temp.path, temp.depth
FROM {$CFG->prefix}$temptable temp
WHERE temp.id=ct.id)
WHERE EXISTS
(SELECT 'x'
FROM {$CFG->prefix}$temptable temp
WHERE temp.id = ct.id);

Show
Martín Langhoff added a comment - Eloy - does the syntax you propose work everywhere? If it does, +1! $updatesql = "UPDATE {$CFG->prefix}context ct SET (ct.path, ct.depth) = (SELECT temp.path, temp.depth FROM {$CFG->prefix}$temptable temp WHERE temp.id=ct.id) WHERE EXISTS (SELECT 'x' FROM {$CFG->prefix}$temptable temp WHERE temp.id = ct.id);
Hide
Petr Škoda (skodak) added a comment -

Sounds like a good plan to me:

  • special version with temp table+UPDATE with two tables for MySQL
  • original version from ML for all other db families
Show
Petr Škoda (skodak) added a comment - Sounds like a good plan to me:
  • special version with temp table+UPDATE with two tables for MySQL
  • original version from ML for all other db families
Hide
Eloy Lafuente (stronk7) added a comment - - edited

Hi, the Oracle variation above is Oracle only! (and I think it's impossible to have the same update working against all DBs unless using two subqueries). So I would vote to keep the three different updates, after all, each one is the "best for each DB" and any other attempt will be less than optimal. So +1 for the 2 updates.

So, sumarizing, current updates in CVS (MySQL, PG and MSSQL) are working fine. Absolutely! I've tested that:

  • MySQL works fine (and the syntax itself is clever enough to update only the required records).
  • PostgreSQL and MSSQL work fine with the same statement (and the syntax itself is clever enough to update only the required records).

So I really think that we only need to:

1) Add the Oracle variation above (that works and has the clever harcoded ).
2) Fix the temp tables generation once and forever (I think that the problematic is the MSSQL one, with that horrendous # prefix). Tomorrow I'll try to add full support for temp tables properly from XMLDB in a cross-db fashion, ok? I've designed the basics today.

And I would propose also this, to your consideration:

3) Truncate the temp table after each insert & update iteration. I will make the update to be really lighter. Currently we are "repeating" the same records update a lot of times (at least for courses categories, because previous iteration records continue there).

4) Test it again under all DBs

So, if nobody is against 2) I get it (tomorrow).

Show
Eloy Lafuente (stronk7) added a comment - - edited Hi, the Oracle variation above is Oracle only! (and I think it's impossible to have the same update working against all DBs unless using two subqueries). So I would vote to keep the three different updates, after all, each one is the "best for each DB" and any other attempt will be less than optimal. So +1 for the 2 updates. So, sumarizing, current updates in CVS (MySQL, PG and MSSQL) are working fine. Absolutely! I've tested that:
  • MySQL works fine (and the syntax itself is clever enough to update only the required records).
  • PostgreSQL and MSSQL work fine with the same statement (and the syntax itself is clever enough to update only the required records).
So I really think that we only need to: 1) Add the Oracle variation above (that works and has the clever harcoded ). 2) Fix the temp tables generation once and forever (I think that the problematic is the MSSQL one, with that horrendous # prefix). Tomorrow I'll try to add full support for temp tables properly from XMLDB in a cross-db fashion, ok? I've designed the basics today. And I would propose also this, to your consideration: 3) Truncate the temp table after each insert & update iteration. I will make the update to be really lighter. Currently we are "repeating" the same records update a lot of times (at least for courses categories, because previous iteration records continue there). 4) Test it again under all DBs So, if nobody is against 2) I get it (tomorrow).
Hide
Martín Langhoff added a comment -

Petr - thanks!
Eloy - Thumbs up for 1, 2 & 3 ( I had seen your proposal to truncate earlier, and forgot to reply - I think I removed the truncates yesterday, it was a mistake!)

I'm a bit bogged down today/tomorrow - but this is looking good!

Show
Martín Langhoff added a comment - Petr - thanks! Eloy - Thumbs up for 1, 2 & 3 ( I had seen your proposal to truncate earlier, and forgot to reply - I think I removed the truncates yesterday, it was a mistake!) I'm a bit bogged down today/tomorrow - but this is looking good!
Hide
Martin Dougiamas added a comment -

Well, I updated moodle.org and guess what - the mysql user that site was using didn't have access for temporary tables (doh) and the upgrade failed.

This is probably not uncommon, so can I suggest we either:

1) Test for this ability before we use it (like in the environment check).
2) Create this same table as a permanent table (usually empty)
3) Create a generic permanent table that can be used by different bits of code as a temporary space, something like:

id
sessionid
data1
data2
data3 .... etc

I like (3) most , (2) a little less, and (1) the least.

Show
Martin Dougiamas added a comment - Well, I updated moodle.org and guess what - the mysql user that site was using didn't have access for temporary tables (doh) and the upgrade failed. This is probably not uncommon, so can I suggest we either: 1) Test for this ability before we use it (like in the environment check). 2) Create this same table as a permanent table (usually empty) 3) Create a generic permanent table that can be used by different bits of code as a temporary space, something like: id sessionid data1 data2 data3 .... etc I like (3) most , (2) a little less, and (1) the least.
Hide
Eloy Lafuente (stronk7) added a comment -

Oki. What about:

1) Create a PERMANENT mdl_context_temp table (after all, this is going to be used continuously).
2) Concurrency will be avoided by changing slightly the INSERTS to avoid duplicates (duplicate UPDATES shouldn't be a problem)
3) Drop temp tables buggy support from ddllib? Or implement it for 2.0 (new bug)
4) Add the Oracle above.
5) Add the missing TRUNCATES
6) Test/test/test

Say YES and I'll do 1-6. Ciao

Show
Eloy Lafuente (stronk7) added a comment - Oki. What about: 1) Create a PERMANENT mdl_context_temp table (after all, this is going to be used continuously). 2) Concurrency will be avoided by changing slightly the INSERTS to avoid duplicates (duplicate UPDATES shouldn't be a problem) 3) Drop temp tables buggy support from ddllib? Or implement it for 2.0 (new bug) 4) Add the Oracle above. 5) Add the missing TRUNCATES 6) Test/test/test Say YES and I'll do 1-6. Ciao
Hide
Martin Dougiamas added a comment -

YES!

Show
Martin Dougiamas added a comment - YES!
Hide
Petr Škoda (skodak) added a comment -

YESS

Show
Petr Škoda (skodak) added a comment - YESS
Hide
Eloy Lafuente (stronk7) added a comment -

Done. Now the permanent table is installed and being used by build_context_paths()

Anyway I've found this:

SQL broken: Operand type clash: ntext is incompatible with int

SELECT name, value FROM mdl_config_plugins WHERE plugin='accesslib/dirtycontexts' AND value > (1190364087 - 2)

I guess same will happen with Oracle. Could we change that to one PHP resultser iteration instead?

Show
Eloy Lafuente (stronk7) added a comment - Done. Now the permanent table is installed and being used by build_context_paths() Anyway I've found this: SQL broken: Operand type clash: ntext is incompatible with int SELECT name, value FROM mdl_config_plugins WHERE plugin='accesslib/dirtycontexts' AND value > (1190364087 - 2) I guess same will happen with Oracle. Could we change that to one PHP resultser iteration instead?
Hide
Eloy Lafuente (stronk7) added a comment -

After some research and talk in HQ chat:

It seems that all the uses of that 'accesslib/dirtycontexts' is pretty well centralized in three functions:

  • get_dirty_contexts() to get all the currently dirty contexts.
  • mark_context_dirty() to mark one context as diry.
  • cleanup_dirty_contexts() to unmark dirty plugins

I would propose to move from the 'config_plugins' table to the 'contexts' one itself (it's the natural place, IMO)

Not sure if it can be performed with a simple (0/1) flag (it seems that some 'timeout' concept is in the whole thing). I guess that, instead, to preserve that behaviour, we must:

1) Add one 'timedirty' field to the 'context' table. Integer10 and indexed!
2) Modify the 3 functions above to work against the context table.

It should provide us, at least:

1) cross-db. Always interesting
2) Speed, by going against one numeric & indexed field, instead of a text-unindexed one.

To your consideration. Ciao

Show
Eloy Lafuente (stronk7) added a comment - After some research and talk in HQ chat: It seems that all the uses of that 'accesslib/dirtycontexts' is pretty well centralized in three functions:
  • get_dirty_contexts() to get all the currently dirty contexts.
  • mark_context_dirty() to mark one context as diry.
  • cleanup_dirty_contexts() to unmark dirty plugins
I would propose to move from the 'config_plugins' table to the 'contexts' one itself (it's the natural place, IMO) Not sure if it can be performed with a simple (0/1) flag (it seems that some 'timeout' concept is in the whole thing). I guess that, instead, to preserve that behaviour, we must: 1) Add one 'timedirty' field to the 'context' table. Integer10 and indexed! 2) Modify the 3 functions above to work against the context table. It should provide us, at least: 1) cross-db. Always interesting 2) Speed, by going against one numeric & indexed field, instead of a text-unindexed one. To your consideration. Ciao
Hide
Martin Dougiamas added a comment -

Sounds 100% sensible to me.

Show
Martin Dougiamas added a comment - Sounds 100% sensible to me.
Hide
Martín Langhoff added a comment -

Hi all – I hadn't seen the activity on this bug earlier today, sorry! Ok - down to business

  • I don't mind moving this to a dedicated temp table, however, let's not drop temptable support from ddllib. As we move to more batched SQL work, the need for temptables will only grow... including tricks like creating the temp table, populating it and adding indexes after all the inserts...
  • Context-dirtyness: we fix the "time - 2" to php, and make it a string comparison as Eloy suggested. I tried to fix things to not need CASTs and I didn't fix it completely.

Moving "dirty contexts" from config_plugins to context table will have a huge cost, and specially since we want to read the dirty contexts on each pageload, this should be cheap cheap cheap.

The context table is huge, and the dirty contexts will always be a few, and changing lots. So the index for dirtty on the context table will be huge on disk, have high costs updating, and the statistics will be off most of the time. And actually, we don't need that info when reading the contexts table either. We need it for contexts we might not even read directly (example: if you are checking a cap in a module instance context, you probably never read the coursecat contexts above it, as accessdata has all you need... but if one of those coursecats is dirty, that should trigger an accessdata reload, so it makes sense to decouple the data).

A small table can deal with a lot of movement at a much lower cost because the indexes are really small. config_plugins is great for this – as it fits comfortable in the memory of the DB server, and its whole index can probable be read and written-to with a single disk-seek.

[ I gave this matter quite a bit of thought and testing before deciding on using config_plugins ]

In fact, there's something we can do to make it cheaper: use a funky bit of SQL (with a UNION ALL) to read both config and config_plugins->dirtycontexts in one go during lib/setup.php. That will save us 1DBq per pageload...

Show
Martín Langhoff added a comment - Hi all – I hadn't seen the activity on this bug earlier today, sorry! Ok - down to business
  • I don't mind moving this to a dedicated temp table, however, let's not drop temptable support from ddllib. As we move to more batched SQL work, the need for temptables will only grow... including tricks like creating the temp table, populating it and adding indexes after all the inserts...
  • Context-dirtyness: we fix the "time - 2" to php, and make it a string comparison as Eloy suggested. I tried to fix things to not need CASTs and I didn't fix it completely.
Moving "dirty contexts" from config_plugins to context table will have a huge cost, and specially since we want to read the dirty contexts on each pageload, this should be cheap cheap cheap. The context table is huge, and the dirty contexts will always be a few, and changing lots. So the index for dirtty on the context table will be huge on disk, have high costs updating, and the statistics will be off most of the time. And actually, we don't need that info when reading the contexts table either. We need it for contexts we might not even read directly (example: if you are checking a cap in a module instance context, you probably never read the coursecat contexts above it, as accessdata has all you need... but if one of those coursecats is dirty, that should trigger an accessdata reload, so it makes sense to decouple the data). A small table can deal with a lot of movement at a much lower cost because the indexes are really small. config_plugins is great for this – as it fits comfortable in the memory of the DB server, and its whole index can probable be read and written-to with a single disk-seek. [ I gave this matter quite a bit of thought and testing before deciding on using config_plugins ] In fact, there's something we can do to make it cheaper: use a funky bit of SQL (with a UNION ALL) to read both config and config_plugins->dirtycontexts in one go during lib/setup.php. That will save us 1DBq per pageload...
Hide
Martín Langhoff added a comment -

Thinking about it some more, I remembered that having dirty paths decoupled from the contexts table is also important because we can mark dirty paths that are no longer there, or that have moved. Have a look at context_moved() to see what I mean – and also delete_context() of course.

This is very important - as it allows us to force the reload of accessdata for users that are logged in when courses/coursecats are moved around.

Show
Martín Langhoff added a comment - Thinking about it some more, I remembered that having dirty paths decoupled from the contexts table is also important because we can mark dirty paths that are no longer there, or that have moved. Have a look at context_moved() to see what I mean – and also delete_context() of course. This is very important - as it allows us to force the reload of accessdata for users that are logged in when courses/coursecats are moved around.
Hide
Eloy Lafuente (stronk7) added a comment -

Aha,

so, you need to be able to store dirty contexts somewhere out from the context table itself because of deleted contexts. Oki. And they are only a few. Perfect!

I really like the idea of loading all the "accesslib/dirtycontexts" from setup.php (agreeing it can be performed in one UNION ALL statement to save one query I would have it separated in one function to have it well encapsulated) , then handle them in PHP (the time condition), updating/deleting from DB as necessary.

+1 for that

Show
Eloy Lafuente (stronk7) added a comment - Aha, so, you need to be able to store dirty contexts somewhere out from the context table itself because of deleted contexts. Oki. And they are only a few. Perfect! I really like the idea of loading all the "accesslib/dirtycontexts" from setup.php (agreeing it can be performed in one UNION ALL statement to save one query I would have it separated in one function to have it well encapsulated) , then handle them in PHP (the time condition), updating/deleting from DB as necessary. +1 for that
Hide
Martín Langhoff added a comment -

Great! One minor thing I'm not sure we've understood eachother (spanish speakers trying to understand eachother in English!? WTF, we haveto teach these infidels some spanish)

When you say:
> then handle them in PHP (the time condition),

what I think will be fastest is to prepare in PHP the time condition, but to "enforce it" in the WHERE clause of the SQL query. In both cases the row has to be read into the DB memory (it'll only be a few rows at any given time!), but it's faster to do the > on the DB side. Is that what you are saying...?

(this is important mainly in the every-page query, in cron.php we could do it with an abbaccus...)

Show
Martín Langhoff added a comment - Great! One minor thing I'm not sure we've understood eachother (spanish speakers trying to understand eachother in English!? WTF, we haveto teach these infidels some spanish) When you say: > then handle them in PHP (the time condition), what I think will be fastest is to prepare in PHP the time condition, but to "enforce it" in the WHERE clause of the SQL query. In both cases the row has to be read into the DB memory (it'll only be a few rows at any given time!), but it's faster to do the > on the DB side. Is that what you are saying...? (this is important mainly in the every-page query, in cron.php we could do it with an abbaccus...)
Hide
Eloy Lafuente (stronk7) added a comment -

Hehe,

what I was thinking was this:

1) NEW function: load_dirtycontexts(), executed in setup.php, that will load ALL 'accesslib/dirtycontexts' records and store them for the life of the REQUEST.

2) Change get_dirty_contexts() to work 100% in memory. Could fallback to DB if necessary for cron although I'd add 1) to cron too.

3) Change mark_context_dirty() to perform the mark both in REQUEST and DB.

4) Change cleanup_dirty_contexts() to delete from REQUEST and from DB.

That way we won't have to execute those casted, inequalities... against DB at all, so cross-db will be easier and, hopefully, quicker.

That was my idea, summarized, with the phrase "then handle them in PHP (the time condition)" above, that's Spanish :-D :-P :-D

Show
Eloy Lafuente (stronk7) added a comment - Hehe, what I was thinking was this: 1) NEW function: load_dirtycontexts(), executed in setup.php, that will load ALL 'accesslib/dirtycontexts' records and store them for the life of the REQUEST. 2) Change get_dirty_contexts() to work 100% in memory. Could fallback to DB if necessary for cron although I'd add 1) to cron too. 3) Change mark_context_dirty() to perform the mark both in REQUEST and DB. 4) Change cleanup_dirty_contexts() to delete from REQUEST and from DB. That way we won't have to execute those casted, inequalities... against DB at all, so cross-db will be easier and, hopefully, quicker. That was my idea, summarized, with the phrase "then handle them in PHP (the time condition)" above, that's Spanish :-D :-P :-D
Hide
Martín Langhoff added a comment -

Hmmmm.

1 - We need to update the "$CFG = get_config();"line in lib/setup.php with something that populates $CFG and $DIRRTYPATHS in one SQL query, retrieving the least DB rows. So we can use a UNION ALL, and the dirty contexts subselect must say WHERE value > '$timecutoff'. This is the cheapest way.

Why push the work to the DB? Because the dirty paths will never be many (say, between 0 and 3 or 4) for each user if we use the cutoff, because as soon as the user "hits" a dirty path accessdata is reloaded. But those entries "live" in the table for 4hs - and on a busy site that could add up to a few hundred entries. The number is low enough to not need an index or a dedicated table, and to be fast to do in an in-memory-scan on the database side but it's quite a bit of data to move between DB and PHP just to do the exact same comparison on PHP.

So

  • do the '<' work on the DB side – it's faster!
  • avoid transferring data rows we don't need!

2 - get_dirty_contexts() must disappear! No longer needed. T

3 - Yes, it's a good idea to have mark_context_dirty() update $DIRTYCONTEXTS

4 - cleanup_dirty_contexts() doesn't need to do it from the request – there'll never be any overlap. It will be deleting stuff over 4hs old that will not be in the $DIRTYPATHS, ever. Way too old.

To summarise:

  • preserve original strategy, it pushes the job to the DB and moves the least data between DB and PHP
  • fix my error in cleanup_dirty_contexts()
  • do #1 to save 1DBq
Show
Martín Langhoff added a comment - Hmmmm. 1 - We need to update the "$CFG = get_config();"line in lib/setup.php with something that populates $CFG and $DIRRTYPATHS in one SQL query, retrieving the least DB rows. So we can use a UNION ALL, and the dirty contexts subselect must say WHERE value > '$timecutoff'. This is the cheapest way. Why push the work to the DB? Because the dirty paths will never be many (say, between 0 and 3 or 4) for each user if we use the cutoff, because as soon as the user "hits" a dirty path accessdata is reloaded. But those entries "live" in the table for 4hs - and on a busy site that could add up to a few hundred entries. The number is low enough to not need an index or a dedicated table, and to be fast to do in an in-memory-scan on the database side but it's quite a bit of data to move between DB and PHP just to do the exact same comparison on PHP. So
  • do the '<' work on the DB side – it's faster!
  • avoid transferring data rows we don't need!
2 - get_dirty_contexts() must disappear! No longer needed. T 3 - Yes, it's a good idea to have mark_context_dirty() update $DIRTYCONTEXTS 4 - cleanup_dirty_contexts() doesn't need to do it from the request – there'll never be any overlap. It will be deleting stuff over 4hs old that will not be in the $DIRTYPATHS, ever. Way too old. To summarise:
  • preserve original strategy, it pushes the job to the DB and moves the least data between DB and PHP
  • fix my error in cleanup_dirty_contexts()
  • do #1 to save 1DBq
Hide
Martín Langhoff added a comment -

Actually - #3 will bring a bit of trouble methinks.

Any HTTP request that calls mark_context_dirty() must either finish soon (redirect saying "yeah, the job is done") which is what we do now – OR do the reload of accessdata itself if it's long-lived like cron.php . Do we have anything in cron.php that calls mark_context_dirty()?

One of the nasty things of DIRTYPATHS is that it's got a granularity of 1s – actually of 3s when we look at the DB. Which means that during a single long-lived process (cron) we could end up calling reload_all_capabilities() a billion times before the 1s ticks over.

If we document that the semantics the way I've described it above ("finish soon, or do the homework"), it also means taht the caller is responsible for making sure that the reload happens once at the end of all the muddying of the paths, rather than hitting reload_all_capabilities (with its 3 DBq cost) a few hundred times.

Show
Martín Langhoff added a comment - Actually - #3 will bring a bit of trouble methinks. Any HTTP request that calls mark_context_dirty() must either finish soon (redirect saying "yeah, the job is done") which is what we do now – OR do the reload of accessdata itself if it's long-lived like cron.php . Do we have anything in cron.php that calls mark_context_dirty()? One of the nasty things of DIRTYPATHS is that it's got a granularity of 1s – actually of 3s when we look at the DB. Which means that during a single long-lived process (cron) we could end up calling reload_all_capabilities() a billion times before the 1s ticks over. If we document that the semantics the way I've described it above ("finish soon, or do the homework"), it also means taht the caller is responsible for making sure that the reload happens once at the end of all the muddying of the paths, rather than hitting reload_all_capabilities (with its 3 DBq cost) a few hundred times.
Hide
Martin Dougiamas added a comment -

Hmm, I don't understand why config_plugins should be used here at all.

Firstly, these are not config variables for any plugin. Using that table to pass these temporary flags between concurrent sessions is just not right semantically. Imagine if every developer who needed to do something similar used the table as a place for scratch data, it would rapidly become large and full of all kinds of stuff.

Secondly, it's a generic table ... integers are being stored in a text field.

I think this stuff should go in a new custom table (context_dirty, say) with one (fast and very db-cacheable) query.

Show
Martin Dougiamas added a comment - Hmm, I don't understand why config_plugins should be used here at all. Firstly, these are not config variables for any plugin. Using that table to pass these temporary flags between concurrent sessions is just not right semantically. Imagine if every developer who needed to do something similar used the table as a place for scratch data, it would rapidly become large and full of all kinds of stuff. Secondly, it's a generic table ... integers are being stored in a text field. I think this stuff should go in a new custom table (context_dirty, say) with one (fast and very db-cacheable) query.
Hide
Petr Škoda (skodak) added a comment -

I was thinking the same as MD

Show
Petr Škoda (skodak) added a comment - I was thinking the same as MD
Hide
Eloy Lafuente (stronk7) added a comment -

yep, I must recognize that I was getting 100% lost with latest messages ... one specific table to store the dirty contexts can be the best. With pure speed and cross-db ints and so on.

Agree.

Show
Eloy Lafuente (stronk7) added a comment - yep, I must recognize that I was getting 100% lost with latest messages ... one specific table to store the dirty contexts can be the best. With pure speed and cross-db ints and so on. Agree.
Hide
Martín Langhoff added a comment -

When I wrote the DIRTYPATHs thing I did think of a dedicated table, but in the end I didn't. A bit because I'm lazy (w00t!) and a bit because I think we need a general table where we can store values & timestamps.

The dirty paths are a case of "time-sensitive" flags. We also have some locks that are time sensitive in the code. And if we are doing more batch / bulk processing, we'll need this.

So I am thinking - could we add an optional 'timestamp' to config_plugins? Or have a similar table, called "config_flags_locks_and_other_things" with a structure of

-id,

  • plugin(or "scope"/"realm"/"whatever"),
  • name,
  • timestamp
  • value

It'd also be useful for the auth/ldap NTLM support patches I'm planning to merge (which currently also use plugin_config, and are time-sensitive)

Show
Martín Langhoff added a comment - When I wrote the DIRTYPATHs thing I did think of a dedicated table, but in the end I didn't. A bit because I'm lazy (w00t!) and a bit because I think we need a general table where we can store values & timestamps. The dirty paths are a case of "time-sensitive" flags. We also have some locks that are time sensitive in the code. And if we are doing more batch / bulk processing, we'll need this. So I am thinking - could we add an optional 'timestamp' to config_plugins? Or have a similar table, called "config_flags_locks_and_other_things" with a structure of -id,
  • plugin(or "scope"/"realm"/"whatever"),
  • name,
  • timestamp
  • value
It'd also be useful for the auth/ldap NTLM support patches I'm planning to merge (which currently also use plugin_config, and are time-sensitive)
Hide
Eloy Lafuente (stronk7) added a comment -

Oh, really... I think we must end with a solution for this. My "personal requirement" about this is really simple:

"Being 100% cross-db"

just that. Nothing else. If finally we add one new BIGINT field to the config_plugins table or if we create one context_dirtypaths new table, great, go, go, go!

But this should be fixed ASAP (it order to be able to continue testing MSSQL and Oracle - currently broken).

Also, I must vote -1 here for timestamps (date / time fields in general), in case you are talking about them, instead of BIGINTs (like the rest 99.9% of Moodle).

Ciao

Show
Eloy Lafuente (stronk7) added a comment - Oh, really... I think we must end with a solution for this. My "personal requirement" about this is really simple: "Being 100% cross-db" just that. Nothing else. If finally we add one new BIGINT field to the config_plugins table or if we create one context_dirtypaths new table, great, go, go, go! But this should be fixed ASAP (it order to be able to continue testing MSSQL and Oracle - currently broken). Also, I must vote -1 here for timestamps (date / time fields in general), in case you are talking about them, instead of BIGINTs (like the rest 99.9% of Moodle). Ciao
Hide
Martín Langhoff added a comment -

Complete agreement: 100% cross DB is the requirement.

In terms of timestamps vs bigints, I was of course thinking of a "moodle timestamp", a bigint

(... but I have to confess that since mysl v5 is out I have been wondering about what would we need to migrate from storing epochs to storing timestamps, and doing the time/date and TZ work in the DB. It's really hard but perhaps, one day... )

Show
Martín Langhoff added a comment - Complete agreement: 100% cross DB is the requirement. In terms of timestamps vs bigints, I was of course thinking of a "moodle timestamp", a bigint (... but I have to confess that since mysl v5 is out I have been wondering about what would we need to migrate from storing epochs to storing timestamps, and doing the time/date and TZ work in the DB. It's really hard but perhaps, one day... )
Hide
Martín Langhoff added a comment -

So - to clarify - I'd be very happy with adding a bigint field to config_plugins (preferred) and also very happy if people want a different generic table for this kind of use (just give me a name for it!).

Also - Eloy, it'd be great to flesh this out... http://moodle.org/mod/forum/post.php?reply=357989

Show
Martín Langhoff added a comment - So - to clarify - I'd be very happy with adding a bigint field to config_plugins (preferred) and also very happy if people want a different generic table for this kind of use (just give me a name for it!). Also - Eloy, it'd be great to flesh this out... http://moodle.org/mod/forum/post.php?reply=357989
Hide
Eloy Lafuente (stronk7) added a comment -

Hehe, while you was witting here, I was replying there! :-D

Nice to see that we were talking about BIGINTs. Now somebody (MD) has to decide about new field or generic new table.

Perhaps, IMO, thinking in other developments needing such type of data, I would give my +0.5 for a new generic "flags_locks_and_other_dirty_things" table, leaving config_plugins for its proper use.

Ciao

Show
Eloy Lafuente (stronk7) added a comment - Hehe, while you was witting here, I was replying there! :-D Nice to see that we were talking about BIGINTs. Now somebody (MD) has to decide about new field or generic new table. Perhaps, IMO, thinking in other developments needing such type of data, I would give my +0.5 for a new generic "flags_locks_and_other_dirty_things" table, leaving config_plugins for its proper use. Ciao
Hide
Martin Dougiamas added a comment -

OK, how about

cache_timestamps

id
stamptype
name
timestamp

Show
Martin Dougiamas added a comment - OK, how about cache_timestamps id stamptype name timestamp
Hide
Eloy Lafuente (stronk7) added a comment -

Two quick notes about the "cache_timestamps " proposal above:

1) Perhaps it should contain one int, one varchar and one text field to be able to store different data.

2) And, after a bit of thinking, it seems that we are going to use such table for two really different things:

  • As "temp storage". For really temp (whose life is the connection life) information (like the used by auth/enrol plugins).
  • As "persistent storage". For non-temp information (whose life isn't the connection life, but some sort of limited life). Like the used in the dirty contexts.

For temp storage, pehaps we need the help of some cron, to delete everything older than, say, 24 hours.
For persistent storage, each API using it must be responsible to keep those records deleted when necessary.

So, we'll need one more field to mark if the record is "temp storage" or "persistent storage" in order to allow cron to know what to delete.

Those are the cons of using one table for two different things (real temp usage and persistent storage) in the same table...

Just to think a bit about that....ciao

Show
Eloy Lafuente (stronk7) added a comment - Two quick notes about the "cache_timestamps " proposal above: 1) Perhaps it should contain one int, one varchar and one text field to be able to store different data. 2) And, after a bit of thinking, it seems that we are going to use such table for two really different things:
  • As "temp storage". For really temp (whose life is the connection life) information (like the used by auth/enrol plugins).
  • As "persistent storage". For non-temp information (whose life isn't the connection life, but some sort of limited life). Like the used in the dirty contexts.
For temp storage, pehaps we need the help of some cron, to delete everything older than, say, 24 hours. For persistent storage, each API using it must be responsible to keep those records deleted when necessary. So, we'll need one more field to mark if the record is "temp storage" or "persistent storage" in order to allow cron to know what to delete. Those are the cons of using one table for two different things (real temp usage and persistent storage) in the same table... Just to think a bit about that....ciao
Hide
Martín Langhoff added a comment -

+1 for Eloy's idea of including both timestamp and "value" varchar/text field.

And I think we can use it for at least

  • "temp storage" the only real idea I have here is to use it for locks
  • persistent – but time-limited – storage, like dirty paths and the "mini-session" entries from the new auth/ldap
  • configuration entries just like config_plugins - the timestamp is useful because certain configuration changes may trigger costly operations. it us good to be able to say "recompute only X if confguration value Y changed since the last run".

In terms of the cronjobs, each type of data we store there will have its own rules. Some entries will be from plugins/modules that have a cronjob that will take care of clearing it up. Others we may need to add support for a cronjob – I think we are covered with modules, auth and enrol... (do blocks have a cron() method?)

The name cache_timestamps is alright, I'll try to see I if I can think of a clearer alternative - but it's hard, as we want to use it for various things :-/

Show
Martín Langhoff added a comment - +1 for Eloy's idea of including both timestamp and "value" varchar/text field. And I think we can use it for at least
  • "temp storage" the only real idea I have here is to use it for locks
  • persistent – but time-limited – storage, like dirty paths and the "mini-session" entries from the new auth/ldap
  • configuration entries just like config_plugins - the timestamp is useful because certain configuration changes may trigger costly operations. it us good to be able to say "recompute only X if confguration value Y changed since the last run".
In terms of the cronjobs, each type of data we store there will have its own rules. Some entries will be from plugins/modules that have a cronjob that will take care of clearing it up. Others we may need to add support for a cronjob – I think we are covered with modules, auth and enrol... (do blocks have a cron() method?) The name cache_timestamps is alright, I'll try to see I if I can think of a clearer alternative - but it's hard, as we want to use it for various things :-/
Hide
Martin Dougiamas added a comment -

I'd really rather keep config variables separate in the config tables. Add a timestamp there if we must.

For this one, which is for limited-time values and flags, how about we call it:

cache_flags

with fields:
id
recordtype
name
value
timestamp (if reserved then timecreated)

Let's try and get this settled by tomorrow ... time's a-wasting.

Show
Martin Dougiamas added a comment - I'd really rather keep config variables separate in the config tables. Add a timestamp there if we must. For this one, which is for limited-time values and flags, how about we call it: cache_flags with fields: id recordtype name value timestamp (if reserved then timecreated) Let's try and get this settled by tomorrow ... time's a-wasting.
Hide
Eloy Lafuente (stronk7) added a comment -

Just to confirm... ML ... you are on this, yep?

Show
Eloy Lafuente (stronk7) added a comment - Just to confirm... ML ... you are on this, yep?
Hide
Martín Langhoff added a comment -

Yes. In the beginning of a moot right now, but I'm happy with cache_flags (might try to talk MD into calling it "flag" or "cache_flag" - note the singular), and I'll implement it ASAP.

WRT build_context_path() in itself

  • subselect-update not happy with mysq
  • temp tables very hard to support
    --> I''m tempted to revert to the slower-but-works-everywhere version, except that upgrades of large sites are going to hurt. I don't remember if the code uses a loop, if it does, it'll hurt a ton on large sites.
Show
Martín Langhoff added a comment - Yes. In the beginning of a moot right now, but I'm happy with cache_flags (might try to talk MD into calling it "flag" or "cache_flag" - note the singular), and I'll implement it ASAP. WRT build_context_path() in itself
  • subselect-update not happy with mysq
  • temp tables very hard to support --> I''m tempted to revert to the slower-but-works-everywhere version, except that upgrades of large sites are going to hurt. I don't remember if the code uses a loop, if it does, it'll hurt a ton on large sites.
Hide
Martin Dougiamas added a comment -

Martin, since this is so crucial, can you let me know an ETA for this? If you are busy on other things in the next 24 hours I can certainly do it.

Show
Martin Dougiamas added a comment - Martin, since this is so crucial, can you let me know an ETA for this? If you are busy on other things in the next 24 hours I can certainly do it.
Hide
Martín Langhoff added a comment -

Eloy - looking at the temptable you've implemented - there's nothing preventing concurrent usage...?

Show
Martín Langhoff added a comment - Eloy - looking at the temptable you've implemented - there's nothing preventing concurrent usage...?
Hide
Martín Langhoff added a comment -

Quick status update:

  • Looks like Eloy & Petr have (a) added a "permanent" temp table and fixed all the SQL to work with it. Are you guys happy with it in terms of cross-db-support? Or do you want me to review & do more work on it...?
  • cache_flags table in place (on the git branch till cvs is up again – review time!), plus associated functions in moodlelib
  • dirty contexts handling is using cache_flags table now
  • better/more consistent sql error handling in dmllib (in git branch). Eloy - do you want to review these?
Show
Martín Langhoff added a comment - Quick status update:
  • Looks like Eloy & Petr have (a) added a "permanent" temp table and fixed all the SQL to work with it. Are you guys happy with it in terms of cross-db-support? Or do you want me to review & do more work on it...?
  • cache_flags table in place (on the git branch till cvs is up again – review time!), plus associated functions in moodlelib
  • dirty contexts handling is using cache_flags table now
  • better/more consistent sql error handling in dmllib (in git branch). Eloy - do you want to review these?
Hide
Eloy Lafuente (stronk7) added a comment -

Yep, I think that the build_context_path() should be working properly cross-db using that context_temp permanent table, at leat I tested all the problematic updates and were working under all DBs.

I think that I only modified the updates of depth and path (to be cross-db) and the inserts into context_temp table (to avoid duplicates in case of concurrency - that should never happing - but prevented them).

I would suggest you a quick review to it, anyway, to check that everything is in place.

About the rest, cool! If you want, paste here the link to dmllib changes under git to take a look to it (although sure it's perfect).

Ciao

Show
Eloy Lafuente (stronk7) added a comment - Yep, I think that the build_context_path() should be working properly cross-db using that context_temp permanent table, at leat I tested all the problematic updates and were working under all DBs. I think that I only modified the updates of depth and path (to be cross-db) and the inserts into context_temp table (to avoid duplicates in case of concurrency - that should never happing - but prevented them). I would suggest you a quick review to it, anyway, to check that everything is in place. About the rest, cool! If you want, paste here the link to dmllib changes under git to take a look to it (although sure it's perfect). Ciao
Hide
Martín Langhoff added a comment -
Show
Martín Langhoff added a comment - Eloy - great, will review your code, and...
Hide
Martín Langhoff added a comment -

Re-reading my own patch, I think that we should change the semantics of delete_records*() and set_field*() to

  • return false in case of error
  • return $db->Rows_Affected() on success

This allows the callers to handle the 3 possible cases

  • changed X rows successfully (report number of rows?)
  • changed 0 rows successfully (nothing matched, but no error)
  • b0rked SQL!

but "not right now". It's a 2.0 change

Show
Martín Langhoff added a comment - Re-reading my own patch, I think that we should change the semantics of delete_records*() and set_field*() to
  • return false in case of error
  • return $db->Rows_Affected() on success
This allows the callers to handle the 3 possible cases
  • changed X rows successfully (report number of rows?)
  • changed 0 rows successfully (nothing matched, but no error)
  • b0rked SQL!
but "not right now". It's a 2.0 change
Hide
Eloy Lafuente (stronk7) added a comment -

Oki.

I've seen ML initial version and Petr's change some time later (due to the MySQL hardlmit of 1000 bytes in index size = 1000/3 = 333 chars for utf-8 char fields).

Current code splits the original UNIQUE index into 2 separate indexes (not critical but far from perfect).

So, if possible I would:

  • Reduce such 255 fields to something smaller.
  • Create the proper UNIQUE index.

Ciao

P.S.: Also, I'm going to introduce one index length checker in the XMLDB editor. MDL-11569

Show
Eloy Lafuente (stronk7) added a comment - Oki. I've seen ML initial version and Petr's change some time later (due to the MySQL hardlmit of 1000 bytes in index size = 1000/3 = 333 chars for utf-8 char fields). Current code splits the original UNIQUE index into 2 separate indexes (not critical but far from perfect). So, if possible I would:
  • Reduce such 255 fields to something smaller.
  • Create the proper UNIQUE index.
Ciao P.S.: Also, I'm going to introduce one index length checker in the XMLDB editor. MDL-11569
Hide
Petr Škoda (skodak) added a comment -

ML messaged me when he woke up, proposed something similar 200+100 field length - should be working on that tomorrow...

Show
Petr Škoda (skodak) added a comment - ML messaged me when he woke up, proposed something similar 200+100 field length - should be working on that tomorrow...
Hide
Eloy Lafuente (stronk7) added a comment -

Cool about to change fields length and reintroduce the unique index!

One minor (I hope) note:

I was fixing MDL-11571 and MDL-11487 when I've detected one call to the now non-existing cleanup_dirty_contexts() in admin/cron.php. And I've replaced it with one call to the new gc_cache_flags() instead.

I think it's ok, but please confirm that. TIA and ciao

Show
Eloy Lafuente (stronk7) added a comment - Cool about to change fields length and reintroduce the unique index! One minor (I hope) note: I was fixing MDL-11571 and MDL-11487 when I've detected one call to the now non-existing cleanup_dirty_contexts() in admin/cron.php. And I've replaced it with one call to the new gc_cache_flags() instead. I think it's ok, but please confirm that. TIA and ciao
Hide
Martin Dougiamas added a comment -

Looks OK now finally. Please file new bugs if necessary.

Show
Martin Dougiamas added a comment - Looks OK now finally. Please file new bugs if necessary.

Dates

  • Created:
    Updated:
    Resolved: