Moodle

Towards better get_user_access_sitewide performance

Details

Description

Although some improvements have been performed to the get_user_access_sitewide() function (see MDL-15748) following discussion http://moodle.org/mod/forum/discuss.php?d=107755#p548095 seems to indicate that we could improve it a bit more by moving some stuff within the function.

This bug is about to discuss/test/decide if the change is doable, really boosts performance and provides 100% the same results.

Thanks to Paolo Oprandi for all his tests/comments.

  1. get_users_access_sitewide_test.csv
    12/Jun/09 6:16 PM
    0.9 kB
    Dan Poltawski
  2. get_users_access_sitewide_test.ods
    13/Jun/09 1:40 AM
    58 kB
    Eloy Lafuente (stronk7)
  3. test_get_user_access_sitewide.php
    12/Jun/09 2:55 AM
    9 kB
    Eloy Lafuente (stronk7)
  4. test_get_user_access_sitewide2.php
    12/Jun/09 5:31 PM
    9 kB
    Eloy Lafuente (stronk7)
  5. test_get_user_access_sitewide5.php
    12/Jun/09 9:22 PM
    9 kB
    Eloy Lafuente (stronk7)

Issue Links

Activity

Hide
Eloy Lafuente (stronk7) added a comment -

Adding some watchers to this bug. I'll include test script + new function here in some minutes (to be used against current 1.9.5+) in order to be able to run proper tests and report results.

Show
Eloy Lafuente (stronk7) added a comment - Adding some watchers to this bug. I'll include test script + new function here in some minutes (to be used against current 1.9.5+) in order to be able to run proper tests and report results.
Hide
Eloy Lafuente (stronk7) added a comment -

Attaching reworked test that:

  • Implements own get_user_access_sitewide_new() function with the changes commented in forum.
  • hashes every individual information array
  • hashes the whole test
  • reports:
    • tests (users) processed.
    • time spent by the get_user_access_sitewide() functions.
    • hash comparison results
    • improvement in %

Notes:

  • Put it in your root moodle folder.
  • Log in
  • Execute it
  • Report tests/timings/differences/improvement
  • Can be customised in the very first lines to decide the number (aprox) of tests to run

Warn:

  • To be executed only under 1.9.5+
  • Please verify the function, I think it's correct... but...

Ciao

Show
Eloy Lafuente (stronk7) added a comment - Attaching reworked test that:
  • Implements own get_user_access_sitewide_new() function with the changes commented in forum.
  • hashes every individual information array
  • hashes the whole test
  • reports:
    • tests (users) processed.
    • time spent by the get_user_access_sitewide() functions.
    • hash comparison results
    • improvement in %
Notes:
  • Put it in your root moodle folder.
  • Log in
  • Execute it
  • Report tests/timings/differences/improvement
  • Can be customised in the very first lines to decide the number (aprox) of tests to run
Warn:
  • To be executed only under 1.9.5+
  • Please verify the function, I think it's correct... but...
Ciao
Hide
Eloy Lafuente (stronk7) added a comment -

Results in home computer:

Users processed = 7
Final HASHES: match
Individual HASHES: match
Improvement (3 executions): 53.74 % , 2.33 % , 5.38 %

(note that 1st test uses to benefit 2nd test - the new one, because of caching and so on, so it's always recommended to execute the test 3 times)

Show
Eloy Lafuente (stronk7) added a comment - Results in home computer: Users processed = 7 Final HASHES: match Individual HASHES: match Improvement (3 executions): 53.74 % , 2.33 % , 5.38 % (note that 1st test uses to benefit 2nd test - the new one, because of caching and so on, so it's always recommended to execute the test 3 times)
Hide
Eloy Lafuente (stronk7) added a comment -

Results in moodle.org:

Users processed = 8263
Final HASHES: match
Individual HASHES: match
Improvement (3 executions): 29.64 % , 3.46 % , 38.05 %

Show
Eloy Lafuente (stronk7) added a comment - Results in moodle.org: Users processed = 8263 Final HASHES: match Individual HASHES: match Improvement (3 executions): 29.64 % , 3.46 % , 38.05 %
Hide
Eloy Lafuente (stronk7) added a comment - - edited

Result in one medium-size (10000 users, 100 courses) production server, with multiple roles / overrides in category and user contexts.

Users processed = 4883
Final HASHES: match
Individual HASHES: match
Improvement (3 executions): -6.60 % , -6.21 % , -6.96 %

(note that user context overrides are the ones causing big penalties in this site... so the question is... must this function really return all those user context role/override assignments?)

Show
Eloy Lafuente (stronk7) added a comment - - edited Result in one medium-size (10000 users, 100 courses) production server, with multiple roles / overrides in category and user contexts. Users processed = 4883 Final HASHES: match Individual HASHES: match Improvement (3 executions): -6.60 % , -6.21 % , -6.96 % (note that user context overrides are the ones causing big penalties in this site... so the question is... must this function really return all those user context role/override assignments?)
Hide
Tim Hunt added a comment -

Instinctively, I would say user context information is not needed here, and can be loaded only in needed, but I have not thought about this very deepy.

I guess a good start would be to grep for CONTEXT_USER. With luck it is only used on profil-y pages. That would confirm that we only need to load use context stuff on those pages, and so we could remove it from the get_user_access_sitewide_new() function.

Also, I suggest adding Dan P here, he is very good at running test scripts against real Moodle sites.

Show
Tim Hunt added a comment - Instinctively, I would say user context information is not needed here, and can be loaded only in needed, but I have not thought about this very deepy. I guess a good start would be to grep for CONTEXT_USER. With luck it is only used on profil-y pages. That would confirm that we only need to load use context stuff on those pages, and so we could remove it from the get_user_access_sitewide_new() function. Also, I suggest adding Dan P here, he is very good at running test scripts against real Moodle sites.
Hide
Eloy Lafuente (stronk7) added a comment -

Adding Dan , as suggested. Thanks Tim.

Show
Eloy Lafuente (stronk7) added a comment - Adding Dan , as suggested. Thanks Tim.
Hide
Eloy Lafuente (stronk7) added a comment -

Just quick-annotating one more idea before going to bed: What if we take out the ORDER BY clause in 2nd ad 3rd queries? AFAIK order isn't meaningful on them, and for sure, it's has some cost. To test tomorrow. Ciao

Show
Eloy Lafuente (stronk7) added a comment - Just quick-annotating one more idea before going to bed: What if we take out the ORDER BY clause in 2nd ad 3rd queries? AFAIK order isn't meaningful on them, and for sure, it's has some cost. To test tomorrow. Ciao
Hide
Eloy Lafuente (stronk7) added a comment -

Attaching test_get_user_access_sitewide2.php script that, simply, has the ORDER BY clauses in 2nd and 3rd queries omitted.

Posting results in minutes...

Show
Eloy Lafuente (stronk7) added a comment - Attaching test_get_user_access_sitewide2.php script that, simply, has the ORDER BY clauses in 2nd and 3rd queries omitted. Posting results in minutes...
Hide
Eloy Lafuente (stronk7) added a comment -

Results in home computer:
Users processed = 7
Final HASHES: match
Individual HASHES: match
Improvement (3 executions): 15.71 % , 16.69 % , 16.03 %

NOTE: test2 ~ 10% better than test1

Show
Eloy Lafuente (stronk7) added a comment - Results in home computer: Users processed = 7 Final HASHES: match Individual HASHES: match Improvement (3 executions): 15.71 % , 16.69 % , 16.03 % NOTE: test2 ~ 10% better than test1
Hide
Eloy Lafuente (stronk7) added a comment -

Results in moodle.org:
Users processed = 8383
Final HASHES: match
Individual HASHES: match
Improvement (3 executions): 36.34 % , 8.07 % , 36.03 %

NOTE: test2 ~ 3% better than test1, although continuous load changes in server cause this test to be not meaningful IMO

Show
Eloy Lafuente (stronk7) added a comment - Results in moodle.org: Users processed = 8383 Final HASHES: match Individual HASHES: match Improvement (3 executions): 36.34 % , 8.07 % , 36.03 % NOTE: test2 ~ 3% better than test1, although continuous load changes in server cause this test to be not meaningful IMO
Hide
Eloy Lafuente (stronk7) added a comment -

Result in one medium-size (10000 users, 100 courses) production server, with multiple roles / overrides in category and user contexts.
Users processed = 4883
Final HASHES: match
Individual HASHES: match
Improvement (3 executions): 0.60 % , 0.55 % , 1.02 %

(note that user context overrides are the ones causing big penalties in this site... so the question is... must this function really return all those user context role/override assignments?)

NOTE: test2 ~ 6% better than test1

Show
Eloy Lafuente (stronk7) added a comment - Result in one medium-size (10000 users, 100 courses) production server, with multiple roles / overrides in category and user contexts. Users processed = 4883 Final HASHES: match Individual HASHES: match Improvement (3 executions): 0.60 % , 0.55 % , 1.02 % (note that user context overrides are the ones causing big penalties in this site... so the question is... must this function really return all those user context role/override assignments?) NOTE: test2 ~ 6% better than test1
Hide
Dan Poltawski added a comment -

I've run the test on about 30 sites - but the results are fluctuating too much (probably cos the db server is busy doing other stuff..)

But the good news is that no differences are found

Show
Dan Poltawski added a comment - I've run the test on about 30 sites - but the results are fluctuating too much (probably cos the db server is busy doing other stuff..) But the good news is that no differences are found
Hide
Dan Poltawski added a comment -

(Also the majority of our sites are small)

Show
Dan Poltawski added a comment - (Also the majority of our sites are small)
Hide
Sam Marshall added a comment -

I tried get_user_access_sitewide_2.php on our test system which has real data from a few months ago. Note that we run 1.9.4 not 1.9.5, but we already have a version of the MDL-15748 change.

I had to add the following 2 lines at start of script after config.php:

raise_memory_limit('2048M'); // well more than default 128 anyhow
set_time_limit(3600); // it didn't really take this long, but more than default

Real users processed with orig function: 9992
Time spent in get_user_access_sitewide(): 76.352853775024
Real users processed with new function: 9992
Time spent in get_user_access_sitewide_new(): 67.175107955933

There were differences in about 15 user ra and about 2 user rdefs, so not very many, but some. I didn't investigate what situations these were - it's possible that there may be some invalid/old data in our tables.

Overall based on these numbers it doesn't seem much worth changing.

Running Pg 8.1.16, PHP 5.2.8.

Show
Sam Marshall added a comment - I tried get_user_access_sitewide_2.php on our test system which has real data from a few months ago. Note that we run 1.9.4 not 1.9.5, but we already have a version of the MDL-15748 change. I had to add the following 2 lines at start of script after config.php: raise_memory_limit('2048M'); // well more than default 128 anyhow set_time_limit(3600); // it didn't really take this long, but more than default Real users processed with orig function: 9992 Time spent in get_user_access_sitewide(): 76.352853775024 Real users processed with new function: 9992 Time spent in get_user_access_sitewide_new(): 67.175107955933 There were differences in about 15 user ra and about 2 user rdefs, so not very many, but some. I didn't investigate what situations these were - it's possible that there may be some invalid/old data in our tables. Overall based on these numbers it doesn't seem much worth changing. Running Pg 8.1.16, PHP 5.2.8.
Hide
Eloy Lafuente (stronk7) added a comment -

Attaching one new test (5th) that:

  • Deletes some unused variables / columns.
  • Change slow array_merge() by string operations.

I've get one extra 1.5% improvement (mean) when running the tests in the servers above.

Also, I've performed two tentative tests (3rd and 4th), trying to use UNION sqls here and there but results haven't been interesting (that's the cause for jumping from 2nd test to 5th one).

I think this is the final one. Would be interesting to know how it performs in other servers. In any case, note we are improving the function only some milliseconds per execution. Right now the function is processing 5000 users in only 8 seconds (medium site). It means it's able to process 1 user in 16 milliseconds, so I think original times aren't bad at all.

Would be great to know some site having bigger cost than those 16 ms... I haven't been able to reproduce it at all.

Ciao

Show
Eloy Lafuente (stronk7) added a comment - Attaching one new test (5th) that:
  • Deletes some unused variables / columns.
  • Change slow array_merge() by string operations.
I've get one extra 1.5% improvement (mean) when running the tests in the servers above. Also, I've performed two tentative tests (3rd and 4th), trying to use UNION sqls here and there but results haven't been interesting (that's the cause for jumping from 2nd test to 5th one). I think this is the final one. Would be interesting to know how it performs in other servers. In any case, note we are improving the function only some milliseconds per execution. Right now the function is processing 5000 users in only 8 seconds (medium site). It means it's able to process 1 user in 16 milliseconds, so I think original times aren't bad at all. Would be great to know some site having bigger cost than those 16 ms... I haven't been able to reproduce it at all. Ciao
Hide
Eloy Lafuente (stronk7) added a comment -

Thanks Dan and Sam.

Sam, note that perhaps diffs in results can be caused by the simple algorithm I used to sort results, in order to get consistent hashes.

Also, you're processing 9992 in 76 seconds: That's 130 ms per call, that doesn't seem to be really bad at all. Also, you're getting > 10% improvement with the new version (that's not bad either).

In any case, I think we cannot do too much beyond the test5 attached in my previous comment. Only taking out USER_CONTEXTs is pending and I think it won't affect to much (as far as normal sites aren't abusing with role/perms in that context).

Ciao

Show
Eloy Lafuente (stronk7) added a comment - Thanks Dan and Sam. Sam, note that perhaps diffs in results can be caused by the simple algorithm I used to sort results, in order to get consistent hashes. Also, you're processing 9992 in 76 seconds: That's 130 ms per call, that doesn't seem to be really bad at all. Also, you're getting > 10% improvement with the new version (that's not bad either). In any case, I think we cannot do too much beyond the test5 attached in my previous comment. Only taking out USER_CONTEXTs is pending and I think it won't affect to much (as far as normal sites aren't abusing with role/perms in that context). Ciao
Hide
Sam Marshall added a comment -

Eloy: Yes the time taken is already not bad! It was only bad before we made the change in MDL-15748. Now it's sort of ok. It would be nice if it completed in 40ms but. [Hm maybe it does on our live db server, that is quite fast...]

Here is a test using version 5:

77.626036882401 -> 68.158251523972

So still about 10% improvement.

Also more good news - the differences may be due to custom changes in our code. I think the 'active' field in ra is one of ours. So I just added it to your first query

WHERE ra.userid = $userid AND ra.active=1 AND ctx.contextlevel <=" . CONTEXT_COURSE;

(Assuming this is an OU custom change, we will need to patch it in our version again after you change it, obviously, so changing your new version as well gives us like-for-like comparison.)

This made it give identical results, yay! I tried three times:

Improvement = 8.19 %
Improvement = 11.58 %
Improvement = 11.08 %

I wonder if + + = :> ?

Show
Sam Marshall added a comment - Eloy: Yes the time taken is already not bad! It was only bad before we made the change in MDL-15748. Now it's sort of ok. It would be nice if it completed in 40ms but. [Hm maybe it does on our live db server, that is quite fast...] Here is a test using version 5: 77.626036882401 -> 68.158251523972 So still about 10% improvement. Also more good news - the differences may be due to custom changes in our code. I think the 'active' field in ra is one of ours. So I just added it to your first query WHERE ra.userid = $userid AND ra.active=1 AND ctx.contextlevel <=" . CONTEXT_COURSE; (Assuming this is an OU custom change, we will need to patch it in our version again after you change it, obviously, so changing your new version as well gives us like-for-like comparison.) This made it give identical results, yay! I tried three times: Improvement = 8.19 % Improvement = 11.58 % Improvement = 11.08 % I wonder if + + = :> ?
Hide
Eloy Lafuente (stronk7) added a comment -

Nice to see that you found the cause for differences in results, Sam! I love to see those differences reduced to 0 ! (btw... that "active" column.. is about to compute timestart and timeend? It sounds to me that you proposed that sometime ago...

So, summarising, with version 5:

  • All tests return matching hashes (so results are the same - unless there is one bug in the test itself).
  • Tests have included minimal sites, medium sites and big sites.
  • Tests have included sites with role assignments and overrides at 4 computed contexts (system, category, course and user).
  • In general, tests executions seems to point to an improvement around 10% (aprox mean of all the tests, see attached oo file).

And concluding...:

  • Should we apply this to 19_STABLE and HEAD ?

Ciao

Show
Eloy Lafuente (stronk7) added a comment - Nice to see that you found the cause for differences in results, Sam! I love to see those differences reduced to 0 ! (btw... that "active" column.. is about to compute timestart and timeend? It sounds to me that you proposed that sometime ago... So, summarising, with version 5:
  • All tests return matching hashes (so results are the same - unless there is one bug in the test itself).
  • Tests have included minimal sites, medium sites and big sites.
  • Tests have included sites with role assignments and overrides at 4 computed contexts (system, category, course and user).
  • In general, tests executions seems to point to an improvement around 10% (aprox mean of all the tests, see attached oo file).
And concluding...:
  • Should we apply this to 19_STABLE and HEAD ?
Ciao
Hide
Petr Škoda (skodak) added a comment -

+1 for HEAD

Show
Petr Škoda (skodak) added a comment - +1 for HEAD
Hide
Sam Marshall added a comment -

Eloy: Just for info - yes the 'active' column computes timestart (but not timeend - because Moodle removes role assignments altogether after timeend so that isn't needed). At present, timestart does not function at all in standard Moodle. We use it here more extensively than standard moodle so we need it to work (otherwise students are allowed into websites too early which is bad), so we applied the patch locally. This is MDL-13240.

Show
Sam Marshall added a comment - Eloy: Just for info - yes the 'active' column computes timestart (but not timeend - because Moodle removes role assignments altogether after timeend so that isn't needed). At present, timestart does not function at all in standard Moodle. We use it here more extensively than standard moodle so we need it to work (otherwise students are allowed into websites too early which is bad), so we applied the patch locally. This is MDL-13240.
Hide
Petr Škoda (skodak) added a comment -

my +10 to move the timestart and timeend stuff into enrolment implementation

Show
Petr Škoda (skodak) added a comment - my +10 to move the timestart and timeend stuff into enrolment implementation
Hide
Sam Marshall added a comment -

petr - argh now we have to change all our database integration too? I kind of agree though (for 2.0 of course).

Show
Sam Marshall added a comment - petr - argh now we have to change all our database integration too? I kind of agree though (for 2.0 of course).
Hide
Eloy Lafuente (stronk7) added a comment -

Oki, I'm going to apply this exclusively to HEAD...

Paolo... the change "only" improves 10%, so I don't get how it could improve your site in the order of magnitude commented in forums (http://moodle.org/mod/forum/discuss.php?d=107755#p549007) ... perhaps you weren't running 1.9.5 so you hadn't MDL-15748 applied? It was the BIG improvement we made some weeks ago.

Ciao

Show
Eloy Lafuente (stronk7) added a comment - Oki, I'm going to apply this exclusively to HEAD... Paolo... the change "only" improves 10%, so I don't get how it could improve your site in the order of magnitude commented in forums (http://moodle.org/mod/forum/discuss.php?d=107755#p549007) ... perhaps you weren't running 1.9.5 so you hadn't MDL-15748 applied? It was the BIG improvement we made some weeks ago. Ciao
Hide
Eloy Lafuente (stronk7) added a comment -

Resolving as fixed. This has been committed to HEAD (not to stable branches at all).

Ciao

Show
Eloy Lafuente (stronk7) added a comment - Resolving as fixed. This has been committed to HEAD (not to stable branches at all). Ciao

Dates

  • Created:
    Updated:
    Resolved: