I've been working on some improvements to the current implementation of the db authentication plugin.
The improvements do not use temporary tables, and they drastically reduce both memory consumption and the number of database queries. As a result, the time for a complete run of the sync is also drastically reduced.
My test system is as follows:
- Postgres 9.0 with out-of-the-box configuration (no tuning)
- Moodle master branch
My test database has 15,000 records.
Using the current implementation of the db auth mechanism, I'm seeing something like the following for a straight comparison run where no data has changed:
www-data@moodle:/home/nicols/git/software/moodle$ time php auth/db/cli/sync_users.php > /dev/null
PERF: time: 554.768607s memory_total: 138010140B (131.6MB) memory_growth: 137127328B (130.8MB) memory_peak: 155578500B (148.4MB) includecount: 77 contextswithfilters: 0 filterscreated: 0 textsfiltered: 0 stringsfiltered: 0 langcountgetstring: 15005 langcountmemcache: 15002 langcountdiskcache: 0 includedyuimodules: 54 includedjsmodules: 2 db reads/writes: 30013/0 ticks: 55484 user: 17319 sys: 520 cuser: 0 csys: 0 serverload: 1.70
Notice in particular the memory growth/total and the db reads/writes.
I've heavily optimised it now such that:
www-data@moodle:/home/nicols/git/software/moodle$ time php auth/db/cli/sync_users.php
PERF: time: 8.187989s memory_total: 15399384B (14.7MB) memory_growth: 14517032B (13.8MB) memory_peak: 51300208B (48.9MB) includecount: 43 contextswithfilters: 0 filterscreated: 0 textsfiltered: 0 stringsfiltered: 0 langcountgetstring: 5 langcountmemcache: 3 langcountdiskcache: 0 includedyuimodules: 54 includedjsmodules: 2 db reads/writes: 16/0 ticks: 822 user: 156 sys: 7 cuser: 0 csys: 0 serverload: 0.95
The memory use for the same dataset and the same operation has gone from 131.6MB to 14.7MB, and db reads have gone from 30,013 to 16.
Looking at the code in further detail, the update code seems to do the following:
- Select all users from the external db
- Select all users from Moodle who need updating
- Loop through each user needing an update:
  - Retrieve the user from the Moodle database again
  - Retrieve the user from the external database again
  - Loop through each field configured to update on login (onlogin):
    - Compare each value in turn
    - If the value is different, call set_field on it
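The optimised approach instead loads each side once and diffs the records in memory, so the per-user selects disappear and only genuinely changed fields need a write. As a rough, self-contained sketch of that diffing step (plain PHP arrays stand in for the real recordsets; the usernames and field names here are just examples, not the actual patch):

```php
<?php
// Hypothetical stand-ins for one bulk fetch from each side.
// In the real plugin each of these would come from a single query.
$moodle_users = [
    'alice' => ['city' => 'Perth',  'country' => 'AU'],
    'bob'   => ['city' => 'London', 'country' => 'GB'],
];
$external_users = [
    'alice' => ['city' => 'Perth',  'country' => 'AU'],
    'bob'   => ['city' => 'Paris',  'country' => 'FR'],
];

// Compare in memory and collect only the fields that actually changed,
// so writes can be batched rather than issuing set_field per field.
$updates = [];
foreach ($external_users as $username => $external) {
    if (!isset($moodle_users[$username])) {
        continue; // creation is handled separately
    }
    foreach ($external as $field => $value) {
        if ($moodle_users[$username][$field] !== $value) {
            $updates[$username][$field] = $value;
        }
    }
}

// bob needs city and country updated; alice needs nothing.
print_r($updates);
```

The key point is that the expensive part (the comparison) costs no queries at all once both datasets are in memory.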
As a result, if you have 15,000 users and you change three fields on each (e.g. set the City, State, and Country), you'll be doing:
- One select to initially retrieve all records from moodle
- One select to initially retrieve all records from the external DB
- 15,000 selects to select each individual record from moodle
- 15,000 selects to select each individual record from the external DB
- Three set_field (update) operations for each of the 15,000 records.
You'd have a total of:
1 + 1 + (1 × 15,000) + (1 × 15,000) = 30,002 selects
3 × 15,000 = 45,000 updates
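For what it's worth, the same totals spelled out as a trivial calculation (the record and field counts are the ones from my test; nothing here is plugin-specific):

```php
<?php
$records        = 15000; // users present in both databases
$changed_fields = 3;     // e.g. city, state, country

// One bulk select per side, then one extra select per record per side.
$selects = 1 + 1 + (1 * $records) + (1 * $records);

// One set_field() call per changed field per record.
$updates = $changed_fields * $records;

echo "$selects selects, $updates updates\n"; // 30002 selects, 45000 updates
```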
I've implemented the same optimisation with both ADOdb and Moodle's own database layer. Both give a massive performance boost over the current implementation.
The standard Moodle database engine uses marginally less memory than ADOdb (0.5MB less in my test).
My tests show similar performance gains for all operations (create, undelete, update, delete).
I'm happy to submit my changes to Moodle core, but have one question first:
At present, the db auth plugin uses ADOdb. Moodle has since introduced its own database layer, but this doesn't have the same database support as ADOdb.
I've tested my changes with both, and it's very little work to submit either version – what is the preference upstream?