Moodle
  1. Moodle
  2. MDL-25372

Use temporary tables in sync_users of external database authentication

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Duplicate
    • Affects Version/s: 2.0
    • Fix Version/s: DEV backlog
    • Component/s: Authentication, Performance
    • Labels:
    • Environment:
      Intel(R) Core(TM)2 Duo CPU E8500 @ 3.16GHz, 4GB RAM,Linux, MySQL with tunned configurations, Apache 2,
    • Affected Branches:
      MOODLE_20_STABLE
    • Rank:
      6510

      Description

      Here at Universidade Federal de Santa Catarina we are starting the upgrade of our infra-structure to 2.0 and our first major goal is to sync users from the academic control system.
      In Moodle 1.9 we use a custom system, developed from beggining, and with a lot of features. One of these is to create users in Moodle from a table, and now we know that Moodle's External Database Authetication plugin can do the work, but it is not that "performatic".
      As the documentation says, it uses arrays to calculate the set of users to be created, instead of a temporary table and a left join like what is done in ldap plugin.
      This patch i am submitting changes this behaviour and introduces the temporary table in the proccess. A lot of duplicated code from ldap, that we may think about refactor.
      I haven't tested the original process, but the new one can calculate the users to create in almost zero time in a fresh install (just 2 or 3 records in user table) and it took about 5 minutes to calculate the set with 38K records in the external table that were already in moodle.
      Also, the update process can be a lot improved. The current code does an update for each user, that consists of one SELECT and N updates, where N is the number of update keys. I don't have numbers but should be better to construct only one UPDATE TABLE query and let the database calculate which records differ. This is valid for remove/suspend process too.

      1. mdl-25372.v1.diff
        18 kB
        Daniel Neis
      2. mdl25372-1.9.diff
        22 kB
        Daniel Neis

        Issue Links

          Activity

          Hide
          Daniel Neis added a comment -

          Hello, i've setup a git reposiroty in github, and now you can see the last version of patch at https://github.com/danielneis/moodle/compare/master...MDL-25372
          Add .patch or .diff if you want a plain text version

          Show
          Daniel Neis added a comment - Hello, i've setup a git reposiroty in github, and now you can see the last version of patch at https://github.com/danielneis/moodle/compare/master...MDL-25372 Add .patch or .diff if you want a plain text version
          Hide
          Daniel Neis added a comment -

          Hello,

          here is a patch for Moodle 1.9 that uses temporary table to sync users, allow password change in remote database and adds a "password table" option to update password in case a view is used to authenticate/create users.

          Hope you like.

          Daniel

          Show
          Daniel Neis added a comment - Hello, here is a patch for Moodle 1.9 that uses temporary table to sync users, allow password change in remote database and adds a "password table" option to update password in case a view is used to authenticate/create users. Hope you like. Daniel
          Hide
          Daniel Neis added a comment -

          Hello,

          i've updated the code for 2.0 to use the temporary table also in user addition.
          Also, i've added the "password table" option to 2.0 too.

          You can see the complete patch in https://github.com/danielneis/moodle/compare/master...MDL-25372

          Show
          Daniel Neis added a comment - Hello, i've updated the code for 2.0 to use the temporary table also in user addition. Also, i've added the "password table" option to 2.0 too. You can see the complete patch in https://github.com/danielneis/moodle/compare/master...MDL-25372
          Hide
          Daniel Neis added a comment -

          The patch commentd above (18/may) solves this and MDL-25373 also.

          Show
          Daniel Neis added a comment - The patch commentd above (18/may) solves this and MDL-25373 also.
          Hide
          Andrew Nicols added a comment -

          I've been working on some improvements on the current implementation of the db authentication plugin.

          The improvements I've been working on do not use temporary tables, but drastically reduce both the memory consumption and the number of database queries. As a result, the time for a complete run of the sync is also drastically reduced.

          My test system is as follows:

          • Postgres 9.0 with out-of-the-box configuration (no tuning)
          • Moodle on Master

          My test database has 15,000 records.

          Using the current implementation of the db auth mechanism, I'm seeing something like the following. This is a straight compare where no data has changed.

          www-data@moodle:/home/nicols/git/software/moodle$ time php auth/db/cli/sync_users.php > /dev/null
          PERF: time: 554.768607s memory_total: 138010140B (131.6MB) memory_growth: 137127328B (130.8MB) memory_peak: 155578500B (148.4MB) includecount: 77 contextswithfilters: 0 filterscreated: 0 textsfiltered: 0 stringsfiltered: 0 langcountgetstring: 15005 langcountmemcache: 15002 langcountdiskcache: 0 includedyuimodules: 54 includedjsmodules: 2 db reads/writes: 30013/0 ticks: 55484 user: 17319 sys: 520 cuser: 0 csys: 0 serverload: 1.70

          real 9m14.861s
          user 2m53.223s
          sys 0m5.212s

          Notice in particular the memory growth/total, and the db reads/writes

          I've heavily optimised it now such that:

          www-data@moodle:/home/nicols/git/software/moodle$ time php auth/db/cli/sync_users.php
          PERF: time: 8.187989s memory_total: 15399384B (14.7MB) memory_growth: 14517032B (13.8MB) memory_peak: 51300208B (48.9MB) includecount: 43 contextswithfilters: 0 filterscreated: 0 textsfiltered: 0 stringsfiltered: 0 langcountgetstring: 5 langcountmemcache: 3 langcountdiskcache: 0 includedyuimodules: 54 includedjsmodules: 2 db reads/writes: 16/0 ticks: 822 user: 156 sys: 7 cuser: 0 csys: 0 serverload: 0.95

          real 0m8.259s
          user 0m1.592s
          sys 0m0.080s

          The memory use for the same dataset, and the same operation has gone from 131.6MB to 14.7
          reads have gone from 30013, to 16.

          Looking at the code in further detail, the update code seems to do the following:

          • Select all users from your external db
          • Select all users from moodle who need updating
          • Loop through each user needing updating
          • Retrieve the user from the moodle database again
          • Retrieve the user from the external database again
          • Loop through each key that is set to onlogin
          • Compare each value in turn
          • If the value is different, call set_field on it

          As a result, if you have 15,000 users, and you change three fields on each (e.g. set the City, State, and Country), you'll be doing:

          • One select to initially retrieve all records from moodle
          • One select to initially retrieve all records from the external DB
          • 15,000 selects to select each individual record from moodle
          • 15,000 selects to select each individual record from the external DB
          • 3 set_field operations per 15,000 records.

          You'd have a total of:
          1 + 1 + ( 1 * 15,000 ) + ( 1 * 15,000 ) = 30,002 selects
          3 * 15,000 = 45,000 updates

          I've made the same optimisation with both ADODB, and with the moodle database stuff. Both give massive performance boosts over the current.
          The standard moodle database engine uses marginally less memory than ADODB (0.5MB in my test)

          My tests show similar performance gains for all operations (create, undelete, update, delete).

          I'm happy to submit my changes to Moodle Core, but have on question first:

          At present, the db auth plugin uses adodb. Moodle has since introduced it's own database system, but this doesn't have the same support as ADODB.
          I've tested by changes with both of these and it's very little work to submit either – what is the preference upstream?

          Show
          Andrew Nicols added a comment - I've been working on some improvements on the current implementation of the db authentication plugin. The improvements I've been working on do not use temporary tables, but drastically reduce both the memory consumption and the number of database queries. As a result, the time for a complete run of the sync is also drastically reduced. My test system is as follows: Postgres 9.0 with out-of-the-box configuration (no tuning) Moodle on Master My test database has 15,000 records. Using the current implementation of the db auth mechanism, I'm seeing something like the following. This is a straight compare where no data has changed. www-data@moodle:/home/nicols/git/software/moodle$ time php auth/db/cli/sync_users.php > /dev/null PERF: time: 554.768607s memory_total: 138010140B (131.6MB) memory_growth: 137127328B (130.8MB) memory_peak: 155578500B (148.4MB) includecount: 77 contextswithfilters: 0 filterscreated: 0 textsfiltered: 0 stringsfiltered: 0 langcountgetstring: 15005 langcountmemcache: 15002 langcountdiskcache: 0 includedyuimodules: 54 includedjsmodules: 2 db reads/writes: 30013/0 ticks: 55484 user: 17319 sys: 520 cuser: 0 csys: 0 serverload: 1.70 real 9m14.861s user 2m53.223s sys 0m5.212s Notice in particular the memory growth/total, and the db reads/writes I've heavily optimised it now such that: www-data@moodle:/home/nicols/git/software/moodle$ time php auth/db/cli/sync_users.php PERF: time: 8.187989s memory_total: 15399384B (14.7MB) memory_growth: 14517032B (13.8MB) memory_peak: 51300208B (48.9MB) includecount: 43 contextswithfilters: 0 filterscreated: 0 textsfiltered: 0 stringsfiltered: 0 langcountgetstring: 5 langcountmemcache: 3 langcountdiskcache: 0 includedyuimodules: 54 includedjsmodules: 2 db reads/writes: 16/0 ticks: 822 user: 156 sys: 7 cuser: 0 csys: 0 serverload: 0.95 real 0m8.259s user 0m1.592s sys 0m0.080s The memory use for the same dataset, and the same operation has gone from 131.6MB to 14.7 reads have gone from 30013, to 16. Looking at the code in further detail, the update code seems to do the following: Select all users from your external db Select all users from moodle who need updating Loop through each user needing updating Retrieve the user from the moodle database again Retrieve the user from the external database again Loop through each key that is set to onlogin Compare each value in turn If the value is different, call set_field on it As a result, if you have 15,000 users, and you change three fields on each (e.g. set the City, State, and Country), you'll be doing: One select to initially retrieve all records from moodle One select to initially retrieve all records from the external DB 15,000 selects to select each individual record from moodle 15,000 selects to select each individual record from the external DB 3 set_field operations per 15,000 records. You'd have a total of: 1 + 1 + ( 1 * 15,000 ) + ( 1 * 15,000 ) = 30,002 selects 3 * 15,000 = 45,000 updates I've made the same optimisation with both ADODB, and with the moodle database stuff. Both give massive performance boosts over the current. The standard moodle database engine uses marginally less memory than ADODB (0.5MB in my test) My tests show similar performance gains for all operations (create, undelete, update, delete). I'm happy to submit my changes to Moodle Core, but have on question first: At present, the db auth plugin uses adodb. Moodle has since introduced it's own database system, but this doesn't have the same support as ADODB. I've tested by changes with both of these and it's very little work to submit either – what is the preference upstream?
          Hide
          Anthony Borrow added a comment -

          I am adding the patch label since Daniel provided one which he is using in production. Tis seems to fit in with some of the recent emphasis on improving performance in various places. Peace - Anthony

          Show
          Anthony Borrow added a comment - I am adding the patch label since Daniel provided one which he is using in production. Tis seems to fit in with some of the recent emphasis on improving performance in various places. Peace - Anthony
          Hide
          Daniel Neis added a comment -

          this patch was expanded by MDL-25373

          Show
          Daniel Neis added a comment - this patch was expanded by MDL-25373
          Hide
          Michael de Raadt added a comment -

          Does that now mean this issue can be closed?

          Show
          Michael de Raadt added a comment - Does that now mean this issue can be closed?
          Hide
          Daniel Neis added a comment -

          Hello,

          yes, this issue can be closed =)

          Thanks,
          Daniel

          Show
          Daniel Neis added a comment - Hello, yes, this issue can be closed =) Thanks, Daniel

            People

            • Votes:
              2 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: