Uploaded image for project: 'Moodle'
  1. Moodle
  2. MDL-68183

Searching for user by email (case-insensitive + accent-sensitive) is too expensive on MySQL with many users




      This has been raised by eduardcercos on our https://moodle.org site. We are experiencing several slow SQL queries triggered e.g. on the /login/forgot_password.php, rsslib.php or the signup page. They significantly increase the DB load (queries take up to 30 secs to finish on our prod site) and may represent a weak point and a surface for potential DoS attacks.

      I was able to trace it down and confirm it. There is a query in core_login_process_password_reset() where we search for the user record matching the provided email. That search is intended to be case insensitive yet accent sensitive. As such it leads to SQL comparison

      WHERE LOWER(email) LIKE LOWER('...') COLLATE 'utf8mb4_bin' ESCAPE '|'

      Likely because the LOWER function is involved, the DB engine does not make use of the index for the email column and instead, it performs a sequential scan:

      mysql> EXPLAIN SELECT * FROM user WHERE LOWER(email) LIKE LOWER('david@moodle.org') COLLATE 'utf8mb4_bin' ESCAPE '|' AND mnethostid = 10 AND deleted = 0 AND suspended = 0 LIMIT 0,1\G
      *************************** 1. row ***************************
                 id: 1
        select_type: SIMPLE
              table: user
         partitions: NULL
               type: ref
      possible_keys: user_mneuse_uix,user_deleted
                key: user_mneuse_uix
            key_len: 8
                ref: const
               rows: 1100595
           filtered: 5.00
              Extra: Using where

      As you can see, in our case, that means that more than 1 million records have to be checked and the LOWER comparison performed for each of them.

      This can improve significantly if the search is both case insensitive and accent insensitive. In that case, the MySQL uses the comparison

      WHERE email LIKE '...' COLLATE 'utf8mb4_unicode_ci' ESCAPE '|'

      that can make use of the index over the email column:

      mysql> EXPLAIN SELECT * FROM user WHERE email LIKE 'david@moodle.org' COLLATE 'utf8mb4_unicode_ci' ESCAPE '|' AND mnethostid = 10 AND deleted = 0 AND suspended = 0 LIMIT 0,1\G
      *************************** 1. row ***************************
                 id: 1
        select_type: SIMPLE
              table: user
         partitions: NULL
               type: range
      possible_keys: user_mneuse_uix,user_deleted,user_ema_ix
                key: user_ema_ix
            key_len: 402
                ref: NULL
               rows: 1
           filtered: 5.00
              Extra: Using index condition; Using where

      This pattern repeats in all other reported places and happens everywhere where we search for users by email.

      My proposal is to change these functions so that

      • they perform the accent insensitive search first to get a short list of candidate users
      • and only then perform another accent sensitive search just among them using the additional id IN (...) condition.

      This is also a good opportunity to rewrite the existing LIKE statements into proper comparison statements via the sql_equal() method (had not existed yet when some of these queries were implemented).


        Issue Links



              mudrd8mz David Mudrák (@mudrd8mz)
              mudrd8mz David Mudrák (@mudrd8mz)
              Peter Dias Peter Dias
              Eloy Lafuente (stronk7) Eloy Lafuente (stronk7)
              CiBoT CiBoT
              1 Vote for this issue
              6 Start watching this issue



                Time Tracking

                  Original Estimate - 0 minutes
                  Remaining Estimate - 0 minutes
                  Time Spent - 1 day, 2 hours, 5 minutes
                  1d 2h 5m


                    Error rendering 'clockify-timesheets-time-tracking-reports:timer-sidebar'. Please contact your Jira administrators.