Uploaded image for project: 'Moodle'
  1. Moodle
  2. MDL-68183

Searching for user by email (case-insensitive + accent-sensitive) is too expensive on MySQL with many users

    XMLWordPrintable

    Details

      Description

      This has been raised by Eduard Cercós on our https://moodle.org site. We are experiencing several slow SQL queries triggered e.g. on the /login/forgot_password.php, rsslib.php or the signup page. They significantly increase the DB load (queries take up to 30 secs to finish on our prod site) and may represent a weak point and a surface for potential DoS attacks.

      I was able to trace it down and confirm it. There is a query in core_login_process_password_reset() where we search for the user record matching the provided email. That search is intended to be case insensitive yet accent sensitive. As such it leads to SQL comparison

      WHERE LOWER(email) LIKE LOWER('...') COLLATE 'utf8mb4_bin' ESCAPE '|'
      

      Likely because the LOWER function is involved, the DB engine does not make use of the index for the email column and instead, it performs a sequential scan:

      mysql> EXPLAIN SELECT * FROM user WHERE LOWER(email) LIKE LOWER('david@moodle.org') COLLATE 'utf8mb4_bin' ESCAPE '|' AND mnethostid = 10 AND deleted = 0 AND suspended = 0 LIMIT 0,1\G
      *************************** 1. row ***************************
                 id: 1
        select_type: SIMPLE
              table: user
         partitions: NULL
               type: ref
      possible_keys: user_mneuse_uix,user_deleted
                key: user_mneuse_uix
            key_len: 8
                ref: const
               rows: 1100595
           filtered: 5.00
              Extra: Using where
      

      As you can see, in our case, that means that more than 1 million records have to be checked and the LOWER comparison performed for each of them.

      This can improve significantly if the search is both case insensitive and accent insensitive. In that case, the MySQL uses the comparison

      WHERE email LIKE '...' COLLATE 'utf8mb4_unicode_ci' ESCAPE '|'
      

      that can make use of the index over the email column:

      mysql> EXPLAIN SELECT * FROM user WHERE email LIKE 'david@moodle.org' COLLATE 'utf8mb4_unicode_ci' ESCAPE '|' AND mnethostid = 10 AND deleted = 0 AND suspended = 0 LIMIT 0,1\G
      *************************** 1. row ***************************
                 id: 1
        select_type: SIMPLE
              table: user
         partitions: NULL
               type: range
      possible_keys: user_mneuse_uix,user_deleted,user_ema_ix
                key: user_ema_ix
            key_len: 402
                ref: NULL
               rows: 1
           filtered: 5.00
              Extra: Using index condition; Using where
      

      This pattern repeats in all other reported places and happens everywhere where we search for users by email.

      My proposal is to change these functions so that

      • they perform the accent insensitive search first to get a short list of candidate users
      • and only then perform another accent sensitive search just among them using the additional id IN (...) condition.

      This is also a good opportunity to rewrite the existing LIKE statements into proper comparison statements via the sql_equal() method (had not existed yet when some of these queries were implemented).

        Attachments

          Issue Links

            Activity

              People

              Assignee:
              mudrd8mz David Mudrák (@mudrd8mz)
              Reporter:
              mudrd8mz David Mudrák (@mudrd8mz)
              Peer reviewer:
              Peter Dias
              Integrator:
              Eloy Lafuente (stronk7)
              Tester:
              CiBoT
              Participants:
              Component watchers:
              Andrew Nicols, Jun Pataleta, Michael Hawkins, Shamim Rezaie, Simey Lameze, Matteo Scaramuccia, Andrew Nicols, Jun Pataleta, Michael Hawkins, Shamim Rezaie, Simey Lameze
              Votes:
              1 Vote for this issue
              Watchers:
              6 Start watching this issue

                Dates

                Created:
                Updated:
                Resolved:
                Fix Release Date:
                11/May/20

                  Time Tracking

                  Estimated:
                  Original Estimate - 0 minutes
                  0m
                  Remaining:
                  Remaining Estimate - 0 minutes
                  0m
                  Logged:
                  Time Spent - 1 day, 2 hours, 5 minutes
                  1d 2h 5m