Moodle
  1. Moodle
  2. MDL-29315 Case of Emails should be properly handeled
  3. MDL-29035

Restoring a backup with user data from another site fails because of email address upper/lower case

    Details

    • Rank:
      19324

      Description

      When restoring a course from another system, this error is seen when the email is the same just parts are different case :-

      Trying to restore user 'tlock' from backup file will cause conflict

      tim.lock@xxxx.com.au in Moodle
      Tim.Lock@xxxx.com.au in Backup

        Issue Links

          Activity

          Hide
          Tim Lock added a comment -

          Pull Request for 2.0.4+ from https://github.com/tlock/moodle/commit/036127b5063483b95fb70332d0dc14665197d0d5

          Same patch should apply to 2.1.1 + 2.2

          Show
          Tim Lock added a comment - Pull Request for 2.0.4+ from https://github.com/tlock/moodle/commit/036127b5063483b95fb70332d0dc14665197d0d5 Same patch should apply to 2.1.1 + 2.2
          Hide
          Michael de Raadt added a comment -

          Thanks for reporting that and providing a solution.

          Show
          Michael de Raadt added a comment - Thanks for reporting that and providing a solution.
          Hide
          Ankit Agarwal added a comment -

          This is an expected behavior, not a bug.
          Moodle does email match using "Simple Database Equal operator(=)". This is supposed to be case-sensitive in most cases. However based on your collation it might be behaving as case-insensitive.
          For example Postgre sql is case-sensitive by default where as if we use MYSQL with utf8_general_ci collation, its case-insensitive.
          So basically Moodle does an "=" match for emails, and depending on the database and settings it can be case sensitive or insensitive.
          Same kind of match is also followed while creating a user manually or via bulk upload from admin panel.
          Work around is obviously use same case emails in both backup and DB.
          Thanks

          Show
          Ankit Agarwal added a comment - This is an expected behavior, not a bug. Moodle does email match using "Simple Database Equal operator(=)". This is supposed to be case-sensitive in most cases. However based on your collation it might be behaving as case-insensitive. For example Postgre sql is case-sensitive by default where as if we use MYSQL with utf8_general_ci collation, its case-insensitive. So basically Moodle does an "=" match for emails, and depending on the database and settings it can be case sensitive or insensitive. Same kind of match is also followed while creating a user manually or via bulk upload from admin panel. Work around is obviously use same case emails in both backup and DB. Thanks
          Hide
          Aparup Banerjee added a comment -

          hm, maybe we should standardize lowercase domain part of email addresses and add a config option (sitewide or just as a config to backup?) to enforce lowercase the mailbox part of the address? - to ease portability of backups.

          Show
          Aparup Banerjee added a comment - hm, maybe we should standardize lowercase domain part of email addresses and add a config option (sitewide or just as a config to backup?) to enforce lowercase the mailbox part of the address? - to ease portability of backups.
          Hide
          Ankit Agarwal added a comment - - edited

          Yes standardizing lower case domain part of the email is good idea.But to maintain consistency through out the site, IMO we should implement this site-wide and not just for backup..

          Show
          Ankit Agarwal added a comment - - edited Yes standardizing lower case domain part of the email is good idea.But to maintain consistency through out the site, IMO we should implement this site-wide and not just for backup..
          Hide
          moodle.com added a comment -

          Adding a test comment.

          Show
          moodle.com added a comment - Adding a test comment.
          Hide
          Michael de Raadt added a comment -

          Another test message.

          Show
          Michael de Raadt added a comment - Another test message.
          Hide
          Ankit Agarwal added a comment -

          In this patch I have made email match case-insensitive for the given case only, sitewide changes will be tracked by MDL-29315

          Show
          Ankit Agarwal added a comment - In this patch I have made email match case-insensitive for the given case only, sitewide changes will be tracked by MDL-29315
          Hide
          Aparup Banerjee added a comment -

          Thanks for that Ankit, the patch looks good.

          Show
          Aparup Banerjee added a comment - Thanks for that Ankit, the patch looks good.
          Hide
          Ankit Agarwal added a comment -

          Thanks Aparup for the review

          Show
          Ankit Agarwal added a comment - Thanks Aparup for the review
          Hide
          Sam Hemelryk added a comment -

          Hi Ankit,

          I'm failing this as there is a regression caused by the sql_like statement thats been added.
          Some database engines treat _'s as a wildcard and as email addresses can contain wildcards we need to properly escape it.
          Have a look at login/forgot_password.php ln: 137+138. Essentially you need to provide the escape char to sql_like and then use sql_like_escape on the email address using the same escape char.

          On a side note when putting forward a patch that has been provided by someone else it is nice to give credit to them as the author by leaving them, or making them the author. If you are recommitting there work you can do this as part of the commit message using `--author='Joe Blogs <joe@blogs.local>'`.
          Once you have made reasonable changes to someone else's proposed patch you can commit it as yourself again or you can always make a second commit so that the original author still gets some credit.

          Cheers
          Sam

          Show
          Sam Hemelryk added a comment - Hi Ankit, I'm failing this as there is a regression caused by the sql_like statement thats been added. Some database engines treat _'s as a wildcard and as email addresses can contain wildcards we need to properly escape it. Have a look at login/forgot_password.php ln: 137+138. Essentially you need to provide the escape char to sql_like and then use sql_like_escape on the email address using the same escape char. On a side note when putting forward a patch that has been provided by someone else it is nice to give credit to them as the author by leaving them, or making them the author. If you are recommitting there work you can do this as part of the commit message using `--author='Joe Blogs <joe@blogs.local>'` . Once you have made reasonable changes to someone else's proposed patch you can commit it as yourself again or you can always make a second commit so that the original author still gets some credit. Cheers Sam
          Hide
          Eloy Lafuente (stronk7) added a comment -

          Sorry no way this will apply before deciding about this globally @ MDL-29315

          Show
          Eloy Lafuente (stronk7) added a comment - Sorry no way this will apply before deciding about this globally @ MDL-29315
          Hide
          Ankit Agarwal added a comment -

          Made changes to patch as per the discussions @MDL-29315

          Show
          Ankit Agarwal added a comment - Made changes to patch as per the discussions @ MDL-29315
          Hide
          Aparup Banerjee added a comment - - edited

          looks fine to me. UPPER() seems to be cross db
          note: typo in commit msgs for stable branches.

          Show
          Aparup Banerjee added a comment - - edited looks fine to me. UPPER() seems to be cross db note: typo in commit msgs for stable branches.
          Hide
          Ankit Agarwal added a comment -

          Thanks for review. up for integration now..

          Show
          Ankit Agarwal added a comment - Thanks for review. up for integration now..
          Hide
          Eloy Lafuente (stronk7) added a comment -

          The main moodle.git repository has just been updated with latest weekly modifications. You may wish to rebase your PULL branches to simplify history and avoid any possible merge conflicts. This would also make integrator's life easier next week.

          TIA and ciao

          Show
          Eloy Lafuente (stronk7) added a comment - The main moodle.git repository has just been updated with latest weekly modifications. You may wish to rebase your PULL branches to simplify history and avoid any possible merge conflicts. This would also make integrator's life easier next week. TIA and ciao
          Hide
          Eloy Lafuente (stronk7) added a comment - - edited

          Uhm... Ankit... one small detail... do email addressed support accents or other multibyte symbols? I say that because strtoupper() is not utf8 savy (relies on locale that cannot be guarantied to be using utf8 encoding).

          So you need to perform proper utf8 uppercase calls to both sides in the condition. And the easiest way to achieve this is to use UPPER() in both sides of the SQL and done - note that alternatively you could use texlib::strtoupper() in the PHP side too - , but I think it's clearer if the query already represents the complete logic).

          Also, if I'm not wrong that function requires the UPPER() comparison to be performed in more places (for deleted users, for example, the email address is stored into the username field).

          So I guess you need to review a bunch of points in that function (1B, 1C?, 2A, 2B1, 2B2, 2C?)

          So reopening, who said life was easy, lol, ciao

          Edited: Just thinking if that function is covered by unittests, surely if would be perfect to simulate one users table with some records covering normal and deleted users with lower/mixed case emails, and then test it against some $user objects, also normal and deleted and lower/mixed, both with $samesite = true (1X points) and false (2X points).

          Show
          Eloy Lafuente (stronk7) added a comment - - edited Uhm... Ankit... one small detail... do email addressed support accents or other multibyte symbols? I say that because strtoupper() is not utf8 savy (relies on locale that cannot be guarantied to be using utf8 encoding). So you need to perform proper utf8 uppercase calls to both sides in the condition. And the easiest way to achieve this is to use UPPER() in both sides of the SQL and done - note that alternatively you could use texlib::strtoupper() in the PHP side too - , but I think it's clearer if the query already represents the complete logic). Also, if I'm not wrong that function requires the UPPER() comparison to be performed in more places (for deleted users, for example, the email address is stored into the username field). So I guess you need to review a bunch of points in that function (1B, 1C?, 2A, 2B1, 2B2, 2C?) So reopening, who said life was easy, lol, ciao Edited: Just thinking if that function is covered by unittests, surely if would be perfect to simulate one users table with some records covering normal and deleted users with lower/mixed case emails, and then test it against some $user objects, also normal and deleted and lower/mixed, both with $samesite = true (1X points) and false (2X points).
          Hide
          Ankit Agarwal added a comment -

          Hi Eloy,
          1) -> I did more study on the "multi-byte" character support of email address, and looks like email address can contain multi-byte characters (as multi-byte characters are also allowed in domain names)
          http://en.wikipedia.org/wiki/Email_address#Syntax
          http://tools.ietf.org/html/rfc3696#page-5
          2) -> From RFC

          In the context of local parts,
          apostrophe ("'") and acute accent ("`") are ordinary characters, not
          quoting characters. (so they are allowed)
          

          3) -> about (1B, 1C?, 2A, 2B1, 2B2, 2C?) I was expecting those to be considered in detail @MDL-29316.

          Show
          Ankit Agarwal added a comment - Hi Eloy, 1) -> I did more study on the "multi-byte" character support of email address, and looks like email address can contain multi-byte characters (as multi-byte characters are also allowed in domain names) http://en.wikipedia.org/wiki/Email_address#Syntax http://tools.ietf.org/html/rfc3696#page-5 2) -> From RFC In the context of local parts, apostrophe ( "'" ) and acute accent ( "`" ) are ordinary characters, not quoting characters. (so they are allowed) 3) -> about (1B, 1C?, 2A, 2B1, 2B2, 2C?) I was expecting those to be considered in detail @ MDL-29316 .
          Hide
          Eloy Lafuente (stronk7) added a comment -

          for the records: after chat it has been agreed to continue here, review all he points with upper() in both sides and create followup for the unit tests.

          Show
          Eloy Lafuente (stronk7) added a comment - for the records: after chat it has been agreed to continue here, review all he points with upper() in both sides and create followup for the unit tests.
          Hide
          Michael de Raadt added a comment -

          Reverting the affected versions as they were changed in error.

          Show
          Michael de Raadt added a comment - Reverting the affected versions as they were changed in error.
          Hide
          Ankit Agarwal added a comment -

          Adding Eloy to Watchers

          Show
          Ankit Agarwal added a comment - Adding Eloy to Watchers
          Hide
          Ankit Agarwal added a comment -

          Cases:-

          • 1A - No changes needed
          • 1B - username (db) and email (backup) match should use UPPER()
          • 1C - email (db) and trimmed username (backup) match should use UPPER()
          • 1D - No changes needed
          • 2A - email (db) and email (backup) match should use UPPER()
          • 2B1 - username (db) and email (backup) match should use UPPER()
          • 2B2 - username (db) and email (backup) match should use UPPER()
          • 2C - email (db) and trimmed username (backup) match should use UPPER()
          • 2D - email (db) and email (backup) match should use UPPER()
          Show
          Ankit Agarwal added a comment - Cases:- 1A - No changes needed 1B - username (db) and email (backup) match should use UPPER() 1C - email (db) and trimmed username (backup) match should use UPPER() 1D - No changes needed 2A - email (db) and email (backup) match should use UPPER() 2B1 - username (db) and email (backup) match should use UPPER() 2B2 - username (db) and email (backup) match should use UPPER() 2C - email (db) and trimmed username (backup) match should use UPPER() 2D - email (db) and email (backup) match should use UPPER()
          Hide
          Ankit Agarwal added a comment -

          Eloy can you have a look at this when you got some time?
          Thanks

          Show
          Ankit Agarwal added a comment - Eloy can you have a look at this when you got some time? Thanks
          Hide
          Eloy Lafuente (stronk7) added a comment -

          I think this looks perfect Ankit, thanks!

          The only little remaining detail is about to try to cover that method with some unit-tests, so we can guarantee it continue working perfectly along the time, each case should have some tests, with different cases, mnhosts, first accesses...

          Feel free to create followup about that, TIA! Ciao

          Show
          Eloy Lafuente (stronk7) added a comment - I think this looks perfect Ankit, thanks! The only little remaining detail is about to try to cover that method with some unit-tests, so we can guarantee it continue working perfectly along the time, each case should have some tests, with different cases, mnhosts, first accesses... Feel free to create followup about that, TIA! Ciao
          Hide
          Ankit Agarwal added a comment - - edited

          This needs to be tested in all supported DB

          Here is a copy paste of the logic of all cases

          *  If restoring users from same site backup:
              *      1A - Normal check: If match by id and username and mnethost  => ok, return target user
              *      1B - Handle users deleted in DB and "alive" in backup file:
              *           If match by id and mnethost and user is deleted in DB and
              *           (match by username LIKE 'backup_email.%' or by non empty email = md5(username)) => ok, return target user
              *      1C - Handle users deleted in backup file and "alive" in DB:
              *           If match by id and mnethost and user is deleted in backup file
              *           and match by email = email_without_time(backup_email) => ok, return target user
              *      1D - Conflict: If match by username and mnethost and doesn't match by id => conflict, return false
              *      1E - None of the above, return true => User needs to be created
              *
              *  if restoring from another site backup (cannot match by id here, replace it by email/firstaccess combination):
              *      2A - Normal check: If match by username and mnethost and (email or non-zero firstaccess) => ok, return target user
              *      2B - Handle users deleted in DB and "alive" in backup file:
              *           2B1 - If match by mnethost and user is deleted in DB and not empty email = md5(username) and
              *                 (username LIKE 'backup_email.%' or non-zero firstaccess) => ok, return target user
              *           2B2 - If match by mnethost and user is deleted in DB and
              *                 username LIKE 'backup_email.%' and non-zero firstaccess) => ok, return target user
              *                 (to cover situations were md5(username) wasn't implemented on delete we requiere both)
              *      2C - Handle users deleted in backup file and "alive" in DB:
              *           If match mnethost and user is deleted in backup file
              *           and by email = email_without_time(backup_email) and non-zero firstaccess=> ok, return target user
              *      2D - Conflict: If match by username and mnethost and not by (email or non-zero firstaccess) => conflict, return false
              *      2E - None of the above, return true => User needs to be created
              *
              * Note: for DB deleted users email is stored in username field, hence we
              *       are looking there for emails. See delete_user()
              * Note: for DB deleted users md5(username) is stored *sometimes* in the email field,
              *       hence we are looking there for usernames if not empty. See delete_user()
          

          Steps to test

          1. Create a course
          2. Create two user and enrol into the course as student (lets say the email of the student is xyz@localhost.com)
          3. Backup the course
          4. Change the case of email in the backup file (users.xml) to something like XyZ@LocalHost.CoM (for both users)
          5. Now we need to recreate all possible 9 cases. Below I have tried to do so:-

          Test 1A
          Just restore the backup into a new course
          Test 1B
          Delete Student1 from Admin panel
          Try to restore the course
          Test 1C
          For student 2 notedown the values in the user table
          Delete the student from admin panel
          Backup the course
          Goto database and revert to old values for student2 (this should restore the student profile)
          Restore the course

          Test 1D
          No testing needed

          Testing second set of cases

          • Follow steps 1-4
          • Change original_site_identifier_hash in moodle_backup.xml
          • Repeat all about tests
          • If I missed out any cases that you can think of please test those as well.
            If MDL-30080 is already fixed, run the associated unit tests
            Repeat all Tests for all supporting DBs

          If no conflict/error is generated we can assume all is good

          Show
          Ankit Agarwal added a comment - - edited This needs to be tested in all supported DB Here is a copy paste of the logic of all cases * If restoring users from same site backup: * 1A - Normal check: If match by id and username and mnethost => ok, return target user * 1B - Handle users deleted in DB and "alive" in backup file: * If match by id and mnethost and user is deleted in DB and * (match by username LIKE 'backup_email.%' or by non empty email = md5(username)) => ok, return target user * 1C - Handle users deleted in backup file and "alive" in DB: * If match by id and mnethost and user is deleted in backup file * and match by email = email_without_time(backup_email) => ok, return target user * 1D - Conflict: If match by username and mnethost and doesn't match by id => conflict, return false * 1E - None of the above, return true => User needs to be created * * if restoring from another site backup (cannot match by id here, replace it by email/firstaccess combination): * 2A - Normal check: If match by username and mnethost and (email or non-zero firstaccess) => ok, return target user * 2B - Handle users deleted in DB and "alive" in backup file: * 2B1 - If match by mnethost and user is deleted in DB and not empty email = md5(username) and * (username LIKE 'backup_email.%' or non-zero firstaccess) => ok, return target user * 2B2 - If match by mnethost and user is deleted in DB and * username LIKE 'backup_email.%' and non-zero firstaccess) => ok, return target user * (to cover situations were md5(username) wasn't implemented on delete we requiere both) * 2C - Handle users deleted in backup file and "alive" in DB: * If match mnethost and user is deleted in backup file * and by email = email_without_time(backup_email) and non-zero firstaccess=> ok, return target user * 2D - Conflict: If match by username and mnethost and not by (email or non-zero firstaccess) => conflict, return false * 2E - None of the above, return true => User needs to be created * * Note: for DB deleted users email is stored in username field, hence we * are looking there for emails. See delete_user() * Note: for DB deleted users md5(username) is stored *sometimes* in the email field, * hence we are looking there for usernames if not empty. See delete_user() Steps to test Create a course Create two user and enrol into the course as student (lets say the email of the student is xyz@localhost.com) Backup the course Change the case of email in the backup file (users.xml) to something like XyZ@LocalHost.CoM (for both users) Now we need to recreate all possible 9 cases. Below I have tried to do so:- Test 1A Just restore the backup into a new course Test 1B Delete Student1 from Admin panel Try to restore the course Test 1C For student 2 notedown the values in the user table Delete the student from admin panel Backup the course Goto database and revert to old values for student2 (this should restore the student profile) Restore the course Test 1D No testing needed Testing second set of cases Follow steps 1-4 Change original_site_identifier_hash in moodle_backup.xml Repeat all about tests If I missed out any cases that you can think of please test those as well. If MDL-30080 is already fixed, run the associated unit tests Repeat all Tests for all supporting DBs If no conflict/error is generated we can assume all is good
          Hide
          Aparup Banerjee added a comment -

          Peer-review: this looks good to me, noted that unit tests will be handled in separate issue.

          Show
          Aparup Banerjee added a comment - Peer-review: this looks good to me, noted that unit tests will be handled in separate issue.
          Hide
          Ankit Agarwal added a comment -

          Great! Thanks Aparup for the review! Integration time!
          Thanks

          Show
          Ankit Agarwal added a comment - Great! Thanks Aparup for the review! Integration time! Thanks
          Hide
          Eloy Lafuente (stronk7) added a comment -

          Integrated, thanks!

          Show
          Eloy Lafuente (stronk7) added a comment - Integrated, thanks!
          Hide
          Eloy Lafuente (stronk7) added a comment -

          I'm going to test this with a backup of http://school.demo.moodle.net/course/view.php?id=115 (that is known to have deleted users) against all DBs. This will cover 50% of the cases, the solution applied to the other 50% is 100% the same, so should work.

          Show
          Eloy Lafuente (stronk7) added a comment - I'm going to test this with a backup of http://school.demo.moodle.net/course/view.php?id=115 (that is known to have deleted users) against all DBs. This will cover 50% of the cases, the solution applied to the other 50% is 100% the same, so should work.
          Hide
          Eloy Lafuente (stronk7) added a comment -

          Tested under the 4DBs with backup of the course above containing 2 deleted users. I think it's enough (the "samesite" part was executed too but with one course not having deleted users).

          So I think it's 95% ok to pass this, thanks and ciao

          Show
          Eloy Lafuente (stronk7) added a comment - Tested under the 4DBs with backup of the course above containing 2 deleted users. I think it's enough (the "samesite" part was executed too but with one course not having deleted users). So I think it's 95% ok to pass this, thanks and ciao
          Hide
          Eloy Lafuente (stronk7) added a comment -

          Yes, you got this finally upstream, just in time for Moodle 2.2beta. Congrats and thanks!

          Ciao

          Show
          Eloy Lafuente (stronk7) added a comment - Yes, you got this finally upstream, just in time for Moodle 2.2beta. Congrats and thanks! Ciao

            People

            • Votes:
              1 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: