Moodle
  1. Moodle
  2. MDL-31484

Repeated restores of anonymised courses does not work

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: 2.1.4, 2.2.3, 2.3, 2.3.4, 2.4.1
    • Fix Version/s: 2.3.5, 2.4.2
    • Component/s: Backup
    • Labels:
    • Environment:
      Centos 5.x x86_64, php 5.3.x, apache 2.2, postgresql 8.4
    • Database:
      PostgreSQL
    • Testing Instructions:
      Hide

      On a course with at least one user, run a backup with 'users', 'anonymize users' and 'user data' (for all activities) selected
      Restore this course, creating a new course and including the user data
      Restore the course a second time, again creating a new course and including the user data

      The course should restore fine, without any complaints about user conflicts. Any activities with user data should still have that data, but the users should all be anonymised versions.

      Show
      On a course with at least one user, run a backup with 'users', 'anonymize users' and 'user data' (for all activities) selected Restore this course, creating a new course and including the user data Restore the course a second time, again creating a new course and including the user data The course should restore fine, without any complaints about user conflicts. Any activities with user data should still have that data, but the users should all be anonymised versions.
    • Workaround:
      Hide

      Manually deleting the anonymous users before restoring the course a second time works around the issue, although for any subsequent course restore attempts we are back at square one. Furthermore, deleting the users is problematic since it has to be done by the Administrator (aka: me) and in our institution we try to enable a fully self-serviceable workflow for our teachers. Moreover, we are a bit reluctant to delete conflicting anon users since we do not know how the activities and resources in already existing courses react if the corresponding user is removed.

      It is a pity that these problems exist, because the ability to restore anonymous user data into courses is one of the more frequently asked features in our institution.

      Show
      Manually deleting the anonymous users before restoring the course a second time works around the issue, although for any subsequent course restore attempts we are back at square one. Furthermore, deleting the users is problematic since it has to be done by the Administrator (aka: me) and in our institution we try to enable a fully self-serviceable workflow for our teachers. Moreover, we are a bit reluctant to delete conflicting anon users since we do not know how the activities and resources in already existing courses react if the corresponding user is removed. It is a pity that these problems exist, because the ability to restore anonymous user data into courses is one of the more frequently asked features in our institution.
    • Affected Branches:
      MOODLE_21_STABLE, MOODLE_22_STABLE, MOODLE_23_STABLE, MOODLE_24_STABLE
    • Fixed Branches:
      MOODLE_23_STABLE, MOODLE_24_STABLE
    • Pull Master Branch:
      MDL-31484_anonymous_user_clash
    • Rank:
      38014

      Description

      Probably related to the suggestions made for (MDL-22158).

      Replication steps:

      1. Go to a course
      2. backup -> Tick Anonymise User Information -> Create Course Backup
      3. restore backup to new course -> this works
      4. restore same backup file to another new course ->

      In step "6. Process" a bunch of error messages
      "Trying to restore user 'anon1' from backup file will cause conflict" appears.

      Due to this error messages, it seems that step 6 can not be finished, as there is only a Cancel button but no Next button. The second course does not appear to be created

      What does not work
      -----------------------------------------------------------
      Restoring an anonymised course backup file for the first time into a new course creates a bunch of pseudo-users. Trying to restore the same backup file the second time yields a list of errors
      "Trying to restore user 'anon1' from backup file will cause conflict" ... (and so on) at step 6 Processing (see attached screenshot for instance).

      Expected behaviour
      -----------------------------------------------------------
      Iterated restores from the same anonymised course backup are expected to work. Further improvements to the way user data are anonymised would of course still be nice to have

        Issue Links

          Activity

          Hide
          Michael de Raadt added a comment -

          Thanks for reporting that.

          This exemplifies the problem I suspect Eloy was anticipating in MDL-22158 and that might provide more impetus to resolve the issue.

          Feel free to help us work on this issue. If you are able to provide a patch, please add a patch label so we will spot it.

          Show
          Michael de Raadt added a comment - Thanks for reporting that. This exemplifies the problem I suspect Eloy was anticipating in MDL-22158 and that might provide more impetus to resolve the issue. Feel free to help us work on this issue. If you are able to provide a patch, please add a patch label so we will spot it.
          Hide
          Davo Smith added a comment -

          I've just hit the same issue on the site for a client, so I'll take a look and see if I can come up with a sensible solution to this.

          Show
          Davo Smith added a comment - I've just hit the same issue on the site for a client, so I'll take a look and see if I can come up with a sensible solution to this.
          Hide
          Davo Smith added a comment -

          I've added a bit of code into 'backup_anonymizer_helper' to identify anonymous users based on their username ('anon'+number).

          The restore process then matches anonymous users based on just their (anonymised) username (rather than matching their userid as well) - on the assumption that anonymous users should be unique within a course (or at least an individual activity), but do not need to be uniquely identified across the site (in fact I'd argue that they should NOT be uniquely identifiable across a site, as could make it easy to figure out who someone really was).

          Show
          Davo Smith added a comment - I've added a bit of code into 'backup_anonymizer_helper' to identify anonymous users based on their username ('anon'+number). The restore process then matches anonymous users based on just their (anonymised) username (rather than matching their userid as well) - on the assumption that anonymous users should be unique within a course (or at least an individual activity), but do not need to be uniquely identified across the site (in fact I'd argue that they should NOT be uniquely identifiable across a site, as could make it easy to figure out who someone really was).
          Hide
          Martin Schwinzerl added a comment -

          @Davo Smith
          Many thanks for looking into this issue, we were unfortunately not able to focus on this problem so far.

          Obviously, we can't speak for other users of this functionality, but depending on the activity and the purpose of the anonymous restore, we can identify at least two different strategies that make sense for us (under the premise that we want to keep the number of automatically generated local users minimal):

          • The content of an activity is of interest, keeping the contributors separate is not vital (Examples: Wiki, Glossary, FAQ Lists, etc.)
            => Restore all contributions from the activity using one (ideally: sytstem-wide defined) anonymous user account
          • The creation history shall be preserved (E.g. forum discussion, submitted assignments, etc.)
            => For N - participating users, restore the contributions using the first N system-wide assigned users

          Ideally (again, at least for our use cases), if several different strategies to anonmyise content are available, they should be configurable on the level of activities.

          If I understand your approach correctly, we would get a large number of (additional) anonymous users at our site (potentially 40k "real" users and already apx. 1500 courses anually). As mentioned earlier, we work around this increasing number of user accounts by deleting them from time to time (which is necessary anyway, courtesy of this bug), leaving the question open how the (for us: relevant) user contributions (forum entries, wiki edits, glossary entities) will behave when the corresponding user goes AWOL.

          Any input on this matter would be very much appreaciated!

          Best regards
          Martin

          Show
          Martin Schwinzerl added a comment - @Davo Smith Many thanks for looking into this issue, we were unfortunately not able to focus on this problem so far. Obviously, we can't speak for other users of this functionality, but depending on the activity and the purpose of the anonymous restore, we can identify at least two different strategies that make sense for us (under the premise that we want to keep the number of automatically generated local users minimal): The content of an activity is of interest, keeping the contributors separate is not vital (Examples: Wiki, Glossary, FAQ Lists, etc.) => Restore all contributions from the activity using one (ideally: sytstem-wide defined) anonymous user account The creation history shall be preserved (E.g. forum discussion, submitted assignments, etc.) => For N - participating users, restore the contributions using the first N system-wide assigned users Ideally (again, at least for our use cases), if several different strategies to anonmyise content are available, they should be configurable on the level of activities. If I understand your approach correctly, we would get a large number of (additional) anonymous users at our site (potentially 40k "real" users and already apx. 1500 courses anually). As mentioned earlier, we work around this increasing number of user accounts by deleting them from time to time (which is necessary anyway, courtesy of this bug), leaving the question open how the (for us: relevant) user contributions (forum entries, wiki edits, glossary entities) will behave when the corresponding user goes AWOL. Any input on this matter would be very much appreaciated! Best regards Martin
          Hide
          Davo Smith added a comment -

          @Martin - this version will create, the same number of anonymous users as the maximum number of backed-up users in a single course.

          To put it a bit more clearly, if course A has 5 users and course B has 3 users, then after an anonymous backup and restore of both courses, there will be 5 anonymous users in total. Further backup & restore operations will only increase this number if you backup a course with a larger number of users (e.g. a course with 10 users, will increase the anonymous total to exactly 10 users). Anonymous users are numbered from 1 to 'number of users in the course' and existing anonymous users are reused on restore.

          Show
          Davo Smith added a comment - @Martin - this version will create, the same number of anonymous users as the maximum number of backed-up users in a single course. To put it a bit more clearly, if course A has 5 users and course B has 3 users, then after an anonymous backup and restore of both courses, there will be 5 anonymous users in total. Further backup & restore operations will only increase this number if you backup a course with a larger number of users (e.g. a course with 10 users, will increase the anonymous total to exactly 10 users). Anonymous users are numbered from 1 to 'number of users in the course' and existing anonymous users are reused on restore.
          Hide
          Martin Schwinzerl added a comment -

          @Davo
          Thank you for the clarification, that sounds quite managable (I was a bit nervous about the prospect of having 40K anonymous users in the worst case scenario for our site - not good).

          Show
          Martin Schwinzerl added a comment - @Davo Thank you for the clarification, that sounds quite managable (I was a bit nervous about the prospect of having 40K anonymous users in the worst case scenario for our site - not good).
          Hide
          Jason Fowler added a comment -

          Looks good. The empty line at 53 of backup/util/helper/backup_anonymizer_helper.class.php is unnecessary though.

          Show
          Jason Fowler added a comment - Looks good. The empty line at 53 of backup/util/helper/backup_anonymizer_helper.class.php is unnecessary though.
          Hide
          Davo Smith added a comment -

          I've removed the blank line.

          Could this be put forward for integration?

          Show
          Davo Smith added a comment - I've removed the blank line. Could this be put forward for integration?
          Hide
          Jason Fowler added a comment -

          Sure thing, pushing for integration now... just add some testing instructions to go with it ASAP please

          Show
          Jason Fowler added a comment - Sure thing, pushing for integration now... just add some testing instructions to go with it ASAP please
          Hide
          Davo Smith added a comment -

          Added testing instructions.

          Show
          Davo Smith added a comment - Added testing instructions.
          Hide
          Dan Poltawski added a comment -

          The main moodle.git repository has just been updated with latest weekly modifications. You may wish to rebase your PULL branches to simplify history and avoid any possible merge conflicts. This would also make integrator's life easier next week.

          TIA and ciao

          Show
          Dan Poltawski added a comment - The main moodle.git repository has just been updated with latest weekly modifications. You may wish to rebase your PULL branches to simplify history and avoid any possible merge conflicts. This would also make integrator's life easier next week. TIA and ciao
          Hide
          Dan Poltawski added a comment -

          grr, misclick. sorry

          Show
          Dan Poltawski added a comment - grr, misclick. sorry
          Hide
          Dan Poltawski added a comment -

          Hi, i'm reopening this because I don't think that this check for an anonymous user is safe. At the least the regular expression should be bounded by start/end characters as that regexp will currently match (the very possible to occur) username: shanon1

          However I am not sure if matching simply on this username is wise either so I think that needs more discussion. I'm not too familar with this area, but looking at backup_anonymizer_helper how about we match on all 4 of username + firstname + lastname + email to be absolutely sure.

          Show
          Dan Poltawski added a comment - Hi, i'm reopening this because I don't think that this check for an anonymous user is safe. At the least the regular expression should be bounded by start/end characters as that regexp will currently match (the very possible to occur) username: shanon1 However I am not sure if matching simply on this username is wise either so I think that needs more discussion. I'm not too familar with this area, but looking at backup_anonymizer_helper how about we match on all 4 of username + firstname + lastname + email to be absolutely sure.
          Hide
          Davo Smith added a comment -

          Making sure the regex is bounded by start/end characters is definitely a good idea - I'll add that.

          In terms of checking other fields, in addition to the username, I had considered this, but the restore process will fail if there is a user with username 'anon1' who is not identified as an 'anonymous' user (as the restore process will try to create such a user and cause the clash in the original issue report). Maybe the anonymous processes could be extended so that if the user in the backup is identified as anonymous, but there is a non-anonymous user with that username already on the site, then a new anonymous user should be created instead (but that will require a careful look at the restore workflow, as, currently, all the anonymisation takes place during backup, not during restore).

          Show
          Davo Smith added a comment - Making sure the regex is bounded by start/end characters is definitely a good idea - I'll add that. In terms of checking other fields, in addition to the username, I had considered this, but the restore process will fail if there is a user with username 'anon1' who is not identified as an 'anonymous' user (as the restore process will try to create such a user and cause the clash in the original issue report). Maybe the anonymous processes could be extended so that if the user in the backup is identified as anonymous, but there is a non-anonymous user with that username already on the site, then a new anonymous user should be created instead (but that will require a careful look at the restore workflow, as, currently, all the anonymisation takes place during backup, not during restore).
          Hide
          Davo Smith added a comment -

          I've updated my branch to check more carefully that this really is an anonymous user.

          If the username matches an anonymous user, but one (or more) of the other fields don't match, then a warning is output (is there a better way of giving the warning, instead of the 'echo'?), as there is a very good chance that this will cause a restore problem at some point (e.g. if the user is called 'anon5', but the other fields indicate they are not an anonymous user, then this will attempt to create a user 'anon5', which will clash with any pre-existing anonymous user called 'anon5').

          Show
          Davo Smith added a comment - I've updated my branch to check more carefully that this really is an anonymous user. If the username matches an anonymous user, but one (or more) of the other fields don't match, then a warning is output (is there a better way of giving the warning, instead of the 'echo'?), as there is a very good chance that this will cause a restore problem at some point (e.g. if the user is called 'anon5', but the other fields indicate they are not an anonymous user, then this will attempt to create a user 'anon5', which will clash with any pre-existing anonymous user called 'anon5').
          Hide
          Jason Fowler added a comment -

          Code looks good to me, but I tend to agree about the output being echo'd out, is there a better way to do that? Maybe an error output function or something?

          Show
          Jason Fowler added a comment - Code looks good to me, but I tend to agree about the output being echo'd out, is there a better way to do that? Maybe an error output function or something?
          Hide
          Damyon Wiese added a comment - - edited

          I have just been looking at this patch and it does at least allow the course restore to proceed - and the anonymous user information is restored. But this is not quite useful because the anonymous users are not enrolled in the course (I don't have an answer for this - but it is still not a very usable feature).

          E.g.

          Create a choice activity in a course.
          Login as a student and make a choice.
          Login as a teacher and see there is 1 response for the activity.
          Backup the choice activity with "Anonymise user information"
          Restore the backup into the same (or different) course.
          View the newly restored activity - it will display 0 responses.

          This is because the choice activity (and all the others) only display data for users currently enrolled (one reason is so that suspended students do not clutter the display, another is so that when courses are re-used they do not display last years data)

          Enrolling the anonymous users in the course with a student role does allow you to see the anonymous contributions - but that is not a solution as they will show up everywhere (gradebook, etc). Some sort of psuedo enrolment would be nice.

          Fixing this anonymous user enrolment issue is probably beyond the scope of this restore issue (and it would be nice to at least see this patch integrated).

          Regards, Damyon

          Show
          Damyon Wiese added a comment - - edited I have just been looking at this patch and it does at least allow the course restore to proceed - and the anonymous user information is restored. But this is not quite useful because the anonymous users are not enrolled in the course (I don't have an answer for this - but it is still not a very usable feature). E.g. Create a choice activity in a course. Login as a student and make a choice. Login as a teacher and see there is 1 response for the activity. Backup the choice activity with "Anonymise user information" Restore the backup into the same (or different) course. View the newly restored activity - it will display 0 responses. This is because the choice activity (and all the others) only display data for users currently enrolled (one reason is so that suspended students do not clutter the display, another is so that when courses are re-used they do not display last years data) Enrolling the anonymous users in the course with a student role does allow you to see the anonymous contributions - but that is not a solution as they will show up everywhere (gradebook, etc). Some sort of psuedo enrolment would be nice. Fixing this anonymous user enrolment issue is probably beyond the scope of this restore issue (and it would be nice to at least see this patch integrated). Regards, Damyon
          Hide
          Eloy Lafuente (stronk7) added a comment -

          After some discussion @ developers chat, it seems that current patch behavior is correct (other alternatives would require creating some hashes to be able to match those anonymous users better or harder checks).

          So, once these become amended, I think the patch can find its way to core:

          • add the missing phpdocs @ is_anonymous_user().
          • document the new matching case in inline docs @ precheck_user() (with link to this issue).
          • take rid of the RESTORE_SILENTLY + output thing (not supported since 2.0).

          About the enrollments thingy, if it continues not enrolling those users I'd create a new issue for handling it separately.

          Ciao

          Show
          Eloy Lafuente (stronk7) added a comment - After some discussion @ developers chat, it seems that current patch behavior is correct (other alternatives would require creating some hashes to be able to match those anonymous users better or harder checks). So, once these become amended, I think the patch can find its way to core: add the missing phpdocs @ is_anonymous_user(). document the new matching case in inline docs @ precheck_user() (with link to this issue). take rid of the RESTORE_SILENTLY + output thing (not supported since 2.0). About the enrollments thingy, if it continues not enrolling those users I'd create a new issue for handling it separately. Ciao
          Hide
          Davo Smith added a comment -

          I've updated my code with the PHPDocs changes requested and rebased it off the current Moodle master branch.

          Show
          Davo Smith added a comment - I've updated my code with the PHPDocs changes requested and rebased it off the current Moodle master branch.
          Hide
          Dan Poltawski added a comment -

          Holding this as its master only change.

          Show
          Dan Poltawski added a comment - Holding this as its master only change.
          Hide
          Davo Smith added a comment -

          Dan - why is this a 'master only' change?

          If this was just a nice extra feature to have, I'd agree with you, but this is a fix to a standard feature of Moodle (since 2.1) that breaks if you try to use it more than once. Surely that qualifies to be fixed on all supported branches?

          Show
          Davo Smith added a comment - Dan - why is this a 'master only' change? If this was just a nice extra feature to have, I'd agree with you, but this is a fix to a standard feature of Moodle (since 2.1) that breaks if you try to use it more than once. Surely that qualifies to be fixed on all supported branches?
          Hide
          Dan Poltawski added a comment -

          Hi Davo,

          (Firstly, feel free to ping about this on the dev chat to get attention).

          I did that without the detailed analysis of the issue. The fix was put on hold because there was only a branch for master, no mention of cherry-picking to the stable branches and no fix versions for stable branches.

          I take it that you'd like this cherry-picking to the stables?

          Show
          Dan Poltawski added a comment - Hi Davo, (Firstly, feel free to ping about this on the dev chat to get attention). I did that without the detailed analysis of the issue. The fix was put on hold because there was only a branch for master, no mention of cherry-picking to the stable branches and no fix versions for stable branches. I take it that you'd like this cherry-picking to the stables?
          Hide
          Dan Poltawski added a comment -

          Davo was probably busy yesterday so i'm going to integrate this to the stable branches since seems sensible.

          Show
          Dan Poltawski added a comment - Davo was probably busy yesterday so i'm going to integrate this to the stable branches since seems sensible.
          Hide
          Dan Poltawski added a comment -

          Integrated to master, 24 and 23, thanks Davo.

          Show
          Dan Poltawski added a comment - Integrated to master, 24 and 23, thanks Davo.
          Hide
          Ankit Agarwal added a comment -

          Tested on 2.5 pgsql
          2.4 mysql, 2.3 mysql
          Works as described.
          Passing thanks

          Show
          Ankit Agarwal added a comment - Tested on 2.5 pgsql 2.4 mysql, 2.3 mysql Works as described. Passing thanks
          Hide
          Davo Smith added a comment -

          Thanks Dan - yes, I didn't really have a spare moment to look at this yesterday.

          Show
          Davo Smith added a comment - Thanks Dan - yes, I didn't really have a spare moment to look at this yesterday.
          Hide
          Dan Poltawski added a comment -

          Hurray! We did it! Thanks to all the reporters, testers, user and watchers for a bumper week of Moodling!

          Show
          Dan Poltawski added a comment - Hurray! We did it! Thanks to all the reporters, testers, user and watchers for a bumper week of Moodling!

            People

            • Votes:
              3 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: