Moodle
  1. Moodle
  2. MDL-36380

Duplicate month with zero is displayed in the report stats

    Details

    • Testing Instructions:
      Hide

      Test pre-requisites

      • Use an instance with more than 12 months worth of stats (or generate some, or use the attached script)
      • Enable the statistics
      • Run the cron to generate the statistics (or use attached script)
      • Set timezone to Server time

      /!\ Please note that the attached script gen_stats.php will wipe your logs and statistics. Real data is much more valuable, also read through the code before executing.

      Test steps

      1. Go to Home ► Site pages ► Reports ► Statistics ► Reports ► Statistics
        • View the last 2 weeks
        • View the last 3 months
        • View the last 7 months
        • View the last 2 years
        • Make sure that there is no duplicate entry in either the graph or the table
      2. Repeat the test 3 times with a different timezone
      Show
      Test pre-requisites Use an instance with more than 12 months worth of stats (or generate some, or use the attached script) Enable the statistics Run the cron to generate the statistics (or use attached script) Set timezone to Server time /!\ Please note that the attached script gen_stats.php will wipe your logs and statistics. Real data is much more valuable, also read through the code before executing. Test steps Go to Home ► Site pages ► Reports ► Statistics ► Reports ► Statistics View the last 2 weeks View the last 3 months View the last 7 months View the last 2 years Make sure that there is no duplicate entry in either the graph or the table Repeat the test 3 times with a different timezone
    • Affected Branches:
      MOODLE_22_STABLE, MOODLE_23_STABLE, MOODLE_24_STABLE, MOODLE_25_STABLE
    • Fixed Branches:
      MOODLE_23_STABLE, MOODLE_24_STABLE
    • Pull from Repository:
    • Pull 2.4 Branch:
    • Pull Master Branch:
      MDL-36380-master
    • Rank:
      45191

      Description

      When viewing the report statistics for greater then 7 months (ie when only the first of each month is shown) almost all months are duplicated with statistics of zero.

      Images attached show the stats and the graph pre and post the patch being applied.

      I have traced the issue and it seems to be coming from the function: stats_fix_zeros()
      It occurs because the times in the parameter $stat are normalised however the newly generated times are not normalised. By normalising the new times the same way everything appears properly.

      My commit can be found here: https://github.com/mdikih/moodle/commit/d5fd7d8eb4770524e20f782f52b4ed146aa3b217

      This issue has been reported before in 2008 affecting 1.9. Since this fix is for 2.2 I have created a new ticket.
      http://tracker.moodle.org/browse/MDL-17813
      http://tracker.moodle.org/browse/MDL-19838

      1. generate_stats.php
        5 kB
        Frédéric Massart
      1. postPatch_graph.png
        17 kB
      2. postPatch_stats.png
        36 kB
      3. prePatch_graph.png
        18 kB
      4. prePatch_stats.png
        45 kB

        Issue Links

          Activity

          Hide
          Michael de Raadt added a comment -

          Thanks for reporting this and adding a patch. It would have been good if you could have added this to one of the pre-existing issues, but I will tidy that up now.

          Show
          Michael de Raadt added a comment - Thanks for reporting this and adding a patch. It would have been good if you could have added this to one of the pre-existing issues, but I will tidy that up now.
          Hide
          Frédéric Massart added a comment - - edited

          Thanks for providing a patch and reporting this issue. Unfortunately it appears that the problem is far more complicated than it looks at first glace.

          This issue would only happen when the setting timezone is set to anything but 'Server timezone'. The problem then occurs when guessing the beginning/start of the (next) day/week/months. In fact, we are converting the timestamp accordingly to the timezone used, and then passing it to functions like stats_get_next_month_start() which converts the timestamp again, causing it to be eventually wrong.

          This is not an easy fix and would require a lot of Unit Tests to be written.

          Show
          Frédéric Massart added a comment - - edited Thanks for providing a patch and reporting this issue. Unfortunately it appears that the problem is far more complicated than it looks at first glace. This issue would only happen when the setting timezone is set to anything but 'Server timezone'. The problem then occurs when guessing the beginning/start of the (next) day/week/months. In fact, we are converting the timestamp accordingly to the timezone used, and then passing it to functions like stats_get_next_month_start() which converts the timestamp again, causing it to be eventually wrong. This is not an easy fix and would require a lot of Unit Tests to be written.
          Hide
          Frédéric Massart added a comment -

          It took me some time to figure out what the problem was. In fact, we are converting times over and over again, which is probably not a good idea. It's still very obscure to me what stats_get_base_daily() does, but anyway... So as when generating the statistics we store an already normalised date (see lib/statslib.php:250), there is no need to convert it. Then, to get the next date/week/month, we can't use stats_get_next_*_start() as it converts (twice) to the server timezone. Simply using strtotime() seems enough.

          Show
          Frédéric Massart added a comment - It took me some time to figure out what the problem was. In fact, we are converting times over and over again, which is probably not a good idea. It's still very obscure to me what stats_get_base_daily() does, but anyway... So as when generating the statistics we store an already normalised date (see lib/statslib.php:250), there is no need to convert it. Then, to get the next date/week/month, we can't use stats_get_next_*_start() as it converts (twice) to the server timezone. Simply using strtotime() seems enough.
          Hide
          Rajesh Taneja added a comment -

          Thanks for fixing this Fred,

          There are few things you might want to consider:

          1. Re-assigning stats array is probably not required. https://github.com/FMCorz/moodle/compare/moodle:master...MDL-36380-master?w=1#L0R1472
          2. As timeend is not going though stats_get_base_*, so $CFG->timezone is not considered and hence dst won't be calculated.
          3. Replacing stats_get_next_day_start with strtotime doesn't seems right. Sorry it's hard to understand what function is doing, but by the name it seems it should return start of next day/week/month. But with strtotime it will add 1 day/week/month to time. Not sure, hence adding Petr.
          4. It will be nice to add unittest in lib/tests/statslib_test.php
          Show
          Rajesh Taneja added a comment - Thanks for fixing this Fred, There are few things you might want to consider: Re-assigning stats array is probably not required. https://github.com/FMCorz/moodle/compare/moodle:master...MDL-36380-master?w=1#L0R1472 As timeend is not going though stats_get_base_*, so $CFG->timezone is not considered and hence dst won't be calculated. Replacing stats_get_next_day_start with strtotime doesn't seems right. Sorry it's hard to understand what function is doing, but by the name it seems it should return start of next day/week/month. But with strtotime it will add 1 day/week/month to time. Not sure, hence adding Petr. It will be nice to add unittest in lib/tests/statslib_test.php
          Hide
          Frédéric Massart added a comment -

          Hi Raj, people, and timestamp enthousiasts,

          Let me describe the process of logging, generating stats, and displaying them using the timestamps. But first, a brief about timestamp. As known by everyone, a timestamp is the amount of seconds elapsed since the 1st of Jan 1970. The problem is that a timestamp does not have a timezone, hence timestamp of 0 at UTC+0 is different than 0 at UTC+8, but appears to be similar. And when it come to using timezones, we have a couple of different ones within Moodle. The server sits at a specific timezone, Moodle itself can be set up on another one and then the user can have a timezone. To solve the problem between those, we normalise those timestamps to a specific timezone (or UTC+1) by removing/adding the difference (in hours) between the current timestamp and the timezone we decide to work on. As a reminder, the function time() returns a timestamp based on the server timezone. Moodle can also handle the Daylight Saving Time, but only in some specific functions, and only if the timezone is based on a location, rather than on a time different (Europe/Brussels against UTC+1).

          Back to the statistics.

          • My server is at UTC+8 (server time)
          • My Moodle is set up to use UTC+10 (Moodle time)
          1. When calling add_to_log(), we store a timestamp (from time()), so the server time, in the log table;
          2. While preparing the stats, we copy the entries from mdl_logs into a simplified temporary table;
          3. We then extract those information to place them in mdl_stats*. During the extract, based on the time the action has happened (the time in the log table), we generate the timeend value, which represents the end of the day (or midnight), this value is used to generating the statistics per day. At this stage the timeend value will be normalised to Moodle time. The current implementation is not accurate when Moodle timezone is not the server one:
            • Considers the time from mdl_logs a UTC+0 timestamp
            • Normalise twice to the (stats_get_next_day_start())
            • Wrong calculation of the beginning of the day based on the above (stats_get_base_daily())
          4. Before displaying the statistics over a period, we check that each day/month/week in stats_fix_zeros() has an entry, if not, we create a dummy entry by generating a timestamp normalised.

          The problem when generating a dummy entry is that it's quite hard to generate an accurate timestamp based on the information we have at that time, especially using the current API which converts too many times, and is based on an already converted timestamp (step 3).

          My solution considers that the timeend we got are already normalised (which they are), and simply adds up a day/week/month to it to get the next one, at the exact same time in the day. But as soon as 2 timestamps differ a bit, a new entry will be added to the graphs which is the problem in this issue.

          You mentioned DST, and I personally don't think we should worry about it too much, nonetheless it is taken care of while normalising the timeend for the first time.

          That said, I just realised that my solution probably does not work when adding a month to the '31st of Jan' for example, as it'll return '2nd of March'. I will investigate further.

          Note for myself: Do not work on timestamps early morning.

          Show
          Frédéric Massart added a comment - Hi Raj, people, and timestamp enthousiasts, Let me describe the process of logging, generating stats, and displaying them using the timestamps. But first, a brief about timestamp. As known by everyone, a timestamp is the amount of seconds elapsed since the 1st of Jan 1970. The problem is that a timestamp does not have a timezone, hence timestamp of 0 at UTC+0 is different than 0 at UTC+8, but appears to be similar. And when it come to using timezones, we have a couple of different ones within Moodle. The server sits at a specific timezone, Moodle itself can be set up on another one and then the user can have a timezone. To solve the problem between those, we normalise those timestamps to a specific timezone (or UTC+1) by removing/adding the difference (in hours) between the current timestamp and the timezone we decide to work on. As a reminder, the function time() returns a timestamp based on the server timezone. Moodle can also handle the Daylight Saving Time, but only in some specific functions, and only if the timezone is based on a location, rather than on a time different (Europe/Brussels against UTC+1). Back to the statistics. My server is at UTC+8 (server time) My Moodle is set up to use UTC+10 (Moodle time) When calling add_to_log() , we store a timestamp (from time()), so the server time, in the log table; While preparing the stats, we copy the entries from mdl_logs into a simplified temporary table; We then extract those information to place them in mdl_stats*. During the extract, based on the time the action has happened (the time in the log table), we generate the timeend value, which represents the end of the day (or midnight), this value is used to generating the statistics per day. At this stage the timeend value will be normalised to Moodle time. The current implementation is not accurate when Moodle timezone is not the server one: Considers the time from mdl_logs a UTC+0 timestamp Normalise twice to the (stats_get_next_day_start()) Wrong calculation of the beginning of the day based on the above (stats_get_base_daily()) Before displaying the statistics over a period, we check that each day/month/week in stats_fix_zeros() has an entry, if not, we create a dummy entry by generating a timestamp normalised. The problem when generating a dummy entry is that it's quite hard to generate an accurate timestamp based on the information we have at that time, especially using the current API which converts too many times, and is based on an already converted timestamp (step 3). My solution considers that the timeend we got are already normalised (which they are), and simply adds up a day/week/month to it to get the next one, at the exact same time in the day. But as soon as 2 timestamps differ a bit, a new entry will be added to the graphs which is the problem in this issue. You mentioned DST, and I personally don't think we should worry about it too much, nonetheless it is taken care of while normalising the timeend for the first time. That said, I just realised that my solution probably does not work when adding a month to the '31st of Jan' for example, as it'll return '2nd of March'. I will investigate further. Note for myself: Do not work on timestamps early morning.
          Hide
          Frédéric Massart added a comment - - edited

          I have updated my patch, and it seem to be working fine. It's a dirty hack though, but I can't think of anything else to properly fix this issue so far. The problem was that depending on the server timezone and Moodle timezone timeend is set either at the end of the month or at the beginning. When at the beginning, it can sometimes be 2 or more, which we do not want. So I enforce to go back to the first day of the month except if close the last days. Sending back for peer review.

          Show
          Frédéric Massart added a comment - - edited I have updated my patch, and it seem to be working fine. It's a dirty hack though, but I can't think of anything else to properly fix this issue so far. The problem was that depending on the server timezone and Moodle timezone timeend is set either at the end of the month or at the beginning. When at the beginning, it can sometimes be 2 or more, which we do not want. So I enforce to go back to the first day of the month except if close the last days. Sending back for peer review.
          Hide
          Rajesh Taneja added a comment -

          Thanks for the patch Fred,

          This is tricky and it seems hack works

          There are few things you might want to consider:

          1. As per my understanding, day and week works fine (Don't know how, but I can't see any zero), so personally I will not replace hack with a hack.
          2. If timezone is +ve month view show 1st day of month, when it is set to -ve it shows last day of month (not sure if this should be consistent).
          3. For month, it seems you can use strtotime('first day of next month', $timeafter); which will give you first day of next month without so much code.
          Show
          Rajesh Taneja added a comment - Thanks for the patch Fred, This is tricky and it seems hack works There are few things you might want to consider: As per my understanding, day and week works fine (Don't know how, but I can't see any zero), so personally I will not replace hack with a hack. If timezone is +ve month view show 1st day of month, when it is set to -ve it shows last day of month (not sure if this should be consistent). For month, it seems you can use strtotime('first day of next month', $timeafter); which will give you first day of next month without so much code.
          Hide
          Frédéric Massart added a comment -

          Many thanks for the reviews Raj!

          • You are right, I shouldn't change a line that works, I reverted the changes on day and week.
          • I am not sure if it should be consistent either, but this means rewriting a lot of the stats processing, leaving it like that for now.
          • Yes, but I really need to keep the hour/minute/second as it is, and I'd rather do the calculation myself than relying on an obscure PHP function which could not even return the same values on different versions. (Not sure when "First day of next month" was introduced).

          I am pushing for integration.
          Cheers!
          Fred

          Show
          Frédéric Massart added a comment - Many thanks for the reviews Raj! You are right, I shouldn't change a line that works, I reverted the changes on day and week. I am not sure if it should be consistent either, but this means rewriting a lot of the stats processing, leaving it like that for now. Yes, but I really need to keep the hour/minute/second as it is, and I'd rather do the calculation myself than relying on an obscure PHP function which could not even return the same values on different versions. (Not sure when "First day of next month" was introduced). I am pushing for integration. Cheers! Fred
          Hide
          Dan Poltawski added a comment -

          Oki, dirty hack wins out for the day. I expect a superbly clean fix to another issue to make up for this

          Integrated to 24 23 and master.

          Show
          Dan Poltawski added a comment - Oki, dirty hack wins out for the day. I expect a superbly clean fix to another issue to make up for this Integrated to 24 23 and master.
          Hide
          Adam Olley added a comment -

          This still doesn't work. It doesn't take into account daylight savings time.
          Contents of actualtimes:
          2012-07-01 09:30:00
          2012-05-01 09:30:00
          2012-02-01 10:30:00
          2012-02-01 10:30:00

          Resulting timeend's of functions returned array (told to show a years worth):
          2012-12-01 10:30:00
          2012-11-01 10:30:00
          2012-10-01 10:30:00
          2012-09-01 10:30:00
          2012-08-01 10:30:00
          2012-07-01 10:30:00
          2012-07-01 09:30:00
          2012-06-01 10:30:00
          2012-05-01 10:30:00
          2012-05-01 09:30:00
          2012-04-01 10:30:00
          2012-03-01 10:30:00
          2012-02-01 10:30:00
          2012-02-01 10:30:00

          You can see that for May, the function returns 09:30 and 10:30. The difference here of course being DST.

          Show
          Adam Olley added a comment - This still doesn't work. It doesn't take into account daylight savings time. Contents of actualtimes: 2012-07-01 09:30:00 2012-05-01 09:30:00 2012-02-01 10:30:00 2012-02-01 10:30:00 Resulting timeend's of functions returned array (told to show a years worth): 2012-12-01 10:30:00 2012-11-01 10:30:00 2012-10-01 10:30:00 2012-09-01 10:30:00 2012-08-01 10:30:00 2012-07-01 10:30:00 2012-07-01 09:30:00 2012-06-01 10:30:00 2012-05-01 10:30:00 2012-05-01 09:30:00 2012-04-01 10:30:00 2012-03-01 10:30:00 2012-02-01 10:30:00 2012-02-01 10:30:00 You can see that for May, the function returns 09:30 and 10:30. The difference here of course being DST.
          Hide
          Frédéric Massart added a comment -

          Arg. Thanks for reporting this Adam. Although, I am not sure if we should handle this scenario in fix_zeros. It seems like the problem comes from the related function get_monthly_base/start or something, which are used when generating the statistics. Meaning that the stored value is wrong and should be fixed, which is a cumbersome task.

          The problem in fixing the existing data is that we cannot force every Moodle to regenerate their stats (very heavy processing, logs might have been trimmed, ...) and we cannot safely fix the timestamps to handle the DST. Also, only fixing the way the data is saved now will not fix the problem as previous data is still wrong. Perhaps we could refactor the whole thing by using date fields instead of timestamps, but that's not an easy task...

          Show
          Frédéric Massart added a comment - Arg. Thanks for reporting this Adam. Although, I am not sure if we should handle this scenario in fix_zeros. It seems like the problem comes from the related function get_monthly_base/start or something, which are used when generating the statistics. Meaning that the stored value is wrong and should be fixed, which is a cumbersome task. The problem in fixing the existing data is that we cannot force every Moodle to regenerate their stats (very heavy processing, logs might have been trimmed, ...) and we cannot safely fix the timestamps to handle the DST. Also, only fixing the way the data is saved now will not fix the problem as previous data is still wrong. Perhaps we could refactor the whole thing by using date fields instead of timestamps, but that's not an easy task...
          Hide
          Adam Olley added a comment -

          Here's my suggested fix that applies atop the current patches here:
          https://github.com/aolley/moodle/commit/046d41991212f3b7e301e78c5455888eaeb39cf0

          Show
          Adam Olley added a comment - Here's my suggested fix that applies atop the current patches here: https://github.com/aolley/moodle/commit/046d41991212f3b7e301e78c5455888eaeb39cf0
          Hide
          Adam Olley added a comment -

          "Although, I am not sure if we should handle this scenario in fix_zeros."
          Well its fix zeros that is looking at the list of stuff it has and going "nope its not there, lets add another". The mktime function is adding datetimes with the hour based off of the previous $timeafter, without accounting for DST boundaries. That's where the issue is introduced.

          Show
          Adam Olley added a comment - "Although, I am not sure if we should handle this scenario in fix_zeros." Well its fix zeros that is looking at the list of stuff it has and going "nope its not there, lets add another". The mktime function is adding datetimes with the hour based off of the previous $timeafter, without accounting for DST boundaries. That's where the issue is introduced.
          Hide
          Frédéric Massart added a comment -

          Thanks the proposed patch. Do you think we count have a data set where the non-DST time is 0:00 and the DST'd one 11pm the day before? If so, would the patch handle that case scenario?

          Show
          Frédéric Massart added a comment - Thanks the proposed patch. Do you think we count have a data set where the non-DST time is 0:00 and the DST'd one 11pm the day before? If so, would the patch handle that case scenario?
          Hide
          Adam Olley added a comment - - edited

          It's certainly possible. Perhaps, for monthly, it needs to be even more generic than the dates matching and compare the months instead. After all its listing rows of months.

          So in my commit, instead of actualdates, we'd have actualmonths, built with a list of date('Y-m', $time);. If we then compared the padding loop with:
          if (!in_array(date('Y-m', $time), $actualmonths)...

          We should eliminate all dupe month rows (those introduced by this function anyway). The above if would also need to do the month comparison ONLY for $timestr=='monthly' of course and otherwise use the old comparison.

          Show
          Adam Olley added a comment - - edited It's certainly possible. Perhaps, for monthly, it needs to be even more generic than the dates matching and compare the months instead. After all its listing rows of months. So in my commit, instead of actualdates, we'd have actualmonths, built with a list of date('Y-m', $time);. If we then compared the padding loop with: if (!in_array(date('Y-m', $time), $actualmonths)... We should eliminate all dupe month rows (those introduced by this function anyway). The above if would also need to do the month comparison ONLY for $timestr=='monthly' of course and otherwise use the old comparison.
          Hide
          Michael de Raadt added a comment -

          Test result: Success!

          Tested in Master, 2.4 and 2.3.

          Show
          Michael de Raadt added a comment - Test result: Success! Tested in Master, 2.4 and 2.3.
          Hide
          Adam Olley added a comment -

          MdR: I'm guessing you haven't read the past few comments...

          Show
          Adam Olley added a comment - MdR: I'm guessing you haven't read the past few comments...
          Hide
          Michael de Raadt added a comment -

          Nope. I didn't go that deep.

          Show
          Michael de Raadt added a comment - Nope. I didn't go that deep.
          Hide
          Dan Poltawski added a comment -

          Reset to failed. We need a resolution or we'll need to revert this.

          Show
          Dan Poltawski added a comment - Reset to failed. We need a resolution or we'll need to revert this.
          Hide
          Dan Poltawski added a comment -

          OK, spoken with fred and we've agreed to revert this. Reverted and reopening.

          Show
          Dan Poltawski added a comment - OK, spoken with fred and we've agreed to revert this. Reverted and reopening.
          Hide
          CiBoT added a comment -

          Moving this reopened issue out from current integration. Please, re-submit it for integration once ready.

          Show
          CiBoT added a comment - Moving this reopened issue out from current integration. Please, re-submit it for integration once ready.
          Hide
          Frédéric Massart added a comment -

          Hi Adam,

          I've pushed another commit to my patch, it'd be nice if you could test (and/or review) it. Instead of checking the month as you proposed to do, I forced the hour/minutes and seconds of the timestamps.

          Cheers!
          Fred

          Show
          Frédéric Massart added a comment - Hi Adam, I've pushed another commit to my patch, it'd be nice if you could test (and/or review) it. Instead of checking the month as you proposed to do, I forced the hour/minutes and seconds of the timestamps. Cheers! Fred
          Hide
          Adam Olley added a comment - - edited

          [Y] Syntax
          [Y] Output
          [Y] Whitespace
          [-] Language
          [-] Databases
          [Y] Testing
          [-] Security
          [Y] Documentation
          [Y] Git
          [Y] Sanity check

          This now passes on the datasets I was using to reproduce the issue.

          I don't have the ability to move this onto awaiting testing or awaiting integration review. So someone else will have to do that.

          Show
          Adam Olley added a comment - - edited [Y] Syntax [Y] Output [Y] Whitespace [-] Language [-] Databases [Y] Testing [-] Security [Y] Documentation [Y] Git [Y] Sanity check This now passes on the datasets I was using to reproduce the issue. I don't have the ability to move this onto awaiting testing or awaiting integration review. So someone else will have to do that.
          Hide
          Frédéric Massart added a comment -

          Awesome, thanks! I'm pushing for integration. Would you have any idea how to replicate the dataset having RTL issues? I am not sure how to nicely fake that.

          Show
          Frédéric Massart added a comment - Awesome, thanks! I'm pushing for integration. Would you have any idea how to replicate the dataset having RTL issues? I am not sure how to nicely fake that.
          Hide
          Eloy Lafuente (stronk7) added a comment -

          Integrated (23, 24 & master), thanks!

          Show
          Eloy Lafuente (stronk7) added a comment - Integrated (23, 24 & master), thanks!
          Hide
          Michael de Raadt added a comment -

          Test result: Success (again )

          I tested as before. Based on Adam's results in relation to DST affected data, I'm happy to pass this.

          Show
          Michael de Raadt added a comment - Test result: Success (again ) I tested as before. Based on Adam's results in relation to DST affected data, I'm happy to pass this.
          Hide
          Eloy Lafuente (stronk7) added a comment -

          This is valid for unlimited entries to the, soon to be unveiled, Moodle Codebase Gardens. It includes free access to all facilities.

          Personal and non-transferable to all assignees, reviewers and testers in this issue. Valid until switching to Blackboard (100000€ penalization will be applied).

          Thanks, closing as fixed!

          Show
          Eloy Lafuente (stronk7) added a comment - This is valid for unlimited entries to the, soon to be unveiled, Moodle Codebase Gardens. It includes free access to all facilities. Personal and non-transferable to all assignees, reviewers and testers in this issue. Valid until switching to Blackboard (100000€ penalization will be applied). Thanks, closing as fixed!

            People

            • Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: