Issue Details (XML | Word | Printable)

Key: MDL-13192
Type: Bug Bug
Status: Resolved Resolved
Resolution: Fixed
Priority: Blocker Blocker
Assignee: Petr Skoda
Reporter: Petr Skoda
Votes: 7
Watchers: 15
Operations

Add/Edit UI Mockup to this issue
If you were logged in you would be able to see more operations.
Moodle

stats cleanup - meta

Created: 28/Jan/08 12:07 AM   Updated: 26/Feb/08 06:57 AM
Return to search
Component/s: Administration
Affects Version/s: 1.9
Fix Version/s: 1.9

File Attachments: 1. Text File 0001-staslib-stats_daily_cron-small-SQL-improvements.patch (13 kB)
2. Text File stats15.patch (76 kB)
3. Text File stats29.patch (109 kB)
4. Text File stats33.patch (114 kB)

Issue Links:
Dependency
 
Duplicate
 

Participants: Dan Marsden, Dan Poltawski, Gary Anderson, Kenneth Newquist, Martin Dougiamas, Martín Langhoff, Petr Skoda and Samuli Karevaara
Security Level: None
Resolved date: 26/Feb/08
Affected Branches: MOODLE_19_STABLE
Fixed Branches: MOODLE_19_STABLE

Sub-Tasks  All   Open   
 Sub-Task Progress: 
No sub-tasks match this view.

 All   Comments   Change History   Version Control      Sort Order: Ascending order - Click to sort in descending order
Petr Skoda added a comment - 02/Feb/08 09:31 PM
sending patch for review/testing - please note we are running out of version numbers in main version.php, that is why I used floating point version number in this patch.

Petr Skoda added a comment - 02/Feb/08 09:33 PM
ccing more ppl

Petr Skoda added a comment - 03/Feb/08 12:20 AM
With these performance improvements we could add separate module statistics in 2.0, the performance should not be much worse if we added separate read/write queries for each module.

Martin Dougiamas added a comment - 03/Feb/08 10:16 AM
Starting to review it now.

I notice you've dropped some indexes. Have you checked these aren't affecting other parts of Moodle using logs (non-stats) ? They must have been there for a reason.


Petr Skoda added a comment - 03/Feb/08 06:21 PM
Those were probably used by recent activity which does not use log table anymore, OU came to similar conclusions and dropped some indexes on log table too.

Dan Poltawski added a comment - 03/Feb/08 09:18 PM
I have never actually seen stats working, so its difficult to compare to behaviour before.

But from a sysadmin point of view we had some (small!) sites which hogged cron overnight to process a single days stats. (I never really looked into what was going on and just disabled the processing im afraid).

I've just forced through reprocessing of one of the sites with the patch to process since last jan:

...completed 398 days of statistics.
Cron script completed correctly
Execution took 283.956878 seconds


Gary Anderson added a comment - 04/Feb/08 11:23 AM
This is kind of exciting. This is the first time that Statistics has worked for us also.

I wish I could give you a time report like Dan did, but the cron did it on it's own.

We have 550 students with an average of 5 Moodle classes each, many with daily assignments or other activities.


Martín Langhoff added a comment - 05/Feb/08 07:34 AM
Hi Petr!

Very good patch – and large!

Overall, I agree with the patch - it's excellent stuff. Parts of it are identical to mine but yours is further reaching. (Did you have a chance to review my take on the stats generation?) I do have some comments though.

  • The main "activity" stats handling (around line 470 in your patch, "how many view actions in each course+role") SQL has 2 things I would improve
    -> You are using an inner join against RA which means that perfectly valid user activity will be ignored – for example, this completely murders stats against sitecourse. Use an outer join, and a coalesce to have a role fallback.
    -> Rather than using 2 subselects, it is much faster to use a single subslelect with a CASE block (supported on all the DBs we care about)

Have a look at the SQL in this patch - it shows the 2 ideas mentioned above http://git.catalyst.net.nz/gitweb?p=moodle-r2.git;a=commitdiff;h=dbbe5093d967cc15d644eff7e15f019bfe68a599

  • In admin/cron you set a hard limit of 3 days for stats_cron_daily() – i don't think this will work. Some sites have a big backlog, and available grunt. With your patches, processing 3 days will be fast. This is a problem since the block is still controlled by $timetocheck, which will be once a day – perhaps you meant to remove the $timetocheck logic?

Petr Skoda added a comment - 05/Feb/08 08:18 AM - edited
1/ the RA join - oki
2/ never used CASE - have to study it, but I trust you
3/ the 3 day limit should be limit for one cron invocation - executed for example every 30 minutes after 1:00 AM, I though I tested it, but I am not sure now.

I did not look much at your code before, mostly because I had my patch sort of working before you submitted into the tracker. The SQL was the easier part and I hoped you will be able to merge my SQL with yours easily. The trouble for me was with the locking, timezones and tuning mysql indexes to make it faster.

It would be great if you could improve/polish the patch a bit more, post new patch here, then I could quickly verify it still works on mysql and compare the speed, then you could commit it into cvs. Is it ok this way? I would return back to forum stuff in the meantime


Martín Langhoff added a comment - 06/Feb/08 07:15 AM
Sorry - ran out of time yesterday, but I'll write up a patch on top of yours with my SQL later today.

Petr Skoda added a comment - 06/Feb/08 07:20 AM
Thanks!

Kenneth Newquist added a comment - 07/Feb/08 12:46 AM
Is there any chance this will be backported to Moodle 1.8.5?

Martín Langhoff added a comment - 07/Feb/08 02:09 PM
From the patch attached:
Subject: [PATCH] staslib: stats_daily_cron() - small SQL improvements in activity & enrol SQL

This commit touches 3 elements

  • enrolments query

We need a separate query to deal fairly with the sitecourse.

  • activity query -

This SQL query is a bit more expensive mainly because of how it reads
role assignments to grab the "primary role". The "primary role"
SQL could be improved further (in dmllib) to be faster and more correct.

  • stats_get_action_names() replaces stats_get_action_sql_in()

which simplifies things. It's only called once, so there's no point in
holding a cache, and it doesn't make any assumptions about what the
caller will do with it, so callers can use it with more freedom.


Martín Langhoff added a comment - 07/Feb/08 02:24 PM
Petr -

that's my patch, which applies on top of yours. I suspect it is a bit slower on sites with tons of courses, but it is more correct as a result. The primary roles logic is quite patchy still, I think we can do something much much better with a tad of cunning and some SQL substring operations on the context.path to do a simplified "cascading" of the roles.

The SQL operations are DB specific, but I think the 4 DBs support doing what we want "inline" (that is, without stored procedures).


Martín Langhoff added a comment - 07/Feb/08 02:44 PM
Also - if you are sticking to the "3 days per run" rule, you will definitely want to remove the "if (time() > $timetocheck) {" line from admin/cron.php, because it is only true once a day

Petr Skoda added a comment - 08/Feb/08 09:11 AM
hello,

I do not understand why we should use the primary role concept, we are not using it anywhere else, right? The graph shows hits per role, the trouble is that one user may have several roles. I guessed the primary role tries to work around it so that each user has only one role, why? Or is there another reason? I proposed to count user's hits in each role he/she belongs to and construct a new query for course total, because the sum of hits from all roles would not make sense anymore. I believe the second approach is more correct and as a bonus is at least 100x faster on mysql (or maybe it is caused by CASE, not sure yet). I am afraid that if we try to look into parent context roles + primary roles it will be again 100x times slower which would not be acceptable IMHO.

"if (time() > $timetocheck) {" true only once a day? I would expect it to be true all the rest of the day after $timetocheck, though I did not test the code yet


Petr Skoda added a comment - 08/Feb/08 09:15 AM
also I needed to fix two sql typos to make the daily stats execute, is the patch ok?

Petr Skoda added a comment - 09/Feb/08 04:41 AM
got some new ideas and started working on this again, going to improve guest user stats, parent contexts, speed etc.

Petr Skoda added a comment - 11/Feb/08 09:54 PM - edited
sending latest patch, it is not production quality yet, I have to review each sql query once more

the major changes are:

  • added not-logged-in access to gues role on frontpage
  • default frontpage role stats
  • enrolments in parent course categories should work
  • temporary course guest access for normal users
  • all users total displayed on role graphs
  • fixed graph colors and graphs use the same sorting of roles as table

speed is still <60s for moodle.org day


Petr Skoda added a comment - 11/Feb/08 10:53 PM
patch update

Petr Skoda added a comment - 12/Feb/08 12:11 AM
I hope this one is close to final, please test on other databases than mysql, thanks
adding unrelated scale_history patch, because we are running out of version numbers in main version.php

Petr Skoda added a comment - 12/Feb/08 01:23 AM
found logic bug in my enrollment related queries, working on fix now...

Petr Skoda added a comment - 12/Feb/08 02:20 AM
fixed course enrolment totals and xhtml strictness of reports

Petr Skoda added a comment - 12/Feb/08 06:50 AM
The last part should be only upgrade code which add enrolment and activity course totals (roleid == 0) - needed only for sites with working stats, I will work on that tomorrow..

Martín Langhoff added a comment - 13/Feb/08 03:34 PM
Hey - great to see further progress. I was working in Australia for a few days - really bad connection.

The patch25 is looking good, but role_assignment is getting inner-joined rather than having an outer-join with a COALESCE to the default site role. The issue with that is that activity gets seriously underreported - unless you move away from the "activity by role" model, but users find it really useful being able to see whether students are lukers/active and if teachers are lurkers or active in a particular course. It is important for users (as in "admin users").

(In that sense, I know the "primary role" thing wasn't 200% accurate, but it was close enough to give users very valuable stats 99% of the time. And some aspects of the primary role subselect can be improved significantly.)


Petr Skoda added a comment - 13/Feb/08 05:21 PM
Forgive my total ignorance of primary roles, but I do not understand why would the patch return underreported data - it does consider enrolments in system and course category contexts, I can not think about single row in log table which would be missed except the roles with do anything. Please tell me in which kind of course (frontpage, normal) and for what users do you think that the patch underreports.

Martín Langhoff added a comment - 13/Feb/08 05:39 PM
> it does consider enrolments in system and course category contexts

hmmm – perhaps I'm not understanding the patch now that it's gotten so large. All the joins I see are inner joins against role... ah, I tracked it down, you are talkign about stats_get_enrolled_sql(). Hmmm. If you do the join like that (instead of in a subselect that gets collapsed with a distinct or a group by) won't it join into multiple rows (for users with multiple RAs), thus resulting in over-counting? I definitely think it will.

It is getting quite hard to see this huge patch clearly in all its parts and with so many changes – would work much better as a patch series. Can I talk to you about a version control system I've developed in my spare time?


Petr Skoda added a comment - 13/Feb/08 05:56 PM
If you have several roles with course:view you are counted in each one separately, but in course total you are counted only once - this is the main difference. Primary roles use role sortorder which was not designed for this purpose, or was it?

Martín Langhoff added a comment - 13/Feb/08 06:27 PM
> If you have several roles with course:view you are counted in each one separately, but in course total you are counted only once

ah - that's perfect then. Makes sense.


Petr Skoda added a comment - 14/Feb/08 09:25 AM
fresh patch with speed improvement when having thousands of courses, fixed graphs time normalization, added upgrade code, other bug fixing

Petr Skoda added a comment - 16/Feb/08 07:52 AM
latest patch, tested on pgsql - should be nearly production quality

Petr Skoda added a comment - 17/Feb/08 03:41 AM
committed into 1.9dev, not sure if it should go into 1.8.x though

------------------------
in case of any problems you can force stats recalculation by:

<?php
require ('config.php');
require_once($CFG->dirroot.'/lib/statslib.php');
require_once($CFG->dirroot.'/lib/adminlib.php');
require_once($CFG->dirroot.'/course/lib.php');

set_cron_lock('statsrunning', null); // force unlocking
$time = time() - 60*60*24*20; // 20 days back
set_config('statslastdaily', $time);
set_config('statslastweekly', $time);
set_config('statslastmonthly', $time);

and then run cron

please test and report any problems here


Dan Marsden added a comment - 18/Feb/08 05:36 AM
worked fine on my test environment, but in a copy of our production environment the upgrade fails - times out after 4min..... - I'm guessing due to the large number of logs in our db.... any suggestions on how to combat this?

thanks!

Dan


Petr Skoda added a comment - 18/Feb/08 05:40 AM
easy:
1/ enable DEBUG_DEVELOPER
2/ find the query which times out (there are numbers printed in cron log)
3/ add echo "$sql" and try to execute the query in some sql console directly
4/ see what database complains about

please let me know which db you are using and what/how query failed


Dan Marsden added a comment - 18/Feb/08 05:44 AM
should have done most of that in the first place! - will do that and post an update in a couple of hours.


Dan Marsden added a comment - 18/Feb/08 05:46 AM
... we're using mysql 5.0.24

Dan Marsden added a comment - 18/Feb/08 07:04 AM
sorted. - it timesout where it starts updating the indexes.... but on investigating further, the issue seems to be related to fastcgi which we've just started to use. It timesout, even though the php script + sql query is still running. After playing with fastcgi a bit more, I might add something to the IIS docs in docs.moodle.org regarding timeout values....

thanks,

Dan


Gary Anderson added a comment - 18/Feb/08 09:49 AM
Petr:

Here is a field report from our medium-size installation. I just tried the latest CVS update with stats for our past 20 days. It took 81 minutes to run, or about 4 minutes per day. Once the initial stats are collected, that seems to be a reasonable amount.

We have a 570 student secondary school with most students having 3 to 5 classes on Moodle. Many classes have daily assignments or other activities (hence probably more activities than college courses). We average about 10,000 entries in mdl_log per day.

This was run on a 2.8 GHZ Linux development box with plenty of memory, duel processors and no other users (it is a backup of the production machine).

Stats never worked for us before work was started on this improvement.

--Gary


Martin Dougiamas added a comment - 19/Feb/08 02:07 PM
It's looking great on moodle.org. I've not checked / verified the numbers (but they seem right).

One small buglet - if the course has a forced $course->lang then it gets used for displaying the stats for that course (eg whole stats page for Norwegian course display in Norwegian. Without looking I'm guessing it's $COURSE being set incorrectly ($COURSE should be $SITE).

Also, are you using the role name aliases? (when a course overrides the name of a role for local use)?


Petr Skoda added a comment - 19/Feb/08 05:01 PM
filed subtask for these - going to fix it today, thanks

Samuli Karevaara added a comment - 19/Feb/08 09:44 PM
Sorry for a stupid question: what's with the
SELECT ...snip... COALESCE((SELECT COUNT('x') ...snip...), 0)
?

Doesn't COUNT() return 0 if there are no matching rows?


Petr Skoda added a comment - 19/Feb/08 10:56 PM - edited
hmm, you are probably right, postgres reported problems with NULLs, but maybe it was SUM only - going to review it today, thanks!

anyway it is great that somebody is reading the code :-D


Samuli Karevaara added a comment - 20/Feb/08 05:45 PM
Probably the SUM, at least for MySQL, from the manual:
"COUNT() returns 0 if there were no matching rows."
"SUM() returns NULL if there were no matching rows."

Trivial yes. Just wondered if this is because of something that I don't understand. Probably no harm done with the COALESCE there, just makes the query a bit more unreadable, I'd guess performance hit is negligible.


Dan Marsden added a comment - 21/Feb/08 05:00 AM
weird. - it looks like stats starts processing at around 8:30am our time instead of the time set in admin >server >settings

we have set in admin > Server > settings
Maximum runtime: Until Complete
Run at 1 : 5

in Admin> Location > Location Settings Timezones we have set:
Default Timezone: Pacific/Auckland
Users can choose their own timezone
Default Country "New Zealand"

the query that I notice is running at 8:30am looks a bit like this:
update mdl_stats_daily set stat2=(SELECT COUNT(DISTINCT ra.userid) FROM mdl_role_assignments ra, mdl_context ctx, mdl_course c, mdl_rol_capabilities .......

Dan


Petr Skoda added a comment - 21/Feb/08 07:10 AM
what PHP version?

Dan Marsden added a comment - 21/Feb/08 08:01 AM
PHP 5.2.1 under IIS

Petr Skoda added a comment - 24/Feb/08 07:57 AM
added new improvements - the most notable is fixed counting of admins and better parent context join
the only major issue is the timezone stuff, fortunately it should sort of work even if it executes the stats later/soon

Petr Skoda added a comment - 24/Feb/08 07:58 AM
oh, the timezone stuff is concentrated in several functions only in lib/statslib.php feel free to tinker with it

Petr Skoda added a comment - 24/Feb/08 10:33 PM
I found one incorrect use of date() in admin/cron.php which was printing info about next stats processing, but it does not explain the 8:30am execution of query with update mdl_stats_daily.

Dan Marsden added a comment - 25/Feb/08 03:42 AM
leave that one for me. - I'll put some time into trying to find the issue.. - maybe I should log a seperate bug report - it's not a blocker by itself. - maybe a minor bug? - I might file a seperate report.

Dan


Dan Marsden added a comment - 25/Feb/08 05:29 AM
it's happening because stats processing has timed out in a previous process (me killing the sql query at 8:50 is part of the issue!)

so cron_lock for statsrunning is set to 8:50am....

then $nextmidnight rolls around, but cron_lock is still set, untill 24 hours later when timeout value occurs...

$timeout = empty($CFG->statsmaxruntime) ? 60*60*24 : $CFG->statsmaxruntime;

if (!set_cron_lock('statsrunning', $now + $timeout)) {

so. - timeout occurs, 24hours after last cron_lock, (happens to be 8:50am) then checks $nextmidnight and finds that it's after that time, so it runs again at 8:50am

Dan


Petr Skoda added a comment - 25/Feb/08 05:50 AM
aah, that explains it - setting shorter timeout should solve it then, right?

Petr Skoda added a comment - 25/Feb/08 05:51 AM
this should prove that the cron locking is working properly :-D

Dan Marsden added a comment - 25/Feb/08 06:14 AM
yup. - that should sort it!

Dan Marsden added a comment - 25/Feb/08 06:17 AM
problems with insert into statements to stats tables under MS SQL 2005:
3 queries fail with "invalid Column name 'ntimeend';

example:

mssql_n internal transformation:
INSERT INTO mdl_stats_weekly (stattype, timeend, courseid, roleid, stat1, stat2) SELECT stattype, (timeend - 946080000) AS ntimeend, courseid, 0, SUM(stat1), SUM(stat2) FROM mdl_stats_daily WHERE (stattype = 'enrolments' OR stattype = 'activity') AND roleid <> 0 GROUP BY stattype, ntimeend, courseid
to
INSERT INTO mdl_stats_weekly (stattype, timeend, courseid, roleid, stat1, stat2) SELECT stattype, (timeend - 946080000) AS ntimeend, courseid, 0, SUM(stat1), SUM(stat2) FROM mdl_stats_daily WHERE (stattype = N'enrolments' OR stattype = N'activity') AND roleid <> 0 GROUP BY stattype, ntimeend, courseid
207: Invalid column name 'ntimeend'.

ADOConnection._Execute(INSERT INTO mdl_stats_weekly (stattype, timeend, courseid, roleid, stat1, stat2)

SELECT stattype, (timeend - ..., false) % line 891, file: adodb.inc.php
ADOConnection.Execute(INSERT INTO mdl_stats_weekly (stattype, timeend, courseid, roleid, stat1, stat2)

SELECT stattype, (timeend - ...) % line 89, file: dmllib.php
execute_sql(INSERT INTO mdl_stats_weekly (stattype, timeend, courseid, roleid, stat1, stat2)

SELECT stattype, (timeend - ...) % line 1456, file: statslib.php
stats_upgrade_totals() % line 2761, file: upgrade.php
xmldb_main_upgrade(2007101508) % line 358, file: index.php


Petr Skoda added a comment - 25/Feb/08 06:20 AM - edited
hmm, I though that every decent database supports aliases in GROUP by,
could you please try to replace the ntimeend with timeend in GROUP BY?

Dan Marsden added a comment - 25/Feb/08 06:33 AM

running this directly on the db:
SELECT stattype, (timeend - 20) AS ntimeend, courseid, 0, SUM(stat1), SUM(stat2)
FROM mdl_stats_daily
GROUP BY stattype, ntimeend, courseid

gives this:
Invalid column name 'ntimeend'.

running this:
SELECT stattype, (timeend - 20) AS ntimeend, courseid, 0, SUM(stat1), SUM(stat2)
FROM mdl_stats_daily
GROUP BY stattype, (timeend - 20), courseid

query executes successfullly.... (doesn't return any results though, but that's probably right!)


Petr Skoda added a comment - 25/Feb/08 07:17 AM
thanks! I will test it in pg and mysql tomorrow and verify it returns the same results before committing...

Petr Skoda added a comment - 26/Feb/08 06:57 AM
should be fixed now in 1.9, I do not think it would be easily backportable into 1.8.x - sorry

please reopen in case of major problems or file new reports,
thanks for testing!