Moodle

Statistics are extremely slow

Details

  • Type: Sub-task Sub-task
  • Status: Closed Closed
  • Priority: Critical Critical
  • Resolution: Duplicate
  • Affects Version/s: 1.8.4, 1.9
  • Fix Version/s: 1.9
  • Component/s: Administration
  • Labels:
    None
  • Database:
    MySQL
  • Affected Branches:
    MOODLE_18_STABLE, MOODLE_19_STABLE
  • Fixed Branches:
    MOODLE_19_STABLE

Description

I have a site where the statistics are taking 18 hours to run (and then you still can't see any results, see MDL-11953).

There are lots of queries like this:

SELECT userid, primaryrole FROM (
SELECT active_course_users.userid,
(SELECT roleid FROM mdl_role_assignments outer_r_a INNER JOIN mdl_role outer_r ON outer_r_a.roleid=outer_r.id
INNER JOIN mdl_context c ON outer_r_a.contextid = c.id
WHERE c.instanceid=4 AND c.contextlevel = 50 AND outer_r_a.userid=active_course_users.userid
AND NOT EXISTS (SELECT 1 FROM mdl_role_assignments inner_r_a
INNER JOIN mdl_role inner_r ON inner_r_a.roleid = inner_r.id
WHERE inner_r.sortorder < outer_r.sortorder
AND inner_r_a.userid = outer_r_a.userid
AND inner_r_a.contextid = outer_r_a.contextid
)
) AS primaryrole
FROM (SELECT DISTINCT userid FROM mdl_log l WHERE course=4 AND (l.time > 1193702400 AND l.time < 1193788800) )
active_course_users
) foo WHERE primaryrole IS NOT NULL

and

SELECT COUNT(DISTINCT l.id) FROM (SELECT ra.userid,
ra.roleid AS primary_roleid,
ra.contextid,
r.sortorder,
r.name,
r.description,
r.shortname,
c.instanceid AS courseid,
c.contextlevel
FROM mdl_role_assignments ra
INNER JOIN mdl_role r ON ra.roleid = r.id
INNER JOIN mdl_context c ON ra.contextid = c.id
WHERE NOT EXISTS (
SELECT 1
FROM mdl_role_assignments i_ra
INNER JOIN mdl_role i_r ON i_ra.roleid = i_r.id
WHERE ra.userid = i_ra.userid AND
ra.contextid = i_ra.contextid AND
i_r.sortorder < r.sortorder
) ) prs
INNER JOIN mdl_log l ON (prs.userid=l.userid AND l.course=prs.courseid)
WHERE prs.primary_roleid=3 AND prs.courseid=7165
AND prs.contextlevel = 50 AND (l.time > 1193702400 AND l.time < 1193788800) AND l.action IN ('add','delete','edit','add mod','delete mod','edit sectionenrol','loginas','new','unenrol','update','update mod','upload','choose','choose again','add discussion','add post','delete discussion','delete post','move discussion','prune post','update post','add entry','update entry','update feedback','attempt','editquestions','review','submit','agree','assess','comment','grade','newattachment','removeattachments','resubmit','talk','add category','add comment','approve entry','delete category','delete comment','delete entry','edit category','update comment','open','set up','end','start','update grade attempt','hack','record delete');

and

SELECT COUNT(DISTINCT prs.userid) FROM (SELECT ra.userid,
ra.roleid AS primary_roleid,
ra.contextid,
r.sortorder,
r.name,
r.description,
r.shortname,
c.instanceid AS courseid,
c.contextlevel
FROM mdl_role_assignments ra
INNER JOIN mdl_role r ON ra.roleid = r.id
INNER JOIN mdl_context c ON ra.contextid = c.id
WHERE NOT EXISTS (
SELECT 1
FROM mdl_role_assignments i_ra
INNER JOIN mdl_role i_r ON i_ra.roleid = i_r.id
WHERE ra.userid = i_ra.userid AND
ra.contextid = i_ra.contextid AND
i_r.sortorder < r.sortorder
) ) prs WHERE prs.primary_roleid=3 AND prs.courseid=7165 AND prs.contextlevel = 50;

That are taking > 4 seconds to run, creating a lot of disk-based temporary tables (even with a 256M memory buffer allocated for them)

Issue Links

Activity

Hide
Sam Marshall added a comment -

Thanks for alerting me to this issue. For info on the OU perspective:

On our site this takes approx 3 hours - slow, but not slow enough to extend into the next day, so it isn't causing any critical problems for us. (Incidentally we also have custom code that prevents cron running more than once at a time. So this would mean that forum posts aren't emailed out between midnight and 3am, but we don't have much usage at that time anyway.)

As far as I know our database server has enough memory to cache everything except mdl_log (we are intending to do something with old mdl_log soon to resolve that) so it ought to be quite fast in general...

Show
Sam Marshall added a comment - Thanks for alerting me to this issue. For info on the OU perspective: On our site this takes approx 3 hours - slow, but not slow enough to extend into the next day, so it isn't causing any critical problems for us. (Incidentally we also have custom code that prevents cron running more than once at a time. So this would mean that forum posts aren't emailed out between midnight and 3am, but we don't have much usage at that time anyway.) As far as I know our database server has enough memory to cache everything except mdl_log (we are intending to do something with old mdl_log soon to resolve that) so it ought to be quite fast in general...
Hide
Martín Langhoff added a comment -

With the DB that MD sent me I can see it clearly: the daily stats are murder. I suspect they are even more expensive on MySQL - Pg seems to handle them a bit better. In any case, they are hugely inefficient. The core of the problem is the "primary roles subselect". My current notes are as follows:

  • primaryroles subselect
  • Is technically wrong - it does not cascade roles (this is fixable on 1.9) - it only removes duplicate roles. As we inner-join against it, log entries from indirect enrolments are not being counted. To do: fix on 1.9 using/abusing the context path
  • Is extremely expensive (and will get more expensive once it is "correct". To do: materialse at the beginning into a temp table (not really temporary - I'll have to add at least a table).

Note - Handling of the default role is tricky. If we add it to the primaryroles subselect, we get a massive combinatorial mess. Instead, users of the subselect/ view/temptable should add it as needed by using LEFT OUTER JOINs against this temptable, and using COALESCE() to "fill in" with the default role where needed.

  • Switch to constant number of queries...

We can probably be smart and use agreggates for most of the processing. A few expensive queries should get the job done - right now we are running thousands of expensive queries.

This wil take a day or two to fix, and involve one, perhaps 2 new tables (to be used as "temporary tables" except that we don't have a portable way of handling temp tables at the moment).

Show
Martín Langhoff added a comment - With the DB that MD sent me I can see it clearly: the daily stats are murder. I suspect they are even more expensive on MySQL - Pg seems to handle them a bit better. In any case, they are hugely inefficient. The core of the problem is the "primary roles subselect". My current notes are as follows:
  • primaryroles subselect
  • Is technically wrong - it does not cascade roles (this is fixable on 1.9) - it only removes duplicate roles. As we inner-join against it, log entries from indirect enrolments are not being counted. To do: fix on 1.9 using/abusing the context path
  • Is extremely expensive (and will get more expensive once it is "correct". To do: materialse at the beginning into a temp table (not really temporary - I'll have to add at least a table).
Note - Handling of the default role is tricky. If we add it to the primaryroles subselect, we get a massive combinatorial mess. Instead, users of the subselect/ view/temptable should add it as needed by using LEFT OUTER JOINs against this temptable, and using COALESCE() to "fill in" with the default role where needed.
  • Switch to constant number of queries...
We can probably be smart and use agreggates for most of the processing. A few expensive queries should get the job done - right now we are running thousands of expensive queries. This wil take a day or two to fix, and involve one, perhaps 2 new tables (to be used as "temporary tables" except that we don't have a portable way of handling temp tables at the moment).
Hide
Martín Langhoff added a comment -

Alright. It turns out that the main problem was the sheer number of queries, most of which included the the primaryroles subselect. Got some patches for this that turn it from an exponential number of queries (close to 1 million dbq per day on the sample data!) to a fixed number per-day. This cuts down the cost of running the daily stats (which was the most expensive part of running the stats) significantly.

Performance-wise, on my dev laptop it is getting through ~1 day per minute. The client DB I am using to test has ~1600 days worth of logs, however, so this will take 28hs to do just the daily stats. The weekly/monthly aggregates should be fast after that, but I haven't tested it.

Show
Martín Langhoff added a comment - Alright. It turns out that the main problem was the sheer number of queries, most of which included the the primaryroles subselect. Got some patches for this that turn it from an exponential number of queries (close to 1 million dbq per day on the sample data!) to a fixed number per-day. This cuts down the cost of running the daily stats (which was the most expensive part of running the stats) significantly. Performance-wise, on my dev laptop it is getting through ~1 day per minute. The client DB I am using to test has ~1600 days worth of logs, however, so this will take 28hs to do just the daily stats. The weekly/monthly aggregates should be fast after that, but I haven't tested it.
Hide
Martín Langhoff added a comment -

Attaching rollup patch - It's more readable here, broken down in 4 patches with good commit msgs: http://git.catalyst.net.nz/gitweb?p=moodle-r2.git;a=shortlog;h=mdl18-MDL-13092

Show
Martín Langhoff added a comment - Attaching rollup patch - It's more readable here, broken down in 4 patches with good commit msgs: http://git.catalyst.net.nz/gitweb?p=moodle-r2.git;a=shortlog;h=mdl18-MDL-13092
Hide
Petr Škoda (skodak) added a comment - - edited

Hi, I am in the middle of complete stats cron code rewrite for 1.9, fixing several other troubles there at the same time.
I am at 20s/6 queries/no memory consumed for one day of stats from moodle.org on my untuned pc, weeklies and monthlies should be much faster. I was not able to get any results with the old code.

Expect patch later this week, I still need to improve/fix/verify the code...

Show
Petr Škoda (skodak) added a comment - - edited Hi, I am in the middle of complete stats cron code rewrite for 1.9, fixing several other troubles there at the same time. I am at 20s/6 queries/no memory consumed for one day of stats from moodle.org on my untuned pc, weeklies and monthlies should be much faster. I was not able to get any results with the old code. Expect patch later this week, I still need to improve/fix/verify the code...
Hide
Martin Dougiamas added a comment -

Please can we not do parallel work on this problem ...

Show
Martin Dougiamas added a comment - Please can we not do parallel work on this problem ...
Hide
Petr Škoda (skodak) added a comment -

ok, I will keep my dirty hands off that

Show
Petr Škoda (skodak) added a comment - ok, I will keep my dirty hands off that
Hide
Martin Dougiamas added a comment -

It's not that, I was just trying to keep your time free to work on other things. Since you've got some SQL, though, let's look at it. I've sent you the same test database to try it out.

Show
Martin Dougiamas added a comment - It's not that, I was just trying to keep your time free to work on other things. Since you've got some SQL, though, let's look at it. I've sent you the same test database to try it out.
Hide
Petr Škoda (skodak) added a comment -

6s per day, 1s per week - cleaning up the patch to be ready for review tomorrow

Show
Petr Škoda (skodak) added a comment - 6s per day, 1s per week - cleaning up the patch to be ready for review tomorrow
Hide
Petr Škoda (skodak) added a comment -

making parent bug blocker instead

Show
Petr Škoda (skodak) added a comment - making parent bug blocker instead

People

Vote (1)
Watch (11)

Dates

  • Created:
    Updated:
    Resolved: