Uploaded image for project: 'Moodle'
  1. Moodle
  2. MDL-27845

News forum items with ratings report DB errors

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 2.1.1, 2.2
    • Fix Version/s: 2.1.2
    • Component/s: Forum, Ratings
    • Labels:
    • Database:
      PostgreSQL
    • Testing Instructions:
      Hide

      1. set up a moodle site using postgresql as the database if you dont already have one
      2. access a forum and submit a rating
      3. check that the aggregate updates appropriately (assuming your use has the capabilities required to see the aggregate)

      Show
      1. set up a moodle site using postgresql as the database if you dont already have one 2. access a forum and submit a rating 3. check that the aggregate updates appropriately (assuming your use has the capabilities required to see the aggregate)
    • Affected Branches:
      MOODLE_21_STABLE, MOODLE_22_STABLE
    • Fixed Branches:
      MOODLE_21_STABLE
    • Pull Master Branch:
      MDL-27845_rating_aggregation

      Description

      I got this error when access news forum: Debug info:

      ERROR: column "ur.rating" must appear in the GROUP BY clause or be used in an aggregate function
      LINE 3: ... ur.id, ur.userid, ur.scaleid, ur.rating ...
      ^
      SELECT r.itemid, r.component, r.ratingarea, r.contextid,
      AVG(r.rating) AS aggrrating, COUNT(r.rating) AS numratings,
      ur.id, ur.userid, ur.scaleid, ur.rating AS usersrating
      FROM mdl_rating r
      LEFT JOIN mdl_rating ur ON ur.contextid = r.contextid AND
      ur.itemid = r.itemid AND
      ur.component = r.component AND
      ur.ratingarea = r.ratingarea AND
      ur.userid = $1
      WHERE r.contextid = $2 AND
      r.itemid = $3 AND
      r.component = $4 AND
      r.ratingarea = $5
      GROUP BY r.itemid, r.component, r.ratingarea, r.contextid, ur.id, ur.userid, ur.scaleid
      ORDER BY r.itemid
      [array (
      0 => '2',
      1 => '28',
      2 => '31',
      3 => 'mod_forum',
      4 => 'post',
      )]

        Gliffy Diagrams

          Attachments

            Issue Links

              Activity

              Hide
              stronk7 Eloy Lafuente (stronk7) added a comment - - edited

              Pasted from HQ chat:

              I'd say the query above is 100% incorrect, WTF is the reason for selecting and grouping by ur.rating (the value) ? It will break aggregation for sure lol!

              (or the the function returns aggregated info or it returns non-aggregated info. No way it can return the 2 things on its current incarnation)

              So something is wrong there IMO, ciao

              Show
              stronk7 Eloy Lafuente (stronk7) added a comment - - edited Pasted from HQ chat: I'd say the query above is 100% incorrect, WTF is the reason for selecting and grouping by ur.rating (the value) ? It will break aggregation for sure lol! (or the the function returns aggregated info or it returns non-aggregated info. No way it can return the 2 things on its current incarnation) So something is wrong there IMO, ciao
              Hide
              andyjdavis Andrew Davis added a comment - - edited

              I'm not so sure anything is wrong. That query returns both a user's rating of a given item plus the aggregate of all ratings of that item. That's why the rating table appears twice in the one query.

              Grouping ur.rating shouldn't affect the aggregation. "

              {rating} r" is being aggregated. "{rating}

              ur" isnt.

              Note also that

              {rating}

              r isnt actually being aggregated as it will be grouping a single row. Sadly to make postgres happy its sometimes required to add group by clauses that don't do any grouping.

              Show
              andyjdavis Andrew Davis added a comment - - edited I'm not so sure anything is wrong. That query returns both a user's rating of a given item plus the aggregate of all ratings of that item. That's why the rating table appears twice in the one query. Grouping ur.rating shouldn't affect the aggregation. " {rating} r" is being aggregated. "{rating} ur" isnt. Note also that {rating} r isnt actually being aggregated as it will be grouping a single row. Sadly to make postgres happy its sometimes required to add group by clauses that don't do any grouping.
              Hide
              stronk7 Eloy Lafuente (stronk7) added a comment - - edited

              Ho Andrew,

              it's not "sometimes", it's always. All DBs (but bloody MySQL) require all non aggregated columns to be grouped by.

              Also, the affirmation "Grouping ur.rating shouldn't affect the aggregation. "

              {rating} r" is being aggregated. "{rating}

              ur" isnt." is absolutely faulty. Aggregation happens for each group. Always.

              The key here is that your query is really one "cross-join" where the cartesian product is returned, mainly because there are some conditions missing in the JOIN.

              Luckly, from a aggregation perspective (the count() and the avg/sum() columns, it works because all those aggregations use to ignore NULLs. But it's one blatant cross join. Just print the returned records for 1 item and 3 ratings and you will see it.

              Also, it has the bad effect of returning a lot of columns (the non aggregated ones) with the "latest" value returned. For example, if you call the function for a given forum->post having, say, 3 ratings, the rating->count and rating->aggregate will be ok but other fields like "userid", "id", "scaleid" can be filled or no, 100% random. I'm 99% sure that happens. It's one cross-join and that is one of their main "abilities", to return "random" stuff if used in iterations.

              So, I really think we must control what is returned on all those non-agregated $rating attributes, because a big part of them are prone to be 100% false! And the risk of assuming they are correct is really, really high! I'd return nulls for all them, unless we are 100% sure they are correct (when userid is specified or so).

              So, summarizing, and back to my 1st comment:

              "or the the function returns aggregated info or it returns non-aggregated info. No way it can return the 2 things on its current incarnation"

              Hope I've explained the problems a bit better. Really those cross-joins are unpredictable IMO. And no "potentially incorrect" data should be returned.

              Ciao

              Show
              stronk7 Eloy Lafuente (stronk7) added a comment - - edited Ho Andrew, it's not "sometimes", it's always. All DBs (but bloody MySQL) require all non aggregated columns to be grouped by. Also, the affirmation "Grouping ur.rating shouldn't affect the aggregation. " {rating} r" is being aggregated. "{rating} ur" isnt." is absolutely faulty. Aggregation happens for each group. Always. The key here is that your query is really one "cross-join" where the cartesian product is returned, mainly because there are some conditions missing in the JOIN. Luckly, from a aggregation perspective (the count() and the avg/sum() columns, it works because all those aggregations use to ignore NULLs. But it's one blatant cross join. Just print the returned records for 1 item and 3 ratings and you will see it. Also, it has the bad effect of returning a lot of columns (the non aggregated ones) with the "latest" value returned. For example, if you call the function for a given forum->post having, say, 3 ratings, the rating->count and rating->aggregate will be ok but other fields like "userid", "id", "scaleid" can be filled or no, 100% random. I'm 99% sure that happens. It's one cross-join and that is one of their main "abilities", to return "random" stuff if used in iterations. So, I really think we must control what is returned on all those non-agregated $rating attributes, because a big part of them are prone to be 100% false! And the risk of assuming they are correct is really, really high! I'd return nulls for all them, unless we are 100% sure they are correct (when userid is specified or so). So, summarizing, and back to my 1st comment: "or the the function returns aggregated info or it returns non-aggregated info. No way it can return the 2 things on its current incarnation" Hope I've explained the problems a bit better. Really those cross-joins are unpredictable IMO. And no "potentially incorrect" data should be returned. Ciao
              Hide
              stronk7 Eloy Lafuente (stronk7) added a comment -

              Ho, I just have sent one basic rating/simpletest/testrating.php here:

              https://github.com/stronk7/moodle/compare/master...aggregation_cross_join_demo

              It' really simple, just creates 3 forum posts and 6 ratings over them, and then start testing the get_ratings() function.

              STEP 1 is testing of counts and averages, seem ok.
              STEP 2 introduces $options->userid, not sure if counts and averages should change.
              STEP 3 is the tricky one (read comments) where the "random" returned values are demo-ed. Only one fail happens but they could be lots more, I just have tested only one thing, demoing different results in 2 different calls.

              Surely that unittest can be used for defining the expected results and also as base for further unitesting along the time, feel free to pick it, is GPL, lol.

              Ciao

              Show
              stronk7 Eloy Lafuente (stronk7) added a comment - Ho, I just have sent one basic rating/simpletest/testrating.php here: https://github.com/stronk7/moodle/compare/master...aggregation_cross_join_demo It' really simple, just creates 3 forum posts and 6 ratings over them, and then start testing the get_ratings() function. STEP 1 is testing of counts and averages, seem ok. STEP 2 introduces $options->userid, not sure if counts and averages should change. STEP 3 is the tricky one (read comments) where the "random" returned values are demo-ed. Only one fail happens but they could be lots more, I just have tested only one thing, demoing different results in 2 different calls. Surely that unittest can be used for defining the expected results and also as base for further unitesting along the time, feel free to pick it, is GPL, lol. Ciao
              Hide
              andyjdavis Andrew Davis added a comment -

              Eloy, I've cherry picked your commit and added one of my own. I've reformatted the unit tests a bit plus also corrected some stuff. It doesn't quite work how you think it works

              git://github.com/andyjdavis/moodle.git
              https://github.com/andyjdavis/moodle/compare/master...MDL-27845_rating_aggregation

              You supply get_ratings() with a set of items and also a user id. If you don't supply a user id then $USER is used. That's why one of your step 3 tests was failing $USER is used and if you're logged in as admin you probably have a user ID of 2 and you were accessing user 1's items. User 2 has rated user 1's item so the assertNull() was failing. The second test passed because $USER->id == 2 and you were getting user 2's own items so there was no rating and the assertNull() passed.

              When you call get_rating() supplying a set of items and a user ID you get back that user's rating of each item ($result[0]>rating>rating) as well as aggregate info about all ratings of that item ($result[0]>rating>count and $result[0]>rating>aggregate) and some other bits and pieces to make rendering ratings easier.

              Have a look at the diff and see if anything is still unclear

              Show
              andyjdavis Andrew Davis added a comment - Eloy, I've cherry picked your commit and added one of my own. I've reformatted the unit tests a bit plus also corrected some stuff. It doesn't quite work how you think it works git://github.com/andyjdavis/moodle.git https://github.com/andyjdavis/moodle/compare/master...MDL-27845_rating_aggregation You supply get_ratings() with a set of items and also a user id. If you don't supply a user id then $USER is used. That's why one of your step 3 tests was failing $USER is used and if you're logged in as admin you probably have a user ID of 2 and you were accessing user 1's items. User 2 has rated user 1's item so the assertNull() was failing. The second test passed because $USER->id == 2 and you were getting user 2's own items so there was no rating and the assertNull() passed. When you call get_rating() supplying a set of items and a user ID you get back that user's rating of each item ($result [0] >rating >rating) as well as aggregate info about all ratings of that item ($result [0] >rating >count and $result [0] >rating >aggregate) and some other bits and pieces to make rendering ratings easier. Have a look at the diff and see if anything is still unclear
              Hide
              andyjdavis Andrew Davis added a comment -

              I have cherry picked this from master to 2.1 stable. Cherry picking to 2.0 stable fails as the ratings code has been significantly refactored. I made some attempt to reimplement it in 2.0 however its surprisingly painful. Unless there is a compelling reason to backport this to 2.0.x I don't think its worth the risk.

              Show
              andyjdavis Andrew Davis added a comment - I have cherry picked this from master to 2.1 stable. Cherry picking to 2.0 stable fails as the ratings code has been significantly refactored. I made some attempt to reimplement it in 2.0 however its surprisingly painful. Unless there is a compelling reason to backport this to 2.0.x I don't think its worth the risk.
              Hide
              nebgor Aparup Banerjee added a comment - - edited

              the test needs to be updated - i tried replicating the issue to see if the patch solves it - i was stuck at the replication stage.

              aside from the code its slightly confusing readability wise to have both curly braces used within the query where the curly braces are resolved differently.

              {rating}

              vs {$itemidtest}

              also is ' r.itemid {$itemidtest} AND' bit of query missing any assignment? ah its form get_in_or_equal maybe not call it an id?

              Show
              nebgor Aparup Banerjee added a comment - - edited the test needs to be updated - i tried replicating the issue to see if the patch solves it - i was stuck at the replication stage. aside from the code its slightly confusing readability wise to have both curly braces used within the query where the curly braces are resolved differently. {rating} vs {$itemidtest} also is ' r.itemid {$itemidtest} AND' bit of query missing any assignment? ah its form get_in_or_equal maybe not call it an id?
              Hide
              andyjdavis Andrew Davis added a comment -

              Ive corrected the test.

              r.itemid {$itemidtest}

              The above is correct. I agree its a little odd looking at first.

              Show
              andyjdavis Andrew Davis added a comment - Ive corrected the test. r.itemid {$itemidtest} The above is correct. I agree its a little odd looking at first.
              Hide
              andyjdavis Andrew Davis added a comment -

              I have raised MDL-28465 to remind me to re-implement this same fix in 2.0 stable

              Show
              andyjdavis Andrew Davis added a comment - I have raised MDL-28465 to remind me to re-implement this same fix in 2.0 stable
              Hide
              samhemelryk Sam Hemelryk added a comment -

              Hi Andrew, just a heads up this will require the changes suggested by Petr on MDL-28456 plus a little more as in 20, 21, and master the code relies on itemid being the first column.
              Cheers
              Sam

              Show
              samhemelryk Sam Hemelryk added a comment - Hi Andrew, just a heads up this will require the changes suggested by Petr on MDL-28456 plus a little more as in 20, 21, and master the code relies on itemid being the first column. Cheers Sam
              Hide
              skodak Petr Skoda added a comment -

              You mean MDL-28465, right?

              Show
              skodak Petr Skoda added a comment - You mean MDL-28465 , right?
              Hide
              skodak Petr Skoda added a comment -

              Reopening for the same reasons as MDL-28465, the first column in get_records() has to be guaranteed to be unique.

              Show
              skodak Petr Skoda added a comment - Reopening for the same reasons as MDL-28465 , the first column in get_records() has to be guaranteed to be unique.
              Hide
              andyjdavis Andrew Davis added a comment -

              The first column is guaranteed to be unique by the ratings logic. I have however reworked it so that should this situation change the first column with still be unique.

              Show
              andyjdavis Andrew Davis added a comment - The first column is guaranteed to be unique by the ratings logic. I have however reworked it so that should this situation change the first column with still be unique.
              Hide
              nebgor Aparup Banerjee added a comment -

              the code looks good to me. sanity checks all passed

              Show
              nebgor Aparup Banerjee added a comment - the code looks good to me. sanity checks all passed
              Hide
              stronk7 Eloy Lafuente (stronk7) added a comment -

              Postposing this to next week.

              Note to integrator: This is the solution for 21_STABLE and master. MDL-28465 is the corresponding solution for 20_STABLE so should be integrated at the same time.

              Note to Andrew: Personally, I think the query is joining too much (unrelated records aka cross joining). But if it works, it works

              Show
              stronk7 Eloy Lafuente (stronk7) added a comment - Postposing this to next week. Note to integrator: This is the solution for 21_STABLE and master. MDL-28465 is the corresponding solution for 20_STABLE so should be integrated at the same time. Note to Andrew: Personally, I think the query is joining too much (unrelated records aka cross joining). But if it works, it works
              Hide
              skodak Petr Skoda added a comment -

              Integrated, please test very carefully in all 4 supported databases.

              Next time please do not use columns in GROUP BY that are not necessary, I hope this is going to be compatible with all supported databases:
              GROUP BY r.itemid, r.component, r.ratingarea, r.contextid

              Thanks.

              Show
              skodak Petr Skoda added a comment - Integrated, please test very carefully in all 4 supported databases. Next time please do not use columns in GROUP BY that are not necessary, I hope this is going to be compatible with all supported databases: GROUP BY r.itemid, r.component, r.ratingarea, r.contextid Thanks.
              Hide
              rajeshtaneja Rajesh Taneja added a comment -

              Works Great
              Thanks for fixing this Andrew and Eloy

              Show
              rajeshtaneja Rajesh Taneja added a comment - Works Great Thanks for fixing this Andrew and Eloy
              Hide
              stronk7 Eloy Lafuente (stronk7) added a comment -

              Such an effort has get its prize. Universal Award to the very-best Moodle developer in the family. Thanks!

              Closing. Ciao

              Show
              stronk7 Eloy Lafuente (stronk7) added a comment - Such an effort has get its prize. Universal Award to the very-best Moodle developer in the family. Thanks! Closing. Ciao

                People

                • Votes:
                  0 Vote for this issue
                  Watchers:
                  2 Start watching this issue

                  Dates

                  • Created:
                    Updated:
                    Resolved:
                    Fix Release Date:
                    10/Oct/11