Moodle
  1. Moodle
  2. MDL-27845

News forum items with ratings report DB errors

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Blocker Blocker
    • Resolution: Fixed
    • Affects Version/s: 2.1.1, 2.2
    • Fix Version/s: 2.1.2
    • Component/s: Forum, Ratings
    • Labels:
    • Database:
      PostgreSQL
    • Testing Instructions:
      Hide

      1. set up a moodle site using postgresql as the database if you dont already have one
      2. access a forum and submit a rating
      3. check that the aggregate updates appropriately (assuming your use has the capabilities required to see the aggregate)

      Show
      1. set up a moodle site using postgresql as the database if you dont already have one 2. access a forum and submit a rating 3. check that the aggregate updates appropriately (assuming your use has the capabilities required to see the aggregate)
    • Affected Branches:
      MOODLE_21_STABLE, MOODLE_22_STABLE
    • Fixed Branches:
      MOODLE_21_STABLE
    • Pull Master Branch:
      MDL-27845_rating_aggregation
    • Rank:
      17501

      Description

      I got this error when access news forum: Debug info:

      ERROR: column "ur.rating" must appear in the GROUP BY clause or be used in an aggregate function
      LINE 3: ... ur.id, ur.userid, ur.scaleid, ur.rating ...
      ^
      SELECT r.itemid, r.component, r.ratingarea, r.contextid,
      AVG(r.rating) AS aggrrating, COUNT(r.rating) AS numratings,
      ur.id, ur.userid, ur.scaleid, ur.rating AS usersrating
      FROM mdl_rating r
      LEFT JOIN mdl_rating ur ON ur.contextid = r.contextid AND
      ur.itemid = r.itemid AND
      ur.component = r.component AND
      ur.ratingarea = r.ratingarea AND
      ur.userid = $1
      WHERE r.contextid = $2 AND
      r.itemid = $3 AND
      r.component = $4 AND
      r.ratingarea = $5
      GROUP BY r.itemid, r.component, r.ratingarea, r.contextid, ur.id, ur.userid, ur.scaleid
      ORDER BY r.itemid
      [array (
      0 => '2',
      1 => '28',
      2 => '31',
      3 => 'mod_forum',
      4 => 'post',
      )]
      

        Issue Links

          Activity

          Hide
          Eloy Lafuente (stronk7) added a comment - - edited

          Pasted from HQ chat:

          I'd say the query above is 100% incorrect, WTF is the reason for selecting and grouping by ur.rating (the value) ? It will break aggregation for sure lol!

          (or the the function returns aggregated info or it returns non-aggregated info. No way it can return the 2 things on its current incarnation)

          So something is wrong there IMO, ciao

          Show
          Eloy Lafuente (stronk7) added a comment - - edited Pasted from HQ chat: I'd say the query above is 100% incorrect, WTF is the reason for selecting and grouping by ur.rating (the value) ? It will break aggregation for sure lol! (or the the function returns aggregated info or it returns non-aggregated info. No way it can return the 2 things on its current incarnation) So something is wrong there IMO, ciao
          Hide
          Andrew Davis added a comment - - edited

          I'm not so sure anything is wrong. That query returns both a user's rating of a given item plus the aggregate of all ratings of that item. That's why the rating table appears twice in the one query.

          Grouping ur.rating shouldn't affect the aggregation. "

          {rating} r" is being aggregated. "{rating}

          ur" isnt.

          Note also that

          {rating}

          r isnt actually being aggregated as it will be grouping a single row. Sadly to make postgres happy its sometimes required to add group by clauses that don't do any grouping.

          Show
          Andrew Davis added a comment - - edited I'm not so sure anything is wrong. That query returns both a user's rating of a given item plus the aggregate of all ratings of that item. That's why the rating table appears twice in the one query. Grouping ur.rating shouldn't affect the aggregation. " {rating} r" is being aggregated. "{rating} ur" isnt. Note also that {rating} r isnt actually being aggregated as it will be grouping a single row. Sadly to make postgres happy its sometimes required to add group by clauses that don't do any grouping.
          Hide
          Eloy Lafuente (stronk7) added a comment - - edited

          Ho Andrew,

          it's not "sometimes", it's always. All DBs (but bloody MySQL) require all non aggregated columns to be grouped by.

          Also, the affirmation "Grouping ur.rating shouldn't affect the aggregation. "

          {rating} r" is being aggregated. "{rating}

          ur" isnt." is absolutely faulty. Aggregation happens for each group. Always.

          The key here is that your query is really one "cross-join" where the cartesian product is returned, mainly because there are some conditions missing in the JOIN.

          Luckly, from a aggregation perspective (the count() and the avg/sum() columns, it works because all those aggregations use to ignore NULLs. But it's one blatant cross join. Just print the returned records for 1 item and 3 ratings and you will see it.

          Also, it has the bad effect of returning a lot of columns (the non aggregated ones) with the "latest" value returned. For example, if you call the function for a given forum->post having, say, 3 ratings, the rating->count and rating->aggregate will be ok but other fields like "userid", "id", "scaleid" can be filled or no, 100% random. I'm 99% sure that happens. It's one cross-join and that is one of their main "abilities", to return "random" stuff if used in iterations.

          So, I really think we must control what is returned on all those non-agregated $rating attributes, because a big part of them are prone to be 100% false! And the risk of assuming they are correct is really, really high! I'd return nulls for all them, unless we are 100% sure they are correct (when userid is specified or so).

          So, summarizing, and back to my 1st comment:

          "or the the function returns aggregated info or it returns non-aggregated info. No way it can return the 2 things on its current incarnation"

          Hope I've explained the problems a bit better. Really those cross-joins are unpredictable IMO. And no "potentially incorrect" data should be returned.

          Ciao

          Show
          Eloy Lafuente (stronk7) added a comment - - edited Ho Andrew, it's not "sometimes", it's always. All DBs (but bloody MySQL) require all non aggregated columns to be grouped by. Also, the affirmation "Grouping ur.rating shouldn't affect the aggregation. " {rating} r" is being aggregated. "{rating} ur" isnt." is absolutely faulty. Aggregation happens for each group. Always. The key here is that your query is really one "cross-join" where the cartesian product is returned, mainly because there are some conditions missing in the JOIN. Luckly, from a aggregation perspective (the count() and the avg/sum() columns, it works because all those aggregations use to ignore NULLs. But it's one blatant cross join. Just print the returned records for 1 item and 3 ratings and you will see it. Also, it has the bad effect of returning a lot of columns (the non aggregated ones) with the "latest" value returned. For example, if you call the function for a given forum->post having, say, 3 ratings, the rating->count and rating->aggregate will be ok but other fields like "userid", "id", "scaleid" can be filled or no, 100% random. I'm 99% sure that happens. It's one cross-join and that is one of their main "abilities", to return "random" stuff if used in iterations. So, I really think we must control what is returned on all those non-agregated $rating attributes, because a big part of them are prone to be 100% false! And the risk of assuming they are correct is really, really high! I'd return nulls for all them, unless we are 100% sure they are correct (when userid is specified or so). So, summarizing, and back to my 1st comment: "or the the function returns aggregated info or it returns non-aggregated info. No way it can return the 2 things on its current incarnation" Hope I've explained the problems a bit better. Really those cross-joins are unpredictable IMO. And no "potentially incorrect" data should be returned. Ciao
          Hide
          Eloy Lafuente (stronk7) added a comment -

          Ho, I just have sent one basic rating/simpletest/testrating.php here:

          https://github.com/stronk7/moodle/compare/master...aggregation_cross_join_demo

          It' really simple, just creates 3 forum posts and 6 ratings over them, and then start testing the get_ratings() function.

          STEP 1 is testing of counts and averages, seem ok.
          STEP 2 introduces $options->userid, not sure if counts and averages should change.
          STEP 3 is the tricky one (read comments) where the "random" returned values are demo-ed. Only one fail happens but they could be lots more, I just have tested only one thing, demoing different results in 2 different calls.

          Surely that unittest can be used for defining the expected results and also as base for further unitesting along the time, feel free to pick it, is GPL, lol.

          Ciao

          Show
          Eloy Lafuente (stronk7) added a comment - Ho, I just have sent one basic rating/simpletest/testrating.php here: https://github.com/stronk7/moodle/compare/master...aggregation_cross_join_demo It' really simple, just creates 3 forum posts and 6 ratings over them, and then start testing the get_ratings() function. STEP 1 is testing of counts and averages, seem ok. STEP 2 introduces $options->userid, not sure if counts and averages should change. STEP 3 is the tricky one (read comments) where the "random" returned values are demo-ed. Only one fail happens but they could be lots more, I just have tested only one thing, demoing different results in 2 different calls. Surely that unittest can be used for defining the expected results and also as base for further unitesting along the time, feel free to pick it, is GPL, lol. Ciao
          Hide
          Andrew Davis added a comment -

          Eloy, I've cherry picked your commit and added one of my own. I've reformatted the unit tests a bit plus also corrected some stuff. It doesn't quite work how you think it works

          git://github.com/andyjdavis/moodle.git
          https://github.com/andyjdavis/moodle/compare/master...MDL-27845_rating_aggregation

          You supply get_ratings() with a set of items and also a user id. If you don't supply a user id then $USER is used. That's why one of your step 3 tests was failing $USER is used and if you're logged in as admin you probably have a user ID of 2 and you were accessing user 1's items. User 2 has rated user 1's item so the assertNull() was failing. The second test passed because $USER->id == 2 and you were getting user 2's own items so there was no rating and the assertNull() passed.

          When you call get_rating() supplying a set of items and a user ID you get back that user's rating of each item ($result[0]>rating>rating) as well as aggregate info about all ratings of that item ($result[0]>rating>count and $result[0]>rating>aggregate) and some other bits and pieces to make rendering ratings easier.

          Have a look at the diff and see if anything is still unclear

          Show
          Andrew Davis added a comment - Eloy, I've cherry picked your commit and added one of my own. I've reformatted the unit tests a bit plus also corrected some stuff. It doesn't quite work how you think it works git://github.com/andyjdavis/moodle.git https://github.com/andyjdavis/moodle/compare/master...MDL-27845_rating_aggregation You supply get_ratings() with a set of items and also a user id. If you don't supply a user id then $USER is used. That's why one of your step 3 tests was failing $USER is used and if you're logged in as admin you probably have a user ID of 2 and you were accessing user 1's items. User 2 has rated user 1's item so the assertNull() was failing. The second test passed because $USER->id == 2 and you were getting user 2's own items so there was no rating and the assertNull() passed. When you call get_rating() supplying a set of items and a user ID you get back that user's rating of each item ($result [0] >rating >rating) as well as aggregate info about all ratings of that item ($result [0] >rating >count and $result [0] >rating >aggregate) and some other bits and pieces to make rendering ratings easier. Have a look at the diff and see if anything is still unclear
          Hide
          Andrew Davis added a comment -

          I have cherry picked this from master to 2.1 stable. Cherry picking to 2.0 stable fails as the ratings code has been significantly refactored. I made some attempt to reimplement it in 2.0 however its surprisingly painful. Unless there is a compelling reason to backport this to 2.0.x I don't think its worth the risk.

          Show
          Andrew Davis added a comment - I have cherry picked this from master to 2.1 stable. Cherry picking to 2.0 stable fails as the ratings code has been significantly refactored. I made some attempt to reimplement it in 2.0 however its surprisingly painful. Unless there is a compelling reason to backport this to 2.0.x I don't think its worth the risk.
          Hide
          Aparup Banerjee added a comment - - edited

          the test needs to be updated - i tried replicating the issue to see if the patch solves it - i was stuck at the replication stage.

          aside from the code its slightly confusing readability wise to have both curly braces used within the query where the curly braces are resolved differently.

          {rating}

          vs {$itemidtest}

          also is ' r.itemid {$itemidtest} AND' bit of query missing any assignment? ah its form get_in_or_equal maybe not call it an id?

          Show
          Aparup Banerjee added a comment - - edited the test needs to be updated - i tried replicating the issue to see if the patch solves it - i was stuck at the replication stage. aside from the code its slightly confusing readability wise to have both curly braces used within the query where the curly braces are resolved differently. {rating} vs {$itemidtest} also is ' r.itemid {$itemidtest} AND' bit of query missing any assignment? ah its form get_in_or_equal maybe not call it an id?
          Hide
          Andrew Davis added a comment -

          Ive corrected the test.

          r.itemid {$itemidtest}
          

          The above is correct. I agree its a little odd looking at first.

          Show
          Andrew Davis added a comment - Ive corrected the test. r.itemid {$itemidtest} The above is correct. I agree its a little odd looking at first.
          Hide
          Andrew Davis added a comment -

          I have raised MDL-28465 to remind me to re-implement this same fix in 2.0 stable

          Show
          Andrew Davis added a comment - I have raised MDL-28465 to remind me to re-implement this same fix in 2.0 stable
          Hide
          Sam Hemelryk added a comment -

          Hi Andrew, just a heads up this will require the changes suggested by Petr on MDL-28456 plus a little more as in 20, 21, and master the code relies on itemid being the first column.
          Cheers
          Sam

          Show
          Sam Hemelryk added a comment - Hi Andrew, just a heads up this will require the changes suggested by Petr on MDL-28456 plus a little more as in 20, 21, and master the code relies on itemid being the first column. Cheers Sam
          Hide
          Petr Škoda added a comment -

          You mean MDL-28465, right?

          Show
          Petr Škoda added a comment - You mean MDL-28465 , right?
          Hide
          Petr Škoda added a comment -

          Reopening for the same reasons as MDL-28465, the first column in get_records() has to be guaranteed to be unique.

          Show
          Petr Škoda added a comment - Reopening for the same reasons as MDL-28465 , the first column in get_records() has to be guaranteed to be unique.
          Hide
          Andrew Davis added a comment -

          The first column is guaranteed to be unique by the ratings logic. I have however reworked it so that should this situation change the first column with still be unique.

          Show
          Andrew Davis added a comment - The first column is guaranteed to be unique by the ratings logic. I have however reworked it so that should this situation change the first column with still be unique.
          Hide
          Aparup Banerjee added a comment -

          the code looks good to me. sanity checks all passed

          Show
          Aparup Banerjee added a comment - the code looks good to me. sanity checks all passed
          Hide
          Eloy Lafuente (stronk7) added a comment -

          Postposing this to next week.

          Note to integrator: This is the solution for 21_STABLE and master. MDL-28465 is the corresponding solution for 20_STABLE so should be integrated at the same time.

          Note to Andrew: Personally, I think the query is joining too much (unrelated records aka cross joining). But if it works, it works

          Show
          Eloy Lafuente (stronk7) added a comment - Postposing this to next week. Note to integrator: This is the solution for 21_STABLE and master. MDL-28465 is the corresponding solution for 20_STABLE so should be integrated at the same time. Note to Andrew: Personally, I think the query is joining too much (unrelated records aka cross joining). But if it works, it works
          Hide
          Petr Škoda added a comment -

          Integrated, please test very carefully in all 4 supported databases.

          Next time please do not use columns in GROUP BY that are not necessary, I hope this is going to be compatible with all supported databases:
          GROUP BY r.itemid, r.component, r.ratingarea, r.contextid

          Thanks.

          Show
          Petr Škoda added a comment - Integrated, please test very carefully in all 4 supported databases. Next time please do not use columns in GROUP BY that are not necessary, I hope this is going to be compatible with all supported databases: GROUP BY r.itemid, r.component, r.ratingarea, r.contextid Thanks.
          Hide
          Rajesh Taneja added a comment -

          Works Great
          Thanks for fixing this Andrew and Eloy

          Show
          Rajesh Taneja added a comment - Works Great Thanks for fixing this Andrew and Eloy
          Hide
          Eloy Lafuente (stronk7) added a comment -

          Such an effort has get its prize. Universal Award to the very-best Moodle developer in the family. Thanks!

          Closing. Ciao

          Show
          Eloy Lafuente (stronk7) added a comment - Such an effort has get its prize. Universal Award to the very-best Moodle developer in the family. Thanks! Closing. Ciao

            People

            • Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: