Moodle: MDL-39725

PostgreSQL performance can be poor with large temp tables

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.3.7, 2.4
    • Fix Version/s: 2.4.8, 2.5.4, 2.6.1
    • Component/s: Performance
    • Labels:

      Description

      Unlike the other supported databases, PostgreSQL doesn't automatically create statistics for temporary tables. This is because temporary tables are session-local, so they are not visible to the autovacuum daemon that normally keeps statistics fresh.

      When temporary tables get large, they need ANALYZE run against them so the planner can produce better-performing query plans.
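      As an illustration of the manual workaround (a sketch only: the table name is one of the real backup temp tables, but the surrounding code is hypothetical, not part of any patch):

```php
// Sketch: after bulk-loading rows into a temporary table on PostgreSQL,
// run ANALYZE by hand so the planner gets real row estimates.
// $DB is Moodle's global database object; the driver-family guard keeps
// the other databases from paying any cost.
if ($DB->get_dbfamily() === 'postgres') {
    $DB->execute('ANALYZE ' . $DB->get_prefix() . 'backup_ids_temp');
}
```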

      This issue arose as part of the investigation into MDL-29439, but it needs a more general solution wherever we work with temporary tables.

      Here is a summary, with links, of how each supported database handles statistics for temporary tables:

      MySQL: no persistent statistics engine until 5.6, so I'm assuming people aren't affected.
      PostgreSQL: doesn't automatically update statistics on temporary tables.
      Oracle: does update statistics.
      SQL Server: does update statistics unless you disable it.

      MySQL Link: http://dev.mysql.com/doc/refman/5.6/en/glossary.html#glos_persistent_statistics
      PostgreSQL Link: http://www.postgresql.org/message-id/AANLkTin5Z3ie1XBCNs=sjDL=nsbXXERVF1xVnxcE_108@mail.gmail.com
      Oracle Link: http://docs.oracle.com/cd/B19306_01/server.102/b14231/tables.htm#i1006473
      SQL Server Link: http://social.msdn.microsoft.com/Forums/en-US/transactsql/thread/cfa8f983-8107-4af0-91a7-b2ee915e07ea

      I am not clear whether MySQL will begin to present an issue once newer versions use on-disk statistics with temporary tables.

      It is also worth considering that when performing functions like restores or large batch-processing jobs, forcing statistics collection on the other databases may not be a bad thing. It can, however, cause issues, as ownership and privileges usually need to be considered.

            Activity

            Russell Smith added a comment -

            After a little digging into this, so far in master I've only found the following temp table references:

            'backup_ids_temp' and 'backup_files_temp'

            They appear to be the only two at the moment. My method for finding them was to search for create_temptable_from_real_table and for INSERTs into _temp tables, using both an insert_record search and a raw INSERT SQL search. After that, the two above from backup/restore are the only ones I've seen.

            Other bugs have proposed more use of temporary tables, like MDL-25373 and MDL-30643. So even though the situation is isolated to backup/restore at this point in time, we could be introducing temporary tables in other locations at any time.

            The architecture of temp table management seems a little strange to me at the moment, with both a _template version in the database and the table then actually being created from XML. I will not delve into that in this issue and will allow it to be discussed further somewhere else.

            Given the limited scale of this, the approach of adding a new function to the DB API and calling it appropriately during backup/restore seems acceptable. A brief discussion on developer chat concluded that update_table_stats() or a variation would be a better starting point than analyze(). The next task is to discern the best places to put the analyze calls.

            The impact of analyze is that every time it's called it can take up to 2 seconds on PostgreSQL for large tables. Other DBs will pay no price. So it can't be in a path that's called often, but it does need to be called when major data changes happen, to keep the statistics up to date. I will advise further if there are questionable locations for this or if it becomes confusing to know the best places to put it.

            My initial candidates are send_files_to_pool, when called from a context that adds lots of itemids, and the point after the initial bulk load of data completes.

            Petr Skoda added a comment -

            it seems you have missed the temp tables in stats and LDAP auth (grep for create_temp_table())
            the recommended way is $dbman->create_temp_table($table); no need for XML at all
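            The pattern Petr describes might look like this (a sketch only; the table name and fields are hypothetical):

```php
// Sketch of the recommended pattern: define the structure in code and let
// the DDL manager create a session-local temp table, no XML file involved.
$dbman = $DB->get_manager();

$table = new xmldb_table('example_temp');   // hypothetical table name
$table->add_field('id', XMLDB_TYPE_INTEGER, '10', null, XMLDB_NOTNULL, XMLDB_SEQUENCE);
$table->add_field('itemid', XMLDB_TYPE_INTEGER, '10', null, XMLDB_NOTNULL);
$table->add_key('primary', XMLDB_KEY_PRIMARY, array('id'));

$dbman->create_temp_table($table);
// ... bulk inserts and lookups against the temp table ...
$dbman->drop_table($table);
```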

            Russell Smith added a comment -

            Here is the forum discussion I've started to gather further ideas and feedback about the implementation options for this issue. I will await feedback there before proceeding with any implementation, to make sure we have a bit more consensus.

            https://moodle.org/mod/forum/discuss.php?d=230792

            Petr Skoda added a comment -

            I already tried to invent some automatic ANALYZE for pg temp tables and failed.

            My +1 to adding $DB->analyze_temp_tables() and calling it manually in the affected areas.
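            Such a method might be sketched along these lines (hypothetical code, not the actual patch; it assumes a get_temptables() accessor on the driver's temp-table registry):

```php
// In moodle_database (base class): a no-op default, so MySQL, Oracle and
// MSSQL, which maintain temp-table statistics themselves, pay nothing.
public function analyze_temp_tables() {
    // Intentionally empty for databases that keep statistics automatically.
}

// In pgsql_native_moodle_database: override to ANALYZE every temp table
// registered for this session.
public function analyze_temp_tables() {
    foreach ($this->temptables->get_temptables() as $tablename) {
        $this->execute('ANALYZE ' . $this->prefix . $tablename);
    }
}
```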

            Russell Smith added a comment -

            I have begun work on this. The unit tests and analyze code are working. I am now devising a plan for the best locations for the calls, plus some performance metrics to confirm the improvement.

            Russell Smith added a comment -

            I've now added the branch for the development of this. It's still in progress, but most of it appears to be looking good.

            However, I can't seem to find a logical place to put the update-statistics function in backup/restore. The more I look at it, the more confused I get. The call graph and the way data is added to the temp tables backup_ids_temp and backup_files_temp are beyond my unassisted comprehension. Is somebody able to help or point out options there?

            I've run the unit tests for the statslib stuff; however, they don't seem to show any performance change. I've not looked at how large the test stats runs are, but multiple runs don't get slower.

            Any feedback at this point would be appreciated.

            Petr Skoda added a comment -

            the changes in db drivers look ok, +1 for that part

            Eloy Lafuente (stronk7) added a comment -

            About the DB changes, they look perfect. The only tiny detail is that I'm not a fan of empty return statements and I would consider getting rid of them.

            About the best points to introduce those calls in restore... well, it's tricky, because the whole restore (with a few exceptions) is a 100% progressive process of parsing and constructing, so there are no exact points in general.

            I'd follow some strategy like this:

            1) For bulk inserts into the temp tables (from memory: users, questions and files), call analyze once they have been loaded.

            2) For general use (when inserting one by one), keep a counter and conditionally call analyze once a given number of inserts has happened (say 10,000, a completely arbitrary number, I know) or some percentage of the total has changed (say 10%). That does require keeping some counter(s), but it's the only way I can imagine. Any attempt to put it at fixed points won't be accurate at all, IMO.

            Ciao
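            Eloy's strategy (2) could be sketched roughly as follows (illustrative names only; analyze_temp_tables() is the API proposed earlier in this issue, and the class is hypothetical):

```php
// Sketch: count one-by-one inserts into the temp tables and only trigger a
// re-ANALYZE after a threshold of changes, so the up-to-2-second cost on
// large PostgreSQL tables is paid rarely.
class temptable_stats_tracker {
    const THRESHOLD = 10000;   // arbitrary, as noted above
    protected $pending = 0;

    public function record_insert(moodle_database $db) {
        $this->pending++;
        if ($this->pending >= self::THRESHOLD) {
            $db->analyze_temp_tables();   // hypothetical API from this issue
            $this->pending = 0;
        }
    }
}
```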

            Russell Smith added a comment -

            Okay, I've decided that for this one I'm going to submit without alterations to the backup/restore mechanism. The reason is that I'm reworking the temp table interaction significantly in those tickets, and I can add the required tracking there. I will then be able to do the row-adjustment counting suggested by Eloy Lafuente (stronk7) in the earlier comments.

            I'll then push this back up for peer review.

            Petr Skoda added a comment -

            hi, I think the following code is not 100% correct, because it does not tell you where it was called from and it gets executed for pg only. So either remove it, or add it to the parent class so it always executes.

            error_log('Potential coding error - No temp tables exist to collect statistics from.');

            Thanks!

            Russell Smith added a comment -

            I've removed that line. Resending for review.

            Petr Skoda added a comment -

            thanks, sending to integration

            Eloy Lafuente (stronk7) added a comment -

            +1, thanks both!

            Sam Hemelryk added a comment -

            Thanks Russell, this has been integrated now.

            Dan Poltawski added a comment - edited

            Pretty sure this is unrelated, but I'm failing this whilst I investigate; Oracle is broken:

            3) mod_wiki_events_testcase::test_page_deleted
            Failed asserting that '2' matches expected '1'.
             
            /Users/danp/moodles/pm/moodle/mod/wiki/tests/events_test.php:320
            /Users/danp/moodles/pm/moodle/lib/phpunit/classes/advanced_testcase.php:80
             
            To re-run:
             vendor/bin/phpunit mod_wiki_events_testcase mod/wiki/tests/events_test.php
            

            Dan Poltawski added a comment -

            I'm passing this issue; it works on all databases.

            There are two failures on Oracle: wiki events (MDL-43443, which should've been picked up last week) and badges (MDL-42965).

            Damyon Wiese added a comment -

            Twas the week before Christmas,
            And all through HQ
            Devs were scrambling to finish peer review.
            They sent all their issues,
            and rushed out the door -
            "To the beach!" someone heard them roar!

            This issue has been released upstream. Thanks!


  People

  • Votes: 2
  • Watchers: 11