Moodle
  1. Moodle
  2. MDL-40585

grouped_parser_processor::grouped_parent_exists needs a cache, it can get 10,000,000 for 500 paths on restore

    Details

    • Rank:
      51415

      Description

      Restoring large numbers of questions in a restore results in the grouped_parser_processor::grouped_parent_exists getting called many millions of times. However due to the XML structure there are only a few parent paths to the top of the XML tree.

      Removing the calls to progressive_parser::dirname($parentpath) and being able to complete simple cache lookups will reduce restore times significantly.

        Issue Links

          Activity

          Hide
          Russell Smith added a comment -

          Testing instructions need writing. And performance results need posting.

          Given the interaction with other issues in the EPIC, the test order is more difficult.

          Show
          Russell Smith added a comment - Testing instructions need writing. And performance results need posting. Given the interaction with other issues in the EPIC, the test order is more difficult.
          Hide
          Russell Smith added a comment -

          The current implementation does not use MUC, as at 10M calls, the overhead of the external cache is huge.

          I have implemented both, but I've not collected the results together. And MDL-40538 needs to land for MUCH static cache to be useful.

          If required, I will implement an MUC version and compare.

          Show
          Russell Smith added a comment - The current implementation does not use MUC, as at 10M calls, the overhead of the external cache is huge. I have implemented both, but I've not collected the results together. And MDL-40538 needs to land for MUCH static cache to be useful. If required, I will implement an MUC version and compare.
          Hide
          Russell Smith added a comment -

          This is not a static class, so I've not bothered with the MUC version.

          Using the two attached runs, import them and then run

          moodle/admin/tool/profiling/index.php?run1=89104786ee7a93d605b1b82c84424893&run2=5cf6414e40e769a72c73e0754e7455b9

          to see your results.

                                                              Diff        Diff%
          Number of Function Calls   43,762,907   22,591,226  -21,171,681 -48.4%
          Incl. Wall Time (microsec) 820,491,422  740,474,595 -80,016,827 -9.8%
          Incl. CPU (microsecs)      384,832,051  302,914,930 -81,917,121 -21.3%
          Incl. MemUse (bytes)       89,776,112   90,093,728  317,616 	0.4%
          Incl. PeakMemUse (bytes)   119,346,240  119,294,080 -52,160     -0.0%
          

          300k RAM increase for 50% less function calls and 10% precheck load time. I'm sure a full restore will provide more savings.

          Show
          Russell Smith added a comment - This is not a static class, so I've not bothered with the MUC version. Using the two attached runs, import them and then run moodle/admin/tool/profiling/index.php?run1=89104786ee7a93d605b1b82c84424893&run2=5cf6414e40e769a72c73e0754e7455b9 to see your results. Diff Diff% Number of Function Calls 43,762,907 22,591,226 -21,171,681 -48.4% Incl. Wall Time (microsec) 820,491,422 740,474,595 -80,016,827 -9.8% Incl. CPU (microsecs) 384,832,051 302,914,930 -81,917,121 -21.3% Incl. MemUse (bytes) 89,776,112 90,093,728 317,616 0.4% Incl. PeakMemUse (bytes) 119,346,240 119,294,080 -52,160 -0.0% 300k RAM increase for 50% less function calls and 10% precheck load time. I'm sure a full restore will provide more savings.
          Hide
          Russell Smith added a comment -

          Until I can upload the profiling results, it will be difficult to get this issue progressed through peer review.

          Show
          Russell Smith added a comment - Until I can upload the profiling results, it will be difficult to get this issue progressed through peer review.
          Hide
          Russell Smith added a comment -

          This one is ready for peer review. I haven't yet attached the after profile due to the blocking SITES bug. When that's resolved, I'll upload the post profile for you to import and review.

          Show
          Russell Smith added a comment - This one is ready for peer review. I haven't yet attached the after profile due to the blocking SITES bug. When that's resolved, I'll upload the post profile for you to import and review.
          Hide
          Tim Hunt added a comment -

          1. You have to large chunks of code in grouped_parent_exists. I think that should be in a get_parent_path method.

          2. When you throw away the 200 'oldest' cache entries, they are not the 200 least recently used. Presumably the overhead it re-ordering the array is worse than the risk of throwing away a cache entry that might still be used. I just wonder if it is worth clarifying the comment that you are throwing away the oldest by creation time, not use time?

          If you like, I can peer-review this again after you have fixed 1., if you don't want to wait for Eloy.

          Show
          Tim Hunt added a comment - 1. You have to large chunks of code in grouped_parent_exists. I think that should be in a get_parent_path method. 2. When you throw away the 200 'oldest' cache entries, they are not the 200 least recently used. Presumably the overhead it re-ordering the array is worse than the risk of throwing away a cache entry that might still be used. I just wonder if it is worth clarifying the comment that you are throwing away the oldest by creation time, not use time? If you like, I can peer-review this again after you have fixed 1., if you don't want to wait for Eloy.
          Hide
          Russell Smith added a comment -

          Tim helpfully offered to Peer review, so I've moved the review to him.

          After implementing his suggestion for another function. The following is the comparison of the new code to the original baseline.

            Run #89104786ee7a93d605b1b82c84424893 Run #b639ff28592f7efeef615ade1f7475ad Diff Diff%
          Number of Function Calls 43,762,907 27,884,650 -15,878,257 -36.3%
          Incl. Wall Time (microsec) 820,491,422 730,431,733 -90,059,689 -11.0%
          Incl. CPU (microsecs) 384,832,051 313,831,614 -71,000,437 -18.4%
          Incl. MemUse (bytes) 89,776,112 90,056,952 280,840 0.3%
          Incl. PeakMemUse (bytes) 119,346,240 119,334,064 -12,176 -0.0%
          Show
          Russell Smith added a comment - Tim helpfully offered to Peer review, so I've moved the review to him. After implementing his suggestion for another function. The following is the comparison of the new code to the original baseline.   Run #89104786ee7a93d605b1b82c84424893 Run #b639ff28592f7efeef615ade1f7475ad Diff Diff% Number of Function Calls 43,762,907 27,884,650 -15,878,257 -36.3% Incl. Wall Time (microsec) 820,491,422 730,431,733 -90,059,689 -11.0% Incl. CPU (microsecs) 384,832,051 313,831,614 -71,000,437 -18.4% Incl. MemUse (bytes) 89,776,112 90,056,952 280,840 0.3% Incl. PeakMemUse (bytes) 119,346,240 119,334,064 -12,176 -0.0%
          Hide
          Tim Hunt added a comment -

          +1 from me for that latest patch. Please submit for integration once you have decided how many branches to back-port this to.

          Show
          Tim Hunt added a comment - +1 from me for that latest patch. Please submit for integration once you have decided how many branches to back-port this to.
          Hide
          Russell Smith added a comment -

          I've back patched to 2.3 as it didn't cause any conflicts. I know 2.3 is not supported and I do not expect it to be integrated. However I will needs to integrate it to some 2.3 installations.

          Tim, I don't have the power to submit something for integration. Can you please do that?

          Show
          Russell Smith added a comment - I've back patched to 2.3 as it didn't cause any conflicts. I know 2.3 is not supported and I do not expect it to be integrated. However I will needs to integrate it to some 2.3 installations. Tim, I don't have the power to submit something for integration. Can you please do that?
          Hide
          Russell Smith added a comment -

          Final Peer review/request for a bump to integration review.

          Show
          Russell Smith added a comment - Final Peer review/request for a bump to integration review.
          Hide
          Tim Hunt added a comment -

          Submitting for integration.

          Show
          Tim Hunt added a comment - Submitting for integration.
          Hide
          Dan Poltawski added a comment -

          Hi Russell,

          There is trailing whitespace in your patch, do you want to fix it, or shall I?

          Dan

          Show
          Dan Poltawski added a comment - Hi Russell, There is trailing whitespace in your patch, do you want to fix it, or shall I? Dan
          Hide
          Russell Smith added a comment -

          I'm unavailable till this evening. I will fix if you can wait. Should I do a second commit, or rebase?

          Show
          Russell Smith added a comment - I'm unavailable till this evening. I will fix if you can wait. Should I do a second commit, or rebase?
          Hide
          Dan Poltawski added a comment -

          Ok thanks, i've added a commit for it (better to have it right in the original commit if it comes up again).

          I've integrated this into master, 25 and 24 (23 is only in security support now)

          Show
          Dan Poltawski added a comment - Ok thanks, i've added a commit for it (better to have it right in the original commit if it comes up again). I've integrated this into master, 25 and 24 (23 is only in security support now)
          Hide
          Rossiani Wijaya added a comment -

          Hi Russell,

          I'm not sure how complex the file need to be to test this issue. Would it be possible to attach a sample backup file to test this?

          Show
          Rossiani Wijaya added a comment - Hi Russell, I'm not sure how complex the file need to be to test this issue. Would it be possible to attach a sample backup file to test this?
          Hide
          Russell Smith added a comment -

          It's my belief the backup attached to the linked EPIC is sufficient. However that is what I tested with. You are free to use that to verify.

          Show
          Russell Smith added a comment - It's my belief the backup attached to the linked EPIC is sufficient. However that is what I tested with. You are free to use that to verify.
          Hide
          Rossiani Wijaya added a comment -

          Hi Russel,

          I tested this on 2.3, 2.4, 2.5 and master. It restored without error, however it took quite sometimes to finish the restore (approx. 30 mins). I noticed that the file is restoring about 6334 questions from deleted categories. This might be caused of the delay.

          Other than that, this is working find.

          Just some side note, received notice error for format_grid. Installed the format_grid plugins, the error is gone.

          Test passed.

          Show
          Rossiani Wijaya added a comment - Hi Russel, I tested this on 2.3, 2.4, 2.5 and master. It restored without error, however it took quite sometimes to finish the restore (approx. 30 mins). I noticed that the file is restoring about 6334 questions from deleted categories. This might be caused of the delay. Other than that, this is working find. Just some side note, received notice error for format_grid. Installed the format_grid plugins, the error is gone. Test passed.
          Hide
          Russell Smith added a comment -

          The restore time is very slow. This work is all about improving restore performance. Restoring that backup without this fix has been taking me about 40 minutes. Sothat time is expected as is shown in the profile results above.

          I forgot that format_grid is installed in the environment that backup comes from. Thanks for highlighting that.

          Show
          Russell Smith added a comment - The restore time is very slow. This work is all about improving restore performance. Restoring that backup without this fix has been taking me about 40 minutes. Sothat time is expected as is shown in the profile results above. I forgot that format_grid is installed in the environment that backup comes from. Thanks for highlighting that.
          Hide
          Damyon Wiese added a comment -

          Thanks again for another week of fixes, improvements and testing. These changes have been released to the world.

          Cheers, Damyon

          Show
          Damyon Wiese added a comment - Thanks again for another week of fixes, improvements and testing. These changes have been released to the world. Cheers, Damyon
          Hide
          Sam Marshall added a comment -

          Russell: I just filed MDL-41674, you might be interested in peer reviewing that issue, as it seems to make this area of code significantly faster still - but it's entirely possible I may have misunderstood something... (Just spotted this along the way, it's not a particular focus of mine.)

          Show
          Sam Marshall added a comment - Russell: I just filed MDL-41674 , you might be interested in peer reviewing that issue, as it seems to make this area of code significantly faster still - but it's entirely possible I may have misunderstood something... (Just spotted this along the way, it's not a particular focus of mine.)

            People

            • Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: