Uploaded image for project: 'Moodle'
  1. Moodle
  2. MDL-34925

Improve bulk user export performance

    XMLWordPrintable

Details

    • MOODLE_21_STABLE, MOODLE_22_STABLE, MOODLE_23_STABLE, MOODLE_31_STABLE
    • MOODLE_31_STABLE
    • MDL-51603-dataformat-MDL-34925-user-download
    • Hide

      User profile fields

      1. Go to [Settings ► Site administration ► Users ► Accounts ► User profile fields]
      2. Create a few user profile fields.
      3. Edit user profiles and add data into the newly created user profile fields.
      4. Go to Bulk user download section of these testing instructions and follow the steps there.
      5. Delete user profile fields.
      6. Check for any errors.

      Bulk user download

      1. Navigate to Home ► Site administration ► Users ► Accounts ► Bulk user actions
      2. Select a bunch of users, preferably a really massive data set
      3. Select 'Download' and Go
      4. Confirm that the formats shown in the next page match the format ordering and enabled state in the admin gui
      5. Download each format and make sure it works.
      6. Note in particular that when you start a CSV download there is no delay, it should start the initial download within a second or two and then progressively generate and download the file. If possibly do some memory profiling and confirm that regardless of data size php the peak memory is almost fixed, or only grows linearly very slowly with data size.

      A simple hack to test this is:

      diff --git a/lib/dataformatlib.php b/lib/dataformatlib.php
      index f7a792d..493d631 100644
      --- a/lib/dataformatlib.php
      +++ b/lib/dataformatlib.php
      @@ -40,6 +40,7 @@ use Box\Spout\Writer\WriterFactory;
        */
       function download_as_dataformat($filename, $dataformat, $columns, $iterator, $callback = null) {
       
      +error_log(display_size(memory_get_peak_usage()));
           if (!NO_OUTPUT_BUFFERING) {
               throw new coding_exception("NO_OUTPUT_BUFFERING must be set to true before calling download_as_dataformat");
           }
      @@ -77,6 +78,7 @@ function download_as_dataformat($filename, $dataformat, $columns, $iterator, $ca
               // If the iterator is a RecordSet then close it.
               $iterator->close();
           }
      +error_log(display_size(memory_get_peak_usage()));
           exit;
       
       }
      

      If you want to see the instant download in action try this hack:

      @@ -70,6 +81,8 @@ function download_as_dataformat($filename, $dataformat, $columns, $iterator, $ca
                   continue;
               }
               $format->write_record($row, $c++);
      +error_log("do $c");
      +sleep(1);
           }
           $format->write_footer($columns);
      

      Now click download, get the file save prompt, and then tail the logs and see it still ticking over in the background.

      Some indicative numbers from my dev box:

      CSV:

      70 users: 15.7MB at start, 16MB at finish, < 1 second, 10kB file
      1000 users: 15.9MB at start, 16.3MB ~ 7 seconds, 144kB file
      14400 users: 18.5MB at start, 20.4MB ~ 120 seconds, 2.1MB file

      1. ODS and Excel unfortunately still need a temp file, even with the new spout library, so they still are prone to the load balancer timeout issue while they generate, but they do still have great memory performance so this does address the php memory issues.

      Excel:
      100 users: 15.8MB at start, 16.1MB at finish, < 1 second, 15kB file
      1000 users, 15.9MB at start 16.4MB at finish, ~ 10 seconds, 121kB file
      14400 users, 18.5MB at start 20.4MB at finish, ~ 138 seconds, 1.7MB file

      For comparison, on master using the old code:

      Excel:
      100 users 15.7MB at start, 17.8MB at finish, < 3 seconds
      200 users 15.7MB at start, 19.9MB at finish, ~ 8 seconds
      300 users 15.7MB at start, 21.4MB at finish, ~ 13 seconds
      500 users 15.8MB at start, 24.7MB at finish, ~ 18 seconds

      Show
      User profile fields Go to [Settings ► Site administration ► Users ► Accounts ► User profile fields] Create a few user profile fields. Edit user profiles and add data into the newly created user profile fields. Go to Bulk user download section of these testing instructions and follow the steps there. Delete user profile fields. Check for any errors. Bulk user download Navigate to Home ► Site administration ► Users ► Accounts ► Bulk user actions Select a bunch of users, preferably a really massive data set Select 'Download' and Go Confirm that the formats shown in the next page match the format ordering and enabled state in the admin gui Download each format and make sure it works. Note in particular that when you start a CSV download there is no delay, it should start the initial download within a second or two and then progressively generate and download the file. If possibly do some memory profiling and confirm that regardless of data size php the peak memory is almost fixed, or only grows linearly very slowly with data size. A simple hack to test this is: diff --git a/lib/dataformatlib.php b/lib/dataformatlib.php index f7a792d..493d631 100644 --- a/lib/dataformatlib.php +++ b/lib/dataformatlib.php @@ -40,6 +40,7 @@ use Box\Spout\Writer\WriterFactory; */ function download_as_dataformat($filename, $dataformat, $columns, $iterator, $callback = null) { +error_log(display_size(memory_get_peak_usage())); if (!NO_OUTPUT_BUFFERING) { throw new coding_exception("NO_OUTPUT_BUFFERING must be set to true before calling download_as_dataformat"); } @@ -77,6 +78,7 @@ function download_as_dataformat($filename, $dataformat, $columns, $iterator, $ca // If the iterator is a RecordSet then close it. $iterator->close(); } +error_log(display_size(memory_get_peak_usage())); exit; } If you want to see the instant download in action try this hack: @@ -70,6 +81,8 @@ function download_as_dataformat($filename, $dataformat, $columns, $iterator, $ca continue; } $format->write_record($row, $c++); +error_log("do $c"); +sleep(1); } $format->write_footer($columns); Now click download, get the file save prompt, and then tail the logs and see it still ticking over in the background. Some indicative numbers from my dev box: CSV: 70 users: 15.7MB at start, 16MB at finish, < 1 second, 10kB file 1000 users: 15.9MB at start, 16.3MB ~ 7 seconds, 144kB file 14400 users: 18.5MB at start, 20.4MB ~ 120 seconds, 2.1MB file ODS and Excel unfortunately still need a temp file, even with the new spout library, so they still are prone to the load balancer timeout issue while they generate, but they do still have great memory performance so this does address the php memory issues. Excel: 100 users: 15.8MB at start, 16.1MB at finish, < 1 second, 15kB file 1000 users, 15.9MB at start 16.4MB at finish, ~ 10 seconds, 121kB file 14400 users, 18.5MB at start 20.4MB at finish, ~ 138 seconds, 1.7MB file For comparison, on master using the old code: Excel: 100 users 15.7MB at start, 17.8MB at finish, < 3 seconds 200 users 15.7MB at start, 19.9MB at finish, ~ 8 seconds 300 users 15.7MB at start, 21.4MB at finish, ~ 13 seconds 500 users 15.8MB at start, 24.7MB at finish, ~ 18 seconds

    Description

      Running a bulk export of users will fail for larger sites, due to memory or time limit.

      It turns out that about 4 db queries are made for each user so the number of roundtrips is a significant barrier to performance. Most of this problem comes from the custom profile fields, which several queries are made to determine what the fields exist on every call.

      The following changes can greatly help:

      • Cache list of custom profile fields
      • Add option to bulk-preload data for a group of users into memory

      This can ultimately benefit any area of core that deals with user profiles.

      All up my changes improve the performance of this function about 10x, allowing a CSV download for a site with over 100k user accounts to complete in seconds.

      Attachments

        Issue Links

          Activity

            People

              brendanheywood Brendan Heywood
              tlevi Tony Levi
              David Monllaó David Monllaó
              Andrew Lyons Andrew Lyons
              Jun Pataleta Jun Pataleta
              David Woloszyn, Huong Nguyen, Jake Dallimore, Meirza, Michael Hawkins, Raquel Ortega, Safat Shahin, Stevani Andolo, Matteo Scaramuccia, David Woloszyn, Huong Nguyen, Jake Dallimore, Meirza, Michael Hawkins, Raquel Ortega, Safat Shahin, Stevani Andolo
              Votes:
              6 Vote for this issue
              Watchers:
              12 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:
                23/May/16