Uploaded image for project: 'Moodle'
  1. Moodle
  2. MDL-51603

Refactor downloading of csv, ods, excel etc files in reports and exports across moodle to stream progressively

XMLWordPrintable

    • MOODLE_28_STABLE, MOODLE_29_STABLE
    • MOODLE_31_STABLE
    • Downloads memory usage
    • MDL-51603-dataformat
    • Hide

      Keep on ramping up load balancer timeouts, and php memory indefinitely

      Show
      Keep on ramping up load balancer timeouts, and php memory indefinitely
    • Hide

      Test the new data format plugin admin page:

      1. Navigate to Home ► Site administration ► Plugins ► Data formats ► Manage data formats
      2. Test uninstalling, enabling, disabling and reenabling and re-ordering dataformat plugins

      To test the downloading:

      1. Go to Home ► Site administration ► Reports ► Logs
      2. Search for everything on a box with lots of logs and then test each download method.
      3. Check the download options reflects the order and enabled state of the plugins in the admin gui
      4. On my dev box with 60k log records I progressively downloaded this as a 13Mb csv in 35 seconds. (on master I let this churn for 10 minutes with no result and then killed it)

      Test other downloads:

      1. An assignment with offline feedback enabled -> Download submissions option in the grading page top dropdown menu
      2. A quiz with attempts -> All downloads you can find in results section and sections under it (like statistics)
      3. A feedback activity with results -> Analysis and export to excel
      Show
      Test the new data format plugin admin page: Navigate to Home ► Site administration ► Plugins ► Data formats ► Manage data formats Test uninstalling, enabling, disabling and reenabling and re-ordering dataformat plugins To test the downloading: Go to Home ► Site administration ► Reports ► Logs Search for everything on a box with lots of logs and then test each download method. Check the download options reflects the order and enabled state of the plugins in the admin gui On my dev box with 60k log records I progressively downloaded this as a 13Mb csv in 35 seconds. (on master I let this churn for 10 minutes with no result and then killed it) Test other downloads: An assignment with offline feedback enabled -> Download submissions option in the grading page top dropdown menu A quiz with attempts -> All downloads you can find in results section and sections under it (like statistics) A feedback activity with results -> Analysis and export to excel

      I made a patch for MDL-37132 which seems to be the tip of the iceberg in a bunch of potential improvements and refactoring around file downloading scalability and performance.

      csv, ods, excel file downloads in Moodle are almost universally bad at scale, the solution in MDL-37132 is an improvement to the csvlib to stream from a DB RecordSet or Iterator, through an optional transforming function, and then encoded as csv line by line and flushed continually over the network giving it a fixed memory footprint and almost instantaneous download start. But this is just one download option in one module.

      Almost every large client of ours has been burnt by this bug in different places depending on what parts of moodle they happen to be using massively.

      I'd like to take this solution further and also add streaming download support for the other download types, eg .ods and excel which will be a fairly major refactor of many places in moodle as it's a fundamental change in API. But if we're going to do this, then I think we could go an extra extra step and abstract away the various formats into a new plugin type. ie a module like mod_feedback, or the grade book just says 'I have a RecordSet or an Iterator, please present this as a bunch of alternative downloads to the user'. Then if I install a new 'format' like say RDF, or JSON, then every report or export in moodle gets this new format for free.

      So a new api something this which renders the dropdown with all the different export options:

      report.php:

      $OUTPUT->download_as_chooser(
      get_string('exportgrades', 'feedback'), // Name on button dropdown
      new moodle_url('export.php', array('id' => $id)), // base url
      'type', // optional name of query param which will have download format

      export.php:

      $format = required_param('type', PARAM_ALPHANUM);

      $rs = $DB->get_record_set(...);

      $download = new downloader();
      $download->set_filename('feedback');
      $download->set_key_names(array('name' => get_string('name') ...... ); // eg human readable csv header names
      $download->download_from_resultset($rs, $callback, $type);

      Under the hood we'd have a core set of 'formats' something like:

      /format/csv/
      /format/ods/
      /format/xls/

      These would either wrap existing API's or reimplement as needed. In particular I am thinking of pulling in Spout as a dependency which does streaming ods, csv, and excel:

      https://github.com/box/spout

      And then as bonus points I'd also do a json and rdf export but these would probably only go into the plugin dir and not into core.

      There are probably other considerations:

      • would a 'format' also potentially provide a parse to accept incoming uploads? Ideally yes, but this would be optional for each format. The core ones should implement this. Is there any grade export / import code to consider or align with?
      • each format would generally be content agnostic, but we'd still pass the type into the callback so the module has an opportunity to transform the data if needed for a particular format.
      • some downloads, eg the feedback excel, are special snowflakes with text formatting and worksheets. These would need a case by case decision to keep or refactor.

      Mentioning the big commiters around this: timhunt skodak nicolasconnault

            brendanheywood Brendan Heywood
            brendanheywood Brendan Heywood
            Adrian Greeve Adrian Greeve
            David Monllaó David Monllaó
            cameron1729 cameron1729
            Votes:
            4 Vote for this issue
            Watchers:
            17 Start watching this issue

              Created:
              Updated:
              Resolved:

                Error rendering 'clockify-timesheets-time-tracking-reports:timer-sidebar'. Please contact your Jira administrators.