Moodle MDL-59694

Limit the amount of analysables that are processed during one train() and predict() execution


    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 3.4
    • Fix Version/s: 3.4
    • Component/s: Analytics
    • Labels:
    • Testing Instructions:

      This is mostly a backend change. It is covered by unit tests, but it is still worth testing manually.

      1. Make sure your site has at least 5 courses; use tool_generator if you want to create them quickly.
      2. Go to admin/tool/analytics/index.php and click "Actions > Edit" for the "Students at risk of dropping out" model.
      3. Change the time splitting method to one that is different from the one previously selected and is not empty, tick "Enable" if it is unticked, and save changes (this resets previous predictions; MDL-60022 adds a new button to clear them).
      4. Go to "Site admin > Analytics > Analytics settings", set "analytics | modeltimelimit" to "2 seconds" and disable "analytics | onlycli".
      5. Edit analytics/classes/local/analyser/base.php and add "sleep(1);" after $files = $this->process_analysable($analysable, $includetarget); (it is part of the get_analysable_data method).
      6. Check the mdl_analytics_used_analysables DB table. It should be empty because this issue's patch adds it; if it is not empty, you have probably tested another analytics issue this cycle (congrats and sorry...). Either clear it or only consider the records generated from now on during the following steps.
      7. Execute "Actions > Get predictions" for the "Students at risk of dropping out" model; once it finishes, press "Continue".
      8. Check the mdl_analytics_used_analysables DB table: it should contain at most 2 courses, possibly just 1 (we added a 1 second delay per analysable and the limit is 2 seconds).
      9. Execute "Actions > Get predictions" for the "Students at risk of dropping out" model again and check the mdl_analytics_used_analysables DB table again. It should now contain 2, 3 or 4 records (this may sound odd, but it is possible and expected; it depends on whether each course can be analysed or not).
      10. Go to "Site admin > Analytics > Analytics settings" and set "analytics | modeltimelimit" to "20 mins".
      11. Depending on your course contents, the process can take up to 20 mins to finish, but it will most likely finish in a few seconds. Check mdl_analytics_used_analysables: it should contain as many records as there are courses in mdl_course, minus 1, because the front page is skipped.
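As a rough cross-check of the record counts expected in steps 7 to 11, the PHP sketch below simulates the three runs. It is not Moodle code: simulate_run() and its parameters are hypothetical, the $used array stands in for the mdl_analytics_used_analysables table, and it assumes that each course costs exactly 1 second (the sleep(1) from step 5), that every processed course is recorded, and that nothing else takes time.

```php
<?php
// Hypothetical simulation of testing steps 7-11; NOT Moodle code.
// Assumptions: each course costs exactly 1 simulated second (the sleep(1)
// from step 5), nothing else takes time, every processed course is recorded,
// and the time limit is checked after each analysable.

/**
 * Simulates one "Get predictions" execution.
 *
 * @param int[] $courseids all course ids in mdl_course (front page included)
 * @param int $frontpageid id of the front page course, which is skipped
 * @param array $used stand-in for mdl_analytics_used_analysables (by reference)
 * @param int $timelimit modeltimelimit in seconds
 * @param int $costpercourse simulated seconds spent per course
 * @return int number of courses processed in this run
 */
function simulate_run(array $courseids, int $frontpageid, array &$used,
        int $timelimit, int $costpercourse = 1): int {
    $elapsed = 0;
    $processed = 0;
    foreach ($courseids as $id) {
        if ($id === $frontpageid || isset($used[$id])) {
            continue; // Front page skipped; already used courses are not reprocessed.
        }
        $elapsed += $costpercourse; // Simulated process_analysable() + sleep(1).
        $used[$id] = true;          // Recorded as used.
        $processed++;
        if ($elapsed >= $timelimit) {
            break; // Limit reached: stop after the current analysable.
        }
    }
    return $processed;
}

// Front page (id 1) plus 5 courses.
$courseids = [1, 2, 3, 4, 5, 6];
$used = [];
$perrun = [];
foreach ([2, 2, 1200] as $limit) { // Steps 7 and 9 (2 s), then step 11 (20 mins).
    $perrun[] = simulate_run($courseids, 1, $used, $limit);
}
echo implode(',', $perrun), "\n"; // 2,2,1
echo count($used), "\n";          // 5
```

Under these assumptions the table holds 2 records after step 7, 4 after step 9 and 5 (courses minus the front page) after step 11. Real runs also spend time outside process_analysable(), which is why step 8 allows for just 1 course, and courses that cannot be analysed may change the intermediate counts.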
    • Affected Branches:
      MOODLE_34_STABLE
    • Fixed Branches:
      MOODLE_34_STABLE
    • Pull from Repository:
    • Pull Master Branch:
      MDL-59694_master

      Description

      At the moment train() and predict() process all site contents without any limit; it would be good to limit the number of analysable elements that can be processed in a single train() or predict() execution.

      We can limit either the number of analysable elements or the time spent on each model. Limiting by number of analysables may not always help, because what counts as an analysable is completely up to the model; for example, the site itself is an analysable element.

      I would opt for a "Time limit per model" and allow admins to configure it through a new site setting; something similar, although more complex, was recently integrated for search.

      During training and prediction we process analysables one by one and build a dataset file for each of them; at the end we merge them all together and train or get predictions using the machine learning backend. Using the get_analysables method proposed in https://tracker.moodle.org/browse/MDL-59630?focusedCommentId=477070&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-477070 we could iterate through the available analysables in a private base class method and set up the timer there, checking the time spent since the start of get_analysable_data after each process_analysable call.
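The proposed loop could look roughly like the PHP sketch below. It is an illustration only, not the integrated Moodle code: process_analysables_with_limit() and the $process callback (standing in for process_analysable()) are hypothetical names, and the time limit is assumed to be a number of seconds read from the new site setting. Following the comments on evaluation, no limit is applied when $evaluation is true.

```php
<?php
// Sketch of a time-limited iteration over analysables; hypothetical names,
// not Moodle's actual implementation.

/**
 * @param iterable $analysables what get_analysables() would return
 * @param callable $process stand-in for process_analysable(); returns a dataset file
 * @param float $timelimit seconds, e.g. from the hypothetical modeltimelimit setting
 * @param bool $evaluation true during evaluation: the whole dataset is needed
 * @return array dataset files, later merged and passed to the ML backend
 */
function process_analysables_with_limit(iterable $analysables, callable $process,
        float $timelimit, bool $evaluation): array {
    $start = hrtime(true); // Monotonic clock in nanoseconds (PHP >= 7.3).
    $files = [];
    foreach ($analysables as $analysable) {
        $files[] = $process($analysable);
        // Checked only after each analysable, so the limit is approximate:
        // an analysable is never interrupted half way through.
        if (!$evaluation && (hrtime(true) - $start) / 1e9 >= $timelimit) {
            break;
        }
    }
    return $files;
}

$process = function ($courseid) {
    return "dataset-$courseid.csv"; // Pretend dataset file name.
};
$courses = [2, 3, 4, 5, 6];
// A 0 second limit stops right after the first analysable.
$limited = process_analysables_with_limit($courses, $process, 0.0, false);
// Evaluation ignores the limit and processes everything.
$all = process_analysables_with_limit($courses, $process, 0.0, true);
echo count($limited), ' ', count($all), "\n"; // 1 5
```

Checking the clock after each analysable rather than interrupting one mid-way keeps every dataset file complete, at the cost of the limit being approximate.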

      Some extra comments:

      1. It is an approximate limit, not an exact one, because we would wait until the current analysable is fully processed and then train or get predictions, which also takes some time. This should be explained in the setting description.
      2. There is still one case where this time limit will not be very effective: prediction models that use the site as a single analysable element. I already noted in the official docs page (around https://docs.moodle.org/dev/Analytics_API#How_many_predictions_for_each_sample.3F) that models iterating through large numbers of samples at site level should be careful and pay attention to memory usage; I think that is enough.
      3. These time limits should not be applied to evaluation processes ($this->options['evaluation'] in the analyser), as we need the whole site dataset.
      4. This issue and MDL-59630 are related because both need the new public get_analysables method. This new API method would also need an abstract get_analysables in the base class, implemented by the sitewide and by_course analysers: for the sitewide analyser it would return a one-item array with the site, and for by_course we would just rename get_courses to get_analysables, so people extending those analysers would not need to implement it.
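A simplified PHP sketch of the proposed class hierarchy follows. The class names carry a _sketch suffix because they are hypothetical stand-ins, not Moodle's actual analyser classes, and plain strings replace real site and course objects.

```php
<?php
// Hypothetical stand-ins for the analyser classes described in item 4.

abstract class base_analyser_sketch {
    /**
     * Returns the analysable elements this analyser iterates through.
     * @return array
     */
    abstract public function get_analysables(): array;
}

class sitewide_analyser_sketch extends base_analyser_sketch {
    private $site;

    public function __construct($site) {
        $this->site = $site;
    }

    // The whole site is a single analysable: a one-item array.
    public function get_analysables(): array {
        return [$this->site];
    }
}

class by_course_analyser_sketch extends base_analyser_sketch {
    private $courses;

    public function __construct(array $courses) {
        $this->courses = $courses;
    }

    // Formerly get_courses(); renamed so subclasses inherit it unchanged.
    public function get_analysables(): array {
        return $this->courses;
    }
}

// A custom analyser extending by_course needs no get_analysables() of its own.
class my_course_analyser_sketch extends by_course_analyser_sketch {
}

$site = new sitewide_analyser_sketch('site');
$bycourse = new my_course_analyser_sketch(['course2', 'course3']);
echo count($site->get_analysables()), ' ', count($bycourse->get_analysables()), "\n"; // 1 2
```

Because by_course implements get_analysables() itself, a subclass such as my_course_analyser_sketch inherits it without any extra code.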


              People

              • Votes:
                0
                Watchers:
                3

                Dates

                • Fix Release Date:
                  13/Nov/17