  1. Moodle
  2. MDL-59039

Global search: Allow partial indexing (in scheduled task)

    Details

    • Testing Instructions:

      Before you start

      0a. You need a Moodle site with search configured and working via an Apache Solr instance.

      0b. If your test site has cron running, then temporarily turn off the scheduled task that runs forum indexing (in this test script we'll run it manually, so it will be confusing if it runs automatically on your site).

      0c. You will need a large amount of content to index, for example forum posts. If your test site doesn't have much content, you can add 100,000 forum posts as follows:

      • Apply the attached xs_big_forums.patch
      • Go to site administration / development / make test course
      • Select the 'XS' size and enter a suitable short name and full name

      With the patch applied, this will create a course with 100,000 forum posts. It takes a few minutes.

      After you have created the course, remember to remove the patched change e.g. by using 'git checkout admin/tool/generator'.

      0d. If you already had a large amount of content, you need to ensure it wasn't already all indexed - you can do this by going to the search areas page and clicking 'Delete index' next to the areas that hold a lot of content.

      Extra fun bonus

      I haven't put it in the test instructions but it can sometimes be interesting to look at the Solr admin page after an indexing run completes (to see if the 'number of documents' it reports is increasing roughly as expected).

      Testing the scheduled task

      1. Go to site administration / plugins / search / manage global search.

      2. Set the 'Indexing time limit' option to 10 seconds and save changes.

      3. Go to site administration / server / scheduled tasks.

      4. Click 'Run now' against the 'Global search indexing' task, and click to confirm.

      EXPECTED:

      • Search indexing output should display, listing all processed areas.
      • It should complete in approximately 10 seconds (probably takes a few extra seconds of overhead, e.g. 14 seconds).
      • The large area (usually 'Forum - posts') will show that it processed a certain number of documents, ending in the message '(not complete)'.
      • Immediately after this, the message 'Stopping indexing due to time limit' will appear.
      • There may be other search areas (that would normally be done after forum posts) which were not processed this time.

      5. Run the task again (you can just reload the page).

      EXPECTED:

      • Results should be similar to those of the previous run.
      • The large area or areas (usually 'Forum - posts') should now appear last after all other search areas.

      6. Go to site administration / plugins / search / search areas.

      7. Look at the large area (Forum - posts).

      EXPECTED:

      • It should show the usual information, ending in '(not yet fully indexed)'.

      8. Go to site administration / advanced features and turn off the 'Enable global search' option, then save changes.

      9. Re-run the scheduled task.

      EXPECTED:

      • The task should do nothing because search is disabled.

      10. Go to site administration / plugins / search / manage global search.

      11. Turn on the 'Index when disabled' option and save changes.

      12. Re-run the scheduled task.

      EXPECTED:

      • The task should now run indexing, with results similar to those from step 5.

      13. Based on the number of documents indexed each time the task runs, estimate how long it will need to run in order to index all the remaining content (from the original 100,000 forum posts, if you did it that way). For example, if it indexes 2,000 documents in each 10-second run, it will need about 50 runs (100,000 / 2,000), so probably 500 seconds in total to do all of them.

      14. On the global search page, set the time limit to about 1/3 of this time.

      15. Run the search task repeatedly.

      EXPECTED:

      • The task should run for (approximately) the new longer time.
      • After about 3 runs it will finish indexing the large search area, and there will no longer be a '(not complete)' message next to that line.
      • If you then run it again it will complete very quickly as there is nothing to index.

      16. You probably want to enable global search again.
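
The estimate in steps 13 and 14 can be worked through as a short calculation. This is only a sketch of the arithmetic: the function name and its parameters are made up for illustration and are not part of Moodle, and the throughput figures are the example numbers from step 13.

```python
import math


def estimate_time_limit(remaining_docs, docs_per_run, run_seconds, target_runs=3):
    """Estimate total indexing time and a time limit that finishes in target_runs.

    All names here are hypothetical helpers for the test script, not Moodle APIs.
    """
    if docs_per_run <= 0 or run_seconds <= 0 or target_runs <= 0:
        raise ValueError("docs_per_run, run_seconds and target_runs must be positive")
    # Observed throughput of one time-limited run, in documents per second.
    rate = docs_per_run / run_seconds
    # Time needed to index everything that is still outstanding.
    total_seconds = remaining_docs / rate
    # Step 14 asks for roughly one third of that time; round up to whole seconds.
    suggested_limit = math.ceil(total_seconds / target_runs)
    return total_seconds, suggested_limit


# Example figures from step 13: 100,000 posts, 2,000 documents per 10-second run.
total, limit = estimate_time_limit(100_000, 2_000, 10)
print(f"{total:.0f} seconds in total; suggested limit {limit} seconds")
```

If some runs have already completed, pass the number of documents still outstanding rather than the original 100,000, since the estimate only covers the remaining content.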

      Testing the CLI indexer

      17. On the command line, in the Moodle root directory, run the following:

      php search/cli/indexer.php --timelimit=60

      EXPECTED:

      • Result should be headed 'Running index of site (max 60 seconds)'
      • As all the content is already indexed, it should show 'No new documents to index' in each area.

      18. In the web interface, go to site administration / plugins / search / search areas.

      19. Click 'Delete index' next to the search area with large content.

      20. Back on the command line, run the command:

      php search/cli/indexer.php --timelimit=10

      EXPECTED:

      • The search area should be partially indexed, with a message like this:

      Processing area: Forum - posts
        Processed 3238 records containing 3238 documents, in 9.734 seconds (not complete).
      Stopping indexing due to time limit.

      21. Run the command again; the same thing should happen.

      22. Now run without the parameter:

      php search/cli/indexer.php

      EXPECTED:

      • The heading should be 'Running full index of site'.
      • It should take much longer because it will now do the full index.
      • The message for the search area should not have the '(not complete)' text.
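
Steps 20 and 21 repeat a time-limited run by hand. As a rough sketch, the repetition could be scripted by re-running the command until its output no longer contains the '(not complete)' text. The helper names below are hypothetical, and the script assumes the indexer prints that marker on standard output exactly as shown in step 20; only the command line itself is taken from the steps above.

```python
import subprocess
from typing import Callable

# Marker text taken from the expected output in step 20.
NOT_COMPLETE_MARKER = "(not complete)"


def run_indexer_cli(timelimit: int = 10) -> str:
    """Run the CLI indexer once from the Moodle root directory and return its output."""
    result = subprocess.run(
        ["php", "search/cli/indexer.php", f"--timelimit={timelimit}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def run_until_complete(run_once: Callable[[], str], max_runs: int = 100) -> int:
    """Call run_once until its output lacks the marker; return the number of runs."""
    for run_number in range(1, max_runs + 1):
        output = run_once()
        if NOT_COMPLETE_MARKER not in output:
            return run_number
    raise RuntimeError(f"indexing still incomplete after {max_runs} runs")


# Usage (from the Moodle root directory):
#   runs = run_until_complete(lambda: run_indexer_cli(timelimit=10))
```

Passing the runner in as a callable keeps the loop separate from the command, so the stopping logic can be checked without a Moodle site.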
    • Affected Branches:
      MOODLE_34_STABLE
    • Fixed Branches:
      MOODLE_34_STABLE
    • Pull Master Branch:
      MDL-59039-master

      Description

      At the moment, when the scheduled task to update the index runs, it updates all of the index and there is no way to stop it.

      For example, suppose you have a plugin which is not searchable, with 10 million items of content in it. You upgrade to the next version of the plugin, which is searchable, and turn on the new search area. Suddenly the scheduled task is going to try to index 10 million search documents on its next run. Even if we guess that it can index 100 documents per second (I don't actually know what speed is expected), that is 100,000 seconds, which is nearly 28 hours.

      This might cause problems:

      a) The infrastructure (server running scheduled task) might not stay up long enough to complete the run.

      b) Existing search areas (that are already current) will be neglected while it is updating the one large area. So for example, if somebody is trying to search for content in a forum, that index will be up to a day out of date. This will be more noticeable to students than the one new area (which wasn't searchable before) still not being completely searchable.

      You can work around this problem by using the CLI indexing tool but it would be nice if the scheduled task worked better.

      This change puts a time limit on the scheduled task: for example, it might run for a maximum of 10 minutes and then stop, and then do another 10 minutes next time it runs. With this in place, it would also be nice if it gave priority to keeping up with the latest new content in the existing search areas (that are already current) rather than areas which are significantly behind.
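
To make the idea concrete, here is a minimal sketch of a time-limited run that handles up-to-date areas before areas left unfinished by an earlier run. It is not the Moodle implementation: the Area class, the run_indexing function, the batch size and the injectable clock are all assumptions made for illustration, and real indexing is replaced by decrementing a counter.

```python
import time
from dataclasses import dataclass


@dataclass
class Area:
    """Hypothetical stand-in for a search area."""
    name: str
    pending: int           # documents still waiting to be indexed
    partial: bool = False  # True if an earlier run stopped before finishing


def run_indexing(areas, time_limit, batch_size=100, clock=time.monotonic):
    """Index areas until time_limit seconds have passed; return log lines."""
    deadline = clock() + time_limit
    log = []
    # Areas that are already current go first; partially indexed areas go last.
    # sorted() is stable, so the original order is kept within each group.
    for area in sorted(areas, key=lambda a: a.partial):
        while area.pending > 0 and clock() < deadline:
            # Stand-in for indexing one batch of documents.
            area.pending -= min(batch_size, area.pending)
        area.partial = area.pending > 0
        if area.partial:
            log.append(f"{area.name} (not complete)")
            log.append("Stopping indexing due to time limit")
            break
        log.append(area.name)
    return log
```

Calling run_indexing twice on the same list shows the intended effect: an area that ran out of time on the first call is marked partial, so on the second call it is processed after the areas that only need to catch up on new content.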

      For the CLI indexing tool, I will also add an optional parameter to limit indexing time, but I don't propose to change existing default behaviour (so if you just run it, it will do the whole lot).

      In addition to other benefits, if we get this right, and also add the ability for the task to index when search is disabled, we can then stop recommending/requiring people to use the CLI script to initially index when they install search. This is helpful for setups like the OU where Moodle administrators do not have the ability to run CLI tools.

        Attachments

        1. search_areas.png (42 kB)
        2. manage_global_search.png (39 kB)
        3. scheduled_task.png (43 kB)
        4. xs_big_forums.patch (1.0 kB)

                Dates

                • Created:
                  Updated:
                  Resolved:
                  Fix Release Date:
                  13/Nov/17