Moodle / MDL-66326

Global search: Delete from search index when courses are deleted

    Details

    • Testing Instructions:

      To test this fully we will need to perform the same testing steps using both supported search engines (simpledb and Solr). You will need to have a Solr server ready for use.

      Shared testing steps

      Note: these steps assume you don't have any existing content on your server with the word 'frogrocket'. If that's a common word in your content, you might need to use a different word in the test.

      1. Set up the server to use the search engine for this run (Solr or simpledb) and ensure global search is enabled.
      2. Create a new course for testing. Set the course full name to Frogrocket launching.
      3. Within the course, add an HTML block with the title Frogrocket dangers and arbitrary content.
      4. Add a Page with the title History of the frogrocket and arbitrary content.
      5. Add a second Page with the title Care and feeding of the frogrocket and arbitrary content.
      6. Run search indexing (e.g. run the Global search indexing scheduled task) so that it indexes the newly-created content.
      7. Using the method appropriate to the individual search engine (see below), check directly in the search engine's index to see how many entries contain the word frogrocket. There should be four results (in any order) with the titles from above.
      8. Delete the first Page from the course.
      9. Ensure that cron runs (e.g. open admin/cron.php in a browser, or run cron via CLI) to carry out the actual deletion, which may be done by an ad-hoc task.
      10. Repeat the search above. You should get three results, without the 'History of the frogrocket' one.
      11. Delete the HTML block from the course.
      12. Repeat the search above. You should get two results, without the 'Frogrocket dangers' one.
      13. Delete the entire course (easiest way is to go to course/delete.php?id=courseid and follow prompts).
      14. Repeat the search above. You should get no results.

      Checking search index (simpledb)

      • To check the index, connect to your database and run this SQL command:

      SELECT title FROM mdl_search_simpledb_index WHERE title LIKE '%frogrocket%'
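
      The same check can be scripted. Below is a minimal sketch that runs an equivalent query against an in-memory SQLite table standing in for mdl_search_simpledb_index. The one-column table layout and the helper name matching_titles are assumptions for illustration; the real table has more columns. Because LIKE is case-sensitive on some databases (PostgreSQL, for example), the sketch lowercases both sides so that 'Frogrocket launching' and 'History of the frogrocket' are both counted.

```python
import sqlite3

# Stand-in for the Moodle database: the real table is mdl_search_simpledb_index
# and has many more columns than the single title column assumed here.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mdl_search_simpledb_index (id INTEGER PRIMARY KEY, title TEXT)"
)
titles = [
    "Frogrocket launching",
    "Frogrocket dangers",
    "History of the frogrocket",
    "Care and feeding of the frogrocket",
    "Unrelated page",
]
conn.executemany(
    "INSERT INTO mdl_search_simpledb_index (title) VALUES (?)",
    [(t,) for t in titles],
)


def matching_titles(connection, word):
    """Return the sorted titles that contain `word`, ignoring case."""
    rows = connection.execute(
        "SELECT title FROM mdl_search_simpledb_index WHERE LOWER(title) LIKE ?",
        (f"%{word.lower()}%",),
    )
    return sorted(row[0] for row in rows)


found = matching_titles(conn, "frogrocket")
print(len(found))
```

      Against a real site you would connect to the Moodle database instead of creating the table, and compare the length of the result with the count expected at each step (four, three, two, then zero).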
      

      Checking search index (Solr)

      • To check the index, go to the Solr admin page in your browser (usually under http://localhost:8983/solr).
      • The exact steps may differ depending on version. In my version, you need to:
        1. Select the relevant collection from the Collections dropdown.
        2. Choose "Query" from the list that appears below this.
        3. In the query form, change the 'q' field to title:frogrocket and submit. Results will be shown on the right.
        • You can simply re-submit the same form when you need to do it again after the index is updated.
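
      If you would rather not click through the admin page, the query form submits to the collection's select handler, so the same search can be expressed as a URL. The sketch below only builds that URL and sends nothing. The collection name "moodle" and the helper name solr_title_query_url are assumptions; use whatever collection your site is configured with.

```python
from urllib.parse import urlencode


def solr_title_query_url(base_url, collection, word, rows=10):
    """Build a Solr select URL that searches the title field for `word`."""
    params = urlencode({"q": f"title:{word}", "rows": rows, "wt": "json"})
    return f"{base_url.rstrip('/')}/{collection}/select?{params}"


# "moodle" is a hypothetical collection name.
url = solr_title_query_url("http://localhost:8983/solr", "moodle", "frogrocket")
print(url)
```

      Opening the resulting URL in a browser (or fetching it with any HTTP client) should return JSON in which response.numFound gives the number of matching documents.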

    • Affected Branches:
      MOODLE_38_STABLE
    • Fixed Branches:
      MOODLE_38_STABLE
    • Pull Master Branch:
      MDL-66326-master

      Description

      Unless you manually reindex, the global search system never deletes indexed data, other than in the case where you search for something and then get a result which has since been deleted.

      For example, suppose you make a forum post containing the phrase 'bleep off'. The post gets indexed. A moderator deletes the post. Another user searches for 'bleep'. The search engine returns the post, but Moodle can't find it: the post will now be deleted from the search engine so that it isn't returned again.

      If nobody had searched for words in that post (causing it to appear high enough in search results to potentially be displayed), then it would not be deleted from the index.

      This is not a major problem for single forum posts, because who cares. But consider a large course with tens of thousands of forum posts. The course is now deleted. Now, not only are we wasting lots of space in the search index, it is not even possible for it to be cleared up, because searches are restricted to courses that the user can access, and no user can access the now-deleted course, so these would never show up in results.
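
      The behaviour described above can be modelled in a few lines. This is a toy sketch, not Moodle's code: the class LazyIndex, its fields, and the document ids are all made up for illustration. A stale document is only removed when a search actually hits it, and searches are restricted to the courses the user can access.

```python
from dataclasses import dataclass, field


@dataclass
class LazyIndex:
    """Toy index that removes a stale document only when a search hits it."""

    docs: dict = field(default_factory=dict)    # docid -> (courseid, text)
    existing: set = field(default_factory=set)  # docids whose source still exists

    def add(self, docid, courseid, text):
        self.docs[docid] = (courseid, text)
        self.existing.add(docid)

    def search(self, word, accessible_courses):
        hits = []
        for docid, (courseid, text) in list(self.docs.items()):
            # Searches only ever look inside courses the user can access.
            if courseid not in accessible_courses or word not in text:
                continue
            if docid in self.existing:
                hits.append(docid)
            else:
                del self.docs[docid]  # stale hit: cleaned up lazily
        return hits


index = LazyIndex()
index.add("post1", courseid=1, text="bleep off")
index.add("post2", courseid=2, text="bleep again")
index.add("post3", courseid=2, text="more bleep")

index.existing.discard("post1")       # a moderator deletes one post
index.existing -= {"post2", "post3"}  # course 2 is deleted entirely

hits = index.search("bleep", accessible_courses={1})
remaining = sorted(index.docs)
print(hits, remaining)
```

      After the search, the deleted post from course 1 has gone from the index, but both documents from the deleted course 2 are still there, because no search can ever reach them.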

      I propose we add a new optional API to the search engine, which allows items to be deleted based on the 'courseid' field whenever a course is deleted from Moodle. (If the search engine does not support the API then it does nothing, as now.)

      We can also implement this based on the 'contextid' field when an activity/block is deleted.

      To avoid flooding the search engine with requests, if a whole course is deleted at once, we will do the 'course' delete and suppress all 500 (or whatever) context deletes.
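
      A rough sketch of how the proposal might fit together is shown below. It is written in Python for brevity rather than as Moodle PHP, and every name in it (SearchEngine, InMemoryEngine, delete_by_course, delete_by_context, on_course_deleted, on_activity_deleted) is hypothetical, not the API that was eventually implemented. The base class shows the "optional" part: an engine that does not override the delete methods simply does nothing.

```python
class SearchEngine:
    """Base engine: the delete hooks are optional and do nothing by default."""

    def delete_by_course(self, courseid):
        return False  # not supported; behaves as before

    def delete_by_context(self, contextid):
        return False  # not supported; behaves as before


class InMemoryEngine(SearchEngine):
    """Toy engine that supports both deletes and counts requests sent to it."""

    def __init__(self):
        self.docs = {}  # docid -> (courseid, contextid)
        self.requests = 0

    def add(self, docid, courseid, contextid):
        self.docs[docid] = (courseid, contextid)

    def delete_by_course(self, courseid):
        self.requests += 1
        self.docs = {d: v for d, v in self.docs.items() if v[0] != courseid}
        return True

    def delete_by_context(self, contextid):
        self.requests += 1
        self.docs = {d: v for d, v in self.docs.items() if v[1] != contextid}
        return True


def on_activity_deleted(engine, contextid):
    """A single activity or block was deleted: one context-level delete."""
    engine.delete_by_context(contextid)


def on_course_deleted(engine, courseid, contextids):
    """A whole course was deleted: one course-level delete only.

    The per-context deletes for `contextids` are deliberately suppressed,
    because the course-level delete already covers them.
    """
    engine.delete_by_course(courseid)


engine = InMemoryEngine()
engine.add("page1", courseid=7, contextid=70)
engine.add("page2", courseid=7, contextid=71)
engine.add("block1", courseid=7, contextid=72)
engine.add("other", courseid=8, contextid=80)

on_activity_deleted(engine, 70)
after_activity = sorted(engine.docs)
on_course_deleted(engine, 7, [71, 72])
after_course = sorted(engine.docs)
print(after_activity, after_course, engine.requests)
```

      In this sketch, deleting the course costs one request however many contexts it contained, which is the point of suppressing the individual context deletes.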

      I did the deletes manually using the Solr interface for 11 large courses that we deleted from our live system, which caused about 300,000 search documents to be deleted from the index. That's only about 1% of our index, but every little helps! I think it's important that we have this in future so that search indexes don't grow infinitely.

        Attachments

        1. simplesearch.png (230 kB)
        2. solr_admin_ui.png (184 kB)
        3. solrsearch.png (429 kB)

            People

            Assignee:
            quen Sam Marshall
            Reporter:
            quen Sam Marshall
            Peer reviewer:
            Mark Johnson
            Integrator:
            Eloy Lafuente (stronk7)
            Tester:
            Janelle Barcega
            Participants:
            Component watchers:
            Amaia Anabitarte, Carlos Escobedo, Ferran Recio, Sara Arjona (@sarjona)
            Votes:
            0
            Watchers:
            5

              Dates

              Created:
              Updated:
              Resolved:
              Fix Release Date:
              18/Nov/19

                Time Tracking

                Estimated:
                Not Specified
                Remaining:
                0m
                Logged:
                2h 18m