Moodle / MDL-53758

Global Search Filling and Performance Issues

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • 3.1
    • 3.1
    • Global search
    • MOODLE_31_STABLE
    • MOODLE_31_STABLE
    • MDL-53758-master

      Setup

      To fully test this, we need a lot of records, so I am attaching a generator file.
      To use it:

      1. Create 2 students Student 1 and Student 2, with the ID Numbers MDL-53642-u1 and MDL-53642-u2 respectively
      2. Create a course with the ID Number MDL-53642-c
      3. Enrol the two students
      4. In that course, create two groups, Group 1 and Group 2, with ID Numbers MDL-53642-g1 and MDL-53642-g2
      5. Assign Student 1 to Group 1 and Student 2 to Group 2
      6. Create a forum. Under Common module settings, set it to separate groups, and set the ID Number to MDL-53642-f
      7. Download MDL-53642-gen.php and place it in your Moodle directory
      8. Run php MDL-53642-u1-gen.php
      9. Go into your forum and make sure you have content there.
      10. Index site content:

        php admin/tool/task/cli/schedule_task.php --execute="\\core\\task\\search_index_task" 

      Once the data is generated:

      1. Log in as Student 1.
      2. Search for Search1 Search2 Search3 Search4 Search5
      3. On the first page, confirm you see 10 results for Student 1 posts, and 10 pages available.
      4. Go to the second page.
      5. Confirm you see 10 results for Student 1 posts, and 9 pages available.
      6. Go to page 7.
      7. Confirm you see 5 results for Student 1 posts, and 7 pages available.
      8. Go back to page 1.
      9. Go to page 10.
      10. Confirm you see 5 results for Student 1 posts, and 7 pages available, and that page 7 is the selected page.
      11. Log in as Student 2.
      12. Search for Search1 Search2 Search3 Search4 Search5
      13. On the first page, confirm you see 10 results for Student 2 posts, and 10 pages available.
      14. Go to the second page.
      15. Confirm you see 10 results for Student 2 posts, and 10 pages available.
      16. Go to page 10.
      17. Confirm you really end up on page 8, with 10 results, and 8 pages available.
      18. Page through pages 1 to 8. Confirm that the DB read count gets higher as you move to higher pages. Probably starts in the 60s for page 1, and in the 600s by page 8. The progression is not necessarily linear.
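
The page numbers expected in the steps above follow from simple paging arithmetic. The sketch below is only an illustration: `clamp_page` is a hypothetical helper, and the totals of 65 results for Student 1 and 80 for Student 2 are inferred from the expected page counts (7 pages with 5 results on the last, 8 pages with 10 on the last), not taken from the generator script.

```python
import math

def clamp_page(requested, total_results, per_page=10):
    """Return (page shown, pages available, results on that page).

    A request past the last page is clamped to the last page.
    """
    pages = max(1, math.ceil(total_results / per_page))
    page = min(max(1, requested), pages)
    shown = min(per_page, total_results - (page - 1) * per_page)
    return page, pages, shown

# Assumed totals: 65 accessible results for Student 1, 80 for Student 2.
student1 = clamp_page(10, 65)
student2 = clamp_page(10, 80)
print(student1)  # (7, 7, 5)
print(student2)  # (8, 8, 10)
```

Under those assumed totals, asking for page 10 lands on page 7 with 5 results for Student 1 and on page 8 with 10 results for Student 2, which matches steps 10 and 17.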
      Note: these steps assume Solr is set up and working with the Forum posts search area enabled (https://docs.moodle.org/dev/Global_search), and that the performance footer is enabled (it displays the DB read/write count in the footer).

      There are two tightly interconnected issues with the current global search implementation.

      Low hit-to-miss ratios

      The first problem is that the Solr engine, as implemented, relies on a high ratio of valid results among the documents it fetches in order to give the user any results. While the schema does a good job of this with contexts, there are places where it simply cannot be granular enough.

      Take this example: a student is in a large course that uses Separate Groups forums for discussions. The student is in Group A. In Groups B, C, and D, lots of students talk about topic YYY in the separated forums. Now the student in Group A searches for YYY. Solr gets the first 100 results for the query, and because the student has access to the contexts of those forums, the results are dominated by messages from the other groups, which he doesn't have access to. The search system correctly filters out results that the user doesn't have access to, but this can leave the user with only a handful of valid results (or none), even if there really are ones he has access to. They were just crowded out of the first 100 records.
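
The crowding-out effect can be reproduced with a toy model. The sketch below is not Moodle or Solr code: the ranking, the group layout, and the names `build_ranked_hits`, `can_access`, and `search_fixed_window` are all assumptions made for illustration.

```python
from itertools import cycle

def build_ranked_hits():
    """Hypothetical ranked hits for "YYY": 150 posts from Groups B, C, D
    rank first, followed by 30 posts from Group A."""
    other_groups = cycle(["B", "C", "D"])
    hits = [{"id": i, "group": next(other_groups)} for i in range(150)]
    hits += [{"id": 150 + i, "group": "A"} for i in range(30)]
    return hits

def can_access(hit, user_group):
    # Stand-in for the per-document access check (separate groups).
    return hit["group"] == user_group

def search_fixed_window(hits, user_group, window=100):
    """Take only the first `window` engine hits, then filter by access."""
    return [h for h in hits[:window] if can_access(h, user_group)]

hits = build_ranked_hits()
visible = search_fixed_window(hits, "A")
reachable = [h for h in hits if can_access(h, "A")]
print(len(visible))    # 0
print(len(reachable))  # 30
```

In this model the Group A student sees no results at all, even though 30 matching posts are accessible further down the ranking.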

      While there could of course be ways to change the schema to handle this particular case, the general problem will always be there: some search areas determine access using data that we can't readily index or search. So we need to be better at getting a full set of results, even if they aren't in the first 100 documents from Solr. This isn't hard to implement. The problem becomes performance, which takes us to the second problem...
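
One way to picture "getting a full set of results" is to keep asking the engine for further batches until enough accessible documents have been collected. This is a minimal sketch under stated assumptions, not the fix that was implemented for this issue: `engine_query` stands in for a Solr request with an offset and a limit, and `search_with_refill` and its parameters are hypothetical names.

```python
def engine_query(hits, offset, limit):
    """Stand-in for an engine request: return one slice of the ranked hits."""
    return hits[offset:offset + limit]

def search_with_refill(hits, can_access, wanted, batch=100):
    """Fetch further batches until `wanted` accessible hits are found
    or the engine has nothing more to return."""
    valid = []
    offset = 0
    queries = 0
    while len(valid) < wanted:
        batch_hits = engine_query(hits, offset, batch)
        queries += 1
        if not batch_hits:
            break  # engine exhausted
        valid.extend(h for h in batch_hits if can_access(h))
        offset += len(batch_hits)
    return valid[:wanted], queries

# Toy data: 300 ranked document ids, of which every tenth is accessible.
ranked_ids = list(range(300))
results, queries = search_with_refill(ranked_ids, lambda doc: doc % 10 == 0, wanted=20)
print(len(results), queries)  # 20 2
```

With a fixed window of 100 this toy user would get only 10 results. The refill loop finds 20, at the cost of a second engine query and 100 more access checks, which is the performance concern raised next.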

      Processing results we don't need

      The second issue is that the current search scheme asks for 100 results for a query, then displays only the records for the current page and discards all the rest. This has a few downsides:

      • We are processing and doing access/permission checks for all 100 results, even if we are only displaying the first 10. Given that a very large percentage of searches will probably never move past the first page, that is a lot of wasted effort, potentially hundreds of extra DB queries.
      • If we fix the first issue (low hit-to-miss ratios), we have to do additional engine queries, DB queries, and checks to replace any missed documents, even if they will never be used.
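
The cost difference can be illustrated by counting access checks when results are processed only as far as the requested page needs. This is a sketch under assumptions, not the actual patch: `fetch_page` is a hypothetical helper, and each access check stands in for whatever DB queries a real check would trigger.

```python
def fetch_page(hits, can_access, page, per_page=10):
    """Check access only until the requested page (1-indexed) is filled.

    Returns the hits for that page and the number of access checks done.
    """
    needed = page * per_page
    valid = []
    checks = 0
    for hit in hits:
        if len(valid) >= needed:
            break  # enough accessible hits for this page; stop early
        checks += 1
        if can_access(hit):
            valid.append(hit)
    start = (page - 1) * per_page
    return valid[start:needed], checks

# Toy data: 100 ranked document ids, of which every second one is accessible.
ranked_ids = list(range(100))
is_accessible = lambda doc: doc % 2 == 0

page1, checks1 = fetch_page(ranked_ids, is_accessible, page=1)
page3, checks3 = fetch_page(ranked_ids, is_accessible, page=3)
print(checks1, checks3)  # 19 59
```

Checking all 100 hits up front would cost 100 checks for every page. Here page 1 needs 19 and page 3 needs 59, so the common first-page search is cheap and the cost grows only as the user pages deeper.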

            Eric Merrill (emerrill)
            Eric Merrill (emerrill)
            David Monllaó
            Eloy Lafuente (stronk7)
            Adrian Greeve