Type: Bug
Resolution: Fixed
Priority: Major
Affects version: 3.1
Fix version: MOODLE_31_STABLE
Fixed branches: MOODLE_31_STABLE
Pull branch: MDL-53758-master
There are two tightly interconnected issues with the current global search implementation.
Low hit-to-miss ratios
The first problem is that, as implemented, the Solr engine relies on a high ratio of accessible results to fill the user's result set. The schema handles much of this through contexts, but there are places where enough granularity simply isn't possible.
Take this example: a student is in a large course that uses Separate Groups forums for discussions. The student is in Group A. In Groups B, C, and D, lots of students talk about topic YYY in the separated forums. Now the student in Group A searches for YYY. Solr returns the first 100 results for the query, and because the student has access to the contexts of those forums, the results are dominated by messages he doesn't have access to. The search system correctly filters out results the user can't access, but that can leave the user with only a handful of valid results, or none at all, even when results he can access really do exist. They were simply crowded out of the first 100 records.
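To make the crowding-out concrete, here is a minimal sketch. None of these names are real Moodle APIs; can_view() stands in for whatever per-document permission check the search area needs (here, the group-membership test of a Separate Groups forum), and the simulated results mimic a top 100 dominated by other groups:

```php
<?php
// Hypothetical sketch, not real Moodle code: $solrdocs stands in for the
// top 100 matches the engine returns, ranked by relevance only.
function can_view(array $doc): bool {
    // Stand-in for the real per-document permission check.
    return $doc['group'] === 'A';
}

$solrdocs = [];
for ($i = 0; $i < 100; $i++) {
    // Simulate the crowding-out: most top matches come from Groups B/C/D.
    $solrdocs[] = ['id' => $i, 'group' => ($i % 25 === 0) ? 'A' : 'B'];
}

$visible = array_values(array_filter($solrdocs, function ($doc) {
    return can_view($doc);
}));

// Only 4 of the 100 fetched documents survive the access check, even
// though plenty of accessible Group A posts exist beyond result #100.
echo count($visible) . " visible results\n";
```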
While there may well be ways to change the schema to handle this particular case, the generic problem will always exist: some search areas depend on data to determine access that we can't readily index or search, so we need to be better at retrieving a full set of results, even when they aren't in the first 100 documents from Solr. That isn't hard to implement; the problem becomes performance, which takes us to the second issue...
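A sketch of that obvious fix, assuming a hypothetical fetch_batch($offset, $limit) helper that performs another engine round trip: keep pulling further batches and filtering until a full set of accessible results is collected. Correct, but each iteration adds engine and access-check cost:

```php
<?php
// Hypothetical sketch: fetch_batch() stands in for one more Solr query;
// pretend the index holds 1000 matching documents.
function fetch_batch(int $offset, int $limit): array {
    $batch = [];
    for ($i = $offset; $i < min($offset + $limit, 1000); $i++) {
        $batch[] = ['id' => $i, 'group' => ($i % 25 === 0) ? 'A' : 'B'];
    }
    return $batch;
}

function can_view(array $doc): bool {
    return $doc['group'] === 'A'; // Stand-in permission check.
}

$needed = 100;
$offset = 0;
$visible = [];
while (count($visible) < $needed) {
    $batch = fetch_batch($offset, $needed);
    if (empty($batch)) {
        break; // Index exhausted; fewer accessible results exist.
    }
    foreach ($batch as $doc) {
        if (can_view($doc)) {
            $visible[] = $doc;
        }
    }
    $offset += $needed;
}
// Correct, but every extra batch costs another engine round trip plus
// per-document access checks - exactly the performance concern above.
echo count($visible) . " visible after scanning $offset candidates\n";
```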
Processing results we don't need
The second issue is that the current search scheme simply asks the engine for 100 results per query, displays the records for the current page, and discards the rest (see the sketch after this list). This has a few downsides:
- We process, and do access/permission checks on, all 100 results even though we only display the first 10. Given that a very large percentage of searches probably never move past the first page, that is a lot of wasted effort - potentially hundreds of extra DB queries.
- If we fix the first problem, we also have to run additional engine queries, DB queries, and checks to replace any inaccessible documents, even if they will never be used.
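One way both downsides could be addressed together (again with hypothetical helper names, not real Moodle APIs) is to check results on demand: a generator yields accessible documents lazily, so rendering a page of 10 only access-checks roughly as many candidates as that page needs, and further batches are fetched only if pagination actually reaches them:

```php
<?php
// Hypothetical sketch of on-demand result checking.
function fetch_batch(int $offset, int $limit): array {
    $batch = [];
    for ($i = $offset; $i < min($offset + $limit, 1000); $i++) {
        $batch[] = ['id' => $i, 'group' => ($i % 4 === 0) ? 'A' : 'B'];
    }
    return $batch;
}

function can_view(array $doc): bool {
    return $doc['group'] === 'A'; // Stand-in permission check.
}

function visible_results(): Generator {
    $offset = 0;
    $batchsize = 25; // Tunable: smaller batches waste less work per page.
    while ($batch = fetch_batch($offset, $batchsize)) {
        foreach ($batch as $doc) {
            if (can_view($doc)) {
                yield $doc; // Checked only when the consumer asks for it.
            }
        }
        $offset += $batchsize;
    }
}

// Render page 1: stop as soon as 10 visible documents are found, instead
// of checking all 100 up front and discarding 90 of them.
$page = [];
foreach (visible_results() as $doc) {
    $page[] = $doc;
    if (count($page) >= 10) {
        break;
    }
}
echo count($page) . " documents on page 1\n";
```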