Affects Version/s: 3.1
Fix Version/s: None
Component/s: Global search
This should probably be converted into an Epic and broken apart into pieces, some backportable, some not. But only HQ/Component leads are supposed to make Epics.
We have found several problems with the way indexing is currently implemented that make indexing on large sites somewhat unstable, particularly the initial indexing.
Solr may sometimes partially crash, so that an exception is thrown for every new document we try to add, but it still accepts connections, so we don't fail out completely. Moodle catches and ignores errors from Solr when adding a doc, because Solr sometimes throws exceptions on individual documents (particularly files) that it can't process, not because of an overall problem.
This leads to a few problems:
- If Solr crashes part way through a particular search area, there is no choice but to start the area all over again.
- Usually when a Solr crash occurs, the manager doesn't realize it, and marks the area as being correctly indexed. The only way to recover is to delete the area and reindex it.
- Moodle doesn't detect that the engine is broken until the very end of the area, if at all. This means we may waste a lot of time processing records that aren't getting recorded.
- Particularly large search areas may be impossible to index at all, if you can never tune things so that the area succeeds in one pass.
This brings us to possible improvements, in no particular order:
- Solr needs to properly report back commit failure in area_index_complete().
- Try to differentiate which Solr exceptions are routine and which indicate that we should stop indexing.
- One possibility here would be to count how many successive document failures we have had, and do an explicit commit to detect whether Solr is up or down.
- Properly report back to the manager when a doc has failed to be added.
- Have the manager use the last successfully indexed document's modified time as the pickup point, instead of the last index start time. The phpdocs imply this was the intent, but that is not what is actually coded.
- Have the manager commit and save in chunks - like call area_index_complete(); area_index_starting(); every 50,000 records or something. Update the time pointers each time area_index_complete() succeeds.
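To make the last two items concrete, here is a rough, language-neutral sketch in Python (not Moodle's PHP API). Everything in it is hypothetical: `Doc`, `CommitFailed`, `index_area`, and the `add`/`commit` callables only stand in for the engine's add-document call and the area_index_complete(); area_index_starting(); pair. It assumes documents arrive ordered by ascending modified time and that re-adding a document is harmless.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional, Tuple


@dataclass
class Doc:
    id: int
    modified: int  # Unix timestamp of the document's last modification


class CommitFailed(Exception):
    """Raised by the (hypothetical) engine when a commit does not succeed."""


def index_area(
    docs: Iterable[Doc],             # assumed ordered by ascending modified time
    add: Callable[[Doc], bool],      # True if the engine accepted the document
    commit: Callable[[], None],      # raises CommitFailed when the commit fails
    checkpoint: int = 0,             # last saved pickup point
    chunk_size: int = 50_000,
) -> Tuple[int, bool]:
    """Index documents modified at or after `checkpoint`.

    Returns (new_checkpoint, completed). The checkpoint only advances after
    a successful commit, and only to the modified time of the last document
    the engine accepted in that chunk.
    """
    pending_time: Optional[int] = None  # modified time of last accepted doc
    pending_count = 0                   # docs attempted since the last commit

    def flush() -> None:
        nonlocal checkpoint, pending_time, pending_count
        commit()  # may raise CommitFailed; the checkpoint is then left alone
        if pending_time is not None:
            checkpoint = pending_time
        pending_time, pending_count = None, 0

    try:
        for doc in docs:
            # ">=" on resume: documents sharing the checkpoint timestamp are
            # re-added rather than risk skipping one across a chunk boundary.
            if doc.modified < checkpoint:
                continue
            if add(doc):
                pending_time = doc.modified
            # Rejected documents are skipped, as they are today.
            pending_count += 1
            if pending_count >= chunk_size:
                flush()
        if pending_count:
            flush()
    except CommitFailed:
        # Report the last pickup point that was actually committed.
        return checkpoint, False
    return checkpoint, True
```

With this shape, a crash in the middle of an area costs at most one chunk of work: the next run passes the returned checkpoint back in and continues from there instead of restarting the area.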
Most of these things (but I'm not sure about 100% of them) actually fit within the existing API definitions and descriptions; it's largely a matter of tweaking internal behaviors.
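As an example of how small such an internal tweak could be, the successive-failure idea from the list above might look roughly like the following Python sketch (again not Moodle code). `FailureMonitor`, `EngineDown`, the `probe` callable, and the threshold of 10 are all made up for illustration; `probe` stands in for whatever explicit commit or health check the engine would use.

```python
from typing import Callable


class EngineDown(Exception):
    """Signals that indexing should stop because the engine looks unavailable."""


class FailureMonitor:
    """Counts successive document failures and probes the engine at a threshold."""

    def __init__(self, probe: Callable[[], bool], threshold: int = 10) -> None:
        self.probe = probe          # hypothetical: returns True if the engine is healthy
        self.threshold = threshold  # arbitrary value, would need tuning
        self.consecutive = 0

    def record(self, ok: bool) -> None:
        """Record the outcome of one add-document attempt."""
        if ok:
            # Any success means the engine is still accepting documents.
            self.consecutive = 0
            return
        self.consecutive += 1
        if self.consecutive >= self.threshold:
            if not self.probe():
                raise EngineDown(
                    f"{self.consecutive} successive failures and the probe failed"
                )
            # The engine answered the probe, so treat the failures as routine
            # per-document problems and start counting again.
            self.consecutive = 0
```

The indexing loop would call `record()` after each document and let `EngineDown` abort the area, so isolated unprocessable files are still ignored while a crashed engine is noticed after a bounded number of wasted records.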