Improvement
Resolution: Fixed
Minor
3.10
MOODLE_310_STABLE
MOODLE_310_STABLE
MDL-68690-master
Search reindexing with Solr is slow when there are a large number of documents. It can take on the order of weeks, which is annoying if you want to (for example) upgrade to a new Solr version.
The current engine code adds documents one at a time. It is possible to add multiple documents in one request, which would at least save on network round trips.
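As a rough illustration of the round-trip saving (a sketch only, not the actual Moodle engine code): Solr's JSON update handler accepts an array of documents in a single request, so grouping documents reduces the number of requests needed. The function name `build_update_bodies` and the document fields below are hypothetical, and no network calls are made.

```python
import json


def build_update_bodies(docs, batch_size):
    """Return one JSON request body per update request.

    Each body is a JSON array of documents, the shape Solr's JSON
    update handler accepts. batch_size=1 mimics adding documents one
    at a time. (Hypothetical helper, not part of Moodle.)
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [
        json.dumps(docs[start:start + batch_size])
        for start in range(0, len(docs), batch_size)
    ]


# Hypothetical small text entries, e.g. forum posts.
docs = [{"id": f"forum-post-{n}", "title": f"Post {n}"} for n in range(250)]

one_at_a_time = build_update_bodies(docs, batch_size=1)
batched = build_update_bodies(docs, batch_size=100)

print(len(one_at_a_time))  # 250 requests
print(len(batched))        # 3 requests
```

Each request carries a fixed network cost, so fewer requests should matter most when the server is remote and the documents are small.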
In my testing, this change improves indexing performance:
- By 80% when using a remote (cloud hosted) server running Solr 6.6.2, indexing small text entries.
- By 30% when using a local server running Solr 8.5.1, for the same condition.
(See attached performance.png for full test results.)
The performance increase should be larger for a remote Solr instance than for a locally hosted one, and larger when indexing mainly small text entries such as forum posts. (This change doesn't affect how files are indexed, so if you have a large number of files, those may still be the most significant part of indexing.)
This is a significant improvement. I don't have any real-life test results, but the real-life improvement on a cloud-hosted server with a mixture of small text entries and files could plausibly be around 50%. That matters when reindexing an entire site (e.g. for a search engine update) can sometimes take weeks.
One concern could be the potential size of batch updates. I have implemented a limit of 100 documents per batch, and a document is only included in a batch if its content is at most 1MB of text; larger documents are sent individually. There is a unit test to make sure it works with the worst allowed case (100 x 1MB).
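The batching limits described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in the patch: the name `plan_requests` is hypothetical, 1MB is taken to mean 1024 * 1024 bytes of UTF-8 text in a `content` field, and an oversized document is assumed to go out on its own without flushing the batch being built.

```python
MAX_BATCH_DOCS = 100                   # limit stated in the issue
MAX_DOC_CONTENT_BYTES = 1024 * 1024    # assumption: 1MB = 1024 * 1024 bytes


def plan_requests(docs, max_docs=MAX_BATCH_DOCS,
                  max_content_bytes=MAX_DOC_CONTENT_BYTES):
    """Yield lists of documents, one list per update request.

    Documents whose content exceeds max_content_bytes are yielded
    alone; all others are grouped into batches of up to max_docs.
    (Hypothetical helper; the flushing behaviour is an assumption.)
    """
    pending = []
    for doc in docs:
        content_size = len(doc.get("content", "").encode("utf-8"))
        if content_size > max_content_bytes:
            # Too large to batch: send individually.
            yield [doc]
            continue
        pending.append(doc)
        if len(pending) == max_docs:
            yield pending
            pending = []
    if pending:
        # Send whatever is left over as a final, smaller batch.
        yield pending


# 250 small documents plus one oversized document inserted at position 5.
docs = [{"id": str(n), "content": "short post"} for n in range(250)]
docs.insert(5, {"id": "big", "content": "x" * (MAX_DOC_CONTENT_BYTES + 1)})

request_sizes = [len(batch) for batch in plan_requests(docs)]
print(request_sizes)  # [1, 100, 100, 50]
```

With these limits, the content text carried by a single batched request is bounded by roughly 100 x 1MB, which is the worst case the unit test is described as covering.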