MDL-68690

Search: Allow Solr to add documents in batches


    • MOODLE_310_STABLE
    • MDL-68690-master
      • To test this change, you need a Moodle site that is configured to use the Solr search engine.
      1. Go to a forum and create 3 new forum posts. In each post, type whatever text you like, but include the special word MARIEZWOOP.
      2. Run the search indexing task (php admin/cli/scheduled_task.php --execute='\core\task\search_index_task') and check the result of the execution at Server > Tasks > Scheduled tasks using the "view logs" option. Look specifically at the part of the log where it indexes the 'Forum - posts' area.
        • EXPECTED: You should see something like the text below, containing the note '(1 batch)'.
      3. Using the global search icon in the Moodle header, search for 'MARIEZWOOP'.
        • EXPECTED: You should get all 3 results, proving they were indexed correctly.

      Processing area: Forum - posts
        Processed 3 records containing 3 documents (1 batch), in 0.1 seconds.

      Search reindexing with Solr is slow when there are a large number of documents. The time taken can be in the order of weeks, which is annoying if you want to (for example) upgrade to a new Solr version.

      The current engine code adds documents one at a time. It is possible to add multiple documents in one request, which would at least save on network round trips.
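
As a rough illustration of the difference (not the engine code itself), the sketch below builds request bodies in Solr's JSON update format, where a JSON array of documents can be posted to the update handler in a single request. The field names and both helper functions are hypothetical and exist only for this example.

```python
import json
from typing import Dict, List

Document = Dict[str, str]


def single_request_bodies(docs: List[Document]) -> List[str]:
    """One request body per document: N documents cost N round trips."""
    return [json.dumps([doc]) for doc in docs]


def batched_request_body(docs: List[Document]) -> str:
    """One request body holding every document: a single round trip."""
    return json.dumps(docs)


# Hypothetical documents; the field names are illustrative only.
posts = [
    {"id": "post-1", "title": "First", "content": "MARIEZWOOP one"},
    {"id": "post-2", "title": "Second", "content": "MARIEZWOOP two"},
    {"id": "post-3", "title": "Third", "content": "MARIEZWOOP three"},
]

one_at_a_time = single_request_bodies(posts)
in_one_batch = batched_request_body(posts)
```

Here `one_at_a_time` holds three bodies (three requests), while `in_one_batch` is one body carrying the same three documents.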

      In my testing, this change improves indexing performance:

      • By 80% when using a remote (cloud hosted) server running Solr 6.6.2, indexing small text entries.
      • By 30% when using a local server running Solr 8.5.1, for the same condition.
        (See attached performance.png for full test results.)

      The performance increase is expected to be larger for a remote Solr instance than for a locally hosted one, and larger when indexing mainly small text entries such as forum posts. (This change doesn't affect how files are indexed, and if you have a large number of files then those may be the most significant part of indexing.)
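
The reasoning can be sketched with a very simple cost model, assuming indexing time is just a fixed latency per request plus a fixed processing cost per document. Both the model and the numbers in the usage lines are illustrative assumptions, not measurements from this issue.

```python
import math


def indexing_seconds(num_docs: int, latency_s: float, per_doc_s: float,
                     batch_size: int = 1) -> float:
    """Estimated time: one latency cost per request plus a per-document cost."""
    requests = math.ceil(num_docs / batch_size)
    return requests * latency_s + num_docs * per_doc_s


def time_saved_fraction(num_docs: int, latency_s: float, per_doc_s: float,
                        batch_size: int) -> float:
    """Fraction of time saved by batching compared with one document per request."""
    before = indexing_seconds(num_docs, latency_s, per_doc_s, batch_size=1)
    after = indexing_seconds(num_docs, latency_s, per_doc_s, batch_size=batch_size)
    return 1 - after / before


# Made-up latencies: a "remote" server with a slow round trip, a "local" one with a fast one.
remote_saving = time_saved_fraction(10_000, latency_s=0.05, per_doc_s=0.01, batch_size=100)
local_saving = time_saved_fraction(10_000, latency_s=0.001, per_doc_s=0.01, batch_size=100)
```

Under this model the saving comes entirely from the latency term, so it grows with round-trip time and shrinks as per-document work (for example, large content) starts to dominate.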

      This is a significant improvement. I don't have any real-life test results, but the real-life improvement on a cloud hosted server with a mixture of small text entries and files could plausibly be 50%, which matters when reindexing an entire site (e.g. for a search engine update) can sometimes take weeks.

      One concern could be the potential size of batch updates. I have implemented a limit of 100 documents per batch, and each document in a batch can contain at most 1MB of content text; a larger document is sent individually instead. There is a unit test to make sure it works with the worst allowed case (100 x 1MB).
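
A minimal sketch of that batching rule, assuming "1MB" means 1024 * 1024 bytes of UTF-8 text and that an oversized document may be sent ahead of the partly filled batch. The function and constant names are hypothetical; this is not the code from the patch.

```python
from typing import Iterable, Iterator, List

MAX_BATCH_DOCS = 100          # documents per batch, as described above
MAX_DOC_BYTES = 1024 * 1024   # assumed interpretation of the 1MB content limit


def plan_requests(contents: Iterable[str]) -> Iterator[List[str]]:
    """Yield one list of document contents per request to the search engine."""
    batch: List[str] = []
    for content in contents:
        if len(content.encode("utf-8")) > MAX_DOC_BYTES:
            # Too large to batch: send this document in a request of its own.
            yield [content]
            continue
        batch.append(content)
        if len(batch) == MAX_BATCH_DOCS:
            # Batch is full: send it and start a new one.
            yield batch
            batch = []
    if batch:
        # Send whatever is left over as a final, smaller batch.
        yield batch
```

With these limits, the largest possible request carries 100 documents of exactly 1MB each, which is the worst case the unit test mentioned above is meant to cover.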

            Sam Marshall (quen)
            Mark Johnson
            Eloy Lafuente (stronk7)
            Gladys Basiana
            Votes: 0
            Watchers: 5


                Estimated: 0m
                Remaining: 0m
                Logged: 7h 10m
