Moodle: MDL-58654

Allow Global Search to be much faster by supporting batch processing

    Details

    • Testing Instructions:

      Regular Processing (To make sure nothing broke)

      1. Pull the patch branch
      2. Set up Moodle (the DB shouldn't matter)
      3. Create test content in the site
      4. Set up a search engine (for example Solr)
      5. Configure Global Search in Moodle as per the Moodle docs
      6. Index the site
      7. Conduct a search
      8. Delete all indexed contents (via the web UI)
      9. From the CLI, run 'php search/cli/indexer.php' and confirm that the duration for each search area is listed.

      All search functionality should behave the same as the original Global Search functionality, with no errors in the logs. The only difference is the extra timing information shown when the site is indexed from the command line.

    • Affected Branches:
      MOODLE_31_STABLE, MOODLE_32_STABLE, MOODLE_33_STABLE
    • Fixed Branches:
      MOODLE_34_STABLE
    • Pull Master Branch:
      MDL-58654_global_search_batch

      Description

      Currently, when Moodle indexes documents, an iterator is created for each search area. The loop over this iterator lives in the manager class of Global Search, and it calls the add document method of the search engine plugin that the Moodle instance uses. This means that the search engine backend is fed documents one at a time.

      This improvement moves the iterator loop from the manager class into the base engine class. Search engine backend plugins can then override it and provide their own implementation, without breaking the interface contract for current search engine implementations.
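      The idea can be sketched as follows. This is a minimal illustration in Python rather than Moodle's PHP, and every name in it (BaseEngine, add_document, add_documents, LegacyEngine, BulkEngine) is hypothetical and not taken from the patch. It only shows how a default loop in a base class keeps existing plugins working while letting a plugin replace the loop.

```python
from typing import Iterable


class BaseEngine:
    """Hypothetical stand-in for the base search engine class."""

    def add_document(self, doc: dict) -> bool:
        """Index a single document; each plugin implements this."""
        raise NotImplementedError

    def add_documents(self, iterator: Iterable[dict]) -> int:
        """Default loop: feed documents to add_document one at a time."""
        indexed = 0
        for doc in iterator:
            if self.add_document(doc):
                indexed += 1
        return indexed


class LegacyEngine(BaseEngine):
    """An existing plugin that only implements add_document still works."""

    def __init__(self) -> None:
        self.calls = 0   # number of simulated backend requests
        self.store = []  # simulated index

    def add_document(self, doc: dict) -> bool:
        self.calls += 1
        self.store.append(doc)
        return True


class BulkEngine(BaseEngine):
    """A plugin that overrides the loop to send everything in one request."""

    def __init__(self) -> None:
        self.calls = 0
        self.store = []

    def add_documents(self, iterator: Iterable[dict]) -> int:
        docs = list(iterator)
        self.calls += 1  # one simulated bulk request
        self.store.extend(docs)
        return len(docs)
```

      In this sketch the caller only ever invokes add_documents, so it does not need to know whether a given plugin indexes one document at a time or in bulk.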

      Giving the search engine backend plugin the iterator directly, instead of one document at a time, allows the plugin to implement bulk document indexing for search engines that support it. Bulk indexing cuts down the network overhead: for example, instead of making one cURL call per document to be indexed, the plugin can bundle up several documents and pass them in a single cURL call.
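      The bundling step might look like the sketch below. It is written in Python for illustration only; the helper names (batched, index_in_batches), the send_bulk callback and the batch size of 500 are assumptions, not values or functions from the patch or from any search engine client.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, Tuple


def batched(iterator: Iterable[dict], size: int) -> Iterator[List[dict]]:
    """Yield lists of up to `size` documents without loading them all."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(iterator)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def index_in_batches(
    iterator: Iterable[dict],
    send_bulk: Callable[[List[dict]], None],  # hypothetical: one request per call
    batch_size: int = 500,                    # assumed value, tune per backend
) -> Tuple[int, int]:
    """Send documents in bundles; return (documents indexed, requests made)."""
    indexed = 0
    requests = 0
    for batch in batched(iterator, batch_size):
        send_bulk(batch)
        requests += 1
        indexed += len(batch)
    return indexed, requests
```

      With these assumed values, 1,050 documents would go out in three requests rather than 1,050, which is where the saving in network overhead comes from.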

      I've been experimenting with this in the Elasticsearch plugin. As it stands now, the maximum document throughput on my test rig is 25,000 documents indexed per hour. When batch processing is enabled, this increases to 65,000 documents indexed per hour (about 2.6 times the throughput).

                Dates

                • Fix Release Date:
                  13/Nov/17