Moodle / MDL-68726

Search: Stop Solr 'optimize' behaviour

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 3.10
    • Fix Version/s: 3.10
    • Component/s: Global search
    • Labels:
    • Testing Instructions:
      You will need a Moodle setup with Apache Solr for global search.

      1. Go to the scheduled task screen.
      2. Run the 'Global search index optimization' scheduled task, either by using the 'Run now' link if available or scheduling it to run.
      3. Check the task output (use task logs if necessary).
        • EXPECTED: It should include the sentence 'The Solr search engine does not require automatic optimization.'
    • Affected Branches:
      MOODLE_310_STABLE
    • Fixed Branches:
      MOODLE_310_STABLE
    • Pull Master Branch:
      MDL-68726-master

      Description

      Moodle automatically runs the Solr 'optimize' task every night via a scheduled task. This is a bad thing. At least in most versions of Solr, what optimize does is to rewrite the entire index (which may consist of multiple 'segment' files) into a single file. This has the following impacts:

      • In theory it might save disk space (the single file will not have any 'deleted' spaces in it), but in practice you need 2x the disk space each time optimize runs, so I'm not sure there is any point in saving the space given that you need to keep it free anyway.
      • In theory it might make searching faster, but Solr is perfectly fast with multiple segments.
      • With a sufficiently large index, it stops working anyway because it hits a Moodle time limit (120 seconds probably).
      • If you ever stop running optimise, you are left with one massive segment (e.g. 35GB). Solr will create and use other segments, but will never free up that one unless 32.5GB of the 35GB gets deleted.
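      The 32.5GB figure in the last point can be reproduced with a little arithmetic. The sketch below assumes Lucene's tiered merge policy with a 5GB maximum merged segment size, and assumes an oversized segment only becomes eligible for merging once its live data falls below half that limit. Both numbers are assumptions about the Solr setup rather than anything stated in this issue, and the function name is hypothetical.

```python
def deletions_needed_gb(segment_gb, max_merged_segment_gb=5.0):
    """Return how many GB must be deleted before the segment can be merged."""
    # Assumption: a segment is only considered for merging once its live
    # (non-deleted) data is below half of the maximum merged segment size.
    eligible_live_gb = max_merged_segment_gb / 2
    # A segment already below the threshold needs no deletions at all.
    return max(0.0, segment_gb - eligible_live_gb)


print(deletions_needed_gb(35.0))  # 32.5
```

      Under these assumptions a 35GB segment left behind by optimize has to shrink to 2.5GB of live data before Solr will touch it again.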

      It seems to be a generally-held belief that optimise is not usually necessary, and is only maybe a benefit if you have a static index (e.g. your website is only updated every 3 months, you update it and hit optimise then). From experience at the OU (struggling to get the 35GB segment deleted) I concur with this belief.

      Anyway, I would suggest that deciding to optimise, which is quite a drastic step, is something you should only do with full understanding and due consideration, which would indicate to me that the application shouldn't do it automatically!

      The scheduled task is used by other search engine plugins, so rather than literally remove the task, I propose simply deleting the Solr engine's implementation of it.

      Along with this change I'm also going to add an mtrace when optimise does nothing (the default implementation), so that admins who are wondering 'does this task do anything' can tell that the search engine has not implemented it.
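      The proposed shape can be sketched as follows. This is an illustration in Python, not the Moodle PHP code: every class and function name here is hypothetical, and only the message text is taken from the testing instructions above. The base class carries a do-nothing optimize that reports itself, the Solr engine stops overriding it, and engines that still want optimization keep their own override.

```python
class SearchEngine:
    """Hypothetical base class standing in for a search engine plugin."""
    name = "Example"

    def optimize(self, log=print):
        # Default implementation: do nothing, but say so in the task output.
        log(f"The {self.name} search engine does not require automatic optimization.")


class SolrEngine(SearchEngine):
    """Solr no longer overrides optimize(), so it inherits the no-op."""
    name = "Solr"


class OtherEngine(SearchEngine):
    """A different engine that still wants to be optimized by the task."""
    name = "Other"

    def optimize(self, log=print):
        log(f"Optimizing the {self.name} index.")


def run_optimize_task(engine, log=print):
    # The scheduled task stays in place and simply delegates to the engine.
    engine.optimize(log)


run_optimize_task(SolrEngine())
run_optimize_task(OtherEngine())
```

      Keeping the task and moving the decision into each engine means other plugins are unaffected, while the task log makes it obvious when nothing was done.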

            People

            Assignee:
            Sam Marshall (quen)
            Reporter:
            Sam Marshall (quen)
            Peer reviewer:
            Mark Johnson
            Integrator:
            Eloy Lafuente (stronk7)
            Tester:
            Anna Carissa Sadia
            Participants:
            Component watchers:
            Amaia Anabitarte, Carlos Escobedo, Ferran Recio, Sara Arjona (@sarjona)
            Votes:
            0
            Watchers:
            4

              Dates

              Created:
              Updated:
              Resolved:
              Fix Release Date:
              9/Nov/20

                Time Tracking

                Estimated:
                Not Specified
                Remaining:
                0m
                Logged:
                1h 45m