Uploaded image for project: 'Moodle'
  1. Moodle
  2. MDL-83211

Search: Indexing can miss items due to race condition

XMLWordPrintable

    • MOODLE_403_STABLE, MOODLE_404_STABLE
    • MOODLE_404_STABLE, MOODLE_405_STABLE
    • MDL-83211-m404
    • MDL-83211-m405
    • MDL-83211-main
    • Hide

      Prerequisites:

      • You must have Moodle configured to enable global search. It does not matter which search engine you use (Solr or built in simpledb), the problem impacts both in the same way.
      • You must have access to the source code and be able to run CLI commands.

      Before starting, download the attached racetest2.php and save it within your Moodle code as search/cli/racetest2.php.

      1. Create a new course for testing. You can use default settings and call the course anything you like. Make a note of the id= number.
      2. Open two terminal windows ready to run CLI commands. You have to run these commands fairly quickly so it might be a good idea to set it up first.
      3. In the first terminal window, run php search/cli/racetest2.php --add --course=123 (where 123 is the course id you noted earlier).
        • If you have run this test script before, the script will first delete the existing Page activities that were created last time.
        • Wait for the script to output a SPECIAL KEY line.
        • If you get a message that it's too close to the next minute, wait a few seconds then run again.
      4. In the second terminal window, run php search/cli/racetest2.php --index --course=123 (where 123 is the course id you noted earlier).
        • This should also output the same SPECIAL KEY line.
        • WARNING: The test warns you not to press Ctrl-C because it has changed your search area settings. If you get the wrong special key or anything else goes wrong and you want to retry, let it run to completion instead of exiting, that way it will put back the search areas.
      5. Both scripts should wait for the next minute to start, then do something every 10 seconds. The first script is creating a Page (with a 4 second delay on committing the transaction) and the second one is indexing.
      6. When both scripts finish, check the output:
        • The 'add' script should not show errors, and should finish with 'All done' after creating Page 0006.
        • The 'index' script should show that all the Pages are indexed, i.e. it should say '1-6 Done'.

      Before this fix, the index script would end like this:

      Getting results
      Search includes:
      Done
      

      I.e. there are no successful search results for any of the pages (which would appear in between 'Search includes' and 'Done').

      Show
      Prerequisites: You must have Moodle configured to enable global search. It does not matter which search engine you use (Solr or built in simpledb), the problem impacts both in the same way. You must have access to the source code and be able to run CLI commands. Before starting, download the attached racetest2.php and save it within your Moodle code as search/cli/racetest2.php. Create a new course for testing. You can use default settings and call the course anything you like. Make a note of the id= number. Open two terminal windows ready to run CLI commands. You have to run these commands fairly quickly so it might be a good idea to set it up first. In the first terminal window, run php search/cli/racetest2.php --add --course=123 (where 123 is the course id you noted earlier). If you have run this test script before, the script will first delete the existing Page activities that were created last time. Wait for the script to output a SPECIAL KEY line. If you get a message that it's too close to the next minute, wait a few seconds then run again. In the second terminal window, run php search/cli/racetest2.php --index --course=123 (where 123 is the course id you noted earlier). This should also output the same SPECIAL KEY line. WARNING: The test warns you not to press Ctrl-C because it has changed your search area settings. If you get the wrong special key or anything else goes wrong and you want to retry, let it run to completion instead of exiting, that way it will put back the search areas. Both scripts should wait for the next minute to start, then do something every 10 seconds. The first script is creating a Page (with a 4 second delay on committing the transaction) and the second one is indexing. When both scripts finish, check the output: The 'add' script should not show errors, and should finish with 'All done' after creating Page 0006. The 'index' script should show that all the Pages are indexed, i.e. it should say '1-6 Done'. Before this fix, the index script would end like this: Getting results Search includes: Done I.e. there are no successful search results for any of the pages (which would appear in between 'Search includes' and 'Done').

      The search indexing works as follows:

      1. Index all documents up to current time
      2. Next time indexing runs, start at the previous time indexing started

      The times are inclusive, which means there should be an overlap of 1 second so in theory, all documents will be indexed, i.e. indexing will include documents from every possible second. For example, if indexing runs at 10:00:00.00 and a document is written at 10:00:00.50 (half a second later), the document will still be indexed, because the next indexing run will include documents from 10:00:00 to whenever that indexing run starts. (Some documents during that second might be indexed twice. Other than wasting time for indexing one document, it does not hurt to index the same document twice.)

      However it can miss documents in the following case:

      • A new document is written very near to the second (e.g. 09:07:09.991).
      • The timecreated for this document is that second (e.g. 09:07:09) but it will not be completely saved to the database until some time into the following second (e.g. 09:07:10.027).
      • Indexing runs earlier in the following second (e.g. 09:07:10.001).

      In this scenario, the document is never indexed because it only got saved to the database after indexing, but the time created is in the previous second so it won't be included next time.

      I wrote a complicated script to demonstrate this (attached). Here is the real output from my script corresponding to this case:

      Creating the page:

      Page 0003 09:07:09.991
                09:07:10.027
                timemodified 09:07:09
      

      Indexing (starts before page is written, but does not include page even though it has the earlier second):

      Now 09:07:10.001
      Processing area: Page
        Processed 1 records containing 1 documents, in 0 seconds.
      Now 09:07:10.196
      

      (The 'Processed' record here is not the page that was just added, but one of the previous ones.)

      This scenario is quite unlikely as the indexing for that search area has to start at the exact time of the document being created, but:

      • On a large site with indexing set to happen frequently (we run once per minute on ours) it could happen.
      • With more complicated documents than a Page activity, it will be more likely that committing it to database takes some time, i.e. the timecreated value could end up in the previous second even if you didn't start writing it 0.99 seconds in.

      To resolve this problem without any performance cost I propose changing it so that indexing does not include data from the previous N seconds (e.g. 5), and this data is instead covered in the next index run. That way if there is anything written to the database with an 'old' timecreated (up to 5 seconds old) it will still work. Realistically nothing should be later than that, or if it is, then it would be a bug in the activity creating the data. (E.g. if you calculate timecreated at the start of a transaction, then spend ages adding stuff to other tables, then write it to database. In that scenario you should work out timecreated at the end.)

      This diagram shows the proposed change, with 'now' being the time that 'Indexing 1' runs:

      In addition to the correctness improvement (for items that are added to database slightly late), this also has a slight performance advantage because we do not need to index the current second woth of documents twice, as indexing only works on documents 5 seconds in the past so there will be no new documents added with that date. Obviously this is a very small improvement; even if you run search every minute it will only save about 1/60 of the document indexing.

      Test scripts (Behat and PHPunit) will rely on indexing completing immediately, i.e. the pattern will be to create an activity then immediately update the index. So as to avoid breaking both core and third-party automated tests, the delay will not be applied in test runs, which will continue to use the 'before' strategy. I put in a hacky $CFG variable to enable the new approach during testing, which we can use for a unit test of the new behaviour.

        1. (1) 6 Passed -- (Main)MDL-83211.png
          (1) 6 Passed -- (Main)MDL-83211.png
          166 kB
        2. indexing-diagram.png
          indexing-diagram.png
          24 kB
        3. racetest.php
          9 kB
        4. racetest2.php
          9 kB
        5. search-indexing-diagram.svg
          9 kB

            quen Sam Marshall
            quen Sam Marshall
            Neill Magill Neill Magill
            Jun Pataleta Jun Pataleta
            Kim Jared Lucas Kim Jared Lucas
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

              Created:
              Updated:
              Resolved:

                Estimated:
                Original Estimate - 0 minutes
                0m
                Remaining:
                Remaining Estimate - 0 minutes
                0m
                Logged:
                Time Spent - 1 hour, 12 minutes
                1h 12m

                  Error rendering 'clockify-timesheets-time-tracking-reports:timer-sidebar'. Please contact your Jira administrators.