MDL-60720: core_search: Indexing halts on failed get_document


    • MOODLE_32_STABLE, MOODLE_33_STABLE
    • MOODLE_32_STABLE, MOODLE_33_STABLE
    • MDL-60720-master

      You will need a Moodle with working global search (Apache Solr) setup, and direct access to the database.

      1. Go to the Site administration / Server / Scheduled tasks page.
      2. Click Run now against the Global search indexing task. Confirm the prompt.
      3. Check the output log says No new documents to index against all areas; if it doesn't, repeat until it does.
      4. On a test course, add a new Page called Broken page and enter text in Page content that includes the word morphleclunk.
      5. Make a note of the cmid of the new page (the number after view.php?id=).
      6. Sabotage the database entry for this Page using the following SQL (replace mdl_ with your Moodle database prefix, and ? with the cmid you noted in the previous step):
        • UPDATE mdl_course_modules SET instance=0 WHERE id=?
      7. To confirm you have successfully broken the first page, in the web browser, edit settings for the course (you don't need to actually make any changes), then save changes.
        • You should see that the 'Broken page' is no longer visible.
      8. Add a second new Page called Working page with page content that also includes the word morphleclunk.
      9. Repeat steps 1 and 2 to re-run search indexing.
        • EXPECTED: Under Processing area: Page, you should see one debugging message starting with Error retrieving mod_page-activity
          • BEFORE FIX: Once the halting problem alone was fixed, this debugging message appeared twice, indicating that get_document was called twice.
        • EXPECTED: Under Processing area: Page, you should see that it processed 1 records containing 1 documents.
          • BEFORE FIX: Previously it showed No new documents to index.
      10. Using the global search icon in the header, search for morphleclunk.
        • EXPECTED: There should be one result titled Working page.
          • BEFORE FIX: There were no results because the new working page wasn't indexed.

      Due to a bug in skip_future_documents_iterator, if any search area returns false from get_document, indexing halts at that point.

      This is a serious bug because it prevents all indexing from that point on. For example, in our test data we have a Page from 2011 that is failing (get_document returns false for it); in our system, no pages from 2011 onward are being indexed.

      Worse, the indexing run is not recorded as 'partial': the system thinks it has completed the index even though it stopped in 2011. It may therefore continue from the current time rather than from where it actually stopped. New pages created now, in 2017, will be indexed, but a large gap will remain from 2011 to 2017.
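The halting behaviour described above can be modelled with a short sketch. This is a Python illustration, not the Moodle PHP code. The record layout, the `broken` flag, and the function names `index_halting` and `index_skipping` are all hypothetical, and `None` stands in for PHP's `false`. It also assumes that records arrive ordered by modification time, so a document newer than the cutoff legitimately ends the run.

```python
def get_document(record):
    """Hypothetical stand-in for a search area's get_document.

    Returns None (modelling PHP false) when the record cannot be
    turned into a document, e.g. because its database row is broken.
    """
    if record.get("broken"):
        return None
    return {"id": record["id"], "modified": record["modified"]}


def index_halting(records, cutoff):
    """Models the bug: a failed document is treated like the end of the data."""
    indexed = []
    for record in records:
        doc = get_document(record)
        if doc is None or doc["modified"] > cutoff:
            break  # a failed document wrongly stops everything after it
        indexed.append(doc["id"])
    return indexed


def index_skipping(records, cutoff):
    """Models the intended behaviour: skip failures, stop only at future documents."""
    indexed = []
    for record in records:
        doc = get_document(record)
        if doc is None:
            continue  # failed document: skip it and carry on
        if doc["modified"] > cutoff:
            break  # future document: stop (records assumed ordered by time)
        indexed.append(doc["id"])
    return indexed


records = [
    {"id": 1, "modified": 2010},
    {"id": 2, "modified": 2011, "broken": True},
    {"id": 3, "modified": 2014},
    {"id": 4, "modified": 2017},
]
print(index_halting(records, cutoff=2017))
print(index_skipping(records, cutoff=2017))
```

In this model the halting version indexes only the record before the broken one, which mirrors the 2011 example: everything after the failing Page is silently left out.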

      In addition to this serious bug, there is a less serious problem with the iterator: because it repeatedly calls the parent iterator's current() function, each document results in two calls to the search system's get_document(), which in some cases is a slow function involving multiple database queries.
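The double call can be illustrated with a second sketch, again a Python model of a PHP-style iterator (valid / current / next) and not the actual Moodle classes. `ParentIterator`, `UncachedWrapper`, `CachingWrapper`, and `drain` are hypothetical names. The counter stands in for the expensive get_document() work that the parent's current() is assumed to perform on every call. Caching the value in the wrapper is one possible way to avoid the repeat, not necessarily the approach taken in the fix.

```python
class ParentIterator:
    """Models an iterator whose current() rebuilds the document on every call."""

    def __init__(self, records):
        self.records = list(records)
        self.pos = 0
        self.get_document_calls = 0  # counts the simulated expensive calls

    def valid(self):
        return self.pos < len(self.records)

    def current(self):
        self.get_document_calls += 1  # pretend this runs several DB queries
        return self.records[self.pos]

    def next(self):
        self.pos += 1


class UncachedWrapper:
    """Calls parent.current() in both valid() and current()."""

    def __init__(self, parent, cutoff):
        self.parent = parent
        self.cutoff = cutoff

    def valid(self):
        return self.parent.valid() and self.parent.current()["modified"] <= self.cutoff

    def current(self):
        return self.parent.current()

    def next(self):
        self.parent.next()


class CachingWrapper:
    """Fetches the parent's current value once per position and reuses it."""

    def __init__(self, parent, cutoff):
        self.parent = parent
        self.cutoff = cutoff
        self._cached = None
        self._has_cache = False

    def _fetch(self):
        if not self._has_cache:
            self._cached = self.parent.current()
            self._has_cache = True
        return self._cached

    def valid(self):
        return self.parent.valid() and self._fetch()["modified"] <= self.cutoff

    def current(self):
        return self._fetch()

    def next(self):
        self.parent.next()
        self._has_cache = False  # new position, so the cached value is stale


def drain(iterator):
    """Consumes an iterator the way a PHP foreach loop would."""
    items = []
    while iterator.valid():
        items.append(iterator.current())
        iterator.next()
    return items


docs = [{"modified": 1}, {"modified": 2}, {"modified": 3}]
uncached_parent = ParentIterator(docs)
drain(UncachedWrapper(uncached_parent, cutoff=10))
cached_parent = ParentIterator(docs)
drain(CachingWrapper(cached_parent, cutoff=10))
print(uncached_parent.get_document_calls, cached_parent.get_document_calls)
```

In this model the uncached wrapper triggers two simulated get_document() calls per document and the caching wrapper triggers one, while both yield the same documents.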

            quen Sam Marshall
            quen Sam Marshall
            Damyon Wiese
            David Monllaó
            David Monllaó