Moodle

Major improvements on Global search. See summary in tracker

Details

  • Type: Improvement
  • Status: Open
  • Priority: Major
  • Resolution: Unresolved
  • Affects Version/s: 1.8.5, 1.9
  • Fix Version/s: None
  • Component/s: Global search
  • Labels:
    None
  • Database:
    MySQL
  • Affected Branches:
    MOODLE_18_STABLE, MOODLE_19_STABLE

Description

A large set of improvements has been implemented and is being tested:

  • User record indexing:
    indexes three new document types
    • User description: indexes all users who have a description (could be taken further)
    • User blog posts: indexes posts using subject, abstract and content
    • User blog attachments: indexed when the physical file type is indexable
  • Assignment indexing
    indexes assignment descriptions
    tries to index assignment submissions, but there are architectural issues with multiple uploads (in progress)
  • Improved search API pluggability
    • allows detecting searchable third-party plugins, and delegates the search-related implementation to the plugin
    • Techproject split out of the core search strategy, as it is third-party. Used to test the above
  • Extensible physical file handling
    • allows adding configuration parameters to launch converters without having to modify config_global.htm
      Note: it is still necessary to code and add a physical_XXX.php handler in /search/documents
  • Enhanced indexer configuration
    Modules to be indexed can be enabled or disabled by configuration. This adds a great deal of flexibility to the indexer, and allows disabling components that cause trouble locally. (Requested by Matt Gibson in MDL-12271)
  • UTF8 fixes and strengthening
    Forces construction of UTF8-compatible Lucene instances
    Checks UTF8 back links
    Fixes a UTF8 issue in querylib.php that prevented searches with special UTF8 characters from matching

In progress:
Tests on 1.9

Question: how should I proceed with commits? I suggest committing to HEAD before code review, then waiting for feedback on stability.

Issue Links

Activity

Valery Fremaux added a comment -

Oops, how do I remove that "blocker" link, which makes no sense?

Valery Fremaux added a comment -

Other improvements I forgot:

All physical handlers have been revisited so that they can be reused to index any attachment in any module, not only resources.

The physical handling extension was tried with the Adobe Search SDK. Although it is not GPL, there would be a provision for non-standard SWF indexing, with sufficient notice given to the user. Tim William might distribute this "not so free" pack with autoview.

Martin Dougiamas added a comment - - edited

Great! Yes, please put these in HEAD so people can test (GPL-code only, other stuff might have to be separate). If it's safe, we might port back to 1.9.1.

Valery Fremaux added a comment -

All files committed to HEAD.

Note a particular setup procedure that ought to be covered in the docs:

When the list of allowed extensions is changed (by adding extra extensions), additional config keys are created to set up the system command line and an optional environment variable.

As I used neither Ajax nor JavaScript to update the form interactively, you must first save the altered extensions list, and then go back to the setup form to have the additional parameters available.

This should be the case (tested on my dev 1.8.4) for SWF handling, where the lib should be added to <%%moodleroot%%>/lib as a "swfconverters" subdirectory, and subsequently bound in the search setup screen using a command line such as "lib/swfconverters/windows/swf2html.exe" (Windows example; no env variable needed).

Note 2: as the Adobe Search libs should not be distributed with Moodle, every reference to this lib pack is given, where relevant, as http://www.adobe.com/licensing/developer/ for those who want to test. It works fine.
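
The two-step setup described above (save the extension list, then fill in the newly created parameters) could be sketched roughly as follows. This is an illustration in Python, not the actual Moodle code: the key names ("<ext>_converter_cmd", "<ext>_converter_env"), the "NAME=value" format for the environment variable and both function names are assumptions made for the example.

```python
import shlex
from typing import Dict, List, Tuple


def converter_config_keys(extensions: List[str],
                          builtin: List[str]) -> Dict[str, str]:
    """Create empty config keys for each extension lacking a built-in handler.

    Key names are hypothetical; they stand for the "command line" and
    "optional environment variable" parameters mentioned in the comment.
    """
    keys: Dict[str, str] = {}
    for ext in extensions:
        ext = ext.strip().lower()
        if not ext or ext in builtin:
            continue
        keys[f"{ext}_converter_cmd"] = ""
        keys[f"{ext}_converter_env"] = ""
    return keys


def build_command(config: Dict[str, str], ext: str,
                  path: str) -> Tuple[List[str], Dict[str, str]]:
    """Return (argv, extra_env) for converting one file of the given type."""
    cmd = config.get(f"{ext}_converter_cmd", "")
    if not cmd:
        raise KeyError(f"no converter configured for .{ext}")
    env: Dict[str, str] = {}
    env_value = config.get(f"{ext}_converter_env", "")
    if env_value:
        # Assumed format: NAME=value
        name, _, value = env_value.partition("=")
        env[name] = value
    return shlex.split(cmd) + [path], env


# Step 1: saving the extension list creates the extra keys.
config = converter_config_keys(["pdf", "swf"], builtin=["pdf"])
# Step 2: the admin returns to the form and fills in the command line.
config["swf_converter_cmd"] = "lib/swfconverters/windows/swf2html.exe"
argv, env = build_command(config, "swf", "movie.swf")
```

The resulting argv and env could then be handed to whatever mechanism launches the external converter.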

Cheers.

Valery Fremaux added a comment -

Incomplete implementation.

The query-side aspects of moving third-party modules out of the core search engine still need finishing.

I am currently simplifying some things and removing some useless constants.

HEAD will be patched with a new revision soon.

I will integrate contributed code that adds a document type icon and a course reference to each result line.

The result set needs to be reworked for searches made while not logged in.

Valery Fremaux added a comment -

Many fixes were made, including testing many missing or mismatched local indexing strategies.

A tricky problem remains that affects search query performance:

Ideally we would only check access for the results on the page being displayed. But granting or denying access changes the length of the result set itself, and thus affects the page size and the page boundaries within the list of initial results.

I am now looking for a suitable algorithm to optimize the construction of the result page, avoiding access tests on unneeded material as far as we can.

An implementation of search result caching for browsing from page to page was started by Michael Campanis, but was never completed, so it is not operational. The current version does not cache results, so it has to re-test the whole primary result list for each query. This is obviously time and power consuming.

Caching results seems to be a necessity.
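
As a rough illustration of the caching idea, the following Python sketch keeps the access-filtered result list per user and query, so that browsing to another page does not re-test the primary results. It is not the unfinished implementation mentioned above; the class name, the in-memory dictionary and the search / can_access callables are assumptions made for the example.

```python
from typing import Callable, Dict, List, Tuple


class FilteredResultCache:
    """Cache access-filtered results per (user, query). Hypothetical sketch."""

    def __init__(self,
                 search: Callable[[str], List[int]],
                 can_access: Callable[[int, int], bool]) -> None:
        self._search = search
        self._can_access = can_access
        self._cache: Dict[Tuple[int, str], List[int]] = {}
        self.access_checks = 0  # counts how many access tests were run

    def _visible(self, user_id: int, query: str) -> List[int]:
        key = (user_id, query)
        if key not in self._cache:
            visible = []
            for doc_id in self._search(query):
                self.access_checks += 1
                if self._can_access(user_id, doc_id):
                    visible.append(doc_id)
            self._cache[key] = visible
        return self._cache[key]

    def page(self, user_id: int, query: str,
             page_number: int, page_size: int) -> Tuple[List[int], int]:
        """Return (results for the page, total number of visible results)."""
        visible = self._visible(user_id, query)
        start = page_number * page_size
        return visible[start:start + page_size], len(visible)


# Illustrative data: ten primary hits, only even document ids are accessible.
cache = FilteredResultCache(lambda q: list(range(10)),
                            lambda user, doc: doc % 2 == 0)
first_page, total = cache.page(1, "moodle", 0, 2)
second_page, _ = cache.page(1, "moodle", 1, 2)
```

The first request still tests every primary result, but it also yields the effective result count, and later pages cost no further access checks. A real cache would also need expiry and invalidation when permissions change.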

Another approach I am looking into is to calculate the real offsets of the page boundaries and transmit them to the browser, so that a page is the result of searching for the next page_size valid results ahead in the primary results, wherever they are. This still does not resolve the issue of calculating the effective result set size, which defines how many pages we have.
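
The look-ahead idea can be sketched as below, in Python for illustration only. The function name next_page and the can_access callable are hypothetical; the sketch assumes the primary result list is available as a sequence and that the offset returned for one page is sent back by the browser when it requests the next one.

```python
from typing import Callable, List, Sequence, Tuple


def next_page(primary: Sequence[int],
              can_access: Callable[[int], bool],
              offset: int,
              page_size: int) -> Tuple[List[int], int]:
    """Scan primary results from offset until page_size accessible ones are found.

    Returns (page, next_offset), where next_offset is the index in the primary
    list at which the following page must start scanning.
    """
    page: List[int] = []
    i = offset
    while i < len(primary) and len(page) < page_size:
        if can_access(primary[i]):
            page.append(primary[i])
        i += 1
    return page, i


# Illustrative data: ids divisible by 3 are not accessible to this user.
primary = list(range(10))
allowed = lambda doc_id: doc_id % 3 != 0

page1, offset1 = next_page(primary, allowed, 0, 2)
page2, offset2 = next_page(primary, allowed, offset1, 2)
```

Each request only tests access on the results it actually walks over. As noted above, the total number of pages stays unknown until the whole primary list has been scanned.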

Cheers, and some headaches foreseen.

Martin Dougiamas added a comment -

Hi, it's hard for me to understand what you are doing and what is planned.

Can you please post diff patches here for all proposed fixes in 1.9?

Valery Fremaux added a comment -

Hi Martin,

Little time available, but here is an up-to-date full diff for /search

Next to come is the full diff for /blocks/search

The really nice thing would be to check what Eloy's changes in HEAD were (surely few) and to sync both code bases apart from those small changes (1.9 is, in my opinion, the best code available among the branches).

Valery Fremaux added a comment -

The other diff, as requested.

All announced features are in, although still largely untested for now.

Other upcoming features, such as MNET search, were not included, as they are at a very early stage of development (quite complicated, in fact, because it means revamping a lot of xml_rpc code...)

Cheers.

Baruch Dov Sienna added a comment -

Although 'Books' is classified as an 'activity' , functionally, it is a resource (as is the Lesson module, I might add).
As we can envision a site with heavy use of 'books', being able to search the text would be most useful.
Can you put that as a high priority on the wish list!!
Thanks.

