
Key: MDL-14646
Type: Improvement
Status: Open
Priority: Major
Assignee: Martin Dougiamas
Reporter: Valery Fremaux
Votes: 1
Watchers: 1

Moodle

Major improvements on Global search. See summary in tracker

Created: 02/May/08 05:38 AM   Updated: 25/Nov/08 12:56 AM
Component/s: Global search
Affects Version/s: 1.8.5, 1.9
Fix Version/s: None

File Attachments: 1. fulldiff_block_search_19_040908.txt (11 kB)
2. fulldiff_search_19_040908.txt (245 kB)

Issue Links:
Blockers
 
Dependency
 

Database: MySQL
Participants: Baruch Dov Sienna, Martin Dougiamas and Valery Fremaux
Security Level: None
Affected Branches: MOODLE_18_STABLE, MOODLE_19_STABLE


 Description
A large set of improvements has been implemented and is being tested:

- User records indexing:
indexes three new document types
-- User description: indexes all users who have a description (could be extended further)
-- User blog posts: indexes the posts using subject, abstract and content
-- User blog attachments: indexed depending on whether the physical file type is indexable

- Assignment indexing
indexes assignment descriptions
tries to index assignment submissions, but there are architectural issues with multiple uploads (in progress)

- Search API pluggability improved
-- allows detecting searchable third-party plugins, and delegates the search-related implementation to the plugin
-- Techproject split out of the core search strategy, as it is third-party. Used for testing the above

- Extensible physical file handling
-- allows adding configuration parameters to launch converters without having to modify config_global.htm
Note: it is still necessary to code and add a physical_XXX.php handler in /search/documents

- Enhanced indexer configuration
Allows enabling or disabling, by configuration, the modules to be indexed. This adds a great deal of flexibility to the indexer, and allows disabling components that cause trouble locally. (Requested by Matt Gibson in MDL-12271)

- UTF8 fixes and straightening
Forces construction of UTF8-compatible Lucene instances
Checks UTF8 back links
Fixes a UTF8 issue in querylib.php that prevented searches with special UTF8 characters from matching
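
To illustrate the pluggability and indexer configuration items above, here is a minimal sketch in Python (not the actual PHP implementation). All names in it are hypothetical. It assumes each module, core or third-party, exposes a handler that returns its own documents, and that a module missing from the configuration counts as enabled.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical handler type: a function returning one module's documents.
Handler = Callable[[], Iterable[str]]


def select_handlers(core: Dict[str, Handler],
                    plugins: Dict[str, Handler],
                    enabled: Dict[str, bool]) -> Dict[str, Handler]:
    """Merge core and third-party handlers, keeping only enabled modules."""
    merged = {**core, **plugins}
    # Assumption: a module absent from the configuration is enabled.
    return {name: h for name, h in merged.items() if enabled.get(name, True)}


def build_index(handlers: Dict[str, Handler]) -> List[str]:
    """Collect documents by delegating to each module's own handler."""
    docs: List[str] = []
    for name in sorted(handlers):
        docs.extend(f"{name}:{doc}" for doc in handlers[name]())
    return docs


core = {"forum": lambda: ["post 1"], "wiki": lambda: ["page A"]}
plugins = {"techproject": lambda: ["requirement 7"]}
enabled = {"wiki": False}  # disable a module that causes trouble locally

index = build_index(select_handlers(core, plugins, enabled))
```

The indexer itself knows nothing about techproject here; it only calls whatever handler the plugin registered.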

In progress:
Tests on 1.9

Question: how should I proceed with commits? I suggest committing to HEAD before code review, and waiting for feedback on stability.

 Comments
Valery Fremaux added a comment - 02/May/08 05:43 AM
Oops, how do I remove that "blocker" link, which makes no sense?

Valery Fremaux added a comment - 02/May/08 06:39 AM
Other improvements I forgot:

All physical handlers have been revisited so that they can be reused to index any attachment in any module, not only resources.

The physical handling extension was tried with the Adobe Search SDK. Although it is not GPL, there could be a provision for non-standard SWF indexing, with sufficient notice to the user. Tim William might distribute this "not so free" pack with autoview.


Martin Dougiamas added a comment - 02/May/08 02:25 PM - edited
Great! Yes, please put these in HEAD so people can test (GPL-code only, other stuff might have to be separate). If it's safe, we might port back to 1.9.1.

Valery Fremaux added a comment - 02/May/08 08:13 PM
All files committed in HEAD.

Note a particular setup procedure that ought to be described in the docs:

When the list of allowed extensions is changed (by adding extra extensions), additional config keys are created to set up the system command line and an optional environment variable.

As I did not use Ajax or JavaScript to update the form interactively, you must first save the altered extensions list, and then go back to the setup form to have the additional parameters available.

This should be the case (tested on my dev 1.8.4) for SWF handling, where the lib should be added to <%%moodleroot%%>/lib as a "swfconverters" subdirectory, and subsequently bound in the search setup screen using a command line such as "lib/swfconverters/windows/swf2html.exe" (Windows example - no env variable needed).
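
As a rough illustration of the setup described above, the following Python sketch derives extra config keys from the extension list and builds a converter invocation from them. It is not the committed PHP code: the key names (`swf_cmd`, `swf_env`), the built-in extension list and the `NAME=value` format for the environment variable are all assumptions made for the example.

```python
import os
import shlex
from typing import Dict, List, Tuple

# Assumption: extensions handled without extra configuration.
BUILTIN_EXTENSIONS = ["pdf", "txt"]


def extra_config_keys(extensions: List[str]) -> List[str]:
    """Return the hypothetical config keys created for each added extension."""
    keys: List[str] = []
    for ext in extensions:
        if ext not in BUILTIN_EXTENSIONS:
            keys += [f"{ext}_cmd", f"{ext}_env"]
    return keys


def build_invocation(ext: str, config: Dict[str, str],
                     path: str) -> Tuple[List[str], Dict[str, str]]:
    """Build the converter command line and environment for one file."""
    command = shlex.split(config[f"{ext}_cmd"]) + [path]
    env = dict(os.environ)
    setting = config.get(f"{ext}_env", "")
    if setting:  # optional "NAME=value" environment variable (assumed format)
        name, _, value = setting.partition("=")
        env[name] = value
    return command, env


keys = extra_config_keys(["pdf", "txt", "swf"])
config = {"swf_cmd": "lib/swfconverters/windows/swf2html.exe", "swf_env": ""}
command, env = build_invocation("swf", config, "movie.swf")
```

The two-step save described above corresponds to calling `extra_config_keys` only after the new extension list has been stored.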

Note 2: as the Adobe Search libs should not be distributed with Moodle, every reference to this lib pack is given, where relevant, as http://www.adobe.com/licensing/developer/ for those who want to test. Works fine.

Cheers.


Valery Fremaux added a comment - 24/May/08 05:16 AM
Incomplete implementation.

The query-side aspects of moving third-party modules out of the core search engine still need finishing.

I am actually making some things simpler, throwing out some useless constants.

HEAD will be patched with a new revision soon.

I will integrate a contributed patch that adds a document type icon and a course reference to each result line.

The result set needs to be reworked for searches made while not logged in.


Valery Fremaux added a comment - 24/May/08 08:02 AM
Many fixes were achieved, including testing many missing or mismatched local indexing strategies.

A tricky problem remains that affects search query performance:

Ideally, we would only check access for the results on the requested page. But granting or denying access changes the length of the result set itself, and thus affects page size and boundaries in the list of initial results.

I am now looking for a suitable algorithm to optimize result page construction, avoiding access tests on unneeded material as far as we can.

An implementation of search result caching for browsing from page to page was started by Michael Campanis, but it was never completed, so it is not operational. The current version does not cache results, so it has to re-test the whole primary result list for each query. This is obviously time- and power-consuming.

Caching results seems to be a necessity.
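
A minimal Python sketch of the caching idea, under stated assumptions: `search` and `can_access` are hypothetical stand-ins for the Lucene query and the per-document access test, and the cache is keyed by user and query. It is not Michael Campanis's implementation.

```python
from typing import Callable, Dict, List, Tuple


class ResultCache:
    """Cache the access-filtered result list per (user, query)."""

    def __init__(self, search: Callable[[str], List[int]],
                 can_access: Callable[[str, int], bool]) -> None:
        self._search = search
        self._can_access = can_access
        self._cache: Dict[Tuple[str, str], List[int]] = {}

    def page(self, user: str, query: str, page_no: int,
             page_size: int) -> Tuple[List[int], int]:
        """Return one page of accessible hits and the total hit count."""
        key = (user, query)
        if key not in self._cache:
            # Access is tested once for the whole primary list, then reused.
            self._cache[key] = [doc for doc in self._search(query)
                                if self._can_access(user, doc)]
        hits = self._cache[key]
        start = page_no * page_size
        return hits[start:start + page_size], len(hits)


checks: List[int] = []


def can_access(user: str, doc: int) -> bool:
    checks.append(doc)      # record every access test performed
    return doc % 2 == 0     # toy rule: even document ids are visible


cache = ResultCache(lambda query: list(range(10)), can_access)
first, total = cache.page("alice", "moodle", 0, 2)
second, _ = cache.page("alice", "moodle", 1, 2)
```

The second page is served without any new access tests. A real cache would also need an expiry rule, since permissions can change between two page views.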

Another approach I am looking at is to calculate and transmit to the browser the real offsets of page boundaries, so that a page is the result of searching for the next page_size valid results ahead in the primary results, wherever they are. This still does not resolve the issue of calculating the effective result set size, which defines how many pages we have.
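
The offset approach can be sketched as follows in Python. The function name and the `can_access` callback are hypothetical; `primary` stands for the unfiltered result list.

```python
from typing import Callable, List, Tuple


def next_page(primary: List[int], can_access: Callable[[int], bool],
              start: int, page_size: int) -> Tuple[List[int], int]:
    """Scan primary results from `start` until page_size valid hits are found.

    Returns the page and the real offset where the next page begins.
    """
    page: List[int] = []
    pos = start
    while pos < len(primary) and len(page) < page_size:
        if can_access(primary[pos]):
            page.append(primary[pos])
        pos += 1
    return page, pos


primary = list(range(10))


def visible(doc: int) -> bool:
    return doc % 2 == 0  # toy rule: even document ids are visible


page1, offset1 = next_page(primary, visible, 0, 2)
page2, offset2 = next_page(primary, visible, offset1, 2)
page3, offset3 = next_page(primary, visible, offset2, 2)
```

Each request tests access only on the material it scans, and the returned offset is what would be sent to the browser for the "next" link. As noted above, the total number of pages stays unknown until the whole list has been scanned.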

Cheers, and some headaches foreseen.


Martin Dougiamas added a comment - 04/Aug/08 10:51 AM
Hi, it's hard for me to understand what you are doing and what is planned.

Can you please post diff patches here for all proposed fixes in 1.9?


Valery Fremaux added a comment - 05/Sep/08 06:23 AM
Hi Martin,

Little time available, but here is an up-to-date full diff for /search

Next to come is the full diff for /blocks/search

The really nice thing would be to check what Eloy's changes in HEAD were (surely few) and have both code bases synced except for those small changes (1.9 is IMO the best code available among the branches).


Valery Fremaux added a comment - 05/Sep/08 06:26 AM
The other diff, as requested.

All announced features are in, although they are still largely untested for now.

Other upcoming features, such as MNET search, were not put in, as they are at a very early stage of development (quite complicated, in fact, because it means revamping a lot of xml_rpc code...)

Cheers.


Baruch Dov Sienna added a comment - 25/Nov/08 12:56 AM
Although 'Books' is classified as an 'activity', functionally it is a resource (as is the Lesson module, I might add).
As we can envision a site with heavy use of 'books', being able to search the text would be most useful.
Can you put that as a high priority on the wish list?
Thanks.