
Key: MDL-14646
Type: Improvement
Status: Open
Priority: Major
Assignee: Martin Dougiamas
Reporter: Valery Fremaux
Votes: 1
Watchers: 1
Moodle

Major improvements on Global search. See summary in tracker

Created: 02/May/08 05:38 AM   Updated: 25/Nov/08 12:56 AM
Component/s: Global search
Affects Version/s: 1.8.5, 1.9
Fix Version/s: None

File Attachments: 1. Text File fulldiff_block_search_19_040908.txt (11 kB)
2. Text File fulldiff_search_19_040908.txt (245 kB)

Issue Links:
Blockers
 
Dependency
 

Database: MySQL
Participants: Baruch Dov Sienna, Martin Dougiamas and Valery Fremaux
Security Level: None
Affected Branches: MOODLE_18_STABLE, MOODLE_19_STABLE


Description
A large set of improvements has been implemented and is being tested:

- User record indexing:
indexes three new document types
-- User description: indexes all users who have a description (could be extended)
-- User blog posts: indexes posts by subject, abstract and content
-- User blog attachments: indexed when the physical file type is indexable

- Assignment indexing
indexes assignment descriptions
tries to index assignment submissions, but multiple uploads raise architectural issues (in progress)

- Improved search API pluggability
-- detects searchable third-party plugins and delegates the search-related implementation to the plugin
-- Techproject split out of the core search strategy, as it is third-party; used to test the above

- Extensible physical file handling
-- allows adding configuration parameters to launch converters without having to modify config_global.html
Note: it is still necessary to write a physical_XXX.php handler and add it to /search/documents

- Enhanced indexer configuration
Allows modules to be enabled or disabled for indexing through configuration. This makes the indexer much more flexible and lets a site disable components that cause trouble locally. (Requested by Matt Gibson in MDL-12271)

- UTF-8 fixes and straightening
Forces construction of UTF-8-compatible Lucene instances
Checks UTF-8 back links
Fixes a UTF-8 issue in querylib.php that prevented searches containing special UTF-8 characters from matching
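
To illustrate the kind of problem the UTF-8 items address, here is a minimal Python sketch. It is not the querylib.php fix; it assumes, purely for illustration, that index terms are lowercased in a Unicode-aware way while the query is lowercased byte by byte. The function names are hypothetical.

```python
def ascii_lower(data: bytes) -> bytes:
    """Byte-wise lowercasing: only ASCII letters A-Z are changed."""
    return data.lower()


def utf8_lower(data: bytes) -> bytes:
    """Unicode-aware lowercasing: decode, lowercase, re-encode as UTF-8."""
    return data.decode("utf-8").lower().encode("utf-8")


# Assumed scenario: the index stores terms normalized with utf8_lower.
indexed_term = utf8_lower("École".encode("utf-8"))

query = "École".encode("utf-8")

# The byte-wise normalization leaves the accented capital untouched,
# so the normalized query no longer equals the indexed term.
naive_match = ascii_lower(query) == indexed_term
aware_match = utf8_lower(query) == indexed_term
```

With both sides normalized the same way, a query containing accented characters can match the indexed term again.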

In progress :
Tests on 1.9

Question: how should I proceed with commits? I suggest committing to HEAD before code review, then waiting for feedback on stability.

Valery Fremaux made changes - 02/May/08 05:39 AM
Field Original Value New Value
Link This issue will help resolve MDL-12271 [ MDL-12271 ]
Valery Fremaux made changes - 02/May/08 05:40 AM
Link This issue will help resolve MDL-12306 [ MDL-12306 ]
Valery Fremaux made changes - 02/May/08 05:40 AM
Link This issue blocks MDL-12324 [ MDL-12324 ]
Valery Fremaux made changes - 02/May/08 05:42 AM
Link This issue will help resolve MDL-12324 [ MDL-12324 ]
Valery Fremaux added a comment - 02/May/08 05:43 AM
Oops, how do I remove that "blocker" link? It makes no sense.

Valery Fremaux added a comment - 02/May/08 06:39 AM
Other improvements I forgot :

All physical handlers have been reworked so they can be reused to index any attachment in any module, not only resources.

Physical handling extension was tried with the Adobe Search SDK. Although it is not GPL, there could be provision for non-standard SWF indexing, with sufficient notice to the user. Tim William might distribute this "not so free" pack with autoview.


Martin Dougiamas added a comment - 02/May/08 02:25 PM - edited
Great! Yes, please put these in HEAD so people can test (GPL-code only, other stuff might have to be separate). If it's safe, we might port back to 1.9.1.

diml committed 122 files to 'Moodle CVS' - 02/May/08 07:58 PM
Committing all changes reported in MDL-14646
MODIFY search/documents/glossary_document.php   Rev. 1.12    (+1 -1 lines)
MODIFY search/Zend/Search/Lucene/Analysis/Analyzer/Common/Text.php   Rev. 1.4    (+3 -1 lines)
MODIFY search/Zend/Search/Lucene/Index/SegmentInfoPriorityQueue.php   Rev. 1.3    (+0 -0 lines)
MODIFY search/Zend/Search/Lucene/Analysis/Token.php   Rev. 1.4    (+0 -0 lines)
MODIFY search/LISEZMOI.txt   Rev. 1.4    (+40 -40 lines)
MODIFY search/Zend/Search/Lucene/Analysis/TokenFilter/ShortWords.php   Rev. 1.3    (+0 -0 lines)
MODIFY search/Zend/Search/Lucene/Storage/Directory/Filesystem.php   Rev. 1.4    (+0 -0 lines)
MODIFY search/Zend/Search/Lucene/Analysis/Analyzer/Common/Utf8Num/CaseInsensitive.php   Rev. 1.2    (+0 -0 lines)
MODIFY search/Zend/Search/Lucene/Analysis/Analyzer/Common/Utf8.php   Rev. 1.4    (+0 -0 lines)
MODIFY search/Zend/Search/Lucene/Analysis/Analyzer/Common/Utf8.php   Rev. 1.3    (+1 -1 lines)
ADD search/searchtypes.php   Rev. 1.1    (+0 -0 lines)
MODIFY search/Zend/Search/Lucene/Index/DictionaryLoader.php   Rev. 1.3    (+0 -0 lines)
MODIFY search/Zend/Search/Lucene/Search/QueryLexer.php   Rev. 1.3    (+0 -0 lines)
MODIFY search/Zend/Search/Lucene/Index/SegmentWriter/StreamWriter.php   Rev. 1.3    (+0 -0 lines)
MODIFY search/Zend/Search/Lucene/Search/Query/Wildcard.php   Rev. 1.2    (+0 -0 lines)
MODIFY search/Zend/Search/Lucene/Index/Term.php   Rev. 1.4    (+0 -0 lines)
MODIFY search/Zend/Search/Lucene/Analysis/Analyzer/Common.php   Rev. 1.4    (+0 -0 lines)
ADD search/documents/user_document.php   Rev. 1.1    (+0 -0 lines)
MODIFY search/Zend/Search/Lucene/Analysis/Analyzer.php   Rev. 1.6    (+0 -0 lines)
MODIFY blocks/search/config_global.html   Rev. 1.6    (+169 -13 lines)
MODIFY search/Zend/Search/Lucene/Analysis/TokenFilter/LowerCase.php   Rev. 1.4    (+0 -0 lines)
MODIFY search/Zend/Search/Lucene/Search/Weight.php   Rev. 1.4    (+0 -0 lines)
MODIFY search/Zend/Search/Lucene/Search/QueryToken.php   Rev. 1.4    (+0 -0 lines)
MODIFY search/Zend/Search/Lucene.php   Rev. 1.5    (+0 -0 lines)
MODIFY search/Zend/Search/Lucene/Search/Weight/Empty.php   Rev. 1.3    (+0 -0 lines)
MODIFY search/Zend/Search/Lucene/Analysis/TokenFilter.php   Rev. 1.4    (+0 -0 lines)
MODIFY search/Zend/Search/Lucene/Document.php   Rev. 1.5    (+0 -0 lines)
MODIFY search/Zend/Exception.php   Rev. 1.3    (+0 -0 lines)
MODIFY search/Zend/Search/Lucene/Index/TermInfo.php   Rev. 1.4    (+0 -0 lines)
MODIFY search/README.txt   Rev. 1.11    (+85 -153 lines)
MODIFY search/delete.php   Rev. 1.12    (+7 -0 lines)
MODIFY search/Zend/Search/Lucene/Search/Weight/Boolean.php   Rev. 1.3    (+0 -0 lines)
MODIFY search/Zend/Search/Lucene/Search/Query/Empty.php   Rev. 1.3    (+0 -0 lines)
MODIFY search/Zend/Search/Lucene/FSM.php   Rev. 1.3    (+0 -0 lines)
MODIFY search/Zend/Search/Lucene/Analysis/Analyzer/Common/Text/CaseInsensitive.php   Rev. 1.4    (+0 -0 lines)
MODIFY search/tests/index.php   Rev. 1.11    (+123 -75 lines)
MODIFY search/documents/physical_html.php   Rev. 1.3    (+2 -2 lines)
MODIFY search/Zend/Search/Lucene/Search/QueryEntry/Subquery.php   Rev. 1.3    (+0 -0 lines)
MODIFY search/Zend/Search/Lucene/Analysis/Analyzer/Common/Utf8Num.php   Rev. 1.3    (+0 -0 lines)
MODIFY search/Zend/Search/Lucene/Analysis/TokenFilter/LowerCaseUtf8.php   Rev. 1.2    (+3 -2 lines)
MODIFY search/Zend/Search/Lucene/Search/Query/Boolean.php   Rev. 1.3    (+0 -0 lines)
MODIFY search/Zend/Search/Lucene/Exception.php   Rev. 1.5    (+0 -0 lines)
MODIFY search/Zend/Search/Lucene/Search/QueryEntry/Phrase.php   Rev. 1.3    (+0 -0 lines)
MODIFY search/Zend/Search/Lucene/Exception.php   Rev. 1.4    (+1 -1 lines)
MODIFY search/Zend/Search/Lucene/Search/Query/Term.php   Rev. 1.5    (+0 -0 lines)
MODIFY blocks/search/block_search.php   Rev. 1.15    (+1 -1 lines)
MODIFY search/Zend/Search/Lucene/Search/QueryParserContext.php   Rev. 1.3    (+0 -0 lines)
MODIFY search/Zend/Search/Exception.php   Rev. 1.5    (+0 -0 lines)
MODIFY search/Zend/Search/Lucene/LockManager.php   Rev. 1.2    (+0 -0 lines)
MODIFY search/Zend/Search/Lucene/Index/Writer.php   Rev. 1.4    (+0 -0 lines)
MODIFY search/Zend/Search/Lucene/Search/QueryEntry/Term.php   Rev. 1.3    (+0 -0 lines)
MODIFY search/Zend/Search/Lucene/Search/Query/MultiTerm.php   Rev. 1.4    (+0 -0 lines)
MODIFY search/query.php   Rev. 1.24    (+107 -83 lines)
MODIFY search/indexersplash.php   Rev. 1.16    (+13 -1 lines)
MODIFY search/Zend/Search/Lucene/Search/BooleanExpressionRecognizer.php   Rev. 1.3    (+0 -0 lines)
MODIFY search/Zend/Search/Lucene/Analysis/Analyzer/Common/Text.php   Rev. 1.5    (+0 -0 lines)
MODIFY search/Zend/Search/Lucene/Search/QueryParser.php   Rev. 1.4    (+15 -15 lines)
MODIFY search/Zend/Search/Lucene/Field.php   Rev. 1.5    (+0 -0 lines)
MODIFY search/querylib.php   Rev. 1.10    (+1 -1 lines)
MODIFY search/indexer.php   Rev. 1.21    (+45 -23 lines)
MODIFY search/Zend/Search/Lucene/Search/Weight/Phrase.php   Rev. 1.4    (+0 -0 lines)
MODIFY search/Zend/Search/Lucene/PriorityQueue.php   Rev. 1.3    (+0 -0 lines)
ADD search/README_ARCHIVE.txt   Rev. 1.1    (+0 -0 lines)
MODIFY search/Zend/Search/Lucene/Index/SegmentWriter.php   Rev. 1.4    (+0 -0 lines)
MODIFY search/Zend/Search/Lucene/Interface.php   Rev. 1.3    (+0 -0 lines)
MODIFY search/stats.php   Rev. 1.17    (+13 -3 lines)
MODIFY search/documents/resource_document.php   Rev. 1.16    (+2 -1 lines)
MODIFY search/documents/chat_document.php   Rev. 1.8    (+1 -1 lines)
MODIFY search/Zend/Search/Lucene/Index/FieldInfo.php   Rev. 1.4    (+0 -0 lines)
MODIFY search/Zend/Search/Lucene/Index/SegmentInfo.php   Rev. 1.5    (+0 -0 lines)
MODIFY search/Zend/Search/Lucene/Index/SegmentWriter/DocumentWriter.php   Rev. 1.3    (+0 -0 lines)
ADD search/documents/assignment_document.php   Rev. 1.1    (+0 -0 lines)
MODIFY search/Zend/LICENSE.txt   Rev. 1.3    (+0 -0 lines)
MODIFY search/Zend/Search/Lucene/Search/Query/Fuzzy.php   Rev. 1.2    (+0 -0 lines)
MODIFY search/Zend/Search/Lucene/Search/Query/Range.php   Rev. 1.2    (+0 -0 lines)
MODIFY search/cron.php   Rev. 1.14    (+5 -1 lines)
MODIFY search/Zend/Search/Lucene/Search/Similarity.php   Rev. 1.4    (+0 -0 lines)
MODIFY search/Zend/Search/Lucene/Storage/File/Memory.php   Rev. 1.3    (+0 -0 lines)
MODIFY search/documents/forum_document.php   Rev. 1.14    (+2 -1 lines)
ADD search/documents/physical_swf.php   Rev. 1.1    (+0 -0 lines)
MODIFY search/Zend/Search/Lucene/Search/Query.php   Rev. 1.4    (+0 -0 lines)
MODIFY search/Zend/IMPORTANT.txt   Rev. 1.5    (+0 -0 lines)
MODIFY search/Zend/Search/Lucene.php   Rev. 1.4    (+17 -18 lines)
MODIFY search/Zend/Search/Lucene/Search/Weight/Term.php   Rev. 1.4    (+0 -0 lines)
MODIFY search/Zend/Search/Lucene/Analysis/Analyzer.php   Rev. 1.5    (+10 -10 lines)
MODIFY search/Zend/Search/Lucene/FSMAction.php   Rev. 1.3    (+0 -0 lines)
MODIFY search/Zend/Search/Lucene/Search/Query/Term.php   Rev. 1.4    (+2 -2 lines)
MODIFY search/Zend/Search/Lucene/Document/Html.php   Rev. 1.3    (+0 -0 lines)
DEL search/documents/Attic/techproject_document.php   Rev. 1.8    (+0 -0 lines)
MODIFY search/Zend/Search/Lucene/Document.php   Rev. 1.4    (+1 -1 lines)
MODIFY search/Zend/Search/Exception.php   Rev. 1.4    (+1 -1 lines)
MODIFY search/Zend/Search/Lucene/Analysis/Analyzer/Common/Utf8/CaseInsensitive.php   Rev. 1.2    (+0 -0 lines)
MODIFY search/lib.php   Rev. 1.17    (+92 -18 lines)
MODIFY search/Zend/Search/Lucene/Proxy.php   Rev. 1.3    (+0 -0 lines)
MODIFY search/Zend/Search/Lucene/Search/QueryHit.php   Rev. 1.4    (+0 -0 lines)
MODIFY search/documents/physical_doc.php   Rev. 1.6    (+10 -9 lines)
MODIFY search/Zend/Search/Lucene/Storage/File/Filesystem.php   Rev. 1.4    (+0 -0 lines)
MODIFY search/documents/physical_pdf.php   Rev. 1.8    (+10 -9 lines)
MODIFY search/documents/physical_txt.php   Rev. 1.3    (+7 -2 lines)
MODIFY search/Zend/Search/Lucene/Search/Weight/MultiTerm.php   Rev. 1.4    (+0 -0 lines)
MODIFY search/Zend/Search/Lucene/Analysis/TokenFilter/StopWords.php   Rev. 1.3    (+0 -0 lines)
MODIFY search/Zend/Search/Lucene/Search/Query/Insignificant.php   Rev. 1.2    (+0 -0 lines)
MODIFY search/documents/lesson_document.php   Rev. 1.7    (+1 -1 lines)
MODIFY search/Zend/Search/Lucene/Search/Similarity/Default.php   Rev. 1.4    (+0 -0 lines)
MODIFY search/Zend/Search/Lucene/Analysis/TokenFilter/LowerCaseUtf8.php   Rev. 1.3    (+0 -0 lines)
MODIFY search/documents/physical_ppt.php   Rev. 1.4    (+6 -2 lines)
DEL search/Attic/READMETOO.txt   Rev. 1.3    (+0 -0 lines)
MODIFY search/Zend/Search/Lucene/Search/QueryEntry.php   Rev. 1.3    (+0 -0 lines)
MODIFY search/Zend/Search/Lucene/Search/QueryParserException.php   Rev. 1.3    (+0 -0 lines)
MODIFY search/Zend/Search/Lucene/Analysis/Analyzer/Common/TextNum.php   Rev. 1.3    (+0 -0 lines)
MODIFY search/documents/physical_xml.php   Rev. 1.3    (+6 -2 lines)
MODIFY search/documents/data_document.php   Rev. 1.8    (+1 -1 lines)
MODIFY search/Zend/Search/Lucene/Storage/File.php   Rev. 1.5    (+0 -0 lines)
MODIFY search/Zend/Search/Lucene/Index/SegmentMerger.php   Rev. 1.3    (+0 -0 lines)
MODIFY search/Zend/Search/TODO.txt   Rev. 1.3    (+0 -0 lines)
MODIFY search/Zend/Search/Lucene/Search/Query/Phrase.php   Rev. 1.4    (+0 -0 lines)
MODIFY search/Zend/Search/Lucene/Storage/Directory.php   Rev. 1.4    (+0 -0 lines)
MODIFY search/Zend/Search/Lucene/Search/QueryParser.php   Rev. 1.5    (+0 -0 lines)
MODIFY search/documents/physical_htm.php   Rev. 1.7    (+7 -3 lines)
MODIFY search/Zend/Search/Lucene/Analysis/Analyzer/Common/TextNum/CaseInsensitive.php   Rev. 1.3    (+0 -0 lines)
MODIFY search/update.php   Rev. 1.11    (+10 -1 lines)
MODIFY search/add.php   Rev. 1.11    (+9 -2 lines)
diml committed 3 files to 'Moodle CVS' - 02/May/08 08:03 PM
Committing all changes reported in MDL-14646
MODIFY lang/en_utf8/block_search.php   Rev. 1.3    (+10 -0 lines)
MODIFY blocks/search/db/install.xml   Rev. 1.7    (+2 -2 lines)
MODIFY blocks/search/db/upgrade.php   Rev. 1.7    (+2 -57 lines)
Valery Fremaux added a comment - 02/May/08 08:13 PM
All files committed to HEAD.

Note one particular setup procedure that ought to be documented:

When the list of allowed extensions is changed (by adding extra extensions), additional config keys are created for setting the system command line and an optional environment variable.

As I did not use Ajax or JavaScript to update the form interactively, you must first save the altered extensions list and then go back to the setup form to see the additional parameters.

This should be the case (tested on my dev 1.8.4) for SWF handling, where the lib should be added to <%%moodleroot%%>/lib as a "swfconverters" subdirectory and then bound in the search setup screen using a command line such as "lib/swfconverters/windows/swf2html.exe" (Windows example; no env variable needed).
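
The setup flow above can be pictured with a small Python sketch. It is only an illustration of the idea, not the block's actual code: the key names (`<ext>_cmd`, `<ext>_env`) and the helper functions are hypothetical, and only the physical_XXX.php handler naming comes from this issue.

```python
from typing import Dict, List, Set


def parse_extensions(raw: str) -> List[str]:
    """Split a comma-separated extension list and normalize each entry."""
    return [e.strip().lower() for e in raw.split(",") if e.strip()]


def extra_config_keys(allowed: str, builtin: Set[str]) -> Dict[str, Dict[str, str]]:
    """For each extension not handled out of the box, derive the settings
    an admin would have to fill in after saving the extension list."""
    keys: Dict[str, Dict[str, str]] = {}
    for ext in parse_extensions(allowed):
        if ext in builtin:
            continue
        keys[ext] = {
            "command": f"{ext}_cmd",  # hypothetical key: converter command line
            "env": f"{ext}_env",      # hypothetical key: optional environment variable
            "handler": f"search/documents/physical_{ext}.php",
        }
    return keys


# Saving "pdf, swf" when only pdf is assumed to be built in
# yields one extra group of settings, for swf.
swf_settings = extra_config_keys("pdf, swf", {"pdf"})
```

Because the extra keys are derived from the saved list, they can only appear on the setup form once the list has been saved, which is why the two-step procedure is needed.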

Note 2: as the Adobe Search libs should not be distributed with Moodle, references to this lib pack are given where relevant as http://www.adobe.com/licensing/developer/ for those who want to test. It works fine.

Cheers.


diml committed 1 file to 'Moodle CVS' - 02/May/08 08:22 PM
Committing all changes reported in MDL-14646
MODIFY search/documents/user_document.php   Rev. 1.2    (+1 -1 lines)
diml committed 2 files to 'Moodle CVS' - 02/May/08 11:23 PM
Committing all changes reported in MDL-14646 - fixes stat report that could not show any third-party related result.
MODIFY search/indexlib.php   Rev. 1.11    (+2 -1 lines)
MODIFY search/lib.php   Rev. 1.18    (+11 -7 lines)
Mitsuhiro Yoshida committed 2 files to 'Lang CVS' - 03/May/08 02:19 PM
Translated new strings for block search MDL-14646.
MODIFY ja_utf8/README   Rev. 1.699    (+1 -1 lines)
MODIFY ja_utf8/block_search.php   Rev. 1.3    (+11 -1 lines)
martignoni committed 1 file to 'Lang CVS' - 04/May/08 11:22 PM
New strings for MDL-14646
MODIFY fr_utf8/block_search.php   Rev. 1.5    (+11 -1 lines)
Valery Fremaux added a comment - 24/May/08 05:16 AM
Incomplete implementation.

The query-side work of moving third-party modules out of the core search engine still needs finishing.

I am currently simplifying some things and removing some useless constants.

HEAD will be patched with a new revision soon.

I will integrate contributed code that adds a document type icon and a course reference to each result line.

The result set needs to be reworked for searches made while not logged in.


Valery Fremaux added a comment - 24/May/08 08:02 AM
Many fixes were made, including testing many missing or mismatched local indexing strategies.

A tricky problem remains that affects search query performance:

Ideally we would only check access for the hits on the current result page. But granting or denying access changes the length of the result set itself, and thus affects page size and boundaries in the initial result list.

I am now looking for a suitable algorithm to optimize result page construction, avoiding access tests on unneeded material as far as possible.

Michael Campanis started an implementation of search result caching for browsing from page to page, but it was not completed and so is not operational. The current version does not cache results, so it has to re-test the whole primary result list on every query. This is obviously time- and CPU-consuming.

Caching results seems to be a necessity.
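
A minimal Python sketch of the caching idea, for discussion only. It is not the unfinished implementation mentioned above; the class, the `search` and `can_view` callables, and the in-memory dictionary are all assumptions, and cache expiry (for example when permissions or the index change) is deliberately left out.

```python
from typing import Callable, Dict, List, Tuple


class ResultCache:
    """Cache the access-filtered hit list per (user, query) so that access
    checks run once per query rather than once per page view."""

    def __init__(self,
                 search: Callable[[str], List[str]],
                 can_view: Callable[[int, str], bool]) -> None:
        self._search = search
        self._can_view = can_view
        self._cache: Dict[Tuple[int, str], List[str]] = {}
        self.checks = 0  # number of access checks performed so far

    def _visible(self, user_id: int, query: str) -> List[str]:
        key = (user_id, query)
        if key not in self._cache:
            visible = []
            for hit in self._search(query):
                self.checks += 1
                if self._can_view(user_id, hit):
                    visible.append(hit)
            self._cache[key] = visible
        return self._cache[key]

    def page(self, user_id: int, query: str,
             page_no: int, page_size: int) -> Tuple[List[str], int]:
        """Return one page of visible hits plus the total visible count."""
        visible = self._visible(user_id, query)
        start = page_no * page_size
        return visible[start:start + page_size], len(visible)


cache = ResultCache(lambda q: ["a", "b", "c", "d", "e"],
                    lambda user, hit: hit != "b")
first_page = cache.page(1, "q", 0, 2)
second_page = cache.page(1, "q", 1, 2)
```

The first page view pays for filtering the whole raw list; later pages of the same query reuse the cached list, and the total count needed for the pager comes for free.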

Another approach I am exploring is to calculate and send the browser the real offsets of page boundaries, so that a page is the result of scanning ahead in the primary results for the next page_size valid results, wherever they are. This still does not solve the problem of calculating the effective result set size, which determines how many pages there are.
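
The offset-based idea can be sketched in Python as follows. This is a hedged illustration under simple assumptions (an in-memory raw hit list and a hypothetical `can_view` callback), not code from the patch.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def next_page(hits: Sequence[str],
              start: int,
              page_size: int,
              can_view: Callable[[str], bool]) -> Tuple[List[str], Optional[int]]:
    """Scan the raw hit list from raw offset `start` and collect up to
    `page_size` hits the user may see.

    Returns the page and the raw offset at which the following page should
    start, or None when the raw list has been fully scanned."""
    page: List[str] = []
    pos = start
    while pos < len(hits) and len(page) < page_size:
        if can_view(hits[pos]):
            page.append(hits[pos])
        pos += 1
    next_start = pos if pos < len(hits) else None
    return page, next_start


raw_hits = ["a", "b", "c", "d", "e", "f", "g"]
hidden = {"b", "c", "f"}


def visible(hit: str) -> bool:
    return hit not in hidden


page1, offset2 = next_page(raw_hits, 0, 2, visible)
page2, offset3 = next_page(raw_hits, offset2, 2, visible)
```

Only the hits actually scanned for a page are access-checked. As noted above, the total number of pages stays unknown, and the returned offset may lead to an empty last page when all remaining raw hits are hidden.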

Cheers, with some headaches foreseen.


Martin Dougiamas added a comment - 04/Aug/08 10:51 AM
Hi, it's hard for me to understand what you are doing and what is planned.

Can you please post diff patches here for all proposed fixes in 1.9?


Valery Fremaux added a comment - 05/Sep/08 06:23 AM
Hi Martin,

I have little time available, but here is an up-to-date full diff for /search.

Next to come is the full diff for /blocks/search.

The really nice thing would be to check what Eloy's changes in HEAD were (surely few) and get both code bases synced apart from those small changes (1.9 is, in my opinion, the best code available among the branches).


Valery Fremaux made changes - 05/Sep/08 06:23 AM
Attachment fulldiff_search_19_040908.txt [ 15064 ]
Valery Fremaux added a comment - 05/Sep/08 06:26 AM
The other diff as required.

All announced features are in, although still not extensively tested.

Other upcoming features, such as MNET search, were not included, as they are at a very early stage of development (quite complicated, in fact, because it means revamping a lot of xml_rpc code...)

Cheers.


Valery Fremaux made changes - 05/Sep/08 06:26 AM
Baruch Dov Sienna added a comment - 25/Nov/08 12:56 AM
Although 'Book' is classified as an 'activity', functionally it is a resource (as is the Lesson module, I might add).
Since we can envision sites making heavy use of books, being able to search their text would be most useful.
Could you put that as a high priority on the wish list?
Thanks.