Issue Details (XML | Word | Printable)

Key: MDL-8830
Type: New Feature New Feature
Status: Open Open
Priority: Critical Critical
Assignee: Eloy Lafuente (stronk7)
Reporter: Eloy Lafuente (stronk7)
Votes: 1
Watchers: 2
Operations

Add/Edit UI Mockup to this issue
If you were logged in you would be able to see more operations.
Moodle

Add support for TEXT indexes within XMLDB

Created: 10/Mar/07 01:11 AM   Updated: 05/Apr/08 02:48 PM
Return to search
Component/s: Database SQL/XMLDB
Affects Version/s: 2.0
Fix Version/s: 2.0

Database: Any
Participants: Eloy Lafuente (stronk7), Martin Dougiamas and Valery Fremaux
Security Level: None
Affected Branches: MOODLE_20_STABLE
Fixed Branches: MOODLE_20_STABLE


 Description  « Hide
There are some places where relational searches aren't good at all, specially where looking against large text sets (forums, glossaries...).

To fix this we must use documental (fulltext) indexes. They allow to perform "instant" searches against text.

So, For 1.9 something like this must be done:

1- Analyze and document each RDBMS documental index support.
2- Implement creation and drop of documental indexes from XMLDB generators (XMLDB schema, XML files and, finally, generators)
3- Extend XMLDB tests to check this functionality (creation, searching and drop).
4- Make text_index_exists() to work when looking for this type of indexes.
5- Add support for doc. indexes within the XMLDB Editor (parser, editor...)
6- Modify searchlib (search_generate_text_SQL() function) to be able to generate the correct SQL statements under each RDBMS.
7- Create a bunch of settings ($CFG->forum_usetextsearches....) to tell Moodle where to perform documental searches, code should be able to determinate when the underlying index exists.
8- Modify cron by calling one database_cron() function to be able to perform indexes maintenance (sync, optimisation...).

That's all, for now.

For Moodle 1.8 a basic documental indexes functionality has been implemented to work exclusively under MySQL and forum searches. It requires manual creation of the index before enabling it. Look to forum_search_posts() for some more basic info.

 All   Comments   Change History   Version Control      Sort Order: Ascending order - Click to sort in descending order
Martin Dougiamas added a comment - 11/Mar/07 10:12 PM
Well ... remember we have the global search subproject which uses it's own index.

The Lucene engine is pretty powerful, allows consistent searching across all parts of Moodle, and most of the work is done (in moodle/search), however it's going to be tricky bringing roles into the picture (but possible I think).

Shouldn't we focus on that instead? Or should we change direction and use the native database for searching?


Eloy Lafuente (stronk7) added a comment - 11/Mar/07 11:27 PM
+1 for lucene if possible to adapt if for roles (I hope it'll be cross-db compatible, at least it's in other languages where I've played a bit with).

I thought that (don't ask why) that such project was semi-dead and we could start supporting our own implementation, specially if we want to index uploaded files (perhaps this is the KEY question) and if lucene works well with non-spaced languages (like japanese).

Those (upload files + non-spaced languages) are the two major drawbacks of implementing our own indexes. Does Lucene uses its own index format (file based) or does it use DB specific indexes... uhm.

Too much questions! Let's halt this until 1.8 release and then examine a bit more about how lucene is working.... oki?


Martin Dougiamas added a comment - 12/Mar/07 10:50 AM
I've turned this on for moodle.org and WOW! That is heaps faster now! :-D

Times for some searches I tried have dropped from 11 seconds to about 1 second !

+1000 for better handling of this in 1.9.


Martin Dougiamas added a comment - 12/Mar/07 11:45 AM
More info about Lucene vs MySQL for the global search: http://jayant7k.blogspot.com/2006/05/mysql-fulltext-search-versus-lucene.html

Eloy Lafuente (stronk7) added a comment - 12/Mar/07 04:57 PM
Please,

note that current experimental implementation is far from perfect because, to mimic current behaviour and semantics of relational searching we are executing all the searches by appending the "*" wildchar to all the terms. It reduces real benefit a lot (compared with the execution without wildchars).

That's because I proposed an alternative search form + help file to be showed describing the new semantics used when the documental search is enabled. While the search with wildchars can be a good idea in sites with fewer results (more records are found), sites like moodle.org (+250.000 posts) will return enough information just searching by "exact" words, that is, in fact, the default search mode worldwide!

Anyway, I'll start documenting about the lucene-php-mysql thing after release... that seems to be the preferred way (if we don't want to apply such changes in semantics for now).



Eloy Lafuente (stronk7) added a comment - 14/May/07 07:47 AM
Quick note:

last week I've been playing with 300.000 posts and Zend Lucene. Some numbers:

  • got 1000+ indexed documents/minute.
  • most searches resolved in less than 1 sec, with 1-3 more secs needed to fetch and render one page of results
  • absolutely easy to customize for our needs (UTF-8, keeping numbers, versions... in the index) all this working with < 100 lines code.
  • easy to keep synchronised with one intemediate "lucene_documents" relational table (where all the modules will write the actions to be performed).
  • relatively easy to support groups/users/allowed forums in the documental searches.

but I've found some problems with current Zend Ludene implementation:

  • relies on old lucene 1.4 code.
  • index format has been dramatically improved for Lucene 2.0-2.1
  • tons of memory needed (> 120MB) to fully optimize the index (and this should be executed, at least, weekly, to avoid fragmentation and reduce number of segments).
  • after looking bugs and activity... not sure if it's

Uhm... just to point to another alternative... could we "reinvent the wheel" ? Drupal seems to have one interesting "hand-made" documental index for all its contents... perhaps it would be enough for our "token based searches"....

Ciao


Eloy Lafuente (stronk7) added a comment - 19/Sep/07 06:33 AM - edited
Moving this to 2.0.

Will Lucene be the final solution...uhm... ciao


Eloy Lafuente (stronk7) added a comment - 05/Apr/08 10:21 AM
Just updating this to comment that there are new downloads of the Zend Framework. And it also contains a very-much improved version of the Lucene engine (using the long awaited 2.1 index formats + big memory improvements).

Link: http://framework.zend.com/download

Ciao


Valery Fremaux added a comment - 05/Apr/08 02:48 PM
Thanks Eloy, I will try it soon in my dev instance.