+1 for Lucene, if it's possible to adapt it for roles (I hope it'll be cross-DB compatible; at least it is in the other languages where I've played with it a bit).
I thought (don't ask why) that the project was semi-dead and that we could start supporting our own implementation, especially if we want to index uploaded files (perhaps this is the KEY question) and if Lucene works well with non-spaced languages (like Japanese). Those two (uploaded files + non-spaced languages) are the major drawbacks of implementing our own indexes. Does Lucene use its own index format (file based), or does it use DB-specific indexes... uhm. Too many questions! Let's halt this until the 1.8 release and then look a bit more at how Lucene works... OK?

I've turned this on for moodle.org and WOW! That is heaps faster now! :-D
Times for some searches I tried have dropped from 11 seconds to about 1 second!

+1000 for better handling of this in 1.9.

More info about Lucene vs MySQL for the global search: http://jayant7k.blogspot.com/2006/05/mysql-fulltext-search-versus-lucene.html
Please note that the current experimental implementation is far from perfect because, to mimic the current behaviour and semantics of relational searching, we are executing all searches by appending the "*" wildcard to every term. That reduces the real benefit a lot (compared with running without wildcards). That's why I proposed an alternative search form + a help file to be shown, describing the new semantics used when the documental search is enabled. While searching with wildcards can be a good idea on sites with fewer results (more records are found), sites like moodle.org (250,000+ posts) will return enough information just by searching for "exact" words, which is, in fact, the default search mode worldwide! Anyway, I'll start documenting the lucene-php-mysql thing after the release... that seems to be the preferred way (if we don't want to apply such changes in semantics for now).

Quick note:
last week I was playing with 300,000 posts and Zend Lucene. Some numbers:
but I've found some problems with the current Zend Lucene implementation:
Uhm... just to point out another alternative... could we "reinvent the wheel"? Drupal seems to have an interesting "hand-made" documental index for all its contents... perhaps it would be enough for our "token-based searches"... Ciao

Moving this to 2.0. Will Lucene be the final solution... uhm... Ciao

Just updating this to note that there are new downloads of the Zend Framework, and they also contain a much-improved version of the Lucene engine (using the long-awaited 2.1 index formats + big memory improvements).
Link: http://framework.zend.com/download Ciao

Thanks Eloy, I will try it soon in my dev instance.
The Lucene engine is pretty powerful, allows consistent searching across all parts of Moodle, and most of the work is done (in moodle/search); however, it's going to be tricky bringing roles into the picture (but possible, I think).
Shouldn't we focus on that instead? Or should we change direction and use the native database for searching?