|
I can confirm the identical results on Moodle 1.8 Beta (Feb 14), Linux.
Yep, it's not really finished yet, unfortunately.
The main issue, apart from a few bugs, is to make it work with Roles. I'm targeting this for 2.0 because of the PHP5 requirement.
I made it run quite well (the data model in search/db does not install automatically): just create the table prefix_search_documents and run the indexer once. There are still a few bugs, as Martin says, such as library calls that go too deep (they should use get_record rather than get_recordset), along with some minor XHTML validation errors.
A temporary fix can be downloaded as an attachment to the thread http://moodle.org/mod/forum/discuss.php?d=74975
If Martin allows it, may I finish a basic Roles integration so that full-text search is available sooner than 2.0? Martin, could you please send me a note on minimal Role handling so I can put some effort into it?
More info: I told Martin that indexing of PDF and other binary-encoded files should be nearly available in a few weeks, using xpdf and other content extractors. This work is currently well under way.
Just tried this, but no luck. After creating the mdl_search_documents table via PHPMySQL with 1 dummy field, the indexer (indexsplash.php, linked from the statistics page) gives this error:
Server Time: Sat, 30 Jun 2007 14:26:33 +0100
Warning: Indexing was not successfully completed last time, restarting.
Using C:\MoodleWindowsInstaller-latest-16\moodle\moodledata/search as data directory.
Using Moodle 1.8+. Any idea what I've missed?
Hey Valery.
Well, the issue about permissions is that each piece of data indexed needs to have enough context information stored with it so that when the search results are displayed, each item can be checked to make sure that item should be displayed to the current user (the one doing the search). Some example scenarios that need to be coped with:
Here's a bit of a plan that I think will work: Each bit of information must have these things attached to it:
And additionally if we know it:
So the global search should first collect one big list of all matching items from all parts of Moodle and hand it over for further processing before display. Admins of course need no checks and can see everything, so that should be fast.
The first check will be on the $courseid: all items that do not match get_my_courses can be eliminated immediately.
The second check for other users would be a has_capability($capability, $context) ... that will be fast because the current user has all this cached in memory (no database access required), so we'll eliminate a lot of stuff at once. If false, we can ignore this text item and leave it out of the list. If true, we keep checking.
The third check would be to use a function from $path/lib.php (if one is defined) to do a further check on that text, passing all the info we have, eg xxxxx_check_text_access(). This function would return true or false and would make the final definitive call. Most modules (like resources) wouldn't need such a function; it's only useful for complex access situations like forum.
It would be terrific if you could implement this, Valery!!!!! Let me know if I can help further (eg CVS access etc)
For Matt Gibson:
The message you get says something was wrong with the table. The checkDB() method which is called (in indexlib.php) tries to drop the table and recreate it from scratch. (Did you notice the table disappeared? It should have.) Note that this procedure produces no output, so you don't notice when it runs.
I noticed the SQL had some small errors in the DEFAULTs. In search/db/mysql.sql it should be: CREATE TABLE IF NOT EXISTS `prefix_search_documents` ( This could explain why the table was not created properly. I don't know whether these defaults fail on all MySQL versions. I expect to increase the size of the title and url fields, which turned out to be too short in some experiments.
For Martin:
I just wrote a long technical discussion and lost it to a login timeout in the tracker!! Sigh!! I'll try to make it shorter. I checked whether we can add any field we need to the Lucene index records. Seems fine.
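For reference, the three checks Martin proposes earlier in this thread rely on exactly this kind of extra field in each index record. Here is a minimal sketch in Python (not Moodle's PHP, and not the actual implementation), assuming each record stores a course id, a capability name, a context id and a document type. The class names, field names, capability strings and the module_checks mapping are all hypothetical; course_ids and capabilities merely stand in for get_my_courses and the cached has_capability data.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Set, Tuple


@dataclass
class IndexedItem:
    """One search hit with the access fields stored at indexing time (assumed layout)."""
    title: str
    course_id: int
    capability: str   # placeholder capability name, e.g. "forum:view"
    context_id: int
    doctype: str      # e.g. "forum", "resource"


@dataclass
class User:
    is_admin: bool
    course_ids: Set[int]                 # stand-in for get_my_courses
    capabilities: Set[Tuple[str, int]]   # stand-in for cached has_capability data


# Optional per-module final check, standing in for xxxxx_check_text_access().
ModuleCheck = Callable[[IndexedItem, User], bool]


def filter_results(hits: Iterable[IndexedItem], user: User,
                   module_checks: Dict[str, ModuleCheck]) -> List[IndexedItem]:
    """Keep only the hits the user may see, cheapest check first."""
    if user.is_admin:
        return list(hits)  # admins need no checks
    visible = []
    for item in hits:
        # Check 1: the item must belong to one of the user's courses.
        if item.course_id not in user.course_ids:
            continue
        # Check 2: in-memory capability lookup, no database access.
        if (item.capability, item.context_id) not in user.capabilities:
            continue
        # Check 3: module-specific callback, only if the module defines one.
        check = module_checks.get(item.doctype)
        if check is not None and not check(item, user):
            continue
        visible.append(item)
    return visible
```

Ordering the checks from cheapest to most expensive means the module callback, which may need database access, only runs on hits that survived the two in-memory checks.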
Note that I also just patched the indexing of block-dependent information, for those blocks which manage some local information of their own. One key question is whether capability and context information needs to be recorded in the index records at all. The alternative scenario is: 1. We get indexing information out of Lucene (generic querying loop) This responsibility could be implemented in two locations:
This approach would let the indexing engine ignore capabilities and context entirely (which is simpler) and delegate all that knowledge to the module (which seems more cohesive). A rationale for this direction: even if we agree that the capability space should be kept as flat as possible, there is no way to prevent some developers from using complex formulas or combinations of capabilities to guard a piece of information. In that case, pushing the task of checking access "from anywhere" back into the module could make the architecture more versatile. Could you enlighten me on that point?
If I understand you correctly, you are saying that the index should only store the $item information (eg location in the table) and this should be passed to the module to work out the $context?
I don't think we should do this because this task can be database-intensive, and would have to be repeated for EVERY search result. I think it would be better to shift this load to the indexing phase (which is infrequent and allowed to be slow) so that searches (which happen often) are as fast and light as possible.
I expected something like this.
I'm sensitive to load issues, as I run my Moodle on an old 450 MHz Pentium III with 324 MB of memory, with an extra small machine as database server (the only machines I can afford, as I have no political support in the institution I work for, being an internal, unofficial competitor to the in-house e-learning project), and 1.8+ seems to be much more time-consuming. So I'll try to make the 'cleverest' caching of 'contextual access information' I can come up with, after resolving the item identification and callback chain design.
Once more I lost my (huge) post (session timeout while writing). Arghhhh!! I won't write it all again.
Just to give you some good news about the search engine:
Indexing works great!!
The xxxx_check_text_access() callback is located in the document type description in /search/documents, so that existing module and block code does not need to be revised. The filtering chain works well. Now we should implement local policies for all standard activities, and test, test, test, test, document, document, document, document.... it all. Telegraphic style, but a bit tired today. Regards.
Last news:
Global Search now indexes :
update.php, delete.php and add.php have been entirely reviewed, and so have all xxxxx_document files, so that they comply with subclassing of information by 'itemtype', which allows complex modules to index more than one document type. Test in progress: update/add/delete integrity using 'itemtype' subclassing. Internationalization integration completed. Will check what needs to be done for XMLDB compliance.
Just curious (and not in the right spot...): Will global search/Lucene also index the course files (even if they are not linked but shown in for example folders)? Will it index the contents of the files? If yes, what filetypes will work?
Looking forward to it coming out of experimental! Thank you Valery for taking this on. Great! Kind regards, Hans (who has long ago learned to use Opera when typing long posts, as it keeps the form contents and fills them in when you press the back button after the timeout!)
Currently, the file types handled by the version I put in the CVS are: MSWord, Powerpoint (PPT, maybe not all versions; a small empirical extractor I wrote quickly), PDF (all versions), HTML, XML, plain text.
Only files that are published as "file" resources are indexed, leaving everything else in the file repositories unindexed (we could not really offer users a way to see them, because the direct URL should not be used for these files). I should work a bit further to handle files in folders when these folders are published (and so opened) within course content. I should also make some add-ons to index attachments in key modules such as Wiki or forums, or even data records where the record type is an uploaded file. Maybe Martin could give me a good way to put these requirements up for voting. An experimental CVS version is in the HEAD repository, ready to download and test. Many French colleagues have already agreed to test it all regularly.
I am on 1.8.2+ and get this error... /moodle/search/tests/index.php
Fatal error: Call to a member function MoveNext() on a non-object in /home/tbshsweb/public_html/moodle/search/documents/resource_document.php on line 55
Sorry Antony,
this seems to be older code. You can check out an updated version of the search engine from CVS; it is not yet published in any distribution, not even 1.8.2. Note that this is still an experimental version, due to be tested extensively by a community of French Moodlers in late August. Any calls to MoveNext() (the old direct call on an ADOdb recordset) should have disappeared in favour of the standard get_records(), insert_record(), delete_records() primitives.
Hi Valery,
I'm on 1.8.2+ from CVS (updated yesterday) but I can't get any movement on this. Running the indexer from http://mymoodlesite.org/search/indexer.php?areyousure=yes leaves me with the same error about the database. I have tried pasting your SQL above into PHPMyAdmin (altering the prefix) but it still wipes the table every time as before:
Database error. Please check settings/files.
I'm on mysql 5.0.18 win2k3
I've reproduced the exact same error on my dev PC. I'll check it out tonight and fix it.
Fixed in CVS
Fix report: bad default value format in search/db/mysql.sql for docdate -> must be 0 and not '0'. Presetting the tables from the search block is no longer necessary.
Still no joy. I deleted both files from moodledata/search, made the table in mysql using the above (modified) SQL and updated the site from CVS (1.8.2+)
PHP error log gives:
[12-Sep-2007 11:15:12] PHP Notice: Undefined property: stdClass::$search_index_size in C:\MoodleWindowsInstaller-latest-16\moodle\moodle\search\query.php on line 236
Many things:
The SQL file does not seem to be up to date. It should show: ... You should have an extra field: fix it in your local search/db/mysql.sql file in the meantime.
Some include paths look strange:
[12-Sep-2007 11:15:20] PHP Warning: require_once(Zend/Search/Lucene/Exception.php) [<a href='function.require-once'>function.require-once</a>]: failed to open stream: No such file or directory in C:\MoodleWindowsInstaller-latest-16\moodle\moodle\search\Zend\Search\Lucene.php on line 23
Line 23 (and all the relevant lines close to it) should read: require_once $CFG->dirroot.'/search/Zend/Search/Lucene/Exception.php'; so that an absolute, reliable path is used. If that is not the case, the Zend framework for global search has not been updated correctly.
WindowsInstaller-latest-16 ?? What is your real Moodle version? The search engine rework has been tested only on version 1.8 and above (I can test it on 1.7 if needed). Could you check this?
The path is odd because I used the windows installer to start with and have found no easy way to upgrade its components (apache etc), so I have simply changed the moodle folder to a CVS checkout. I updated and it seemed to be fine, is this new code in HEAD, or in 19_BETA or in 18_STABLE? I will update again tomorrow and see what happens.
My CVS client settings seem to be pointing at the HEAD branch. Martin said he was updating the 19_BETA branch with the commits. I'm a newbie with CVS branches and I use WinCVS, which does not make working with branches obvious.
I made a final consistency check of the code. Everything seems to be OK now and will run from a clean install of the "block_search" module. There is no longer any "drop and recreate" procedure for the indexing table. I removed it, as agreed with Martin and Eloy. Regenerating the index table now only clears it out. We can ensure nicer upgrades from now on. cron has been protected for PHP 4.3.0 environments, so it will not be disturbed by the catching of Lucene exceptions in the cron hook. The worst that can happen is that the search index update aborts and lets the other cron tasks finish properly.
Hello Valery,
Will this functionality be backported to 1.8 you think? Kind regards (and thank you for all your efforts...!), Hans de Zwart
Don't know. It could be in any case, as I test it on a 1.8+ version. The HEAD code for /search and /blocks/search in CVS is fully compatible with 1.8. Very useful stuff is in contrib/patches/global_search_libraries: it gives you several open-source doc-to-text converters for indexing physical files. Read README.txt for deployment.
I'm available for help anyway, preferably via moodle.org messaging.
I've got the contrib directories for search and blocks/search installed, and I have copied the antiword and xpdf libraries over from contrib/patches/global_search_libraries straight into /lib. Things seemed to go fine when I visited the notifications screen (although at that point I had not put the libraries in place). I've now a blank search in the search block but I get a fatal error:
[17-Sep-2007 09:26:13] PHP Fatal error: Call to undefined function build_navigation() in C:\MoodleWindowsInstaller-latest-16\moodle\moodle\search\query.php on line 140
Have I missed something?
No, the code was mismatched, you are right (so sorry, in fact). The version of query.lib should be 1.16. I fixed it in HEAD.
I will compare some files to check whether other mismatches remain.
Hi,
I am trying to understand the current status of this tracker issue. I would like to help with testing, but I am experiencing much of what Matt Gibson describes. It appears that much of the functionality exists in the 1.9 nightly builds, although the ability to add global search as a block has been removed. I have made a test wiki page that does not have much more than the word wiki inside a wiki page. When I run test/index.php I get the following:
Notice: /opt/lampp/htdocs/library0/search/documents/assignment_document.php does not exist, this module will not be indexed.
Finished checking activity modules.
Since the wiki is the only thing that has content, and very little at that, it makes sense that that is the only module that reports it is ready for indexing. Performing the actual index gets me this:
Server Time: Fri, 05 Oct 2007 06:16:13 -0400
Using /opt/lampp/htdocs/moodledata/search as data directory.
18 modules found.
Processing module function data_get_content_for_index ...
Processing module function forum_get_content_for_index ...
Processing module function glossary_get_content_for_index ...
Processing module function resource_get_content_for_index ...
Processing module function wiki_get_content_for_index ...
Finished activity modules
The one wiki document does not appear to be indexed. Can I use search in its current state in 1.9, or is it still necessary to pull some of the fixes Valery mentions in http://tracker.moodle.org/browse/MDL-8074#action_31877 ? I would be excited to help test the feature further. Thanks.
Iterator errors seem to be normal. These modules have an incomplete implementation of iterators. I will check if we can get cleaner behaviour.
With Hans de Zwart we have had very satisfactory results with global search, which encourages us to carry on finalizing it. I personally don't know much about the 1.9 distribution strategy. The search engine reengineering was done, and is still being tested, on 1.8.2. I will check that point. In any case, the accompanying search block must be installed, as its global configuration parameters are needed to enable indexing. Once it is installed, the block's global configuration lets you switch indexing on and configure the paths for the <format>totext converters. Could you give me a more precise report on the search_block problem? Thanks.
Reassigning to Yu, following Martin's comment about Yu fixing the global search block.
Just noticed that the global search block is not working on moodle.org (1.9 beta 2) - a search returns a blank page. The block has now been removed from the moodle.org homepage.
I got this working yesterday on 1.9 beta 2, after manually running the indexer from the statistics page. Two problems occurred: no pdfs were indexed, and it stopped at the wiki indexer (probably because I have nwiki 2.0) but apart from that, it's fine.
just noticed that contrib/patches/global_search_libraries are not present in /lib in 1.9 beta 2. But after copying them over, I still have no pdfs indexed.
I'm also getting a whole bunch of errors like this one in the logs at indexing time:
[21-Nov-2007 10:28:49] SQL Column 'docid' cannot be null in C:\MoodleWindowsInstaller-latest-16\moodle\moodle\search\indexlib.php on line 232. STATEMENT: INSERT INTO mdl_block_search_documents ( DOCID, DOCTYPE, ITEMTYPE, TITLE, URL, DOCDATE, COURSEID, GROUPID ) VALUES ( null, 'forum', 'post', '', 'http://mymoodle.com/mod/forum/discuss.php?d=#', null, 66, null )
C:\MoodleWindowsInstaller-latest-16, 16 ??
I'll check document_forum.php to make sure it builds proper records. Something seems wrong there.
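One way to catch this kind of record before it reaches the database is to reject incomplete rows up front. Below is a rough sketch in Python (not the actual indexlib.php code). The column names come from the INSERT statement in the log above; only docid is known from that log to be NOT NULL, so the rest of the REQUIRED tuple is an assumption, and both function names are hypothetical.

```python
from typing import Any, Dict, List, Tuple

# Assumption: the log only proves that docid is NOT NULL; the other entries
# are a guess at what a usable index record needs.
REQUIRED = ("docid", "doctype", "itemtype", "courseid")

Record = Dict[str, Any]


def missing_fields(record: Record) -> List[str]:
    """Return the required column names that are absent or None in this record."""
    return [name for name in REQUIRED if record.get(name) is None]


def partition_records(records: List[Record]) -> Tuple[List[Record], List[Tuple[Record, List[str]]]]:
    """Split records into insertable ones and rejected ones (with the reason)."""
    valid: List[Record] = []
    rejected: List[Tuple[Record, List[str]]] = []
    for record in records:
        missing = missing_fields(record)
        if missing:
            rejected.append((record, missing))  # report instead of failing in SQL
        else:
            valid.append(record)
    return valid, rejected
```

Run against the forum post from the log, this would flag the missing docid before the INSERT is attempted, which points at the code building the record and not at the database.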
Yu is no longer working on Moodle right? Should we maybe assign this to someone else please? Thanks!
assigning this to Valery...
About to close this issue (very old ticket). Is everyone OK with that?
I agree that it works, but it is not (was not?) 100% clear how to get it to do so. Are the search libraries included with the standard distribution now? If they need to be downloaded from contrib, I'd say that it doesn't 'work' out of the box.
If a new user enables via the experimental screen and then adds the block, is that sufficient with no more actions to get it working? I've just tested this with 1.9.3 and 1.8.7, and it seems to work correctly although I couldn't get my PDF files indexed.
So it works "out-of-the-box", no need to import contrib code. |
Moodle 1.7, Apache 2, PHP 5.1.2, MySQL 5.0.2