Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Critical Critical
    • Resolution: Fixed
    • Affects Version/s: 2.2
    • Fix Version/s: 2.2
    • Component/s: Gradebook
    • Labels:
    • Environment:
      qa.moodle.net

      Description

      The following unit tests are failing on qa.moodle.net. The same failures did not appear on my local test server.

      Fail: grade/grading/simpletest/testlib.php / ► grading_manager_test / ► test_tokenize
      at [/html/grade/grading/simpletest/testlib.php line 137]
      Fail: grade/grading/simpletest/testlib.php / ► grading_manager_test / ► test_tokenize
      at [/html/grade/grading/simpletest/testlib.php line 138] 

      There was no further information about these failures.

        Gliffy Diagrams

          Issue Links

            Activity

            Hide
            Michael de Raadt added a comment -

            It would be good to get some more information about this and if it can be replicated elsewhere.

            Show
            Michael de Raadt added a comment - It would be good to get some more information about this and if it can be replicated elsewhere.
            Hide
            Andrew Davis added a comment -

            Ive run the unit tests and unfortunately cannot reproduce this which will make fixing it interesting.

            The unit tests that are failing on qa.moodle.net are the 2nd and 3rd in this group (the two in_array() tests)

            $needle = "    šašek, \n\n   \r    a král;  \t";
                    $tokens = testable_grading_manager::tokenize($needle);
                    $this->assertEqual(2, count($tokens));
                    $this->assertTrue(in_array('šašek', $tokens));
                    $this->assertTrue(in_array('král', $tokens));
            

            Show
            Andrew Davis added a comment - Ive run the unit tests and unfortunately cannot reproduce this which will make fixing it interesting. The unit tests that are failing on qa.moodle.net are the 2nd and 3rd in this group (the two in_array() tests) $needle = " šašek, \n\n \r a král; \t"; $tokens = testable_grading_manager::tokenize($needle); $this->assertEqual(2, count($tokens)); $this->assertTrue(in_array('šašek', $tokens)); $this->assertTrue(in_array('král', $tokens));
            Hide
            Sam Hemelryk added a comment -

            Hi Andrew,
            I can reproduce the error on line 137.
            A quick debug of the needle and the tokens variables shows the following:

            Array
            (
                [needle] =>     šašek, 
             
               
                a král;  	
                [tokens] => Array
                    (
                        [0] => ek
                        [1] => král
                    )
            )
            

            Let me know what other information you would like.

            Show
            Sam Hemelryk added a comment - Hi Andrew, I can reproduce the error on line 137. A quick debug of the needle and the tokens variables shows the following: Array ( [needle] => šašek,   a král; [tokens] => Array ( [0] => ek [1] => král ) ) Let me know what other information you would like.
            Hide
            Andrew Davis added a comment - - edited

            Hi Sam. Can you go to /admin/phpinfo.php and copy paste the pcre and intl sections.

            My computer (UNIT TESTS PASS)

            PCRE
            PCRE (Perl Compatible Regular Expressions) Support 	enabled
            PCRE Library Version 	8.12 2011-01-15
             
            Directive	Local Value	Master Value
            pcre.backtrack_limit	100000	100000
            pcre.recursion_limit	100000	100000
             
            INTL
            not present
            

            update: I installed the intl extension and it made no difference (tests still pass)

            Internationalization support	enabled
            version 	1.1.0
            ICU version 	4.4.2 
            

            qa.moodle.net UNIT TESTS FAIL

            PCRE
            PCRE (Perl Compatible Regular Expressions) Support 	enabled
            PCRE Library Version 	6.6 06-Feb-2006
             
            Directive	Local Value	Master Value
            pcre.backtrack_limit	20971520	1000000
            pcre.recursion_limit	100000	100000
             
            intl
            Internationalization support	enabled
            version 	1.1.0
            ICU version 	3.6 
            

            Show
            Andrew Davis added a comment - - edited Hi Sam. Can you go to /admin/phpinfo.php and copy paste the pcre and intl sections. My computer (UNIT TESTS PASS) PCRE PCRE (Perl Compatible Regular Expressions) Support enabled PCRE Library Version 8.12 2011-01-15   Directive Local Value Master Value pcre.backtrack_limit 100000 100000 pcre.recursion_limit 100000 100000   INTL not present update: I installed the intl extension and it made no difference (tests still pass) Internationalization support enabled version 1.1.0 ICU version 4.4.2 qa.moodle.net UNIT TESTS FAIL PCRE PCRE (Perl Compatible Regular Expressions) Support enabled PCRE Library Version 6.6 06-Feb-2006   Directive Local Value Master Value pcre.backtrack_limit 20971520 1000000 pcre.recursion_limit 100000 100000   intl Internationalization support enabled version 1.1.0 ICU version 3.6
            Hide
            Sam Hemelryk added a comment -

            Hi Andrew,

            I've got:

            PCRE (Perl Compatible Regular Expressions) Support enabled
            PCRE Library Version 8.02 2010-03-19

            Directive Local Value Master Value
            pcre.backtrack_limit 20971520 100000
            pcre.recursion_limit 100000 100000

            Just as a heads up I'd also check the version of the Intl plugin, (especially ICU version).

            Show
            Sam Hemelryk added a comment - Hi Andrew, I've got: PCRE (Perl Compatible Regular Expressions) Support enabled PCRE Library Version 8.02 2010-03-19 Directive Local Value Master Value pcre.backtrack_limit 20971520 100000 pcre.recursion_limit 100000 100000 Just as a heads up I'd also check the version of the Intl plugin, (especially ICU version).
            Hide
            Sam Hemelryk added a comment -

            LOL sorry just saw you were asking for Intl as well!

            I don't have it installed myself, so I suppose that rules it out (unless its some mac native thing)

            Show
            Sam Hemelryk added a comment - LOL sorry just saw you were asking for Intl as well! I don't have it installed myself, so I suppose that rules it out (unless its some mac native thing)
            Hide
            Andrew Davis added a comment -

            Aparup (UNIT TESTS FAILING)

            pcre : 7.8 2008-09-05

            intl : 1.0.3
            icu : 4.2.1

            Show
            Andrew Davis added a comment - Aparup (UNIT TESTS FAILING) pcre : 7.8 2008-09-05 intl : 1.0.3 icu : 4.2.1
            Hide
            Sam Hemelryk added a comment -

            Although perhaps that is the answer, perhaps its the lack of native support combined with an old Intl plugin?

            Show
            Sam Hemelryk added a comment - Although perhaps that is the answer, perhaps its the lack of native support combined with an old Intl plugin?
            Hide
            Andrew Davis added a comment -

            the needle and tokens on my computer (unit tests pass)

            string(34) " šašek, a král; "
             
            array(2) { [0]=> string(7) "šašek" [1]=> string(5) "král" } 
            

            The same variables from Sam's machine (unit tests fail). he posted these above. repeating for side by side comparison.

            Array
            (
                [needle] =>     šašek, 
             
               
                a král;  	
                [tokens] => Array
                    (
                        [0] => ek
                        [1] => král
                    )
            )
            

            sasek is being incorrectly cut in half.

            Show
            Andrew Davis added a comment - the needle and tokens on my computer (unit tests pass) string(34) " šašek, a král; "   array(2) { [0]=> string(7) "šašek" [1]=> string(5) "král" } The same variables from Sam's machine (unit tests fail). he posted these above. repeating for side by side comparison. Array ( [needle] => šašek,   a král; [tokens] => Array ( [0] => ek [1] => král ) ) sasek is being incorrectly cut in half.
            Hide
            Andrew Davis added a comment -

            Aparup is now trying to get his pcre and intl php extensions to upgrade as his versions are a bit behind mine. Once that's done he'll rerun the tests and we can see if there's any change. If he can get his versions to make (or exceed) mine and he is still getting a failure then we can eliminate those extensions as the source of failure.

            Show
            Andrew Davis added a comment - Aparup is now trying to get his pcre and intl php extensions to upgrade as his versions are a bit behind mine. Once that's done he'll rerun the tests and we can see if there's any change. If he can get his versions to make (or exceed) mine and he is still getting a failure then we can eliminate those extensions as the source of failure.
            Hide
            Andrew Davis added a comment -

            Also, Im on Ubuntu 11.04 if thats relevant.

            Show
            Andrew Davis added a comment - Also, Im on Ubuntu 11.04 if thats relevant.
            Hide
            Aparup Banerjee added a comment -

            i've added ppa:ondrej/php5 to my repos list (as reccommended by Jordan)

            i've then updated to Intl version 1.1.0 and ICU version 4.2.1

            and i'm still getting failures. likely since ICU wasn't updated? not yet figured out how to update that tho.

            Show
            Aparup Banerjee added a comment - i've added ppa:ondrej/php5 to my repos list (as reccommended by Jordan) i've then updated to Intl version 1.1.0 and ICU version 4.2.1 and i'm still getting failures. likely since ICU wasn't updated? not yet figured out how to update that tho.
            Hide
            Eloy Lafuente (stronk7) added a comment - - edited

            Uhm, I've seen the /u (unicode) modifier failing before sometimes. That leaded me to implement on backup this trick that has been working ok so far:

            https://github.com/stronk7/moodle/blob/master/backup/moodle2/backup_xml_transformer.class.php#L41

            So, one quick way to test if your regexp machinery has been compiled with support to unicode is to execute, simply, this:

            php -r "var_dump(@preg_match('/\pL/u', 'a'));"

            I get one precious int(1) here, false means not supported afaik.

            Ciao

            Show
            Eloy Lafuente (stronk7) added a comment - - edited Uhm, I've seen the /u (unicode) modifier failing before sometimes. That leaded me to implement on backup this trick that has been working ok so far: https://github.com/stronk7/moodle/blob/master/backup/moodle2/backup_xml_transformer.class.php#L41 So, one quick way to test if your regexp machinery has been compiled with support to unicode is to execute, simply, this: php -r "var_dump(@preg_match('/\pL/u', 'a'));" I get one precious int(1) here, false means not supported afaik. Ciao
            Hide
            Eloy Lafuente (stronk7) added a comment -

            And, if that's the case (the servers failing the tests have not regexp cimpiled with unicode support), then there are 2 solutions:

            • Compile them with unicode support (requires recompiling php.. with some extra flags, sounds to me, not 100$ sure).
            • Find one (almost equivalent) regular expression doing the split in an utf-8 savy way. Perhaps specifying common whitespace and punctuation chars is enough in this case. But always avoid any uppercase character class (\S, \W, \D...) they are the ones leading to problems, 100% guarantied (because they match "too much" under borked unicode mode).

            Hope it helps, ciao

            Show
            Eloy Lafuente (stronk7) added a comment - And, if that's the case (the servers failing the tests have not regexp cimpiled with unicode support), then there are 2 solutions: Compile them with unicode support (requires recompiling php.. with some extra flags, sounds to me, not 100$ sure). Find one (almost equivalent) regular expression doing the split in an utf-8 savy way. Perhaps specifying common whitespace and punctuation chars is enough in this case. But always avoid any uppercase character class (\S, \W, \D...) they are the ones leading to problems, 100% guarantied (because they match "too much" under borked unicode mode). Hope it helps, ciao
            Hide
            Andrew Davis added a comment - - edited

            Unfortunately I dont think thats it. The unit tests pass for me but fail for Sam and Aparup however all three of us get int(1) when we run that unicode regex test.

            As an experiment I tried flipping the logic of the regex from looking for non-word characters

            $tokens = preg_split("/\W/u", $needle);
            

            to looking for anything thats not a word character avoiding the use of \W

            $tokens = preg_split("/[^\w]/u", $needle);
            

            Either way works for me but they both fail for Aparup

            Show
            Andrew Davis added a comment - - edited Unfortunately I dont think thats it. The unit tests pass for me but fail for Sam and Aparup however all three of us get int(1) when we run that unicode regex test. As an experiment I tried flipping the logic of the regex from looking for non-word characters $tokens = preg_split("/\W/u", $needle); to looking for anything thats not a word character avoiding the use of \W $tokens = preg_split("/[^\w]/u", $needle); Either way works for me but they both fail for Aparup
            Hide
            Andrew Davis added a comment -

            Im adding a master branch. There are two commits. One fills out the tokenize() unit tests a little. The other provides a failure string so that anyone seeing the failure at least has a clue as to what is going on.

            Show
            Andrew Davis added a comment - Im adding a master branch. There are two commits. One fills out the tokenize() unit tests a little. The other provides a failure string so that anyone seeing the failure at least has a clue as to what is going on.
            Hide
            Aparup Banerjee added a comment - - edited

            This (patch) looks fine to me.

            As we haven't really found the root cause here and this is really an interim solution, perhaps we should create a follow up issue (clone?) to have this dealt with in future. A TODO with the MDL follow up issue seems in order here.

            edit: i've looked around (eg:http://www.regular-expressions.info/unicode.html) and it seems that \W doesn't have a proper unicode equivalent - we need to break that group down a bit and define them more explicitly and in a unicode friendly way to '\p'

            Show
            Aparup Banerjee added a comment - - edited This (patch) looks fine to me. As we haven't really found the root cause here and this is really an interim solution, perhaps we should create a follow up issue (clone?) to have this dealt with in future. A TODO with the MDL follow up issue seems in order here. edit: i've looked around (eg: http://www.regular-expressions.info/unicode.html ) and it seems that \W doesn't have a proper unicode equivalent - we need to break that group down a bit and define them more explicitly and in a unicode friendly way to '\p'
            Hide
            Andrew Davis added a comment -

            Ive raised MDL-30519 to come back and take another look at this.

            Show
            Andrew Davis added a comment - Ive raised MDL-30519 to come back and take another look at this.
            Hide
            Andrew Davis added a comment -

            The commits dont apply cleanly to 2.1. Rather than figuring out why on release eve Ive raised MDL-30520 to backport the commits.

            Show
            Andrew Davis added a comment - The commits dont apply cleanly to 2.1. Rather than figuring out why on release eve Ive raised MDL-30520 to backport the commits.
            Hide
            Aparup Banerjee added a comment -

            Thanks, integrated. hopefully we can have this really resolved in the follow up issues.

            (faster than lightening due to MdR's request for haste)

            Show
            Aparup Banerjee added a comment - Thanks, integrated. hopefully we can have this really resolved in the follow up issues. (faster than lightening due to MdR's request for haste)
            Hide
            Aparup Banerjee added a comment -

            passed, its failing better now!

            Show
            Aparup Banerjee added a comment - passed, its failing better now!
            Hide
            Eloy Lafuente (stronk7) added a comment -

            Re Aparup: \W uses to be changed to [\p

            {Space}

            \p

            {Punctuation}

            ].

            Also there is a more detailed implementation @ http://stackoverflow.com/questions/4304928/unicode-equivalents-for-w-and-b-in-java-regular-expressions (it is required by Java, but applicable to any regexp machinery supporting unicode).

            Perhaps that's is the problem that, while it seems that your regexp allows unicode, it has a "restrictive" interpretation of \W (detecting too many chars as non-word). In that case the replacement to [\p

            {Z}

            \p

            {P}

            ] should work.

            In general the rule should be to avoid any uppercase char-class, because they are negative classes, leading to excess of matching when working with unicode. Lowercase ones are safer and, of course, own-unicode classes too (if the test passes).

            Ciao

            Show
            Eloy Lafuente (stronk7) added a comment - Re Aparup: \W uses to be changed to [\p {Space} \p {Punctuation} ]. Also there is a more detailed implementation @ http://stackoverflow.com/questions/4304928/unicode-equivalents-for-w-and-b-in-java-regular-expressions (it is required by Java, but applicable to any regexp machinery supporting unicode). Perhaps that's is the problem that, while it seems that your regexp allows unicode, it has a "restrictive" interpretation of \W (detecting too many chars as non-word). In that case the replacement to [\p {Z} \p {P} ] should work. In general the rule should be to avoid any uppercase char-class, because they are negative classes, leading to excess of matching when working with unicode. Lowercase ones are safer and, of course, own-unicode classes too (if the test passes). Ciao
            Hide
            Aparup Banerjee added a comment -

            ok i've tested more , especially with $tokens = preg_split("/[\p

            {Z}

            \p

            {P}

            ]/u", $needle);
            i get
            Arrayn(n [0] => xc5xa1axc5xa1ekn [1] => nnn [2] => krxc3xa1ln)
            hence line 138 fails when trying to testing for a count of 2.

            This means the unicode bit is working fine, but i've no idea about why we're dropping 1 character tokens here. if we drop 3 that would fix the unit test but very surely break other things.

            Show
            Aparup Banerjee added a comment - ok i've tested more , especially with $tokens = preg_split("/[\p {Z} \p {P} ]/u", $needle); i get Arrayn(n [0] => xc5xa1axc5xa1ekn [1] => nnn [2] => krxc3xa1ln) hence line 138 fails when trying to testing for a count of 2. This means the unicode bit is working fine, but i've no idea about why we're dropping 1 character tokens here. if we drop 3 that would fix the unit test but very surely break other things.
            Hide
            Eloy Lafuente (stronk7) added a comment - - edited

            preg_split('#[\pZ\pP\pC]#u', $needle, 0, PREG_SPLIT_NO_EMPTY)

            Show
            Eloy Lafuente (stronk7) added a comment - - edited preg_split('# [\pZ\pP\pC] #u', $needle, 0, PREG_SPLIT_NO_EMPTY)
            Hide
            Aparup Banerjee added a comment -

            the above ^^ fails for me : 1/1 test cases complete: 14 passes, 11 fails and 26 exceptions.

            (added David as watcher)

            using $tokens = preg_split("#[\pZ\pP\pC]#u", $needle);
            fails at
            /home/aparup/mcode/integration/grade/grading/simpletest/testlib.php line 159 & 160 (integration.git) :
            $this->assertTrue(in_array('span', $tokens)); // Extracted the tag name
            $this->assertTrue(in_array('Aha', $tokens));

            So really its not clear what we want the unit tests here to do. definitively needs some clarification there.

            anyway, this issue is closed now. I don't see this unit test failure as a blocker for QA/2.2 Release as its the units test itself that is the problem.
            Hopefully the follow up issue will help resolve these tests clearly and soon.

            Show
            Aparup Banerjee added a comment - the above ^^ fails for me : 1/1 test cases complete: 14 passes, 11 fails and 26 exceptions. (added David as watcher) using $tokens = preg_split("# [\pZ\pP\pC] #u", $needle); fails at /home/aparup/mcode/integration/grade/grading/simpletest/testlib.php line 159 & 160 (integration.git) : $this->assertTrue(in_array('span', $tokens)); // Extracted the tag name $this->assertTrue(in_array('Aha', $tokens)); So really its not clear what we want the unit tests here to do. definitively needs some clarification there. anyway, this issue is closed now. I don't see this unit test failure as a blocker for QA/2.2 Release as its the units test itself that is the problem. Hopefully the follow up issue will help resolve these tests clearly and soon.
            Hide
            Michael de Raadt added a comment -

            I've removed this as a blocker for the linked QA issue.

            It looks like progress is still being made towards solving this mystery. Good stuff.

            Show
            Michael de Raadt added a comment - I've removed this as a blocker for the linked QA issue. It looks like progress is still being made towards solving this mystery. Good stuff.
            Hide
            Eloy Lafuente (stronk7) added a comment -

            Sent upstream! Just in time for Moodle 2.2rc1 (if related), yay!

            Closing and big thanks!

            Show
            Eloy Lafuente (stronk7) added a comment - Sent upstream! Just in time for Moodle 2.2rc1 (if related), yay! Closing and big thanks!
            Hide
            Ashley Holman added a comment -

            This is still failing for me with 2.3.1, running PHP 5.3.3. The problem appears to be caused by this bug: https://bugs.php.net/bug.php?id=52971

            \w only matches ASCII characters, even in unicode mode. \W would presumably match anything that's not \w, so it matches the unicode characters, thus splitting on them.

            Show
            Ashley Holman added a comment - This is still failing for me with 2.3.1, running PHP 5.3.3. The problem appears to be caused by this bug: https://bugs.php.net/bug.php?id=52971 \w only matches ASCII characters, even in unicode mode. \W would presumably match anything that's not \w, so it matches the unicode characters, thus splitting on them.
            Hide
            Ashley Holman added a comment -

            Under PHP 5.4 however, it works fine.

            Show
            Ashley Holman added a comment - Under PHP 5.4 however, it works fine.
            Hide
            Aparup Banerjee added a comment -

            Thanks for that Ashley, i've mentioned your comment in MDL-30519.

            Show
            Aparup Banerjee added a comment - Thanks for that Ashley, i've mentioned your comment in MDL-30519 .

              People

              • Votes:
                0 Vote for this issue
                Watchers:
                4 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: