Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Critical Critical
    • Resolution: Fixed
    • Affects Version/s: 2.2
    • Fix Version/s: 2.2
    • Component/s: Gradebook
    • Labels:
    • Environment:
      qa.moodle.net
    • Rank:
      33184

      Description

      The following unit tests are failing on qa.moodle.net. The same failures did not appear on my local test server.

      Fail: grade/grading/simpletest/testlib.php / ► grading_manager_test / ► test_tokenize
      at [/html/grade/grading/simpletest/testlib.php line 137]
      Fail: grade/grading/simpletest/testlib.php / ► grading_manager_test / ► test_tokenize
      at [/html/grade/grading/simpletest/testlib.php line 138] 
      

      There was no further information about these failures.

        Issue Links

          Activity

          Hide
          Michael de Raadt added a comment -

          It would be good to get some more information about this and if it can be replicated elsewhere.

          Show
          Michael de Raadt added a comment - It would be good to get some more information about this and if it can be replicated elsewhere.
          Hide
          Andrew Davis added a comment -

          Ive run the unit tests and unfortunately cannot reproduce this which will make fixing it interesting.

          The unit tests that are failing on qa.moodle.net are the 2nd and 3rd in this group (the two in_array() tests)

          $needle = "    šašek, \n\n   \r    a král;  \t";
                  $tokens = testable_grading_manager::tokenize($needle);
                  $this->assertEqual(2, count($tokens));
                  $this->assertTrue(in_array('šašek', $tokens));
                  $this->assertTrue(in_array('král', $tokens));
          
          Show
          Andrew Davis added a comment - Ive run the unit tests and unfortunately cannot reproduce this which will make fixing it interesting. The unit tests that are failing on qa.moodle.net are the 2nd and 3rd in this group (the two in_array() tests) $needle = " šašek, \n\n \r a král; \t" ; $tokens = testable_grading_manager::tokenize($needle); $ this ->assertEqual(2, count($tokens)); $ this ->assertTrue(in_array('šašek', $tokens)); $ this ->assertTrue(in_array('král', $tokens));
          Hide
          Sam Hemelryk added a comment -

          Hi Andrew,
          I can reproduce the error on line 137.
          A quick debug of the needle and the tokens variables shows the following:

          Array
          (
              [needle] =>     šašek, 
          
             
              a král;  	
              [tokens] => Array
                  (
                      [0] => ek
                      [1] => král
                  )
          )
          

          Let me know what other information you would like.

          Show
          Sam Hemelryk added a comment - Hi Andrew, I can reproduce the error on line 137. A quick debug of the needle and the tokens variables shows the following: Array ( [needle] => šašek, a král; [tokens] => Array ( [0] => ek [1] => král ) ) Let me know what other information you would like.
          Hide
          Andrew Davis added a comment - - edited

          Hi Sam. Can you go to /admin/phpinfo.php and copy paste the pcre and intl sections.

          My computer (UNIT TESTS PASS)

          PCRE
          PCRE (Perl Compatible Regular Expressions) Support 	enabled
          PCRE Library Version 	8.12 2011-01-15
          
          Directive	Local Value	Master Value
          pcre.backtrack_limit	100000	100000
          pcre.recursion_limit	100000	100000
          
          INTL
          not present
          

          update: I installed the intl extension and it made no difference (tests still pass)

          Internationalization support	enabled
          version 	1.1.0
          ICU version 	4.4.2 
          

          qa.moodle.net UNIT TESTS FAIL

          PCRE
          PCRE (Perl Compatible Regular Expressions) Support 	enabled
          PCRE Library Version 	6.6 06-Feb-2006
          
          Directive	Local Value	Master Value
          pcre.backtrack_limit	20971520	1000000
          pcre.recursion_limit	100000	100000
          
          intl
          Internationalization support	enabled
          version 	1.1.0
          ICU version 	3.6 
          
          Show
          Andrew Davis added a comment - - edited Hi Sam. Can you go to /admin/phpinfo.php and copy paste the pcre and intl sections. My computer (UNIT TESTS PASS) PCRE PCRE (Perl Compatible Regular Expressions) Support enabled PCRE Library Version 8.12 2011-01-15 Directive Local Value Master Value pcre.backtrack_limit 100000 100000 pcre.recursion_limit 100000 100000 INTL not present update: I installed the intl extension and it made no difference (tests still pass) Internationalization support enabled version 1.1.0 ICU version 4.4.2 qa.moodle.net UNIT TESTS FAIL PCRE PCRE (Perl Compatible Regular Expressions) Support enabled PCRE Library Version 6.6 06-Feb-2006 Directive Local Value Master Value pcre.backtrack_limit 20971520 1000000 pcre.recursion_limit 100000 100000 intl Internationalization support enabled version 1.1.0 ICU version 3.6
          Hide
          Sam Hemelryk added a comment -

          Hi Andrew,

          I've got:

          PCRE (Perl Compatible Regular Expressions) Support enabled
          PCRE Library Version 8.02 2010-03-19

          Directive Local Value Master Value
          pcre.backtrack_limit 20971520 100000
          pcre.recursion_limit 100000 100000

          Just as a heads up I'd also check the version of the Intl plugin, (especially ICU version).

          Show
          Sam Hemelryk added a comment - Hi Andrew, I've got: PCRE (Perl Compatible Regular Expressions) Support enabled PCRE Library Version 8.02 2010-03-19 Directive Local Value Master Value pcre.backtrack_limit 20971520 100000 pcre.recursion_limit 100000 100000 Just as a heads up I'd also check the version of the Intl plugin, (especially ICU version).
          Hide
          Sam Hemelryk added a comment -

          LOL sorry just saw you were asking for Intl as well!

          I don't have it installed myself, so I suppose that rules it out (unless its some mac native thing)

          Show
          Sam Hemelryk added a comment - LOL sorry just saw you were asking for Intl as well! I don't have it installed myself, so I suppose that rules it out (unless its some mac native thing)
          Hide
          Andrew Davis added a comment -

          Aparup (UNIT TESTS FAILING)

          pcre : 7.8 2008-09-05

          intl : 1.0.3
          icu : 4.2.1

          Show
          Andrew Davis added a comment - Aparup (UNIT TESTS FAILING) pcre : 7.8 2008-09-05 intl : 1.0.3 icu : 4.2.1
          Hide
          Sam Hemelryk added a comment -

          Although perhaps that is the answer, perhaps its the lack of native support combined with an old Intl plugin?

          Show
          Sam Hemelryk added a comment - Although perhaps that is the answer, perhaps its the lack of native support combined with an old Intl plugin?
          Hide
          Andrew Davis added a comment -

          the needle and tokens on my computer (unit tests pass)

          string(34) " šašek, a král; "
          
          array(2) { [0]=> string(7) "šašek" [1]=> string(5) "král" } 
          

          The same variables from Sam's machine (unit tests fail). he posted these above. repeating for side by side comparison.

          Array
          (
              [needle] =>     šašek, 
          
             
              a král;  	
              [tokens] => Array
                  (
                      [0] => ek
                      [1] => král
                  )
          )
          

          sasek is being incorrectly cut in half.

          Show
          Andrew Davis added a comment - the needle and tokens on my computer (unit tests pass) string(34) " šašek, a král; " array(2) { [0]=> string(7) "šašek" [1]=> string(5) "král" } The same variables from Sam's machine (unit tests fail). he posted these above. repeating for side by side comparison. Array ( [needle] => šašek, a král; [tokens] => Array ( [0] => ek [1] => král ) ) sasek is being incorrectly cut in half.
          Hide
          Andrew Davis added a comment -

          Aparup is now trying to get his pcre and intl php extensions to upgrade as his versions are a bit behind mine. Once that's done he'll rerun the tests and we can see if there's any change. If he can get his versions to make (or exceed) mine and he is still getting a failure then we can eliminate those extensions as the source of failure.

          Show
          Andrew Davis added a comment - Aparup is now trying to get his pcre and intl php extensions to upgrade as his versions are a bit behind mine. Once that's done he'll rerun the tests and we can see if there's any change. If he can get his versions to make (or exceed) mine and he is still getting a failure then we can eliminate those extensions as the source of failure.
          Hide
          Andrew Davis added a comment -

          Also, Im on Ubuntu 11.04 if thats relevant.

          Show
          Andrew Davis added a comment - Also, Im on Ubuntu 11.04 if thats relevant.
          Hide
          Aparup Banerjee added a comment -

          i've added ppa:ondrej/php5 to my repos list (as reccommended by Jordan)

          i've then updated to Intl version 1.1.0 and ICU version 4.2.1

          and i'm still getting failures. likely since ICU wasn't updated? not yet figured out how to update that tho.

          Show
          Aparup Banerjee added a comment - i've added ppa:ondrej/php5 to my repos list (as reccommended by Jordan) i've then updated to Intl version 1.1.0 and ICU version 4.2.1 and i'm still getting failures. likely since ICU wasn't updated? not yet figured out how to update that tho.
          Hide
          Eloy Lafuente (stronk7) added a comment - - edited

          Uhm, I've seen the /u (unicode) modifier failing before sometimes. That leaded me to implement on backup this trick that has been working ok so far:

          https://github.com/stronk7/moodle/blob/master/backup/moodle2/backup_xml_transformer.class.php#L41

          So, one quick way to test if your regexp machinery has been compiled with support to unicode is to execute, simply, this:

          php -r "var_dump(@preg_match('/\pL/u', 'a'));"

          I get one precious int(1) here, false means not supported afaik.

          Ciao

          Show
          Eloy Lafuente (stronk7) added a comment - - edited Uhm, I've seen the /u (unicode) modifier failing before sometimes. That leaded me to implement on backup this trick that has been working ok so far: https://github.com/stronk7/moodle/blob/master/backup/moodle2/backup_xml_transformer.class.php#L41 So, one quick way to test if your regexp machinery has been compiled with support to unicode is to execute, simply, this: php -r "var_dump(@preg_match('/\pL/u', 'a'));" I get one precious int(1) here, false means not supported afaik. Ciao
          Hide
          Eloy Lafuente (stronk7) added a comment -

          And, if that's the case (the servers failing the tests have not regexp cimpiled with unicode support), then there are 2 solutions:

          • Compile them with unicode support (requires recompiling php.. with some extra flags, sounds to me, not 100$ sure).
          • Find one (almost equivalent) regular expression doing the split in an utf-8 savy way. Perhaps specifying common whitespace and punctuation chars is enough in this case. But always avoid any uppercase character class (\S, \W, \D...) they are the ones leading to problems, 100% guarantied (because they match "too much" under borked unicode mode).

          Hope it helps, ciao

          Show
          Eloy Lafuente (stronk7) added a comment - And, if that's the case (the servers failing the tests have not regexp cimpiled with unicode support), then there are 2 solutions: Compile them with unicode support (requires recompiling php.. with some extra flags, sounds to me, not 100$ sure). Find one (almost equivalent) regular expression doing the split in an utf-8 savy way. Perhaps specifying common whitespace and punctuation chars is enough in this case. But always avoid any uppercase character class (\S, \W, \D...) they are the ones leading to problems, 100% guarantied (because they match "too much" under borked unicode mode). Hope it helps, ciao
          Hide
          Andrew Davis added a comment - - edited

          Unfortunately I dont think thats it. The unit tests pass for me but fail for Sam and Aparup however all three of us get int(1) when we run that unicode regex test.

          As an experiment I tried flipping the logic of the regex from looking for non-word characters

          $tokens = preg_split("/\W/u", $needle);
          

          to looking for anything thats not a word character avoiding the use of \W

          $tokens = preg_split("/[^\w]/u", $needle);
          

          Either way works for me but they both fail for Aparup

          Show
          Andrew Davis added a comment - - edited Unfortunately I dont think thats it. The unit tests pass for me but fail for Sam and Aparup however all three of us get int(1) when we run that unicode regex test. As an experiment I tried flipping the logic of the regex from looking for non-word characters $tokens = preg_split( "/\W/u" , $needle); to looking for anything thats not a word character avoiding the use of \W $tokens = preg_split( "/[^\w]/u" , $needle); Either way works for me but they both fail for Aparup
          Hide
          Andrew Davis added a comment -

          Im adding a master branch. There are two commits. One fills out the tokenize() unit tests a little. The other provides a failure string so that anyone seeing the failure at least has a clue as to what is going on.

          Show
          Andrew Davis added a comment - Im adding a master branch. There are two commits. One fills out the tokenize() unit tests a little. The other provides a failure string so that anyone seeing the failure at least has a clue as to what is going on.
          Hide
          Aparup Banerjee added a comment - - edited

          This (patch) looks fine to me.

          As we haven't really found the root cause here and this is really an interim solution, perhaps we should create a follow up issue (clone?) to have this dealt with in future. A TODO with the MDL follow up issue seems in order here.

          edit: i've looked around (eg:http://www.regular-expressions.info/unicode.html) and it seems that \W doesn't have a proper unicode equivalent - we need to break that group down a bit and define them more explicitly and in a unicode friendly way to '\p'

          Show
          Aparup Banerjee added a comment - - edited This (patch) looks fine to me. As we haven't really found the root cause here and this is really an interim solution, perhaps we should create a follow up issue (clone?) to have this dealt with in future. A TODO with the MDL follow up issue seems in order here. edit: i've looked around (eg: http://www.regular-expressions.info/unicode.html ) and it seems that \W doesn't have a proper unicode equivalent - we need to break that group down a bit and define them more explicitly and in a unicode friendly way to '\p'
          Hide
          Andrew Davis added a comment -

          Ive raised MDL-30519 to come back and take another look at this.

          Show
          Andrew Davis added a comment - Ive raised MDL-30519 to come back and take another look at this.
          Hide
          Andrew Davis added a comment -

          The commits dont apply cleanly to 2.1. Rather than figuring out why on release eve Ive raised MDL-30520 to backport the commits.

          Show
          Andrew Davis added a comment - The commits dont apply cleanly to 2.1. Rather than figuring out why on release eve Ive raised MDL-30520 to backport the commits.
          Hide
          Aparup Banerjee added a comment -

          Thanks, integrated. hopefully we can have this really resolved in the follow up issues.

          (faster than lightening due to MdR's request for haste)

          Show
          Aparup Banerjee added a comment - Thanks, integrated. hopefully we can have this really resolved in the follow up issues. (faster than lightening due to MdR's request for haste)
          Hide
          Aparup Banerjee added a comment -

          passed, its failing better now!

          Show
          Aparup Banerjee added a comment - passed, its failing better now!
          Hide
          Eloy Lafuente (stronk7) added a comment -

          Re Aparup: \W uses to be changed to [\p

          {Space}

          \p

          {Punctuation}

          ].

          Also there is a more detailed implementation @ http://stackoverflow.com/questions/4304928/unicode-equivalents-for-w-and-b-in-java-regular-expressions (it is required by Java, but applicable to any regexp machinery supporting unicode).

          Perhaps that's is the problem that, while it seems that your regexp allows unicode, it has a "restrictive" interpretation of \W (detecting too many chars as non-word). In that case the replacement to [\p

          {Z}

          \p

          {P}

          ] should work.

          In general the rule should be to avoid any uppercase char-class, because they are negative classes, leading to excess of matching when working with unicode. Lowercase ones are safer and, of course, own-unicode classes too (if the test passes).

          Ciao

          Show
          Eloy Lafuente (stronk7) added a comment - Re Aparup: \W uses to be changed to [\p {Space} \p {Punctuation} ]. Also there is a more detailed implementation @ http://stackoverflow.com/questions/4304928/unicode-equivalents-for-w-and-b-in-java-regular-expressions (it is required by Java, but applicable to any regexp machinery supporting unicode). Perhaps that's is the problem that, while it seems that your regexp allows unicode, it has a "restrictive" interpretation of \W (detecting too many chars as non-word). In that case the replacement to [\p {Z} \p {P} ] should work. In general the rule should be to avoid any uppercase char-class, because they are negative classes, leading to excess of matching when working with unicode. Lowercase ones are safer and, of course, own-unicode classes too (if the test passes). Ciao
          Hide
          Aparup Banerjee added a comment -

          ok i've tested more , especially with $tokens = preg_split("/[\p

          {Z}

          \p

          {P}

          ]/u", $needle);
          i get
          Arrayn(n [0] => xc5xa1axc5xa1ekn [1] => nnn [2] => krxc3xa1ln)
          hence line 138 fails when trying to testing for a count of 2.

          This means the unicode bit is working fine, but i've no idea about why we're dropping 1 character tokens here. if we drop 3 that would fix the unit test but very surely break other things.

          Show
          Aparup Banerjee added a comment - ok i've tested more , especially with $tokens = preg_split("/[\p {Z} \p {P} ]/u", $needle); i get Arrayn(n [0] => xc5xa1axc5xa1ekn [1] => nnn [2] => krxc3xa1ln) hence line 138 fails when trying to testing for a count of 2. This means the unicode bit is working fine, but i've no idea about why we're dropping 1 character tokens here. if we drop 3 that would fix the unit test but very surely break other things.
          Hide
          Eloy Lafuente (stronk7) added a comment - - edited

          preg_split('#[\pZ\pP\pC]#u', $needle, 0, PREG_SPLIT_NO_EMPTY)

          Show
          Eloy Lafuente (stronk7) added a comment - - edited preg_split('# [\pZ\pP\pC] #u', $needle, 0, PREG_SPLIT_NO_EMPTY)
          Hide
          Aparup Banerjee added a comment -

          the above ^^ fails for me : 1/1 test cases complete: 14 passes, 11 fails and 26 exceptions.

          (added David as watcher)

          using $tokens = preg_split("#[\pZ\pP\pC]#u", $needle);
          fails at
          /home/aparup/mcode/integration/grade/grading/simpletest/testlib.php line 159 & 160 (integration.git) :
          $this->assertTrue(in_array('span', $tokens)); // Extracted the tag name
          $this->assertTrue(in_array('Aha', $tokens));

          So really its not clear what we want the unit tests here to do. definitively needs some clarification there.

          anyway, this issue is closed now. I don't see this unit test failure as a blocker for QA/2.2 Release as its the units test itself that is the problem.
          Hopefully the follow up issue will help resolve these tests clearly and soon.

          Show
          Aparup Banerjee added a comment - the above ^^ fails for me : 1/1 test cases complete: 14 passes, 11 fails and 26 exceptions. (added David as watcher) using $tokens = preg_split("# [\pZ\pP\pC] #u", $needle); fails at /home/aparup/mcode/integration/grade/grading/simpletest/testlib.php line 159 & 160 (integration.git) : $this->assertTrue(in_array('span', $tokens)); // Extracted the tag name $this->assertTrue(in_array('Aha', $tokens)); So really its not clear what we want the unit tests here to do. definitively needs some clarification there. anyway, this issue is closed now. I don't see this unit test failure as a blocker for QA/2.2 Release as its the units test itself that is the problem. Hopefully the follow up issue will help resolve these tests clearly and soon.
          Hide
          Michael de Raadt added a comment -

          I've removed this as a blocker for the linked QA issue.

          It looks like progress is still being made towards solving this mystery. Good stuff.

          Show
          Michael de Raadt added a comment - I've removed this as a blocker for the linked QA issue. It looks like progress is still being made towards solving this mystery. Good stuff.
          Hide
          Eloy Lafuente (stronk7) added a comment -

          Sent upstream! Just in time for Moodle 2.2rc1 (if related), yay!

          Closing and big thanks!

          Show
          Eloy Lafuente (stronk7) added a comment - Sent upstream! Just in time for Moodle 2.2rc1 (if related), yay! Closing and big thanks!
          Hide
          Ashley Holman added a comment -

          This is still failing for me with 2.3.1, running PHP 5.3.3. The problem appears to be caused by this bug: https://bugs.php.net/bug.php?id=52971

          \w only matches ASCII characters, even in unicode mode. \W would presumably match anything that's not \w, so it matches the unicode characters, thus splitting on them.

          Show
          Ashley Holman added a comment - This is still failing for me with 2.3.1, running PHP 5.3.3. The problem appears to be caused by this bug: https://bugs.php.net/bug.php?id=52971 \w only matches ASCII characters, even in unicode mode. \W would presumably match anything that's not \w, so it matches the unicode characters, thus splitting on them.
          Hide
          Ashley Holman added a comment -

          Under PHP 5.4 however, it works fine.

          Show
          Ashley Holman added a comment - Under PHP 5.4 however, it works fine.
          Hide
          Aparup Banerjee added a comment -

          Thanks for that Ashley, i've mentioned your comment in MDL-30519.

          Show
          Aparup Banerjee added a comment - Thanks for that Ashley, i've mentioned your comment in MDL-30519 .

            People

            • Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: