Moodle
  1. Moodle
  2. MDL-37858

Use 'tricky data' in tests by default

    Details

    • Rank:
      47608

      Description

      As suggested by Tim in the chat:
      https://moodle.org/local/chatlogs/index.php?conversationid=12049

      We should use 'tricky data' in tests. E.g. wide unicode characters, crazy stuff which is accepted (like > and <). < etc. Basically, the most unusual data we should see should be used and tested against then we will hopefully find problems before our users.

      Ideally this would be tied to some magic data generator, so test writers just write (excuse my lack of cucumber :-P)

      'I write some long text in a forum post'.

      Not I write 'これは日本語じゃありません &lt > < foobar'

        Issue Links

          Activity

          Hide
          Tim Hunt added a comment -

          More nastly string suggestsions in this forum thread: https://moodle.org/mod/forum/discuss.php?d=221394

          Show
          Tim Hunt added a comment - More nastly string suggestsions in this forum thread: https://moodle.org/mod/forum/discuss.php?d=221394
          Hide
          David Monllaó added a comment -

          At the moment we only have tests to ensure that the steps definitions are working properly but once we begin automating the the MDLQA tests we can use a set of nasty strings in regular basis, I like the idea to have all the historically (or usually) strings-related problems together in a string and use it in all the QA tests, we will need a set of these strings because most of the time we will assert against them and probably use more than one at the same time.

          Copying the list you wrote:

          • Characters that have special meaning in HTML.
          • The escaped equivalents of those.
          • Single and double quote characters, and backslashes.
          • Back-slash escaped versions of those.
          • Unicode characters outside the usual ASCII range. (That is Japan in Japanese.)
          • Various other characters that occasionally get used with special meantings.

          To write a test with one of these values we can use something like Given I fill in "Description" with "$NASTYSTRING1" I don't like the idea to mix tests writers language, more human-friendly, with $ or programming specific characters, but is important to use this contents so it will be worth.

          Also all this var/value replacements should be managed in a centralized way to keep the creation of new steps definitions as easy as we can, maybe I can modify the step hooking the process at the step level, that would make things a lot easier.

          Show
          David Monllaó added a comment - At the moment we only have tests to ensure that the steps definitions are working properly but once we begin automating the the MDLQA tests we can use a set of nasty strings in regular basis, I like the idea to have all the historically (or usually) strings-related problems together in a string and use it in all the QA tests, we will need a set of these strings because most of the time we will assert against them and probably use more than one at the same time. Copying the list you wrote: Characters that have special meaning in HTML. The escaped equivalents of those. Single and double quote characters, and backslashes. Back-slash escaped versions of those. Unicode characters outside the usual ASCII range. (That is Japan in Japanese.) Various other characters that occasionally get used with special meantings. To write a test with one of these values we can use something like Given I fill in "Description" with "$NASTYSTRING1" I don't like the idea to mix tests writers language, more human-friendly, with $ or programming specific characters, but is important to use this contents so it will be worth. Also all this var/value replacements should be managed in a centralized way to keep the creation of new steps definitions as easy as we can, maybe I can modify the step hooking the process at the step level, that would make things a lot easier.
          Hide
          Joseph Rézeau added a comment -

          What about adding to that list: non-breaking space, colon, semi-colon and maybe other punctuation characters?

          Show
          Joseph Rézeau added a comment - What about adding to that list: non-breaking space, colon, semi-colon and maybe other punctuation characters?
          Hide
          David Monllaó added a comment - - edited

          Thanks Joseph, I've also added SOFT HYPHEN (commonly abbreviated as SHY)

          I've added more than one string (10 in total) but all of them based on the same set of characters; they can be accessed though a new class which receives a key and returns a nasty string, storing the key-nastystring mapping for future references (assertions against this values for example) This class can also be reused from phpunit (lib/testing/classes/nasty_strings.php)

          List of characters contained in the nasty strings:

          < IS, hex: 3c, decimal: 60
            IS, hex: 20, decimal: 32
          > IS, hex: 3e, decimal: 62
            IS, hex: 20, decimal: 32
          & IS, hex: 26, decimal: 38
            IS, hex: 20, decimal: 32
          & IS, hex: 26, decimal: 38
          l IS, hex: 6c, decimal: 108
          t IS, hex: 74, decimal: 116
          ; IS, hex: 3b, decimal: 59
            IS, hex: 20, decimal: 32
          & IS, hex: 26, decimal: 38
          g IS, hex: 67, decimal: 103
          t IS, hex: 74, decimal: 116
          ; IS, hex: 3b, decimal: 59
            IS, hex: 20, decimal: 32
          & IS, hex: 26, decimal: 38
          a IS, hex: 61, decimal: 97
          m IS, hex: 6d, decimal: 109
          p IS, hex: 70, decimal: 112
          ; IS, hex: 3b, decimal: 59
            IS, hex: 20, decimal: 32
          ' IS, hex: 27, decimal: 39
            IS, hex: 20, decimal: 32
          \ IS, hex: 5c, decimal: 92
          " IS, hex: 22, decimal: 34
            IS, hex: 20, decimal: 32
          \ IS, hex: 5c, decimal: 92
            IS, hex: 20, decimal: 32
          ' IS, hex: 27, decimal: 39
          $ IS, hex: 24, decimal: 36
          @ IS, hex: 40, decimal: 64
          N IS, hex: 4e, decimal: 78
          U IS, hex: 55, decimal: 85
          L IS, hex: 4c, decimal: 76
          L IS, hex: 4c, decimal: 76
          @ IS, hex: 40, decimal: 64
          $ IS, hex: 24, decimal: 36
            IS, hex: 20, decimal: 32
          @ IS, hex: 40, decimal: 64
          @ IS, hex: 40, decimal: 64
          T IS, hex: 54, decimal: 84
          E IS, hex: 45, decimal: 69
          S IS, hex: 53, decimal: 83
          T IS, hex: 54, decimal: 84
          @ IS, hex: 40, decimal: 64
          @ IS, hex: 40, decimal: 64
            IS, hex: 20, decimal: 32
          \ IS, hex: 5c, decimal: 92
          \ IS, hex: 5c, decimal: 92
          " IS, hex: 22, decimal: 34
            IS, hex: 20, decimal: 32
          \ IS, hex: 5c, decimal: 92
            IS, hex: 20, decimal: 32
          , IS, hex: 2c, decimal: 44
            IS, hex: 20, decimal: 32
          ; IS, hex: 3b, decimal: 59
            IS, hex: 20, decimal: 32
          : IS, hex: 3a, decimal: 58
            IS, hex: 20, decimal: 32
          . IS, hex: 2e, decimal: 46
            IS, hex: c2a0, decimal: 49824
            IS, hex: 20, decimal: 32
            IS, hex: 20, decimal: 32
          日 IS, hex: e697a5, decimal: 15112101
          本 IS, hex: e69cac, decimal: 15113388
          語 IS, hex: e8aa9e, decimal: 15248030
            IS, hex: 20, decimal: 32
          ­ IS, hex: c2ad, decimal: 49837
            IS, hex: 20, decimal: 32
          % IS, hex: 25, decimal: 37
            IS, hex: 20, decimal: 32
          % IS, hex: 25, decimal: 37
          % IS, hex: 25, decimal: 37
            IS, hex: 20, decimal: 32
          
          Show
          David Monllaó added a comment - - edited Thanks Joseph, I've also added SOFT HYPHEN (commonly abbreviated as SHY) I've added more than one string (10 in total) but all of them based on the same set of characters; they can be accessed though a new class which receives a key and returns a nasty string, storing the key-nastystring mapping for future references (assertions against this values for example) This class can also be reused from phpunit (lib/testing/classes/nasty_strings.php) List of characters contained in the nasty strings: < IS, hex: 3c, decimal: 60 IS, hex: 20, decimal: 32 > IS, hex: 3e, decimal: 62 IS, hex: 20, decimal: 32 & IS, hex: 26, decimal: 38 IS, hex: 20, decimal: 32 & IS, hex: 26, decimal: 38 l IS, hex: 6c, decimal: 108 t IS, hex: 74, decimal: 116 ; IS, hex: 3b, decimal: 59 IS, hex: 20, decimal: 32 & IS, hex: 26, decimal: 38 g IS, hex: 67, decimal: 103 t IS, hex: 74, decimal: 116 ; IS, hex: 3b, decimal: 59 IS, hex: 20, decimal: 32 & IS, hex: 26, decimal: 38 a IS, hex: 61, decimal: 97 m IS, hex: 6d, decimal: 109 p IS, hex: 70, decimal: 112 ; IS, hex: 3b, decimal: 59 IS, hex: 20, decimal: 32 ' IS, hex: 27, decimal: 39 IS, hex: 20, decimal: 32 \ IS, hex: 5c, decimal: 92 " IS, hex: 22, decimal: 34 IS, hex: 20, decimal: 32 \ IS, hex: 5c, decimal: 92 IS, hex: 20, decimal: 32 ' IS, hex: 27, decimal: 39 $ IS, hex: 24, decimal: 36 @ IS, hex: 40, decimal: 64 N IS, hex: 4e, decimal: 78 U IS, hex: 55, decimal: 85 L IS, hex: 4c, decimal: 76 L IS, hex: 4c, decimal: 76 @ IS, hex: 40, decimal: 64 $ IS, hex: 24, decimal: 36 IS, hex: 20, decimal: 32 @ IS, hex: 40, decimal: 64 @ IS, hex: 40, decimal: 64 T IS, hex: 54, decimal: 84 E IS, hex: 45, decimal: 69 S IS, hex: 53, decimal: 83 T IS, hex: 54, decimal: 84 @ IS, hex: 40, decimal: 64 @ IS, hex: 40, decimal: 64 IS, hex: 20, decimal: 32 \ IS, hex: 5c, decimal: 92 \ IS, hex: 5c, decimal: 92 " IS, hex: 22, decimal: 34 IS, hex: 20, decimal: 32 \ IS, hex: 5c, decimal: 92 IS, hex: 20, decimal: 32 , IS, hex: 2c, decimal: 44 IS, hex: 20, decimal: 32 ; IS, hex: 3b, decimal: 59 IS, hex: 20, decimal: 32 : IS, hex: 3a, decimal: 58 IS, hex: 20, decimal: 32 . IS, hex: 2e, decimal: 46   IS, hex: c2a0, decimal: 49824 IS, hex: 20, decimal: 32 IS, hex: 20, decimal: 32 日 IS, hex: e697a5, decimal: 15112101 本 IS, hex: e69cac, decimal: 15113388 語 IS, hex: e8aa9e, decimal: 15248030 IS, hex: 20, decimal: 32 ­ IS, hex: c2ad, decimal: 49837 IS, hex: 20, decimal: 32 % IS, hex: 25, decimal: 37 IS, hex: 20, decimal: 32 % IS, hex: 25, decimal: 37 % IS, hex: 25, decimal: 37 IS, hex: 20, decimal: 32
          Hide
          David Monllaó added a comment -

          Requesting peer review.

          Note that this issue is on top of MDL-37912 as it requires it to pass the tests.

          I've finally used step arguments transformations which is more suitable than a before-step hook, also with this we can remove all fixStepArgument() used around all other steps definitions.

          I've attached a PHP script to generate the list of characters used.

          Show
          David Monllaó added a comment - Requesting peer review. Note that this issue is on top of MDL-37912 as it requires it to pass the tests. I've finally used step arguments transformations which is more suitable than a before-step hook, also with this we can remove all fixStepArgument() used around all other steps definitions. I've attached a PHP script to generate the list of characters used.
          Hide
          Tim Hunt added a comment -

          I don't really understand Behat well enough to peer review this, but it looks quite good to me.

          I particularly like the way that you were able to remove so many lines of code like $this->fixStepArgument($section);

          Show
          Tim Hunt added a comment - I don't really understand Behat well enough to peer review this, but it looks quite good to me. I particularly like the way that you were able to remove so many lines of code like $this->fixStepArgument($section);
          Hide
          Dan Poltawski added a comment -

          Skimmed an noticed a blank line at the top of lib/tests/behat/behat_transformations.php.

          Show
          Dan Poltawski added a comment - Skimmed an noticed a blank line at the top of lib/tests/behat/behat_transformations.php.
          Hide
          David Monllaó added a comment -

          Thanks Dan, I've removed the ugly blank line

          Show
          David Monllaó added a comment - Thanks Dan, I've removed the ugly blank line
          Hide
          David Monllaó added a comment -

          Note for integrators and peer reviewers: MDL-37858 requires both MDL-38029 and MDL-37912 changes, to avoid conflicts when merging (there would be a conflict) I've added MDL-37912 on top of MDL-38029 so the hierarchy is:

          1. MDL-38029
          2. MDL-37912
          3. MDL-37858
          Show
          David Monllaó added a comment - Note for integrators and peer reviewers: MDL-37858 requires both MDL-38029 and MDL-37912 changes, to avoid conflicts when merging (there would be a conflict) I've added MDL-37912 on top of MDL-38029 so the hierarchy is: MDL-38029 MDL-37912 MDL-37858
          Hide
          Dan Poltawski added a comment - - edited

          David - please can you remember to setup a blocker relationship when you set them up blocked like that.

          Show
          Dan Poltawski added a comment - - edited David - please can you remember to setup a blocker relationship when you set them up blocked like that.
          Hide
          David Monllaó added a comment -

          np, thanks, changing links

          Show
          David Monllaó added a comment - np, thanks, changing links
          Hide
          Dan Poltawski added a comment -

          Integrated to master - thanks

          Show
          Dan Poltawski added a comment - Integrated to master - thanks
          Hide
          Rajesh Taneja added a comment -

          No test done for this, as this will be run on nightly machine.

          Show
          Rajesh Taneja added a comment - No test done for this, as this will be run on nightly machine.
          Hide
          Damyon Wiese added a comment -

          Thanks for your hard work - this issue has made it! Moodle is now a little bit better.

          Regards, Damyon

          Show
          Damyon Wiese added a comment - Thanks for your hard work - this issue has made it! Moodle is now a little bit better. Regards, Damyon
          Hide
          David Monllaó added a comment -

          After writing a few tests I think that we can not use the nasty strings as much as we'd like. The main reason is that when we use a nasty string in a step like I fill "Field name" with "$NASTYSTRING" the "Field name" field can apply a PARAM_TEXT or a PARAM_CLEAN, PARAM_RAW.... filter, when we assert against this $NASTYSTRING values in form text fields (for example with the "Field name" field should contain "$NASTYSTRING") it can work, but only if there is no filter applied (PARAM_RAW) Also when we expect to find the $NASTYSTRINGs contents embedded in HTML as part of a link or HTML content (for example I should see "$NASTYSTRING" or I click on "$NASTYSTRING") the HTML entity codes like & are rendered as the entity, this would be a minor issue that could be solved with a html_entity_decode() in a few places, but, adding to this, browsers transforms multiple %20%20%20 into one single and transforms (or not depending on the case) non-breaking spaces into regular %20 spaces, this without mentioning the previous filter that would be applied when submitting the form with the nasty string on it. When writing tests for headless browsers (without JS) the nasty strings are still very useful as there are no rendering issues like in regular browsers.

          Another important point is that test writers will not know about the implementation details, so only the code developer can write tests using this nasty strings vars because they know where they can find what kind of filter and they can make good use of them looking for specific problems. The other option we have is to remove a few of those nasty characters (nightmare characters I would call them after spending a couple of days with this issues) to allow this strings "skip" the derived problems in most of the cases, but I know that we are losing the main purpose of the nasty strings doing this.

          I would propose two approaches:

          • Try getting rid of HTML entities and non-breaking spaces, and continue using $NASTYSTRINGs as much as we can
          • Restrict it's usage to specific cases like asserting field contents filtered with PARAM_RAW and checking that the current usage of nasty strings in core is perfectly valid

          Please comment on it, right now I have to continue writing tests for 2.5 release and I can not invest more time on looking for complex solutions.

          Show
          David Monllaó added a comment - After writing a few tests I think that we can not use the nasty strings as much as we'd like. The main reason is that when we use a nasty string in a step like I fill "Field name" with "$NASTYSTRING" the "Field name" field can apply a PARAM_TEXT or a PARAM_CLEAN, PARAM_RAW.... filter, when we assert against this $NASTYSTRING values in form text fields (for example with the "Field name" field should contain "$NASTYSTRING" ) it can work, but only if there is no filter applied (PARAM_RAW) Also when we expect to find the $NASTYSTRINGs contents embedded in HTML as part of a link or HTML content (for example I should see "$NASTYSTRING" or I click on "$NASTYSTRING" ) the HTML entity codes like & are rendered as the entity, this would be a minor issue that could be solved with a html_entity_decode() in a few places, but, adding to this, browsers transforms multiple %20%20%20 into one single and transforms (or not depending on the case) non-breaking spaces into regular %20 spaces, this without mentioning the previous filter that would be applied when submitting the form with the nasty string on it. When writing tests for headless browsers (without JS) the nasty strings are still very useful as there are no rendering issues like in regular browsers. Another important point is that test writers will not know about the implementation details, so only the code developer can write tests using this nasty strings vars because they know where they can find what kind of filter and they can make good use of them looking for specific problems. The other option we have is to remove a few of those nasty characters (nightmare characters I would call them after spending a couple of days with this issues) to allow this strings "skip" the derived problems in most of the cases, but I know that we are losing the main purpose of the nasty strings doing this. I would propose two approaches: Try getting rid of HTML entities and non-breaking spaces, and continue using $NASTYSTRINGs as much as we can Restrict it's usage to specific cases like asserting field contents filtered with PARAM_RAW and checking that the current usage of nasty strings in core is perfectly valid Please comment on it, right now I have to continue writing tests for 2.5 release and I can not invest more time on looking for complex solutions.

            People

            • Votes:
              1 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: