Issue Details (XML | Word | Printable)

Key: MDL-17358
Type: Bug Bug
Status: Closed Closed
Resolution: Not a bug
Priority: Major Major
Assignee: Tim Hunt
Reporter: Nicolas Martignoni
Votes: 11
Watchers: 8
Operations

Add/Edit UI Mockup to this issue
If you were logged in you would be able to see more operations.
Moodle

Non-breaking space in a lang string after $a will not work due to PHP wiredness

Created: 22/Nov/08 08:17 AM   Updated: 12/Apr/09 09:08 PM
Return to search
Component/s: Languages, Lib
Affects Version/s: 1.9.3
Fix Version/s: None

File Attachments: 1. File test.php (0.7 kB)
2. File test2.php (2 kB)
3. File test3.php (2 kB)
4. File test4.php (1 kB)
5. File test5.php (3 kB)

Issue Links:
Relates
 

Participants: Dan Poltawski, David Mudrak, Eloy Lafuente (stronk7), Helen Foster, Koen Roggemans, Nicolas Martignoni and Tim Hunt
Security Level: None
QA Assignee: Nicolas Martignoni
Resolved date: 10/Apr/09
Affected Branches: MOODLE_19_STABLE


 Description  « Hide
The sprintf() function used in get_string_from_file() (moodlelib.php) doesn't parse correctly all UTF-8 chars, notably multibyte ones.

As a consequence, lang strings cannot use UTF-8 chars like the non-breaking space.

 All   Comments   Change History   Version Control      Sort Order: Ascending order - Click to sort in descending order
Helen Foster added a comment - 03/Dec/08 06:54 PM
VIP watchers added.

Nicolas Martignoni added a comment - 24/Jan/09 10:06 PM
See http://www.php.net/manual/en/function.sprintf.php#55837 for a possible solution. But what about performance?

Nicolas Martignoni added a comment - 24/Jan/09 10:21 PM

Eloy Lafuente (stronk7) added a comment - 26/Jan/09 04:02 AM
Hi Nicolas... can you execute the attached test in your server to see what it returns? It's a simple string, containing one utf-8 no-break space (C2A0) and here, being processed by sprintf() doesn't cause apparently any problem.

Ciao


Eloy Lafuente (stronk7) made changes - 26/Jan/09 04:02 AM
Field Original Value New Value
Attachment test.php [ 16137 ]
David Mudrak added a comment - 26/Jan/09 06:19 AM
Hi Eloy. Your test.php passes at my web server as well. However there are some more examples at the pages referenced by Nicolas where this issue should appear. If I understood it correctly (haven't spent much time on it), the problem is with the percentage sign contained in multibyte strings code.

Nicolas Martignoni added a comment - 26/Jan/09 06:33 AM
Hi Eloy and thanks for looking at this. Just like David, your test file runs correctly on my servers.

However the problem occurs with embedded vars ($a) in strings. Please have a look at the test2.php file I attach here now. It describes the problem I experience. Thanks in advance for your feedback.


Nicolas Martignoni made changes - 26/Jan/09 06:33 AM
Attachment test2.php [ 16138 ]
Eloy Lafuente (stronk7) added a comment - 26/Jan/09 07:25 AM
Bah!

Forgot my last comment!!! It's sprintf(). Bloody sprintf(). Deleting previous comment!


Eloy Lafuente (stronk7) made changes - 26/Jan/09 07:25 AM
Comment [ Yup,

it renders incorrect results... but I think it isn't the sprintf() function but the "eval" one. I just echoed the $result variable after get_string_from_file() and I get:

result: $resultstring= sprintf("Connecté sous le nom « $a »");

that seems to be correct.

And echoing results after eval() I get the borked string. Bloody eval!

]
Eloy Lafuente (stronk7) added a comment - 26/Jan/09 07:41 AM
aha!

I think I've found a workaround:

This is current string to evaluate, that is broken:

$resultstring= sprintf("Connecté sous le nom « $a »");

But this seems to be properly evaluated:

$resultstring= sprintf("Connecté sous le nom « ". $a. " »");

Silly trick but seems that we could, simply, replace all the $a (or $a->whatever) by " . $a . ", and that way, forcing the concatenation... results seem to be properly processed by sprintf/eval. Stupid PHP after all.

My only concern is about the cost of applying that replace (regular expression) to all the strings all the time. Could be too much.


Eloy Lafuente (stronk7) added a comment - 26/Jan/09 07:44 AM
Hehe, another workaround that is properly processed too:

$resultstring= sprintf("Connecté sous le nom « {$a} »");

Idiot PHP. :-D


Eloy Lafuente (stronk7) added a comment - 26/Jan/09 07:54 AM
Attached the test3.php example that uses the {$a} trick to allow sprintf()/eval() to work properly in the example given.

Ciao


Eloy Lafuente (stronk7) made changes - 26/Jan/09 07:54 AM
Attachment test3.php [ 16139 ]
Eloy Lafuente (stronk7) added a comment - 26/Jan/09 08:13 AM
Just for reference, some execution times, to compare:
  • 1000 strings processed, without the regular expression: 0.032 seconds
  • 1000 strings processed, with the regular expression: 0.039 seconds

Difference: 0.007 seconds (21% slower)

  • 100000 strings processed, without the regular expression: 3.365 seconds
  • 100000 strings processed, with the regular expression: 4.114 seconds

Difference: 0.749 seconds (22% slower)

Note that this is a comparison between the execution of test2.php and test3.php above, processing the same "in-memory" string 1000 and 100000 times.

I guess differences in real world (where strings are "in-file", hence slower access) will be really smaller. More yet, 0.007 seconds for 1000 strings seems to be a viable solution if finally implemented.

Ciao


David Mudrak added a comment - 26/Jan/09 04:07 PM
Or, we can make sure that $a parameters are properly quoted by the curly brackets at the time of writing the string via the translation tool. Let us check this condition into the syntax checker, recently published by Petr (was it he?). No need to to search and replace on the fly.
Translators not-using the translation UI should run the checker before they commit anyway.

Nicolas Martignoni added a comment - 26/Jan/09 05:49 PM
After running test3.php, I confirm that the solution with {$a} works correctly for me too Bloody PHP indeed

The grep solution would work out of the box for a small performance cost. With the checker solcution, there's a risk for the translator to forget the check before committing (I know myself ).

In any case, both solution will be better than actual situation.


Nicolas Martignoni added a comment - 26/Jan/09 05:55 PM
For the grep solution, I'll suggest to take potential {} in the lang strings:

return preg_replace('/{?(\$a(->\w+)?)}?/', '{\\1}', $temp);

or even

return preg_replace('/{(\$a(->\w+)?)}/', '{\1}}', $temp);


Koen Roggemans added a comment - 26/Jan/09 07:17 PM
Petr's script is in MDL-14152

Please include it in moodle/admin with all the other nice testscripts (might be the time to create a folder for them )


Eloy Lafuente (stronk7) added a comment - 27/Jan/09 03:18 AM
Hi,

I think that we shouldn't make translators life harder here, by forcing them to know about those {} internals at all. IMO, if the times are aceptable, we should grep in code and leave them free from complexities.

Only if times aren't aceptable I'd convert/produce that in the editor.

Just my opinion.


David Mudrak added a comment - 27/Jan/09 03:51 AM
What about a pre-commit CVS hook script to search and replace all language files before they are actually saved into the repository. All I am leading towards is to do this thing once instead of doing it everytime we load the strings.

Eloy Lafuente (stronk7) added a comment - 27/Jan/09 03:57 AM
But that doesn't work for local changes and so on, plus the idea of modifying files in the commit process, wow. -10

IMO it should be handled from core. The only thing is to measure carefully the cost before doing that. Hopefully, some day, PHP won't bug there, so the workaround will be erased safely.

Ciao


David Mudrak added a comment - 27/Jan/09 04:21 AM
Yeah, make sense... So what about sanity this by cron before generating the lang pack zip archives? Local modifications should be OK because master pack will already contain {$a} form so admin will preserve such format. Just a final idea...

Nicolas Martignoni added a comment - 27/Jan/09 04:43 AM
Yes Eloy, I agree with you. +1 from me (if it counts) to handle this in core, if the cost is not too high.

And re PHP bug: it seems that php6 will be UTF-8 compatible, so there's some hope here.

Ciao


Nicolas Martignoni added a comment - 29/Jan/09 06:50 AM
Eloy, do you have an estimate date for pushing your fix into CVS, and if it's a candidate for backporting in 1.8.x and 1.9.x?

Ciao!


Eloy Lafuente (stronk7) added a comment - 29/Jan/09 07:58 AM
Well, the commit is a matter of a few seconds... but it seems that there are two alternatives:

1)- apply the patch and leave moodle to process it, leaving languages unmodified.
2)- don't apply the patch, updating all languages and editor to use {} plus sanity checks, leaving moodle unmodified.

My +1 is about to implement 1), apply the patch and forget modifications in langs.

But I'd like to hear some more votes here before doing that.

Ciao


Eloy Lafuente (stronk7) added a comment - 10/Feb/09 07:30 PM
Adding Martin and Tim here.

Martin, after some HQ chat about this, Tim detected that that sprintf() has been there since the beginning. See:

http://cvs.moodle.org/moodle/lib/moodlelib.php?view=diff&pathrev=MOODLE_19_STABLE&r1=1.25&r2=1.26

Do you remember any case where it is necessary? It seems that directly eval() the string is enough to perform all the $a(whatever) parsing.

Also, Tim comments about to use one manual solution for translators having the problem as far as it doesn't affect too many strings, instead of preg_replace always the evaled sentence.


Eloy Lafuente (stronk7) added a comment - 10/Feb/09 08:01 PM
Added test 4 that:
  • Shows no differences between using sprintf() and not using it.
  • Shows that the original preg_replace() isn't valid for strings like «$a» (without spaces).
  • Shows that adding the /u (unicode) modifier to the preg_replace() it processes the strings properly.

I any case, things to decide:

1) Erase sprintf() use if we don't need it. HEAD only for now + unit tests. Backported later.
2) Automated vs. manual {} parsing.

Ciao


Eloy Lafuente (stronk7) made changes - 10/Feb/09 08:01 PM
Attachment test4.php [ 16243 ]
Tim Hunt added a comment - 08/Apr/09 07:35 PM
FYI, I have removed the sprintf in HEAD. Would be good if you could re-test there.

Nicolas Martignoni added a comment - 10/Apr/09 02:21 AM
New file test5.php shows that the problem is not sprintf() function, but eval(). Silly PHP

{$a} workaround still applies.

Is there any alternative to eval() for detecting parse errors?


Nicolas Martignoni made changes - 10/Apr/09 02:21 AM
Attachment test5.php [ 16833 ]
Tim Hunt added a comment - 10/Apr/09 09:46 AM
I'll look into this, since I have been messing with get_string.

If eval is really not Unicode-safe, then we may have to resort to writing our own parser to replace $a and $a->xxx in lang strings. Actually, that should be an easy preg_replace. The only thing to be careful about is to know whether we are relying on eval to do anything else for us. I guess it also replaces \" and \$ with " and $ respectively.


Tim Hunt made changes - 10/Apr/09 09:46 AM
Component/s Lib [ 10096 ]
Component/s General [ 10049 ]
Fix Version/s 2.0 [ 10122 ]
Assignee moodle.com [ moodle.com ] Tim Hunt [ timhunt ]
Tim Hunt made changes - 10/Apr/09 12:37 PM
Summary Some UTF-8 chars in lang strings are not supported Non-breaking space in a lang string after $a will not work due to PHP wiredness
Tim Hunt added a comment - 10/Apr/09 12:37 PM
OK, you have to RTFM (http://au2.php.net/manual/en/language.variables.basics.php) and then bang your head against the desk at the stupidity of PHP.

Non-breaking space (character \xa0) is a legal character in a variable name, so, if you are insane you could write code like

$user name = "Nicolas Martignoni";
echo 'This is the output of Moodle:<br />';
echo get_string('loggedinas', 'moodle', $user name);

Anyway, that explains why '$a »' does not work as expected. It is looking for a variable called '$a ' (a followed by non-breaking-space) which obviously does not exist.

The correct work-around is to define

$string['loggedinas'] = 'Connecté sous le nom « {$a} »';

That is, use curly braces to separate the variable from the following character, just like you would say

$string['distancerun'] = 'You have run {$a}km.';

to avoid referring to the variable $akm.

So, this is resolve not-a-bug, but translators need to be aware. I'll post in the Languages forum.


Tim Hunt made changes - 10/Apr/09 12:37 PM
Resolution Not a bug [ 7 ]
Fix Version/s 2.0 [ 10122 ]
Status Open [ 1 ] Resolved [ 5 ]
Nicolas Martignoni added a comment - 10/Apr/09 04:15 PM
Thanks Tim, I was getting mad, as I couldn't understand what was going here. What a stupidity...

Nicolas Martignoni added a comment - 10/Apr/09 04:19 PM
Indeed not a bug, just a silly PHP convention.

Nicolas Martignoni made changes - 10/Apr/09 04:19 PM
Status Resolved [ 5 ] Closed [ 6 ]
QA Assignee mina
Dan Poltawski added a comment - 10/Apr/09 07:30 PM
Wow, that is ridiculous - is it worth greping through all the language packs and ensuring that we use braces everywhere in the lang packs?

Koen Roggemans added a comment - 12/Apr/09 05:16 PM
Is it possible to add to the language pack editor to add curly brackets arround $a if they are not already there? That way, opening and saving a language file would fix all $a strings in one go. New errors are not possible anymore, old ones, if they exist, are easy to fix and will be fixed over time automatically.

Tim Hunt added a comment - 12/Apr/09 09:08 PM
Sounds like a good idea, but I think that changing the editing tool is separate from the underlying issue discussed here, so I created MDL-18841.

Tim Hunt made changes - 12/Apr/09 09:08 PM
Link This issue has a non-specific relationship to MDL-18841 [ MDL-18841 ]
martignoni committed 53 files to 'Lang CVS' - 14/Apr/09 04:41 PM
MDL-17358 Replaced &nbsp; entities with utf-8 non-breaking spaces and other tweaks
MODIFY fr_ca_utf8/help/resource/program.html   Rev. 1.3    (+0 -5 lines)
MODIFY fr_ca_utf8/help/html.html   Rev. 1.2    (+0 -5 lines)
MODIFY fr_ca_utf8/help/quiz/formatcustom.html   Rev. 1.2    (+0 -5 lines)
MODIFY fr_ca_utf8/help/quiz/shortanswer.html   Rev. 1.3    (+0 -5 lines)
MODIFY fr_ca_utf8/moodle.php   Rev. 1.5    (+2 -2 lines)
MODIFY fr_ca_utf8/help/glossary/studentcanpost.html   Rev. 1.2    (+0 -7 lines)
MODIFY fr_ca_utf8/help/quiz/repeatattempts.html   Rev. 1.2    (+0 -5 lines)
MODIFY fr_ca_utf8/help/glossary/usedynalink.html   Rev. 1.3    (+0 -5 lines)
MODIFY fr_ca_utf8/help/quiz/formatblackboard.html   Rev. 1.2    (+0 -5 lines)
MODIFY fr_ca_utf8/help/workshop/leaguetable.html   Rev. 1.3    (+0 -5 lines)
MODIFY fr_ca_utf8/help/workshop/specimen.html   Rev. 1.3    (+0 -5 lines)
MODIFY fr_ca_utf8/assignment.php   Rev. 1.4    (+4 -4 lines)
MODIFY fr_ca_utf8/help/workshop/graded.html   Rev. 1.3    (+0 -5 lines)
MODIFY fr_ca_utf8/help/workshop/gradingsubmissions.html   Rev. 1.3    (+0 -5 lines)
MODIFY fr_ca_utf8/help/workshop/managing.html   Rev. 1.3    (+0 -5 lines)
MODIFY fr_ca_utf8/help/quiz/questiontypes.html   Rev. 1.2    (+0 -12 lines)
MODIFY fr_ca_utf8/help/workshop/addingacomment.html   Rev. 1.3    (+0 -5 lines)
MODIFY fr_ca_utf8/help/workshop/calculatingfinalgrade.html   Rev. 1.3    (+0 -5 lines)
MODIFY fr_ca_utf8/help/workshop/includeteachersgrade.html   Rev. 1.2    (+0 -5 lines)
MODIFY fr_ca_utf8/help/glossary/mainglossary.html   Rev. 1.3    (+0 -7 lines)
MODIFY fr_ca_utf8/help/glossary/globalglossary.html   Rev. 1.3    (+0 -5 lines)
MODIFY fr_ca_utf8/help/forum/allowdiscussions.html   Rev. 1.3    (+0 -5 lines)
MODIFY fr_ca_utf8/help/workshop/breakdownoffinalgrade.html   Rev. 1.3    (+0 -5 lines)
MODIFY fr_ca_utf8/help/glossary/displayformat.html   Rev. 1.3    (+0 -7 lines)
MODIFY fr_ca_utf8/help/workshop/elements.html   Rev. 1.3    (+0 -5 lines)
MODIFY fr_ca_utf8/help/resource/resourcetype.html   Rev. 1.3    (+0 -5 lines)
MODIFY fr_ca_utf8/help/workshop/anonymous.html   Rev. 1.3    (+0 -5 lines)
MODIFY fr_ca_utf8/help/workshop/nassessmentsofstudentsubmissions.html   Rev. 1.2    (+0 -5 lines)
MODIFY fr_ca_utf8/help/forum/attachment.html   Rev. 1.3    (+0 -7 lines)
MODIFY fr_ca_utf8/help/quiz/review.html   Rev. 1.2    (+0 -5 lines)
MODIFY fr_ca_utf8/help/Attic/text.html   Rev. 1.3    (+0 -5 lines)
MODIFY fr_ca_utf8/help/forum/index.html   Rev. 1.2    (+0 -5 lines)
MODIFY fr_ca_utf8/help/quiz/formatmissingword.html   Rev. 1.2    (+0 -5 lines)
MODIFY fr_ca_utf8/help/quiz/index.html   Rev. 1.2    (+0 -5 lines)
MODIFY fr_ca_utf8/help/workshop/assignmenttype.html   Rev. 1.3    (+0 -5 lines)
MODIFY fr_ca_utf8/help/quiz/createmultiple.html   Rev. 1.3    (+0 -5 lines)
MODIFY fr_ca_utf8/help/forum/ratings.html   Rev. 1.2    (+0 -7 lines)
MODIFY fr_ca_utf8/help/resource/summary.html   Rev. 1.3    (+0 -5 lines)
MODIFY fr_ca_utf8/help/assignment/resubmit.html   Rev. 1.2    (+0 -5 lines)
MODIFY fr_ca_utf8/help/quiz/multichoice.html   Rev. 1.3    (+0 -5 lines)
MODIFY fr_ca_utf8/help/workshop/gradingassessments.html   Rev. 1.3    (+0 -5 lines)
MODIFY fr_ca_utf8/help/quiz/import.html   Rev. 1.3    (+0 -7 lines)
MODIFY fr_ca_utf8/help/glossary/mods.html   Rev. 1.2    (+0 -7 lines)
MODIFY fr_ca_utf8/help/quiz/categories.html   Rev. 1.3    (+0 -5 lines)
MODIFY fr_ca_utf8/help/workshop/gradingstrategy.html   Rev. 1.3    (+0 -5 lines)
MODIFY fr_ca_utf8/help/workshop/index.html   Rev. 1.2    (+0 -5 lines)
MODIFY fr_ca_utf8/help/assignment/assignmenttype.html   Rev. 1.3    (+0 -5 lines)
MODIFY fr_ca_utf8/help/forum/forumtype.html   Rev. 1.3    (+0 -5 lines)
MODIFY fr_ca_utf8/help/quiz/numerical.html   Rev. 1.3    (+0 -5 lines)
MODIFY fr_ca_utf8/help/forum/subscription.html   Rev. 1.3    (+0 -5 lines)
MODIFY fr_ca_utf8/help/glossary/linkcategory.html   Rev. 1.3    (+0 -7 lines)
MODIFY fr_ca_utf8/help/survey/mods.html   Rev. 1.2    (+1 -8 lines)
MODIFY fr_ca_utf8/help/quiz/multianswer.html   Rev. 1.3    (+0 -8 lines)
martignoni committed 5 files to 'Lang CVS' - 14/Apr/09 06:25 PM
MDL-17358 Replaced &nbsp; entities with utf-8 non-breaking spaces
MODIFY fr_ca_utf8/admin.php   Rev. 1.3    (+2 -2 lines)
MODIFY fr_ca_utf8/moodle.php   Rev. 1.6    (+3 -3 lines)
MODIFY fr_ca_utf8/assignment.php   Rev. 1.5    (+4 -4 lines)
MODIFY fr_ca_utf8/wiki.php   Rev. 1.2    (+2 -2 lines)
MODIFY fr_ca_utf8/chat.php   Rev. 1.3    (+3 -3 lines)