One big inform of the preliminary steps until here can be found at:
Basically, we are using both the TDS and/or the ODBTP "drivers" because Moodle is 100% UTF-8, MSSQL is 100% UCS-2 and they provide us with transparent char conversion between them.
Some hours ago, I was testing how Moodle performs with MSSQL when I mixed, in the same field, some Spanish UTF-8 chars with some Japanese UTF-8 chars (the string, exactly is: "áéíóúß????") and then I got some ??? from DB, then it was supposed that was working perfectly.
After triple-checking that everything was ok in the client side, I began to debug a bit more all the process. By using the the SQL*Server Profiler I was able to see how that string was arriving to MSSQL and, surprise, I was able to see the correct "áéíóúß????" value in that tool.
But, at the end, this was inserted instead: "áéíóúß????" (note that the three Japanese chars - although they are proper UCS-2 chars - are transformed to question marks).
Then I changed SQL*server collation to "Japanese" and then, the string inserted was: "aeiou?????", i.e. all the normal latin chars without the accents, plus two ?? marks for the symbols plus the Japanese ideograms properly stored.
Then I did one more test, setting the SQL*Server collation to "latin2" and I introduced some Czech characters in the string above. Only characters in the "latin2" charset were respected, the rest, once more got converted to "??".
If those NCHAR, NVARCHAR and NTEXT DB fields are specially prepared to store UCS-2 data, and we are sending UCS-2 data to them, it seems pretty incorrect that the server was performing some conversion based in the "collation" info.
I've spent some hours looking and reading for a solution for the problem (being able to send UCS-2 data mixing different encodings) and I must say that I've found, at least, two alternatives.
- Prepend the N char to all the "constant strings" present in all the queries.
- Use prepared statements everywhere.
Both this solutions (specially the second one) seems to work properly (attached example script to run under Moodle) but perhaps it would be too much for Moodle 1.7 because the number and complexity of places to modify.
So, perhaps we should try to discover if there is some alternative to skip such "internal" conversion that MSSQL is doing with data that is 100% correct UCS-2. Perhaps some server locales will allow all the data, or some switch will disable that "feature"....
The main effect of all this issue is that, although we are running under UTF-8 and that mix of chars is perfectly allowed, in the practice sites running MSSQL won't be able to mix such characters because some of them (those not matching the collation) will be lost.