Moodle
  1. Moodle
  2. MDL-29390 DB layer improvements 2.2 META
  3. MDL-29322

oracle driver creates varchar size in bytes instead of utf-8 chars

    Details

    • Type: Sub-task Sub-task
    • Status: Closed
    • Priority: Critical Critical
    • Resolution: Fixed
    • Affects Version/s: 2.0, 2.1, 2.2
    • Fix Version/s: 2.2
    • Component/s: Database SQL/XMLDB
    • Labels:
    • Testing Instructions:
      Hide

      0) This requires access as DBA to grant some perms and install PL/SQL packages.

      1) Install Moodle 2.2. under Oracle => no error should happen

      2) Upgrade Moodle 2.1 to 2.2 under Oracle => no error should happen

      3) Run the DB functional tests: Some failures related to column lengths may happen.

      Notes for 3):

      • More exactly the errors happen in one test_add_field() and one test_temp_tables() test, both asserting values related with max_length.
      • Only those and the concat failure should happen (don't forget to install the moodle PL/SQL package or more failures will be shown).
      • Finally, note that if MDL-29415 has been integrated too, then no failures related to column lengths will happen anymore.
      Show
      0) This requires access as DBA to grant some perms and install PL/SQL packages. 1) Install Moodle 2.2. under Oracle => no error should happen 2) Upgrade Moodle 2.1 to 2.2 under Oracle => no error should happen 3) Run the DB functional tests: Some failures related to column lengths may happen. Notes for 3): More exactly the errors happen in one test_add_field() and one test_temp_tables() test, both asserting values related with max_length. Only those and the concat failure should happen (don't forget to install the moodle PL/SQL package or more failures will be shown). Finally, note that if MDL-29415 has been integrated too, then no failures related to column lengths will happen anymore.
    • Affected Branches:
      MOODLE_20_STABLE, MOODLE_21_STABLE, MOODLE_22_STABLE
    • Fixed Branches:
      MOODLE_22_STABLE
    • Pull from Repository:
    • Pull Master Branch:
      w37_MDL-29322_m22_oraclebytes
    • Rank:
      18857

      Description

      VARCHAR2(255) should be in fact VARCHAR2(255 CHAR) in order to prevent problems on servers with default configuration.

      Sources of information:
      http://ss64.com/ora/syntax-datatypes.html
      http://download.oracle.com/docs/cd/B19306_01/server.102/b14200/sql_elements001.htm
      http://download.oracle.com/docs/cd/B19306_01/server.102/b14225/ch3globenv.htm#i1008522

        Issue Links

          Activity

          Hide
          Petr Škoda added a comment -

          This should be probably cherry picked to all stable branches.

          Existing sites are not fixed, people should use dbtransfer tool to fix the table structure (I would personally recommend to give up on Oracle completely.)

          Show
          Petr Škoda added a comment - This should be probably cherry picked to all stable branches. Existing sites are not fixed, people should use dbtransfer tool to fix the table structure (I would personally recommend to give up on Oracle completely.)
          Hide
          Eloy Lafuente (stronk7) added a comment -

          Uhm, I'm not sure that length*3 is the solution here (there are DBs our there running under the correct semantics, I've recommended it more than once). That way we'll lost the track completely.

          We have had already reports about this issue and the solution is to set proper semantics (NLS) at system / session and/or switch to nvarchar/ntext.

          That's something I've had in my mind since ages ago but always delaying it to the moment of adding support of varchar(>255) like the point to implement it.

          Show
          Eloy Lafuente (stronk7) added a comment - Uhm, I'm not sure that length*3 is the solution here (there are DBs our there running under the correct semantics, I've recommended it more than once). That way we'll lost the track completely. We have had already reports about this issue and the solution is to set proper semantics (NLS) at system / session and/or switch to nvarchar/ntext. That's something I've had in my mind since ages ago but always delaying it to the moment of adding support of varchar(>255) like the point to implement it.
          Hide
          Petr Škoda added a comment - - edited

          My new unittests prove it is the solution unless I did set up my test server incorrectly. Please point me to detailed set-up instructions for my XE and some Oracle docs that explains this more.

          I do not care what happens to oracle at all, I just do not want this sh.. to block my other work on database layer.

          (the unittests are here: https://github.com/skodak/moodle/compare/w36_MDL-29314_m22_dbindexes...w36_MDL-29313_m22_varcharenforce)

          Show
          Petr Škoda added a comment - - edited My new unittests prove it is the solution unless I did set up my test server incorrectly. Please point me to detailed set-up instructions for my XE and some Oracle docs that explains this more. I do not care what happens to oracle at all, I just do not want this sh.. to block my other work on database layer. (the unittests are here: https://github.com/skodak/moodle/compare/w36_MDL-29314_m22_dbindexes...w36_MDL-29313_m22_varcharenforce )
          Hide
          Petr Škoda added a comment -

          If I understand it correctly the NLS affects the actual limit when VARCHAR2 (255 CHAR) or NVARCHAR2 (255) used, at present we still use VARCHAR2 where VARCHAR(255*3 BYTE) should be safe in any NLS configuration. The only problem is that the field may store longer texts than expected if only ascii used.

          In any case we should imo use either BYTE or CHAR in current VARCHAR2 because the default is bytes, you can override it to CHARs in server settings. Maybe the "Configuring Apache" section in http://docs.moodle.org/20/en/Installing_Oracle_for_PHP sets up correct UTF-8 aware NLS in Linux, but there is nothing said about Windows.

          Looks like the explanation at http://download.oracle.com/docs/cd/B19306_01/server.102/b14225/ch3globenv.htm#i1008522
          seems to confirm I did my patch right.

          I am going to do some more testing now...

          Show
          Petr Škoda added a comment - If I understand it correctly the NLS affects the actual limit when VARCHAR2 (255 CHAR) or NVARCHAR2 (255) used, at present we still use VARCHAR2 where VARCHAR(255*3 BYTE) should be safe in any NLS configuration. The only problem is that the field may store longer texts than expected if only ascii used. In any case we should imo use either BYTE or CHAR in current VARCHAR2 because the default is bytes, you can override it to CHARs in server settings. Maybe the "Configuring Apache" section in http://docs.moodle.org/20/en/Installing_Oracle_for_PHP sets up correct UTF-8 aware NLS in Linux, but there is nothing said about Windows. Looks like the explanation at http://download.oracle.com/docs/cd/B19306_01/server.102/b14225/ch3globenv.htm#i1008522 seems to confirm I did my patch right. I am going to do some more testing now...
          Hide
          Petr Škoda added a comment - - edited

          God bless unittests! The VARCHAR2(255 CHAR) works fine on my test server, yay! I have changed the patch to use CHAR insead of the more backwards compatible 3*BYTE option.

          Show
          Petr Škoda added a comment - - edited God bless unittests! The VARCHAR2(255 CHAR) works fine on my test server, yay! I have changed the patch to use CHAR insead of the more backwards compatible 3*BYTE option.
          Hide
          Eloy Lafuente (stronk7) added a comment - - edited

          Petr, when I talk about NLS I'm refering to use:

          ALTER session set NLS_LENGTH_SEMANTICS = 'CHAR'
          

          and done, no need to harcode/specifiy it on each DDL.

          It also can be set @ system (ALTER system/init.ora) level and sounds to me that, also there were one php.ini setting controlling it but, for sure, specifying it on connection (session) is 100% safe (we already configure other things that way).

          Also, if I'm not wrong, it would be relatively easy to build one script able to change all current BYTE column specs to CHAR and offer it to the masses. Surely one thing to add to the XMLDB editor "reports/helpers".

          Ciao

          Show
          Eloy Lafuente (stronk7) added a comment - - edited Petr, when I talk about NLS I'm refering to use: ALTER session set NLS_LENGTH_SEMANTICS = 'CHAR' and done, no need to harcode/specifiy it on each DDL. It also can be set @ system (ALTER system/init.ora) level and sounds to me that, also there were one php.ini setting controlling it but, for sure, specifying it on connection (session) is 100% safe (we already configure other things that way). Also, if I'm not wrong, it would be relatively easy to build one script able to change all current BYTE column specs to CHAR and offer it to the masses. Surely one thing to add to the XMLDB editor "reports/helpers". Ciao
          Hide
          Petr Škoda added a comment -

          Every extra "ALTER session set" executed in each http request has performance cost, no?
          The BYTE/CHAR specification has even higher preference than your session statement and no perf cost, right?

          Show
          Petr Škoda added a comment - Every extra "ALTER session set" executed in each http request has performance cost, no? The BYTE/CHAR specification has even higher preference than your session statement and no perf cost, right?
          Hide
          Eloy Lafuente (stronk7) added a comment - - edited

          1) Performance cost, well, sure it is not 0ms, but not far from there. Also note we would need that only on install/upgrade. Not for normal operations.
          2) Higher preference, well, it's not about preference. It's about how columns are created by default (without specifying byte/char). Simply the default semantic.

          In any case, if we are going to modify this, perhaps we should change all our varchar2() types to nvarchar2() and done. It's more correct and it uses CHAR semantics by definition (you cannot apply BYTE semantic to one nvarchar2 column).

          Also, it seems that the script to make the conversion is pretty straightforward (no need to rebuild indexes and friends):

          
          create table testlen (
              varchar2byte  varchar2(10 byte),
              varchar2char  varchar2(10 char),
              nvarchar2char nvarchar2(10),
          constraint pktest primary key (varchar2byte));
          
          create unique index testleninx on testlen (varchar2byte, varchar2char, nvarchar2char);
          
          select * from col where tname = 'TESTLEN';
          
          insert into testlen (varchar2byte, varchar2char, nvarchar2char) values ('€', 'áéíóú', '日本日本日本日本日本');
          
          alter table testlen modify varchar2byte nvarchar2(20);
          alter table testlen modify varchar2char nvarchar2(20);
          alter table testlen modify nvarchar2char nvarchar2(20);
          
          select * from col where tname = 'TESTLEN';
          
          desc testlen;
          
          select * from testlen;
          
          drop table testlen;
          
          
          Show
          Eloy Lafuente (stronk7) added a comment - - edited 1) Performance cost, well, sure it is not 0ms, but not far from there. Also note we would need that only on install/upgrade. Not for normal operations. 2) Higher preference, well, it's not about preference. It's about how columns are created by default (without specifying byte/char). Simply the default semantic. In any case, if we are going to modify this, perhaps we should change all our varchar2() types to nvarchar2() and done. It's more correct and it uses CHAR semantics by definition (you cannot apply BYTE semantic to one nvarchar2 column). Also, it seems that the script to make the conversion is pretty straightforward (no need to rebuild indexes and friends): create table testlen ( varchar2byte varchar2(10 byte ), varchar2char varchar2(10 char ), nvarchar2char nvarchar2(10), constraint pktest primary key (varchar2byte)); create unique index testleninx on testlen (varchar2byte, varchar2char, nvarchar2char); select * from col where tname = 'TESTLEN'; insert into testlen (varchar2byte, varchar2char, nvarchar2char) values ('€', 'áéíóú', '日本日本日本日本日本'); alter table testlen modify varchar2byte nvarchar2(20); alter table testlen modify varchar2char nvarchar2(20); alter table testlen modify nvarchar2char nvarchar2(20); select * from col where tname = 'TESTLEN'; desc testlen; select * from testlen; drop table testlen;
          Hide
          Petr Škoda added a comment -

          I do not like the idea of different init on some pages, that is very very bad imho.

          NVARCHAR2(255) is fine for me, I am going try it now.

          Show
          Petr Škoda added a comment - I do not like the idea of different init on some pages, that is very very bad imho. NVARCHAR2(255) is fine for me, I am going try it now.
          Hide
          Petr Škoda added a comment -

          BLOODY ORACLE!!!

          VARCHAR2(2048 CHAR) is ok, but NVARCHAR2(2048) throws error!

          Show
          Petr Škoda added a comment - BLOODY ORACLE!!! VARCHAR2(2048 CHAR) is ok, but NVARCHAR2(2048) throws error!
          Hide
          Petr Škoda added a comment - - edited

          I have switched it back to VARCHAR2(xx CHAR) because it is more compatible with current code that requires server wide change of defaults from BYTE--->CHAR.

          The NVARCHAR2 would lock us out from increasing our char size above 1333, with the current hack we can go up to 4000 without installation errors, Moodle would simply error out the same way it does now when you reach the 255 char limit. Statistically it should not be any real problem for all Western and some asian languages.

          I still vote for at least 2048 limit for out chars and simply keep ignoring any potential problems in oracle when used for non-western languages.

          Show
          Petr Škoda added a comment - - edited I have switched it back to VARCHAR2(xx CHAR) because it is more compatible with current code that requires server wide change of defaults from BYTE--->CHAR. The NVARCHAR2 would lock us out from increasing our char size above 1333, with the current hack we can go up to 4000 without installation errors, Moodle would simply error out the same way it does now when you reach the 255 char limit. Statistically it should not be any real problem for all Western and some asian languages. I still vote for at least 2048 limit for out chars and simply keep ignoring any potential problems in oracle when used for non-western languages.
          Hide
          Petr Škoda added a comment -

          Please note that some fields are expected to contain ascii only data - such as user preferences, limiting to 1333 does not make much sense imo...

          Show
          Petr Škoda added a comment - Please note that some fields are expected to contain ascii only data - such as user preferences, limiting to 1333 does not make much sense imo...
          Hide
          Eloy Lafuente (stronk7) added a comment - - edited
          create table a000 (id number(2), content varchar2(2048 CHAR));
          insert into a000 (id, content) select 1, rpad('€', 10, '€') from dual;
          insert into a000 (id, content) select 2, rpad('€', 100, '€') from dual;
          insert into a000 (id, content) select 3, rpad('€', 1000, '€') from dual;
          insert into a000 (id, content) select 4, rpad('€', 1333, '€') from dual;
          insert into a000 (id, content) select 5, rpad('€', 2048, '€') from dual;
          insert into a000 (id, content) select 6, rpad('€', 10000, '€') from dual;
          
          select id, length(content) from a000;
          
          drop table A000;
          

          Results:

          ID                     LENGTH(CONTENT)        
          ---------------------- ---------------------- 
          1                      10                     
          2                      100                    
          3                      1000                   
          4                      1333                   
          5                      1334                   
          6                      1334  
          

          -1 to pass the 1333 barrier. If you prefer a rounded number let's use 1024 for now. It's already 4x current max allowed. Or 1280 (5x).

          But, results above show the sort of problems that will happen once the real (1333 for UTF8) limit is passed. No matter if it's probable or no. It is possible 100%.

          Show
          Eloy Lafuente (stronk7) added a comment - - edited create table a000 (id number(2), content varchar2(2048 CHAR)); insert into a000 (id, content) select 1, rpad('€', 10, '€') from dual; insert into a000 (id, content) select 2, rpad('€', 100, '€') from dual; insert into a000 (id, content) select 3, rpad('€', 1000, '€') from dual; insert into a000 (id, content) select 4, rpad('€', 1333, '€') from dual; insert into a000 (id, content) select 5, rpad('€', 2048, '€') from dual; insert into a000 (id, content) select 6, rpad('€', 10000, '€') from dual; select id, length(content) from a000; drop table A000; Results: ID LENGTH(CONTENT) ---------------------- ---------------------- 1 10 2 100 3 1000 4 1333 5 1334 6 1334 -1 to pass the 1333 barrier. If you prefer a rounded number let's use 1024 for now. It's already 4x current max allowed. Or 1280 (5x). But, results above show the sort of problems that will happen once the real (1333 for UTF8) limit is passed. No matter if it's probable or no. It is possible 100%.
          Hide
          Petr Škoda added a comment -

          But still the 1024 is not going to eliminate problems with data that does not fit at present, you are saying that it is better to break it for all four databases instead of just one? Alternative solution is to use text fields, but that will create even more problems for oracle and mssql. We should imo choose the lesser evil and that is to ignore potential problems of 2048length chars in oracle and have it 100% working in all other supported databases.

          Show
          Petr Škoda added a comment - But still the 1024 is not going to eliminate problems with data that does not fit at present, you are saying that it is better to break it for all four databases instead of just one? Alternative solution is to use text fields, but that will create even more problems for oracle and mssql. We should imo choose the lesser evil and that is to ignore potential problems of 2048length chars in oracle and have it 100% working in all other supported databases.
          Hide
          Petr Škoda added a comment - - edited

          In any case 1333 is an improvement and the current patch allows us to go up to 4000 chars (ascii only in case of oracle). We can discuss >1333 sizes later.

          My +1 for 1333 now and lets discuss the 2048/4000 later.

          Show
          Petr Škoda added a comment - - edited In any case 1333 is an improvement and the current patch allows us to go up to 4000 chars (ascii only in case of oracle). We can discuss >1333 sizes later. My +1 for 1333 now and lets discuss the 2048/4000 later.
          Hide
          Eloy Lafuente (stronk7) added a comment - - edited

          Oki, I've been reviewing the FOUR (linked) issues (ordered list below):

          http://tracker.moodle.org/browse/MDL-29322
          http://tracker.moodle.org/browse/MDL-29314
          http://tracker.moodle.org/browse/MDL-29321
          http://tracker.moodle.org/browse/MDL-29313

          And I think they are perfect. BUT I consider that this (the change of semantics on Oracle) requires, for sure:

          1) document it both in oracle and release notes docs.
          2) build one report able to provide the statements to convert the whole DB to the new CHAR semantics.

          The question is:

          1) Should I create two blocker followups for 1) and 2) and then integrate these 4.
          2) Should I keep these 4 opened until 1) and 2) are implemented / done.

          I hate this complexity, grrr. Ciao

          Show
          Eloy Lafuente (stronk7) added a comment - - edited Oki, I've been reviewing the FOUR (linked) issues (ordered list below): http://tracker.moodle.org/browse/MDL-29322 http://tracker.moodle.org/browse/MDL-29314 http://tracker.moodle.org/browse/MDL-29321 http://tracker.moodle.org/browse/MDL-29313 And I think they are perfect. BUT I consider that this (the change of semantics on Oracle) requires, for sure: 1) document it both in oracle and release notes docs. 2) build one report able to provide the statements to convert the whole DB to the new CHAR semantics. The question is: 1) Should I create two blocker followups for 1) and 2) and then integrate these 4. 2) Should I keep these 4 opened until 1) and 2) are implemented / done. I hate this complexity, grrr. Ciao
          Hide
          Eloy Lafuente (stronk7) added a comment - - edited

          so...

          1) create followups for the documentation and the migration script and integrate this (and the 3 dependent ones)

          or

          2) keep this (and the 3 dependent ones) open till the docs and the migration script are ready

          Your decision, ciao

          Show
          Eloy Lafuente (stronk7) added a comment - - edited so... 1) create followups for the documentation and the migration script and integrate this (and the 3 dependent ones) or 2) keep this (and the 3 dependent ones) open till the docs and the migration script are ready Your decision, ciao
          Hide
          Petr Škoda added a comment -

          You love Oracle - you decide....

          Show
          Petr Škoda added a comment - You love Oracle - you decide....
          Hide
          Eloy Lafuente (stronk7) added a comment -

          Delaying the 4 issues for next week.

          Show
          Eloy Lafuente (stronk7) added a comment - Delaying the 4 issues for next week.
          Hide
          Eloy Lafuente (stronk7) added a comment -

          The main moodle.git repository has just been updated with latest weekly modifications. You may wish to rebase your PULL branches to simplify history and avoid any possible merge conflicts. This would also make integrator's life easier next week.

          TIA and ciao

          Show
          Eloy Lafuente (stronk7) added a comment - The main moodle.git repository has just been updated with latest weekly modifications. You may wish to rebase your PULL branches to simplify history and avoid any possible merge conflicts. This would also make integrator's life easier next week. TIA and ciao
          Hide
          Eloy Lafuente (stronk7) added a comment -

          Doh,

          was working in the XMLDB report to allow the change of all columns from BYTE to CHAR when I detected that the change causes the dictionary view "COL" to return the new length in bytes always.

          So we also need to change oci_native_moodle_database->get_columns() to fetch the length in chars from user_tab_columns->char_length instead of col->width in order to get things consistent in get_columns(). Else I bet there are a lot of tests that will fail.

          Ciao

          Show
          Eloy Lafuente (stronk7) added a comment - Doh, was working in the XMLDB report to allow the change of all columns from BYTE to CHAR when I detected that the change causes the dictionary view "COL" to return the new length in bytes always. So we also need to change oci_native_moodle_database->get_columns() to fetch the length in chars from user_tab_columns->char_length instead of col->width in order to get things consistent in get_columns(). Else I bet there are a lot of tests that will fail. Ciao
          Hide
          Eloy Lafuente (stronk7) added a comment -

          I've created:

          • MDL-29415 : About to fix the get_columns() method for oracle and add some more tests.
          • MDL-29416 : About to provide one xmldb-editor report to allow people to switch to CHAR semantics and document it.
          Show
          Eloy Lafuente (stronk7) added a comment - I've created: MDL-29415 : About to fix the get_columns() method for oracle and add some more tests. MDL-29416 : About to provide one xmldb-editor report to allow people to switch to CHAR semantics and document it.
          Hide
          Eloy Lafuente (stronk7) added a comment -

          Integrated, MDL-29415 will fix the regressions caused by this. Thanks!

          Show
          Eloy Lafuente (stronk7) added a comment - Integrated, MDL-29415 will fix the regressions caused by this. Thanks!
          Hide
          Aparup Banerjee added a comment -

          installation (2.1 and also 2.2) no problems. 2.1->2.2 no problems.

          the mentioned tests are passing for me too, all good for me!

          Show
          Aparup Banerjee added a comment - installation (2.1 and also 2.2) no problems. 2.1->2.2 no problems. the mentioned tests are passing for me too, all good for me!
          Hide
          Eloy Lafuente (stronk7) added a comment -

          And this code has been spread to all Moodle git and cvs repositories. Many thanks! Closing.

          Ciao

          Show
          Eloy Lafuente (stronk7) added a comment - And this code has been spread to all Moodle git and cvs repositories. Many thanks! Closing. Ciao

            People

            • Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: