Uploaded image for project: 'Moodle'
  1. Moodle
  2. MDL-72342

Group import from CSV is broken by BOM

    XMLWordPrintable

    Details

    • Affected Branches:
      MOODLE_310_STABLE, MOODLE_311_STABLE, MOODLE_39_STABLE, MOODLE_400_STABLE
    • Fixed Branches:
      MOODLE_310_STABLE, MOODLE_311_STABLE
    • Pull 3.10 Branch:
      MDL-72432-MOODLE_310_STABLE
    • Pull 3.11 Branch:
      MDL-72432-MOODLE_311_STABLE
    • Pull Master Branch:
      MDL-72432-master
    • Workaround:
      Hide

      Remove BOM from the input CSV file is a workaround, but correctly processing BOM is a better option.

      Show
      Remove BOM from the input CSV file is a workaround, but correctly processing BOM is a better option.
    • Difficulty:
      Easy
    • Testing Instructions:
      Hide

      The test is done manually. It aims to check that the CSV file in UTF-8 encoding is correctly processed. In particular, the Byte Order Mark, which is a 3-byte header is appended to the name of the first column causing column name mismatch and importing errors. For instance, if the first column is expected to be groupname the header string will be \EF\BB\BFgroupname, which is not right.

      The following steps need to be performed to test the patch:

      1. Create a test course (if the attached csv is used the course name should be "test course 01")
      2. Go to Course -> Users -> Groups
      3. Click Import Groups
      4. Choose grouptest.csv file (file is attached to the ticket) in the file picker. Alternatively, another csv file can be picked, but to test the correct BOM processing the csv file needs BOM header bytes.
      5. CSV delimiter should be comma (,) and Encoding should be UTF-8 (which is default)
      6. Click "Import"
      7. Verify that no errors are displayed and the data is imported correctly

       

      Show
      The test is done manually. It aims to check that the CSV file in UTF-8 encoding is correctly processed. In particular, the Byte Order Mark, which is a 3-byte header is appended to the name of the first column causing column name mismatch and importing errors. For instance, if the first column is expected to be  groupname the header string will be  \EF\BB\BFgroupname , which is not right. The following steps need to be performed to test the patch: Create a test course (if the attached csv is used the course name should be " test course 01 ") Go to Course -> Users -> Groups Click Import Groups Choose grouptest.csv file (file is attached to the ticket) in the file picker. Alternatively, another csv file can be picked, but to test the correct BOM processing the csv file needs BOM header bytes. CSV delimiter should be comma ( , ) and Encoding should be UTF-8 (which is default) Click " Import " Verify that no errors are displayed and the data is imported correctly  

      Description

      Group import from CSV can fail when the CSV file is in UTF-8 and is containing the Byte Order Mark (EF BB BF) at the beginning of the file.

      Steps to reproduce:

      1. Create a test course (if the attached csv is used the course name should be "test course 01")
      2. Go to Course -> Users -> Groups
      3. Click Import Groups
      4. Choose grouptest.csv file (file is attached to the ticket) in the file picker. Alternatively, another csv file can be picked, but to test the correct BOM processing the csv file needs BOM header bytes.
      5. CSV delimiter should be comma (,) and Encoding should be UTF-8 (which is default)
      6. Click "Import"
      7. The error is as follows:

      "groupname" is not a valid field name
      Debug info:
      Error code: invalidfieldname
      Stack trace: * line 498 of /lib/setuplib.php: moodle_exception thrown

      • line 103 of /group/import.php: call to print_error()
        The logic of import.php trims spaces and quotes, but it does not take into account the BOM, which is neither space nor quote sequence. It just uses the variable $rawlines, which starts as "\EF\BB\BFgroupname,coursename":

       

      $header = explode($csvimport::get_delimiter($delimiter), array_shift($rawlines));

       

      Hence $header[0] is "\EF\BB\BFgroupname", which is incorrect and it does not pass the check causing the error.

      The correct way of obtaining the header information would be to use the method get_columns() from csv_import_reader class. The method load_csv_content()  in __ csv_import_reader  class __ trims BOM in the first line allowing to process the headers in the right way.

      The fix is very simple:

      ------------------------------- group/import.php -------------------------------index 58f405bd0cd..b50f880d738 100644
      @@ -61,8 +61,6 @@ if ($importform->is_cancelled()) {
           $text = $importform->get_file_content('userfile');
           $text = preg_replace('!\r\n?!', "\n", $text);
       
      -    $rawlines = explode("\n", $text);
      -
           require_once($CFG->libdir . '/csvlib.class.php');
           $importid = csv_import_reader::get_new_iid('groupimport');
           $csvimport = new csv_import_reader($importid, 'groupimport');
      @@ -95,7 +93,9 @@ if ($importform->is_cancelled()) {
               );
       
           // --- get header (field names) ---
      -    $header = explode($csvimport::get_delimiter($delimiter), array_shift($rawlines));
      +    // Using get_columns() ensures the Byte Order Mark is removed.
      +    $header = $csvimport->get_columns();
      +
           // check for valid field names
           foreach ($header as $i => $h) {
               $h = trim($h); $header[$i] = $h; // remove whitespace
      
      

        Attachments

        1. group_import_error.png
          group_import_error.png
          73 kB
        2. grouptest.csv
          0.0 kB
        3. MDL-72342_Test Passed.PNG
          MDL-72342_Test Passed.PNG
          45 kB

          Activity

            People

            Assignee:
            scottverbeek Scott Verbeek
            Reporter:
            katerynadegtyariova Kateryna Degtyariova
            Peer reviewer:
            Scott Verbeek Scott Verbeek
            Integrator:
            Eloy Lafuente (stronk7) Eloy Lafuente (stronk7)
            Tester:
            Gladys Basiana Gladys Basiana
            Participants:
            Component watchers:
            Andrew Lyons, Dongsheng Cai, Huong Nguyen, Jun Pataleta, Michael Hawkins, Shamim Rezaie, Simey Lameze
            Votes:
            0 Vote for this issue
            Watchers:
            6 Start watching this issue

              Dates

              Created:
              Updated:
              Resolved:
              Fix Release Date:
              8/Nov/21

                Time Tracking

                Estimated:
                Original Estimate - Not Specified
                Not Specified
                Remaining:
                Remaining Estimate - 0 minutes
                0m
                Logged:
                Time Spent - 40 minutes
                40m