Uploaded image for project: 'Moodle'
  1. Moodle
  2. MDL-80104

Create an admin tool to migrate bad base64 inline data into file api




      Due to a bug in atto under a limited and unclear set of circumstances which are hard to reproduce, certain images and videos would be added as base64 encoded inline data in the text and not converted to proper mdl_file references to pluginfile.php

      Atto is going in favor of Tiny so the bug itself is not going to be fixed but we still need a way to clean up the base64 data. These could be in the database in any table which makes this hard to solve in the general case. Querying the DB to find the dodgy data is also potentially very slow.

      So proposing to make an admin tool, either in core or standalone:

      1. It analyses all of the tables and columns similar to how the xmldb editor does and identifies all the potential fields which could contain editor text
      2. Click a button and it searches a specific table / column for any data which looks like base64. This is probably best done as an adhoc task one per column. When it finds an entry that looks bad it stashes that somewhere
      3. An admin report showing all the found issues so far, how many there area, and what mimetype they have
      4. Make an action which given a table, column name, and a file area it pulls the content out and puts it back into the file api and edits the content. This should be optionally limited to specific mimetypes
      5. There should also be an action to just remove the data.
      6. The core challenge is that fach table and column will map to a different pluginfile.php and I do not believe there is a proper api which can tell us the mapping. For some common file areas we can hard code this into the tool. For the long tail of different columns we'd want to be able to specify this on the fly if and when we need to fix the data.
      7. For each set of table and column as well as being able to map to a pluginfile url there should also be a mapping to a moodle url where this text area can be manually edited. Such as into a workshop submission, or into an assignment etc. Not all mappings need to be provided just a good subset to get the pattern right.
      8. To aid in the step above the tool could also query a few random good records in the same table and column looking for existing pluginfile references. 

      All of this is strongly suggesting the solution is a standalone tool used once off per site and then uninstalled.



      Original tracker details:

      I came across a slow page today which was mod/workshop/assessment.php?asid=xxxx

      It is fairly variable but can take up to 30 seconds to render this page.

      After profiling it there was a clear bottleneck in the html purifier code which was called by clean_text

      I can't see the root cause easily but this feels like clean_text is calling ord() on 128 million characters which feels like it is accidentally trying to clean a binary file as part of the submission or something equally weird. I can't paste the text here of the submission but its only ~1 page of html and certain not 128 million chars.

      The submission was a ~100 meg jpeg which doesn't exactly align but smells similar enough.

      Ah! Yes the submission was cleaning the content which had this:

      <p dir="ltr" style="text-align: left;"><img src="data:video/mp4;base64,AAAAHGZ0eXBtcDQyAAAAAWlzb21tcDQxbXA0MgAB5k5tb292AAAA

      So this is a drag and drop bug by the looks of it




        Issue Links



              tuekorsgaard Tue Korsgaard
              brendanheywood Brendan Heywood
              1 Vote for this issue
              20 Start watching this issue




                  Error rendering 'clockify-timesheets-time-tracking-reports:timer-sidebar'. Please contact your Jira administrators.