Details

    • Type: Sub-task Sub-task
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: 2.1, 2.2
    • Fix Version/s: DEV backlog
    • Component/s: Files API
    • Labels:
    • Testing Instructions:
      Hide

      Unit test is included. For a simple custom test, open any page with WYSIWYG editor and click on 'Add image' or 'Add media', in the first case only image file should be listed in the repositories, in the second only audio and video ones.

      Show
      Unit test is included. For a simple custom test, open any page with WYSIWYG editor and click on 'Add image' or 'Add media', in the first case only image file should be listed in the repositories, in the second only audio and video ones.
    • Affected Branches:
      MOODLE_21_STABLE, MOODLE_22_STABLE
    • Pull Master Branch:
      wip-MDL-32247-files23
    • Rank:
      39029

      Description

      I have discovered recently that there are two separate mimetype-related features in Moodle. One is several mimetype functions that use get_mimetypes_array() as the source, another one is filetype_parser class that is using file_types.mm mindmap file as a data source.

      In my opinion, having two data sources may cause inconsistencies and dissimilarities in mime data we work with, thus I suggest to remove one of them (likely the mindmap one) and design a function for deriving the list of file extensions from existing mimetypes array (i.e. replicate filetype_parser functionality and refactor few bits where it is used).

      Andrew provided a quick proof of concept for this:

      function get_mimetype_matches($match) {
          if (!is_array($match)) {
              $match = array($match);
          }    
          $mimetypes = get_mimetypes_array();
          $results = array();
          foreach($mimetypes as $key => $data) {
              if (in_array($data['type'], $match)) {
                  $results[$key] = $data;
              }    
          }    
          return $results;
      }
      

      Would be good to know people's opinion about this change.

        Issue Links

          Activity

          Hide
          Eloy Lafuente (stronk7) added a comment -

          +1 for unique source, no matter which one it is.

          Show
          Eloy Lafuente (stronk7) added a comment - +1 for unique source, no matter which one it is.
          Hide
          Sam Marshall added a comment -

          I don't really understand the proposal but also support having only a single source.

          Oh wait, now I understand it - okay that is supposed to return a list of file extensions corresponding to a mime type? If this is implemented, please rename it to something like get_mimetype_extensions

          Show
          Sam Marshall added a comment - I don't really understand the proposal but also support having only a single source. Oh wait, now I understand it - okay that is supposed to return a list of file extensions corresponding to a mime type? If this is implemented, please rename it to something like get_mimetype_extensions
          Hide
          Dan Poltawski added a comment -

          I think that its fairly evil that we are parsing a mind map file for this and agree we should be rid of it (since when was mind map file a standard data-storage format?). We are parsing that file 221 times on the multi-question quiz page

          However in your proposal its not clear that you have considered the difference between the two pieces of functionality. One gets a mimetype based on file extension, the other groups different mimetypes into 'meta types' like web_video.

          Show
          Dan Poltawski added a comment - I think that its fairly evil that we are parsing a mind map file for this and agree we should be rid of it (since when was mind map file a standard data-storage format?). We are parsing that file 221 times on the multi-question quiz page However in your proposal its not clear that you have considered the difference between the two pieces of functionality. One gets a mimetype based on file extension, the other groups different mimetypes into 'meta types' like web_video.
          Hide
          Ruslan Kabalin added a comment -

          Dan Poltawski: that is true, detailed look at mindmap file content revealed that it is not dealing with the standard mime types, just providing a list of extensions for some "custom" type. In which case it is even easier that I thought. I can design an array that would replicate the mm file content, e.g.

          $typesarray = array(
          'image' => array('jpg', 'png', 'gif'),
          'non_web_image' => array('bmp', 'svg', 'psd'),
          ...
          )
          

          and a small function that will return an array of extensions based on provided array of types. This will replace the whole filetype_parser class along with mindmap file.

          Show
          Ruslan Kabalin added a comment - Dan Poltawski: that is true, detailed look at mindmap file content revealed that it is not dealing with the standard mime types, just providing a list of extensions for some "custom" type. In which case it is even easier that I thought. I can design an array that would replicate the mm file content, e.g. $typesarray = array( 'image' => array('jpg', 'png', 'gif'), 'non_web_image' => array('bmp', 'svg', 'psd'), ... ) and a small function that will return an array of extensions based on provided array of types. This will replace the whole filetype_parser class along with mindmap file.
          Hide
          Dan Poltawski added a comment -

          +1

          Show
          Dan Poltawski added a comment - +1
          Hide
          Ruslan Kabalin added a comment -

          Added the fix along with the unit-test.

          The class remains the same as it can be used by external plugins, in addition it separates related methods from the rest of the code making them structured and easier to follow.

          Show
          Ruslan Kabalin added a comment - Added the fix along with the unit-test. The class remains the same as it can be used by external plugins, in addition it separates related methods from the rest of the code making them structured and easier to follow.
          Hide
          Ruslan Kabalin added a comment -

          Ready for revision.

          Show
          Ruslan Kabalin added a comment - Ready for revision.
          Hide
          Ruslan Kabalin added a comment -

          Forgot to mention that profiling demonstrated that filetype_parser->get_extensions cost changed from 23 to 2ms

          Show
          Ruslan Kabalin added a comment - Forgot to mention that profiling demonstrated that filetype_parser->get_extensions cost changed from 23 to 2ms
          Hide
          Dan Poltawski added a comment -

          Hi Ruslan

          1. I do not like the constructor and get_filetypes_array() at all with the passing by reference and static. Why not instead put that typesarray into the class definition and get_filetypes_array() can just return that?
          2. I am not convinced about the implementation of this parser, it seems the logic for browsing it is fairly complicated for what perhaps is not necessary the most complicated problem. I think we need to consider if this 'tree structure' is really correct and necessary. Maybe we introduce duplication rather than the perfect data structure.
          3. The unit tests seem fairly brittle - they'll break as soon as someone adds an extension and doesn't change the unit test?
          Show
          Dan Poltawski added a comment - Hi Ruslan I do not like the constructor and get_filetypes_array() at all with the passing by reference and static. Why not instead put that typesarray into the class definition and get_filetypes_array() can just return that? I am not convinced about the implementation of this parser, it seems the logic for browsing it is fairly complicated for what perhaps is not necessary the most complicated problem. I think we need to consider if this 'tree structure' is really correct and necessary. Maybe we introduce duplication rather than the perfect data structure. The unit tests seem fairly brittle - they'll break as soon as someone adds an extension and doesn't change the unit test?
          Hide
          Ruslan Kabalin added a comment -

          Hi Dan,

          1. I do not like the constructor and get_filetypes_array() at all with the passing by reference and static. Why not instead put that typesarray into the class definition and get_filetypes_array() can just return that?

          We create several instances of filetype_parser (24 for instance on file resource adding page). Using static and reference we take advantage of the "static" nature of array (we do not need changing it) and save up memory consumption. Reference might not have a significant impact, and can be removed in fact, but static should be there. You may easily validate the importance of static there by adding:

          diff --git a/repository/lib.php b/repository/lib.php
          index b3ee06a..cecb026 100644
          --- a/repository/lib.php
          +++ b/repository/lib.php
          @@ -1447,7 +1447,10 @@ abstract class repository {
                */
               final public function get_meta() {
                   global $CFG, $OUTPUT;
          +        $before = memory_get_usage();
                   $ft = new filetype_parser;
          +        $after = memory_get_usage();
          +        print_object($after - $before);
                   $meta = new stdClass();
                   $meta->id   = $this->id;
                   $meta->name = $this->get_name();
          

          opening the page once, then removing static keyword from get_filetypes_array and running it again. The class size changes from 2144 (with static) to 27168 (no static).

          In fact, moving array to class property saves even more memory, making class size equal to 600 bytes! In which case get_filetypes_array() is not required at all (it was used for defining $typesarray property).

          2. I am not convinced about the implementation of this parser, it seems the logic for browsing it is fairly complicated for what perhaps is not necessary the most complicated problem. I think we need to consider if this 'tree structure' is really correct and necessary. Maybe we introduce duplication rather than the perfect data structure.

          Well, it is not that complicated to be honest, it took a couple of minutes to get the existing logic of these methods pair. With my implementation things are even simpler - just a recursion through array, once the key has matched, the task is passed over to select_values to collect all extensions and add them to result.

          3. The unit tests seem fairly brittle - they'll break as soon as someone adds an extension and doesn't change the unit test?

          Yes, it will break. Good point actually, I have modified them slightly to ensure the general extensions are there (and not there when not required).

          Ha-ha, we should rename "Request peer review" button to "Fight through review"

          Show
          Ruslan Kabalin added a comment - Hi Dan, 1. I do not like the constructor and get_filetypes_array() at all with the passing by reference and static. Why not instead put that typesarray into the class definition and get_filetypes_array() can just return that? We create several instances of filetype_parser (24 for instance on file resource adding page). Using static and reference we take advantage of the "static" nature of array (we do not need changing it) and save up memory consumption. Reference might not have a significant impact, and can be removed in fact, but static should be there. You may easily validate the importance of static there by adding: diff --git a/repository/lib.php b/repository/lib.php index b3ee06a..cecb026 100644 --- a/repository/lib.php +++ b/repository/lib.php @@ -1447,7 +1447,10 @@ abstract class repository { */ final public function get_meta() { global $CFG, $OUTPUT; + $before = memory_get_usage(); $ft = new filetype_parser; + $after = memory_get_usage(); + print_object($after - $before); $meta = new stdClass(); $meta->id = $ this ->id; $meta->name = $ this ->get_name(); opening the page once, then removing static keyword from get_filetypes_array and running it again. The class size changes from 2144 (with static) to 27168 (no static). In fact, moving array to class property saves even more memory, making class size equal to 600 bytes! In which case get_filetypes_array() is not required at all (it was used for defining $typesarray property). 2. I am not convinced about the implementation of this parser, it seems the logic for browsing it is fairly complicated for what perhaps is not necessary the most complicated problem. I think we need to consider if this 'tree structure' is really correct and necessary. Maybe we introduce duplication rather than the perfect data structure. Well, it is not that complicated to be honest, it took a couple of minutes to get the existing logic of these methods pair. With my implementation things are even simpler - just a recursion through array, once the key has matched, the task is passed over to select_values to collect all extensions and add them to result. 3. The unit tests seem fairly brittle - they'll break as soon as someone adds an extension and doesn't change the unit test? Yes, it will break. Good point actually, I have modified them slightly to ensure the general extensions are there (and not there when not required). Ha-ha, we should rename "Request peer review" button to "Fight through review"
          Hide
          Dan Poltawski added a comment -

          [Note i'm not very articulate on language constructs but I will do my best with this..]

          Using static and reference we take advantage of the "static" nature of array (we do not need changing it) and save up memory consumption.

          Yes I understand when you are 'storing that value' within within a method, but since as far as I can tell there is no point where you are accessing this property without the object being instantiated you can surely utilise the OO nature of the class and use it like a normal property and not declare it static?

          Note in case it was unclear there is sort of a difference between a static variable in function scope and as a class property. To quote the php manual on variable scoping:

          A static variable exists only in a local function scope, but it does not lose its value when program execution leaves this scope.

          And on a static property:

          Declaring class properties or methods as static makes them accessible without needing an instantiation of the class. A property declared as static can not be accessed with an instantiated class object (though a static method can).

          As far as I can tell, your class does not have any static methods which need access to that property - it needs to be instantiated whenever its used so in fact there is no need for static and your methods which currently recieve this as an argument can act upon the property without needing to be passed as an argument.

          Show
          Dan Poltawski added a comment - [Note i'm not very articulate on language constructs but I will do my best with this..] Using static and reference we take advantage of the "static" nature of array (we do not need changing it) and save up memory consumption. Yes I understand when you are 'storing that value' within within a method, but since as far as I can tell there is no point where you are accessing this property without the object being instantiated you can surely utilise the OO nature of the class and use it like a normal property and not declare it static? Note in case it was unclear there is sort of a difference between a static variable in function scope and as a class property. To quote the php manual on variable scoping: A static variable exists only in a local function scope, but it does not lose its value when program execution leaves this scope. And on a static property: Declaring class properties or methods as static makes them accessible without needing an instantiation of the class. A property declared as static can not be accessed with an instantiated class object (though a static method can). As far as I can tell, your class does not have any static methods which need access to that property - it needs to be instantiated whenever its used so in fact there is no need for static and your methods which currently recieve this as an argument can act upon the property without needing to be passed as an argument.
          Hide
          Dan Poltawski added a comment -

          Ah, its sort of just dawned on me a little bit of what you were getting at with multiple calls - singleton seems to be common pattern to achieve this recently.

          Show
          Dan Poltawski added a comment - Ah, its sort of just dawned on me a little bit of what you were getting at with multiple calls - singleton seems to be common pattern to achieve this recently.
          Hide
          Ruslan Kabalin added a comment - - edited

          As far as I can tell, your class does not have any static methods which need access to that property - it needs to be instantiated whenever its used so in fact there is no need for static and your methods which currently recieve this as an argument can act upon the property without needing to be passed as an argument.

          Dan, there is obviously no need for 'static' from functional requirements point, that is obvious, but the feature that 'it does not lose its value when program execution leaves this scope' allows us to improve performance (see profiling memory consumption figures), e.g. in our case all 24 instances of filetype_parser are sharing exactly the same static property declared within the class (memory is not being allocated for each instance to keep a copy of self::$typesarray, it is allocated just once), if we remove static, more memory will be required for the "copies" of $typesarray property in each instance - you can easily test it and see the difference.

          So, it is nothing about being able to access property without instantiation, I am just using a feature of 'static' being really static across the object instances.

          Show
          Ruslan Kabalin added a comment - - edited As far as I can tell, your class does not have any static methods which need access to that property - it needs to be instantiated whenever its used so in fact there is no need for static and your methods which currently recieve this as an argument can act upon the property without needing to be passed as an argument. Dan, there is obviously no need for 'static' from functional requirements point, that is obvious, but the feature that 'it does not lose its value when program execution leaves this scope' allows us to improve performance (see profiling memory consumption figures), e.g. in our case all 24 instances of filetype_parser are sharing exactly the same static property declared within the class (memory is not being allocated for each instance to keep a copy of self::$typesarray, it is allocated just once), if we remove static, more memory will be required for the "copies" of $typesarray property in each instance - you can easily test it and see the difference. So, it is nothing about being able to access property without instantiation, I am just using a feature of 'static' being really static across the object instances.
          Hide
          Dan Poltawski added a comment -

          Yep, understood - this is what I meant in second comment about singleton pattern. Apologies if I sounded patronising, the point was being made because I come across a lot of people who do not understand the distinction.

          Anyway letting someone else review this

          Show
          Dan Poltawski added a comment - Yep, understood - this is what I meant in second comment about singleton pattern. Apologies if I sounded patronising, the point was being made because I come across a lot of people who do not understand the distinction. Anyway letting someone else review this
          Hide
          Ruslan Kabalin added a comment -

          I see Dan, no problem, it is me not being clear about purpose of using static (may be I will add a comment). I am fine to remove it in fact, as this does not cause a significant performance impact. Let us see the opinion of someone else.

          Show
          Ruslan Kabalin added a comment - I see Dan, no problem, it is me not being clear about purpose of using static (may be I will add a comment). I am fine to remove it in fact, as this does not cause a significant performance impact. Let us see the opinion of someone else.
          Hide
          Dan Poltawski added a comment -

          Was just talking to ruslan and reminded of

          lib/portfolio/formats.php - mimetype store there

          and that utilises mimeinfo_from_icon! which also sort of achieves what we are talking about here (based on icon).

          So for images in portfolio we do:
          return mimeinfo_from_icon('type', 'image', true);

          TBH on reflection this I think we should perhaps think about just add what extra data is the original mimetypes array in a similar way.

          Show
          Dan Poltawski added a comment - Was just talking to ruslan and reminded of lib/portfolio/formats.php - mimetype store there and that utilises mimeinfo_from_icon! which also sort of achieves what we are talking about here (based on icon). So for images in portfolio we do: return mimeinfo_from_icon('type', 'image', true); TBH on reflection this I think we should perhaps think about just add what extra data is the original mimetypes array in a similar way.
          Hide
          Dan Poltawski added a comment -

          Hi Ruslan,

          I'm moving this out of peer review because based on our discussion last night I think you were doing more thinking on this soliution?

          Show
          Dan Poltawski added a comment - Hi Ruslan, I'm moving this out of peer review because based on our discussion last night I think you were doing more thinking on this soliution?
          Hide
          Ruslan Kabalin added a comment -

          Affirmative

          Show
          Ruslan Kabalin added a comment - Affirmative
          Hide
          Dan Poltawski added a comment -

          Was just fixing MDL-32134 and thought I would put a a comment here that there are a number of other issues that would be good to be tackled relating to this - like the use of language strings in mime-types.

          Thought on this.. it doesn't make sense to have individual language strings for things like PNG/JPEG - I might be nieve but these file formats are not translatable - the 'meta type' of image is of course. So perhaps we could be more intelligent here relating to groupings.

          Show
          Dan Poltawski added a comment - Was just fixing MDL-32134 and thought I would put a a comment here that there are a number of other issues that would be good to be tackled relating to this - like the use of language strings in mime-types. Thought on this.. it doesn't make sense to have individual language strings for things like PNG/JPEG - I might be nieve but these file formats are not translatable - the 'meta type' of image is of course. So perhaps we could be more intelligent here relating to groupings.
          Show
          Dan Poltawski added a comment - In fact, please consider triaging these issues: http://tracker.moodle.org/secure/IssueNavigator.jspa?reset=true&jqlQuery=%28summary+~+mimetype+OR+description+~+mimetype+OR+comment+~+mimetype%29+and+%28status+%21%3D+closed+and+status+%21%3D+rejected%29
          Hide
          Martin Dougiamas added a comment -

          I wanted to add in here, because I don't think it's been said, that the mimetype file was added when we wanted the ability to say (when calling the filepicker) "only show me files that are images" and thus:

          • remove repositories that don't show images and
          • remove all non-image files from display

          If the hierarchical file is removed then this features needs to be preserved.

          The Mimetypes array was designed originally ONLY to map mimetypes -> icon names, and that is all. I hear it's been vastly abused since then, though.

          Show
          Martin Dougiamas added a comment - I wanted to add in here, because I don't think it's been said, that the mimetype file was added when we wanted the ability to say (when calling the filepicker) "only show me files that are images" and thus: remove repositories that don't show images and remove all non-image files from display If the hierarchical file is removed then this features needs to be preserved. The Mimetypes array was designed originally ONLY to map mimetypes -> icon names, and that is all. I hear it's been vastly abused since then, though.
          Hide
          Marina Glancy added a comment -

          I'm working currently on this

          Show
          Marina Glancy added a comment - I'm working currently on this
          Hide
          Marina Glancy added a comment -

          what do you think if we implement groups inside function https://github.com/moodle/moodle/blob/master/lib/filelib.php#L1221
          instead of another array in the class filetype_parser

          for example

          'bmp'  => array ('type'=>'image/bmp', 'icon'=>'image', 'groups'=>array('image')),
          'gif'  => array ('type'=>'image/gif', 'icon'=>'image', 'groups'=>array('image', 'web_image')),
          
          Show
          Marina Glancy added a comment - what do you think if we implement groups inside function https://github.com/moodle/moodle/blob/master/lib/filelib.php#L1221 instead of another array in the class filetype_parser for example 'bmp' => array ('type'=>'image/bmp', 'icon'=>'image', 'groups'=>array('image')), 'gif' => array ('type'=>'image/gif', 'icon'=>'image', 'groups'=>array('image', 'web_image')),
          Hide
          Ruslan Kabalin added a comment -

          what do you think if we implement groups inside function https://github.com/moodle/moodle/blob/master/lib/filelib.php#L1221

          The idea is nice, but how you will quickly get a list of extension for particular list of types? You will likely need to re-map it into array('image' => array('bmp', 'gif')) and then pick ones requested or parse through it upon type request each time, which is not ideal.

          Show
          Ruslan Kabalin added a comment - what do you think if we implement groups inside function https://github.com/moodle/moodle/blob/master/lib/filelib.php#L1221 The idea is nice, but how you will quickly get a list of extension for particular list of types? You will likely need to re-map it into array('image' => array('bmp', 'gif')) and then pick ones requested or parse through it upon type request each time, which is not ideal.
          Hide
          Marina Glancy added a comment -

          linking to MDL-32831

          Show
          Marina Glancy added a comment - linking to MDL-32831
          Hide
          Marina Glancy added a comment -

          This has been fixed in 2.3. Class filetype_parser does not exist any more, all groupping is stored in get_mimetypes_array()

          Show
          Marina Glancy added a comment - This has been fixed in 2.3. Class filetype_parser does not exist any more, all groupping is stored in get_mimetypes_array()

            People

            • Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: