Moodle / MDL-16769

Search for unused strings across Moodle code

    Details

    • Type: Task
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 2.0
    • Fix Version/s: DEV backlog
    • Component/s: Language
    • Labels:
      None
    • Affected Branches:
      MOODLE_20_STABLE
    • Rank:
      348

      Description

      Make an independent script, to be executed in the Moodle base dir, able to:

      • extract all tokens from a given lang (en_utf8 mainly).
      • search the whole Moodle codebase (and contrib!!!) for any occurrence of those tokens between single or double quotes.
      • print summary information (total tokens in lang, number of unused tokens...)
      • print unused tokens details.
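
The four steps above can be sketched roughly as follows. This is only an illustration in Python, not the script discussed in this issue. It assumes lang files define strings as `$string['identifier'] = ...;` and that the pack lives in a directory such as `lang/en_utf8`. The function names (`extract_tokens`, `quoted_literals`, `report`) are made up for the example, and matching quoted literals is a heuristic that will produce both false positives and false negatives.

```python
import re
from pathlib import Path

# Assumed lang file syntax: $string['identifier'] = '...';
STRING_DEF = re.compile(r"""^\s*\$string\[(['"])(?P<id>[^'"]+)\1\]\s*=""", re.MULTILINE)
# Any short literal between matching single or double quotes (no spaces, no $).
QUOTED = re.compile(r"""(['"])([^'"\s$]+)\1""")


def extract_tokens(lang_dir):
    """Return {lang file name: set of string identifiers defined in it}."""
    tokens = {}
    for path in sorted(Path(lang_dir).glob("*.php")):
        text = path.read_text(encoding="utf-8", errors="replace")
        tokens[path.name] = {m.group("id") for m in STRING_DEF.finditer(text)}
    return tokens


def quoted_literals(code_root, skip_dir, extensions=(".php", ".html", ".htm")):
    """Collect every quoted literal found in the code, skipping the lang pack itself."""
    skip = Path(skip_dir).resolve()
    seen = set()
    for path in Path(code_root).rglob("*"):
        if not path.is_file() or path.suffix not in extensions:
            continue
        if skip in path.resolve().parents:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        seen.update(m.group(2) for m in QUOTED.finditer(text))
    return seen


def report(lang_dir, code_root):
    """Print summary and details; return {lang file: sorted unused identifiers}."""
    tokens = extract_tokens(lang_dir)
    seen = quoted_literals(code_root, lang_dir)
    unused = {name: sorted(ids - seen) for name, ids in tokens.items()}
    total = sum(len(ids) for ids in tokens.values())
    total_unused = sum(len(ids) for ids in unused.values())
    print(f"tokens in lang: {total}, apparently unused: {total_unused}")
    for name, ids in unused.items():
        for ident in ids:
            print(f"  {name}: {ident}")
    return unused
```

Run from the Moodle base dir, this would be called as something like `report("lang/en_utf8", ".")`.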

      This will help to get some real numbers about MDL-15252.

          Activity

          David Mudrak added a comment -

          At the moment, I've got a script that fetches all get_string() and print_string() calls from .php, .html and .htm files. It recognizes the string identifier and the module. There are some more complicated callers that we will have to deal with manually, e.g.:

          get_string('error'.$this->response['faultCode'], 'mnet')
          get_string("format$courseformat","format_$courseformat")
          get_string('modulename', $this->cm->modname)

          I will try to mark up such places. Hopefully some alpha results soon.
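
To illustrate the distinction between simple and complicated callers, here is a rough Python sketch (not the script mentioned in this comment). It finds get_string() and print_string() calls, splits their arguments, and treats a call as resolved only when both the identifier and the module are plain quoted literals. The names `split_args` and `scan_calls` are hypothetical, and falling back to the `moodle` module when the second argument is omitted is an assumption of this example.

```python
import re

CALL = re.compile(r"\b(?:get_string|print_string)\s*\(")
# A plain literal: one quoted string with no $ (interpolation) and no escapes.
LITERAL = re.compile(r"""^(['"])([^'"$\\]*)\1$""")


def split_args(text, start):
    """Split the argument list that begins right after '(' at index start."""
    args, current, depth, quote = [], [], 0, None
    i = start
    while i < len(text):
        ch = text[i]
        if quote:
            current.append(ch)
            if ch == "\\" and i + 1 < len(text):
                current.append(text[i + 1])  # keep the escaped character as-is
                i += 1
            elif ch == quote:
                quote = None
        elif ch in "'\"":
            quote = ch
            current.append(ch)
        elif ch in "([{":
            depth += 1
            current.append(ch)
        elif ch in ")]}":
            if depth == 0:
                break  # closing parenthesis of the call itself
            depth -= 1
            current.append(ch)
        elif ch == "," and depth == 0:
            args.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
        i += 1
    args.append("".join(current).strip())
    return args


def literal_value(arg):
    """Return the string inside a plain quoted literal, or None if arg is dynamic."""
    m = LITERAL.match(arg)
    return m.group(2) if m else None


def scan_calls(source):
    """Return (resolved, tricky): (module, identifier) pairs and (line, raw args)."""
    resolved, tricky = [], []
    for m in CALL.finditer(source):
        args = split_args(source, m.end())
        ident = literal_value(args[0])
        module = literal_value(args[1]) if len(args) > 1 else "moodle"
        if ident is not None and module is not None:
            resolved.append((module or "moodle", ident))
        else:
            line = source.count("\n", 0, m.start()) + 1
            tricky.append((line, args[:2]))
    return resolved, tricky
```

All three callers listed above end up in the `tricky` list, which is the set of places that would need to be marked up by hand.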

          David Mudrak added a comment -

          First estimations are here! I need to double check everything before I publish the langchecker.php script but it seems we've got something around 3387 strings that are not called directly. But this number also includes countries.php, currencies.php and timezones.php that we should ignore as they are used to populate <select> form fields.

          The problem mentioned above remains, and it skews the results badly. Example: we have the caller
          get_string('modulename', $this->cm->modname)
          or
          get_string('filtername', $myname)

          My langchecker.php is unable to determine the string identifier and module from such a caller, and therefore it considers
          $string['filtername'] = "Filter name";
          as an unused string, which is wrong.
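
One possible way to soften this false positive, sketched in Python under assumptions of my own (this is not how langchecker.php works): when the identifier is a literal but the module is dynamic, report the string as "maybe used" in every lang file that defines it, instead of as unused. The sketch also skips the three lookup files mentioned above. The `classify` function and its input shapes are hypothetical.

```python
# Lang files used to populate <select> fields, ignored as suggested above.
IGNORED_FILES = {"countries.php", "currencies.php", "timezones.php"}


def classify(defined, resolved, dynamic_module_ids):
    """Sort defined strings into used / maybe / unused buckets.

    defined:            {lang file name: set of identifiers}
    resolved:           set of (module, identifier) pairs from literal calls
    dynamic_module_ids: identifiers seen in calls whose module is not a literal,
                        e.g. 'filtername' from get_string('filtername', $myname)
    """
    result = {"used": [], "maybe": [], "unused": []}
    for lang_file, ids in sorted(defined.items()):
        if lang_file in IGNORED_FILES:
            continue
        module = lang_file[:-4] if lang_file.endswith(".php") else lang_file
        for ident in sorted(ids):
            if (module, ident) in resolved:
                bucket = "used"
            elif ident in dynamic_module_ids:
                bucket = "maybe"  # could be reached through a dynamic module
            else:
                bucket = "unused"
            result[bucket].append((module, ident))
    return result
```

With this, `$string['filtername']` would land in the "maybe" bucket for manual review rather than in the unused list.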

          David Mudrak added a comment -

          The first preview of the results. Please comment on the format of the output report.

          Koen Roggemans added a comment -

          Interesting document. It means that over 30% of the strings are there for backward compatibility. Those should definitely be removed from the HEAD branch, so new language packs for 2.0 have a smaller workload to start off with.

          The second group of strings, the "could not recognise" section: sorry, my head is too small to understand why and what the importance/impact is of those.
          I notice "parentlanguage" in there - that needs some special attention, since it is a setting, not a language string. Those settings should imho only exist in langconfig.php

          The third group are calls for a string that don't have a string. I see some obvious ones, like parentlanguage, descep, etc. Are all the others missing strings in the en_utf8 language pack?

          Thanks a lot for your work David!

          Dan Marsden added a comment -

          This is great, David! I just went looking for a tool like this, hoping that someone else had already done the work. Thanks for sharing it!

          Tim Hunt added a comment -

          Good work. Is the source of the script available?

          David, I think that what you should do with strings like

          get_string('error'.$this->response['faultCode'], 'mnet')
          get_string("format$courseformat","format_$courseformat")
          get_string('modulename', $this->cm->modname)

          is, where possible, remember the name of the lang file, so in your report you can list the lang files that we know have extra used strings in them.

          Also, make a list of all these tricky get_string calls, and display them with two lines of surrounding context (a bit like a diff) so we can review them.

          Then we can review them, and perhaps make the script a bit smarter, so that, for example, it can correctly handle cases like get_string('modulename', $this->cm->modname).
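
A small Python sketch of the two reporting ideas above, under assumed input shapes: `with_context` prints a tricky call with two lines of surrounding context, and `affected_lang_files` lists the lang files known to hold extra used strings because a call has a dynamic identifier but a literal module. Both function names are invented for this example.

```python
import re

# A plain literal: one quoted string with no $ (interpolation) and no escapes.
LITERAL = re.compile(r"""^(['"])([^'"$\\]*)\1$""")


def with_context(lines, lineno, context=2):
    """Format line `lineno` (1-based) with `context` lines on either side."""
    first = max(1, lineno - context)
    last = min(len(lines), lineno + context)
    out = []
    for n in range(first, last + 1):
        marker = ">" if n == lineno else " "
        out.append(f"{marker}{n:5d}: {lines[n - 1]}")
    return "\n".join(out)


def affected_lang_files(tricky_calls):
    """Return the modules named literally in calls whose identifier is dynamic.

    tricky_calls: iterable of (identifier_arg, module_arg) raw argument strings.
    """
    modules = set()
    for ident_arg, module_arg in tricky_calls:
        ident_is_literal = LITERAL.match(ident_arg) is not None
        module = LITERAL.match(module_arg)
        if not ident_is_literal and module:
            modules.add(module.group(2))
    return sorted(modules)
```

For the three example calls, only `mnet` would be listed as a lang file with extra used strings. The other two have a dynamic module and would still need the context listing for review.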

          Martin Dougiamas added a comment -

          What's the status of this? It would speed up translation for all translators.

          David Mudrak added a comment -

          I have partially solved this for now. All the string names that were not found in the Moodle 2.0 source code (as a pattern) are now marked as "greylisted" in AMOS. We have ~2000 greylisted strings at the moment, some of which are false positives. We can go through the greylist in AMOS and, if we are 100% sure a string is not used any more, it can be removed from CVS HEAD. Although I would suggest waiting until after we branch off MOODLE_20_STABLE.

          David Mudrak added a comment -

          Waiting for forking MOODLE_20_STABLE

          David Mudrak added a comment -

          Re-triage: Moving into a separate task in DEV backlog.

          David Mudrak added a comment -

          This issue was assigned to me, however I will not be able to work on it in the immediate future. In order to create a truer sense of the state of this issue and to allow other developers to have a chance to become involved, I am removing myself as the assignee of this issue. For more information, see http://docs.moodle.org/dev/Changes_to_issue_assignment


            People

            • Votes: 5
            • Watchers: 7