  1. Moodle
  2. MDL-16769

Search for unused strings across Moodle code

    Details

    • Type: Task
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 2.0
    • Fix Version/s: DEV backlog
    • Component/s: Language
    • Labels:
      None
    • Affected Branches:
      MOODLE_20_STABLE

      Description

      Make an independent script, to be executed in the Moodle base directory, able to:

      • extract all tokens from a given lang (en_utf8 mainly).
      • search the whole Moodle codebase (and contrib!) for any occurrence of those tokens between single or double quotes.
      • print summary information (total tokens in lang, number of unused tokens...)
      • print unused tokens details.
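The four steps above can be sketched roughly as follows. This is a minimal illustration in Python, not the script the issue asks for: the function names are hypothetical, the `$string['id'] = ...` pattern is an assumption about the lang file layout, and the lang files themselves must be left out of the searched sources (every token appears quoted in its own definition).

```python
import re

# Assumed lang file layout: $string['identifier'] = 'Text';
STRING_DEF = re.compile(r"""\$string\[\s*['"]([^'"]+)['"]\s*\]\s*=""")


def extract_tokens(lang_source):
    """Return the set of string identifiers defined in one lang file."""
    return set(STRING_DEF.findall(lang_source))


def find_unused(tokens, code_sources):
    """Return tokens that never occur between matching single or double quotes.

    code_sources is a list of file contents; it must not include lang files.
    """
    unused = set()
    for token in tokens:
        quoted = re.compile(r"""(['"])""" + re.escape(token) + r"\1")
        if not any(quoted.search(source) for source in code_sources):
            unused.add(token)
    return unused


def summarize(tokens, unused):
    """Build the summary followed by the unused token details."""
    lines = [
        f"total tokens in lang: {len(tokens)}",
        f"unused tokens: {len(unused)}",
    ]
    lines += [f"  {token}" for token in sorted(unused)]
    return "\n".join(lines)


if __name__ == "__main__":
    lang = "$string['add'] = 'Add';\n$string['old'] = 'Old';"
    code = ["echo get_string('add');"]
    tokens = extract_tokens(lang)
    print(summarize(tokens, find_unused(tokens, code)))
```

A plain quoted-token search like this over-reports usage (any quoted `'add'` counts, whatever the context), so its result is a lower bound on the number of unused strings.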

      This will help to get some real numbers about MDL-15252.

              Activity

              mudrd8mz David Mudrák added a comment -

              At the moment, I've got a script that fetches all get_string() and print_string() calls from .php, .html and .htm files. It recognizes the string identifier and the module. There are some more complicated callers that we will have to deal with manually, e.g.:

              get_string('error'.$this->response['faultCode'], 'mnet')
              get_string("format$courseformat","format_$courseformat")
              get_string('modulename', $this->cm->modname)

              I will try to mark up such places. Hopefully some alpha results soon.
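A scan of this kind might look like the sketch below. It is written in Python for illustration and is not the script described in the comment. The names `parse_calls` and `DEFAULT_MODULE` are hypothetical, the fallback to the `moodle` component when the second argument is omitted is an assumption, and the parsing is deliberately naive: arguments containing nested parentheses or commas are not handled.

```python
import re

# Captures the raw argument text of get_string(...) / print_string(...).
# Limitation: calls whose arguments contain parentheses are skipped.
CALL = re.compile(r"\b(?:get|print)_string\s*\(([^()]*)\)")

# A plain quoted literal: no concatenation, no variable interpolation.
LITERAL = re.compile(r"""^\s*(['"])([\w:.-]+)\1\s*$""")

# Assumption: an omitted module argument means the core 'moodle' component.
DEFAULT_MODULE = "moodle"


def _literal(arg):
    """Return the literal value of a quoted argument, or None if dynamic."""
    match = LITERAL.match(arg)
    return match.group(2) if match else None


def parse_calls(source):
    """Yield (identifier, module, raw_args) for each call found in source.

    identifier or module is None when it cannot be determined statically.
    """
    for match in CALL.finditer(source):
        raw = match.group(1).strip()
        args = raw.split(",")  # naive split, good enough for simple calls
        identifier = _literal(args[0])
        module = _literal(args[1]) if len(args) > 1 else DEFAULT_MODULE
        yield identifier, module, raw
```

Applied to the three callers quoted above, this yields `(None, 'mnet')`, `(None, None)` and `('modulename', None)` for identifier and module, which is exactly the set of places that would need manual marking.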

              mudrd8mz David Mudrák added a comment -

              First estimates are here! I need to double-check everything before I publish the langchecker.php script, but it seems we've got around 3387 strings that are not called directly. This number also includes countries.php, currencies.php and timezones.php, which we should ignore as they are used to populate <select> form fields.

              The problem mentioned above remains, and it distorts the results badly. Example: we have the caller
              get_string('modulename', $this->cm->modname)
              or
              get_string('filtername', $myname)

              My langchecker.php is unable to determine the string identifier and module from such a caller and therefore it considers
              $string['filtername'] = "Filter name";
              as an unused string, which is wrong.
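One possible way to soften this false positive, sketched in Python under stated assumptions (it is not how langchecker.php works): when a call has a literal identifier but a dynamic module, treat that identifier as possibly used in every lang file that defines it, and report it as "uncertain" instead of "unused". The function name and the data shapes are hypothetical.

```python
def classify(definitions, calls):
    """Split defined strings into definitely unused and uncertain.

    definitions: dict mapping module name -> set of string identifiers.
    calls: iterable of (identifier, module); None marks a dynamic value.
    Returns (unused, uncertain), both sets of (module, identifier).
    """
    used = set()
    any_module = set()  # identifiers called with a module we cannot resolve
    for identifier, module in calls:
        if identifier is None:
            continue  # fully dynamic identifier: nothing to match on
        if module is None:
            any_module.add(identifier)
        else:
            used.add((module, identifier))

    unused, uncertain = set(), set()
    for module, identifiers in definitions.items():
        for identifier in identifiers:
            if (module, identifier) in used:
                continue
            if identifier in any_module:
                uncertain.add((module, identifier))
            else:
                unused.add((module, identifier))
    return unused, uncertain
```

With a caller such as `get_string('filtername', $myname)`, every `filtername` definition lands in the uncertain group and needs a manual look, while strings with no matching call at all remain in the unused group.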

              mudrd8mz David Mudrák added a comment -

              The first preview of the results. Please comment on the format of the output report.

              koen Koen Roggemans added a comment -

              Interesting document. It means that over 30% of the strings are there for backward compatibility. Those should definitely be removed from the HEAD branch, so new language packs for 2.0 have a smaller workload to start off with.

              The second group of strings, the "could not recognise" section: sorry, my head is too small to understand why and what the importance/impact of those is.
              I notice "parentlanguage" in there. That needs some special attention, since it is a setting, not a language string. Those settings should, in my opinion, only exist in langconfig.php.

              The third group is calls for a string that doesn't exist. I see some obvious ones, like parentlanguage, descep, etc. Are all the others missing strings in the en_utf8 language pack?

              Thanks a lot for your work David!

              danmarsden Dan Marsden added a comment -

              This is great, David! I just went looking for a tool like this, hoping that someone else had already done the work. Thanks for sharing it!

              timhunt Tim Hunt added a comment -

              Good work. Is the source of the script available?

              David, I think that what you should do with strings like

              get_string('error'.$this->response['faultCode'], 'mnet')
              get_string("format$courseformat","format_$courseformat")
              get_string('modulename', $this->cm->modname)

              is, where possible, remember the name of the lang file, so in your report you can list the lang files that we know have extra used strings in them.

              Also, make a list of all these tricky get_string calls, and display them with two lines of surrounding context (a bit like a diff) so we can review them.

              Then we can review them, and perhaps make the script a bit smarter, so that, for example, it can correctly handle cases like get_string('modulename', $this->cm->modname).
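The suggested review listing could be produced along these lines. This is a hedged Python sketch: `looks_dynamic` is a hypothetical heuristic that flags a call when its first or second argument contains a `$`, and overlapping hunks are not merged.

```python
import re

# Heuristic: a '$' inside the first or second argument marks a tricky call.
DYNAMIC_CALL = re.compile(
    r"""(?:get|print)_string\s*\(\s*(?:[^,)]*\$|[^,)]*,\s*[^,)]*\$)"""
)


def looks_dynamic(line):
    """Return True if the line holds a call with a non-literal id or module."""
    return DYNAMIC_CALL.search(line) is not None


def with_context(source, is_tricky=looks_dynamic, context=2):
    """Return one text hunk per tricky line, with surrounding context lines.

    The tricky line is prefixed with '>', context lines with a space,
    and every line carries its 1-based line number.
    """
    lines = source.splitlines()
    hunks = []
    for index, line in enumerate(lines):
        if not is_tricky(line):
            continue
        start = max(0, index - context)
        end = min(len(lines), index + context + 1)
        hunk = [
            f"{'>' if n == index else ' '} {n + 1}: {lines[n]}"
            for n in range(start, end)
        ]
        hunks.append("\n".join(hunk))
    return hunks
```

A call such as `get_string('add', 'moodle', $a)` is not flagged, because only the third argument is a variable and the identifier and module are still literal.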

              dougiamas Martin Dougiamas added a comment -

              What's the status of this? It would speed up translation for all translators.

              mudrd8mz David Mudrák added a comment -

              I have partially solved this for now. All the string names that were not found in the Moodle 2.0 source code (as a pattern) are now marked as "greylisted" in AMOS. We have ~2000 greylisted strings at the moment, some of which are false positives. We can go through the greylist in AMOS and, if we are 100% sure a string is no longer used, remove it from CVS HEAD. I would suggest waiting until after we branch off MOODLE_20_STABLE, though.

              mudrd8mz David Mudrák added a comment -

              Waiting for MOODLE_20_STABLE to be forked.

              mudrd8mz David Mudrák added a comment -

              Re-triage: Moving into a separate task in DEV backlog.

              mudrd8mz David Mudrák added a comment -

              This issue was assigned to me, however I will not be able to work on this issue in the immediate future. In order to create a truer sense of the state of this issue and to allow other developers to have a chance to become involved, I am removing myself as the assignee of this issue. For more information, see http://docs.moodle.org/dev/Changes_to_issue_assignment


                People

                • Votes: 4
                • Watchers: 6
