MDL-16483

Strategy for generating and cleaning data for unit tests




      This needs an urgent consensus and decision.

      We have all the tools we need to write effective unit tests for our new code. However, many parts of the code require actual database records or physical files in order to be tested properly. We need to decide which strategy will be used for cleaning up generated data, among the following:

      1. Use an alternate prefix for database tables. This has the advantage of securing the "real" tables against actions by the unit tests. The main problem is that it is complex to keep these tables in sync with the schema of the "real" tables, and it would be too slow to rebuild them entirely between tests.
      2. Use the real tables, but restrict data generation and cleaning to sites where the $CFG->developmentmode variable is set in config.php (assuming that only one user will use the site). This is much simpler than option 1, but can only be used on a development site, preferably a freshly installed one.
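
      As a rough, language-neutral illustration of the two options above (written in Python rather than Moodle's PHP), the sketch below shows a table-name resolver that switches to an alternate prefix, and a guard that refuses to generate data unless development mode is enabled. The Config class, the unittest_prefix setting and both function names are hypothetical and are not part of Moodle.

```python
from dataclasses import dataclass


@dataclass
class Config:
    """Hypothetical stand-in for the $CFG object (not Moodle code)."""
    prefix: str = "mdl_"
    unittest_prefix: str = "tst_"   # assumed setting name, for illustration only
    developmentmode: bool = False


def table_name(cfg: Config, table: str, testing: bool) -> str:
    """Option 1: route test queries to tables under an alternate prefix."""
    return (cfg.unittest_prefix if testing else cfg.prefix) + table


def require_development_mode(cfg: Config) -> None:
    """Option 2: refuse to touch the real tables unless developmentmode is set."""
    if not cfg.developmentmode:
        raise RuntimeError("Test data generation requires developmentmode in config")
```

      With option 1 the tests never see the "real" tables at all; with option 2 the guard only limits where the tests may run, so the cleanup strategy still matters.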

      In addition to choosing between the real tables and an alternate db prefix, we need to decide how the database will be "cleaned up" between tests.

      1. Complete rebuild from an SQL dump file generated beforehand on demand. This is not really feasible because it would be horrendously slow.
      2. Hard-code a list of tables to be cleaned up before or after each test, with special conditions for tables where some records must be kept (like the users table). This is fast, but complex to set up and not very reliable. It is also difficult to maintain, as the db schema changes throughout the dev process.
      3. Before each generation of data, save in memory the highest primary key value of every non-empty table (and record which tables are empty), then use this data during clean-up to remove only the newer records from these tables, emptying the tables that were previously empty. This is much more reliable than option 2, but probably slower. It also precludes more than one concurrent user on the site, because of the risks of parallel db activity.
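
      Option 3 above could be sketched roughly as follows. This is only an illustration using Python's sqlite3 module with an in-memory database, not Moodle code; the table names, the id primary key column and the helper names snapshot_max_ids and cleanup are assumptions made for the example.

```python
import sqlite3


def snapshot_max_ids(conn, tables):
    """Record the highest id per table; None marks a table that was empty."""
    return {t: conn.execute(f"SELECT MAX(id) FROM {t}").fetchone()[0] for t in tables}


def cleanup(conn, snapshot):
    """Delete every row created after the snapshot was taken."""
    for table, max_id in snapshot.items():
        if max_id is None:
            conn.execute(f"DELETE FROM {table}")  # table was empty: clear it
        else:
            conn.execute(f"DELETE FROM {table} WHERE id > ?", (max_id,))
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE courses (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (name) VALUES ('admin')")  # pre-existing record

snapshot = snapshot_max_ids(conn, ["users", "courses"])

# Data generated by a test
conn.execute("INSERT INTO users (name) VALUES ('testuser')")
conn.execute("INSERT INTO courses (name) VALUES ('testcourse')")

cleanup(conn, snapshot)
```

      Note that a snapshot of this kind only undoes inserts. Updates to, or deletions of, records that existed before the snapshot would not be reverted.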

      Another, more radical option is available: completely mocking the $DB object and controlling all DB activity originating in unit tests. This places a large share of the responsibility on the unit tests, since they have to set up return values for every get_* DB call that the tested code is likely to make. The process could be optimised using clever object orientation and abstraction, but it is a huge job nonetheless. It can however be done, and may be the strategy of choice for certain unit tests. Text or XML fixture data could easily be used in combination with this.
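
      To give an idea of what the mocking approach involves, here is a minimal sketch using Python's unittest.mock rather than Moodle's PHP. The get_username function, the fixture dictionary and make_mock_db are hypothetical; get_record merely mirrors the get_* style of call mentioned above.

```python
from unittest.mock import Mock

# Hypothetical fixture data; in practice it could be loaded from text or XML.
FIXTURE_USERS = {
    1: {"id": 1, "username": "admin"},
    2: {"id": 2, "username": "student"},
}


def get_username(db, userid):
    """Code under test: it only reaches the database through the db object."""
    record = db.get_record("user", {"id": userid})
    return record["username"] if record else None


def make_mock_db():
    """Build a mock whose get_record answers from the fixture, not a real database."""
    db = Mock()
    db.get_record.side_effect = lambda table, conditions: FIXTURE_USERS.get(conditions["id"])
    return db


db = make_mock_db()
```

      A test can then call get_username(db, 2) and check both the result and the calls recorded on db.get_record, without any table being touched. The cost described above is visible even here: every call the tested code makes needs a prepared answer.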

      This may all seem like overkill or a waste of time in this very busy time of 2.0 development, but I think we have a great opportunity to strengthen the long-term integrity of our code base, and to reduce the risks of insidious bugs causing problems in the future.

      Please comment and vote for your preferred solution, or propose your own if you have one.
