Layout Engine Visual Tests (reftest)
L. David Baron, Mozilla Corporation
July 19, 2006

This code is designed to run tests of Mozilla's layout engine.  These
tests consist of an HTML (or other format) file along with a reference
in the same format.  The tests are run based on a manifest file, and for
each test, PASS or FAIL is reported, and UNEXPECTED is reported if the
result (PASS or FAIL) was not the expected result noted in the manifest.

Images of the display of both tests are captured, and most test types
involve comparing these images (e.g., test types == or !=) to determine
whether the test passed.  The captures of the tests are taken in a
viewport that is 800 pixels wide and 1000 pixels tall, so any content
outside that area will be ignored (except for any scrollbars that are
displayed).  Ideally, however, tests should be written so that they fit
within 600x600, since we may in the future want to switch to 600x600 to
match http://lists.w3.org/Archives/Public/www-style/2012Sep/0562.html .

Why this way?
=============

Writing HTML tests where the reference rendering is also in HTML is
harder than simply writing bits of HTML that can be regression-tested by
comparing the rendering of an older build to that of a newer build
(perhaps using stored reference images from the older build).  However,
comparing across time has major disadvantages:

 * Comparisons across time either require two runs for every test, or
   they require stored reference images appropriate for the platform and
   configuration (often limiting testing to a very specific
   configuration).

 * Comparisons across time may fail due to expected changes, for
   example, changes in the default style sheet for HTML, changes in the
   appearance of form controls, or changes in default preferences like
   default font size or default colors.

Using tests for which the pass criteria were explicitly chosen allows
running tests at any time to see whether they still pass.
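
As a rough illustration of the reporting rule described in the
introduction (this is not the harness's actual code), the sketch below
derives the reported string from the comparison type, the outcome of the
image comparison, and the result the manifest anticipates.  The function
name reportResult and its parameters are hypothetical, and the exact
strings printed by the real harness may differ.

```javascript
// Hypothetical sketch; reportResult is not part of the reftest harness.
//   type        - "==" or "!=", the comparison named in the manifest
//   imagesEqual - true if the two captured images are identical
//   expected    - "PASS" or "FAIL", the result the manifest anticipates
function reportResult(type, imagesEqual, expected) {
  // "==" passes on identical images; "!=" passes on differing images.
  var passed = (type === "==") ? imagesEqual : !imagesEqual;
  var result = passed ? "PASS" : "FAIL";
  // Flag any result that differs from what the manifest anticipated.
  return (result === expected) ? result : "UNEXPECTED " + result;
}

console.log(reportResult("==", true, "PASS"));   // PASS
console.log(reportResult("==", false, "PASS"));  // UNEXPECTED FAIL
console.log(reportResult("!=", false, "FAIL"));  // UNEXPECTED PASS
```

Because the expected result is an explicit input, the same check can be
run on any build at any time without stored reference images.
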
Manifest Format
===============

The test manifest format is a plain text file.  A line starting with a
"#" is a comment.  A comment may also follow other content on a line,
introduced by whitespace followed by a "#".  Each non-blank line (after
removal of comments) must be one of the following:

1. Inclusion of another manifest

   include <failure-type>* <relative_path>

   <failure-type> is the same as listed below for a test item.  As for
   test items, multiple failure types listed on the same line are
   combined by using the last matching failure type listed.  However,
   the failure type on a manifest is combined with the failure type on
   the test (or on a nested manifest) with the rule that the last in the
   following list wins: fails, random, skip.  (In other words, skip
   always wins, and random beats fails.)

2. A test item

   [ <failure-type> | <preference> ]* [<http>] <type> <url> <url_ref>

   where

   a. <failure-type> (optional) is one of the following:

      fails
          The test passes if the images of the two renderings DO NOT
          meet the conditions specified in the <type>.

      fails-if(condition)
          If the condition is met, the test passes if the images of the
          two renderings DO NOT meet the conditions of <type>.  If the
          condition is not met, the test passes if the conditions of
          <type> are met.

      needs-focus
          The test fails or times out if the reftest window is not
          focused.

      random
          The results of the test are random and therefore not to be
          considered in the output.

      random-if(condition)
          The results of the test are random if a given condition is
          met.

      silentfail
          This test may fail silently, and if that happens it should
          count as if the test passed.  This is useful for cases where
          silent failure is the intended behavior (for example, in an
          out of memory situation in JavaScript, we stop running the
          script silently and immediately, in hopes of reclaiming enough
          memory to keep the browser functioning).

      silentfail-if(condition)
          This test may fail silently if the condition is met.

      skip
          This test should not be run.  This is useful when a test fails
          in a catastrophic way, such as crashing or hanging the
          browser.  Using 'skip' is preferred to simply commenting out
          the test because we want to report the test failure at the end
          of the test run.

      skip-if(condition)
          If the condition is met, the test is not run.  This is useful
          if, for example, the test crashes only on a particular
          platform (i.e., it allows us to get test coverage on the other
          platforms).
      slow
          The test may take a long time to run, so run it if slow tests
          are either enabled or not disabled (test manifest interpreters
          may choose whether or not to run such tests by default).

      slow-if(condition)
          If the condition is met, the test is treated as if 'slow' had
          been specified.  This is useful for tests which are slow only
          on particular platforms (e.g., a test which exercises
          out-of-memory behavior might be fast on a 32-bit system but
          inordinately slow on a 64-bit system).

      fuzzy(maxDiff, diffCount)
          This allows a test to pass if the pixel value differences are
          <= maxDiff and the total number of different pixels is
          <= diffCount.  It can also be used with '!=' to ensure that
          the difference is greater than maxDiff.

      fuzzy-if(condition, maxDiff, diffCount)
          If the condition is met, the test is treated as if 'fuzzy' had
          been specified.  This is useful if there are differences on
          particular platforms.

      require-or(cond1&&cond2&&...,fallback)
          Require some particular setup be performed or environmental
          condition(s) made true (e.g., setting debug mode) before the
          test is run.  If any condition is unknown, unimplemented, or
          fails, revert to the fallback failure-type.
          Example: require-or(debugMode,skip)

      asserts(count)
          Loading the test and reference is known to assert exactly
          count times.
          NOTE: An asserts() notation with a non-zero count or maxCount
          suppresses use of a cached canvas for the test with the
          annotation.  However, if later occurrences of the same test
          are not annotated, they will use the cached canvas
          (potentially from the load that asserted).  This allows
          repeated use of the same test or reference to be annotated
          correctly (which may be particularly useful when the uses are
          in different subdirectories that can be tested independently),
          but does not force them to be, nor does it force suppression
          of caching for a common reference when it is the test that
          asserts.
      asserts(minCount-maxCount)
          Loading the test and reference is known to assert between
          minCount and maxCount times, inclusive.
          NOTE: See above regarding canvas caching.

      asserts-if(condition,count)
      asserts-if(condition,minCount-maxCount)
          Same as above, but only if condition is true.

      Conditions are JavaScript expressions *without spaces* in them.
      They are evaluated in a sandbox in which a limited set of
      variables are defined.  See the BuildConditionSandbox function in
      layout/tools/reftest.js for details.

      Examples of using conditions:

          fails-if(winWidget) == test reference
          asserts-if(cocoaWidget,2) load crashtest

   b. <preference> (optional) is a string of the form

          pref(<name>,<value>)
          test-pref(<name>,<value>)
          ref-pref(<name>,<value>)

      where <name> is the name of a preference setting, as seen in
      about:config, and <value> is the value to which this preference
      should be set.  <value> may be a boolean (true/false), an integer,
      or a quoted string *without spaces*, according to the type of the
      preference.

      The preference will be set to the specified value prior to
      rendering the test and/or reference canvases (pref() applies to
      both, test-pref() only to the test, and ref-pref() only to the
      reference), and will be restored afterwards so that following
      tests are not affected.  Note that this feature is only useful for
      "live" preferences that take effect immediately, without requiring
      a browser restart.

   c. <http>, if present, is one of the strings (sans quotes) "HTTP" or
      "HTTP(..)" or "HTTP(../..)" or "HTTP(../../..)", etc., indicating
      that the test should be run over an HTTP server because it
      requires certain HTTP headers or a particular HTTP status.  (Don't
      use this if your test doesn't require this functionality, because
      it unnecessarily slows down the test.)

      With "HTTP", HTTP tests have the restriction that any resource an
      HTTP test accesses must be accessed using a relative URL, and the
      test and the resource must be within the directory containing the
      reftest manifest that describes the test (or within a descendant
      directory).
      The variants "HTTP(..)", etc., can be used to relax this
      restriction by allowing resources in the parent directory, etc.

      To modify the HTTP status or headers of a resource named FOO,
      create a sibling file named FOO^headers^ with the following
      contents:

        [<http status line>]
        <http header line>*

        <http status line>
            A line of the form "HTTP ###[ <description>]", where ###
            indicates the desired HTTP status and <description>
            indicates a desired HTTP status description, if any.  If
            this line is omitted, the default is "HTTP 200 OK".

        <http header line>
            A line in standard HTTP header line format, i.e.
            "Field-Name: field-value".  You may not repeat the use of a
            Field-Name and must coalesce such headers together, and each
            header must be specified on a single line, but otherwise the
            format exactly matches that from HTTP itself.

      HTTP tests may also incorporate SJS files.  SJS files provide
      similar functionality to CGI scripts, in that the response they
      produce can be dependent on properties of the incoming request.
      Currently these properties are restricted to method type and
      headers, but eventually it should be possible to examine data in
      the body of the request as well when computing the generated
      response.  An SJS file is a JavaScript file with a .sjs extension
      which defines a global |handleRequest| function (called every time
      that file is loaded during reftests) in this format:

        function handleRequest(request, response)
        {
          response.setStatusLine(request.httpVersion, 200, "OK");

          // You *probably* want this, or else you'll get bitten if you run
          // reftest multiple times with the same profile.
          response.setHeader("Cache-Control", "no-cache");

          response.write("any ASCII data you want");

          var outputStream = response.bodyOutputStream;
          // ...anything else you want to do, synchronously...
        }

      For more details on exactly which functions and properties are
      available on request/response in handleRequest, see the
      nsIHttpRe(quest|sponse) definitions in .

   d. <type> is one of the following:

      ==
          The test passes if the images of the two renderings are the
          SAME.
      !=
          The test passes if the images of the two renderings are
          DIFFERENT.

      load
          The test passes unconditionally if the page loads.  url_ref
          must be omitted, and the test cannot be marked as fails or
          random.  (Used to test for crashes, hangs, assertions, and
          leaks.)

      script
          The loaded page records the test's pass or failure status in a
          JavaScript data structure accessible through the following
          API.

          getTestCases() returns an array of test result objects
          representing the results of the tests performed by the page.

          Each test result object has two methods:

          testPassed() returns true if the test result object passed,
          otherwise it returns false.

          testDescription() returns a string describing the test result.

          url_ref must be omitted.  The test may be marked as fails or
          random.  (Used to test the JavaScript Engine.)

   e. <url> is either a relative file path or an absolute URL for the
      test page

   f. <url_ref> is either a relative file path or an absolute URL for
      the reference page

   The only difference between <url> and <url_ref> is that results of
   the test are reported using <url> only.

3. Specification of a url prefix

   url-prefix <string>

   <string> will be prepended to relative <url> and <url_ref> for all
   following test items in the manifest.

   <string> will not be prepended to the relative path when including
   another manifest, e.g. include <relative_path>.

   <string> will not be prepended to any <url> or <url_ref> matching the
   pattern /^\w+:/.  This will prevent the prefix from being applied to
   any absolute url containing a protocol such as data:, about:, or
   http:.

   While the typical use of url-prefix is expected to be as the first
   line of a manifest, it is legal to use it anywhere in a manifest.
   Subsequent uses of url-prefix overwrite any existing values.

4. Specification of default preferences

   default-preferences <preference>*

   where <preference> is defined above.

   The <preference> settings will be used for all following test items
   in the manifest.
   If a test item includes its own preference settings, then they will
   override any settings for preferences of the same names that are set
   using default-preferences, just as later items within a line override
   earlier ones.

   A default-preferences line with no <preference> settings following it
   will reset the set of default preferences to be empty.

   As with url-prefix, default-preferences will often be used at the
   start of a manifest file so that it applies to all test items, but it
   is legal for default-preferences to appear anywhere in the manifest.
   A subsequent default-preferences will reset any previous default
   preference values and overwrite them with the specified <preference>
   values.

This test manifest format could be used by other harnesses, such as ones
that do not depend on XUL, or even ones testing other layout engines.

Running Tests
=============

(If you're not using a DEBUG build, first set
browser.dom.window.dump.enabled to true (in about:config, in the profile
you'll be using to run the tests).  Create the option as a new boolean
if it doesn't exist already.  If you skip this step you won't get any
output in the terminal.)

At some point in the future there will hopefully be a cleaner way to do
this.  For now, go to your object directory, and run (perhaps using
MOZ_NO_REMOTE=1 or the -profile option)

  ./firefox -reftest /path/to/srcdir/mozilla/layout/reftests/reftest.list > reftest.out

and then search/grep reftest.out for "UNEXPECTED".

There are two scripts provided to convert the reftest.out to HTML.
clean-reftest-output.pl converts reftest.out into simple HTML, stripping
lines from the log that aren't relevant.  reftest-to-html.pl converts
the output into HTML that makes it easier to visually check for
failures.

Testable Areas
==============

This framework is capable of testing many areas of the layout engine.
It is particularly well-suited to testing dynamic change handling (by
comparison to the static end-result as a reference) and incremental
layout (comparison of a script-interrupted layout to one that was not).
However, it is also possible to write tests for many other things that
can be described in terms of equivalence, for example:

 * CSS cascading could be tested by comparing the result of a
   complicated set of style rules that makes a word green to a reference
   in which the same word is simply colored green directly.

 *