Following on from bug MDL-7861 "Strict XHTML 1.0", I am building up a patch to fix stray ampersands, poor semantics, etc. in the help HTML files (lang/en_utf8/help/*). This is what I'm tackling:
1 Well-formed XML
- Stray ampersands, regular expression /&[^algqn#]/ (not perfect, but a start). Specific searches: &file, &bug, &mod.
2 Semantics
- Each */index.html should start with an <h2> </h2> heading, except help/index.html, which should start with <h1> </h1>
- Other headings, regular expression /<p><b>(.*?)</b></p>/, should be replaced with <h2>$1</h2> or <h3>$1</h3> as appropriate - a headache!
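For anyone wanting to repeat the two searches above on other files, here is a rough sketch in Python. It is not the script used to build the patch. The function names and the patterns are my own assumptions: the ampersand pattern treats any "&" that does not start a named or numeric reference ending in ";" as stray, and the heading pattern is the one from item 2. The heading level still has to be chosen by hand for each match.

```python
import re

# An "&" that does NOT begin a named entity (&amp;), a decimal
# reference (&#38;) or a hex reference (&#x26;) is treated as stray.
STRAY_AMP = re.compile(
    r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)"
)

# Bold-only paragraphs used as pseudo-headings, as in item 2.
PSEUDO_HEADING = re.compile(r"<p><b>(.*?)</b></p>")


def escape_stray_ampersands(html):
    """Replace every stray "&" with "&amp;"; existing references are kept."""
    return STRAY_AMP.sub("&amp;", html)


def promote_headings(html, level=2):
    """Turn <p><b>text</b></p> into <hN>text</hN> for the given level."""
    tag = "h%d" % level
    # A function is used as the replacement so that backslashes in the
    # heading text are not interpreted as escape sequences.
    return PSEUDO_HEADING.sub(
        lambda m: "<%s>%s</%s>" % (tag, m.group(1), tag), html
    )
```

The heading pattern can over-match a paragraph that contains several bold runs, so each replacement should be reviewed rather than applied blindly.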
NOTES:
- I want to be pragmatic: should we aim for the Strict DTD for help, or would Transitional be sufficient?
- I'm ONLY fixing the English language pack, en_utf8 - how can we systematically fix other languages?
I'm attaching version 2 of the patch (v1 was 4 April) - anyone want to review? There are still "Other headings" to fix (x100!), but I may just commit what I've done.